Inferring Fitness Effects from Time-Resolved Sequence Data with a Delay-Deterministic Model
Nené, Nuno R.; Dunham, Alistair S.; Illingworth, Christopher J. R.
2018-01-01
A common challenge arising from the observation of an evolutionary system over time is to infer the magnitude of selection acting upon a specific genetic variant, or variants, within the population. The inference of selection may be confounded by the effects of genetic drift in a system, leading to the development of inference procedures to account for these effects. However, recent work has suggested that deterministic models of evolution may be effective in capturing the effects of selection even under complex models of demography, suggesting the more general application of deterministic approaches to inference. Responding to this literature, we here note a case in which a deterministic model of evolution may give highly misleading inferences, resulting from the nondeterministic properties of mutation in a finite population. We propose an alternative approach that acts to correct for this error, and which we denote the delay-deterministic model. Applying our model to a simple evolutionary system, we demonstrate its performance in quantifying the extent of selection acting within that system. We further consider the application of our model to sequence data from an evolutionary experiment. We outline scenarios in which our model may produce improved results for the inference of selection, noting that such situations can be easily identified via the use of a regular deterministic model. PMID:29500183
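A minimal sketch of the general idea, not the authors' exact formulation: propagate a beneficial variant deterministically under recurrent mutation and selection, but hold the mutant lineage back for a delay that stands in for the stochastic waiting time until an establishing mutation arises. All parameter values here are invented for illustration.

```python
import numpy as np

def deterministic_traj(s, mu, T, delay=0):
    """Mutant frequency under mutation + selection, propagated
    deterministically; `delay` postpones the onset of growth to mimic the
    stochastic waiting time for an establishing mutation (hypothetical
    parameterization, not the paper's exact model)."""
    x = 0.0
    traj = np.zeros(T)
    for t in range(T):
        if t >= delay:
            x = x + mu * (1.0 - x)           # recurrent mutation
            x = x * (1 + s) / (1 + s * x)    # haploid selection step
        traj[t] = x
    return traj

plain = deterministic_traj(s=0.05, mu=1e-6, T=600)               # regular deterministic
delayed = deterministic_traj(s=0.05, mu=1e-6, T=600, delay=150)  # delay-corrected
```

Comparing the two trajectories illustrates the failure mode the abstract describes: the regular deterministic curve rises as soon as mutation is switched on, whereas in a finite population the mutant lineage typically establishes only after a stochastic waiting time.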
Grobei, Monica A.; Qeli, Ermir; Brunner, Erich; Rehrauer, Hubert; Zhang, Runxuan; Roschitzki, Bernd; Basler, Konrad; Ahrens, Christian H.; Grossniklaus, Ueli
2009-01-01
Pollen, the male gametophyte of flowering plants, represents an ideal biological system to study developmental processes, such as cell polarity, tip growth, and morphogenesis. Upon hydration, the metabolically quiescent pollen rapidly switches to an active state, exhibiting extremely fast growth. This rapid switch requires relevant proteins to be stored in the mature pollen, where they have to retain functionality in a desiccated environment. Using a shotgun proteomics approach, we unambiguously identified ∼3500 proteins in Arabidopsis pollen, including 537 proteins that were not identified in genetic or transcriptomic studies. To generate this comprehensive reference data set, which extends the previously reported pollen proteome by a factor of 13, we developed a novel deterministic peptide classification scheme for protein inference. This generally applicable approach considers the gene model–protein sequence–protein accession relationships. It allowed us to classify and eliminate ambiguities inherently associated with any shotgun proteomics data set, to report a conservative list of protein identifications, and to seamlessly integrate data from previous transcriptomics studies. Manual validation of proteins unambiguously identified by a single, information-rich peptide enabled us to significantly reduce the false discovery rate, while keeping valuable identifications of shorter and lower abundant proteins. Bioinformatic analyses revealed a higher stability of pollen proteins compared to those of other tissues and implied a protein family of previously unknown function in vesicle trafficking. Interestingly, the pollen proteome is most similar to that of seeds, indicating physiological similarities between these developmentally distinct tissues. PMID:19546170
A grammar inference approach for predicting kinase specific phosphorylation sites.
Datta, Sutapa; Mukhopadhyay, Subhasis
2015-01-01
Kinase-mediated phosphorylation is a key post-translational mechanism that plays an important role in regulating various cellular processes and phenotypes. Many diseases, such as cancer, are related to signaling defects associated with protein phosphorylation. Characterizing the protein kinases and their substrates enhances our ability to understand the mechanism of protein phosphorylation and extends our knowledge of signaling networks, thereby helping us to treat such diseases. Experimental methods for identifying phosphorylation sites are labour intensive and expensive. Also, the manifold increase of protein sequences in the databanks over the years necessitates fast and accurate computational methods for predicting phosphorylation sites in protein sequences. To date, a number of computational methods have been proposed by various researchers for predicting phosphorylation sites, but there remains much scope for improvement. In this communication, we present a simple and novel method based on a Grammatical Inference (GI) approach to automate the prediction of kinase-specific phosphorylation sites. In this regard, we have used a popular GI algorithm, Alergia, to infer Deterministic Stochastic Finite State Automata (DSFA) that represent the regular grammar corresponding to the phosphorylation sites. Extensive experiments on several datasets generated by us reveal that our inferred grammar successfully predicts phosphorylation sites in a kinase-specific manner. It performs significantly better than other existing phosphorylation site prediction methods. We have also compared our inferred DSFA with two other GI inference algorithms. The DSFA generated by our method performs better, which indicates that our method is robust and has potential for predicting phosphorylation sites in a kinase-specific manner.
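A toy sketch of the scoring step only, assuming an automaton has already been inferred (the alphabet, states, and transition probabilities below are invented; Alergia's state-merging procedure itself is not shown): a deterministic stochastic finite automaton assigns a probability to a candidate site by multiplying transition probabilities along its unique path.

```python
# toy DSFA over a two-letter alphabet {h: hydrophobic, p: polar};
# dsfa[(state, symbol)] = (next_state, probability), all values invented
dsfa = {
    (0, 'h'): (1, 0.6), (0, 'p'): (0, 0.4),
    (1, 'h'): (1, 0.3), (1, 'p'): (2, 0.7),
    (2, 'h'): (0, 0.5), (2, 'p'): (2, 0.5),
}

def score(seq, start=0):
    """Probability the DSFA assigns to seq: product of transition probs
    along the single (deterministic) path."""
    state, prob = start, 1.0
    for sym in seq:
        state, p = dsfa[(state, sym)]
        prob *= p
    return prob

# e.g. classify a sequence window as site / non-site by thresholding the score
print(score("hpp"))  # 0.6 * 0.7 * 0.5 = 0.21
```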
Stochastic inference with spiking neurons in the high-conductance state
NASA Astrophysics Data System (ADS)
Petrovici, Mihai A.; Bill, Johannes; Bytschok, Ilja; Schemmel, Johannes; Meier, Karlheinz
2016-10-01
The highly variable dynamics of neocortical circuits observed in vivo have been hypothesized to represent a signature of ongoing stochastic inference but stand in apparent contrast to the deterministic response of neurons measured in vitro. Based on a propagation of the membrane autocorrelation across spike bursts, we provide an analytical derivation of the neural activation function that holds for a large parameter space, including the high-conductance state. On this basis, we show how an ensemble of leaky integrate-and-fire neurons with conductance-based synapses embedded in a spiking environment can attain the correct firing statistics for sampling from a well-defined target distribution. For recurrent networks, we examine convergence toward stationarity in computer simulations and demonstrate sample-based Bayesian inference in a mixed graphical model. This points to a new computational role of high-conductance states and establishes a rigorous link between deterministic neuron models and functional stochastic dynamics on the network level.
Roles of factorial noise in inducing bimodal gene expression
NASA Astrophysics Data System (ADS)
Liu, Peijiang; Yuan, Zhanjiang; Huang, Lifang; Zhou, Tianshou
2015-06-01
Some gene regulatory systems can exhibit bimodal distributions of mRNA or protein although the deterministic counterparts are monostable. This noise-induced bimodality is an interesting phenomenon and has important biological implications, but it is unclear how different sources of expression noise (each source creates so-called factorial noise that is defined as a component of the total noise) contribute separately to this stochastic bimodality. Here we consider a minimal model of gene regulation, which is monostable in the deterministic case. Although simple, this system contains factorial noise of two main kinds: promoter noise due to switching between gene states and transcriptional (or translational) noise due to synthesis and degradation of mRNA (or protein). To better trace the roles of factorial noise in inducing bimodality, we also analyze two limit models, continuous and adiabatic approximations, apart from the exact model. We show that in the case of slow gene switching, the continuous model where only promoter noise is considered can exhibit bimodality; in the case of fast switching, the adiabatic model where only transcriptional or translational noise is considered can also exhibit bimodality but the exact model cannot; and in other cases, both promoter noise and transcriptional or translational noise can cooperatively induce bimodality. Since slow gene switching and large protein copy numbers are characteristics of eukaryotic cells, whereas fast gene switching and small protein copy numbers are characteristics of prokaryotic cells, we infer that eukaryotic stochastic bimodality is induced mainly by promoter noise, whereas prokaryotic stochastic bimodality is induced primarily by transcriptional or translational noise.
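A minimal Gillespie (stochastic simulation algorithm) sketch of the two-state promoter model described above, with invented rate constants: switching that is slow relative to mRNA turnover produces the promoter-noise-driven bimodality discussed for the slow-switching limit.

```python
import numpy as np

rng = np.random.default_rng(1)

def telegraph_ssa(k_on, k_off, k_tx, k_deg, t_end):
    """Exact SSA for the telegraph gene model: promoter g in {0, 1}, mRNA m."""
    t, g, m = 0.0, 0, 0
    while t < t_end:
        rates = np.array([k_on * (1 - g),  # promoter turns on
                          k_off * g,       # promoter turns off
                          k_tx * g,        # transcription while on
                          k_deg * m])      # mRNA degradation
        total = rates.sum()
        t += rng.exponential(1.0 / total)
        r = rng.choice(4, p=rates / total)
        if r == 0:
            g = 1
        elif r == 1:
            g = 0
        elif r == 2:
            m += 1
        else:
            m -= 1
    return m

# slow switching (k_on = k_off = 0.05 << k_deg = 1) -> bimodal mRNA histogram
samples = [telegraph_ssa(0.05, 0.05, 10.0, 1.0, 200.0) for _ in range(300)]
```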
Theory and applications of a deterministic approximation to the coalescent model
Jewett, Ethan M.; Rosenberg, Noah A.
2014-01-01
Under the coalescent model, the random number n_t of lineages ancestral to a sample is nearly deterministic as a function of time when n_t is moderate to large in value, and it is well approximated by its expectation E[n_t]. In turn, this expectation is well approximated by simple deterministic functions that are easy to compute. Such deterministic functions have been applied to estimate allele age, effective population size, and genetic diversity, and they have been used to study properties of models of infectious disease dynamics. Although a number of simple approximations of E[n_t] have been derived and applied to problems of population-genetic inference, the theoretical accuracy of the formulas and the inferences obtained using these approximations is not known, and the range of problems to which they can be applied is not well understood. Here, we demonstrate general procedures by which the approximation n_t ≈ E[n_t] can be used to reduce the computational complexity of coalescent formulas, and we show that the resulting approximations converge to their true values under simple assumptions. Such approximations provide alternatives to exact formulas that are computationally intractable or numerically unstable when the number of sampled lineages is moderate or large. We also extend an existing class of approximations of E[n_t] to the case of multiple populations of time-varying size with migration among them. Our results facilitate the use of the deterministic approximation n_t ≈ E[n_t] for deriving functionally simple, computationally efficient, and numerically stable approximations of coalescent formulas under complicated demographic scenarios. PMID:24412419
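For a single population of constant size N, the deterministic approximation referred to above takes a simple closed form (a standard result of this general type, stated here for orientation; the paper treats time-varying sizes and migration):

```latex
\frac{dn}{dt} = -\frac{n(n-1)}{2N},
\qquad
n(t) = \left[\,1 - \left(1 - \frac{1}{n_0}\right) e^{-t/(2N)}\right]^{-1},
\qquad n(0) = n_0,
```

with time measured in generations; substituting this deterministic n(t) for the random n_t is the approximation whose accuracy the paper analyzes.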
Dini-Andreote, Francisco; Stegen, James C.; van Elsas, Jan D.; ...
2015-03-17
Despite growing recognition that deterministic and stochastic factors simultaneously influence bacterial communities, little is known about mechanisms shifting their relative importance. To better understand underlying mechanisms, we developed a conceptual model linking ecosystem development during primary succession to shifts in the stochastic/deterministic balance. To evaluate the conceptual model we coupled spatiotemporal data on soil bacterial communities with environmental conditions spanning 105 years of salt marsh development. At the local scale there was a progression from stochasticity to determinism due to Na accumulation with increasing ecosystem age, supporting a main element of the conceptual model. At the regional scale, soil organic matter (SOM) governed the relative influence of stochasticity and the type of deterministic ecological selection, suggesting scale-dependency in how deterministic ecological selection is imposed. Analysis of a new ecological simulation model supported these conceptual inferences. Looking forward, we propose an extended conceptual model that integrates primary and secondary succession in microbial systems.
Deng, Wei (Sophia); Sloutsky, Vladimir M.
2015-01-01
Does category representation change in the course of development? And if so, how and why? The current study attempted to answer these questions by examining category learning and category representation. In Experiment 1, 4-year-olds, 6-year-olds, and adults were trained with either a classification task or an inference task and their categorization performance and memory for items were tested. Adults and 6-year-olds exhibited an important asymmetry: they relied on a single deterministic feature during classification training, but not during inference training. In contrast, regardless of the training condition, 4-year-olds relied on multiple probabilistic features. In Experiment 2, 4-year-olds were presented with classification training and their attention was explicitly directed to the deterministic feature. Under this condition, their categorization performance was similar to that of older participants in Experiment 1, yet their memory performance pointed to a similarity-based representation, which was similar to that of 4-year-olds in Experiment 1. These results are discussed in relation to theories of categorization and the role of selective attention in the development of category learning. PMID:25602938
A stochastic model for correlated protein motions
NASA Astrophysics Data System (ADS)
Karain, Wael I.; Qaraeen, Nael I.; Ajarmah, Basem
2006-06-01
A one-dimensional Langevin-type stochastic difference equation is used to find the deterministic and Gaussian contributions of time series representing the projections of a Bovine Pancreatic Trypsin Inhibitor (BPTI) protein molecular dynamics simulation along different eigenvector directions determined using principal component analysis. The deterministic part shows a distinct nonlinear behavior only for eigenvectors contributing significantly to the collective protein motion.
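A minimal sketch of the kind of decomposition described, using a surrogate Ornstein-Uhlenbeck series in place of a principal-component projection of the BPTI trajectory (the protein data are not reproduced here): the deterministic part is estimated as the conditional mean increment per unit time.

```python
import numpy as np

rng = np.random.default_rng(2)

# surrogate stochastic series standing in for a PCA projection
dt, T = 0.01, 50000
x = np.zeros(T)
for i in range(T - 1):
    x[i + 1] = x[i] - 4.0 * x[i] * dt + 0.5 * np.sqrt(dt) * rng.normal()

# deterministic part f(x) ~ <x_{n+1} - x_n | x_n in bin> / dt
bins = np.linspace(x.min(), x.max(), 30)
idx = np.digitize(x[:-1], bins)
drift = np.array([(x[1:][idx == k] - x[:-1][idx == k]).mean() / dt
                  if np.any(idx == k) else np.nan
                  for k in range(1, len(bins))])
# for this linear surrogate the recovered drift is ~ -4x; a mode carrying
# significant collective motion would show a distinctly nonlinear profile
```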
NASA Astrophysics Data System (ADS)
Reeves, Mark
2014-03-01
Entropy changes underlie the physics that dominates biological interactions. Indeed, introductory biology courses often begin with an exploration of the qualities of water that are important to living systems. However, one idea that is not explicitly addressed in most introductory physics or biology textbooks is the dominant contribution of entropy in driving important biological processes towards equilibrium. From diffusion to cell-membrane formation, to electrostatic binding in protein folding, to the functioning of nerve cells, entropic effects often act to counterbalance deterministic forces such as electrostatic attraction and, in so doing, allow for effective molecular signaling. A small group of biology, biophysics and computer science faculty have worked together for the past five years to develop curricular modules (based on SCALE-UP pedagogy) that enable students to create models of stochastic and deterministic processes. Our students are first-year engineering and science students in the calculus-based physics course and they are not expected to know biology beyond the high-school level. In our class, they learn to reduce seemingly complex biological processes and structures to tractable models that include deterministic processes and simple probabilistic inference. The students test these models in simulations and in laboratory experiments that are biologically relevant. The students are challenged to bridge the gap between statistical parameterization of their data (mean and standard deviation) and simple model-building by inference. This allows the students to quantitatively describe realistic cellular processes such as diffusion, ionic transport, and ligand-receptor binding. Moreover, the students confront "random" forces and traditional forces in problems, simulations, and in laboratory exploration throughout the year-long course as they move from traditional kinematics through thermodynamics to electrostatic interactions. This talk will present a number of these exercises, with particular focus on the hands-on experiments done by the students, and will give examples of the tangible material that our students work with throughout the two-semester sequence of their course on introductory physics with a bio focus. Supported by NSF DUE.
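In the spirit of the modules described (a self-contained sketch, not the course's actual materials): an ensemble of unbiased random walkers whose mean squared displacement grows linearly in time, the signature of entropy-driven diffusion that students can contrast with deterministic drift.

```python
import numpy as np

rng = np.random.default_rng(3)

# 1-D random-walk ensemble: spreading with no net force
n_walkers, n_steps, step = 2000, 500, 1.0
steps = rng.choice([-step, step], size=(n_walkers, n_steps))
paths = np.cumsum(steps, axis=1)

msd = (paths ** 2).mean(axis=0)   # grows ~ linearly in step number: diffusive
# a biased step choice, e.g. rng.choice([-1.0, 1.0], p=[0.4, 0.6], ...), adds
# a deterministic drift on top of the entropic spreading
```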
From Weakly Chaotic Dynamics to Deterministic Subdiffusion via Copula Modeling
NASA Astrophysics Data System (ADS)
Nazé, Pierre
2018-03-01
Copula modeling consists in finding a probabilistic distribution, called copula, whereby its coupling with the marginal distributions of a set of random variables produces their joint distribution. The present work aims to use this technique to connect the statistical distributions of weakly chaotic dynamics and deterministic subdiffusion. More precisely, we decompose the jumps distribution of Geisel-Thomae map into a bivariate one and determine the marginal and copula distributions respectively by infinite ergodic theory and statistical inference techniques. We verify therefore that the characteristic tail distribution of subdiffusion is an extreme value copula coupling Mittag-Leffler distributions. We also present a method to calculate the exact copula and joint distributions in the case where weakly chaotic dynamics and deterministic subdiffusion statistical distributions are already known. Numerical simulations and consistency with the dynamical aspects of the map support our results.
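The backbone of the construction is Sklar's theorem, and the Gumbel-Hougaard family below is the textbook example of an extreme-value copula of the kind invoked for the subdiffusive tail (shown for orientation; the paper's specific coupling of Mittag-Leffler marginals is not reproduced here):

```latex
H(x, y) = C\bigl(F_X(x),\, F_Y(y)\bigr),
\qquad
C_\theta(u, v) = \exp\!\left[-\Bigl((-\ln u)^{\theta} + (-\ln v)^{\theta}\Bigr)^{1/\theta}\right],
\quad \theta \ge 1 .
```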
Mi, Xiangcheng; Swenson, Nathan G; Jia, Qi; Rao, Mide; Feng, Gang; Ren, Haibao; Bebber, Daniel P; Ma, Keping
2016-09-07
Deterministic and stochastic processes jointly determine the community dynamics of forest succession. However, it has been widely held in previous studies that deterministic processes dominate forest succession. Furthermore, inference of mechanisms for community assembly may be misleading if based on a single axis of diversity alone. In this study, we evaluated the relative roles of deterministic and stochastic processes along a disturbance gradient by integrating species, functional, and phylogenetic beta diversity in a subtropical forest chronosequence in Southeastern China. We found a general pattern of increasing species turnover, but little-to-no change in phylogenetic and functional turnover over succession at two spatial scales. Meanwhile, the phylogenetic and functional beta diversity were not significantly different from random expectation. This result suggested a dominance of stochastic assembly, contrary to the general expectation that deterministic processes dominate forest succession. On the other hand, we found significant interactions of environment and disturbance and limited evidence for significant deviations of phylogenetic or functional turnover from random expectations for different size classes. This result provided weak evidence of deterministic processes over succession. Stochastic assembly of forest succession suggests that post-disturbance restoration may be largely unpredictable and difficult to control in subtropical forests.
Automated Calibration For Numerical Models Of Riverflow
NASA Astrophysics Data System (ADS)
Fernandez, Betsaida; Kopmann, Rebekka; Oladyshkin, Sergey
2017-04-01
Calibration has been fundamental to all types of hydro-system modeling since its beginnings, as it approximates the parameters that allow a model to mimic the overall system behavior. Here, an assessment of different deterministic and stochastic optimization methods is undertaken to compare their robustness, computational feasibility, and global search capacity; the uncertainty of the most suitable methods is also analyzed. These optimization methods minimize an objective function that compares synthetic measurements with simulated data. Synthetic measurement data replace the observed data set to guarantee that a parameter solution exists. The input data for the objective function derive from a hydro-morphological dynamics numerical model representing a 180-degree bend channel. The hydro-morphological numerical model exhibits a high level of ill-posedness in the mathematical problem. Minimizing the objective function with different candidate optimization methods reveals failure for some of the gradient-based methods, such as Newton Conjugate Gradient and BFGS. Others show partial convergence, such as Nelder-Mead, Polak-Ribière, L-BFGS-B, Truncated Newton Conjugate Gradient, and Trust-Region Newton Conjugate Gradient. Still others yield parameter solutions outside the physical limits, such as Levenberg-Marquardt and LeastSquareRoot. Moreover, there is a significant computational demand for global optimization methods, such as Differential Evolution and Basin-Hopping, as well as for Brute Force methods. The deterministic Sequential Least Squares Programming method and the stochastic Bayesian inference approach give the best optimization results. Keywords: automated calibration of hydro-morphological dynamic numerical models, Bayesian inference theory, deterministic optimization methods.
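A minimal sketch of the calibration loop described, with a one-parameter stand-in forward model in place of the hydro-morphological code (the model, noise level, and true parameter below are invented): synthetic measurements are generated from a known parameter, and several scipy optimizers are compared on the resulting least-squares objective.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
xgrid = np.linspace(0.0, 1.0, 50)

def forward(k):
    """Hypothetical forward model: water depth along the bend vs. friction k."""
    return 2.0 + k * np.sin(2 * np.pi * xgrid)

k_true = 0.35
synthetic = forward(k_true) + rng.normal(0.0, 0.01, xgrid.size)  # synthetic data

def objective(p):
    return np.sum((forward(p[0]) - synthetic) ** 2)

for method in ["Nelder-Mead", "L-BFGS-B", "SLSQP"]:
    res = minimize(objective, x0=[0.1], method=method)
    print(method, res.x, res.fun)
```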
Streif, Stefan; Oesterhelt, Dieter; Marwan, Wolfgang
2010-03-18
Photo- and chemotaxis of the archaeon Halobacterium salinarum is based on the control of flagellar motor switching through stimulus-specific methyl-accepting transducer proteins that relay the sensory input signal to a two-component system. Certain members of the transducer family function as receptor proteins by directly sensing specific chemical or physical stimuli. Others interact with specific receptor proteins like the phototaxis photoreceptors sensory rhodopsin I and II, or require specific binding proteins as for example some chemotaxis transducers. Receptor activation by light or a change in receptor occupancy by chemical stimuli results in reversible methylation of glutamate residues of the transducer proteins. Both, methylation and demethylation reactions are involved in sensory adaptation and are modulated by the response regulator CheY. By mathematical modeling we infer the kinetic mechanisms of stimulus-induced transducer methylation and adaptation. The model (deterministic and in the form of ordinary differential equations) correctly predicts experimentally observed transducer demethylation (as detected by released methanol) in response to attractant and repellent stimuli of wildtype cells, a cheY deletion mutant, and a mutant in which the stimulated transducer species is methylation-deficient. We provide a kinetic model for signal processing in photo- and chemotaxis in the archaeon H. salinarum suggesting an essential role of receptor cooperativity, antagonistic reversible methylation, and a CheY-dependent feedback on transducer demethylation.
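A deliberately schematic ODE sketch in the spirit of the model class described, with two transducer pools and a CheY-modulated demethylation term; the structure and every rate constant below are invented for illustration and are not the authors' fitted model.

```python
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, y, k_m, k_d, chey):
    """M: methylated transducer, D: demethylated; a brief stimulus boosts
    demethylation in a CheY-dependent way (hypothetical kinetics)."""
    M, D = y
    stim = 1.0 if 5.0 < t < 6.0 else 0.0   # brief repellent-like stimulus
    demeth = k_d * (1.0 + 4.0 * chey * stim) * M
    meth = k_m * D
    return [meth - demeth, demeth - meth]

sol = solve_ivp(rhs, (0.0, 20.0), [0.8, 0.2],
                args=(0.5, 0.3, 1.0), max_step=0.05)
# released methanol would track the demethylation flux k_d * (...) * M
```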
Boskova, Veronika; Bonhoeffer, Sebastian; Stadler, Tanja
2014-01-01
Quantifying epidemiological dynamics is crucial for understanding and forecasting the spread of an epidemic. The coalescent and the birth-death model are used interchangeably to infer epidemiological parameters from the genealogical relationships of the pathogen population under study, which in turn are inferred from the pathogen genetic sequencing data. To compare the performance of these widely applied models, we performed a simulation study. We simulated phylogenetic trees under the constant rate birth-death model and the coalescent model with a deterministic exponentially growing infected population. For each tree, we re-estimated the epidemiological parameters using both a birth-death and a coalescent based method, implemented as an MCMC procedure in BEAST v2.0. In our analyses that estimate the growth rate of an epidemic based on simulated birth-death trees, the point estimates such as the maximum a posteriori/maximum likelihood estimates are not very different. However, the estimates of uncertainty are very different. The birth-death model had a higher coverage than the coalescent model, i.e. contained the true value in the highest posterior density (HPD) interval more often (2–13% vs. 31–75% error). The coverage of the coalescent decreases with decreasing basic reproductive ratio and increasing sampling probability of infecteds. We hypothesize that the biases in the coalescent are due to the assumption of deterministic rather than stochastic population size changes. Both methods performed reasonably well when analyzing trees simulated under the coalescent. The methods can also identify other key epidemiological parameters as long as one of the parameters is fixed to its true value. In summary, when using genetic data to estimate epidemic dynamics, our results suggest that the birth-death method will be less sensitive to population fluctuations of early outbreaks than the coalescent method that assumes a deterministic exponentially growing infected population. PMID:25375100
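The quantity both methods target can be stated compactly (standard relations, written here for orientation): the birth-death model works with the transmission (birth) rate λ and become-uninfectious (death) rate μ, whose difference is the epidemic growth rate, while the coalescent variant assumes a deterministically growing infected population:

```latex
r = \lambda - \mu,
\qquad
N(t) = N(0)\, e^{r t}.
```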
Northern Hemisphere glaciation and the evolution of Plio-Pleistocene climate noise
NASA Astrophysics Data System (ADS)
Meyers, Stephen R.; Hinnov, Linda A.
2010-08-01
Deterministic orbital controls on climate variability are commonly inferred to dominate across timescales of 10^4-10^6 years, although some studies have suggested that stochastic processes may be of equal or greater importance. Here we explicitly quantify changes in deterministic orbital processes (forcing and/or pacing) versus stochastic climate processes during the Plio-Pleistocene, via time-frequency analysis of two prominent foraminifera oxygen isotopic stacks. Our results indicate that development of the Northern Hemisphere ice sheet is paralleled by an overall amplification of both deterministic and stochastic climate energy, but their relative dominance is variable. The progression from a more stochastic early Pliocene to a strongly deterministic late Pleistocene is primarily accommodated during two transitory phases of Northern Hemisphere ice sheet growth. This long-term trend is punctuated by “stochastic events,” which we interpret as evidence for abrupt reorganization of the climate system at the initiation and termination of the mid-Pleistocene transition and at the onset of Northern Hemisphere glaciation. In addition to highlighting a complex interplay between deterministic and stochastic climate change during the Plio-Pleistocene, our results support an early onset for Northern Hemisphere glaciation (between 3.5 and 3.7 Ma) and reveal some new characteristics of the orbital signal response, such as the puzzling emergence of 100 ka and 400 ka cyclic climate variability during theoretical eccentricity nodes.
ERIC Educational Resources Information Center
Denison, Stephanie; Trikutam, Pallavi; Xu, Fei
2014-01-01
A rich tradition in developmental psychology explores physical reasoning in infancy. However, no research to date has investigated whether infants can reason about physical objects that behave probabilistically, rather than deterministically. Physical events are often quite variable, in that similar-looking objects can be placed in similar…
Zbilut, Joseph P.; Colosimo, Alfredo; Conti, Filippo; Colafranceschi, Mauro; Manetti, Cesare; Valerio, MariaCristina; Webber, Charles L.; Giuliani, Alessandro
2003-01-01
The problem of protein folding vs. aggregation was investigated in acylphosphatase and the amyloid protein Aβ(1–40) by means of nonlinear signal analysis of their chain hydrophobicity. Numerical descriptors of recurrence patterns provided the basis for statistical evaluation of folding/aggregation distinctive features. Static and dynamic approaches were used to elucidate conditions coincident with folding vs. aggregation using comparisons with known protein secondary structure classifications, site-directed mutagenesis studies of acylphosphatase, and molecular dynamics simulations of amyloid protein, Aβ(1–40). The results suggest that a feature derived from principal component space characterized by the smoothness of singular, deterministic hydrophobicity patches plays a significant role in the conditions governing protein aggregation. PMID:14645049
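A compact sketch of the recurrence-quantification step underlying such descriptors, applied to a surrogate hydrophobicity series (the series, embedding parameters, and radius are invented; the main diagonal is kept for brevity although RQA practice usually excludes it):

```python
import numpy as np

h = np.random.default_rng(5).normal(size=120)  # surrogate hydrophobicity series

m, tau, eps = 3, 1, 0.8                        # embedding dim, delay, radius
emb = np.array([h[i:i + m * tau:tau] for i in range(len(h) - (m - 1) * tau)])
D = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=2)
R = (D < eps).astype(int)                      # recurrence matrix

recurrence_rate = R.mean()
# determinism: fraction of recurrent points lying on diagonals of length >= 2
n, det_pts = R.shape[0], 0
for k in range(-(n - 1), n):
    d = np.diagonal(R, k)
    runs = np.split(d, np.where(d == 0)[0])    # maximal runs of ones
    det_pts += sum(r.sum() for r in runs if r.sum() >= 2)
determinism = det_pts / max(R.sum(), 1)
```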
Win-Stay, Lose-Sample: a simple sequential algorithm for approximating Bayesian inference.
Bonawitz, Elizabeth; Denison, Stephanie; Gopnik, Alison; Griffiths, Thomas L
2014-11-01
People can behave in a way that is consistent with Bayesian models of cognition, despite the fact that performing exact Bayesian inference is computationally challenging. What algorithms could people be using to make this possible? We show that a simple sequential algorithm "Win-Stay, Lose-Sample", inspired by the Win-Stay, Lose-Shift (WSLS) principle, can be used to approximate Bayesian inference. We investigate the behavior of adults and preschoolers on two causal learning tasks to test whether people might use a similar algorithm. These studies use a "mini-microgenetic method", investigating how people sequentially update their beliefs as they encounter new evidence. Experiment 1 investigates a deterministic causal learning scenario and Experiments 2 and 3 examine how people make inferences in a stochastic scenario. The behavior of adults and preschoolers in these experiments is consistent with our Bayesian version of the WSLS principle. This algorithm provides both a practical method for performing Bayesian inference and a new way to understand people's judgments. Copyright © 2014 Elsevier Inc. All rights reserved.
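A minimal sketch of the Win-Stay, Lose-Sample rule on a toy causal-learning problem: three hypothetical hypotheses, a made-up likelihood table, staying tied to how well the current hypothesis predicts each datum, and resampling from the running posterior on a "loss".

```python
import numpy as np

rng = np.random.default_rng(6)

prior = np.array([1 / 3, 1 / 3, 1 / 3])
likelihood = np.array([[0.9, 0.1],   # P(datum | hypothesis), rows = hypotheses
                       [0.5, 0.5],
                       [0.1, 0.9]])

posterior = prior.copy()
h = rng.choice(3, p=posterior)       # initial guess sampled from the prior
for d in [0, 0, 1, 0, 0]:            # toy data stream
    posterior *= likelihood[:, d]
    posterior /= posterior.sum()
    if rng.random() > likelihood[h, d]:     # "lose" with prob 1 - P(d | h)
        h = rng.choice(3, p=posterior)      # ...then sample from the posterior
print(h, posterior)
```

Averaged over many runs, the distribution of the held hypothesis tracks the Bayesian posterior, which is the sense in which this cheap sequential rule approximates exact inference.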
Hartmann, Christoph; Lazar, Andreea; Nessler, Bernhard; Triesch, Jochen
2015-01-01
Even in the absence of sensory stimulation the brain is spontaneously active. This background “noise” seems to be the dominant cause of the notoriously high trial-to-trial variability of neural recordings. Recent experimental observations have extended our knowledge of trial-to-trial variability and spontaneous activity in several directions: 1. Trial-to-trial variability systematically decreases following the onset of a sensory stimulus or the start of a motor act. 2. Spontaneous activity states in sensory cortex outline the region of evoked sensory responses. 3. Across development, spontaneous activity aligns itself with typical evoked activity patterns. 4. The spontaneous brain activity prior to the presentation of an ambiguous stimulus predicts how the stimulus will be interpreted. At present it is unclear how these observations relate to each other and how they arise in cortical circuits. Here we demonstrate that all of these phenomena can be accounted for by a deterministic self-organizing recurrent neural network model (SORN), which learns a predictive model of its sensory environment. The SORN comprises recurrently coupled populations of excitatory and inhibitory threshold units and learns via a combination of spike-timing dependent plasticity (STDP) and homeostatic plasticity mechanisms. Similar to balanced network architectures, units in the network show irregular activity and variable responses to inputs. Additionally, however, the SORN exhibits sequence learning abilities matching recent findings from visual cortex and the network’s spontaneous activity reproduces the experimental findings mentioned above. Intriguingly, the network’s behaviour is reminiscent of sampling-based probabilistic inference, suggesting that correlates of sampling-based inference can develop from the interaction of STDP and homeostasis in deterministic networks. We conclude that key observations on spontaneous brain activity and the variability of neural responses can be accounted for by a simple deterministic recurrent neural network which learns a predictive model of its sensory environment via a combination of generic neural plasticity mechanisms. PMID:26714277
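A deliberately small sketch of a SORN-style network: binary threshold units, discrete STDP on the excitatory-to-excitatory weights, synaptic normalization (scaling), and intrinsic plasticity nudging thresholds toward a target rate. Sizes, learning rates, and initialization are invented, and no structured sensory input is modeled.

```python
import numpy as np

rng = np.random.default_rng(0)
N_E, N_I = 40, 8                  # excitatory / inhibitory units
target_rate, eta_stdp, eta_ip = 0.1, 0.001, 0.001

W_EE = rng.random((N_E, N_E)) * (rng.random((N_E, N_E)) < 0.1)  # sparse E->E
np.fill_diagonal(W_EE, 0.0)
W_EI = rng.random((N_E, N_I)) * 0.5
W_IE = rng.random((N_I, N_E)) * 0.5
T_E, T_I = rng.random(N_E) * 0.5, rng.random(N_I) * 0.5         # thresholds

def normalize_rows(W):
    s = W.sum(axis=1, keepdims=True)
    s[s == 0.0] = 1.0
    return W / s

x = (rng.random(N_E) < 0.1).astype(float)
y = np.zeros(N_I)
for t in range(1000):
    x_new = ((W_EE @ x - W_EI @ y - T_E) > 0).astype(float)
    y = ((W_IE @ x_new - T_I) > 0).astype(float)
    # discrete STDP: potentiate pre(t-1)->post(t), depress the reverse order
    W_EE += eta_stdp * (np.outer(x_new, x) - np.outer(x, x_new)) * (W_EE > 0)
    W_EE = normalize_rows(np.clip(W_EE, 0.0, None))  # synaptic scaling
    T_E += eta_ip * (x_new - target_rate)            # intrinsic plasticity
    x = x_new
```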
Kim, Sung-Cheol; Wunsch, Benjamin H; Hu, Huan; Smith, Joshua T; Austin, Robert H; Stolovitzky, Gustavo
2017-06-27
Deterministic lateral displacement (DLD) is a technique for size fractionation of particles in continuous flow that has shown great potential for biological applications. Several theoretical models have been proposed, but experimental evidence has demonstrated that a rich class of intermediate migration behavior exists, which is not predicted. We present a unified theoretical framework to infer the path of particles in the whole array on the basis of trajectories in a unit cell. This framework explains many of the unexpected particle trajectories reported and can be used to design arrays for even nanoscale particle fractionation. We performed experiments that verify these predictions and used our model to develop a condenser array that achieves full particle separation with a single fluidic input.
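For orientation, the widely used empirical design rule for the critical diameter in a DLD array (Davis's correlation, i.e. the classical zigzag/bump dichotomy that the unified framework above generalizes) is:

```latex
D_c = 1.4\, g\, \varepsilon^{0.48},
```

where g is the gap between posts and ε the row-shift fraction: particles larger than D_c are laterally displaced ("bump" mode), while smaller ones zigzag with the flow.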
NASA Astrophysics Data System (ADS)
Montecinos, Alejandra; Davis, Sergio; Peralta, Joaquín
2018-07-01
The kinematics and dynamics of deterministic physical systems have been a foundation of our understanding of the world since Galileo and Newton. For real systems, however, uncertainty is largely present via external forces such as friction or lack of precise knowledge about the initial conditions of the system. In this work we focus on the latter case and describe the use of inference methodologies in solving the statistical properties of classical systems subject to uncertain initial conditions. In particular we describe the application of the formalism of maximum entropy (MaxEnt) inference to the problem of projectile motion, given information about the average horizontal range over many realizations. By using MaxEnt we can invert the problem and use the provided information on the average range to reduce the original uncertainty in the initial conditions. Also, additional insight into the initial condition's probabilities, and the projectile path distribution itself, can be achieved based on the value of the average horizontal range. The wide applicability of this procedure, as well as its ease of use, reveals a useful tool with which to revisit a large number of physics problems, from classrooms to frontier research.
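The core of the construction can be written compactly (notation mine; a sketch of the stated MaxEnt setup): maximizing the entropy of the joint distribution of launch speed and angle subject to a fixed mean range yields an exponential-family form, with the multiplier fixed by the constraint.

```latex
R(v_0, \theta) = \frac{v_0^2 \sin 2\theta}{g},
\qquad
p(v_0, \theta) = \frac{1}{Z(\lambda)} \exp\bigl[-\lambda R(v_0, \theta)\bigr],
\qquad
\langle R \rangle = -\frac{\partial \ln Z}{\partial \lambda}.
```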
Spatial scaling patterns and functional redundancies in a changing boreal lake landscape
Angeler, David G.; Allen, Craig R.; Uden, Daniel R.; Johnson, Richard K.
2015-01-01
Global transformations extend beyond local habitats; therefore, larger-scale approaches are needed to assess community-level responses and resilience to unfolding environmental changes. Using long-term data (1996–2011), we evaluated spatial patterns and functional redundancies in the littoral invertebrate communities of 85 Swedish lakes, with the objective of assessing their potential resilience to environmental change at regional scales (that is, spatial resilience). Multivariate spatial modeling was used to differentiate groups of invertebrate species exhibiting spatial patterns in composition and abundance (that is, deterministic species) from those lacking spatial patterns (that is, stochastic species). We then determined the functional feeding attributes of the deterministic and stochastic invertebrate species, to infer resilience. Between one and three distinct spatial patterns in invertebrate composition and abundance were identified in approximately one-third of the species; the remainder were stochastic. We observed substantial differences in metrics between deterministic and stochastic species. Functional richness and diversity decreased over time in the deterministic group, suggesting a loss of resilience in regional invertebrate communities. However, taxon richness and redundancy increased monotonically in the stochastic group, indicating the capacity of regional invertebrate communities to adapt to change. Our results suggest that a refined picture of spatial resilience emerges if patterns of both the deterministic and stochastic species are accounted for. Spatially extensive monitoring may help increase our mechanistic understanding of community-level responses and resilience to regional environmental change, insights that are critical for developing management and conservation agendas in this current period of rapid environmental transformation.
Sex differences in the brain: implications for explaining autism.
Baron-Cohen, Simon; Knickmeyer, Rebecca C; Belmonte, Matthew K
2005-11-04
Empathizing is the capacity to predict and to respond to the behavior of agents (usually people) by inferring their mental states and responding to these with an appropriate emotion. Systemizing is the capacity to predict and to respond to the behavior of nonagentive deterministic systems by analyzing input-operation-output relations and inferring the rules that govern such systems. At a population level, females are stronger empathizers and males are stronger systemizers. The "extreme male brain" theory posits that autism represents an extreme of the male pattern (impaired empathizing and enhanced systemizing). Here we suggest that specific aspects of autistic neuroanatomy may also be extremes of typical male neuroanatomy.
Soil pH mediates the balance between stochastic and deterministic assembly of bacteria
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tripathi, Binu M.; Stegen, James C.; Kim, Mincheol
Little is known about the factors affecting the relative influence of stochastic and deterministic processes that govern the assembly of microbial communities in successional soils. Here, we conducted a meta-analysis of bacterial communities using six different successional soils data sets, scattered across different regions, with different pH conditions in early and late successional soils. We found that soil pH was the best predictor of bacterial community assembly and the relative importance of stochastic and deterministic processes along successional soils. Extreme acidic or alkaline pH conditions lead to assembly of phylogenetically more clustered bacterial communities through deterministic processes, whereas pH conditions close to neutral lead to phylogenetically less clustered bacterial communities with more stochasticity. We suggest that the influence of pH, rather than successional age, is the main driving force in producing trends in phylogenetic assembly of bacteria, and that pH also influences the relative balance of stochastic and deterministic processes along successional soils. Given that pH had a much stronger association with community assembly than did successional age, we evaluated whether the inferred influence of pH was maintained when studying globally-distributed samples collected without regard for successional age. This dataset confirmed the strong influence of pH, suggesting that the influence of soil pH on community assembly processes occurs globally. Extreme pH conditions likely exert more stringent limits on survival and fitness, imposing strong selective pressures through ecological and evolutionary time. Taken together, these findings suggest that the degree to which stochastic vs. deterministic processes shape soil bacterial community assembly is a consequence of soil pH rather than successional age.
Generic comparison of protein inference engines.
Claassen, Manfred; Reiter, Lukas; Hengartner, Michael O; Buhmann, Joachim M; Aebersold, Ruedi
2012-04-01
Protein identifications, instead of peptide-spectrum matches, constitute the biologically relevant result of shotgun proteomics studies. How to appropriately infer and report protein identifications has triggered a still ongoing debate. This debate has so far suffered from the lack of appropriate performance measures that allow us to objectively assess protein inference approaches. This study describes an intuitive, generic and yet formal performance measure and demonstrates how it enables experimentalists to select an optimal protein inference strategy for a given collection of fragment ion spectra. We applied the performance measure to systematically explore the benefit of excluding possibly unreliable protein identifications, such as single-hit wonders. Therefore, we defined a family of protein inference engines by extending a simple inference engine by thousands of pruning variants, each excluding a different specified set of possibly unreliable identifications. We benchmarked these protein inference engines on several data sets representing different proteomes and mass spectrometry platforms. Optimally performing inference engines retained all high confidence spectral evidence, without posterior exclusion of any type of protein identifications. Despite the diversity of studied data sets consistently supporting this rule, other data sets might behave differently. In order to ensure maximal reliable proteome coverage for data sets arising in other studies we advocate abstaining from rigid protein inference rules, such as exclusion of single-hit wonders, and instead consider several protein inference approaches and assess these with respect to the presented performance measure in the specific application context.
Deep Unfolding for Topic Models.
Chien, Jen-Tzung; Lee, Chao-Hsi
2018-02-01
Deep unfolding provides an approach to integrating probabilistic generative models with deterministic neural networks. Such an approach benefits from deep representation, easy interpretation, flexible learning, and stochastic modeling. This study develops the unsupervised and supervised learning of deep unfolded topic models for document representation and classification. Conventionally, unsupervised and supervised topic models are inferred via a variational inference algorithm in which the model parameters are estimated by maximizing the lower bound on the logarithm of the marginal likelihood using input documents without and with class labels, respectively. The representation capability or classification accuracy is constrained by the variational lower bound and by model parameters tied across the inference procedure. This paper aims to relax these constraints by directly maximizing the end performance criterion and continuously untying the parameters during learning via deep unfolding inference (DUI). The inference procedure is treated as layer-wise learning in a deep neural network. The end performance is iteratively improved by using the estimated topic parameters according to exponentiated updates. Deep learning of topic models is therefore implemented through a back-propagation procedure. Experimental results show the merits of DUI with an increasing number of layers compared with variational inference in unsupervised as well as supervised topic models.
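A minimal numpy sketch of the unfolding idea for a topic model: each "layer" is one unrolled multiplicative update of the document-topic proportions given fixed topics. With eta = 1 this is the classical EM update for mixture proportions; untying parameters per layer, as in DUI, would let each layer carry its own topics and step size. All shapes and values here are invented.

```python
import numpy as np

def unfolded_inference(counts, topics, n_layers=10, eta=1.0):
    """counts: (V,) word counts; topics: (K, V) row-stochastic topic-word matrix."""
    K = topics.shape[0]
    theta = np.full(K, 1.0 / K)                   # document-topic proportions
    for _ in range(n_layers):                     # one unrolled update per layer
        p = theta @ topics                        # predicted word distribution
        ratio = counts / np.maximum(p, 1e-12)
        theta = theta * (topics @ ratio) ** eta   # exponentiated update
        theta /= theta.sum()
    return theta

rng = np.random.default_rng(7)
topics = rng.dirichlet(np.ones(50), size=3)       # K=3 topics over V=50 words
doc = rng.multinomial(200, 0.7 * topics[0] + 0.3 * topics[2])
print(unfolded_inference(doc, topics))
```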
DOE Office of Scientific and Technical Information (OSTI.GOV)
Graham, Emily B.; Crump, Alex R.; Resch, Charles T.
2017-03-28
Subsurface zones of groundwater and surface water mixing (hyporheic zones) are regions of enhanced rates of biogeochemical cycling, yet ecological processes governing hyporheic microbiome composition and function through space and time remain unknown. We sampled attached and planktonic microbiomes in the Columbia River hyporheic zone across seasonal hydrologic change, and employed statistical null models to infer mechanisms generating temporal changes in microbiomes within three hydrologically-connected, physicochemically-distinct geographic zones (inland, nearshore, river). We reveal that microbiomes remain dissimilar through time across all zones and habitat types (attached vs. planktonic) and that deterministic assembly processes regulate microbiome composition in all data subsets. The consistent presence of heterotrophic taxa and members of the Planctomycetes-Verrucomicrobia-Chlamydiae (PVC) superphylum nonetheless suggests common selective pressures for physiologies represented in these groups. Further, co-occurrence networks were used to provide insight into taxa most affected by deterministic assembly processes. We identified network clusters to represent groups of organisms that correlated with seasonal and physicochemical change. Extended network analyses identified keystone taxa within each cluster that we propose are central in microbiome composition and function. Finally, the abundance of one network cluster of nearshore organisms exhibited a seasonal shift from heterotrophic to autotrophic metabolisms and correlated with microbial metabolism, possibly indicating an ecological role for these organisms as foundational species in driving biogeochemical reactions within the hyporheic zone. Taken together, our research demonstrates a predominant role for deterministic assembly across highly-connected environments and provides insight into niche dynamics associated with seasonal changes in hyporheic microbiome composition and metabolism.
Audain, Enrique; Uszkoreit, Julian; Sachsenberg, Timo; Pfeuffer, Julianus; Liang, Xiao; Hermjakob, Henning; Sanchez, Aniel; Eisenacher, Martin; Reinert, Knut; Tabb, David L; Kohlbacher, Oliver; Perez-Riverol, Yasset
2017-01-06
In mass spectrometry-based shotgun proteomics, protein identifications are usually the desired result. However, most of the analytical methods are based on the identification of reliable peptides and not the direct identification of intact proteins. Thus, assembling peptides identified from tandem mass spectra into a list of proteins, referred to as protein inference, is a critical step in proteomics research. Currently, different protein inference algorithms and tools are available for the proteomics community. Here, we evaluated five software tools for protein inference (PIA, ProteinProphet, Fido, ProteinLP, MSBayesPro) using three popular database search engines: Mascot, X!Tandem, and MS-GF+. All the algorithms were evaluated using a highly customizable KNIME workflow using four different public datasets with varying complexities (different sample preparation, species and analytical instruments). We defined a set of quality control metrics to evaluate the performance of each combination of search engines, protein inference algorithm, and parameters on each dataset. We show that the results for complex samples vary not only regarding the actual numbers of reported protein groups but also concerning the actual composition of groups. Furthermore, the robustness of reported proteins when using databases of differing complexities is strongly dependent on the applied inference algorithm. Finally, merging the identifications of multiple search engines does not necessarily increase the number of reported proteins, but does increase the number of peptides per protein and thus can generally be recommended. Protein inference is one of the major challenges in MS-based proteomics nowadays. Currently, there is a vast number of protein inference algorithms and implementations available for the proteomics community. Protein assembly impacts the final results of the research, the quantitation values, and the final claims in the research manuscript. Even though protein inference is a crucial step in proteomics data analysis, a comprehensive evaluation of the many different inference methods has never been performed. The Journal of Proteomics has previously published multiple benchmarks of bioinformatics algorithms (PMID: 26585461; PMID: 22728601) in proteomics studies, making clear the importance of such studies for the proteomics community and the journal audience. This manuscript presents a new bioinformatics solution based on the KNIME/OpenMS platform that aims at providing a fair comparison of protein inference algorithms (https://github.com/KNIME-OMICS). Five different algorithms (ProteinProphet, MSBayesPro, ProteinLP, Fido, and PIA) were evaluated using the highly customizable workflow on four public datasets with varying complexities. Three popular database search engines (Mascot, X!Tandem, and MS-GF+) and combinations thereof were evaluated for every protein inference tool. In total, more than 186 protein lists were analyzed and carefully compared using three metrics for quality assessment of the protein inference results: 1) the number of reported proteins, 2) the number of peptides per protein, and 3) the number of uniquely reported proteins per inference method. We also examined how many proteins were reported for each combination of search engines, protein inference algorithms, and parameters on each dataset.
The results show that 1) PIA or Fido seem to be a good choice when studying the results of the analyzed workflow, regarding not only the reported proteins and the high-quality identifications but also the required runtime; 2) merging the identifications of multiple search engines almost always gives more confident results and increases the number of peptides per protein group; 3) the usage of databases containing not only the canonical but also known isoforms of proteins has a small impact on the number of reported proteins, and, depending on the question behind the study, the detection of specific isoforms can compensate for the slightly shorter parsimonious reports; and 4) the current workflow can be easily extended to support new algorithms and search engine combinations. Copyright © 2016. Published by Elsevier B.V.
Dopamine reward prediction errors reflect hidden state inference across time
Starkweather, Clara Kwon; Babayan, Benedicte M.; Uchida, Naoshige; Gershman, Samuel J.
2017-01-01
Midbrain dopamine neurons signal reward prediction error (RPE), or actual minus expected reward. The temporal difference (TD) learning model has been a cornerstone in understanding how dopamine RPEs could drive associative learning. Classically, TD learning imparts value to features that serially track elapsed time relative to observable stimuli. In the real world, however, sensory stimuli provide ambiguous information about the hidden state of the environment, leading to the proposal that TD learning might instead compute a value signal based on an inferred distribution of hidden states (a ‘belief state’). In this work, we asked whether dopaminergic signaling supports a TD learning framework that operates over hidden states. We found that dopamine signaling exhibited a striking difference between two tasks that differed only with respect to whether reward was delivered deterministically. Our results favor an associative learning rule that combines cached values with hidden state inference. PMID:28263301
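The learning rule favored by these results can be summarized as TD learning over a belief state (standard formulation, stated here for orientation): the belief is updated by Bayes' rule from observations, and the prediction error is computed on values of beliefs rather than of observable states.

```latex
b_{t+1}(s') \;\propto\; O(o_{t+1} \mid s') \sum_{s} T(s' \mid s)\, b_t(s),
\qquad
\delta_t = r_t + \gamma\, \hat{V}(b_{t+1}) - \hat{V}(b_t).
```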
Mistaking geography for biology: inferring processes from species distributions.
Warren, Dan L; Cardillo, Marcel; Rosauer, Dan F; Bolnick, Daniel I
2014-10-01
Over the past few decades, there has been a rapid proliferation of statistical methods that infer evolutionary and ecological processes from data on species distributions. These methods have led to considerable new insights, but they often fail to account for the effects of historical biogeography on present-day species distributions. Because the geography of speciation can lead to patterns of spatial and temporal autocorrelation in the distributions of species within a clade, this can result in misleading inferences about the importance of deterministic processes in generating spatial patterns of biodiversity. In this opinion article, we discuss ways in which patterns of species distributions driven by historical biogeography are often interpreted as evidence of particular evolutionary or ecological processes. We focus on three areas that are especially prone to such misinterpretations: community phylogenetics, environmental niche modelling, and analyses of beta diversity (compositional turnover of biodiversity). Crown Copyright © 2014. Published by Elsevier Ltd. All rights reserved.
A Protein Standard That Emulates Homology for the Characterization of Protein Inference Algorithms.
The, Matthew; Edfors, Fredrik; Perez-Riverol, Yasset; Payne, Samuel H; Hoopmann, Michael R; Palmblad, Magnus; Forsström, Björn; Käll, Lukas
2018-05-04
A natural way to benchmark the performance of an analytical experimental setup is to use samples of known composition and see to what degree one can correctly infer the content of such a sample from the data. For shotgun proteomics, one of the inherent problems of interpreting data is that the measured analytes are peptides and not the actual proteins themselves. As some proteins share proteolytic peptides, there might be more than one possible causative set of proteins resulting in a given set of peptides, and there is a need for mechanisms that infer proteins from lists of detected peptides. A weakness of commercially available samples of known content is that they consist of proteins that are deliberately selected for producing tryptic peptides that are unique to a single protein. Unfortunately, such samples do not expose any complications in protein inference. Hence, for a realistic benchmark of protein inference procedures, there is a need for samples of known content where the present proteins share peptides with known absent proteins. Here, we present such a standard, based on E. coli-expressed human protein fragments. To illustrate the application of this standard, we benchmark a set of different protein inference procedures on the data. We observe that inference procedures excluding shared peptides provide more accurate estimates of errors compared to methods that include information from shared peptides, while still giving a reasonable performance in terms of the number of identified proteins. We also demonstrate that using a sample of known protein content without proteins with shared tryptic peptides can give a false sense of accuracy for many protein inference methods.
A linear programming model for protein inference problem in shotgun proteomics.
Huang, Ting; He, Zengyou
2012-11-15
Assembling peptides identified from tandem mass spectra into a list of proteins, referred to as protein inference, is an important issue in shotgun proteomics. The objective of protein inference is to find a subset of proteins that are truly present in the sample. Although many methods have been proposed for protein inference, several issues such as peptide degeneracy still remain unsolved. In this article, we present a linear programming model for protein inference. In this model, we use a transformation of the joint probability that each peptide/protein pair is present in the sample as the variable. Then, both the peptide probability and protein probability can be expressed as a formula in terms of the linear combination of these variables. Based on this simple fact, the protein inference problem is formulated as an optimization problem: minimize the number of proteins with non-zero probabilities under the constraint that the difference between the calculated peptide probability and the peptide probability generated from peptide identification algorithms should be less than some threshold. This model addresses the peptide degeneracy issue by forcing some joint probability variables involving degenerate peptides to be zero in a rigorous manner. The corresponding inference algorithm is named ProteinLP. We test the performance of ProteinLP on six datasets. Experimental results show that our method is competitive with the state-of-the-art protein inference algorithms. The source code of our algorithm is available at: https://sourceforge.net/projects/prolp/. Contact: zyhe@dlut.edu.cn. Supplementary data are available at Bioinformatics online.
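To make the optimization concrete, here is a minimal sketch of an LP of the kind described above, not the authors' ProteinLP code: the toy incidence matrix, peptide probabilities, and tolerance eps are illustrative assumptions, and minimizing the sum of protein scores is used as the standard linear relaxation of minimizing the number of nonzero probabilities.

```python
# Sketch of an LP for protein inference (illustrative, not ProteinLP itself).
# Peptide probabilities q come from a search engine; eps is the allowed
# deviation. Variables y_j in [0, 1] stand in for protein presence scores;
# minimizing their sum (an L1 relaxation) favors sparse protein sets.
import numpy as np
from scipy.optimize import linprog

# toy peptide-protein incidence: rows = peptides, cols = proteins
A = np.array([[1, 0, 0],
              [1, 1, 0],   # shared (degenerate) peptide
              [0, 0, 1]], dtype=float)
q = np.array([0.9, 0.95, 0.1])   # peptide probabilities from identification
eps = 0.05

n_pep, n_prot = A.shape
c = np.ones(n_prot)                      # minimize total protein "presence"
# |A y - q| <= eps  ->  A y <= q + eps  and  -A y <= -(q - eps)
A_ub = np.vstack([A, -A])
b_ub = np.concatenate([q + eps, -(q - eps)])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * n_prot)
print(np.round(res.x, 3))   # near-zero entries are proteins called absent
```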
Strasser, Michael; Theis, Fabian J.; Marr, Carsten
2012-01-01
A toggle switch consists of two genes that mutually repress each other. This regulatory motif is active during cell differentiation and is thought to act as a memory device, being able to choose and maintain cell fate decisions. Commonly, this switch has been modeled in a deterministic framework where transcription and translation are lumped together. In this description, bistability occurs for transcription factor cooperativity, whereas autoactivation leads to a tristable system with an additional undecided state. In this contribution, we study the stability and dynamics of a two-stage gene expression switch within a probabilistic framework inspired by the properties of the Pu/Gata toggle switch in myeloid progenitor cells. We focus on low mRNA numbers, high protein abundance, and monomeric transcription-factor binding. Contrary to the expectation from a deterministic description, this switch shows complex multiattractor dynamics without autoactivation and cooperativity. Most importantly, the four attractors of the system, which only emerge in a probabilistic two-stage description, can be identified with committed and primed states in cell differentiation. To begin, we study the dynamics of the system and infer the mechanisms that move the system between attractors using both the quasipotential and the probability flux of the system. Next, we show that the residence times of the system in one of the committed attractors are geometrically distributed. We derive an analytical expression for the parameter of the geometric distribution, therefore completely describing the statistics of the switching process and elucidate the influence of the system parameters on the residence time. Moreover, we find that the mean residence time increases linearly with the mean protein level. This scaling also holds for a one-stage scenario and for autoactivation. Finally, we study the implications of this distribution for the stability of a switch and discuss the influence of the stability on a specific cell differentiation mechanism. Our model explains lineage priming and proposes the need of either high protein numbers or long-term modifications such as chromatin remodeling to achieve stable cell fate decisions. Notably, we present a system with high protein abundance that nevertheless requires a probabilistic description to exhibit multistability, complex switching dynamics, and lineage priming. PMID:22225794
FAUST: Flexible Acquisition and Understanding System for Text
2013-07-01
second version is still underway and it will continue in development as part of the DARPA DEFT program; it is written in Java and Clojure with MySQL and... SUTime, a Java library that recognizes and normalizes temporal expressions using deterministic patterns [101]. UIUC made another such framework... Java-based, large-scale inference engine called Tuffy. It leverages the full power of a relational optimizer in an RDBMS to perform the grounding of MLN...
Protein and gene model inference based on statistical modeling in k-partite graphs.
Gerster, Sarah; Qeli, Ermir; Ahrens, Christian H; Bühlmann, Peter
2010-07-06
One of the major goals of proteomics is the comprehensive and accurate description of a proteome. Shotgun proteomics, the method of choice for the analysis of complex protein mixtures, requires that experimentally observed peptides are mapped back to the proteins they were derived from. This process is also known as protein inference. We present Markovian Inference of Proteins and Gene Models (MIPGEM), a statistical model based on clearly stated assumptions to address the problem of protein and gene model inference for shotgun proteomics data. In particular, we are dealing with dependencies among peptides and proteins using a Markovian assumption on k-partite graphs. We are also addressing the problems of shared peptides and ambiguous proteins by scoring the encoding gene models. Empirical results on two control datasets with synthetic mixtures of proteins and on complex protein samples of Saccharomyces cerevisiae, Drosophila melanogaster, and Arabidopsis thaliana suggest that the results with MIPGEM are competitive with existing tools for protein inference.
Structure-related statistical singularities along protein sequences: a correlation study.
Colafranceschi, Mauro; Colosimo, Alfredo; Zbilut, Joseph P; Uversky, Vladimir N; Giuliani, Alessandro
2005-01-01
A data set composed of 1141 proteins representative of all eukaryotic protein sequences in the Swiss-Prot Protein Knowledge base was coded by seven physicochemical properties of amino acid residues. The resulting numerical profiles were submitted to correlation analysis after the application of a linear (simple mean) and a nonlinear (Recurrence Quantification Analysis, RQA) filter. The main RQA variables, Recurrence and Determinism, were subsequently analyzed by Principal Component Analysis. The RQA descriptors showed that (i) within protein sequences is embedded specific information neither present in the codes nor in the amino acid composition and (ii) the most sensitive code for detecting ordered recurrent (deterministic) patterns of residues in protein sequences is the Miyazawa-Jernigan hydrophobicity scale. The most deterministic proteins in terms of autocorrelation properties of primary structures were found (i) to be involved in protein-protein and protein-DNA interactions and (ii) to display a significantly higher proportion of structural disorder with respect to the average data set. A study of the scaling behavior of the average determinism with the setting parameters of RQA (embedding dimension and radius) allows for the identification of patterns of minimal length (six residues) as possible markers of zones specifically prone to inter- and intramolecular interactions.
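The two RQA descriptors named above can be computed in a few lines. The sketch below uses an illustrative embedding, radius, and toy signal; the main diagonal is excluded, and no Theiler window or other refinements are applied, so it is a simplified illustration rather than the authors' exact procedure.

```python
# Recurrence (RR) and Determinism (DET) from a recurrence plot: a minimal
# RQA sketch with illustrative settings.
import numpy as np

def rqa(series, dim=3, delay=1, radius=0.5, lmin=2):
    # time-delay embedding
    n = len(series) - (dim - 1) * delay
    emb = np.column_stack([series[i * delay:i * delay + n] for i in range(dim)])
    d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=2)
    R = d < radius                           # recurrence matrix (symmetric)
    recurrent = R.sum() - n                  # drop the trivial main diagonal
    rr = recurrent / (n * n - n)             # Recurrence rate
    diag_pts = 0
    for k in range(1, n):                    # off-diagonals; symmetry gives x2
        run = 0
        for v in list(np.diagonal(R, k)) + [False]:
            if v:
                run += 1
            else:
                if run >= lmin:
                    diag_pts += 2 * run      # count the mirror diagonal too
                run = 0
    det = diag_pts / max(recurrent, 1)       # Determinism
    return rr, det

x = np.sin(np.linspace(0, 20 * np.pi, 400))
x += 0.1 * np.random.default_rng(0).standard_normal(400)
print("RR = %.3f, DET = %.3f" % rqa(x))
```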
Substrate growth dynamics and biomineralization of an Ediacaran encrusting poriferan.
Wood, Rachel; Penny, Amelia
2018-01-10
The ability to encrust in order to secure and maintain growth on a substrate is a key competitive innovation in benthic metazoans. Here we describe the substrate growth dynamics, mode of biomineralization and possible affinity of Namapoikia rietoogensis, a large (up to 1 m), robustly skeletal, and modular Ediacaran metazoan which encrusted the walls of synsedimentary fissures within microbial-metazoan reefs. Namapoikia formed laminar or domal morphologies with an internal structure of open tubules and transverse elements, and had a very plastic, non-deterministic growth form which could encrust both fully lithified surfaces as well as living microbial substrates, the latter via modified skeletal holdfasts. Namapoikia shows complex growth interactions and substrate competition with contemporary living microbialites and thrombolites, including the production of plate-like dissepiments in response to microbial overgrowth which served to elevate soft tissue above the microbial surface. Namapoikia could also recover from partial mortality due to microbial fouling. We infer initial skeletal growth to have propagated via the rapid formation of an organic scaffold via a basal pinacoderm prior to calcification. This is likely an ancient mode of biomineralization with similarities to the living calcified demosponge Vaceletia. Namapoikia also shows inferred skeletal growth banding which, combined with its large size, implies notable individual longevity. In sum, Namapoikia was a large, relatively long-lived Ediacaran clonal skeletal metazoan that propagated via an organic scaffold prior to calcification, enabling rapid, effective and dynamic substrate occupation and competition in cryptic reef settings. The open tubular internal structure, highly flexible, non-deterministic skeletal organization, and inferred style of biomineralization of Namapoikia places probable affinity within total-group poriferans. © 2018 The Author(s).
Deterministic entanglement generation from driving through quantum phase transitions.
Luo, Xin-Yu; Zou, Yi-Quan; Wu, Ling-Na; Liu, Qi; Han, Ming-Fei; Tey, Meng Khoon; You, Li
2017-02-10
Many-body entanglement is often created through the system evolution, aided by nonlinear interactions between the constituting particles. These very dynamics, however, can also lead to fluctuations and degradation of the entanglement if the interactions cannot be controlled. Here, we demonstrate near-deterministic generation of an entangled twin-Fock condensate of ~11,000 atoms by driving a rubidium-87 Bose-Einstein condensate undergoing spin mixing through two consecutive quantum phase transitions (QPTs). We directly observe number squeezing of 10.7 ± 0.6 decibels and normalized collective spin length of 0.99 ± 0.01. Together, these observations allow us to infer an entanglement-enhanced phase sensitivity of ~6 decibels beyond the standard quantum limit and an entanglement breadth of ~910 atoms. Our work highlights the power of generating large-scale useful entanglement by taking advantage of the different entanglement landscapes separated by QPTs. Copyright © 2017, American Association for the Advancement of Science.
NASA Astrophysics Data System (ADS)
Hunziker, Jürg; Laloy, Eric; Linde, Niklas
2016-04-01
Deterministic inversion procedures can often explain field data, but they only deliver one final subsurface model that depends on the initial model and regularization constraints. This leads to poor insights about the uncertainties associated with the inferred model properties. In contrast, probabilistic inversions can provide an ensemble of model realizations that accurately span the range of possible models that honor the available calibration data and prior information allowing a quantitative description of model uncertainties. We reconsider the problem of inferring the dielectric permittivity (directly related to radar velocity) structure of the subsurface by inversion of first-arrival travel times from crosshole ground penetrating radar (GPR) measurements. We rely on the DREAM_(ZS) algorithm that is a state-of-the-art Markov chain Monte Carlo (MCMC) algorithm. Such algorithms need several orders of magnitude more forward simulations than deterministic algorithms and often become infeasible in high parameter dimensions. To enable high-resolution imaging with MCMC, we use a recently proposed dimensionality reduction approach that allows reproducing 2D multi-Gaussian fields with far fewer parameters than a classical grid discretization. We consider herein a dimensionality reduction from 5000 to 257 unknowns. The first 250 parameters correspond to a spectral representation of random and uncorrelated spatial fluctuations while the remaining seven geostatistical parameters are (1) the standard deviation of the data error, (2) the mean and (3) the variance of the relative electric permittivity, (4) the integral scale along the major axis of anisotropy, (5) the anisotropy angle, (6) the ratio of the integral scale along the minor axis of anisotropy to the integral scale along the major axis of anisotropy and (7) the shape parameter of the Matérn function. The latter essentially defines the type of covariance function (e.g., exponential, Whittle, Gaussian). We present an improved formulation of the dimensionality reduction, and numerically show how it reduces artifacts in the generated models and provides better posterior estimation of the subsurface geostatistical structure. We next show that the results of the method compare very favorably against previous deterministic and stochastic inversion results obtained at the South Oyster Bacterial Transport Site in Virginia, USA. The long-term goal of this work is to enable MCMC-based full waveform inversion of crosshole GPR data.
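The core idea of the dimensionality reduction, representing a 2D multi-Gaussian field by a modest number of low-frequency spectral coefficients, can be sketched as follows. The grid size, Matérn-like spectrum, and frequency cutoff are illustrative assumptions, and Hermitian symmetry of the spectrum is ignored for brevity, so this is a rough sampler rather than the authors' exact parameterization.

```python
# Sketch: generate a 2D multi-Gaussian random field from a truncated
# spectral representation (illustrative parameters, simplified sampler).
import numpy as np

n = 64                                   # grid is n x n
rng = np.random.default_rng(0)
kx = np.fft.fftfreq(n)[:, None]
ky = np.fft.fftfreq(n)[None, :]
k2 = kx**2 + ky**2
ell, nu = 10.0, 1.5                      # integral scale and shape parameter
power = (1.0 + (ell**2) * k2) ** (-(nu + 1.0))   # Matern-like power spectrum
power[0, 0] = 0.0                        # remove the mean component

# keep only the lowest-frequency coefficients: these few complex numbers
# play the role of the reduced set of unknowns in the inversion
mask = np.sqrt(k2) <= 0.12
coeffs = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
field = np.real(np.fft.ifft2(np.sqrt(power) * coeffs * mask)) * n
print(field.shape, round(field.std(), 4))
```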
BagReg: Protein inference through machine learning.
Zhao, Can; Liu, Dao; Teng, Ben; He, Zengyou
2015-08-01
Protein inference from the identified peptides is of primary importance in shotgun proteomics. The target of protein inference is to identify whether each candidate protein is truly present in the sample. To date, many computational methods have been proposed to solve this problem. However, there is still no method that can fully utilize the information hidden in the input data. In this article, we propose a learning-based method named BagReg for protein inference. The method first constructs five features from the input data, and then chooses each feature as the class feature to separately build models to predict the presence probabilities of proteins. Finally, the weak results from five prediction models are aggregated to obtain the final result. We test our method on six publicly available data sets. The experimental results show that our method is superior to the state-of-the-art protein inference algorithms. Copyright © 2015 Elsevier Ltd. All rights reserved.
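A schematic of the BagReg idea, under the assumption of toy features and plain linear regressors (the paper's actual features and learners differ): each extracted feature is used in turn as the prediction target, fitted from the remaining features, and the per-feature predictions are averaged into a final presence score.

```python
# Illustrative sketch of the predict-each-feature-from-the-others scheme;
# toy random features stand in for the five features extracted by BagReg.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.random((200, 5))          # 200 candidate proteins x 5 toy features

scores = []
for target in range(5):
    others = [j for j in range(5) if j != target]
    model = LinearRegression().fit(X[:, others], X[:, target])
    scores.append(model.predict(X[:, others]))
presence = np.mean(scores, axis=0)       # aggregate the five weak predictions
print("presence scores of first proteins:", np.round(presence[:5], 3))
```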
Deterministic Methods of Seismic Source Identification
1983-09-30
activity is implied by Figure 7, compared to that inferred from Figure 6. We expect that the residual scatter, about the one-to-one slope line... side of the boundary, and in this case the general forms of the conservation laws expressed by (3), (4) and (6), or (6) and (7), are the appropriate... such as given in (8) and (7), to obtain an integral equation for the unknown elastodynamic displacement field in an elastic (or anelastic) medium. Such...
Serang, Oliver; Noble, William Stafford
2012-01-01
The problem of identifying the proteins in a complex mixture using tandem mass spectrometry can be framed as an inference problem on a graph that connects peptides to proteins. Several existing protein identification methods make use of statistical inference methods for graphical models, including expectation maximization, Markov chain Monte Carlo, and full marginalization coupled with approximation heuristics. We show that, for this problem, the majority of the cost of inference usually comes from a few highly connected subgraphs. Furthermore, we evaluate three different statistical inference methods using a common graphical model, and we demonstrate that junction tree inference substantially improves rates of convergence compared to existing methods. The python code used for this paper is available at http://noble.gs.washington.edu/proj/fido. PMID:22331862
Bayes and blickets: Effects of knowledge on causal induction in children and adults
Griffiths, Thomas L.; Sobel, David M.; Tenenbaum, Joshua B.; Gopnik, Alison
2011-01-01
People are adept at inferring novel causal relations, even from only a few observations. Prior knowledge about the probability of encountering causal relations of various types and the nature of the mechanisms relating causes and effects plays a crucial role in these inferences. We test a formal account of how this knowledge can be used and acquired, based on analyzing causal induction as Bayesian inference. Five studies explored the predictions of this account with adults and 4-year-olds, using tasks in which participants learned about the causal properties of a set of objects. The studies varied the two factors that our Bayesian approach predicted should be relevant to causal induction: the prior probability with which causal relations exist, and the assumption of a deterministic or a probabilistic relation between cause and effect. Adults’ judgments (Experiments 1, 2, and 4) were in close correspondence with the quantitative predictions of the model, and children’s judgments (Experiments 3 and 5) agreed qualitatively with this account. PMID:21972897
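A toy version of the model comparison above can be written directly: enumerate which objects are causes, score each hypothesis by prior times likelihood, and contrast a deterministic detector with a probabilistic one. The prior, error rate, and two-trial observation below are illustrative assumptions, not the paper's fitted values.

```python
# Toy "blicket detector" posterior: deterministic vs. probabilistic
# cause-effect relation, with illustrative numbers.
from itertools import product

p_cause, eps = 0.3, 0.1                  # prior that an object is a cause; noise

def p_on(cause_present, deterministic):
    # detector activation probability given whether any cause is present
    if not cause_present:
        return 0.0
    return 1.0 if deterministic else 1.0 - eps

def likelihood(a, b, deterministic):
    # observed trials: A and B together -> detector ON; B alone -> OFF
    l_ab = p_on(a or b, deterministic)
    l_b = 1.0 - p_on(b, deterministic)
    return l_ab * l_b

for deterministic in (True, False):
    joint = {}
    for a, b in product([0, 1], repeat=2):
        prior = (p_cause if a else 1 - p_cause) * (p_cause if b else 1 - p_cause)
        joint[(a, b)] = prior * likelihood(a, b, deterministic)
    z = sum(joint.values())
    p_a_is_cause = (joint[(1, 0)] + joint[(1, 1)]) / z
    kind = "deterministic" if deterministic else "probabilistic"
    print(kind, "detector: P(A is a cause | data) =", round(p_a_is_cause, 3))
```

Under the deterministic assumption the data pin A down as the cause with certainty, while the probabilistic assumption leaves residual uncertainty, which is exactly the kind of contrast the experiments manipulate.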
Fast model updating coupling Bayesian inference and PGD model reduction
NASA Astrophysics Data System (ADS)
Rubio, Paul-Baptiste; Louf, François; Chamoin, Ludovic
2018-04-01
The paper focuses on a coupled Bayesian-Proper Generalized Decomposition (PGD) approach for the real-time identification and updating of numerical models. The purpose is to use the most general case of Bayesian inference theory in order to address inverse problems and to deal with different sources of uncertainties (measurement and model errors, stochastic parameters). In order to do so with a reasonable CPU cost, the idea is to replace the direct model called during Monte Carlo sampling with a PGD reduced model, and in some cases directly compute the probability density functions from the obtained analytical formulation. This procedure is first applied to a welding control example with the updating of a deterministic parameter. In the second application, the identification of a stochastic parameter is studied through a glued assembly example.
Distinguishing signatures of determinism and stochasticity in spiking complex systems
Aragoneses, Andrés; Rubido, Nicolás; Tiana-Alsina, Jordi; Torrent, M. C.; Masoller, Cristina
2013-01-01
We describe a method to infer signatures of determinism and stochasticity in the sequence of apparently random intensity dropouts emitted by a semiconductor laser with optical feedback. The method uses ordinal time-series analysis to classify experimental data of inter-dropout-intervals (IDIs) in two categories that display statistically significant different features. Despite the apparent randomness of the dropout events, one IDI category is consistent with waiting times in a resting state until noise triggers a dropout, and the other is consistent with dropouts occurring during the return to the resting state, which have a clear deterministic component. The method we describe can be a powerful tool for inferring signatures of determinism in the dynamics of complex systems in noisy environments, at an event-level description of their dynamics.
NASA Astrophysics Data System (ADS)
Li, Fei; Subramanian, Kartik; Chen, Minghan; Tyson, John J.; Cao, Yang
2016-06-01
The asymmetric cell division cycle in Caulobacter crescentus is controlled by an elaborate molecular mechanism governing the production, activation and spatial localization of a host of interacting proteins. In previous work, we proposed a deterministic mathematical model for the spatiotemporal dynamics of six major regulatory proteins. In this paper, we study a stochastic version of the model, which takes into account molecular fluctuations of these regulatory proteins in space and time during early stages of the cell cycle of wild-type Caulobacter cells. We test the stochastic model against experimental observations of increased variability of cycle time in cells depleted of the divJ gene product. The deterministic model predicts that overexpression of the divK gene blocks cell cycle progression in the stalked stage; however, stochastic simulations suggest that a small fraction of the mutant cells do complete the cell cycle normally.
Protein Inference from the Integration of Tandem MS Data and Interactome Networks.
Zhong, Jiancheng; Wang, Jianxing; Ding, Xiaojun; Zhang, Zhen; Li, Min; Wu, Fang-Xiang; Pan, Yi
2017-01-01
Since proteins are digested into a mixture of peptides in the preprocessing step of tandem mass spectrometry (MS), it is difficult to determine which specific protein a shared peptide belongs to. In recent studies, besides tandem MS data and peptide identification information, some other information is exploited to infer proteins. Different from the methods which first use only tandem MS data to infer proteins and then use network information to refine them, this study proposes a protein inference method named TMSIN, which uses interactome networks directly. As two interacting proteins should co-exist, it is reasonable to assume that if one of the interacting proteins is confidently inferred in a sample, its interacting partners should have a high probability in the same sample, too. Therefore, we can use the neighborhood information of a protein in an interactome network to adjust the probability that the shared peptide belongs to the protein. In TMSIN, a multi-weighted graph is constructed by incorporating the bipartite graph with interactome network information, where the bipartite graph is built with the peptide identification information. Based on multi-weighted graphs, TMSIN adopts an iterative workflow to infer proteins. At each iterative step, the probability that a shared peptide belongs to a specific protein is calculated by using Bayes' law based on the neighbor protein support scores of each protein which are mapped by the shared peptides. We carried out experiments on yeast data and human data to evaluate the performance of TMSIN in terms of ROC, q-value, and accuracy. The experimental results show that AUC scores yielded by TMSIN are 0.742 and 0.874 in the yeast dataset and the human dataset, respectively, and TMSIN yields the maximum number of true positives when the q-value is less than or equal to 0.05. The overlap analysis shows that TMSIN is an effective complementary approach for protein inference.
Martinez, Alexander S.; Faist, Akasha M.
2016-01-01
Background: Understanding patterns of biodiversity is a longstanding challenge in ecology. Similar to other biotic groups, arthropod community structure can be shaped by deterministic and stochastic processes, with limited understanding of what moderates the relative influence of these processes. Disturbances have been noted to alter the relative influence of deterministic and stochastic processes on community assembly in various study systems, implicating ecological disturbances as a potential moderator of these forces. Methods: Using a disturbance gradient along a 5-year chronosequence of insect-induced tree mortality in a subalpine forest of the southern Rocky Mountains, Colorado, USA, we examined changes in community structure and relative influences of deterministic and stochastic processes in the assembly of aboveground (surface and litter-active species) and belowground (species active in organic and mineral soil layers) arthropod communities. Arthropods were sampled for all years of the chronosequence via pitfall traps (aboveground community) and modified Winkler funnels (belowground community) and sorted to morphospecies. Community structure of both communities was assessed via comparisons of morphospecies abundance, diversity, and composition. Assembly processes were inferred from a mixture of linear models and matrix correlations testing for community associations with environmental properties, and from null-deviation models comparing observed vs. expected levels of species turnover (beta diversity) among samples. Results: Tree mortality altered community structure in both aboveground and belowground arthropod communities, but null models suggested that aboveground communities experienced greater relative influences of deterministic processes, while the relative influence of stochastic processes increased for belowground communities. Additionally, Mantel tests and linear regression models revealed significant associations between the aboveground arthropod communities and vegetation and soil properties, but no significant association among belowground arthropod communities and environmental factors. Discussion: Our results suggest context-dependent influences of stochastic and deterministic community assembly processes across different fractions of a spatially co-occurring ground-dwelling arthropod community following disturbance. This variation in assembly may be linked to contrasting ecological strategies and dispersal rates within above- and below-ground communities. Our findings add to a growing body of evidence indicating concurrent influences of stochastic and deterministic processes in community assembly, and highlight the need to consider potential variation across different fractions of biotic communities when testing community ecology theory and considering conservation strategies. PMID:27761333
Feinauer, Christoph; Procaccini, Andrea; Zecchina, Riccardo; Weigt, Martin; Pagnani, Andrea
2014-01-01
In the course of evolution, proteins show a remarkable conservation of their three-dimensional structure and their biological function, leading to strong evolutionary constraints on the sequence variability between homologous proteins. Our method aims at extracting such constraints from rapidly accumulating sequence data, and thereby at inferring protein structure and function from sequence information alone. Recently, global statistical inference methods (e.g. direct-coupling analysis, sparse inverse covariance estimation) have achieved a breakthrough towards this aim, and their predictions have been successfully implemented into tertiary and quaternary protein structure prediction methods. However, due to the discrete nature of the underlying variable (amino acids), exact inference requires exponential time in the protein length, and efficient approximations are needed for practical applicability. Here we propose a very efficient multivariate Gaussian modeling approach as a variant of direct-coupling analysis: the discrete amino-acid variables are replaced by continuous Gaussian random variables. The resulting statistical inference problem is efficiently and exactly solvable. We show that the quality of inference is comparable or superior to the one achieved by mean-field approximations to inference with discrete variables, as done by direct-coupling analysis. This is true for (i) the prediction of residue-residue contacts in proteins, and (ii) the identification of protein-protein interaction partners in bacterial signal transduction. An implementation of our multivariate Gaussian approach is available at the website http://areeweb.polito.it/ricerca/cmp/code. PMID:24663061
Stochastic Community Assembly: Does It Matter in Microbial Ecology?
Zhou, Jizhong; Ning, Daliang
2017-12-01
Understanding the mechanisms controlling community diversity, functions, succession, and biogeography is a central, but poorly understood, topic in ecology, particularly in microbial ecology. Although stochastic processes are believed to play nonnegligible roles in shaping community structure, their importance relative to deterministic processes is hotly debated. The importance of ecological stochasticity in shaping microbial community structure is far less appreciated. Some of the main reasons for such heavy debates are the difficulty in defining stochasticity and the diverse methods used for delineating stochasticity. Here, we provide a critical review and synthesis of data from the most recent studies on stochastic community assembly in microbial ecology. We then describe both stochastic and deterministic components embedded in various ecological processes, including selection, dispersal, diversification, and drift. We also describe different approaches for inferring stochasticity from observational diversity patterns and highlight experimental approaches for delineating ecological stochasticity in microbial communities. In addition, we highlight research challenges, gaps, and future directions for microbial community assembly research. Copyright © 2017 American Society for Microbiology.
Topological chaos of the spatial prisoner's dilemma game on regular networks.
Jin, Weifeng; Chen, Fangyue
2016-02-21
The spatial version of evolutionary prisoner's dilemma on infinitely large regular lattice with purely deterministic strategies and no memories among players is investigated in this paper. Based on the statistical inferences, it is pertinent to confirm that the frequency of cooperation for characterizing its macroscopic behaviors is very sensitive to the initial conditions, which is the most practically significant property of chaos. Its intrinsic complexity is then justified on firm ground from the theory of symbolic dynamics; that is, this game is topologically mixing and possesses positive topological entropy on its subsystems. It is demonstrated therefore that its frequency of cooperation could not be adopted by simply averaging over several steps after the game reaches the equilibrium state. Furthermore, the chaotically changing spatial patterns via empirical observations can be defined and justified in view of symbolic dynamics. It is worth mentioning that the procedure proposed in this work is also applicable to other deterministic spatial evolutionary games therein. Copyright © 2015 Elsevier Ltd. All rights reserved.
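A minimal deterministic spatial prisoner's dilemma of the kind analyzed above can be simulated as follows (a Nowak-May-style imitate-the-best update on a periodic lattice; the temptation parameter b, lattice size, and initial condition are illustrative assumptions). The sensitivity of the final cooperation frequency to the initial configuration can be probed simply by changing the seed.

```python
# Deterministic spatial prisoner's dilemma sketch: payoffs R=1, T=b, S=P=0;
# each site adopts the strategy of its best-scoring Moore neighbor (or keeps
# its own if none scores strictly higher). Parameters are illustrative.
import numpy as np

n, b, steps = 50, 1.65, 60
rng = np.random.default_rng(2)
S = rng.random((n, n)) < 0.9              # True = cooperator, False = defector

def payoffs(S):
    P = np.zeros(S.shape)
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx == dy == 0:
                continue
            nb = np.roll(np.roll(S, dx, 0), dy, 1)
            # C vs C earns 1; D vs C earns b; anything vs D earns 0
            P += np.where(S, nb.astype(float), b * nb)
    return P

for _ in range(steps):
    P = payoffs(S)
    best, best_pay = S.copy(), P.copy()
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            nbP = np.roll(np.roll(P, dx, 0), dy, 1)
            nbS = np.roll(np.roll(S, dx, 0), dy, 1)
            upd = nbP > best_pay
            best = np.where(upd, nbS, best)
            best_pay = np.where(upd, nbP, best_pay)
    S = best.astype(bool)
print("frequency of cooperation:", S.mean())
```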
Jewett, Ethan M.; Steinrücken, Matthias; Song, Yun S.
2016-01-01
Many approaches have been developed for inferring selection coefficients from time series data while accounting for genetic drift. These approaches have been motivated by the intuition that properly accounting for the population size history can significantly improve estimates of selective strengths. However, the improvement in inference accuracy that can be attained by modeling drift has not been characterized. Here, by comparing maximum likelihood estimates of selection coefficients that account for the true population size history with estimates that ignore drift by assuming allele frequencies evolve deterministically in a population of infinite size, we address the following questions: how much can modeling the population size history improve estimates of selection coefficients? How much can mis-inferred population sizes hurt inferences of selection coefficients? We conduct our analysis under the discrete Wright–Fisher model by deriving the exact probability of an allele frequency trajectory in a population of time-varying size and we replicate our results under the diffusion model. For both models, we find that ignoring drift leads to estimates of selection coefficients that are nearly as accurate as estimates that account for the true population history, even when population sizes are small and drift is high. This result is of interest because inference methods that ignore drift are widely used in evolutionary studies and can be many orders of magnitude faster than methods that account for population sizes. PMID:27550904
Smeal, Steven W; Schmitt, Margaret A; Pereira, Ronnie Rodrigues; Prasad, Ashok; Fisk, John D
2017-01-01
To expand the quantitative, systems level understanding and foster the expansion of the biotechnological applications of the filamentous bacteriophage M13, we have unified the accumulated quantitative information on M13 biology into a genetically-structured, experimentally-based computational simulation of the entire phage life cycle. The deterministic chemical kinetic simulation explicitly includes the molecular details of DNA replication, mRNA transcription, protein translation and particle assembly, as well as the competing protein-protein and protein-nucleic acid interactions that control the timing and extent of phage production. The simulation reproduces the holistic behavior of M13, closely matching experimentally reported values of the intracellular levels of phage species and the timing of events in the M13 life cycle. The computational model provides a quantitative description of phage biology, highlights gaps in the present understanding of M13, and offers a framework for exploring alternative mechanisms of regulation in the context of the complete M13 life cycle. Copyright © 2016 Elsevier Inc. All rights reserved.
A combinatorial perspective of the protein inference problem.
Yang, Chao; He, Zengyou; Yu, Weichuan
2013-01-01
In a shotgun proteomics experiment, proteins are the most biologically meaningful output. The success of proteomics studies depends on the ability to accurately and efficiently identify proteins. Many methods have been proposed to facilitate the identification of proteins from peptide identification results. However, the relationship between protein identification and peptide identification has not been thoroughly explained before. In this paper, we devote ourselves to a combinatorial perspective of the protein inference problem. We employ combinatorial mathematics to calculate the conditional protein probabilities (protein probability means the probability that a protein is correctly identified) under three assumptions, which lead to a lower bound, an upper bound, and an empirical estimation of protein probabilities, respectively. The combinatorial perspective enables us to obtain an analytical expression for protein inference. Our method achieves comparable results with ProteinProphet in a more efficient manner in experiments on two data sets of standard protein mixtures and two data sets of real samples. Based on our model, we study the impact of unique peptides and degenerate peptides (degenerate peptides are peptides shared by at least two proteins) on protein probabilities. Meanwhile, we also study the relationship between our model and ProteinProphet. We name our program ProteinInfer. Its Java source code, our supplementary document and experimental results are available at: http://bioinformatics.ust.hk/proteininfer.
A building block for hardware belief networks.
Behin-Aein, Behtash; Diep, Vinh; Datta, Supriyo
2016-07-21
Belief networks represent a powerful approach to problems involving probabilistic inference, but much of the work in this area is software based, utilizing standard deterministic hardware built on the transistor, which provides the gain and directionality needed to interconnect billions of them into useful networks. This paper proposes a transistor-like device that could provide an analogous building block for probabilistic networks. We present two proof-of-concept examples of belief networks, one reciprocal and one non-reciprocal, implemented using the proposed device, which is simulated using experimentally benchmarked models.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bianchini, G.; Burgio, N.; Carta, M.
The GUINEVERE experiment (Generation of Uninterrupted Intense Neutrons at the lead Venus Reactor) is an experimental program in support of the ADS technology presently carried out at SCK-CEN in Mol (Belgium). In the experiment a modified lay-out of the original thermal VENUS critical facility is coupled to an accelerator, built by the French body CNRS in Grenoble, working in both continuous and pulsed mode and delivering 14 MeV neutrons by bombardment of deuterons on a tritium target. The modified lay-out of the facility consists of a fast subcritical core made of 30% U-235 enriched metallic Uranium in a lead matrix. Several off-line and on-line reactivity measurement techniques will be investigated during the experimental campaign. This report is focused on the simulation by deterministic (ERANOS French code) and Monte Carlo (MCNPX US code) calculations of three reactivity measurement techniques, Slope (α-fitting), Area-ratio and Source-jerk, applied to a GUINEVERE subcritical configuration (namely SC1). The inferred reactivity, in dollar units, by the Area-ratio method shows an overall agreement between the two deterministic and Monte Carlo computational approaches, whereas the MCNPX Source-jerk results are affected by large uncertainties and allow only partial conclusions about the comparison. Finally, no particular spatial dependence of the results is observed in the case of the GUINEVERE SC1 subcritical configuration. (authors)
Tveito, Aslak; Lines, Glenn T; Edwards, Andrew G; McCulloch, Andrew
2016-07-01
Markov models are ubiquitously used to represent the function of single ion channels. However, solving the inverse problem to construct a Markov model of single channel dynamics from bilayer or patch-clamp recordings remains challenging, particularly for channels involving complex gating processes. Methods for solving the inverse problem are generally based on data from voltage clamp measurements. Here, we describe an alternative approach to this problem based on measurements of voltage traces. The voltage traces define probability density functions of the functional states of an ion channel. These probability density functions can also be computed by solving a deterministic system of partial differential equations. The inversion is based on tuning the rates of the Markov models used in the deterministic system of partial differential equations such that the solution mimics the properties of the probability density function gathered from (pseudo) experimental data as well as possible. The optimization is done by defining a cost function to measure the difference between the deterministic solution and the solution based on experimental data. By invoking the properties of this function, it is possible to infer whether the rates of the Markov model are identifiable by our method. We present applications to Markov models well known from the literature. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
Heck, Daniel W; Hilbig, Benjamin E; Moshagen, Morten
2017-08-01
Decision strategies explain how people integrate multiple sources of information to make probabilistic inferences. In the past decade, increasingly sophisticated methods have been developed to determine which strategy explains decision behavior best. We extend these efforts to test psychologically more plausible models (i.e., strategies), including a new, probabilistic version of the take-the-best (TTB) heuristic that implements a rank order of error probabilities based on sequential processing. Within a coherent statistical framework, deterministic and probabilistic versions of TTB and other strategies can directly be compared using model selection by minimum description length or the Bayes factor. In an experiment with inferences from given information, only three of 104 participants were best described by the psychologically plausible, probabilistic version of TTB. As in previous studies, most participants were classified as users of weighted-additive, a strategy that integrates all available information and approximates rational decisions. Copyright © 2017 Elsevier Inc. All rights reserved.
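A sketch of a probabilistic take-the-best rule of the general kind described above (cue validities, error probabilities, and cue patterns are illustrative; the paper's formal model is more detailed): cues are inspected in order of validity, the first discriminating cue decides, and the response is flipped with that cue's error probability.

```python
# Probabilistic take-the-best sketch between two options A and B.
# Cues are coded +1/0; the first discriminating cue decides the choice,
# subject to a per-cue response-error probability. Numbers are illustrative.
import numpy as np

validities = [0.9, 0.8, 0.7]        # cues searched in this (rank) order
errors = [0.05, 0.10, 0.20]         # per-cue probability of an erroneous choice

def ttb_choice(cues_a, cues_b, rng):
    for e, ca, cb in zip(errors, cues_a, cues_b):
        if ca != cb:
            better = 'A' if ca > cb else 'B'
            if rng.random() < e:                  # probabilistic response error
                return 'B' if better == 'A' else 'A'
            return better
    return rng.choice(['A', 'B'])                 # guess if nothing discriminates

rng = np.random.default_rng(3)
choices = [ttb_choice([1, 1, 0], [0, 1, 1], rng) for _ in range(1000)]
print("P(choose A) ~", choices.count('A') / 1000)   # ~0.95: first cue favors A
```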
Modeling heterogeneous responsiveness of intrinsic apoptosis pathway
2013-01-01
Background: Apoptosis is a cell suicide mechanism that enables multicellular organisms to maintain homeostasis and to eliminate individual cells that threaten the organism’s survival. Dependent on the type of stimulus, apoptosis can be propagated by the extrinsic pathway or the intrinsic pathway. The comprehensive understanding of the molecular mechanism of apoptotic signaling allows for development of mathematical models, aiming to elucidate dynamical and systems properties of apoptotic signaling networks. There have been extensive efforts in modeling deterministic apoptosis networks accounting for the average behavior of a population of cells. Cellular networks, however, are inherently stochastic, and significant cell-to-cell variability in the apoptosis response has been observed at the single cell level. Results: To address the inevitable randomness in the intrinsic apoptosis mechanism, we develop a theoretical and computational modeling framework of the intrinsic apoptosis pathway at single-cell level, accounting for both deterministic and stochastic behavior. Our deterministic model, adapted from the well-accepted Fussenegger model, shows that an additional positive feedback between the executioner caspase and the initiator caspase plays a fundamental role in yielding the desired property of bistability. We then examine the impact of intrinsic fluctuations of biochemical reactions, viewed as intrinsic noise, and natural variation of protein concentrations, viewed as extrinsic noise, on the behavior of the intrinsic apoptosis network. Histograms of the steady-state output at varying input levels show that the intrinsic noise could elicit a wider region of bistability over that of the deterministic model. However, the system stochasticity due to intrinsic fluctuations, such as the noise of steady-state response and the randomness of response delay, shows that the intrinsic noise in general is insufficient to produce significant cell-to-cell variations at physiologically relevant levels of molecular numbers. Furthermore, the extrinsic noise represented by random variations of two key apoptotic proteins, namely Cytochrome C and inhibitor of apoptosis proteins (IAP), is modeled separately or in combination with intrinsic noise. The resultant stochasticity in the timing of the intrinsic apoptosis response shows that the fluctuating protein variations can induce cell-to-cell stochastic variability at a quantitative level agreeing with experiments. Finally, simulations illustrate that the mean abundance of the fluctuating IAP protein is positively correlated with the degree of cellular stochasticity of the intrinsic apoptosis pathway. Conclusions: Our theoretical and computational study shows that the pronounced non-genetic heterogeneity in intrinsic apoptosis responses among individual cells plausibly arises from an extrinsic rather than intrinsic origin of fluctuations. In addition, it predicts that the IAP protein could serve as a potential therapeutic target for suppression of the cell-to-cell variation in the intrinsic apoptosis responsiveness. PMID:23875784
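For readers unfamiliar with the intrinsic-noise simulations referred to above, the following is a minimal Gillespie (stochastic simulation algorithm) sketch for a toy two-reaction activation/degradation module; the scheme and rate constants are illustrative stand-ins, not the paper's apoptosis network.

```python
# Gillespie SSA sketch: inactive -> active (rate k_act per molecule),
# active -> degraded (rate k_deg per molecule). Parameters illustrative.
import numpy as np

rng = np.random.default_rng(4)
k_act, k_deg = 0.02, 0.1
x_inactive, x_active, t = 100, 0, 0.0
times, traj = [0.0], [0]
while t < 100.0:
    a1 = k_act * x_inactive             # propensity: activation
    a2 = k_deg * x_active               # propensity: degradation of active form
    a0 = a1 + a2
    if a0 == 0:
        break
    t += rng.exponential(1.0 / a0)      # time to next reaction
    if rng.random() < a1 / a0:
        x_inactive -= 1; x_active += 1
    else:
        x_active -= 1
    times.append(t); traj.append(x_active)
print("final active count:", traj[-1])
```

Running many such trajectories and histogramming the outputs is exactly how intrinsic-noise variability of the kind discussed above is quantified.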
Computational approaches to protein inference in shotgun proteomics
2012-01-01
Shotgun proteomics has recently emerged as a powerful approach to characterizing proteomes in biological samples. Its overall objective is to identify the form and quantity of each protein in a high-throughput manner by coupling liquid chromatography with tandem mass spectrometry. As a consequence of its high-throughput nature, shotgun proteomics faces challenges with respect to the analysis and interpretation of experimental data. Among such challenges, the identification of proteins present in a sample has been recognized as an important computational task. This task generally consists of (1) assigning experimental tandem mass spectra to peptides derived from a protein database, and (2) mapping assigned peptides to proteins and quantifying the confidence of identified proteins. Protein identification is fundamentally a statistical inference problem with a number of methods proposed to address its challenges. In this review we categorize current approaches into rule-based, combinatorial optimization and probabilistic inference techniques, and present them using integer programming and Bayesian inference frameworks. We also discuss the main challenges of protein identification and propose potential solutions with the goal of spurring innovative research in this area. PMID:23176300
NASA Astrophysics Data System (ADS)
Moore, P.; Williams, S. D. P.
2014-12-01
Terrestrial water storage (TWS) change for 2003-2011 is estimated over Africa from GRACE gravimetric data. The signatures from change in water of the major lakes are removed by utilizing kernel functions with lake heights recovered from retracked ENVISAT satellite altimetry. In addition, the contribution of gravimetric change due to soil moisture and biomass is removed from the total GRACE signal by utilizing the GLDAS land surface model. The residual TWS time series, namely groundwater and the surface waters in rivers, wetlands, and small lakes, are investigated for trends and the seasonal cycle using linear regression. Typically, such analyses assume that the data are temporally uncorrelated but this has been shown to lead to erroneous inferences in related studies concerning the linear rate and acceleration. In this study, we utilize autocorrelation and investigate the appropriate stochastic model. The results show the proper distribution of TWS change and identify the spatial distribution of significant rates and accelerations. The effect of surface water in the major lakes is shown to contribute significantly to the trend and seasonal variation in TWS in the lake basin. Lake Volta, a managed reservoir in Ghana, is seen to have a contribution to the linear trend that is a factor of three greater than that of Lake Victoria despite having a surface area one-eighth of that of Lake Victoria. Analysis also shows the confidence levels of the deterministic trend and acceleration identifying areas where the signatures are most likely due to a physical deterministic cause and not simply stochastic variations.
Approximation and inference methods for stochastic biochemical kinetics—a tutorial review
NASA Astrophysics Data System (ADS)
Schnoerr, David; Sanguinetti, Guido; Grima, Ramon
2017-03-01
Stochastic fluctuations of molecule numbers are ubiquitous in biological systems. Important examples include gene expression and enzymatic processes in living cells. Such systems are typically modelled as chemical reaction networks whose dynamics are governed by the chemical master equation. Despite its simple structure, no analytic solutions to the chemical master equation are known for most systems. Moreover, stochastic simulations are computationally expensive, making systematic analysis and statistical inference a challenging task. Consequently, significant effort has been spent in recent decades on the development of efficient approximation and inference methods. This article gives an introduction to basic modelling concepts as well as an overview of state-of-the-art methods. First, we motivate and introduce deterministic and stochastic methods for modelling chemical networks, and give an overview of simulation and exact solution methods. Next, we discuss several approximation methods, including the chemical Langevin equation, the system size expansion, moment closure approximations, time-scale separation approximations and hybrid methods. We discuss their various properties and review recent advances and remaining challenges for these methods. We present a comparison of several of these methods by means of a numerical case study and highlight some of their respective advantages and disadvantages. Finally, we discuss the problem of inference from experimental data in the Bayesian framework and review recent methods developed in the literature. In summary, this review gives a self-contained introduction to modelling, approximations and inference methods for stochastic chemical kinetics.
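As a worked example of one surveyed approximation, here is an Euler-Maruyama integration of the chemical Langevin equation for a simple birth-death process (production rate k, degradation rate g; parameters illustrative). For this model both the stationary mean and variance are approximately k/g, which the simulation can be checked against.

```python
# Chemical Langevin equation for birth-death kinetics, integrated with
# Euler-Maruyama: one Gaussian noise term per reaction channel.
import numpy as np

rng = np.random.default_rng(5)
k, g, dt, T = 10.0, 0.1, 0.01, 200.0
x = k / g                                  # start at the deterministic steady state
sqdt = np.sqrt(dt)
xs = []
for _ in range(int(T / dt)):
    a_birth, a_death = k, g * x            # propensities
    drift = (a_birth - a_death) * dt
    noise = (np.sqrt(a_birth) * rng.normal() - np.sqrt(a_death) * rng.normal()) * sqdt
    x = max(x + drift + noise, 0.0)        # CLE can go negative; clip as a crude fix
    xs.append(x)
print("mean ~", round(np.mean(xs), 1), "variance ~", round(np.var(xs), 1))
```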
Jewett, Ethan M; Steinrücken, Matthias; Song, Yun S
2016-11-01
Many approaches have been developed for inferring selection coefficients from time series data while accounting for genetic drift. These approaches have been motivated by the intuition that properly accounting for the population size history can significantly improve estimates of selective strengths. However, the improvement in inference accuracy that can be attained by modeling drift has not been characterized. Here, by comparing maximum likelihood estimates of selection coefficients that account for the true population size history with estimates that ignore drift by assuming allele frequencies evolve deterministically in a population of infinite size, we address the following questions: how much can modeling the population size history improve estimates of selection coefficients? How much can mis-inferred population sizes hurt inferences of selection coefficients? We conduct our analysis under the discrete Wright-Fisher model by deriving the exact probability of an allele frequency trajectory in a population of time-varying size and we replicate our results under the diffusion model. For both models, we find that ignoring drift leads to estimates of selection coefficients that are nearly as accurate as estimates that account for the true population history, even when population sizes are small and drift is high. This result is of interest because inference methods that ignore drift are widely used in evolutionary studies and can be many orders of magnitude faster than methods that account for population sizes. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
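The deterministic inference idea discussed above can be sketched in a few lines: propagate the allele frequency forward ignoring drift and maximize a binomial sampling likelihood over a grid of selection coefficients. The haploid update p' = p(1+s)/(1+ps), the toy data, and the sample sizes below are illustrative assumptions, not the paper's exact likelihood machinery.

```python
# Grid-search MLE of a selection coefficient using a deterministic
# (infinite-population) allele-frequency trajectory. Illustrative toy data.
import numpy as np
from scipy.stats import binom

def trajectory(p0, s, gens):
    p, out = p0, [p0]
    for _ in range(gens):
        p = p * (1 + s) / (1 + p * s)      # deterministic haploid selection
        out.append(p)
    return np.array(out)

rng = np.random.default_rng(6)
true_s, p0, gens, n = 0.05, 0.1, 50, 100
counts = rng.binomial(n, trajectory(p0, true_s, gens))   # sampled each generation

grid = np.linspace(-0.1, 0.2, 301)
loglik = [binom.logpmf(counts, n, trajectory(p0, s, gens)).sum() for s in grid]
print("MLE of s:", round(grid[int(np.argmax(loglik))], 3))
```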
Modeling Protein Expression and Protein Signaling Pathways
Telesca, Donatello; Müller, Peter; Kornblau, Steven M.; Suchard, Marc A.; Ji, Yuan
2015-01-01
High-throughput functional proteomic technologies provide a way to quantify the expression of proteins of interest. Statistical inference centers on identifying the activation state of proteins and their patterns of molecular interaction formalized as dependence structure. Inference on dependence structure is particularly important when proteins are selected because they are part of a common molecular pathway. In that case, inference on dependence structure reveals properties of the underlying pathway. We propose a probability model that represents molecular interactions at the level of hidden binary latent variables that can be interpreted as indicators for active versus inactive states of the proteins. The proposed approach exploits available expert knowledge about the target pathway to define an informative prior on the hidden conditional dependence structure. An important feature of this prior is that it provides an instrument to explicitly anchor the model space to a set of interactions of interest, favoring a local search approach to model determination. We apply our model to reverse-phase protein array data from a study on acute myeloid leukemia. Our inference identifies relevant subpathways in relation to the unfolding of the biological process under study. PMID:26246646
Dinov, Martin; Leech, Robert
2017-01-01
Part of the process of EEG microstate estimation involves clustering EEG channel data at the global field power (GFP) maxima, very commonly using a modified K-means approach. Clustering has also been done deterministically, despite there being uncertainties in multiple stages of the microstate analysis, including the GFP peak definition, the clustering itself and in the post-clustering assignment of microstates back onto the EEG timecourse of interest. We perform a fully probabilistic microstate clustering and labeling, to account for these sources of uncertainty using the closest probabilistic analog to KM called Fuzzy C-means (FCM). We train softmax multi-layer perceptrons (MLPs) using the KM and FCM-inferred cluster assignments as target labels, to then allow for probabilistic labeling of the full EEG data instead of the usual correlation-based deterministic microstate label assignment typically used. We assess the merits of the probabilistic analysis vs. the deterministic approaches in EEG data recorded while participants perform real or imagined motor movements from a publicly available data set of 109 subjects. Though FCM group template maps that are almost topographically identical to KM were found, there is considerable uncertainty in the subsequent assignment of microstate labels. In general, imagined motor movements are less predictable on a time point-by-time point basis, possibly reflecting the more exploratory nature of the brain state during imagined, compared to during real motor movements. We find that some relationships may be more evident using FCM than using KM and propose that future microstate analysis should preferably be performed probabilistically rather than deterministically, especially in situations such as with brain computer interfaces, where both training and applying models of microstates need to account for uncertainty. Probabilistic neural network-driven microstate assignment has a number of advantages that we have discussed, which are likely to be further developed and exploited in future studies. In conclusion, probabilistic clustering and a probabilistic neural network-driven approach to microstate analysis is likely to better model and reveal details and the variability hidden in current deterministic and binarized microstate assignment and analyses.
Dinov, Martin; Leech, Robert
2017-01-01
Part of the process of EEG microstate estimation involves clustering EEG channel data at the global field power (GFP) maxima, very commonly using a modified K-means approach. Clustering has also been done deterministically, despite there being uncertainties in multiple stages of the microstate analysis, including the GFP peak definition, the clustering itself and in the post-clustering assignment of microstates back onto the EEG timecourse of interest. We perform a fully probabilistic microstate clustering and labeling, to account for these sources of uncertainty using the closest probabilistic analog to KM called Fuzzy C-means (FCM). We train softmax multi-layer perceptrons (MLPs) using the KM and FCM-inferred cluster assignments as target labels, to then allow for probabilistic labeling of the full EEG data instead of the usual correlation-based deterministic microstate label assignment typically used. We assess the merits of the probabilistic analysis vs. the deterministic approaches in EEG data recorded while participants perform real or imagined motor movements from a publicly available data set of 109 subjects. Though FCM group template maps that are almost topographically identical to KM were found, there is considerable uncertainty in the subsequent assignment of microstate labels. In general, imagined motor movements are less predictable on a time point-by-time point basis, possibly reflecting the more exploratory nature of the brain state during imagined, compared to during real motor movements. We find that some relationships may be more evident using FCM than using KM and propose that future microstate analysis should preferably be performed probabilistically rather than deterministically, especially in situations such as with brain computer interfaces, where both training and applying models of microstates need to account for uncertainty. Probabilistic neural network-driven microstate assignment has a number of advantages that we have discussed, which are likely to be further developed and exploited in future studies. In conclusion, probabilistic clustering and a probabilistic neural network-driven approach to microstate analysis is likely to better model and reveal details and the variability hidden in current deterministic and binarized microstate assignment and analyses. PMID:29163110
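A compact Fuzzy C-means sketch of the probabilistic clustering step described above, applied to toy "topographies" (the data, fuzzifier m, and cluster count are illustrative; the authors' pipeline additionally includes GFP-peak selection, polarity handling, and MLP-based labeling):

```python
# Fuzzy C-means: soft memberships U and weighted-mean cluster updates.
import numpy as np

rng = np.random.default_rng(7)
X = rng.standard_normal((300, 64))       # 300 GFP-peak maps x 64 channels
k, m, iters = 4, 2.0, 100

C = X[rng.choice(len(X), k, replace=False)]          # initial cluster maps
for _ in range(iters):
    d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2) + 1e-12
    U = 1.0 / (d ** (2 / (m - 1)))
    U /= U.sum(axis=1, keepdims=True)                # soft memberships, rows sum to 1
    W = U ** m
    C = (W.T @ X) / W.sum(axis=0)[:, None]           # weighted mean update
print("membership of first map:", np.round(U[0], 3))
```

Unlike K-means, every map retains a graded membership in all clusters, which is precisely the uncertainty the probabilistic microstate labeling aims to preserve.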
A solution to the biodiversity paradox by logical deterministic cellular automata.
Kalmykov, Lev V; Kalmykov, Vyacheslav L
2015-06-01
The paradox of biological diversity is the key problem of theoretical ecology. The paradox consists in the contradiction between the competitive exclusion principle and the observed biodiversity. The principle is important as the basis for ecological theory. Using a relatively simple model, we show a mechanism of indefinite coexistence of complete competitors which violates the known formulations of the competitive exclusion principle. This mechanism is based on timely recovery of limiting resources and their spatio-temporal allocation between competitors. Because of the limitations of black-box modeling, it has previously been difficult to formulate the exclusion principle correctly. Our white-box multiscale model of two-species competition is based on logical deterministic individual-based cellular automata. This approach provides an automatic deductive inference on the basis of a system of axioms, and gives a direct insight into mechanisms of the studied system. It is one of the most promising methods of artificial intelligence. We reformulate and generalize the competitive exclusion principle and explain why this formulation provides a solution of the biodiversity paradox. In addition, we propose a principle of competitive coexistence.
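In the spirit of the model described above, the following is a strongly simplified logical, deterministic cellular-automaton sketch of two-species competition with resource recovery (the rules, lattice size, and alternating regeneration step are illustrative assumptions, not the authors' axiom system):

```python
# Two-species CA sketch. States: 0 = free (recovered) site, 1/2 = species.
# Colonization is deterministic (majority of von Neumann neighbors); sites
# are freed on alternating steps to mimic resource recovery. Illustrative.
import numpy as np

n, steps = 60, 80
rng = np.random.default_rng(8)
grid = rng.choice([0, 1, 2], size=(n, n), p=[0.8, 0.1, 0.1])

def neighbors(grid, species):
    """Count von Neumann neighbors occupied by `species`."""
    cnt = np.zeros(grid.shape, dtype=int)
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        cnt += np.roll(np.roll(grid, dx, 0), dy, 1) == species
    return cnt

for t in range(steps):
    n1, n2 = neighbors(grid, 1), neighbors(grid, 2)
    new = grid.copy()
    free = grid == 0
    # a free site is taken by the species with strictly more neighbors;
    # ties leave the site free for one more step
    new[free & (n1 > n2)] = 1
    new[free & (n2 > n1)] = 2
    # previously occupied sites die back on alternating steps, freeing them
    if t % 2 == 1:
        new[~free] = 0
    grid = new
print("species 1:", (grid == 1).sum(), "species 2:", (grid == 2).sum())
```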
Disentangling the stochastic behavior of complex time series
NASA Astrophysics Data System (ADS)
Anvari, Mehrnaz; Tabar, M. Reza Rahimi; Peinke, Joachim; Lehnertz, Klaus
2016-10-01
Complex systems involving a large number of degrees of freedom generally exhibit non-stationary dynamics, which can result in either continuous or discontinuous sample paths of the corresponding time series. The latter sample paths may be caused by discontinuous events - or jumps - with some distribution of amplitudes, and disentangling effects caused by such jumps from effects caused by normal diffusion processes is a central problem for a detailed understanding of the stochastic dynamics of complex systems. Here we introduce a non-parametric method to address this general problem. By means of stochastic dynamical jump-diffusion modelling, we separate deterministic drift terms from different stochastic behaviors, namely diffusive and jumpy ones, and show that all of the unknown functions and coefficients of this modelling can be derived directly from measured time series. We demonstrate the applicability of our method to empirical observations by a data-driven inference of the deterministic drift term and of the diffusive and jumpy behavior in brain dynamics from ten epilepsy patients. In particular, these different stochastic behaviors provide extra information that can be regarded as valuable for diagnostic purposes.
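A hedged sketch of the conditional-moment (Kramers-Moyal) estimation underlying this kind of data-driven modelling is shown below; it recovers drift and diffusion only, omitting the paper's jump-amplitude separation, and all names and test parameters are illustrative.

```python
import numpy as np

def km_coefficients(x, dt, bins=50):
    """Estimate drift D1(x) and diffusion D2(x) from a sampled time series
    by binning the state and averaging conditional increments."""
    dx = np.diff(x)
    edges = np.linspace(x.min(), x.max(), bins + 1)
    idx = np.clip(np.digitize(x[:-1], edges) - 1, 0, bins - 1)
    D1, D2 = np.full(bins, np.nan), np.full(bins, np.nan)
    for b in range(bins):
        sel = idx == b
        if sel.sum() > 10:                          # need enough samples per bin
            D1[b] = dx[sel].mean() / dt             # first conditional moment
            D2[b] = (dx[sel] ** 2).mean() / (2 * dt)  # second conditional moment
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, D1, D2

# Ornstein-Uhlenbeck test: dx = -x dt + sqrt(2) dW, so D1(x) ~ -x and D2 ~ 1.
rng = np.random.default_rng(1)
dt, n = 1e-3, 500_000
x = np.zeros(n)
for i in range(1, n):
    x[i] = x[i - 1] - x[i - 1] * dt + np.sqrt(2 * dt) * rng.standard_normal()
centers, D1, D2 = km_coefficients(x, dt)
```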
Suratanee, Apichat; Plaimas, Kitiporn
2017-01-01
The associations between proteins and diseases are crucial information for investigating pathological mechanisms. However, the number of known and reliable protein-disease associations is quite small. In this study, an analysis framework to infer associations between proteins and diseases was developed based on a large data set of a human protein-protein interaction network, integrating an effective network search, namely the reverse k-nearest neighbor (RkNN) search. The RkNN search was used to identify the impact of a protein on other proteins. Then, associations between proteins and diseases were inferred statistically. The method using the RkNN search yielded much higher precision than random selection, a standard nearest-neighbor search, or applying the method to a randomized protein-protein interaction network. All protein-disease pair candidates were verified by a literature search. Supporting evidence for 596 pairs was identified. In addition, cluster analysis of these candidates revealed 10 promising groups of diseases to be further investigated experimentally. This method can be used to identify novel associations to better understand complex relationships between proteins and diseases.
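The RkNN idea can be sketched compactly: node q's reverse k-nearest neighbors are all nodes that count q among their own k nearest neighbors, a proxy for q's influence. The toy below uses Euclidean distances between random points; on a PPI network one would presumably substitute, e.g., shortest-path distances.

```python
import numpy as np

def knn_sets(D, k):
    """k-nearest-neighbor sets from a pairwise distance matrix D (n x n)."""
    order = np.argsort(D, axis=1)
    return [set(order[i, 1:k + 1]) for i in range(D.shape[0])]  # skip self

def rknn(D, k, q):
    """Reverse kNN of node q: nodes that have q among their k nearest."""
    nn = knn_sets(D, k)
    return {v for v in range(D.shape[0]) if q in nn[v]}

# Toy usage with random points standing in for network distances.
rng = np.random.default_rng(0)
pts = rng.random((100, 2))
D = np.linalg.norm(pts[:, None] - pts[None], axis=2)
print(rknn(D, k=5, q=0))
```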
Chen, Hua
2013-03-01
Tracing back to a specific time T in the past, the genealogy of a sample of haplotypes may not yet have reached its common ancestor, leaving m lineages extant. For such an incomplete genealogy truncated at a specific time T in the past, the distribution and expectation of the intercoalescence times conditional on T are derived in exact form in this paper for populations of deterministically time-varying size, specifically for populations growing exponentially. The derived intercoalescence time distribution can be integrated into the coalescent-based joint allele frequency spectrum (JAFS) theory, and is useful for population genetic inference from large-scale genomic data without relying on computationally intensive approaches, such as importance sampling and Markov chain Monte Carlo (MCMC) methods. The inference of several important parameters relying on this derived conditional distribution is demonstrated: quantifying population growth rate and onset time, and estimating the number of ancestral lineages at a specific ancient time. Simulation studies confirm the validity of the derivation and the statistical efficiency of methods using the derived intercoalescence time distribution. Two real-data examples are given: inference of the population growth rate of a European sample from the NIEHS Environmental Genome Project, and of the number of ancient lineages of 31 mitochondrial genomes from Tibetan populations. © 2013 Blackwell Publishing Ltd/University College London.
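To illustrate the setting (not the paper's closed-form result), the sketch below simulates coalescence times under deterministic exponential growth by inverting the cumulative coalescence intensity, and counts the lineages still extant at a truncation time T; all parameter values are arbitrary.

```python
import numpy as np

def lineages_at_T(n, N0, r, T, rng):
    """Simulate a coalescent for n lineages with population size
    N(t) = N0 * exp(-r*t) backward in time (exponential growth forward)
    and return the number of lineages extant at truncation time T."""
    t, k = 0.0, n
    while k > 1:
        E = rng.exponential()
        rate = k * (k - 1) / 2.0
        # Invert the cumulative coalescence intensity for the waiting time:
        # rate/(N0*r) * (exp(r*(t+w)) - exp(r*t)) = E.
        w = np.log(np.exp(r * t) + N0 * r * E / rate) / r - t
        if t + w > T:
            return k              # genealogy truncated at T with k lineages left
        t += w
        k -= 1
    return 1

rng = np.random.default_rng(42)
m = [lineages_at_T(n=20, N0=10_000, r=0.01, T=200.0, rng=rng) for _ in range(1000)]
print(np.mean(m))  # Monte-Carlo estimate of E[m | T]
```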
Exact Bayesian Inference for Phylogenetic Birth-Death Models.
Parag, K V; Pybus, O G
2018-04-26
Inferring the rates of change of a population from a reconstructed phylogeny of genetic sequences is a central problem in macro-evolutionary biology, epidemiology, and many other disciplines. A popular solution involves estimating the parameters of a birth-death process (BDP), which links the shape of the phylogeny to its birth and death rates. Modern BDP estimators rely on random Markov chain Monte Carlo (MCMC) sampling to infer these rates. Such methods, while powerful and scalable, cannot be guaranteed to converge, leading to results that may be hard to replicate or difficult to validate. We present a conceptually and computationally different parametric BDP inference approach using flexible and easy-to-implement Snyder filter (SF) algorithms. This method is deterministic, so its results are provable, guaranteed, and reproducible. We validate the SF on constant-rate BDPs and find that it solves BDP likelihoods known to produce robust estimates. We then examine more complex BDPs with time-varying rates. Our estimates compare well with a recently developed parametric MCMC inference method. Lastly, we perform model selection on an empirical Agamid species phylogeny, obtaining results consistent with the literature. The SF makes no approximations, beyond those required for parameter quantisation and numerical integration, and directly computes the posterior distribution of model parameters. It is a promising alternative inference algorithm that may serve either as a standalone Bayesian estimator or as a useful diagnostic reference for validating more involved MCMC strategies. The Snyder filter is implemented in Matlab and the time-varying BDP models are simulated in R. The source code and data are freely available at https://github.com/kpzoo/snyder-birth-death-code. kris.parag@zoo.ox.ac.uk. Supplementary material is available at Bioinformatics online.
Haplotype Phasing and Inheritance of Copy Number Variants in Nuclear Families
Palta, Priit; Kaplinski, Lauris; Nagirnaja, Liina; Veidenberg, Andres; Möls, Märt; Nelis, Mari; Esko, Tõnu; Metspalu, Andres; Laan, Maris; Remm, Maido
2015-01-01
DNA copy number variants (CNVs) that alter the copy number of a particular DNA segment in the genome play an important role in human phenotypic variability and disease susceptibility. A number of CNVs overlapping with genes have been shown to confer risk for a variety of human diseases, thus highlighting the relevance of addressing the variability of CNVs at a higher resolution. So far, it has not been possible to deterministically infer the allelic composition of different haplotypes present within CNV regions. We have developed a novel computational method, called PiCNV, which makes it possible to resolve the haplotype sequence composition within CNV regions in nuclear families based on SNP genotyping microarray data. The algorithm allows one to i) phase normal and CNV-carrying haplotypes in copy number variable regions, ii) resolve the allelic copies of rearranged DNA sequence within the haplotypes and iii) infer the heritability of identified haplotypes in trios or larger nuclear families. To our knowledge this is the first program available that can deterministically phase null, mono-, di-, tri- and tetraploid genotypes in CNV loci. We applied our method to study the composition and inheritance of haplotypes in CNV regions of 30 HapMap Yoruban trios and 34 Estonian families. For 93.6% of the CNV loci, PiCNV unambiguously phased normal and CNV-carrying haplotypes and followed their transmission in the corresponding families. Furthermore, allelic composition analysis identified the co-occurrence of alternative allelic copies within 66.7% of haplotypes carrying copy number gains. We also observed less frequent transmission of CNV-carrying haplotypes from parents to children compared to normal haplotypes and identified the emergence of several de novo deletions and duplications in the offspring. PMID:25853576
Quantifying Biomass from Point Clouds by Connecting Representations of Ecosystem Structure
NASA Astrophysics Data System (ADS)
Hendryx, S. M.; Barron-Gafford, G.
2017-12-01
Quantifying terrestrial ecosystem biomass is an essential part of monitoring carbon stocks and fluxes within the global carbon cycle and optimizing natural resource management. Point cloud data such as from lidar and structure from motion can be effective for quantifying biomass over large areas, but significant challenges remain in developing effective models that allow for such predictions. Inference models that estimate biomass from point clouds are established in many environments, yet are often scale-dependent, needing to be fitted and applied at the same spatial scale and grid size at which they were developed. Furthermore, training such models typically requires large in situ datasets that are often prohibitively costly or time-consuming to obtain. We present here a scale- and sensor-invariant framework for efficiently estimating biomass from point clouds. Central to this framework, we present a new algorithm, assignPointsToExistingClusters, that has been developed for finding matches between in situ data and clusters in remotely-sensed point clouds. The algorithm can be used for assessing canopy segmentation accuracy and for training and validating machine learning models for predicting biophysical variables. We demonstrate the algorithm's efficacy by using it to train a random forest model of above-ground biomass in a shrubland environment in Southern Arizona. We show that by learning a nonlinear function to estimate biomass from segmented canopy features we can reduce error, especially in the presence of inaccurate clusterings, when compared to a traditional, deterministic technique to estimate biomass from remotely measured canopies. Our random forest on cluster features model extends established methods of training random forest regressions to predict biomass of subplots but requires significantly less training data and is scale invariant. The random forest on cluster features model reduced mean absolute error, when evaluated on all test data in leave-one-out cross-validation, by 40.6% relative to deterministic mesquite allometry and 35.9% relative to the inferred ecosystem-state allometric function. Our framework should allow for the inference of biomass more efficiently than common subplot methods and more accurately than individual tree segmentation methods in densely vegetated environments.
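A hedged sketch of the two ingredients described above - matching field measurements to point-cloud clusters and regressing biomass on cluster features with a random forest - is given below; the nearest-centroid matching and the synthetic features are illustrative stand-ins for assignPointsToExistingClusters and the paper's real feature set.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def assign_to_nearest_cluster(in_situ_xy, cluster_centroids):
    """Match each field-measured plant to the nearest canopy-cluster centroid
    (a simplified stand-in for the paper's assignment algorithm)."""
    d = np.linalg.norm(in_situ_xy[:, None] - cluster_centroids[None], axis=2)
    return d.argmin(axis=1)

rng = np.random.default_rng(0)
stems = rng.random((10, 2))                      # in situ plant locations
centroids = rng.random((20, 2))                  # segmented canopy centroids
print(assign_to_nearest_cluster(stems, centroids))

# Illustrative cluster features: height, canopy area, point count.
features = rng.random((200, 3))
biomass = 5 * features[:, 0] * features[:, 1] + rng.normal(0, 0.1, 200)  # fake allometry
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(features, biomass)      # learn a nonlinear biomass ~ canopy-features map
print(model.predict(features[:3]))
```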
Jacquin, Hugo; Gilson, Amy; Shakhnovich, Eugene; Cocco, Simona; Monasson, Rémi
2016-05-01
Inverse statistical approaches to determine protein structure and function from Multiple Sequence Alignments (MSA) are emerging as powerful tools in computational biology. However, the underlying assumptions about the relationship between the inferred effective Potts Hamiltonian and real protein structure and energetics remain untested so far. Here we use a lattice protein (LP) model to benchmark those inverse statistical approaches. We build MSAs of highly stable sequences in target LP structures, and infer the effective pairwise Potts Hamiltonians from those MSAs. We find that inferred Potts Hamiltonians reproduce many important aspects of 'true' LP structures and energetics. Careful analysis reveals that effective pairwise couplings in inferred Potts Hamiltonians depend not only on the energetics of the native structure but also on competing folds; in particular, the coupling values reflect both positive design (stabilization of the native conformation) and negative design (destabilization of competing folds). In addition to providing detailed structural information, the inferred Potts models, used as protein Hamiltonians for the design of new sequences, are able to generate with high probability completely new sequences with the desired folds, which is not possible using independent-site models. These are remarkable results, as the effective LP Hamiltonians used to generate the MSAs are not simple pairwise models, due to the competition between the folds. Our findings elucidate the reasons for the success of inverse approaches to the modelling of proteins from sequence data, and their limitations.
From laws of inference to protein folding dynamics.
Tseng, Chih-Yuan; Yu, Chun-Ping; Lee, H C
2010-08-01
Protein folding dynamics is one of the major issues constantly investigated in the study of protein function. Molecular dynamics (MD) simulation with the replica exchange method (REM) is a commonly considered theoretical approach. Yet a trade-off of applying the REM is that the dynamics toward the native configuration seems to be lost in the simulations. In this work, we show that, given REM-MD simulation results, protein folding dynamics can be directly derived from laws of inference. The applicability of the resulting approach, the entropic folding dynamics, is illustrated by investigating a well-studied Trp-cage peptide. Our results are qualitatively comparable with those from other studies. The current studies suggest that incorporating laws of inference with physics provides a comprehensive perspective for exploring protein folding dynamics.
Bhadra, Pratiti; Pal, Debnath
2017-04-01
Dynamics is integral to the function of proteins, yet the use of molecular dynamics (MD) simulation as a technique remains under-explored for molecular function inference. This is all the more important in the context of genomics projects, where novel proteins are determined with limited evolutionary information. Recently we developed a method to match a query protein's flexible segments to infer function, using a novel approach combining analysis of residue fluctuation-graphs and auto-correlation vectors derived from coarse-grained (CG) MD trajectories. The method was validated on a diverse dataset with sequence identity between proteins as low as 3%, with high function-recall rates. Here we share its implementation as a publicly accessible web service, named DynFunc (Dynamics Match for Function), which queries protein function from ≥1 µs of CG dynamics trajectory information for protein subunits. Users are provided with the custom-developed coarse-grained molecular mechanics (CGMM) forcefield to generate the MD trajectories for their protein of interest. On upload of trajectory information, the DynFunc web server identifies specific flexible regions of the protein linked to putative molecular function. Our unique application does not use evolutionary information to infer molecular function from MD information and can, therefore, work for all proteins, including moonlighting and novel ones, whenever structural information is available. Our pipeline is expected to be of utility to all structural biologists working with novel proteins and interested in moonlighting functions. Copyright © 2017 Elsevier Ltd. All rights reserved.
Kalman filter approach for uncertainty quantification in time-resolved laser-induced incandescence.
Hadwin, Paul J; Sipkens, Timothy A; Thomson, Kevin A; Liu, Fengshan; Daun, Kyle J
2018-03-01
Time-resolved laser-induced incandescence (TiRe-LII) data can be used to infer spatially and temporally resolved volume fractions and primary particle size distributions of soot-laden aerosols, but these estimates are corrupted by measurement noise as well as uncertainties in the spectroscopic and heat transfer submodels used to interpret the data. Estimates of the temperature, concentration, and size distribution of soot primary particles within a sample aerosol are typically made by nonlinear regression of modeled spectral incandescence decay, or effective temperature decay, to experimental data. In this work, we employ nonstationary Bayesian estimation techniques to infer aerosol properties from simulated and experimental LII signals, specifically the extended Kalman filter and the Schmidt-Kalman filter. These techniques exploit the time-varying nature of both the measurements and the models, and they reveal how uncertainty in the estimates computed from TiRe-LII data evolves over time. Both techniques perform better than standard deterministic estimates, and we demonstrate that the Schmidt-Kalman filter produces the more realistic uncertainty estimates.
Identifying the multiple dysregulated oncoproteins that contribute to tumorigenesis in a given patient is crucial for developing personalized treatment plans. However, accurate inference of aberrant protein activity in biological samples is still challenging as genetic alterations are only partially predictive and direct measurements of protein activity are generally not feasible.
Inferring the palaeoenvironment of ancient bacteria on the basis of resurrected proteins
NASA Technical Reports Server (NTRS)
Gaucher, Eric A.; Thomson, J. Michael; Burgan, Michelle F.; Benner, Steven A.
2003-01-01
Features of the physical environment surrounding an ancestral organism can be inferred by reconstructing sequences of ancient proteins made by those organisms, resurrecting these proteins in the laboratory, and measuring their properties. Here, we resurrect candidate sequences for elongation factors of the Tu family (EF-Tu) found at ancient nodes in the bacterial evolutionary tree, and measure their activities as a function of temperature. The ancient EF-Tu proteins have temperature optima of 55-65 degrees C. This value seems to be robust with respect to uncertainties in the ancestral reconstruction. This suggests that the ancient bacteria that hosted these particular genes were thermophiles, and neither hyperthermophiles nor mesophiles. This conclusion can be compared and contrasted with inferences drawn from an analysis of the lengths of branches in trees joining proteins from contemporary bacteria, the distribution of thermophily in derived bacterial lineages, the inferred G + C content of ancient ribosomal RNA, and the geological record combined with assumptions concerning molecular clocks. The study illustrates the use of experimental palaeobiochemistry and assumptions about deep phylogenetic relationships between bacteria to explore the character of ancient life.
PIA: An Intuitive Protein Inference Engine with a Web-Based User Interface.
Uszkoreit, Julian; Maerkens, Alexandra; Perez-Riverol, Yasset; Meyer, Helmut E; Marcus, Katrin; Stephan, Christian; Kohlbacher, Oliver; Eisenacher, Martin
2015-07-02
Protein inference connects the peptide spectrum matches (PSMs) obtained from database search engines back to proteins, which are typically at the heart of most proteomics studies. Different search engines yield different PSMs and thus different protein lists. Analysis of results from one or multiple search engines is often hampered by different data exchange formats and a lack of convenient and intuitive user interfaces. We present PIA, a flexible software suite for combining PSMs from different search engine runs and turning these into consistent results. PIA can be integrated into proteomics data analysis workflows in several ways. A user-friendly graphical user interface can be run either locally or (e.g., for larger core facilities) from a central server. For automated data processing, stand-alone tools are available. PIA implements several established protein inference algorithms and can combine results from different search engines seamlessly. On several benchmark data sets, we show that PIA can identify a larger number of proteins at the same protein FDR compared to inference based on a single search engine. PIA supports the majority of established search engines and data in the mzIdentML standard format. It is implemented in Java and freely available at https://github.com/mpc-bioinformatics/pia.
Zhang, Wangshu; Coba, Marcelo P; Sun, Fengzhu
2016-01-11
Protein domains can be viewed as portable units of biological function that define the functional properties of proteins. Therefore, if a protein is associated with a disease, its domains might also be associated and define disease endophenotypes. However, knowledge about such domain-disease relationships is rarely available. Thus, identification of domains associated with human diseases would greatly improve our understanding of the mechanisms of complex human diseases and further improve their prevention, diagnosis and treatment. Based on phenotypic similarities among diseases, we first group diseases into overlapping modules. We then develop a framework to infer associations between domains and diseases through known relationships between diseases and modules, domains and proteins, as well as proteins and disease modules. Different methods, including Association, Maximum likelihood estimation (MLE), Domain-disease pair exclusion analysis (DPEA), Bayesian, and Parsimonious explanation (PE) approaches, are developed to predict domain-disease associations. We demonstrate the effectiveness of all five approaches via a series of validation experiments, and show the robustness of the MLE, Bayesian and PE approaches to the involved parameters. We also study the effects of disease modularization in inferring novel domain-disease associations. Through validation, the AUC (Area Under the operating characteristic Curve) scores for the Bayesian, MLE, DPEA, PE, and Association approaches are 0.86, 0.84, 0.83, 0.83 and 0.79, respectively, indicating the usefulness of these approaches for predicting domain-disease relationships. Finally, we choose the Bayesian approach to infer domains associated with two common diseases, Crohn's disease and type 2 diabetes. The Bayesian approach has the best performance for the inference of domain-disease relationships. The predicted landscape between domains and diseases provides a more detailed view of disease mechanisms.
Hybrid deterministic/stochastic simulation of complex biochemical systems.
Lecca, Paola; Bagagiolo, Fabio; Scarpa, Marina
2017-11-21
In a biological cell, cellular functions and the genetic regulatory apparatus are implemented and controlled by complex networks of chemical reactions involving genes, proteins, and enzymes. Accurate computational models are indispensable means for understanding the mechanisms behind the evolution of a complex system, which are not always explorable with wet-lab experiments. To serve their purpose, computational models should be able to describe and simulate the complexity of a biological system in many of its aspects. Moreover, they should be implemented with efficient algorithms requiring the shortest possible execution time, to avoid excessively lengthening the time between data analysis and any subsequent experiment. Besides the features of their topological structure, the complexity of biological networks also refers to their dynamics, which is often non-linear and stiff. The stiffness is due to the presence of molecular species whose abundances fluctuate by many orders of magnitude. A fully stochastic simulation of a stiff system is computationally expensive. On the other hand, continuous models are less costly, but they fail to capture the stochastic behaviour of small populations of molecular species. We introduce a new efficient hybrid stochastic-deterministic computational model and the software tool MoBioS (MOlecular Biology Simulator) implementing it. The mathematical model of MoBioS uses continuous differential equations to describe the deterministic reactions and a Gillespie-like algorithm to describe the stochastic ones. Unlike the majority of current hybrid methods, the MoBioS algorithm divides the reaction set into fast, moderate, and slow reactions and implements hysteresis switching between the stochastic model and the deterministic model. Fast reactions are approximated as continuous-deterministic processes and modelled by deterministic rate equations. Moderate reactions are those whose reaction waiting time is greater than the fast-reaction waiting time but smaller than the slow-reaction waiting time. A moderate reaction is approximated as a stochastic (deterministic) process if it was classified as a stochastic (deterministic) process at the time at which it crossed the threshold of low (high) waiting time. A Gillespie first-reaction method is implemented to select and execute the slow reactions. The performance of MoBioS was tested on a typical example of hybrid dynamics: DNA transcription regulation. The simulated dynamic profile of the reagents' abundance and the estimate of the error introduced by the fully deterministic approach were used to evaluate the consistency of the computational model and that of the software tool.
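A greatly simplified sketch of the fast/slow partitioning idea is shown below; it omits the moderate-reaction class and the hysteresis switching of MoBioS, uses a plain Euler step for the deterministic part, and applies the first-reaction method within a fixed time step, so it illustrates the scheme rather than the published algorithm. The toy system and all thresholds are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def propensities(x, rates):
    """Toy gene-expression system; x = [mRNA, protein]."""
    k_tx, k_tl, d_m, d_p = rates
    return np.array([k_tx, k_tl * x[0], d_m * x[0], d_p * x[1]])

STOICH = np.array([[+1, 0],    # transcription: mRNA += 1
                   [0, +1],    # translation:   protein += 1
                   [-1, 0],    # mRNA decay
                   [0, -1]])   # protein decay

def hybrid_step(x, rates, dt, fast_threshold=100.0):
    a = propensities(x, rates)
    fast = a > fast_threshold          # high propensity = short waiting time
    # Deterministic part: Euler update with the fast reactions' mean rates.
    x = x + dt * (STOICH[fast].T @ a[fast])
    # Stochastic part: first-reaction method over the slow reactions.
    slow = np.flatnonzero(~fast & (a > 0))
    if slow.size:
        taus = rng.exponential(1.0 / a[slow])
        if taus.min() < dt:            # a slow reaction fires within this step
            x = x + STOICH[slow[taus.argmin()]]
    return np.maximum(x, 0.0)

x = np.array([0.0, 0.0])
for _ in range(10_000):
    x = hybrid_step(x, rates=(200.0, 10.0, 1.0, 0.1), dt=1e-3)
print(x)
```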
Efficient Characterization of Parametric Uncertainty of Complex (Bio)chemical Networks.
Schillings, Claudia; Sunnåker, Mikael; Stelling, Jörg; Schwab, Christoph
2015-08-01
Parametric uncertainty is a particularly challenging and relevant aspect of systems analysis in domains such as systems biology where, both for inference and for assessing prediction uncertainties, it is essential to characterize the system behavior globally in the parameter space. However, current methods based on local approximations or on Monte-Carlo sampling cope only insufficiently with the high-dimensional parameter spaces associated with complex network models. Here, we propose an alternative deterministic methodology that relies on sparse polynomial approximations: a deterministic computational interpolation scheme that adaptively identifies the most significant expansion coefficients. We present its performance on kinetic model equations from computational systems biology with several hundred parameters and state variables, leading to numerical approximations of the parametric solution on the entire parameter space. The scheme is based on adaptive Smolyak interpolation of the parametric solution at judiciously and adaptively chosen points in parameter space. Like Monte-Carlo sampling, it is "non-intrusive" and well-suited for massively parallel implementation, but it affords higher convergence rates. This opens up new avenues for large-scale dynamic network analysis by enabling scaling for many applications, including parameter estimation, uncertainty quantification, and systems design.
Efficient Integrative Multi-SNP Association Analysis via Deterministic Approximation of Posteriors.
Wen, Xiaoquan; Lee, Yeji; Luca, Francesca; Pique-Regi, Roger
2016-06-02
With the increasing availability of functional genomic data, incorporating genomic annotations into genetic association analysis has become a standard procedure. However, the existing methods often lack rigor and/or computational efficiency and consequently do not maximize the utility of functional annotations. In this paper, we propose a rigorous inference procedure to perform integrative association analysis incorporating genomic annotations for both traditional GWASs and emerging molecular QTL mapping studies. In particular, we propose an algorithm, named deterministic approximation of posteriors (DAP), which enables highly efficient and accurate joint enrichment analysis and identification of multiple causal variants. We use a series of simulation studies to highlight the power and computational efficiency of our proposed approach and further demonstrate it by analyzing the cross-population eQTL data from the GEUVADIS project and the multi-tissue eQTL data from the GTEx project. In particular, we find that genetic variants predicted to disrupt transcription factor binding sites are enriched in cis-eQTLs across all tissues. Moreover, the enrichment estimates obtained across the tissues are correlated with the cell types for which the annotations are derived. Copyright © 2016 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Dynamical inference: where phase synchronization and generalized synchronization meet.
Stankovski, Tomislav; McClintock, Peter V E; Stefanovska, Aneta
2014-06-01
Synchronization is a widespread phenomenon that occurs among interacting oscillatory systems. It facilitates their temporal coordination and can lead to the emergence of spontaneous order. The detection of synchronization from the time series of such systems is of great importance for the understanding and prediction of their dynamics, and several methods for doing so have been introduced. However, the common case where the interacting systems have time-variable characteristic frequencies and coupling parameters, and may also be subject to continuous external perturbation and noise, still presents a major challenge. Here we apply recent developments in dynamical Bayesian inference to tackle these problems. In particular, we discuss how to detect phase slips and the existence of deterministic coupling from measured data, and we unify the concepts of phase synchronization and generalized synchronization. Starting from phase or state observables, we present methods for the detection of both phase and generalized synchronization. The consistency and equivalence of phase and generalized synchronization are further demonstrated by the analysis of time series from analog electronic simulations of coupled nonautonomous van der Pol oscillators. We demonstrate that the detection methods work equally well on numerically simulated chaotic systems. In all the cases considered, we show that dynamical Bayesian inference can clearly identify noise-induced phase slips and distinguish coherence from intrinsic coupling-induced synchronization.
Hierarchical mark-recapture models: a framework for inference about demographic processes
Link, W.A.; Barker, R.J.
2004-01-01
The development of sophisticated mark-recapture models over the last four decades has provided fundamental tools for the study of wildlife populations, allowing reliable inference about population sizes and demographic rates based on clearly formulated models for the sampling processes. Mark-recapture models are now routinely described by large numbers of parameters. These large models provide the next challenge to wildlife modelers: the extraction of signal from noise in large collections of parameters. Pattern among parameters can be described by strong, deterministic relations (as in ultrastructural models) but is more flexibly and credibly modeled using weaker, stochastic relations. Trend in survival rates is not likely to be manifest by a sequence of values falling precisely on a given parametric curve; rather, if we could somehow know the true values, we might anticipate a regression relation between parameters and explanatory variables, in which true value equals signal plus noise. Hierarchical models provide a useful framework for inference about collections of related parameters. Instead of regarding parameters as fixed but unknown quantities, we regard them as realizations of stochastic processes governed by hyperparameters. Inference about demographic processes is based on investigation of these hyperparameters. We advocate the Bayesian paradigm as a natural, mathematically and scientifically sound basis for inference about hierarchical models. We describe analysis of capture-recapture data from an open population based on hierarchical extensions of the Cormack-Jolly-Seber model. In addition to recaptures of marked animals, we model first captures of animals and losses on capture, and are thus able to estimate survival probabilities w (i.e., the complement of death or permanent emigration) and per capita growth rates f (i.e., the sum of recruitment and immigration rates). Covariation in these rates, a feature of demographic interest, is explicitly described in the model.
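The "signal plus noise" view of related parameters can be illustrated with a small empirical-Bayes sketch, a moment-based stand-in for the fully Bayesian hierarchical analysis the authors advocate: yearly survival rates drawn from a logit-normal hierarchy are estimated and then shrunk toward the estimated hyper-mean. All values are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 15, 80
mu, sd = 1.0, 0.6                        # hyperparameters on the logit scale
phi = 1 / (1 + np.exp(-(mu + sd * rng.standard_normal(T))))  # yearly survival
y = rng.binomial(n, phi)                 # marked animals surviving each year

logit = lambda p: np.log(p / (1 - p))
p_hat = (y + 0.5) / (n + 1)              # raw yearly estimates (continuity corrected)
z = logit(p_hat)
se2 = 1 / (n * p_hat * (1 - p_hat))      # approx. sampling variance, logit scale
mu_hat = z.mean()
tau2 = max(z.var() - se2.mean(), 1e-6)   # method-of-moments hyper-variance
w = tau2 / (tau2 + se2)                  # shrinkage weights: signal vs. noise
z_shrunk = w * z + (1 - w) * mu_hat      # hierarchical (empirical-Bayes) estimates
print(np.round(1 / (1 + np.exp(-z_shrunk)), 3))
```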
Equilibrium reconstruction in an iron core tokamak using a deterministic magnetisation model
NASA Astrophysics Data System (ADS)
Appel, L. C.; Lupelli, I.; JET Contributors
2018-02-01
In many tokamaks ferromagnetic material, usually referred to as an iron core, is present in order to improve the magnetic coupling between the solenoid and the plasma. The presence of the iron core in proximity to the plasma changes the magnetic topology, with consequent effects on the magnetic field structure and the plasma boundary. This paper considers the problem of obtaining the free-boundary plasma equilibrium solution in the presence of ferromagnetic material based on measured constraints. The current approach employs a model described by O'Brien et al. (1992) in which the magnetisation currents at the iron-air boundary are represented by a set of free parameters and appropriate boundary conditions are enforced via a set of quasi-measurements on the material boundary. This can lead to the possibility of overfitting the data and hiding underlying issues with the measured signals. Although the model typically achieves good fits to measured magnetic signals, there are significant discrepancies in the inferred magnetic topology compared with other plasma diagnostic measurements that are independent of the magnetic field. An alternative approach for equilibrium reconstruction in iron-core tokamaks, termed the deterministic magnetisation model, is developed and implemented in EFIT++. The iron is represented by a boundary current, with the gradients in the magnetisation dipole state generating macroscopic internal magnetisation currents. A model for the boundary magnetisation currents at the iron-air interface is developed using B-splines, enabling continuity to arbitrary order; internal magnetisation currents are allocated to triangulated regions within the iron, and a method to enable adaptive refinement is implemented. The deterministic model has been validated by comparing it with a synthetic 2-D electromagnetic model of JET. It is established that the maximum field discrepancy is less than 1.5 mT throughout the vacuum region enclosing the plasma. Simulated magnetic probe signals are accurate to within 1% for signals with absolute magnitude greater than 100 mT; in all other cases agreement is to within 1 mT. Neglecting the internal magnetisation currents increases the maximum discrepancy in the vacuum region to >20 mT, resulting in errors of 5%-10% in the simulated probe signals. The fact that the previous model neglects the internal magnetisation currents (and also has additional free parameters when fitting the measured data) makes it unsuitable for analysing data in the absence of plasma current. The discrepancy of the poloidal magnetic flux within the vacuum vessel is within 0.1 Wb. Finally, the deterministic model is applied to an equilibrium force-balance solution of a JET discharge using experimental data. It is shown that the discrepancies of the outboard separatrix position and the outer strike-point position inferred from Thomson scattering and infrared camera data are much improved relative to the routine equilibrium reconstruction, whereas the discrepancy of the inner strike-point position is similar.
Krishnan, Neeraja M; Seligmann, Hervé; Stewart, Caro-Beth; De Koning, A P Jason; Pollock, David D
2004-10-01
Reconstruction of ancestral DNA and amino acid sequences is an important means of inferring information about past evolutionary events. Such reconstructions suggest changes in molecular function and evolutionary processes over the course of evolution and are used to infer adaptation and convergence. Maximum likelihood (ML) is generally thought to provide relatively accurate reconstructed sequences compared to parsimony, but both methods lead to the inference of multiple directional changes in nucleotide frequencies in primate mitochondrial DNA (mtDNA). To better understand this surprising result, as well as how parsimony and ML differ, we constructed a series of computationally simple "conditional pathway" methods that differed in the number of substitutions allowed per site along each branch, and we also evaluated the entire Bayesian posterior frequency distribution of reconstructed ancestral states. We analyzed primate mitochondrial cytochrome b (Cyt-b) and cytochrome oxidase subunit I (COI) genes and found that ML reconstructs ancestral frequencies that are often more different from tip sequences than are parsimony reconstructions. In contrast, frequency reconstructions based on the posterior ensemble more closely resemble extant nucleotide frequencies. Simulations indicate that these differences in ancestral sequence inference are probably due to deterministic bias caused by high uncertainty in the optimization-based ancestral reconstruction methods (parsimony, ML, Bayesian maximum a posteriori). In contrast, ancestral nucleotide frequencies based on an average of the Bayesian set of credible ancestral sequences are much less biased. The methods involving simpler conditional pathway calculations have slightly reduced likelihood values compared to full likelihood calculations, but they can provide fairly unbiased nucleotide reconstructions and may be useful in more complex phylogenetic analyses than considered here due to their speed and flexibility. To determine whether biased reconstructions using optimization methods might affect inferences of functional properties, ancestral primate mitochondrial tRNA sequences were inferred and helix-forming propensities for conserved pairs were evaluated in silico. For ambiguously reconstructed nucleotides at sites with high base composition variability, ancestral tRNA sequences from Bayesian analyses were more compatible with canonical base pairing than were those inferred by other methods. Thus, nucleotide bias in reconstructed sequences apparently can lead to serious bias and inaccuracies in functional predictions.
Iwata, Hiroaki; Mizutani, Sayaka; Tabei, Yasuo; Kotera, Masaaki; Goto, Susumu; Yamanishi, Yoshihiro
2013-01-01
Most phenotypic effects of drugs arise from interactions between drugs and their target proteins; however, our knowledge of the molecular mechanisms of drug-target interactions is very limited. One of the challenging issues in recent pharmaceutical science is to identify the underlying molecular features that govern drug-target interactions. In this paper, we make a systematic analysis of the correlation between drug side effects and protein domains, which we call "pharmacogenomic features," based on the drug-target interaction network. We detect drug side effects and protein domains that appear jointly in known drug-target interactions, which is made possible by using classifiers with sparse models. It is shown that the inferred pharmacogenomic features can be used for predicting potential drug-target interactions. We also discuss the advantages and limitations of the pharmacogenomic features, compared with chemogenomic features, i.e., the associations between drug chemical substructures and protein domains. The inferred side effect-domain association network is expected to be useful for estimating common drug side effects for different protein families and characteristic drug side effects for specific protein domains.
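A minimal sketch of the sparse-classifier idea is given below: build one feature per (side effect, protein domain) combination and let an L1 penalty keep only jointly informative pairs. The feature construction, labels, and penalty strength are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_pairs, n_se, n_dom = 2000, 50, 40
side_effects = rng.integers(0, 2, (n_pairs, n_se))   # drug side-effect profiles
domains = rng.integers(0, 2, (n_pairs, n_dom))       # protein domain profiles
# Cross features: one column per (side effect, domain) combination.
X = (side_effects[:, :, None] & domains[:, None, :]).reshape(n_pairs, -1)
y = rng.integers(0, 2, n_pairs)                      # interacts or not (toy labels)

clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X, y)
active = np.flatnonzero(clf.coef_[0])                # surviving (SE, domain) pairs
print(len(active), "sparse pharmacogenomic features selected")
```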
Huang, Yi-Fei; Golding, G Brian
2015-02-15
A number of statistical phylogenetic methods have been developed to infer conserved functional sites or regions in proteins. Many methods, e.g. Rate4Site, apply standard phylogenetic models to infer site-specific substitution rates and totally ignore the spatial correlation of substitution rates in protein tertiary structures, which may reduce their power to identify conserved functional patches in protein tertiary structures when the sequences used in the analysis are highly similar. The 3D sliding window method has been proposed to infer conserved functional patches in protein tertiary structures, but the window size, which reflects the strength of the spatial correlation, must be predefined and is not inferred from the data. We recently developed GP4Rate to solve these problems under the Bayesian framework. Unfortunately, GP4Rate is computationally slow. Here, we present an intuitive web server, FuncPatch, to perform fast approximate Bayesian inference of conserved functional patches in protein tertiary structures. Both simulations and four case studies based on empirical data suggest that FuncPatch is a good approximation to GP4Rate, but orders of magnitude faster. In addition, simulations suggest that FuncPatch is potentially a useful tool complementary to Rate4Site, whereas the 3D sliding window method is less powerful than FuncPatch and Rate4Site. The functional patches predicted by FuncPatch in the four case studies are supported by experimental evidence, which corroborates the usefulness of FuncPatch. The software FuncPatch is freely available at http://info.mcmaster.ca/yifei/FuncPatch. golding@mcmaster.ca. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
In silico prediction of protein-protein interactions in human macrophages
2014-01-01
Background Protein-protein interaction (PPI) network analyses are highly valuable in deciphering and understanding the intricate organisation of cellular functions. Nevertheless, the majority of available protein-protein interaction networks are context-less, i.e. without any reference to the spatial, temporal or physiological conditions in which the interactions may occur. In this work, we are proposing a protocol to infer the most likely protein-protein interaction (PPI) network in human macrophages. Results We integrated the PPI dataset from the Agile Protein Interaction DataAnalyzer (APID) with different meta-data to infer a contextualized macrophage-specific interactome using a combination of statistical methods. The obtained interactome is enriched in experimentally verified interactions and in proteins involved in macrophage-related biological processes (i.e. immune response activation, regulation of apoptosis). As a case study, we used the contextualized interactome to highlight the cellular processes induced upon Mycobacterium tuberculosis infection. Conclusion Our work confirms that contextualizing interactomes improves the biological significance of bioinformatic analyses. More specifically, studying such inferred network rather than focusing at the gene expression level only, is informative on the processes involved in the host response. Indeed, important immune features such as apoptosis are solely highlighted when the spotlight is on the protein interaction level. PMID:24636261
Causal reasoning with mental models
Khemlani, Sangeet S.; Barbey, Aron K.; Johnson-Laird, Philip N.
2014-01-01
This paper outlines the model-based theory of causal reasoning. It postulates that the core meanings of causal assertions are deterministic and refer to temporally-ordered sets of possibilities: A causes B to occur means that given A, B occurs, whereas A enables B to occur means that given A, it is possible for B to occur. The paper shows how mental models represent such assertions, and how these models underlie deductive, inductive, and abductive reasoning yielding explanations. It reviews evidence both to corroborate the theory and to account for phenomena sometimes taken to be incompatible with it. Finally, it reviews neuroscience evidence indicating that mental models for causal inference are implemented within lateral prefrontal cortex. PMID:25389398
Optimal nonlinear filtering using the finite-volume method
NASA Astrophysics Data System (ADS)
Fox, Colin; Morrison, Malcolm E. K.; Norton, Richard A.; Molteno, Timothy C. A.
2018-01-01
Optimal sequential inference, or filtering, for the state of a deterministic dynamical system requires simulation of the Frobenius-Perron operator, which can be formulated as the solution of a continuity equation. For low-dimensional, smooth systems, the finite-volume numerical method provides a solution that conserves probability and gives estimates that converge to the optimal continuous-time values, while a Courant-Friedrichs-Lewy-type condition assures that intermediate discretized solutions remain positive density functions. This method is demonstrated in an example of nonlinear filtering for the state of a simple pendulum, with comparison to results using the unscented Kalman filter, and for a case where rank-deficient observations lead to multimodal probability distributions.
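A one-dimensional toy version of this filter is sketched below: a conservative upwind finite-volume step propagates the state density under a deterministic flow (simulating the Frobenius-Perron operator), and a Bayes multiplication incorporates each observation. The pendulum example of the paper is two-dimensional; the flow, likelihood, and observation schedule here are arbitrary assumptions.

```python
import numpy as np

N = 200
x = np.linspace(0, 2 * np.pi, N, endpoint=False)   # periodic state domain
dx = x[1] - x[0]
f = 1.0 + 0.5 * np.sin(x)               # deterministic velocity field dx/dt = f(x) > 0
dt = 0.8 * dx / np.abs(f).max()         # CFL condition keeps densities positive

p = np.ones(N) / (N * dx)               # flat prior density

def propagate(p):
    """One conservative upwind finite-volume step of the continuity equation."""
    flux = f * p                         # f > 0 everywhere, so upwind = left cell
    return p - (dt / dx) * (flux - np.roll(flux, 1))

def bayes_update(p, y, sigma=0.3):
    """Multiply by a Gaussian likelihood of observing y = x + noise."""
    q = p * np.exp(-0.5 * ((y - x) / sigma) ** 2)
    return q / (q.sum() * dx)            # renormalize to a density

for step in range(500):
    p = propagate(p)
    if step % 50 == 0:                   # an observation arrives
        p = bayes_update(p, y=1.0)
print("posterior mean:", (p * x).sum() * dx)  # toy summary (ignores circularity)
```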
Strain engineering of the silicon-vacancy center in diamond
NASA Astrophysics Data System (ADS)
Meesala, Srujan; Sohn, Young-Ik; Pingault, Benjamin; Shao, Linbo; Atikian, Haig A.; Holzgrafe, Jeffrey; Gündoǧan, Mustafa; Stavrakas, Camille; Sipahigil, Alp; Chia, Cleaven; Evans, Ruffin; Burek, Michael J.; Zhang, Mian; Wu, Lue; Pacheco, Jose L.; Abraham, John; Bielejec, Edward; Lukin, Mikhail D.; Atatüre, Mete; Lončar, Marko
2018-05-01
We control the electronic structure of the silicon-vacancy (SiV) color-center in diamond by changing its static strain environment with a nano-electro-mechanical system. This allows deterministic and local tuning of SiV optical and spin transition frequencies over a wide range, an essential step towards multiqubit networks. In the process, we infer the strain Hamiltonian of the SiV revealing large strain susceptibilities of order 1 PHz/strain for the electronic orbital states. We identify regimes where the spin-orbit interaction results in a large strain susceptibility of order 100 THz/strain for spin transitions, and propose an experiment where the SiV spin is strongly coupled to a nanomechanical resonator.
A null model for Pearson coexpression networks.
Gobbi, Andrea; Jurman, Giuseppe
2015-01-01
Gene coexpression networks inferred by correlation from high-throughput profiling such as microarray data represent simple but effective structures for discovering and interpreting linear gene relationships. In recent years, several approaches have been proposed to tackle the problem of deciding when the resulting correlation values are statistically significant. This is most crucial when the number of samples is small, yielding a non-negligible chance that even high correlation values are due to random effects. Here we introduce a novel hard thresholding solution based on the assumption that a coexpression network inferred from randomly generated data is expected to be empty. The threshold is theoretically derived by means of an analytic approach and, as a deterministic independent null model, it depends only on the dimensions of the starting data matrix, with assumptions on the skewness of the data distribution compatible with the structure of gene expression level data. We show, on synthetic and array datasets, that the proposed threshold is effective in eliminating all false positive links, at the offsetting cost of some false negative edges.
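For contrast with the analytic null-model threshold derived in the paper, the sketch below computes the classical Student-t significance threshold for Pearson correlation with a Bonferroni correction over all gene pairs; like the authors' threshold it depends only on the dimensions of the data matrix, but it is a standard alternative, not their result.

```python
import numpy as np
from scipy import stats

def hard_threshold(n_samples, n_genes, alpha=0.05):
    """Correlation magnitude above which a link is kept, from the classical
    Student-t null for Pearson correlation (r*sqrt(n-2)/sqrt(1-r^2) ~ t_{n-2})
    with a Bonferroni correction over all gene pairs."""
    n_tests = n_genes * (n_genes - 1) // 2
    t_crit = stats.t.ppf(1 - alpha / (2 * n_tests), df=n_samples - 2)
    return t_crit / np.sqrt(n_samples - 2 + t_crit ** 2)

# With few samples even large correlations can be random noise:
print(hard_threshold(n_samples=10, n_genes=5000))    # close to 1
print(hard_threshold(n_samples=200, n_genes=5000))   # much smaller
```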
Nondeterministic self-assembly of two tile types on a lattice.
Tesoro, S; Ahnert, S E
2016-04-01
Self-assembly is ubiquitous in nature, particularly in biology, where it underlies the formation of protein quaternary structure and protein aggregation. Quaternary structure assembles deterministically and performs a wide range of important functions in the cell, whereas protein aggregation is the hallmark of a number of diseases and represents a nondeterministic self-assembly process. Here we build on previous work on a lattice model of deterministic self-assembly to investigate nondeterministic self-assembly of single lattice tiles and mixtures of two tiles at varying relative concentrations. Despite limiting the model to two interface types, which results in 13 topologically distinct single tiles and 106 topologically distinct sets of two tiles, we observe a wide variety of concentration-dependent behaviors. Several two-tile sets display critical behavior in the form of a sharp transition from bound to unbound structures as the relative concentration of one tile to another increases. Other sets exhibit gradual monotonic changes in structural density, or nonmonotonic changes, while still others show no concentration dependence at all. We catalog this extensive range of behaviors and present a model that provides a reasonably good estimate of the critical concentrations for a subset of the critical transitions. In addition, we show that the structures resulting from these tile sets are fractal, with one of two different fractal dimensions.
Protein 3D Structure Computed from Evolutionary Sequence Variation
Sheridan, Robert; Hopf, Thomas A.; Pagnani, Andrea; Zecchina, Riccardo; Sander, Chris
2011-01-01
The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein-coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7–4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org). This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures, new strategies in protein and drug design, and the identification of functional genetic variants in normal and disease genomes. PMID:22163331
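A minimal sketch of the mean-field flavor of this maximum-entropy coupling inference is shown below: one-hot encode the MSA, invert a regularized covariance matrix to get pairwise couplings, and score residue pairs by an average-product-corrected Frobenius norm. This is a standard simplified variant of the approach, not the exact EVfold pipeline, and the pseudocount value is arbitrary.

```python
import numpy as np

def mf_couplings(msa, q=21, pseudo=0.5):
    """Mean-field Potts couplings from an integer-encoded MSA
    (n_seqs x L, values in 0..q-1). Returns an L x L contact-score map."""
    n, L = msa.shape
    onehot = np.zeros((n, L, q))
    onehot[np.arange(n)[:, None], np.arange(L)[None, :], msa] = 1
    X = onehot[:, :, :q - 1].reshape(n, -1)       # drop one state per column (gauge)
    f = X.mean(axis=0)
    C = (X.T @ X) / n - np.outer(f, f)            # empirical covariance
    C += pseudo * np.eye(C.shape[0])              # regularize before inversion
    J = -np.linalg.inv(C)                         # mean-field couplings
    J = J.reshape(L, q - 1, L, q - 1)
    F = np.sqrt((J ** 2).sum(axis=(1, 3)))        # Frobenius norm per residue pair
    np.fill_diagonal(F, 0)
    apc = np.outer(F.mean(0), F.mean(0)) / F.mean()  # average-product correction
    return F - apc

scores = mf_couplings(np.random.randint(0, 21, (500, 60)))  # toy MSA
print(scores.shape)
```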
Chen, Hsin-Ying; Chang, Joseph Tung-Chieh; Chien, Kun-Yi; Lee, Yun-Shien; You, Guo-Rung; Cheng, Ann-Joy
2018-01-11
Cell surface glucose-regulated protein 78 (GRP78), an endoplasmic reticulum (ER) chaperone, has been suggested to be a cancer stem cell marker, but the influence of this molecule on cancer stemness is poorly characterized. In this study, we developed a mass spectrometry platform to detect the endogenous interactome of GRP78 and investigated its role in cancer stemness. The interactome results showed that cell surface GRP78 associates with multiple molecules. We therefore investigated the heterogeneity of head and neck cancer cell lines (OECM1, FaDu, and BM2) sorted by the cell surface expression levels of GRP78 and the GRP78 interactome protein Progranulin. The four sorted cell groups exhibited distinct cell cycle distributions, asymmetric/symmetric cell divisions, and different relative expression levels of stemness markers. Our results demonstrate that cell surface GRP78 promotes cancer stemness, whereas it drives cells toward a non-stem-like phenotype when it chaperones Progranulin. We conclude that cell surface GRP78 is a chaperone exerting a deterministic influence on cancer stemness.
Cytoprophet: a Cytoscape plug-in for protein and domain interaction networks inference.
Morcos, Faruck; Lamanna, Charles; Sikora, Marcin; Izaguirre, Jesús
2008-10-01
Cytoprophet is a software tool that allows prediction and visualization of protein and domain interaction networks. It is implemented as a plug-in of Cytoscape, an open source software framework for analysis and visualization of molecular networks. Cytoprophet implements three algorithms that predict new potential physical interactions using the domain composition of proteins and experimental assays. The algorithms for protein and domain interaction inference include maximum likelihood estimation (MLE) using expectation maximization (EM), the maximum specificity set cover (MSSC) approach, and the sum-product algorithm (SPA). After accepting an input set of proteins with Uniprot ID/Accession numbers and a selected prediction algorithm, Cytoprophet draws a network of potential interactions with probability scores and GO distances as edge attributes. A network of domain interactions between the domains of the initial protein list can also be generated. Cytoprophet was designed to take advantage of the visual capabilities of Cytoscape and be simple to use. An example of inference in a signaling network of the myxobacterium Myxococcus xanthus is presented and available at Cytoprophet's website, http://cytoprophet.cse.nd.edu.
Ashworth, Justin; Plaisier, Christopher L.; Lo, Fang Yin; Reiss, David J.; Baliga, Nitin S.
2014-01-01
Widespread microbial genome sequencing presents an opportunity to understand the gene regulatory networks of non-model organisms. This requires knowledge of the binding sites for transcription factors whose DNA-binding properties are unknown or difficult to infer. We adapted a protein structure-based method to predict the specificities and putative regulons of homologous transcription factors across diverse species. As a proof-of-concept we predicted the specificities and transcriptional target genes of divergent archaeal feast/famine regulatory proteins, several of which are encoded in the genome of Halobacterium salinarum. This was validated by comparison to experimentally determined specificities for transcription factors in distantly related extremophiles, chromatin immunoprecipitation experiments, and cis-regulatory sequence conservation across eighteen related species of halobacteria. Through this analysis we were able to infer that Halobacterium salinarum employs a divergent local trans-regulatory strategy to regulate genes (carA and carB) involved in arginine and pyrimidine metabolism, whereas Escherichia coli employs an operon. The prediction of gene regulatory binding sites using structure-based methods is useful for the inference of gene regulatory relationships in new species that are otherwise difficult to infer. PMID:25255272
How to talk about protein-level false discovery rates in shotgun proteomics.
The, Matthew; Tasnim, Ayesha; Käll, Lukas
2016-09-01
A frequently sought output from a shotgun proteomics experiment is a list of proteins that we believe to have been present in the analyzed sample before proteolytic digestion. The standard technique to control for errors in such lists is to enforce a preset threshold for the false discovery rate (FDR). Many consider protein-level FDRs a difficult and vague concept, as the measurement entities, spectra, are manifestations of peptides and not proteins. Here, we argue that this confusion is unnecessary and provide a framework on how to think about protein-level FDRs, starting from its basic principle: the null hypothesis. Specifically, we point out that two competing null hypotheses are used concurrently in today's protein inference methods, which has gone unnoticed by many. Using simulations of a shotgun proteomics experiment, we show how confusing one null hypothesis for the other can lead to serious discrepancies in the FDR. Furthermore, we demonstrate how the same simulations can be used to verify FDR estimates of protein inference methods. In particular, we show that, for a simple protein inference method, decoy models can be used to accurately estimate protein-level FDRs for both competing null hypotheses. © 2016 The Authors. Proteomics Published by Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
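The paper's simulations verify FDR estimates of protein inference methods; the target-decoy estimator at the heart of such checks can be sketched as follows (hypothetical scores; a simplified illustration, not the authors' simulation framework):

```python
def protein_level_fdr(targets, decoys, threshold):
    """Estimate FDR among target proteins scoring >= threshold.

    targets, decoys: lists of protein-level scores, where decoys come
    from searching a reversed/shuffled database of the same size.
    Under the null, decoy hits accumulate like false target hits.
    """
    t = sum(s >= threshold for s in targets)
    d = sum(s >= threshold for s in decoys)
    return min(1.0, d / t) if t else 0.0

# Hypothetical scores: targets are enriched at high values.
targets = [9.1, 8.7, 7.9, 7.2, 5.5, 4.8, 3.1, 2.2]
decoys  = [4.9, 3.8, 3.0, 2.7, 2.1, 1.8, 1.5, 0.9]
print(protein_level_fdr(targets, decoys, threshold=4.0))  # 1/6 ~ 0.17
```

Which null hypothesis the decoys represent ("absent protein" versus "protein with no correctly identified peptide") is exactly the distinction the paper warns about.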
Haas, Kevin R; Yang, Haw; Chu, Jhih-Wei
2013-12-12
The dynamics of a protein along a well-defined coordinate can be formally projected onto the form of an overdamped Langevin equation. Here, we present a comprehensive statistical-learning framework for simultaneously quantifying the deterministic force (the potential of mean force, PMF) and the stochastic force (characterized by the diffusion coefficient, D) from single-molecule Förster-type resonance energy transfer (smFRET) experiments. The likelihood functional of the Langevin parameters, PMF and D, is expressed by a path integral of the latent smFRET distance that follows Langevin dynamics and realized by the donor and the acceptor photon emissions. The solution is made possible by an eigen decomposition of the time-symmetrized form of the corresponding Fokker-Planck equation coupled with photon statistics. To extract the Langevin parameters from photon arrival time data, we advance the expectation-maximization algorithm in statistical learning, originally developed for and mostly used in discrete-state systems, to a general form in the continuous space that allows for a variational calculus on the continuous PMF function. We also introduce the regularization of the solution space in this Bayesian inference based on a maximum trajectory-entropy principle. We use a highly nontrivial example with realistically simulated smFRET data to illustrate the application of this new method.
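For intuition about the overdamped Langevin form onto which the dynamics is projected (a simulation sketch, not the authors' estimator; parameter names are mine), an Euler-Maruyama integrator looks like this:

```python
import numpy as np

def simulate_langevin(dU, D, x0, dt, n_steps, kT=1.0, seed=0):
    """Euler-Maruyama integration of overdamped Langevin dynamics:
    dx = -(D/kT) * dU(x) dt + sqrt(2 D dt) * N(0, 1).
    dU: derivative of the potential of mean force (PMF)."""
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps)
    x[0] = x0
    for t in range(1, n_steps):
        drift = -(D / kT) * dU(x[t - 1]) * dt
        noise = np.sqrt(2.0 * D * dt) * rng.standard_normal()
        x[t] = x[t - 1] + drift + noise
    return x

# Harmonic PMF U(x) = 0.5 * k * x**2; stationary variance should be kT/k.
k = 2.0
traj = simulate_langevin(lambda x: k * x, D=0.1, x0=1.0,
                         dt=1e-2, n_steps=200_000)
print(traj[20_000:].var())  # ~ 0.5 for kT = 1, k = 2
```

The inference problem the paper solves is the inverse of this simulation: recovering dU (the PMF) and D when x is only observed indirectly through photon statistics.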
PAnalyzer: a software tool for protein inference in shotgun proteomics.
Prieto, Gorka; Aloria, Kerman; Osinalde, Nerea; Fullaondo, Asier; Arizmendi, Jesus M; Matthiesen, Rune
2012-11-05
Protein inference from peptide identifications in shotgun proteomics must deal with ambiguities that arise due to the presence of peptides shared between different proteins, which is common in higher eukaryotes. Recently, data independent acquisition (DIA) approaches have emerged as an alternative to the traditional data dependent acquisition (DDA) in shotgun proteomics experiments. MSE is the term used to name one of the DIA approaches used in QTOF instruments. MSE data require specialized software to process acquired spectra and to perform peptide and protein identifications. However, the software available at the moment does not group the identified proteins in a transparent way by taking into account peptide evidence categories. Furthermore, the inspection, comparison and reporting of the obtained results require tedious manual intervention. Here we report a software tool to address these limitations for MSE data. In this paper we present PAnalyzer, a software tool focused on the protein inference process of shotgun proteomics. Our approach considers all the identified proteins and groups them when necessary, indicating their confidence using different evidence categories. PAnalyzer can read protein identification files in the XML output format of the ProteinLynx Global Server (PLGS) software provided by Waters Corporation for their MSE data, and also in the mzIdentML format recently standardized by HUPO-PSI. Multiple files can also be read simultaneously and are considered as technical replicates. Results are saved to CSV, HTML and mzIdentML (in the case of a single mzIdentML input file) files. An MSE analysis of a real sample is presented to compare the results of PAnalyzer and ProteinLynx Global Server. We present a software tool to deal with the ambiguities that arise in the protein inference process. Key contributions are support for MSE data analysis by ProteinLynx Global Server and technical replicate integration. PAnalyzer is an easy-to-use, multiplatform, free software tool.
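The grouping-by-evidence idea can be sketched in a few lines (simplified categories inspired by, but not identical to, PAnalyzer's):

```python
from collections import defaultdict

def group_proteins(peptide_to_proteins):
    """Classify proteins by their peptide evidence.

    peptide_to_proteins: dict mapping each identified peptide to the
    set of proteins that could have produced it.
    A protein with at least one unique peptide is 'conclusive';
    proteins sharing an identical peptide set are 'indistinguishable';
    everything else is 'ambiguous'.
    """
    protein_peptides = defaultdict(set)
    for pep, prots in peptide_to_proteins.items():
        for p in prots:
            protein_peptides[p].add(pep)

    categories = {}
    for p, peps in protein_peptides.items():
        if any(len(peptide_to_proteins[pep]) == 1 for pep in peps):
            categories[p] = "conclusive"
        elif any(peps == q for r, q in protein_peptides.items() if r != p):
            categories[p] = "indistinguishable"
        else:
            categories[p] = "ambiguous"
    return categories

peptides = {"pep1": {"A"}, "pep2": {"A", "B"}, "pep3": {"B", "C"},
            "pep4": {"D", "E"}, "pep5": {"D", "E"}}
print(group_proteins(peptides))
# A: conclusive; D, E: indistinguishable; B, C: ambiguous
```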
Inferring protein domains associated with drug side effects based on drug-target interaction network
2013-01-01
Background Most phenotypic effects of drugs arise from the interactions between drugs and their target proteins; however, our knowledge about the molecular mechanism of drug-target interactions is very limited. One of the challenging issues in recent pharmaceutical science is to identify the underlying molecular features which govern drug-target interactions. Results In this paper, we make a systematic analysis of the correlation between drug side effects and protein domains, which we call "pharmacogenomic features," based on the drug-target interaction network. We detect drug side effects and protein domains that appear jointly in known drug-target interactions, which is made possible by using classifiers with sparse models. It is shown that the inferred pharmacogenomic features can be used for predicting potential drug-target interactions. We also discuss the advantages and limitations of the pharmacogenomic features, compared with the chemogenomic features that are the associations between drug chemical substructures and protein domains. Conclusion The inferred side effect-domain association network is expected to be useful for estimating common drug side effects for different protein families and characteristic drug side effects for specific protein domains. PMID:24565527
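The sparse-classifier step can be illustrated with an L1-penalized logistic regression on toy data (scikit-learn assumed available; an illustration of the general technique, not the paper's model):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Rows: drug-target pairs; columns: protein domains present in the target.
# y: whether the drug shows a given side effect (hypothetical toy rule).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 10))
y = (X[:, 3] | X[:, 7])  # side effect driven by domains 3 and 7

clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
# Sparse weights: the penalty suppresses domains with no real association.
print(np.nonzero(clf.coef_[0])[0])  # expect columns 3 and 7 to dominate
```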
Algorithms for database-dependent search of MS/MS data.
Matthiesen, Rune
2013-01-01
The frequently used bottom-up strategy for identification of proteins and their associated modifications nowadays typically generates thousands of MS/MS spectra that are normally matched automatically against a protein sequence database. Search engines that take as input MS/MS spectra and a protein sequence database are referred to as database-dependent search engines. Many programs, both commercial and freely available, exist for database-dependent search of MS/MS spectra, and most of the programs have excellent user documentation. The aim here is therefore to outline the algorithmic strategy behind different search engines rather than providing software user manuals. The process of database-dependent search can be divided into search strategy, peptide scoring, protein scoring, and finally protein inference. Most efforts in the literature have gone into comparing results from different software rather than discussing the underlying algorithms. Such practical comparisons can be cluttered by suboptimal implementations, and the observed differences are frequently caused by software parameter settings that have not been set properly to allow an even comparison. In other words, an algorithmic idea can still be worth considering even if the software implementation has been demonstrated to be suboptimal. The aim in this chapter is therefore to split the algorithms for database-dependent searching of MS/MS data into the above steps so that the different algorithmic ideas become more transparent and comparable. Most search engines provide good implementations of the first three data analysis steps mentioned above, whereas the final step of protein inference is much less developed for most search engines and is in many cases performed by external software. The final part of this chapter illustrates how protein inference is built into the VEMS search engine and discusses SIR, a stand-alone program for protein inference that can import a Mascot search result.
Stochastic flux analysis of chemical reaction networks.
Kahramanoğulları, Ozan; Lynch, James F
2013-12-07
Chemical reaction networks provide an abstraction scheme for a broad range of models in biology and ecology. The two common means for simulating these networks are the deterministic and the stochastic approaches. The traditional deterministic approach, based on differential equations, enjoys a rich set of analysis techniques, including a treatment of reaction fluxes. However, the discrete stochastic simulations, which provide advantages in some cases, lack a quantitative treatment of network fluxes. We describe a method for flux analysis of chemical reaction networks, where flux is given by the flow of species between reactions in stochastic simulations of the network. Extending discrete event simulation algorithms, our method constructs several data structures, and thereby reveals a variety of statistics about resource creation and consumption during the simulation. We use these structures to quantify the causal interdependence and relative importance of the reactions at arbitrary time intervals with respect to the network fluxes. This allows us to construct reduced networks that have the same flux-behavior, and compare these networks, also with respect to their time series. We demonstrate our approach on an extended example based on a published ODE model of the same network, that is, Rho GTP-binding proteins, and on other models from biology and ecology. We provide a fully stochastic treatment of flux analysis. As in deterministic analysis, our method delivers the network behavior in terms of species transformations. Moreover, our stochastic analysis can be applied, not only at steady state, but at arbitrary time intervals, and used to identify the flow of specific species between specific reactions. Our case study of Rho GTP-binding proteins reveals the role played by the cyclic reverse fluxes in tuning the behavior of this network.
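The core bookkeeping can be sketched with a Gillespie simulation that counts reaction firings over a time interval (a minimal illustration, not the authors' data structures):

```python
import numpy as np

def gillespie_flux(x, reactions, t_end, seed=0):
    """Stochastic simulation recording per-reaction firing counts (fluxes).

    x: dict of species counts.
    reactions: list of (rate_constant, reactants, products), with
    reactants/products as dicts of species -> stoichiometry.
    """
    rng = np.random.default_rng(seed)
    flux = [0] * len(reactions)
    t = 0.0
    while True:
        # Mass-action propensities (unit stoichiometries assumed here).
        props = [k * np.prod([x[s] ** n for s, n in r.items()])
                 for k, r, _ in reactions]
        total = sum(props)
        if total == 0:
            break
        t += rng.exponential(1.0 / total)
        if t > t_end:
            break
        j = rng.choice(len(reactions), p=np.array(props) / total)
        flux[j] += 1
        for s, n in reactions[j][1].items():
            x[s] -= n
        for s, n in reactions[j][2].items():
            x[s] += n
    return x, flux

# A <-> B with asymmetric rates; net flux = forward minus reverse firings.
state, flux = gillespie_flux({"A": 100, "B": 0},
                             [(1.0, {"A": 1}, {"B": 1}),
                              (0.2, {"B": 1}, {"A": 1})], t_end=5.0)
print(state, "net A->B flux:", flux[0] - flux[1])
```

Tracking firings per interval, rather than only steady-state averages, is what lets the stochastic treatment resolve transient and cyclic reverse fluxes.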
Statistical Inferences from Formaldehyde DNA-Protein Cross-Link Data
Physiologically-based pharmacokinetic (PBPK) modeling has reached considerable sophistication in its application in the pharmacological and environmental health areas. Yet, mature methodologies for making statistical inferences have not been routinely incorporated in these applic...
Chance, destiny, and the inner workings of ClpXP.
Russell, Rick; Matouschek, Andreas
2014-07-31
AAA+ proteases are responsible for protein degradation in all branches of life. Using single-molecule and ensemble assays, Cordova et al. investigate how the bacterial protease ClpXP steps through a substrate's polypeptide chain and construct a quantitative kinetic model that recapitulates the interplay between stochastic and deterministic behaviors of ClpXP. Copyright © 2014 Elsevier Inc. All rights reserved.
Protein-driven inference of miRNA–disease associations
Mørk, Søren; Pletscher-Frankild, Sune; Palleja Caro, Albert; Gorodkin, Jan; Jensen, Lars Juhl
2014-01-01
Motivation: MicroRNAs (miRNAs) are a highly abundant class of non-coding RNA genes involved in cellular regulation and thus also diseases. Despite miRNAs being important disease factors, miRNA–disease associations remain low in number and of variable reliability. Furthermore, existing databases and prediction methods do not explicitly facilitate forming hypotheses about the possible molecular causes of the association, thereby making the path to experimental follow-up longer. Results: Here we present miRPD in which miRNA–Protein–Disease associations are explicitly inferred. Besides linking miRNAs to diseases, it directly suggests the underlying proteins involved, which can be used to form hypotheses that can be experimentally tested. The inference of miRNAs and diseases is made by coupling known and predicted miRNA–protein associations with protein–disease associations text mined from the literature. We present scoring schemes that allow us to rank miRNA–disease associations inferred from both curated and predicted miRNA targets by reliability and thereby to create high- and medium-confidence sets of associations. Analyzing these, we find statistically significant enrichment for proteins involved in pathways related to cancer and type I diabetes mellitus, suggesting either a literature bias or a genuine biological trend. We show by example how the associations can be used to extract proteins for disease hypotheses. Availability and implementation: All datasets, software and a searchable Web site are available at http://mirpd.jensenlab.org. Contact: lars.juhl.jensen@cpr.ku.dk or gorodkin@rth.dk PMID:24273243
Inferring subunit stoichiometry from single molecule photobleaching
2013-01-01
Single molecule photobleaching is a powerful tool for determining the stoichiometry of protein complexes. By attaching fluorophores to proteins of interest, the number of associated subunits in a complex can be deduced by imaging single molecules and counting fluorophore photobleaching steps. Because some bleaching steps might be unobserved, the ensemble of steps will be binomially distributed. In this work, it is shown that inferring the true composition of a complex from such data is nontrivial because binomially distributed observations present an ill-posed inference problem. That is, a unique and optimal estimate of the relevant parameters cannot be extracted from the observations. Because of this, a method has not been firmly established to quantify confidence when using this technique. This paper presents a general inference model for interpreting such data and provides methods for accurately estimating parameter confidence. The formalization and methods presented here provide a rigorous analytical basis for this pervasive experimental tool. PMID:23712552
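The ill-posedness is easy to see numerically: for binomially distributed step counts, each candidate subunit number n admits a compensating detection probability p, so the profile likelihood is nearly flat along a ridge of (n, p) pairs (toy data; scipy assumed available):

```python
import numpy as np
from scipy.stats import binom

# Observed bleaching-step counts from many single molecules (hypothetical).
counts = np.array([2, 3, 3, 4, 2, 3, 4, 3, 2, 3, 3, 4, 3, 2, 3])

def log_likelihood(n, p):
    """Binomial model: each of n subunits is observed with probability p."""
    return binom.logpmf(counts, n, p).sum()

# Profile the likelihood: for each candidate n, the best p compensates,
# so several (n, p) pairs explain the data almost equally well.
for n in range(4, 9):
    p_hat = counts.mean() / n          # MLE of p given n
    print(n, round(p_hat, 2), round(log_likelihood(n, p_hat), 2))
```

Running this shows log-likelihoods that barely separate the candidate n values, which is why the paper argues that confidence estimates, not just point estimates, are needed.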
Morabia, Alfredo
2005-01-01
Epidemiological methods, which combine population thinking and group comparisons, can primarily identify causes of disease in populations. There is therefore a tension between our intuitive notion of a cause, which we want to be deterministic and invariant at the individual level, and the epidemiological notion of causes, which are invariant only at the population level. Epidemiologists have heretofore given a pragmatic solution to this tension. Causal inference in epidemiology consists in checking the logical coherence of a causality statement and determining whether what has been found grossly contradicts what we think we already know: how strong is the association? Is there a dose-response relationship? Does the cause precede the effect? Is the effect biologically plausible? Etc. This approach to causal inference can be traced back to the English philosophers David Hume and John Stuart Mill. On the other hand, the mode of establishing causality devised by Jakob Henle and Robert Koch, which has been fruitful in bacteriology, requires that in every instance the effect invariably follows the cause (e.g., inoculation of the Koch bacillus and tuberculosis). This is incompatible with epidemiological causality, which has to deal with probabilistic effects (e.g., smoking and lung cancer) and is therefore invariant only for the population.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chittenden, J. P., E-mail: j.chittenden@imperial.ac.uk; Appelbe, B. D.; Manke, F.
2016-05-15
We present the results of 3D simulations of indirect drive inertial confinement fusion capsules driven by the “high-foot” radiation pulse on the National Ignition Facility. The results are post-processed using a semi-deterministic ray tracing model to generate synthetic deuterium-tritium (DT) and deuterium-deuterium (DD) neutron spectra as well as primary and down scattered neutron images. Results with low-mode asymmetries are used to estimate the magnitude of anisotropy in the neutron spectra shift, width, and shape. Comparisons of primary and down scattered images highlight the lack of alignment between the neutron sources, scatter sites, and detector plane, which limits the ability to infer the ρr of the fuel from a down scattered ratio. Further calculations use high bandwidth multi-mode perturbations to induce multiple short scale length flows in the hotspot. The results indicate that the effect of fluid velocity is to produce a DT neutron spectrum with an apparently higher temperature than that inferred from the DD spectrum and which is also higher than the temperature implied by the DT to DD yield ratio.
Variational Bayesian identification and prediction of stochastic nonlinear dynamic causal models.
Daunizeau, J; Friston, K J; Kiebel, S J
2009-11-01
In this paper, we describe a general variational Bayesian approach for approximate inference on nonlinear stochastic dynamic models. This scheme extends established approximate inference on hidden-states to cover: (i) nonlinear evolution and observation functions, (ii) unknown parameters and (precision) hyperparameters and (iii) model comparison and prediction under uncertainty. Model identification or inversion entails the estimation of the marginal likelihood or evidence of a model. This difficult integration problem can be finessed by optimising a free-energy bound on the evidence using results from variational calculus. This yields a deterministic update scheme that optimises an approximation to the posterior density on the unknown model variables. We derive such a variational Bayesian scheme in the context of nonlinear stochastic dynamic hierarchical models, for both model identification and time-series prediction. The computational complexity of the scheme is comparable to that of an extended Kalman filter, which is critical when inverting high dimensional models or long time-series. Using Monte-Carlo simulations, we assess the estimation efficiency of this variational Bayesian approach using three stochastic variants of chaotic dynamic systems. We also demonstrate the model comparison capabilities of the method, its self-consistency and its predictive power.
Malmström, Erik; Kilsgård, Ola; Hauri, Simon; Smeds, Emanuel; Herwald, Heiko; Malmström, Lars; Malmström, Johan
2016-01-01
The plasma proteome is highly dynamic and variable, composed of proteins derived from surrounding tissues and cells. To investigate the complex processes that control the composition of the plasma proteome, we developed a mass spectrometry-based proteomics strategy to infer the origin of proteins detected in murine plasma. The strategy relies on the construction of a comprehensive protein tissue atlas from cells and highly vascularized organs using shotgun mass spectrometry. The protein tissue atlas was transformed to a spectral library for highly reproducible quantification of tissue-specific proteins directly in plasma using SWATH-like data-independent mass spectrometry analysis. We show that the method can determine drastic changes of tissue-specific protein profiles in blood plasma from mouse animal models with sepsis. The strategy can be extended to several other species, advancing our understanding of the complex processes that contribute to the plasma proteome dynamics. PMID:26732734
We and others have shown that the transition and maintenance of biological states are controlled by master regulator proteins, which can be inferred by interrogating tissue-specific regulatory models (interactomes) with transcriptional signatures, using the VIPER algorithm. Yet some tissues may lack the molecular profiles necessary for interactome inference (orphan tissues), or, as for single cells isolated from heterogeneous samples, their tissue context may be undetermined.
Successional convergence in experimentally disturbed intertidal communities.
Martins, Gustavo M; Arenas, Francisco; Tuya, Fernando; Ramírez, Rubén; Neto, Ana I; Jenkins, Stuart R
2018-02-01
Determining the causes of variation in community assembly is a central question in ecology. Analysis of β-diversity can provide insight by relating the extent of regional to local variation in diversity, allowing inference of the relative importance of deterministic versus stochastic processes. We investigated the effects of disturbance timing on community assembly at three distinct regions with varying environmental conditions: Northern Portugal, Azores and Canaries. On the lower rocky intertidal, quadrats were experimentally cleared of biota at three distinct times of the year and community assembly followed for 1 year. Similar levels of α- and γ-diversity were found in all regions, which remained constant throughout succession. When Jaccard (incidence-based) and Bray-Curtis (abundance-based) metrics were used, β-diversity (the mean dissimilarity among plots cleared at the different times) was larger during early stages of community assembly but decreased over time. The adaptation of the Raup-Crick metric, which accounts for changes in species richness, showed that the structure of assemblages disturbed at different times of the year was similar to the null model of random community assembly during early stages of succession but became more similar than expected by chance. This pattern was observed in all regions despite differences in the regional species pool, suggesting that priority effects are likely weak and deterministic processes determine community structure despite stochasticity during early stages of community assembly.
Incorporating GIS and remote sensing for census population disaggregation
NASA Astrophysics Data System (ADS)
Wu, Shuo-Sheng "Derek"
Census data are the primary source of demographic data for a variety of research and applications. For confidentiality and administrative purposes, census data are usually released to the public by aggregated areal units. In the United States, the smallest census unit is the census block. Due to data aggregation, users of census data may have problems in visualizing population distribution within census blocks and estimating population counts for areas not coinciding with census block boundaries. The main purpose of this study is to develop methodology for estimating sub-block areal populations and assessing the estimation errors. The City of Austin, Texas was used as a case study area. Based on tax parcel boundaries and parcel attributes derived from ancillary GIS and remote sensing data, detailed urban land use classes were first classified using a per-field approach. After that, statistical models by land use class were built to infer population density from other predictor variables, including four census demographic statistics (the Hispanic percentage, the married percentage, the unemployment rate, and per capita income) and three physical variables derived from remote sensing images and building footprints vector data (a landscape heterogeneity statistic, a building pattern statistic, and a building volume statistic). In addition to statistical models, deterministic models were proposed to directly infer populations from building volumes and three housing statistics: the average space per housing unit, the housing unit occupancy rate, and the average household size. After the population models were derived or proposed, how well the models predict populations for another set of sample blocks was assessed. The results show that deterministic models were more accurate than statistical models. Further, by simulating the base unit for modeling from aggregated blocks, I assessed how well the deterministic models estimate sub-unit-level populations. I also assessed the aggregation effects and the rescaling effects on sub-unit estimates. Lastly, from another set of mixed-land-use sample blocks, a mixed-land-use model was derived and compared with a residential-land-use model. The results of per-field land use classification are satisfactory, with a Kappa statistic of 0.747. Model assessments by land use show that population estimates for multi-family land use areas have higher errors than those for single-family land use areas, and population estimates for mixed land use areas have higher errors than those for residential land use areas. The assessments of sub-unit estimates using a simulation approach indicate that smaller areas show higher estimation errors, estimation errors do not relate to the base unit size, and rescaling improves all levels of sub-unit estimates.
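The deterministic models reduce to a chain of multiplications; a minimal sketch under the stated definitions (function and variable names are mine):

```python
def block_population(building_volume, space_per_unit,
                     occupancy_rate, household_size):
    """Deterministic population estimate for an areal unit.

    building_volume: total residential building volume (e.g., m^3)
    space_per_unit:  average volume per housing unit (same units)
    occupancy_rate:  fraction of housing units occupied
    household_size:  average persons per occupied unit
    """
    housing_units = building_volume / space_per_unit
    return housing_units * occupancy_rate * household_size

# Hypothetical block: 60,000 m^3 of housing at 500 m^3 per unit,
# 95% occupancy, 2.4 persons per household -> 273.6 persons.
print(block_population(60_000, 500, 0.95, 2.4))
```

Because every factor has a direct physical interpretation, errors in a deterministic estimate can be traced to a specific input, which is harder with fitted regression coefficients.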
A self-agency bias in preschoolers' causal inferences
Kushnir, Tamar; Wellman, Henry M.; Gelman, Susan A.
2013-01-01
Preschoolers' causal learning from intentional actions – causal interventions – is subject to a self-agency bias. We propose that this bias is evidence-based; it is responsive to causal uncertainty. In the current studies, two causes (one child-controlled, one experimenter-controlled) were associated with one or two effects, first independently, then simultaneously. When initial independent effects were probabilistic, and thus subsequent simultaneous actions were causally ambiguous, children showed a self-agency bias. Children showed no bias when initial effects were deterministic. Further controls establish that children's self-agency bias is not a wholesale preference but rather is influenced by uncertainty in causal evidence. These results demonstrate that children's own experience of action influences their causal learning, and suggest possible benefits in uncertain and ambiguous everyday learning contexts. PMID:19271843
De novo inference of protein function from coarse-grained dynamics.
Bhadra, Pratiti; Pal, Debnath
2014-10-01
Inference of the molecular function of proteins is a fundamental task in the quest for understanding cellular processes. The task is getting increasingly difficult with thousands of new proteins discovered each day. The difficulty arises primarily due to the lack of a high-throughput experimental technique for assessing protein molecular function, a lacuna that computational approaches are trying hard to fill. The latter, too, face a major bottleneck in the absence of clear evidence based on evolutionary information. Here we propose a de novo approach to annotate protein molecular function through structural dynamics match for a pair of segments from two dissimilar proteins, which may share even <10% sequence identity. To screen these matches, corresponding 1 µs coarse-grained (CG) molecular dynamics trajectories were used to compute normalized root-mean-square-fluctuation graphs and select mobile segments, which were, thereafter, matched for all pairs using unweighted three-dimensional autocorrelation vectors. Our in-house custom-built forcefield (FF), extensively validated against dynamics information obtained from experimental nuclear magnetic resonance data, was specifically used to generate the CG dynamics trajectories. The test for correspondence of dynamics-signature of protein segments and function revealed an 87% true positive rate and a 93.5% true negative rate on a dataset of 60 experimentally validated proteins, including moonlighting proteins and those with novel functional motifs. A random test against 315 unique fold/function proteins for a negative test gave >99% true recall. A blind prediction on a novel protein appears consistent with additional evidence retrieved therein. This is the first proof-of-principle of generalized use of structural dynamics for inferring protein molecular function leveraging our custom-made CG FF, useful to all. © 2014 Wiley Periodicals, Inc.
Converting differential-equation models of biological systems to membrane computing.
Muniyandi, Ravie Chandren; Zin, Abdullah Mohd; Sanders, J W
2013-12-01
This paper presents a method to convert the deterministic, continuous representation of a biological system by ordinary differential equations into a non-deterministic, discrete membrane computation. The dynamics of the membrane computation is governed by rewrite rules operating at certain rates. This has the advantage of applying accurately to small systems and of expressing rates of change that are determined locally, by region, but not necessarily globally. Such spatial information augments the standard differentiable approach to provide a more realistic model. A biological case study of the ligand-receptor network of protein TGF-β is used to validate the effectiveness of the conversion method. It demonstrates the sense in which the behaviours and properties of the system are better preserved in the membrane computing model, suggesting that the proposed conversion method may prove useful for biological systems in particular. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Reinforce: An Ensemble Approach for Inferring PPI Network from AP-MS Data.
Tian, Bo; Duan, Qiong; Zhao, Can; Teng, Ben; He, Zengyou
2017-05-17
Affinity Purification-Mass Spectrometry (AP-MS) is one of the most important technologies for constructing protein-protein interaction (PPI) networks. In this paper, we propose an ensemble method, Reinforce, for inferring PPI networks from AP-MS data sets. The new algorithm named Reinforce is based on rank aggregation and false discovery rate control. Under the null hypothesis that the interaction scores from different scoring methods are randomly generated, Reinforce follows three steps to integrate multiple ranking results from different algorithms or different data sets. The experimental results show that Reinforce can get more stable and accurate inference results than existing algorithms. The source codes of Reinforce and the data sets used in the experiments are available at: https://sourceforge.net/projects/reinforce/.
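Rank aggregation itself can be illustrated with a simple Borda count (a generic stand-in, not necessarily the aggregation rule Reinforce implements):

```python
from collections import defaultdict

def borda_aggregate(rankings):
    """Combine several ranked lists of candidate interactions.

    rankings: list of lists, each ordered best-first.
    Each item scores (list_length - position) per list; missing items
    score 0. Returns items ordered by total score, best first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        n = len(ranking)
        for pos, item in enumerate(ranking):
            scores[item] += n - pos
    return sorted(scores, key=scores.get, reverse=True)

# Three scoring methods rank candidate bait-prey interactions.
m1 = ["A-B", "A-C", "A-D"]
m2 = ["A-C", "A-B", "A-E"]
m3 = ["A-B", "A-E", "A-C"]
print(borda_aggregate([m1, m2, m3]))  # 'A-B' and 'A-C' rise to the top
```

Interactions ranked consistently high across methods survive aggregation, which is what makes the ensemble more stable than any single scorer.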
A Prize-Collecting Steiner Tree Approach for Transduction Network Inference
NASA Astrophysics Data System (ADS)
Bailly-Bechet, Marc; Braunstein, Alfredo; Zecchina, Riccardo
Inside the cell, information from the environment is mainly propagated via signaling pathways, which form a transduction network. Here we propose a new algorithm to infer transduction networks from heterogeneous data, using both the protein interaction network and expression datasets. We formulate the inference problem as an optimization task, and develop a message-passing, probabilistic and distributed formalism to solve it. We apply our algorithm to the pheromone response in the baker’s yeast S. cerevisiae. We are able to find the backbone of the known structure of the MAPK cascade of pheromone response, validating our algorithm. More importantly, we make biological predictions about some proteins whose role could be at the interface between pheromone response and other cellular functions.
Golan-Lavi, Roni; Giacomelli, Chiara; Fuks, Garold; Zeisel, Amit; Sonntag, Johanna; Sinha, Sanchari; Köstler, Wolfgang; Wiemann, Stefan; Korf, Ulrike; Yarden, Yosef; Domany, Eytan
2017-03-28
Protein responses to extracellular cues are governed by gene transcription, mRNA degradation and translation, and protein degradation. In order to understand how these time-dependent processes cooperate to generate dynamic responses, we analyzed the response of human mammary cells to the epidermal growth factor (EGF). Integrating time-dependent transcript and protein data into a mathematical model, we inferred for several proteins their pre- and post-stimulus translation and degradation coefficients and found that they exhibit complex, time-dependent variation. Specifically, we identified strategies of protein production and degradation acting in concert to generate rapid, transient protein bursts in response to EGF. Remarkably, for some proteins, for which the response necessitates rapidly decreased abundance, cells exhibit a transient increase in the corresponding degradation coefficient. Our model and analysis allow inference of the kinetics of mRNA translation and protein degradation, without perturbing cells, and open a way to understanding the fundamental processes governing time-dependent protein abundance profiles. Copyright © 2017 The Author(s). Published by Elsevier Inc. All rights reserved.
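The kinetic backbone of such models is a production-degradation balance; a hedged sketch with constant coefficients (the paper infers time-dependent ones; parameter names are mine) is:

```python
import numpy as np

def protein_trajectory(mrna, k_translate, k_degrade, p0, dt):
    """Integrate dP/dt = k_translate * mRNA(t) - k_degrade * P
    with forward Euler, given an mRNA time course sampled every dt."""
    p = np.empty(len(mrna))
    p[0] = p0
    for t in range(1, len(mrna)):
        dp = k_translate * mrna[t - 1] - k_degrade * p[t - 1]
        p[t] = p[t - 1] + dp * dt
    return p

# A transient mRNA pulse produces a delayed, transient protein burst.
time = np.arange(0, 10, 0.01)
mrna = np.exp(-((time - 2.0) ** 2))          # hypothetical pulse at t = 2
prot = protein_trajectory(mrna, k_translate=2.0, k_degrade=1.0,
                          p0=0.0, dt=0.01)
print(round(time[prot.argmax()], 2))          # protein peak lags the pulse
```

Fitting k_translate and k_degrade to measured transcript and protein time courses is the inference step; allowing them to vary over time yields the transient strategies the paper describes.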
Evol and ProDy for bridging protein sequence evolution and structural dynamics
Mao, Wenzhi; Liu, Ying; Chennubhotla, Chakra; Lezon, Timothy R.; Bahar, Ivet
2014-01-01
Correlations between sequence evolution and structural dynamics are of utmost importance in understanding the molecular mechanisms of function and their evolution. We have integrated Evol, a new package for fast and efficient comparative analysis of evolutionary patterns and conformational dynamics, into ProDy, a computational toolbox designed for inferring protein dynamics from experimental and theoretical data. Using information-theoretic approaches, Evol coanalyzes conservation and coevolution profiles extracted from multiple sequence alignments of protein families with their inferred dynamics. Availability and implementation: ProDy and Evol are open-source and freely available under MIT License from http://prody.csb.pitt.edu/. Contact: bahar@pitt.edu PMID:24849577
NASA Astrophysics Data System (ADS)
Weigt, Martin
Over the last years, biological research has been revolutionized by experimental high-throughput techniques, in particular by next-generation sequencing technology. Unprecedented amounts of data are accumulating, and there is a growing request for computational methods unveiling the information hidden in raw data, thereby increasing our understanding of complex biological systems. Statistical-physics models based on the maximum-entropy principle have, in the last few years, played an important role in this context. To give a specific example, proteins and many non-coding RNA show a remarkable degree of structural and functional conservation in the course of evolution, despite a large variability in amino acid sequences. We have developed a statistical-mechanics inspired inference approach - called Direct-Coupling Analysis - to link this sequence variability (easy to observe in sequence alignments, which are available in public sequence databases) to bio-molecular structure and function. In my presentation I will show, how this methodology can be used (i) to infer contacts between residues and thus to guide tertiary and quaternary protein structure prediction and RNA structure prediction, (ii) to discriminate interacting from non-interacting protein families, and thus to infer conserved protein-protein interaction networks, and (iii) to reconstruct mutational landscapes and thus to predict the phenotypic effect of mutations. References [1] M. Figliuzzi, H. Jacquier, A. Schug, O. Tenaillon and M. Weigt ''Coevolutionary landscape inference and the context-dependence of mutations in beta-lactamase TEM-1'', Mol. Biol. Evol. (2015), doi: 10.1093/molbev/msv211 [2] E. De Leonardis, B. Lutz, S. Ratz, S. Cocco, R. Monasson, A. Schug, M. Weigt ''Direct-Coupling Analysis of nucleotide coevolution facilitates RNA secondary and tertiary structure prediction'', Nucleic Acids Research (2015), doi: 10.1093/nar/gkv932 [3] F. Morcos, A. Pagnani, B. Lunt, A. Bertolino, D. Marks, C. Sander, R. Zecchina, J.N. Onuchic, T. Hwa, M. Weigt, ''Direct-coupling analysis of residue co-evolution captures native contacts across many protein families'', Proc. Natl. Acad. Sci. 108, E1293-E1301 (2011).
Prediction of virus-host protein-protein interactions mediated by short linear motifs.
Becerra, Andrés; Bucheli, Victor A; Moreno, Pedro A
2017-03-09
Short linear motifs in host organism proteins can be mimicked by viruses to create protein-protein interactions that disable or control metabolic pathways. Given that viral linear motif instances of host motif regular expressions can be found by chance, it is necessary to develop filtering methods for functional linear motifs. We conduct a systematic comparison of linear motif filtering methods to develop a computational approach for predicting motif-mediated protein-protein interactions between human and the human immunodeficiency virus 1 (HIV-1). We implemented three filtering methods to obtain linear motif sets: 1) conserved in viral proteins (C), 2) located in disordered regions (D) and 3) rare or scarce in a set of randomized viral sequences (R). The sets C, D, R are united and intersected. The resulting sets are compared by the number of protein-protein interactions correctly inferred with them - with experimental validation. The comparison is done with HIV-1 sequences and interactions from the National Institute of Allergy and Infectious Diseases (NIAID). The number of correctly inferred interactions allows us to rank the interactions by the sets used to deduce them: D∪R and C. The ordering of the sets is descending on the probability of capturing functional interactions. With respect to HIV-1, the sets C∪R, D∪R, and C∪D∪R infer all known interactions between HIV-1 and human proteins mediated by linear motifs. We found that the majority of conserved linear motifs in the virus are located in disordered regions. We have developed a method for predicting protein-protein interactions mediated by linear motifs between HIV-1 and human proteins. The method uses only protein sequences as input. We can extend the software developed to any other eukaryotic virus and host in order to find and rank candidate interactions. In future work we will use it to explore possible viral attack mechanisms based on linear motif mimicry.
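Candidate instances are found by matching host motif regular expressions against viral sequences; a minimal sketch using the well-known PxxP SH3-binding pattern on an illustrative HIV-1 Nef fragment:

```python
import re

def find_motif_instances(sequence, pattern):
    """Return (start, matched_substring) for overlapping regex matches."""
    return [(m.start(), m.group(1))
            for m in re.finditer(f"(?=({pattern}))", sequence)]

# Fragment around the proline-rich region of HIV-1 Nef (illustrative).
nef_fragment = "VGFPVTPQVPLRPMTYKAAVDLSHFLKEKGGLEGL"
print(find_motif_instances(nef_fragment, r"P..P"))
# [(3, 'PVTP'), (6, 'PQVP'), (9, 'PLRP')]
```

The filters (C, D, R) then decide which of these chance-prone regex hits are plausibly functional.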
Relative evolutionary rate inference in HyPhy with LEISR.
Spielman, Stephanie J; Kosakovsky Pond, Sergei L
2018-01-01
We introduce LEISR (Likelihood Estimation of Individual Site Rates, pronounced "laser"), a tool to infer relative evolutionary rates from protein and nucleotide data, implemented in HyPhy. LEISR is based on the popular Rate4Site (Pupko et al., 2002) approach for inferring relative site-wise evolutionary rates, primarily from protein data. We extend the original method for more general use in several key ways: (i) we increase the support for nucleotide data with additional models, (ii) we allow for datasets of arbitrary size, (iii) we support analysis of site-partitioned datasets to correct for the presence of recombination breakpoints, (iv) we produce rate estimates at all sites rather than at just a subset of sites, and (v) we implemented LEISR as MPI-enabled to support rapid, high-throughput analysis. LEISR is available in HyPhy starting with version 2.3.8, and it is accessible as an option in the HyPhy analysis menu ("Relative evolutionary rate inference"), which calls the HyPhy batchfile LEISR.bf.
NASA Astrophysics Data System (ADS)
Shekhar, Karthik; Ruberman, Claire F.; Ferguson, Andrew L.; Barton, John P.; Kardar, Mehran; Chakraborty, Arup K.
2013-12-01
Mutational escape from vaccine-induced immune responses has thwarted the development of a successful vaccine against AIDS, whose causative agent is HIV, a highly mutable virus. Knowing the virus' fitness as a function of its proteomic sequence can enable rational design of potent vaccines, as this information can focus vaccine-induced immune responses to target mutational vulnerabilities of the virus. Spin models have been proposed as a means to infer intrinsic fitness landscapes of HIV proteins from patient-derived viral protein sequences. These sequences are the product of nonequilibrium viral evolution driven by patient-specific immune responses and are subject to phylogenetic constraints. How can such sequence data allow inference of intrinsic fitness landscapes? We combined computer simulations and variational theory à la Feynman to show that, in most circumstances, spin models inferred from patient-derived viral sequences reflect the correct rank order of the fitness of mutant viral strains. Our findings are relevant for diverse viruses.
A Computational Framework for Analyzing Stochasticity in Gene Expression
Sherman, Marc S.; Cohen, Barak A.
2014-01-01
Stochastic fluctuations in gene expression give rise to distributions of protein levels across cell populations. Despite a mounting number of theoretical models explaining stochasticity in protein expression, we lack a robust, efficient, assumption-free approach for inferring the molecular mechanisms that underlie the shape of protein distributions. Here we propose a method for inferring sets of biochemical rate constants that govern chromatin modification, transcription, translation, and RNA and protein degradation from stochasticity in protein expression. We asked whether the rates of these underlying processes can be estimated accurately from protein expression distributions, in the absence of any limiting assumptions. To do this, we (1) derived analytical solutions for the first four moments of the protein distribution, (2) found that these four moments completely capture the shape of protein distributions, and (3) developed an efficient algorithm for inferring gene expression rate constants from the moments of protein distributions. Using this algorithm we find that most protein distributions are consistent with a large number of different biochemical rate constant sets. Despite this degeneracy, the solution space of rate constants almost always informs on underlying mechanism. For example, we distinguish between regimes where transcriptional bursting occurs from regimes reflecting constitutive transcript production. Our method agrees with the current standard approach, and in the restrictive regime where the standard method operates, also identifies rate constants not previously obtainable. Even without making any assumptions we obtain estimates of individual biochemical rate constants, or meaningful ratios of rate constants, in 91% of tested cases. In some cases our method identified all of the underlying rate constants. The framework developed here will be a powerful tool for deducing the contributions of particular molecular mechanisms to specific patterns of gene expression. PMID:24811315
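The starting point, computing the first four moments of a protein distribution, is straightforward (sample moments on simulated data; the paper's analytical inversion from moments to rate constants is not reproduced here):

```python
import numpy as np
from scipy import stats

def protein_moments(levels):
    """First four moments used to summarize a protein distribution:
    mean, variance, skewness, and (excess) kurtosis."""
    levels = np.asarray(levels, dtype=float)
    return (levels.mean(), levels.var(),
            stats.skew(levels), stats.kurtosis(levels))

# Hypothetical single-cell protein levels from a bursty promoter:
# a gamma distribution with shape = burst frequency / decay rate and
# scale = mean burst size is a common approximation, used here as a stand-in.
rng = np.random.default_rng(1)
samples = rng.gamma(shape=2.0, scale=50.0, size=100_000)
m, v, s, k = protein_moments(samples)
print(round(v / m, 1))  # Fano factor ~ scale = 50 under the gamma model
```

The paper's contribution is the inverse direction: given such moments, characterizing the (often degenerate) set of transcription, translation, and degradation rate constants consistent with them.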
Wang, Wei; Xia, Minxuan; Chen, Jie; Deng, Fenni; Yuan, Rui; Zhang, Xiaopei; Shen, Fafu
2016-12-01
The data presented in this paper is supporting the research article "Genome-Wide Analysis of Superoxide Dismutase Gene Family in Gossypium raimondii and G. arboreum" [1]. In this data article, we present a phylogenetic tree showing a dichotomy, with two different clusters of SODs, inferred by the Bayesian method of MrBayes (version 3.2.4), "Bayesian phylogenetic inference under mixed models" [2], Ramachandran plots of G. raimondii and G. arboreum SODs, the protein sequences used to generate 3D structures of the proteins and the template accessions via the SWISS-MODEL server, "SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information." [3] and motif sequences of SODs identified by InterProScan (version 4.8) with the Pfam database, "Pfam: the protein families database" [4].
RAIN: RNA–protein Association and Interaction Networks
Junge, Alexander; Refsgaard, Jan C.; Garde, Christian; Pan, Xiaoyong; Santos, Alberto; Alkan, Ferhat; Anthon, Christian; von Mering, Christian; Workman, Christopher T.; Jensen, Lars Juhl; Gorodkin, Jan
2017-01-01
Protein association networks can be inferred from a range of resources including experimental data, literature mining and computational predictions. These types of evidence are emerging for non-coding RNAs (ncRNAs) as well. However, integration of ncRNAs into protein association networks is challenging due to data heterogeneity. Here, we present a database of ncRNA–RNA and ncRNA–protein interactions and its integration with the STRING database of protein–protein interactions. These ncRNA associations cover four organisms and have been established from curated examples, experimental data, interaction predictions and automatic literature mining. RAIN uses an integrative scoring scheme to assign a confidence score to each interaction. We demonstrate that RAIN outperforms the underlying microRNA-target predictions in inferring ncRNA interactions. RAIN can be operated through an easily accessible web interface and all interaction data can be downloaded. Database URL: http://rth.dk/resources/rain PMID:28077569
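A plausible minimal sketch of integrative confidence scoring in the STRING tradition is a noisy-OR combination of evidence channels (the prior value and the exact rule here are assumptions, not necessarily RAIN's published scheme):

```python
def combined_score(channel_scores, prior=0.041):
    """Noisy-OR combination of evidence channels, STRING-style.

    channel_scores: per-channel confidences in [0, 1] (experiments,
    predictions, text mining, ...). Each is corrected for the assumed
    prior before combination, then the prior is added back once.
    """
    p_fail = 1.0
    for s in channel_scores:
        s_adj = max(0.0, (s - prior) / (1.0 - prior))
        p_fail *= 1.0 - s_adj
    return prior + (1.0 - prior) * (1.0 - p_fail)

# Two moderate, independent lines of evidence beat either one alone.
print(round(combined_score([0.6, 0.5]), 3))  # ~ 0.79, above both inputs
```

The key property is that independent weak channels reinforce each other, which is what lets curated, experimental, predicted, and text-mined evidence be merged into one confidence value.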
Bayesian Model Selection in Geophysics: The evidence
NASA Astrophysics Data System (ADS)
Vrugt, J. A.
2016-12-01
Bayesian inference has found widespread application and use in science and engineering to reconcile Earth system models with data, including prediction in space (interpolation), prediction in time (forecasting), assimilation of observations and deterministic/stochastic model output, and inference of the model parameters. Per Bayes' theorem, the posterior probability, P(H|D), of a hypothesis, H, given the data, D, is equivalent to the product of its prior probability, P(H), and likelihood, L(H|D), divided by a normalization constant, P(D). In geophysics, the hypothesis, H, often constitutes a description (parameterization) of the subsurface for some entity of interest (e.g. porosity, moisture content). The normalization constant, P(D), is not required for inference of the subsurface structure, yet is of great value for model selection. Unfortunately, it is not particularly easy to estimate P(D) in practice. Here, I will introduce the various building blocks of a general purpose method which provides robust and unbiased estimates of the evidence, P(D). This method uses multi-dimensional numerical integration of the posterior (parameter) distribution. I will then illustrate this new estimator by application to three competing subsurface models (hypotheses) using GPR travel time data from the South Oyster Bacterial Transport Site in Virginia, USA. The three subsurface models differ in their treatment of the porosity distribution and use (a) horizontal layering with fixed layer thicknesses, (b) vertical layering with fixed layer thicknesses and (c) a multi-Gaussian field. The results of the new estimator are compared against the brute force Monte Carlo method, and the Laplace-Metropolis method.
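In one dimension the evidence can be computed by brute-force quadrature, the simplest instance of the numerical integration the abstract describes (a toy conjugate-normal example; real subsurface models need the multi-dimensional machinery):

```python
import numpy as np
from scipy import stats

# Data: noisy observations of an unknown mean theta (hypothetical).
rng = np.random.default_rng(0)
data = rng.normal(loc=1.5, scale=0.5, size=20)

# Model H: prior theta ~ N(0, 1); likelihood y_i ~ N(theta, 0.5^2).
theta = np.linspace(-5.0, 5.0, 20_001)
log_prior = stats.norm.logpdf(theta, 0.0, 1.0)
log_like = stats.norm.logpdf(data[:, None], theta, 0.5).sum(axis=0)

# Evidence P(D) = integral over theta of likelihood * prior
# (rectangle rule; subtracting the max keeps the exponential stable).
log_post = log_prior + log_like
log_evidence = (np.log(np.exp(log_post - log_post.max()).sum()
                       * (theta[1] - theta[0])) + log_post.max())
print(log_evidence)  # compare log-evidences across models to select one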
Quantifying selection in evolving populations using time-resolved genetic data
NASA Astrophysics Data System (ADS)
Illingworth, Christopher J. R.; Mustonen, Ville
2013-01-01
Methods which uncover the molecular basis of the adaptive evolution of a population address some important biological questions. For example, the problem of identifying genetic variants which underlie drug resistance, a question of importance for the treatment of pathogens, and of cancer, can be understood as a matter of inferring selection. One difficulty in the inference of variants under positive selection is the potential complexity of the underlying evolutionary dynamics, which may involve an interplay between several contributing processes, including mutation, recombination and genetic drift. A source of progress may be found in modern sequencing technologies, which confer an increasing ability to gather information about evolving populations, granting a window into these complex processes. One particularly interesting development is the ability to follow evolution as it happens, by whole-genome sequencing of an evolving population at multiple time points. We here discuss how to use time-resolved sequence data to draw inferences about the evolutionary dynamics of a population under study. We begin by reviewing our earlier analysis of a yeast selection experiment, in which we used a deterministic evolutionary framework to identify alleles under selection for heat tolerance, and to quantify the selection acting upon them. Considering further the use of advanced intercross lines to measure selection, we here extend this framework to cover scenarios of simultaneous recombination and selection, and of two driver alleles with multiple linked neutral, or passenger, alleles, where the driver pair evolves under an epistatic fitness landscape. We conclude by discussing the limitations of the approach presented and outlining future challenges for such methodologies.
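In the deterministic framework, a selected allele's frequency follows the logistic trajectory x(t) = x0*exp(s*t) / (1 - x0 + x0*exp(s*t)); fitting s from time-resolved frequencies can be sketched by least squares (hypothetical data; the published analyses handle linkage and more complex scenarios):

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic_freq(t, s, x0):
    """Deterministic frequency of an allele with selection coefficient s."""
    g = x0 * np.exp(s * t)
    return g / (1.0 - x0 + g)

# Hypothetical allele-frequency time series from an evolving population.
t_obs = np.array([0, 50, 100, 150, 200, 250], dtype=float)
x_obs = np.array([0.02, 0.05, 0.13, 0.30, 0.52, 0.75])

(s_hat, x0_hat), _ = curve_fit(logistic_freq, t_obs, x_obs,
                               p0=[0.01, 0.01],
                               bounds=([0.0, 1e-6], [1.0, 0.5]))
print(round(s_hat, 3), round(x0_hat, 3))  # recovers s near 0.02
```

This is the simplest deterministic setting; the review's extensions add recombination, epistatic driver pairs, and linked passenger alleles on top of this trajectory.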
Deterministic chaotic dynamics of Raba River flow (Polish Carpathian Mountains)
NASA Astrophysics Data System (ADS)
Kędra, Mariola
2014-02-01
Is the underlying dynamics of river flow random or deterministic? If it is deterministic, is it deterministic chaotic? This issue is still controversial. The application of several independent methods, techniques and tools to daily river flow data gives consistent, reliable and clear-cut answers to the question. The outcomes indicate that the investigated discharge dynamics is not random but deterministic. Moreover, the results fully confirm the nonlinear deterministic chaotic nature of the studied process. The research was conducted on daily discharge records from two selected gauging stations of a mountain river in southern Poland, the Raba River.
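The abstract does not name the specific methods used; one standard diagnostic in such studies is the Grassberger-Procaccia correlation sum on a time-delay embedding. The sketch below illustrates that idea on a synthetic chaotic series and should be read as a generic example, not a reconstruction of this paper's analysis.

```python
# Sketch: correlation sum C(r) on a time-delay embedding, a standard
# diagnostic for low-dimensional determinism (Grassberger-Procaccia).
# Illustrative only; not the methods used in the paper.
import numpy as np
from scipy.spatial.distance import pdist

def embed(x, dim, tau):
    """Time-delay embedding of a scalar series."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])

def correlation_sum(points, r):
    """Fraction of pairs of embedded points closer than r."""
    return (pdist(points) < r).mean()

# Toy deterministic series (logistic map) standing in for discharge data.
x = np.empty(2000); x[0] = 0.4
for i in range(1999):
    x[i + 1] = 3.99 * x[i] * (1 - x[i])

pts = embed(x, dim=3, tau=1)
for r in (0.05, 0.1, 0.2):
    print(f"C(r={r}) = {correlation_sum(pts, r):.4f}")
# A roughly constant slope of log C(r) vs log r estimates the correlation
# dimension; saturation at low embedding dimension suggests determinism.
```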
Drichoutis, Andreas C.; Lusk, Jayson L.
2014-01-01
Despite the fact that conceptual models of individual decision making under risk are deterministic, attempts to econometrically estimate risk preferences require some assumption about the stochastic nature of choice. Unfortunately, the consequences of making different assumptions are, at present, unclear. In this paper, we compare three popular error specifications (Fechner, contextual utility, and Luce error) for three different preference functionals (expected utility, rank-dependent utility, and a mixture of those two) using in- and out-of-sample selection criteria. We find drastically different inferences about structural risk preferences across the competing functionals and error specifications. Expected utility theory is least affected by the selection of the error specification. A mixture model combining the two conceptual models assuming contextual utility provides the best fit of the data both in- and out-of-sample. PMID:25029467
Drichoutis, Andreas C; Lusk, Jayson L
2014-01-01
Despite the fact that conceptual models of individual decision making under risk are deterministic, attempts to econometrically estimate risk preferences require some assumption about the stochastic nature of choice. Unfortunately, the consequences of making different assumptions are, at present, unclear. In this paper, we compare three popular error specifications (Fechner, contextual utility, and Luce error) for three different preference functionals (expected utility, rank-dependent utility, and a mixture of those two) using in- and out-of-sample selection criteria. We find drastically different inferences about structural risk preferences across the competing functionals and error specifications. Expected utility theory is least affected by the selection of the error specification. A mixture model combining the two conceptual models assuming contextual utility provides the best fit of the data both in- and out-of-sample.
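To make the error specifications concrete, the following sketch implements the Fechner (additive noise) specification on top of an expected-utility model of a binary lottery choice. The lotteries, the CRRA utility and all parameter values are hypothetical illustrations, not the paper's data or estimator.

```python
# Sketch: Fechner (additive noise) error specification on top of expected
# utility for a binary lottery choice. Lotteries and parameters are
# hypothetical; a generic illustration, not the paper's estimator.
import numpy as np
from scipy.stats import norm

def crra_utility(x, r):
    """CRRA utility, u(x) = x^(1-r) / (1-r), for r != 1."""
    return x ** (1 - r) / (1 - r)

def expected_utility(prizes, probs, r):
    return sum(p * crra_utility(x, r) for x, p in zip(prizes, probs))

def p_choose_A(lottery_A, lottery_B, r, sigma):
    """Fechner: choose A with probability Phi((EU_A - EU_B) / sigma)."""
    eu_a = expected_utility(*lottery_A, r)
    eu_b = expected_utility(*lottery_B, r)
    return norm.cdf((eu_a - eu_b) / sigma)

A = ([40.0, 32.0], [0.5, 0.5])    # safer lottery
B = ([77.0, 2.0], [0.5, 0.5])     # riskier lottery
print(p_choose_A(A, B, r=0.5, sigma=0.1))
# A maximum-likelihood estimator would sum log choice probabilities over
# all observed choices and maximize over (r, sigma).
```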
USDA-ARS's Scientific Manuscript database
The role of PROTEIN ISOASPARTYL-METHYLTRANSFERASE (PIMT) in repairing a wide assortment of damaged proteins in a host of organisms has been inferred from the affinity of the enzyme for isoaspartyl residues in a plethora of amino acid contexts. The identification of specific PIMT target proteins in p...
Characterizing the topology of probabilistic biological networks.
Todor, Andrei; Dobra, Alin; Kahveci, Tamer
2013-01-01
Biological interactions are often uncertain events that may or may not take place with some probability. This uncertainty leads to a massive number of alternative interaction topologies for each such network. The existing studies analyze the degree distribution of biological networks by assuming that all the given interactions take place under all circumstances. This strong and often incorrect assumption can lead to misleading results. In this paper, we address this problem and develop a sound mathematical basis to characterize networks in the presence of uncertain interactions. Using our mathematical representation, we develop a method that can accurately describe the degree distribution of such networks. We also take one more step and extend our method to accurately compute the joint-degree distributions of node pairs connected by edges. The number of possible network topologies grows exponentially with the number of uncertain interactions. However, the mathematical model we develop allows us to compute these degree distributions in polynomial time in the number of interactions. Our method works quickly even for entire protein-protein interaction (PPI) networks. It also helps us find an adequate mathematical model using maximum likelihood estimation (MLE). We perform a comparative study of node-degree and joint-degree distributions in two types of biological networks: the classical deterministic networks and the more flexible probabilistic networks. Our results confirm that power-law and log-normal models best describe degree distributions for both probabilistic and deterministic networks. Moreover, the inverse correlation of degrees of neighboring nodes shows that, in probabilistic networks, nodes with a large number of interactions prefer to interact with those with a small number of interactions more frequently than expected. We also show that probabilistic networks are more robust for node-degree distribution computation than the deterministic ones. All the data sets used, the software implemented and the alignments found in this paper are available at http://bioinformatics.cise.ufl.edu/projects/probNet/.
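The polynomial-time claim can be made concrete for a single node: with independent edge probabilities, its degree follows a Poisson-binomial distribution, computable exactly by dynamic programming. The sketch below (with hypothetical edge confidences) illustrates this; it is not the paper's exact formulation, which also covers joint degrees.

```python
# Sketch: exact degree distribution of one node whose incident edges exist
# independently with given probabilities (a Poisson-binomial distribution),
# computed by dynamic programming in O(n^2). Illustrates why such
# distributions are computable in polynomial time despite the exponential
# number of network topologies.
import numpy as np

def degree_distribution(edge_probs):
    """P(degree = k) for independent edge existence probabilities."""
    dist = np.array([1.0])                    # start: degree 0 w.p. 1
    for p in edge_probs:
        new = np.zeros(len(dist) + 1)
        new[:-1] += dist * (1 - p)            # edge absent
        new[1:] += dist * p                   # edge present
        dist = new
    return dist

probs = [0.9, 0.7, 0.4, 0.2, 0.05]            # hypothetical edge confidences
for k, pk in enumerate(degree_distribution(probs)):
    print(f"P(degree = {k}) = {pk:.4f}")
print("expected degree:", sum(probs))
```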
A white-box model of S-shaped and double S-shaped single-species population growth
Kalmykov, Lev V.
2015-01-01
Complex systems may be mechanistically modelled by white-box modeling using logical deterministic individual-based cellular automata. Mathematical models of complex systems are of three types: black-box (phenomenological), white-box (mechanistic, based on first principles) and grey-box (mixtures of phenomenological and mechanistic models). Most basic ecological models are of the black-box type, including the Malthusian, Verhulst and Lotka–Volterra models. In black-box models, the individual-based (mechanistic) mechanisms of population dynamics remain hidden. Here we mechanistically model the S-shaped and double S-shaped population growth of vegetatively propagated rhizomatous lawn grasses. Using purely logical deterministic individual-based cellular automata we create a white-box model. From a general physical standpoint, the vegetative propagation of plants is an analogue of excitation propagation in excitable media. Using the Monte Carlo method, we investigate the role of the initial position of an individual in the habitat. We have investigated mechanisms of single-species population growth limited by habitat size, intraspecific competition, regeneration time and fecundity of individuals, under two types of boundary conditions and two levels of fecundity. In addition, we have compared the S-shaped and J-shaped population growth. We consider this white-box modeling approach a method of artificial intelligence which works as automatic hyper-logical inference from the first principles of the studied subject. This approach is promising for direct mechanistic insight into the nature of any complex system. PMID:26038717
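A minimal sketch of such a logical, deterministic, individual-based cellular automaton is given below: occupied cells colonize empty von Neumann neighbours once mature, producing S-shaped growth limited by habitat size. Grid size, regeneration time and seeding are hypothetical, and the published model's exact rules may differ.

```python
# Sketch: a minimal deterministic individual-based cellular automaton in
# the spirit described. Occupied cells vegetatively colonize empty von
# Neumann neighbours once per "regeneration" period, so total occupancy
# grows in an S-shaped curve limited by habitat size. All settings are
# hypothetical.
import numpy as np

SIZE, STEPS, REGEN = 25, 60, 2      # habitat edge, time steps, regeneration time

grid = np.zeros((SIZE, SIZE), dtype=int)    # 0 = empty, >0 = occupant age
grid[SIZE // 2, SIZE // 2] = REGEN          # one mature individual in the centre

counts = []
for t in range(STEPS):
    mature = grid >= REGEN
    # von Neumann neighbourhood, closed (non-wrapping) boundary
    nbr = np.zeros_like(mature)
    nbr[1:, :] |= mature[:-1, :]
    nbr[:-1, :] |= mature[1:, :]
    nbr[:, 1:] |= mature[:, :-1]
    nbr[:, :-1] |= mature[:, 1:]
    newborn = (grid == 0) & nbr             # colonize empty neighbouring cells
    grid[grid > 0] += 1                     # occupants age deterministically
    grid[newborn] = 1
    counts.append(int((grid > 0).sum()))

print(counts)   # rises slowly, accelerates, then saturates at SIZE*SIZE
```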
NASA Astrophysics Data System (ADS)
Fischer, P.; Jardani, A.; Wang, X.; Jourde, H.; Lecoq, N.
2017-12-01
The distributed modeling of flow paths within karstic and fractured fields remains a complex task because of the high dependence of the hydraulic responses on the relative locations between observational boreholes and the interconnected fractures and karstic conduits that control the main flow of the hydrosystem. The inverse problem in a distributed model is one alternative approach to interpreting hydraulic test data by mapping the karstic networks and fractured areas. In this work, we developed a Bayesian inversion approach, the Cellular Automata-based Deterministic Inversion (CADI) algorithm, to infer the spatial distribution of hydraulic properties in a structurally constrained model. This method distributes hydraulic properties along linear structures (i.e., flow conduits) and iteratively modifies the structural geometry of this conduit network to progressively match the observed hydraulic data with the modeled ones. As a result, this method produces a conductivity model that is composed of a discrete conduit network embedded in the background matrix, capable of producing the same flow behavior as the investigated hydrologic system. The method is applied to invert a set of multiborehole hydraulic tests collected from a hydraulic tomography experiment conducted at the Terrieu field site in the Lez aquifer, Southern France. The emergent model shows high consistency with field observations of hydraulic connections between boreholes. Furthermore, it provides a geologically realistic pattern of flow conduits. This method is therefore of considerable value toward an enhanced distributed modeling of fractured and karstified aquifers.
Towards Inferring Protein Interactions: Challenges and Solutions
NASA Astrophysics Data System (ADS)
Zhang, Ya; Zha, Hongyuan; Chu, Chao-Hsien; Ji, Xiang
2006-12-01
Discovering interacting proteins has been an essential part of functional genomics. However, existing experimental techniques only uncover a small portion of any interactome. Furthermore, these data often have a very high false-positive rate. By conceptualizing the interactions at the domain level, we provide a more abstract representation of the interactome, which also facilitates the discovery of unobserved protein-protein interactions. Although several domain-based approaches have been proposed to predict protein-protein interactions, they usually assume that domain interactions are independent of each other for the convenience of computational modeling. A new framework to predict protein interactions is proposed in this paper, where no such assumption is made about domain interactions. Protein interactions may be the result of multiple domain interactions which are dependent on each other. A conjunctive normal form representation is used to capture the relationships between protein interactions and domain interactions. The problem of interaction inference is then modeled as a constraint satisfiability problem and solved via linear programming. Experimental results on a combined yeast data set have demonstrated the robustness and the accuracy of the proposed algorithm. Moreover, we also map some predicted interacting domains to three-dimensional structures of protein complexes to show the validity of our predictions.
Prophetic Granger Causality to infer gene regulatory networks.
Carlin, Daniel E; Paull, Evan O; Graim, Kiley; Wong, Christopher K; Bivol, Adrian; Ryabinin, Peter; Ellrott, Kyle; Sokolov, Artem; Stuart, Joshua M
2017-01-01
We introduce a novel method called Prophetic Granger Causality (PGC) for inferring gene regulatory networks (GRNs) from protein-level time series data. The method uses an L1-penalized regression adaptation of Granger Causality to model protein levels as a function of time, stimuli, and other perturbations. When combined with a data-independent network prior, the framework outperformed all other methods submitted to the HPN-DREAM 8 breast cancer network inference challenge. Our investigations reveal that PGC provides complementary information to other approaches, raising the performance of ensemble learners, while on its own achieves moderate performance. Thus, PGC serves as a valuable new tool in the bioinformatics toolkit for analyzing temporal datasets. We investigate the general and cell-specific interactions predicted by our method and find several novel interactions, demonstrating the utility of the approach in charting new tumor wiring.
Prophetic Granger Causality to infer gene regulatory networks
Carlin, Daniel E.; Paull, Evan O.; Graim, Kiley; Wong, Christopher K.; Bivol, Adrian; Ryabinin, Peter; Ellrott, Kyle; Sokolov, Artem
2017-01-01
We introduce a novel method called Prophetic Granger Causality (PGC) for inferring gene regulatory networks (GRNs) from protein-level time series data. The method uses an L1-penalized regression adaptation of Granger Causality to model protein levels as a function of time, stimuli, and other perturbations. When combined with a data-independent network prior, the framework outperformed all other methods submitted to the HPN-DREAM 8 breast cancer network inference challenge. Our investigations reveal that PGC provides complementary information to other approaches, raising the performance of ensemble learners, while on its own achieves moderate performance. Thus, PGC serves as a valuable new tool in the bioinformatics toolkit for analyzing temporal datasets. We investigate the general and cell-specific interactions predicted by our method and find several novel interactions, demonstrating the utility of the approach in charting new tumor wiring. PMID:29211761
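A minimal sketch of the L1-penalized, Granger-style step at the core of such methods follows: each protein's level at time t is regressed on all levels at t-1, and nonzero coefficients are read as candidate directed edges. The synthetic data and penalty setting are hypothetical, and the sketch omits PGC's stimulus terms and data-independent network prior.

```python
# Sketch: a minimal L1-penalised, Granger-style network inference step.
# Each protein's level at time t is regressed on all protein levels at
# t-1; nonzero coefficients are candidate directed edges. Synthetic data;
# PGC's stimulus/perturbation terms and network prior are omitted.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
T, P = 200, 6                                # time points, proteins
X = np.zeros((T, P))
X[0] = rng.normal(size=P)
for t in range(1, T):
    X[t] = rng.normal(size=P)                # observation noise
    X[t, 0] += 0.5 * X[t - 1, 0]             # autoregressive terms
    X[t, 2] += 0.5 * X[t - 1, 2]
    X[t, 1] += 0.9 * X[t - 1, 0]             # true cross edges: 0 -> 1, 2 -> 3
    X[t, 3] += 0.8 * X[t - 1, 2]

edges = []
for j in range(P):                           # one lasso regression per target
    model = Lasso(alpha=0.1).fit(X[:-1], X[1:, j])
    for i in np.flatnonzero(np.abs(model.coef_) > 0.1):
        edges.append((i, j, round(float(model.coef_[i]), 2)))

print(edges)   # expect the self-loops 0->0, 2->2 and cross edges 0->1, 2->3
```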
Erdem, Cemal; Nagle, Alison M.; Casa, Angelo J.; Litzenburger, Beate C.; Wang, Yu-fen; Taylor, D. Lansing; Lee, Adrian V.; Lezon, Timothy R.
2016-01-01
Insulin and insulin-like growth factor I (IGF1) influence cancer risk and progression through poorly understood mechanisms. To better understand the roles of insulin and IGF1 signaling in breast cancer, we combined proteomic screening with computational network inference to uncover differences in IGF1 and insulin induced signaling. Using reverse phase protein array, we measured the levels of 134 proteins in 21 breast cancer cell lines stimulated with IGF1 or insulin for up to 48 h. We then constructed directed protein expression networks using three separate methods: (i) lasso regression, (ii) conventional matrix inversion, and (iii) entropy maximization. These networks, named here as the time translation models, were analyzed and the inferred interactions were ranked by differential magnitude to identify pathway differences. The two top candidates, chosen for experimental validation, were shown to regulate IGF1/insulin induced phosphorylation events. First, acetyl-CoA carboxylase (ACC) knock-down was shown to increase the level of mitogen-activated protein kinase (MAPK) phosphorylation. Second, stable knock-down of E-Cadherin increased the phospho-Akt protein levels. Both of the knock-down perturbations incurred phosphorylation responses stronger in IGF1 stimulated cells compared with insulin. Overall, the time-translation modeling coupled to wet-lab experiments has proven to be powerful in inferring differential interactions downstream of IGF1 and insulin signaling, in vitro. PMID:27364358
Protein-based forensic identification using genetically variant peptides in human bone.
Mason, Katelyn Elizabeth; Anex, Deon; Grey, Todd; Hart, Bradley; Parker, Glendon
2018-04-22
Bone tissue contains organic material that is useful for forensic investigations and may contain preserved endogenous protein that can persist in the environment for extended periods of time over a range of conditions. Single amino acid polymorphisms in these proteins reflect genetic information since they result from non-synonymous single nucleotide polymorphisms (SNPs) in DNA. Detection of genetically variant peptides (GVPs) - those peptides that contain amino acid polymorphisms - in digests of bone proteins allows the corresponding SNP alleles to be inferred. The resulting genetic profiles can be used to calculate statistical measures of association between a bone sample and an individual. In this study, proteomic analysis of rib cortical bone samples from 10 recently deceased individuals demonstrates this concept. A straightforward acidic demineralization protocol yielded proteins that were digested with trypsin. Tryptic digests were analyzed by liquid chromatography mass spectrometry. A total of 1736 different proteins were identified across all resulting datasets. On average, individual samples contained 454 ± 121 (mean ± σ) proteins. Thirty-five genetically variant peptides were identified from 15 observed proteins. Overall, 134 SNP inferences were made based on proteomically detected GVPs, which were confirmed by sequencing of subject DNA. Inferred individual SNP genetic profiles ranged in random match probability (RMP) from 1/6 to 1/42,472 when calculated with European population frequencies from the 1000 Genomes Project, Phase 3. Similarly, RMPs based on African population frequencies were calculated for each SNP genetic profile, and likelihood ratios (LR) were obtained by dividing each European RMP by the corresponding African RMP. The resulting LR values ranged from 1.4 to 825 with a median value of 16. GVP markers offer a basis for the identification of compromised skeletal remains independent of the presence of DNA template. Published by Elsevier B.V.
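The profile arithmetic can be sketched as follows: genotype frequencies from Hardy-Weinberg proportions are multiplied across independent loci to give an RMP, and the LR is the ratio of RMPs in two reference populations. The SNP identifiers and allele frequencies below are hypothetical, not 1000 Genomes values.

```python
# Sketch: assembling a random match probability (RMP) from inferred SNP
# genotypes under independence, and a likelihood ratio from two reference
# populations, as the abstract describes. All frequencies are hypothetical.
def genotype_freq(p, genotype):
    """Hardy-Weinberg genotype frequency for major-allele frequency p."""
    q = 1 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}[genotype]

profile = [("rs_x1", "Aa"), ("rs_x2", "aa"), ("rs_x3", "AA")]   # inferred GVPs
freq_eur = {"rs_x1": 0.30, "rs_x2": 0.60, "rs_x3": 0.10}        # hypothetical
freq_afr = {"rs_x1": 0.15, "rs_x2": 0.80, "rs_x3": 0.25}

rmp_eur = rmp_afr = 1.0
for snp, gt in profile:          # independence across loci (product rule)
    rmp_eur *= genotype_freq(freq_eur[snp], gt)
    rmp_afr *= genotype_freq(freq_afr[snp], gt)

print(f"RMP (EUR) = 1/{1 / rmp_eur:,.0f}")
print(f"RMP (AFR) = 1/{1 / rmp_afr:,.0f}")
print(f"LR (EUR/AFR) = {rmp_eur / rmp_afr:.2f}")
```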
Nonparametric Inference of Doubly Stochastic Poisson Process Data via the Kernel Method
Zhang, Tingting; Kou, S. C.
2010-01-01
Doubly stochastic Poisson processes, also known as the Cox processes, frequently occur in various scientific fields. In this article, motivated primarily by analyzing Cox process data in biophysics, we propose a nonparametric kernel-based inference method. We conduct a detailed study, including an asymptotic analysis, of the proposed method, and provide guidelines for its practical use, introducing a fast and stable regression method for bandwidth selection. We apply our method to real photon arrival data from recent single-molecule biophysical experiments, investigating proteins' conformational dynamics. Our result shows that conformational fluctuation is widely present in protein systems, and that the fluctuation covers a broad range of time scales, highlighting the dynamic and complex nature of proteins' structure. PMID:21258615
Nonparametric Inference of Doubly Stochastic Poisson Process Data via the Kernel Method.
Zhang, Tingting; Kou, S C
2010-01-01
Doubly stochastic Poisson processes, also known as the Cox processes, frequently occur in various scientific fields. In this article, motivated primarily by analyzing Cox process data in biophysics, we propose a nonparametric kernel-based inference method. We conduct a detailed study, including an asymptotic analysis, of the proposed method, and provide guidelines for its practical use, introducing a fast and stable regression method for bandwidth selection. We apply our method to real photon arrival data from recent single-molecule biophysical experiments, investigating proteins' conformational dynamics. Our result shows that conformational fluctuation is widely present in protein systems, and that the fluctuation covers a broad range of time scales, highlighting the dynamic and complex nature of proteins' structure.
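The basic object in such nonparametric inference is a kernel estimate of the time-varying intensity from the arrival times. The sketch below simulates a doubly stochastic (sinusoidally modulated) Poisson process by thinning and recovers the intensity with a Gaussian kernel; the bandwidth is fixed by hand here, whereas the paper's contribution includes a regression method for selecting it.

```python
# Sketch: kernel estimate of the intensity lambda(t) of a point process
# from arrival times, the basic object in nonparametric Cox-process
# inference. Bandwidth fixed by hand; toy data, not biophysical photons.
import numpy as np

rng = np.random.default_rng(3)

# Simulate an inhomogeneous Poisson process by thinning:
# intensity lambda(t) = 25 * (1 + sin t) on [0, 10].
lam = lambda t: 25.0 * (1 + np.sin(t))
T, lam_max = 10.0, 50.0
cand = rng.uniform(0, T, rng.poisson(lam_max * T))     # homogeneous candidates
arrivals = np.sort(cand[rng.uniform(0, lam_max, cand.size) < lam(cand)])

def intensity_hat(t, events, h):
    """Gaussian-kernel intensity estimate at times t."""
    z = (t[:, None] - events[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (h * np.sqrt(2 * np.pi))

grid = np.linspace(0, T, 201)
est = intensity_hat(grid, arrivals, h=0.5)
for t0 in (1.6, 4.7, 7.9):       # near peaks/troughs of the true intensity
    i = np.argmin(np.abs(grid - t0))
    print(f"t={t0}: true {lam(t0):6.1f}, estimated {est[i]:6.1f}")
```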
Evol and ProDy for bridging protein sequence evolution and structural dynamics.
Bakan, Ahmet; Dutta, Anindita; Mao, Wenzhi; Liu, Ying; Chennubhotla, Chakra; Lezon, Timothy R; Bahar, Ivet
2014-09-15
Correlations between sequence evolution and structural dynamics are of utmost importance in understanding the molecular mechanisms of function and their evolution. We have integrated Evol, a new package for fast and efficient comparative analysis of evolutionary patterns and conformational dynamics, into ProDy, a computational toolbox designed for inferring protein dynamics from experimental and theoretical data. Using information-theoretic approaches, Evol coanalyzes conservation and coevolution profiles extracted from multiple sequence alignments of protein families with their inferred dynamics. ProDy and Evol are open-source and freely available under MIT License from http://prody.csb.pitt.edu/. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Johnston, Iain G; Williams, Ben P
2016-02-24
Since their endosymbiotic origin, mitochondria have lost most of their genes. Although many selective mechanisms underlying the evolution of mitochondrial genomes have been proposed, a data-driven exploration of these hypotheses is lacking, and a quantitatively supported consensus remains absent. We developed HyperTraPS, a methodology coupling stochastic modeling with Bayesian inference, to identify the ordering of evolutionary events and suggest their causes. Using 2015 complete mitochondrial genomes, we inferred evolutionary trajectories of mtDNA gene loss across the eukaryotic tree of life. We find that proteins comprising the structural cores of the electron transport chain are preferentially encoded within mitochondrial genomes across eukaryotes. A combination of high GC content and high protein hydrophobicity is required to explain patterns of mtDNA gene retention; a model that accounts for these selective pressures can also predict the success of artificial gene transfer experiments in vivo. This work provides a general method for data-driven inference of the ordering of evolutionary and progressive events, here identifying the distinct features shaping mitochondrial genomes of present-day species. Copyright © 2016 Elsevier Inc. All rights reserved.
Arenas, Ailan F; Salcedo, Gladys E; Gomez-Marin, Jorge E
2017-01-01
Pathogen-host protein-protein interaction systems examine the interactions between the protein repertoires of 2 distinct organisms. Some of these pathogen proteins interact with the host protein system and may manipulate it for their own advantage. In this work, we designed an R script by concatenating 2 functions, called rowDM and rowCVmed, to infer pathogen-host interaction using previously reported microarray data, including host gene enrichment analysis and the crossing of interspecific domain-domain interactions. We applied this script to the Toxoplasma-host system to describe pathogen survival mechanisms from human, mouse, and Toxoplasma Gene Expression Omnibus series. Our outcomes exhibited similar results to previously reported microarray analyses, but we found other important proteins that could contribute to Toxoplasma pathogenesis. We observed that Toxoplasma ROP38 is the most differentially expressed protein among Toxoplasma strains. Enrichment analysis and KEGG mapping indicated that the human retinal genes most affected by Toxoplasma infection are those related to antiapoptotic mechanisms. We suggest that the proteins PIK3R1, PRKCA, PRKCG, PRKCB, HRAS, and c-JUN could be substrates for the differentially expressed Toxoplasma kinase ROP38. Likewise, we propose that Toxoplasma causes overexpression of apoptosis-suppressing human genes. PMID:29317802
Statistical inference of protein structural alignments using information and compression.
Collier, James H; Allison, Lloyd; Lesk, Arthur M; Stuckey, Peter J; Garcia de la Banda, Maria; Konagurthu, Arun S
2017-04-01
Structural molecular biology depends crucially on computational techniques that compare protein three-dimensional structures and generate structural alignments (the assignment of one-to-one correspondences between subsets of amino acids based on atomic coordinates). Despite its importance, the structural alignment problem has not been formulated, much less solved, in a consistent and reliable way. To overcome these difficulties, we present here a statistical framework for the precise inference of structural alignments, built on the Bayesian and information-theoretic principle of Minimum Message Length (MML). The quality of any alignment is measured by its explanatory power: the amount of lossless compression achieved to explain the protein coordinates using that alignment. We have implemented this approach in MMLigner, the first program able to infer statistically significant structural alignments. We also demonstrate the reliability of MMLigner's alignment results when compared with the state of the art. Importantly, MMLigner can also discover different structural alignments of comparable quality, a challenging problem for oligomers and protein complexes. Source code, binaries and an interactive web version are available at http://lcb.infotech.monash.edu.au/mmligner. Contact: arun.konagurthu@monash.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Yang, Jianhua; Osman, Kim; Iqbal, Mudassar; Stekel, Dov J.; Luo, Zewei; Armstrong, Susan J.; Franklin, F. Chris H.
2013-01-01
Following successful completion of the Brassica rapa sequencing project, the next step is to investigate functions of individual genes/proteins. For Arabidopsis thaliana, large amounts of protein–protein interaction (PPI) data are available from the major PPI databases (DBs). It is known that Brassica crop species are closely related to A. thaliana. This provides an opportunity to infer the B. rapa interactome using PPI data available from A. thaliana. In this paper, we present an inferred B. rapa interactome that is based on the A. thaliana PPI data from two resources: (i) A. thaliana PPI data from three major DBs, BioGRID, IntAct, and TAIR. (ii) ortholog-based A. thaliana PPI predictions. Linking between B. rapa and A. thaliana was accomplished in three complementary ways: (i) ortholog predictions, (ii) identification of gene duplication based on synteny and collinearity, and (iii) BLAST sequence similarity search. A complementary approach was also applied, which used known/predicted domain–domain interaction data. Specifically, since the two species are closely related, we used PPI data from A. thaliana to predict interacting domains that might be conserved between the two species. The predicted interactome was investigated for the component that contains known A. thaliana meiotic proteins to demonstrate its usability. PMID:23293649
A standardized framing for reporting protein identifications in mzIdentML 1.2
Seymour, Sean L.; Farrah, Terry; Binz, Pierre-Alain; Chalkley, Robert J.; Cottrell, John S.; Searle, Brian C.; Tabb, David L.; Vizcaíno, Juan Antonio; Prieto, Gorka; Uszkoreit, Julian; Eisenacher, Martin; Martínez-Bartolomé, Salvador; Ghali, Fawaz; Jones, Andrew R.
2015-01-01
Inferring which protein species have been detected in bottom-up proteomics experiments has been a challenging problem for which solutions have been maturing over the past decade. While many inference approaches now function well in isolation, comparing and reconciling the results generated across different tools remains difficult. It presently stands as one of the greatest barriers in collaborative efforts such as the Human Proteome Project and public repositories like the PRoteomics IDEntifications (PRIDE) database. Here we present a framework for reporting protein identifications that seeks to improve capabilities for comparing results generated by different inference tools. This framework standardizes the terminology for describing protein identification results, associated with the HUPO-Proteomics Standards Initiative (PSI) mzIdentML standard, while still allowing for differing methodologies to reach that final state. It is proposed that developers of software for reporting identification results will adopt this terminology in their outputs. While the new terminology does not require any changes to the core mzIdentML model, it represents a significant change in practice, and, as such, the rules will be released via a new version of the mzIdentML specification (version 1.2) so that consumers of files are able to determine whether the new guidelines have been adopted by export software. PMID:25092112
Complete fold annotation of the human proteome using a novel structural feature space.
Middleton, Sarah A; Illuminati, Joseph; Kim, Junhyong
2017-04-13
Recognition of protein structural fold is the starting point for many structure prediction tools and protein function inference. Fold prediction is computationally demanding and recognizing novel folds is difficult such that the majority of proteins have not been annotated for fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and inference of unknown novel folds. We show that our method achieves better than 94% accuracy even when many folds have only one training example. We demonstrate the utility of this method by predicting the folds of 34,330 human protein domains and showing that these predictions can yield useful insights into potential biological function, such as prediction of RNA-binding ability. Our method can be applied to de novo fold prediction of entire proteomes and identify candidate novel fold families.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Middleton, Sarah A.; Illuminati, Joseph; Kim, Junhyong
Recognition of protein structural fold is the starting point for many structure prediction tools and protein function inference. Fold prediction is computationally demanding and recognizing novel folds is difficult such that the majority of proteins have not been annotated for fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and inference of unknown novel folds. We show that our method achieves better than 94% accuracy even when many folds have only one training example. We demonstrate the utility of this method by predicting the folds of 34,330 human protein domains and showing that these predictions can yield useful insights into potential biological function, such as prediction of RNA-binding ability. Finally, our method can be applied to de novo fold prediction of entire proteomes and identify candidate novel fold families.
Complete fold annotation of the human proteome using a novel structural feature space
Middleton, Sarah A.; Illuminati, Joseph; Kim, Junhyong
2017-01-01
Recognition of protein structural fold is the starting point for many structure prediction tools and protein function inference. Fold prediction is computationally demanding and recognizing novel folds is difficult such that the majority of proteins have not been annotated for fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and inference of unknown novel folds. We show that our method achieves better than 94% accuracy even when many folds have only one training example. We demonstrate the utility of this method by predicting the folds of 34,330 human protein domains and showing that these predictions can yield useful insights into potential biological function, such as prediction of RNA-binding ability. Our method can be applied to de novo fold prediction of entire proteomes and identify candidate novel fold families. PMID:28406174
Oscillatory regulation of Hes1: Discrete stochastic delay modelling and simulation.
Barrio, Manuel; Burrage, Kevin; Leier, André; Tian, Tianhai
2006-09-08
Discrete stochastic simulations are a powerful tool for understanding the dynamics of chemical kinetics when there are small-to-moderate numbers of certain molecular species. In this paper we introduce delays into the stochastic simulation algorithm, thus mimicking delays associated with transcription and translation. We then show that this process may explain the observed sustained oscillations in expression levels of hes1 mRNA and Hes1 protein more faithfully than continuous deterministic models.
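A minimal version of a delayed stochastic simulation in this spirit is sketched below: a Hes1-like negative-feedback loop in which transcription initiated at time t delivers mRNA at t + tau, handled with a queue of pending events. All rate constants are hypothetical, and the scheme is a simplified variant of delay-SSA ideas rather than the paper's exact algorithm.

```python
# Sketch: Gillespie-style simulation with a fixed transcriptional delay
# for a Hes1-like negative-feedback loop. When transcription fires at t,
# the mRNA appears only at t + tau (a queue of pending events). All rate
# constants are hypothetical; a simplified variant, not the paper's algorithm.
import heapq
import numpy as np

rng = np.random.default_rng(7)

k_m, k_p, d_m, d_p = 1.0, 2.0, 0.03, 0.03   # hypothetical rates (per minute)
P0, h, tau = 100.0, 4.0, 20.0               # repression threshold, Hill coeff, delay

t, M, P = 0.0, 0, 0
pending = []                                 # scheduled delayed mRNA appearances
history = []

while t < 500.0:
    a = np.array([k_m / (1 + (P / P0) ** h),  # delayed transcription
                  k_p * M,                    # translation
                  d_m * M,                    # mRNA decay
                  d_p * P])                   # protein decay
    a0 = a.sum()
    dt = rng.exponential(1 / a0)

    if pending and pending[0] <= t + dt:
        # A delayed product appears first: jump there, apply it, redraw
        # (valid because exponential waiting times are memoryless).
        t = heapq.heappop(pending)
        M += 1
    else:
        t += dt
        r = rng.uniform(0, a0)
        if r < a[0]:
            heapq.heappush(pending, t + tau)  # mRNA appears after delay tau
        elif r < a[:2].sum():
            P += 1
        elif r < a[:3].sum():
            M -= 1
        else:
            P -= 1
    history.append((t, M, P))

print(history[-1])   # with sufficient delay, M and P oscillate out of phase
```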
Timescales and bottlenecks in miRNA-dependent gene regulation.
Hausser, Jean; Syed, Afzal Pasha; Selevsek, Nathalie; van Nimwegen, Erik; Jaskiewicz, Lukasz; Aebersold, Ruedi; Zavolan, Mihaela
2013-12-03
MiRNAs are post-transcriptional regulators that contribute to the establishment and maintenance of gene expression patterns. Although their biogenesis and decay appear to be under complex control, the implications of miRNA expression dynamics for the processes that they regulate are not well understood. We derived a mathematical model of miRNA-mediated gene regulation, inferred its parameters from experimental data sets, and found that the model describes well time-dependent changes in mRNA, protein and ribosome density levels measured upon miRNA transfection and induction. The inferred parameters indicate that the timescale of miRNA-dependent regulation is slower than initially thought. Delays in miRNA loading into Argonaute proteins and the slow decay of proteins relative to mRNAs can explain the typically small changes in protein levels observed upon miRNA transfection. For miRNAs to regulate protein expression on the timescale of a day, as miRNAs involved in cell-cycle regulation do, accelerated miRNA turnover is necessary.
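A generic ODE caricature of miRNA-mediated repression illustrates the timescale argument: mRNA responds within hours of miRNA loading, while protein relaxes only on the slow 1/d_p timescale, muting protein-level changes. The equations and parameters below are hypothetical, not the model or values inferred in the study.

```python
# Sketch: a generic ODE model of miRNA-mediated repression, showing why
# slow protein decay mutes protein-level changes after miRNA transfection.
# Hypothetical parameters, not the study's inferred values.
import numpy as np
from scipy.integrate import solve_ivp

s_m, d_m = 1.0, 0.5      # mRNA synthesis and decay (per hour)
s_p, d_p = 10.0, 0.05    # protein synthesis and slow decay (per hour)
k_rep = 1.5              # extra mRNA decay per unit of loaded miRNA
k_load = 0.2             # loading rate of transfected miRNA into Argonaute

def rhs(t, y):
    m, p, a = y                          # mRNA, protein, loaded miRNA
    dm = s_m - (d_m + k_rep * a) * m
    dp = s_p * m - d_p * p
    da = k_load * (1.0 - a)              # loading approaches saturation
    return [dm, dp, da]

y0 = [s_m / d_m, s_m / d_m * s_p / d_p, 0.0]   # pre-transfection steady state
sol = solve_ivp(rhs, (0, 72), y0, t_eval=[0, 6, 24, 48, 72])
for t, m, p in zip(sol.t, sol.y[0], sol.y[1]):
    print(f"t={t:4.0f} h  mRNA={m:5.2f}  protein={p:7.1f}")
# mRNA drops within hours; protein follows only on the ~1/d_p timescale.
```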
Song, Min; Yu, Hwanjo; Han, Wook-Shin
2011-11-24
Protein-protein interaction (PPI) extraction has been a focal point of biomedical research and database curation tools. Both active learning (AL) and semi-supervised support vector machines (SVMs) have recently been applied to extract PPIs automatically. In this paper, we explore combining AL with semi-supervised learning (SSL) to improve the performance of the PPI task. We propose a novel PPI extraction technique called PPISpotter, which combines deterministic annealing-based SSL with an AL technique to extract protein-protein interactions. In addition, we extract a comprehensive set of features from MEDLINE records by natural language processing (NLP) techniques, which further improve the SVM classifiers. In our feature selection technique, syntactic, semantic, and lexical properties of text are incorporated into feature selection, which boosts the system performance significantly. By conducting experiments with three different PPI corpora, we show that PPISpotter is superior to the other techniques incorporated into semi-supervised SVMs, such as random sampling, clustering, and transductive SVMs, in precision, recall, and F-measure. Our system is a novel, state-of-the-art technique for efficiently extracting protein-protein interaction pairs.
Narimani, Zahra; Beigy, Hamid; Ahmad, Ashar; Masoudi-Nejad, Ali; Fröhlich, Holger
2017-01-01
Inferring the structure of molecular networks from time series protein or gene expression data provides valuable information about the complex biological processes of the cell. Causal network structure inference has been approached using different methods in the past. Most causal network inference techniques, such as Dynamic Bayesian Networks and ordinary differential equations, are limited by their computational complexity, which makes large-scale inference infeasible. This is especially true if a Bayesian framework is applied in order to deal with the unavoidable uncertainty about the correct model. We devise a novel Bayesian network reverse engineering approach using ordinary differential equations with the ability to include non-linearity. Besides modeling arbitrary, possibly combinatorial and time-dependent perturbations with unknown targets, one of our main contributions is the use of Expectation Propagation, an algorithm for approximate Bayesian inference over large-scale network structures in short computation time. We further explore the possibility of integrating prior knowledge into network inference. We evaluate the proposed model on DREAM4 and DREAM8 data and find it competitive against several state-of-the-art existing network inference methods.
NASA Astrophysics Data System (ADS)
Lerner, Michael G.; Meagher, Kristin L.; Carlson, Heather A.
2008-10-01
Use of solvent mapping, based on multiple-copy minimization (MCM) techniques, is common in structure-based drug discovery. The minima of small-molecule probes define locations for complementary interactions within a binding pocket. Here, we present improved methods for MCM. In particular, a Jarvis-Patrick (JP) method is outlined for grouping the final locations of minimized probes into physical clusters. This algorithm has been tested through a study of protein-protein interfaces, showing the process to be robust, deterministic, and fast in the mapping of protein "hot spots." Improvements in the initial placement of probe molecules are also described. A final application to HIV-1 protease shows how our automated technique can be used to partition data too complicated to analyze by hand. These new automated methods may be easily and quickly extended to other protein systems, and our clustering methodology may be readily incorporated into other clustering packages.
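For concreteness, a minimal Jarvis-Patrick clustering is sketched below: two probe minima are linked if each lies in the other's K nearest neighbours and they share at least J of those neighbours, with clusters taken as connected components. K and J are hypothetical settings, not the paper's tuned values.

```python
# Sketch: minimal Jarvis-Patrick clustering of probe-minima positions.
# Link two points if each is in the other's K nearest neighbours and they
# share >= J of those neighbours; clusters = connected components.
import numpy as np
from scipy.spatial import cKDTree

def jarvis_patrick(points, K=6, J=3):
    tree = cKDTree(points)
    _, idx = tree.query(points, k=K + 1)       # column 0 is the point itself
    nbrs = [set(row[1:]) for row in idx]

    parent = list(range(len(points)))          # union-find over JP links
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(len(points)):
        for j in nbrs[i]:
            if i in nbrs[j] and len(nbrs[i] & nbrs[j]) >= J:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(points))]

rng = np.random.default_rng(5)
pts = np.vstack([rng.normal(c, 0.3, (20, 3)) for c in (0.0, 3.0, 6.0)])
labels = jarvis_patrick(pts)
print("clusters found:", len(set(labels)))     # expect 3 tight groups
```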
Structural Deterministic Safety Factors Selection Criteria and Verification
NASA Technical Reports Server (NTRS)
Verderaime, V.
1992-01-01
Though current deterministic safety factors are arbitrarily and unaccountably specified, their ratios are rooted in resistive and applied stress probability distributions. This study approached the deterministic method from a probabilistic concept, leading to a more systematic and coherent philosophy and criterion for designing more uniform and reliable high-performance structures. The deterministic method was noted to consist of three safety factors: a standard deviation multiplier of the applied stress distribution; a K-factor for the A- or B-basis material ultimate stress; and the conventional safety factor to ensure that the applied stress does not operate in the inelastic zone of metallic materials. The conventional safety factor is specifically defined as the ratio of ultimate-to-yield stresses. A deterministic safety index of the combined safety factors was derived, from which the corresponding reliability showed that the deterministic method is not reliability-sensitive. The bases for selecting safety factors are presented and verification requirements are discussed. The suggested deterministic approach is applicable to all NASA, DOD, and commercial high-performance structures under static stresses.
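The abstract does not reproduce the derivation; for orientation, a standard stress-strength formulation consistent with its description is sketched below (the paper's exact definitions may differ).

```latex
% A standard stress-strength reliability relation consistent with the
% abstract's description (the paper's exact derivation may differ). With
% resistive stress R ~ N(mu_R, sigma_R^2) and applied stress
% S ~ N(mu_S, sigma_S^2):
\[
  P_f = \Pr(R - S < 0), \qquad
  \beta = \frac{\mu_R - \mu_S}{\sqrt{\sigma_R^2 + \sigma_S^2}}, \qquad
  P_f = \Phi(-\beta).
\]
% A deterministic factor of safety compares central or characteristic
% values, e.g. FS = mu_R / mu_S, which fixes no unique beta unless the
% coefficients of variation are also fixed -- hence "not reliability
% sensitive".
```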
Experimental validation of a coupled neutron-photon inverse radiation transport solver
NASA Astrophysics Data System (ADS)
Mattingly, John; Mitchell, Dean J.; Harding, Lee T.
2011-10-01
Sandia National Laboratories has developed an inverse radiation transport solver that applies nonlinear regression to coupled neutron-photon deterministic transport models. The inverse solver uses nonlinear regression to fit a radiation transport model to gamma spectrometry and neutron multiplicity counting measurements. The subject of this paper is the experimental validation of that solver. This paper describes a series of experiments conducted with a 4.5 kg sphere of α-phase, weapons-grade plutonium. The source was measured bare and reflected by high-density polyethylene (HDPE) spherical shells with total thicknesses between 1.27 and 15.24 cm. Neutron and photon emissions from the source were measured using three instruments: a gross neutron counter, a portable neutron multiplicity counter, and a high-resolution gamma spectrometer. These measurements were used as input to the inverse radiation transport solver to evaluate the solver's ability to correctly infer the configuration of the source from its measured radiation signatures.
‘Particle genetics’: treating every cell as unique
Yvert, Gaël
2014-01-01
Genotype-phenotype relations are usually inferred from a deterministic point of view. For example, quantitative trait loci (QTL), which describe regions of the genome associated with a particular phenotype, are based on a mean trait difference between genotype categories. However, living systems comprise huge numbers of cells (the ‘particles’ of biology). Each cell can exhibit substantial phenotypic individuality, which can have dramatic consequences at the organismal level. Now, with technology capable of interrogating individual cells, it is time to consider how genotypes shape the probability laws of single cell traits. The possibility of mapping single cell probabilistic trait loci (PTL), which link genomic regions to probabilities of cellular traits, is a promising step in this direction. This approach requires thinking about phenotypes in probabilistic terms, a concept that statistical physicists have been applying to particles for a century. Here, I describe PTL and discuss their potential to enlarge our understanding of genotype-phenotype relations. PMID:24315431
NASA Technical Reports Server (NTRS)
Sohn, Byung-Ju; Smith, Eric A.
1993-01-01
The maximum entropy production principle suggested by Paltridge (1975) is applied to separating the satellite-determined required total transports into atmospheric and oceanic components. Instead of using the excessively restrictive equal energy dissipation hypothesis as a deterministic tool for separating transports between the atmosphere and ocean fluids, the satellite-inferred required 2D energy transports are imposed on Paltridge's energy balance model, which is then solved as a variational problem using the equal energy dissipation hypothesis only to provide an initial guess field. It is suggested that Southern Ocean transports are weaker than previously reported. It is argued that a maximum entropy production principle can serve as a governing rule on macroscale global climate, and, in conjunction with conventional satellite measurements of the net radiation balance, provides a means to decompose atmosphere and ocean transports from the total transport field.
Recurrence analysis of ant activity patterns
2017-01-01
In this study, we used recurrence quantification analysis (RQA) and recurrence plots (RPs) to compare the movement activity of individual workers of three ant species, as well as a gregarious beetle species. RQA and RPs quantify the number and duration of recurrences of a dynamical system, including a detailed quantification of signals that could be stochastic, deterministic, or both. First, we found substantial differences between the activity dynamics of beetles and ants, with the results suggesting that the beetles have quasi-periodic dynamics and the ants do not. Second, workers from different ant species varied with respect to their dynamics, presenting degrees of predictability as well as stochastic signals. Finally, differences were found between the minor and major castes of the same (dimorphic) ant species. Our results underscore the potential of RQA and RPs in the analysis of complex behavioral patterns, as well as in general inferences on animal behavior and other biological phenomena. PMID:29016648
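A minimal sketch of the two basic RQA quantities, recurrence rate and determinism, is given below for a scalar series (with the line of identity excluded). The threshold and one-dimensional state space are illustrative simplifications, not the study's settings.

```python
# Sketch: recurrence plot and two basic RQA measures (recurrence rate RR
# and determinism DET) for a scalar series; illustrative settings only.
import numpy as np

def recurrence_plot(x, eps):
    d = np.abs(x[:, None] - x[None, :])       # 1-D state space for simplicity
    R = (d < eps).astype(int)
    np.fill_diagonal(R, 0)                    # exclude the line of identity
    return R

def rqa_measures(R, lmin=2):
    n = R.shape[0]
    rr = R.sum() / (n * n - n)                # recurrence rate
    det_points = 0                            # points on diagonals >= lmin
    for k in range(-(n - 1), n):
        if k == 0:
            continue
        run = 0
        for v in list(np.diagonal(R, k)) + [0]:   # trailing 0 flushes last run
            if v:
                run += 1
            else:
                if run >= lmin:
                    det_points += run
                run = 0
    return rr, det_points / max(R.sum(), 1)

t = np.arange(400)
series = {
    "periodic": np.sin(0.2 * t),                          # beetle-like signal
    "noise": np.random.default_rng(2).normal(size=400),   # stochastic signal
}
for name, x in series.items():
    rr, det = rqa_measures(recurrence_plot(x, eps=0.1))
    print(f"{name:8s} RR={rr:.3f} DET={det:.3f}")
# High DET flags deterministic structure; pure noise yields much lower DET.
```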
NASA Astrophysics Data System (ADS)
Casdagli, M. C.
1997-09-01
We show that recurrence plots (RPs) give detailed characterizations of time series generated by dynamical systems driven by slowly varying external forces. For deterministic systems we show that RPs of the time series can be used to reconstruct the RP of the driving force if it varies sufficiently slowly. If the driving force is one-dimensional, its functional form can then be inferred up to an invertible coordinate transformation. The same results hold for stochastic systems if the RP of the time series is suitably averaged and transformed. These results are used to investigate the nonlinear prediction of time series generated by dynamical systems driven by slowly varying external forces. We also consider the problem of detecting a small change in the driving force, and propose a surrogate data technique for assessing statistical significance. Numerically simulated time series and a time series of respiration rates recorded from a subject with sleep apnea are used as illustrative examples.
Probabilistic forecasting of extreme weather events based on extreme value theory
NASA Astrophysics Data System (ADS)
Van De Vyver, Hans; Van Schaeybroeck, Bert
2016-04-01
Extreme events in weather and climate, such as high wind gusts, heavy precipitation or extreme temperatures, are commonly associated with high impacts on both environment and society. Forecasting extreme weather events is difficult, and very high-resolution models are needed to describe extreme weather phenomena explicitly. A prediction system for such events should therefore preferably be probabilistic in nature. Probabilistic forecasts and state estimations are nowadays common in the numerical weather prediction community. In this work, we develop a new probabilistic framework based on extreme value theory that aims to provide early warnings up to several days in advance. We consider pairs (X, Y) of extreme events, where X represents a deterministic forecast and Y the observation variable (for instance wind speed), and study the combined events in which Y exceeds a high threshold y while the corresponding forecast X also exceeds a high forecast threshold. More specifically, two problems are addressed: (1) Given a high forecast X = x0, what is the probability that Y > y? In other words, provide inference on the conditional probability Pr{Y > y | X = x0}. (2) Given a probabilistic model for Problem 1, what is the impact on the verification analysis of extreme events? The first problem can be addressed with bivariate extremes (Coles, 2001), the second with the verification analysis of Ferro (2007). We apply the parametric model of Ramos and Ledford (2009) for bivariate tail estimation of the pair (X, Y). The model accommodates different types of extremal dependence and asymmetry within a parsimonious representation. Results are presented using the ensemble reforecast system of the European Centre for Medium-Range Weather Forecasts (Hagedorn, 2008). References: Coles, S. (2001) An Introduction to Statistical Modelling of Extreme Values. Springer-Verlag. Ferro, C.A.T. (2007) A probability model for verifying deterministic forecasts of extreme events. Wea. Forecasting 22, 1089-1100. Hagedorn, R. (2008) Using the ECMWF reforecast dataset to calibrate EPS forecasts. ECMWF Newsletter 117, 8-13. Ramos, A. and Ledford, A. (2009) A new class of models for bivariate joint tails. J. R. Statist. Soc. B 71, 219-241.
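For contrast with the parametric approach, the sketch below computes a naive empirical estimate of Pr(Y > y | X near x0) from synthetic forecast-observation pairs. It is only a baseline: pair counts collapse in the far tail, which is precisely where a bivariate tail model such as Ramos and Ledford (2009) is needed instead.

```python
# Sketch: naive empirical estimate of Pr(Y > y | X ~ x0) from paired
# forecast/observation data (synthetic). A baseline only -- in the far
# tail the pairs become sparse, which motivates extreme-value modelling.
import numpy as np

rng = np.random.default_rng(11)
n = 20_000
X = rng.gamma(2.0, 4.0, n)                   # synthetic forecast wind speeds
Y = 0.8 * X + rng.gamma(2.0, 1.5, n)         # correlated synthetic observations

def cond_exceed(X, Y, x0, y, halfwidth=1.0):
    sel = np.abs(X - x0) < halfwidth         # forecasts near x0
    return sel.sum(), (Y[sel] > y).mean() if sel.any() else np.nan

for x0 in (10, 20, 30, 40):
    m, p = cond_exceed(X, Y, x0, y=30.0)
    print(f"x0={x0:2d}: {m:5d} pairs, Pr(Y>30 | X~x0) ~= {p:.3f}")
# The pair count collapses as x0 grows: the tail-sparsity problem that
# motivates parametric bivariate-tail modelling of (X, Y).
```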
Inference of epistatic effects in a key mitochondrial protein
NASA Astrophysics Data System (ADS)
Nelson, Erik D.; Grishin, Nick V.
2018-06-01
We use Potts model inference to predict pair epistatic effects in a key mitochondrial protein—cytochrome c oxidase subunit 2—for ray-finned fishes. We examine the effect of phylogenetic correlations on our predictions using a simple exact fitness model, and we find that, although epistatic effects are underpredicted, they maintain a roughly linear relationship to their true (model) values. After accounting for this correction, epistatic effects in the protein are still relatively weak, leading to fitness valleys of depth 2Ns ≃ −5 in compensatory double mutants. Interestingly, positive epistasis is more pronounced than negative epistasis, and the strongest positive effects capture nearly all sites subject to positive selection in fishes, similar to virus proteins evolving under selection pressure in the context of drug therapy.
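For orientation, the standard definitions assumed here can be written as follows (the paper's conventions may differ in detail).

```latex
% Standard definition of pairwise epistasis between mutations a and b,
% with f the log fitness of wild type, single and double mutants:
\[
  \epsilon_{ab} = f_{ab} - f_a - f_b + f_0 .
\]
% A compensatory fitness valley of depth $2Ns \simeq -5$ means each
% single-mutant intermediate is deleterious, with scaled selection
% coefficient about $-5$, while positive epistasis ($\epsilon_{ab} > 0$)
% restores fitness in the double mutant.
```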
Ben Abdallah, Emna; Folschette, Maxime; Roux, Olivier; Magnin, Morgan
2017-01-01
This paper addresses the problem of finding attractors in biological regulatory networks. We focus here on non-deterministic synchronous and asynchronous multi-valued networks, modeled using automata networks (AN). AN is a general and well-suited formalism to study complex interactions between different components (genes, proteins, ...). An attractor is a minimal trap domain, that is, a part of the state-transition graph that cannot be escaped. Such structures are terminal components of the dynamics and take the form of steady states (singletons) or complex compositions of cycles (non-singletons). Studying the effect of a disease or a mutation on an organism requires finding the attractors in the model to understand the long-term behaviors. We present a computational logical method based on answer set programming (ASP) to identify all attractors. Performed without any network reduction, the method can be applied to any dynamical semantics. In this paper, we present the two most widespread non-deterministic semantics: the asynchronous and the synchronous updating modes. The logical approach goes through a complete enumeration of the states of the network in order to find the attractors without the necessity of constructing the whole state-transition graph. We performed extensive computational experiments which show good performance and fit the expected theoretical results in the literature. The originality of our approach lies in the exhaustive enumeration of all possible (sets of) states verifying the properties of an attractor thanks to the use of ASP. Our method is applied to non-deterministic semantics in two different schemes (asynchronous and synchronous). The merits of our methods are illustrated by applying them to biological examples of various sizes and comparing the results with some existing approaches. It turns out that our approach succeeds in exhaustively enumerating, on a desktop computer, all attractors of a large model (100 components) up to a given size (20 states). This size is only limited by memory and computation time.
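To make the notion of attractor concrete, the sketch below brute-forces the attractors of a toy 3-component Boolean network under the synchronous semantics, where every state has a unique successor and attractors are the cycles of the state-transition graph. The update rules are hypothetical, and exhaustive search of this kind is exactly what fails to scale, motivating the ASP approach; this is not the paper's method.

```python
# Sketch: brute-force attractor detection for a toy 3-component Boolean
# network under synchronous updating. Every state has one successor, so
# attractors are the cycles of the state-transition graph. NOT the ASP
# method of the paper; exhaustive search like this does not scale.
from itertools import product

def step(state):
    a, b, c = state                     # hypothetical update rules
    return (b, a and not c, a or b)     # synchronous: all updated at once

attractors = set()
for s0 in product((0, 1), repeat=3):
    seen, s = {}, tuple(bool(v) for v in s0)
    while s not in seen:                # iterate until a state repeats
        seen[s] = len(seen)
        s = step(s)
    # states visited at or after the first occurrence of s form the cycle
    cycle = [t for t, i in seen.items() if i >= seen[s]]
    attractors.add(frozenset(cycle))

for att in attractors:
    kind = "steady state" if len(att) == 1 else f"cycle of {len(att)} states"
    print(kind, sorted(tuple(int(v) for v in s) for s in att))
```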
Singh, Reema; Schilde, Christina; Schaap, Pauline
2016-11-17
Dictyostelia are a well-studied group of organisms with colonial multicellularity, which are members of the mostly unicellular Amoebozoa. A phylogeny based on SSU rDNA data subdivided all Dictyostelia into four major groups, but left the position of the root and of six group-intermediate taxa unresolved. Recent phylogenies inferred from 30 or 213 proteins from sequenced genomes positioned the root between two branches, each containing two major groups, but lacked data to position the group-intermediate taxa. Since the positions of these early diverging taxa are crucial for understanding the evolution of phenotypic complexity in Dictyostelia, we sequenced six representative genomes of early diverging taxa. We retrieved orthologs of 47 housekeeping proteins with an average size of 890 amino acids from six newly sequenced and eight published genomes of Dictyostelia and unicellular Amoebozoa and inferred phylogenies from single and concatenated protein sequence alignments. Concatenated alignments of all 47 proteins, and four out of five subsets of nine concatenated proteins, all produced the same consensus phylogeny with 100% statistical support. Trees inferred from just two of the 47 proteins individually reproduced the consensus phylogeny, highlighting that single-gene phylogenies will rarely reflect correct species relationships. However, sets of two or three concatenated proteins again reproduced the consensus phylogeny, indicating that a small selection of genes suffices for low-cost classification of as yet unincorporated or newly discovered dictyostelid and amoebozoan taxa by gene amplification. The multi-locus consensus phylogeny shows that groups 1 and 2 are sister clades in branch I, with the group-intermediate taxon D. polycarpum positioned as outgroup to group 2. Branch II consists of groups 3 and 4, with the group-intermediate taxon Polysphondylium violaceum positioned as sister to group 4, and the group-intermediate taxon Dictyostelium polycephalum branching at the base of that whole clade. Given the data, the approximately unbiased test rejects all alternative topologies favoured by SSU rDNA and individual proteins with high statistical support. The test also rejects monophyletic origins for the genera Acytostelium, Polysphondylium and Dictyostelium. The current position of Acytostelium ellipticum in the consensus phylogeny indicates that somatic cells were lost twice in Dictyostelia.
Cheng, Yiming; Perocchi, Fabiana
2015-07-01
ProtPhylo is a web-based tool to identify proteins that are functionally linked to either a phenotype or a protein of interest based on co-evolution. ProtPhylo infers functional associations by comparing protein phylogenetic profiles (co-occurrence patterns of orthology relationships) for more than 9.7 million non-redundant protein sequences from all three domains of life. Users can query any of 2048 fully sequenced organisms, including 1678 bacteria, 255 eukaryotes and 115 archaea. In addition, they can tailor ProtPhylo to a particular kind of biological question by choosing among four main orthology inference methods based either on pair-wise sequence comparisons (One-way Best Hits and Best Reciprocal Hits) or clustering of orthologous proteins across multiple species (OrthoMCL and eggNOG). Next, ProtPhylo ranks phylogenetic neighbors of query proteins or phenotypic properties using the Hamming distance as a measure of similarity between pairs of phylogenetic profiles. Candidate hits can be easily and flexibly prioritized by complementary clues on subcellular localization, known protein-protein interactions, membrane spanning regions and protein domains. The resulting protein list can be quickly exported into a csv text file for further analyses. ProtPhylo is freely available at http://www.protphylo.org. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
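The core co-evolution step, ranking candidates by Hamming distance between binary phylogenetic profiles, can be sketched in a few lines. The profiles below are hypothetical toy data, not ProtPhylo output.

```python
# Sketch: ranking candidate proteins by Hamming distance between binary
# phylogenetic profiles (presence/absence of orthologs across genomes),
# the co-evolution step the abstract describes. Toy data, not ProtPhylo.
import numpy as np

genomes = 12
profiles = {                      # 1 = ortholog present in that genome
    "query": np.array([1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0]),
    "protA": np.array([1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0]),  # near-identical
    "protB": np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]),
    "protC": np.array([0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1]),  # complementary
}

q = profiles["query"]
ranked = sorted(
    ((name, int(np.sum(p != q)))
     for name, p in profiles.items() if name != "query"),
    key=lambda kv: kv[1],
)
for name, d in ranked:
    print(f"{name}: Hamming distance {d}/{genomes}")
# Small distances suggest co-evolution and hence a possible functional link.
```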
A Multi-Method Approach for Proteomic Network Inference in 11 Human Cancers.
Şenbabaoğlu, Yasin; Sümer, Selçuk Onur; Sánchez-Vega, Francisco; Bemis, Debra; Ciriello, Giovanni; Schultz, Nikolaus; Sander, Chris
2016-02-01
Protein expression and post-translational modification levels are tightly regulated in neoplastic cells to maintain cellular processes known as 'cancer hallmarks'. The first Pan-Cancer initiative of The Cancer Genome Atlas (TCGA) Research Network has aggregated protein expression profiles for 3,467 patient samples from 11 tumor types using the antibody based reverse phase protein array (RPPA) technology. The resultant proteomic data can be utilized to computationally infer protein-protein interaction (PPI) networks and to study the commonalities and differences across tumor types. In this study, we compare the performance of 13 established network inference methods in their capacity to retrieve the curated Pathway Commons interactions from RPPA data. We observe that no single method has the best performance in all tumor types, but a group of six methods, including diverse techniques such as correlation, mutual information, and regression, consistently rank highly among the tested methods. We utilize the high performing methods to obtain a consensus network; and identify four robust and densely connected modules that reveal biological processes as well as suggest antibody-related technical biases. Mapping the consensus network interactions to Reactome gene lists confirms the pan-cancer importance of signal transduction pathways, innate and adaptive immune signaling, cell cycle, metabolism, and DNA repair; and also suggests several biological processes that may be specific to a subset of tumor types. Our results illustrate the utility of the RPPA platform as a tool to study proteomic networks in cancer.
Erdem, Cemal; Nagle, Alison M; Casa, Angelo J; Litzenburger, Beate C; Wang, Yu-Fen; Taylor, D Lansing; Lee, Adrian V; Lezon, Timothy R
2016-09-01
Insulin and insulin-like growth factor I (IGF1) influence cancer risk and progression through poorly understood mechanisms. To better understand the roles of insulin and IGF1 signaling in breast cancer, we combined proteomic screening with computational network inference to uncover differences in IGF1 and insulin induced signaling. Using reverse phase protein array, we measured the levels of 134 proteins in 21 breast cancer cell lines stimulated with IGF1 or insulin for up to 48 h. We then constructed directed protein expression networks using three separate methods: (i) lasso regression, (ii) conventional matrix inversion, and (iii) entropy maximization. These networks, named here as the time translation models, were analyzed and the inferred interactions were ranked by differential magnitude to identify pathway differences. The two top candidates, chosen for experimental validation, were shown to regulate IGF1/insulin induced phosphorylation events. First, acetyl-CoA carboxylase (ACC) knock-down was shown to increase the level of mitogen-activated protein kinase (MAPK) phosphorylation. Second, stable knock-down of E-Cadherin increased the phospho-Akt protein levels. Both of the knock-down perturbations incurred phosphorylation responses stronger in IGF1 stimulated cells compared with insulin. Overall, the time-translation modeling coupled to wet-lab experiments has proven to be powerful in inferring differential interactions downstream of IGF1 and insulin signaling, in vitro. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.
A random walk on water (Henry Darcy Medal Lecture)
NASA Astrophysics Data System (ADS)
Koutsoyiannis, D.
2009-04-01
Randomness and uncertainty had been well appreciated in hydrology and water resources engineering in their initial steps as scientific disciplines. However, this changed through the years and, following other geosciences, hydrology adopted a naïve view of randomness in natural processes. Such a view separates natural phenomena into two mutually exclusive types, random or stochastic, and deterministic. When a classification of a specific process into one of these two types fails, then a separation of the process into two different, usually additive, parts is typically devised, each of which may be further subdivided into subparts (e.g., deterministic subparts such as periodic and aperiodic or trends). This dichotomous logic is typically combined with a manichean perception, in which the deterministic part supposedly represents cause-effect relationships and thus is physics and science (the "good"), whereas randomness has little relationship with science and no relationship with understanding (the "evil"). Probability theory and statistics, which traditionally provided the tools for dealing with randomness and uncertainty, have been regarded by some as the "necessary evil" but not as an essential part of hydrology and geophysics. Some took a step further to banish them from hydrology, replacing them with deterministic sensitivity analysis and fuzzy-logic representations. Others attempted to demonstrate that irregular fluctuations observed in natural processes are au fond manifestations of underlying chaotic deterministic dynamics with low dimensionality, thus attempting to render probabilistic descriptions unnecessary. Some of the above recent developments are simply flawed because they make erroneous use of probability and statistics (which, remarkably, provide the tools for such analyses), whereas the entire underlying logic is just a false dichotomy. To see this, it suffices to recall that Pierre Simon Laplace, perhaps the most famous proponent of determinism in the history of philosophy of science (cf. Laplace's demon), is, at the same time, one of the founders of probability theory, which he regarded as "nothing but common sense reduced to calculation". This harmonizes with James Clerk Maxwell's view that "the true logic for this world is the calculus of Probabilities" and was more recently and epigrammatically formulated in the title of Edwin Thompson Jaynes's book "Probability Theory: The Logic of Science" (2003). Abandoning dichotomous logic, either on ontological or epistemic grounds, we can identify randomness or stochasticity with unpredictability. Admitting that (a) uncertainty is an intrinsic property of nature; (b) causality implies dependence of natural processes in time and thus suggests predictability; but, (c) even the tiniest uncertainty (e.g., in initial conditions) may result in unpredictability after a certain time horizon, we may shape a stochastic representation of natural processes that is consistent with Karl Popper's indeterministic world view. In this representation, probability quantifies uncertainty according to the Kolmogorov system, in which probability is a normalized measure, i.e., a function that maps sets (areas where the initial conditions or the parameter values lie) to real numbers (in the interval [0, 1]). 
In such a representation, predictability (suggested by deterministic laws) and unpredictability (randomness) coexist, are not separable or additive components, and it is a matter of specifying the time horizon of prediction to decide which of the two dominates. An elementary numerical example has been devised to illustrate the above ideas and demonstrate that they offer a pragmatic and useful guide for practice, rather than just pertaining to philosophical discussions. A chaotic model, with fully and a priori known deterministic dynamics and deterministic inputs (without any random agent), is assumed to represent the hydrological balance in an area partly covered by vegetation. Experimentation with this toy model demonstrates, inter alia, that: (1) for short time horizons the deterministic dynamics is able to give good predictions; but (2) these predictions become extremely inaccurate and useless for long time horizons; (3) for such horizons a naïve statistical prediction (average of past data) which fully neglects the deterministic dynamics is more skilful; and (4) if this statistical prediction, in addition to past data, is combined with probability theory (the principle of maximum entropy, in particular), it can provide a more informative prediction. Also, the toy model shows that the trajectories of the system state (and derivative properties thereof) resemble neither a regular (e.g., periodic) deterministic process nor a purely random process, but exhibit patterns indicating anti-persistence and persistence (where the latter statistically complies with a Hurst-Kolmogorov behaviour). If the process is averaged over long time scales, the anti-persistent behaviour improves predictability, whereas the persistent behaviour substantially deteriorates it. A stochastic representation of this deterministic system, which incorporates dynamics, is not only possible, but also powerful as it provides good predictions for both short and long horizons and helps to decide when the deterministic dynamics should be considered or neglected. Obviously, a natural system is far more complex than this simple toy model and hence unpredictability is naturally even more prominent in the former. In addition, in a complex natural system, we can never know the exact dynamics and we must infer it from past data, which implies additional uncertainty and an additional role of stochastics in the process of formulating the system equations and estimating the involved parameters. Data also offer the only solid grounds to test any hypothesis about the dynamics, and failure to perform such testing against evidence from data renders the hypothesised dynamics worthless. If this perception of natural phenomena is adequately plausible, then it may help in studying interesting fundamental questions regarding the current state and the trends of hydrological and water resources research and their promising future paths. For instance: (i) Will it ever be possible to achieve a fully "physically based" modelling of hydrological systems that will not depend on data or stochastic representations? (ii) To what extent can hydrological uncertainty be reduced and what are the effective means for such reduction? (iii) Are current stochastic methods in hydrology consistent with observed natural behaviours? What paths should we explore for their advancement? (iv) Can deterministic methods provide solid scientific grounds for water resources engineering and management?
In particular, can there be risk-free hydraulic engineering and water management? (v) Is the current (particularly important) interface between hydrology and climate satisfactory? In particular, should hydrology rely on climate models that are not properly validated (i.e., for periods and scales not used in calibration)? In effect, is the evolution of climate and its impacts on water resources deterministically predictable?
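The lecture's toy-model argument is easy to reproduce in miniature. The following sketch substitutes a logistic map for the hydrological toy model (an assumption made purely for brevity): with fully known chaotic dynamics and a tiny initial-condition error, the deterministic forecast beats the naive climatological mean only over short horizons.

```python
# Chaotic logistic map: deterministic prediction vs. naive statistical prediction.
import numpy as np

def logistic(x, steps, r=4.0):
    out = np.empty(steps)
    for t in range(steps):
        x = r * x * (1.0 - x)
        out[t] = x
    return out

truth = logistic(0.2, 60)
forecast = logistic(0.2 + 1e-9, 60)   # "known" dynamics, tiny initial error
climatology = truth.mean()            # naive statistical prediction (past average)

for horizon in (5, 15, 40):
    det_err = abs(truth[horizon] - forecast[horizon])
    stat_err = abs(truth[horizon] - climatology)
    print(horizon, det_err, stat_err)
# Typically: the deterministic error is negligible at horizon 5, but comparable
# to or worse than the climatological error by horizon 40.
```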
Schlaier, Juergen R; Beer, Anton L; Faltermeier, Rupert; Fellner, Claudia; Steib, Kathrin; Lange, Max; Greenlee, Mark W; Brawanski, Alexander T; Anthofer, Judith M
2017-06-01
This study compared tractography approaches for identifying cerebellar-thalamic fiber bundles relevant to planning target sites for deep brain stimulation (DBS). In particular, probabilistic and deterministic tracking of the dentate-rubro-thalamic tract (DRTT) and differences between the spatial courses of the DRTT and the cerebello-thalamo-cortical (CTC) tract were compared. Six patients with movement disorders were examined by magnetic resonance imaging (MRI), including two sets of diffusion-weighted images (12 and 64 directions). Probabilistic and deterministic tractography was applied on each diffusion-weighted dataset to delineate the DRTT. Results were compared with regard to sensitivity in revealing the DRTT, detection of additional fiber tracts, and processing time. Two sets of regions-of-interest (ROIs) guided deterministic tractography of the DRTT or the CTC, respectively. Tract distances to an atlas-based reference target were compared. Probabilistic fiber tracking with 64 orientations detected the DRTT in all twelve hemispheres. Deterministic tracking detected the DRTT in nine (12 directions) and in only two (64 directions) hemispheres. Probabilistic tracking was more sensitive in detecting additional fibers (e.g. ansa lenticularis and medial forebrain bundle) than deterministic tracking. Probabilistic tracking lasted substantially longer than deterministic tracking. Deterministic tracking was more sensitive in detecting the CTC than the DRTT. CTC tracts were located adjacent but consistently more posterior to DRTT tracts. These results suggest that probabilistic tracking is more sensitive and robust in detecting the DRTT but harder to implement than deterministic approaches. Although the sensitivity of deterministic tracking is higher for the CTC than the DRTT, targets for DBS based on these tracts likely differ. © 2017 Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
Inferring Domain-Domain Interactions from Protein-Protein Interactions with Formal Concept Analysis
Khor, Susan
2014-01-01
Identifying reliable domain-domain interactions will increase our ability to predict novel protein-protein interactions, to unravel interactions in protein complexes, and thus gain more information about the function and behavior of genes. One of the challenges of identifying reliable domain-domain interactions is domain promiscuity. Promiscuous domains are domains that can occur in many domain architectures and are therefore found in many proteins. This becomes a problem for a method where the score of a domain-pair is the ratio between observed and expected frequencies because the protein-protein interaction network is sparse. As such, many protein-pairs will be non-interacting and domain-pairs with promiscuous domains will be penalized. This domain promiscuity challenge to the problem of inferring reliable domain-domain interactions from protein-protein interactions has been recognized, and a number of work-arounds have been proposed. This paper reports on an application of Formal Concept Analysis to this problem. It is found that the relationship between formal concepts provides a natural way for rare domains to elevate the rank of promiscuous domain-pairs and enrich highly ranked domain-pairs with reliable domain-domain interactions. This piggybacking of promiscuous domain-pairs onto less promiscuous domain-pairs is possible only with concept lattices whose attribute-labels are not reduced and is enhanced by the presence of proteins that comprise both promiscuous and rare domains. PMID:24586450
A prior-based integrative framework for functional transcriptional regulatory network inference
Siahpirani, Alireza F.
2017-01-01
Transcriptional regulatory networks specify regulatory proteins controlling the context-specific expression levels of genes. Inference of genome-wide regulatory networks is central to understanding gene regulation, but remains an open challenge. Expression-based network inference is among the most popular methods to infer regulatory networks; however, networks inferred from such methods have low overlap with experimentally derived (e.g. ChIP-chip and transcription factor (TF) knockout) networks. Currently we have a limited understanding of this discrepancy. To address this gap, we first develop a regulatory network inference algorithm, based on probabilistic graphical models, to integrate expression with auxiliary datasets supporting a regulatory edge. Second, we comprehensively analyze our and other state-of-the-art methods on different expression perturbation datasets. Networks inferred by integrating sequence-specific motifs with expression have substantially greater agreement with experimentally derived networks, while remaining more predictive of expression than motif-based networks. Our analysis suggests natural genetic variation as the most informative perturbation for network inference, and identifies core TFs whose targets are predictable from expression. Multiple reasons make the identification of targets of other TFs difficult, including network architecture and insufficient variation of TF mRNA levels. Finally, we demonstrate the utility of our inference algorithm to infer stress-specific regulatory networks and for regulator prioritization. PMID:27794550
Oscillatory Regulation of Hes1: Discrete Stochastic Delay Modelling and Simulation
Barrio, Manuel; Burrage, Kevin; Leier, André; Tian, Tianhai
2006-01-01
Discrete stochastic simulations are a powerful tool for understanding the dynamics of chemical kinetics when there are small-to-moderate numbers of certain molecular species. In this paper we introduce delays into the stochastic simulation algorithm, thus mimicking delays associated with transcription and translation. We then show that this process may explain the observed sustained oscillations in expression levels of hes1 mRNA and Hes1 protein more faithfully than continuous deterministic models. PMID:16965175
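A minimal sketch of the delayed stochastic simulation algorithm the paper introduces, applied to a single delayed birth-death process: a transcription event fires now, but its product is released after a fixed delay. The rates and delay are illustrative, not the Hes1 parameters.

```python
# Delay SSA sketch: Gillespie steps plus a queue of scheduled delayed completions.
import heapq, math, random

random.seed(1)
k_tx, k_deg, tau = 1.0, 0.1, 20.0   # initiation rate, decay rate, delay (illustrative)
t, t_end, mrna = 0.0, 500.0, 0
pending = []                         # min-heap of delayed-product release times

while t < t_end:
    a_total = k_tx + k_deg * mrna
    dt = -math.log(random.random()) / a_total
    # If a delayed product is due before the next reaction, release it first;
    # the exponential waiting time is memoryless, so we may simply resample.
    if pending and pending[0] <= t + dt:
        t = heapq.heappop(pending)
        mrna += 1
        continue
    t += dt
    if random.random() < k_tx / a_total:
        heapq.heappush(pending, t + tau)   # product appears tau time units later
    elif mrna > 0:
        mrna -= 1                          # degradation
```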
Annotation-based inference of transporter function.
Lee, Thomas J; Paulsen, Ian; Karp, Peter
2008-07-01
We present a method for inferring and constructing transport reactions for transporter proteins based primarily on the analysis of the names of individual proteins in the genome annotation of an organism. Transport reactions are declarative descriptions of transporter activities, and thus can be manipulated computationally, unlike free-text protein names. Once transporter activities are encoded as transport reactions, a number of computational analyses are possible, including database queries by transporter activity; inclusion of transporters into an automatically generated metabolic-map diagram that can be painted with omics data to aid in their interpretation; detection of anomalies in the metabolic and transport networks, such as substrates that are transported into the cell but are not inputs to any metabolic reaction or pathway; and comparative analyses of the transport capabilities of different organisms. On randomly selected organisms, the method achieves precision and recall rates of 0.93 and 0.90, respectively, in identifying transporter proteins by name within the complete genome. The method obtains 67.5% accuracy in predicting complete transport reactions; if allowance is made for predictions that are overly general yet not incorrect, reaction prediction accuracy is 82.5%. The method is implemented as part of PathoLogic, the inference component of the Pathway Tools software. Pathway Tools, including source code, is freely available to researchers at non-commercial institutions; a fee applies to commercial institutions. Supplementary data are available at Bioinformatics online.
MoCha: Molecular Characterization of Unknown Pathways.
Lobo, Daniel; Hammelman, Jennifer; Levin, Michael
2016-04-01
Automated methods for the reverse-engineering of complex regulatory networks are paving the way for the inference of mechanistic comprehensive models directly from experimental data. These novel methods can infer not only the relations and parameters of the known molecules defined in their input datasets, but also unknown components and pathways identified as necessary by the automated algorithms. Identifying the molecular nature of these unknown components is a crucial step for making testable predictions and experimentally validating the models, yet no specific and efficient tools exist to aid in this process. To this end, we present here MoCha (Molecular Characterization), a tool optimized for the search of unknown proteins and their pathways from a given set of known interacting proteins. MoCha uses the comprehensive dataset of protein-protein interactions provided by the STRING database, which currently includes more than a billion interactions from over 2,000 organisms. MoCha is highly optimized, performing typical searches within seconds. We demonstrate the use of MoCha with the characterization of unknown components from reverse-engineered models from the literature. MoCha is useful for working on network models by hand or as a downstream step of a model inference engine workflow and represents a valuable and efficient tool for the characterization of unknown pathways using known data from thousands of organisms. MoCha and its source code are freely available online under the GPLv3 license.
Massatti, Rob; Knowles, L Lacey
2016-08-01
Deterministic processes may uniquely affect codistributed species' phylogeographic patterns such that discordant genetic variation among taxa is predicted. Yet, explicitly testing expectations of genomic discordance in a statistical framework remains challenging. Here, we construct spatially and temporally dynamic models to investigate the hypothesized effect of microhabitat preferences on the permeability of glaciated regions to gene flow in two closely related montane species. Utilizing environmental niche models from the Last Glacial Maximum and the present to inform demographic models of changes in habitat suitability over time, we evaluate the relative probabilities of two alternative models using approximate Bayesian computation (ABC) in which glaciated regions are either (i) permeable or (ii) a barrier to gene flow. Results based on the fit of the empirical data to data sets simulated using a spatially explicit coalescent under alternative models indicate that genomic data are consistent with predictions about the hypothesized role of microhabitat in generating discordant patterns of genetic variation among the taxa. Specifically, a model in which glaciated areas acted as a barrier was much more probable based on patterns of genomic variation in Carex nova, a wet-adapted species. However, in the dry-adapted Carex chalciolepis, the permeable model was more probable, although the difference in the support of the models was small. This work highlights how statistical inferences can be used to distinguish deterministic processes that are expected to result in discordant genomic patterns among species, including species-specific responses to climate change. © 2016 John Wiley & Sons Ltd.
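The ABC model-choice logic used here can be sketched compactly. In the toy version below, the spatially explicit coalescent simulators are replaced by placeholder summary-statistic generators (an assumption for brevity); acceptance fractions then estimate relative model probabilities.

```python
# Schematic ABC rejection sampling for choosing between "barrier" and "permeable" models.
import numpy as np

rng = np.random.default_rng(2)

def simulate(model):
    # Placeholder summary statistic (e.g., differentiation between populations):
    # a barrier to gene flow inflates it. Not a coalescent simulation.
    base = 0.25 if model == "barrier" else 0.10
    return base + 0.05 * rng.normal()

observed, tolerance = 0.22, 0.03
counts = {"barrier": 0, "permeable": 0}
for _ in range(100_000):
    model = "barrier" if rng.random() < 0.5 else "permeable"
    if abs(simulate(model) - observed) < tolerance:
        counts[model] += 1

total = sum(counts.values())
print({m: c / total for m, c in counts.items()})   # approximate model probabilities
```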
Markov Logic Networks in the Analysis of Genetic Data
Sakhanenko, Nikita A.
2010-01-01
Complex, non-additive genetic interactions are common and can be critical in determining phenotypes. Genome-wide association studies (GWAS) and similar statistical studies of linkage data, however, assume additive models of gene interactions in looking for genotype-phenotype associations. These statistical methods view the compound effects of multiple genes on a phenotype as a sum of influences of each gene and often miss a substantial part of the heritable effect. Such methods do not use any biological knowledge about underlying mechanisms. Modeling approaches from the artificial intelligence (AI) field that incorporate deterministic knowledge into models to perform statistical analysis can be applied to include prior knowledge in genetic analysis. We chose to use the most general such approach, Markov Logic Networks (MLNs), for combining deterministic knowledge with statistical analysis. Using simple, logistic regression-type MLNs we can replicate the results of traditional statistical methods, but we also show that we are able to go beyond finding independent markers linked to a phenotype by using joint inference without an independence assumption. The method is applied to genetic data on yeast sporulation, a complex phenotype with gene interactions. In addition to detecting all of the previously identified loci associated with sporulation, our method identifies four loci with smaller effects. Since their effect on sporulation is small, these four loci were not detected with methods that do not account for dependence between markers due to gene interactions. We show how gene interactions can be detected using more complex models, which can be used as a general framework for incorporating systems biology with genetics. PMID:20958249
Dispersal-Based Microbial Community Assembly Decreases Biogeochemical Function
Graham, Emily B.; Stegen, James C.
2017-11-01
Ecological mechanisms influence relationships among microbial communities, which in turn impact biogeochemistry. In particular, microbial communities are assembled by deterministic (e.g., selection) and stochastic (e.g., dispersal) processes, and the relative balance of these two process types is hypothesized to alter the influence of microbial communities over biogeochemical function. We used an ecological simulation model to evaluate this hypothesis, defining biogeochemical function generically to represent any biogeochemical reaction of interest. We assembled receiving communities under different levels of dispersal from a source community that was assembled purely by selection. The dispersal scenarios ranged from no dispersal (i.e., selection-only) to dispersal rates high enough to overwhelm selection (i.e., homogenizing dispersal). We used an aggregate measure of community fitness to infer a given community's biogeochemical function relative to other communities. We also used ecological null models to further link the relative influence of deterministic assembly to function. We found that increasing rates of dispersal decrease biogeochemical function by increasing the proportion of maladapted taxa in a local community. Niche breadth was also a key determinant of biogeochemical function, suggesting a tradeoff between the function of generalist and specialist species. Finally, we show that microbial assembly processes exert greater influence over biogeochemical function when there is variation in the relative contributions of dispersal and selection among communities. Taken together, our results highlight the influence of spatial processes on biogeochemical function and indicate the need to account for such effects in models that aim to predict biogeochemical function under future environmental scenarios.
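A toy version of the dispersal-selection argument, under stated assumptions (exponential selection weighting; function defined as abundance-weighted local fitness; not the authors' simulation model), shows aggregate function falling as the dispersal rate rises:

```python
# Receiving community = mixture of locally selected taxa and immigrants selected elsewhere.
import numpy as np

rng = np.random.default_rng(3)
n_taxa = 200
local_fitness = rng.uniform(size=n_taxa)    # fitness in the receiving habitat
source_fitness = rng.uniform(size=n_taxa)   # fitness in the source habitat

def assemble(fitness, strength=5.0):
    w = np.exp(strength * fitness)          # selection-only assembly
    return w / w.sum()

local = assemble(local_fitness)
source = assemble(source_fitness)           # adapted to the wrong environment

for m in (0.0, 0.2, 0.5, 0.9):              # dispersal (mixing) rate
    community = (1 - m) * local + m * source
    function = (community * local_fitness).sum()
    print(m, round(function, 3))            # function declines as m rises
```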
Bayesian CP Factorization of Incomplete Tensors with Automatic Rank Determination.
Zhao, Qibin; Zhang, Liqing; Cichocki, Andrzej
2015-09-01
CANDECOMP/PARAFAC (CP) tensor factorization of incomplete data is a powerful technique for tensor completion through explicitly capturing the multilinear latent factors. The existing CP algorithms require the tensor rank to be manually specified; however, the determination of tensor rank remains a challenging problem, especially for CP rank. In addition, existing approaches do not take into account uncertainty information for latent factors or missing entries. To address these issues, we formulate CP factorization using a hierarchical probabilistic model and employ a fully Bayesian treatment by incorporating a sparsity-inducing prior over multiple latent factors and the appropriate hyperpriors over all hyperparameters, resulting in automatic rank determination. To learn the model, we develop an efficient deterministic Bayesian inference algorithm, which scales linearly with data size. Our method is characterized as a tuning parameter-free approach, which can effectively infer underlying multilinear factors with a low-rank constraint, while also providing predictive distributions over missing entries. Extensive simulations on synthetic data illustrate the intrinsic capability of our method to recover the ground-truth CP rank and prevent overfitting, even when a large fraction of entries is missing. Moreover, the results from real-world applications, including image inpainting and facial image synthesis, demonstrate that our method outperforms state-of-the-art approaches for both tensor factorization and tensor completion in terms of predictive performance.
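For orientation, the sketch below implements plain alternating least squares CP factorization with a crude rank-pruning step that imitates, but is not, the paper's Bayesian automatic rank determination; the sparsity-inducing priors and posterior updates are replaced by a simple norm threshold.

```python
# CP-ALS with naive rank pruning (a stand-in for ARD-style automatic rank determination).
import numpy as np

def khatri_rao(U, V):
    # Column-wise Kronecker product; row (u*rows(V) + v) equals U[u] * V[v].
    return (U[:, None, :] * V[None, :, :]).reshape(-1, U.shape[1])

def cp_als_prune(T, rank=10, iters=100, tol=1e-2):
    I, J, K = T.shape
    rng = np.random.default_rng(0)
    A, B, C = (rng.normal(size=(n, rank)) for n in (I, J, K))
    for _ in range(iters):
        A = T.reshape(I, -1) @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.moveaxis(T, 1, 0).reshape(J, -1) @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.moveaxis(T, 2, 0).reshape(K, -1) @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
        # Drop components whose energy collapses (crude analogue of pruning via priors).
        energy = (np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=0)
                  * np.linalg.norm(C, axis=0))
        keep = energy > tol * energy.max()
        A, B, C = A[:, keep], B[:, keep], C[:, keep]
    return A, B, C

# Demo on a synthetic rank-3 tensor, started from rank 10.
rng = np.random.default_rng(1)
A0, B0, C0 = (rng.normal(size=(20, 3)) for _ in range(3))
T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = cp_als_prune(T)
print(A.shape[1])   # surviving components; often collapses toward the true rank
```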
Multi-Scale Multi-Physics Modeling of Matrix Transport Properties in Fractured Shale Reservoirs
NASA Astrophysics Data System (ADS)
Mehmani, A.; Prodanovic, M.
2014-12-01
Understanding shale matrix flow behavior is imperative for successful reservoir development for hydrocarbon production and carbon storage. Without a predictive model, significant uncertainties ensue in flowback from the formation, in the communication between fracture and matrix, and in proper fracturing practice. Informed by SEM images, we develop deterministic network models that couple pores from multiple scales and their respective fluid physics. The models are used to investigate sorption hysteresis as an affordable way of inferring the nanoscale pore structure at the core scale. In addition, restricted diffusion as a function of pore shape, pore-throat size ratios and network connectivity is computed to make correct interpretation of the 2D NMR maps possible. Our novel pore network models have the ability to match sorption hysteresis measurements without any tuning parameters. The results clarify a common misconception that links type 3 nitrogen hysteresis curves only to shale pore shape, and show promising sensitivity for nanopore structure inference at the core scale. The results on restricted diffusion shed light on the importance of including shape factors in 2D NMR interpretations. A priori "weighting factors" as a function of pore-throat and throat-length ratio are presented and the effect of network connectivity on diffusion is quantitatively assessed. We are currently working on verifying our models with experimental data gathered from the Eagle Ford formation.
Speech Enhancement Using Gaussian Scale Mixture Models
Hao, Jiucang; Lee, Te-Won; Sejnowski, Terrence J.
2011-01-01
This paper presents a novel probabilistic approach to speech enhancement. Instead of a deterministic logarithmic relationship, we assume a probabilistic relationship between the frequency coefficients and the log-spectra. The speech model in the log-spectral domain is a Gaussian mixture model (GMM). The frequency coefficients obey a zero-mean Gaussian whose covariance equals the exponential of the log-spectra. This results in a Gaussian scale mixture model (GSMM) for the speech signal in the frequency domain, since the log-spectra can be regarded as scaling factors. The probabilistic relation between frequency coefficients and log-spectra allows these to be treated as two random variables, both to be estimated from the noisy signals. Expectation-maximization (EM) was used to train the GSMM and Bayesian inference was used to compute the posterior signal distribution. Because exact inference in this full probabilistic model is computationally intractable, we developed two approaches to enhance the efficiency: the Laplace method and a variational approximation. The proposed methods were applied to enhance speech corrupted by Gaussian noise and speech-shaped noise (SSN). For both approximations, signals reconstructed from the estimated frequency coefficients provided higher signal-to-noise ratio (SNR) and those reconstructed from the estimated log-spectra produced a lower word recognition error rate because the log-spectra fit the inputs to the recognizer better. Our algorithms effectively reduced the SSN, which algorithms based on spectral analysis were not able to suppress. PMID:21359139
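The generative side of the GSMM is compact enough to sketch. Below, a GMM over log-spectra sets the variance of zero-mean Gaussian frequency coefficients; all parameter values are illustrative.

```python
# Generative sketch of a Gaussian scale mixture model for frequency coefficients.
import numpy as np

rng = np.random.default_rng(4)
# GMM over log-spectra: mixture weights, means, standard deviations (illustrative)
weights = np.array([0.6, 0.4])
means = np.array([-2.0, 1.0])
stds = np.array([0.5, 0.8])

def sample_gsmm(n):
    comp = rng.choice(len(weights), size=n, p=weights)
    log_spec = rng.normal(means[comp], stds[comp])    # lambda ~ GMM
    coeff = rng.normal(0.0, np.exp(0.5 * log_spec))   # x ~ N(0, exp(lambda))
    return coeff, log_spec

clean, log_spec = sample_gsmm(10_000)
noisy = clean + rng.normal(0.0, 0.3, size=clean.shape)   # additive noise
# Enhancement would infer the posterior over (coeff, log_spec) given `noisy`,
# e.g. with a Laplace or variational approximation as in the paper.
```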
Deterministic quantum dense coding networks
NASA Astrophysics Data System (ADS)
Roy, Saptarshi; Chanda, Titas; Das, Tamoghna; Sen(De), Aditi; Sen, Ujjwal
2018-07-01
We consider the scenario of deterministic classical information transmission between multiple senders and a single receiver, when they a priori share a multipartite quantum state - an attempt towards building a deterministic dense coding network. Specifically, we prove that in the case of two or three senders and a single receiver, generalized Greenberger-Horne-Zeilinger (gGHZ) states are not beneficial for sending classical information deterministically beyond the classical limit, except when the shared state is the GHZ state itself. On the other hand, three- and four-qubit generalized W (gW) states with specific parameters, as well as the four-qubit Dicke states, can provide a quantum advantage for sending information in deterministic dense coding. Interestingly, however, numerical simulations in the three-qubit scenario reveal that the percentage of states from the GHZ class that are amenable to deterministic dense coding is higher than that of states from the W class.
Will, Thorsten; Helms, Volkhard
2017-04-04
Differential analysis of cellular conditions is a key approach towards understanding the consequences and driving causes behind biological processes such as developmental transitions or diseases. Advances in whole-genome expression profiling have made it convenient to capture the state of a cell's transcriptome and to detect the characteristic features that distinguish cells in specific conditions. In contrast, mapping the physical protein interactome for many samples is experimentally infeasible at the moment. For understanding the whole system, however, it is equally important to know how the interactions of proteins are rewired between cellular states. To overcome this deficiency, we recently showed how condition-specific protein interaction networks that even consider alternative splicing can be inferred from transcript expression data. Here, we present the differential network analysis tool PPICompare, which was specifically designed for isoform-sensitive protein interaction networks. Besides detecting significant rewiring events between the interactomes of grouped samples, PPICompare infers which alterations to the transcriptome caused each rewiring event and what is the minimal set of alterations necessary to explain all between-group changes. When applied to the development of blood cells, we verified that a reasonable number of rewiring events were reported by the tool and found that differential gene expression was the major determinant of cellular adjustments to the interactome. Alternative splicing events were consistently necessary in each developmental step to explain all significant alterations and were especially important for rewiring in the context of transcriptional control. Applying PPICompare enabled us to investigate the dynamics of the human protein interactome during developmental transitions. A platform-independent implementation of the tool PPICompare is available at https://sourceforge.net/projects/ppicompare/.
Limited utility of residue masking for positive-selection inference.
Spielman, Stephanie J; Dawson, Eric T; Wilke, Claus O
2014-09-01
Errors in multiple sequence alignments (MSAs) can reduce accuracy in positive-selection inference. Therefore, it has been suggested to filter MSAs before conducting further analyses. One widely used filter, Guidance, allows users to remove MSA positions aligned with low confidence. However, Guidance's utility in positive-selection inference has been disputed in the literature. We have conducted an extensive simulation-based study to characterize fully how Guidance impacts positive-selection inference, specifically for protein-coding sequences of realistic divergence levels. We also investigated whether novel scoring algorithms that phylogenetically correct confidence scores, and a new gap-penalization score-normalization scheme, improved Guidance's performance. We found that no filter, including original Guidance, consistently benefitted positive-selection inferences. Moreover, all improvements detected were exceedingly minimal, and in certain circumstances, Guidance-based filters worsened inferences. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
From pull-down data to protein interaction networks and complexes with biological relevance.
Zhang, Bing; Park, Byung-Hoon; Karpinets, Tatiana; Samatova, Nagiza F
2008-04-01
Recent improvements in high-throughput Mass Spectrometry (MS) technology have expedited genome-wide discovery of protein-protein interactions by providing a capability of detecting protein complexes in a physiological setting. Computational inference of protein interaction networks and protein complexes from MS data is challenging. Advances are required in developing robust and seamlessly integrated procedures for assessment of protein-protein interaction affinities, mathematical representation of protein interaction networks, discovery of protein complexes and evaluation of their biological relevance. A multi-step but easy-to-follow framework for identifying protein complexes from MS pull-down data is introduced. It assesses interaction affinity between two proteins based on similarity of their co-purification patterns derived from MS data. It constructs a protein interaction network by adopting a knowledge-guided threshold selection method. Based on the network, it identifies protein complexes and infers their core components using a graph-theoretical approach. It deploys a statistical evaluation procedure to assess the biological relevance of each found complex. On Saccharomyces cerevisiae pull-down data, the framework outperformed other more complicated schemes by at least 10% in F1-measure and identified 610 protein complexes with high functional homogeneity based on the enrichment in Gene Ontology (GO) annotation. Manual examination of the complexes suggested hypotheses about the causes of false identifications. Namely, co-purification of different protein complexes as mediated by a common non-protein molecule, such as DNA, might be a source of false positives. Protein identification bias in pull-down technology, such as the hydrophilic bias, could result in false negatives.
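The first two steps of the framework, affinity scoring from co-purification patterns and thresholding into a network, can be sketched as follows; the synthetic data, the cosine similarity choice, and the fixed threshold are illustrative stand-ins (the paper selects the threshold with a knowledge-guided method).

```python
# Score protein pairs by similarity of co-purification profiles, then threshold.
import itertools
import numpy as np

rng = np.random.default_rng(5)
n_proteins, n_pulldowns = 50, 30
# profiles[i, k] = 1 if protein i was detected in pull-down experiment k (synthetic)
profiles = (rng.random((n_proteins, n_pulldowns)) < 0.2).astype(float)

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

threshold = 0.5   # illustrative; knowledge-guided in the paper
edges = [(i, j) for i, j in itertools.combinations(range(n_proteins), 2)
         if cosine(profiles[i], profiles[j]) >= threshold]
# `edges` defines the interaction network; dense subgraphs are then extracted
# as candidate complexes and evaluated statistically.
```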
Improved orthologous databases to ease protozoan targets inference.
Kotowski, Nelson; Jardim, Rodrigo; Dávila, Alberto M R
2015-09-29
Homology inference helps in identifying similarities, as well as differences, among organisms, which provides better insight into how closely related one organism might be to another. In addition, comparative genomics pipelines are widely adopted tools designed using different bioinformatics applications and algorithms. In this article, we propose a methodology to build improved orthologous databases with the potential to aid protozoan target identification, one of the many tasks which benefit from comparative genomics tools. Our analyses are based on OrthoSearch, a comparative genomics pipeline originally designed to infer orthologs through protein-profile comparison, supported by an HMM, reciprocal best hits based approach. Our methodology allows OrthoSearch to confront two orthologous databases and to generate an improved new one. Such a database can later be used to infer potential protozoan targets through a similarity analysis against the human genome. The protein sequences of the Cryptosporidium hominis, Entamoeba histolytica and Leishmania infantum genomes were comparatively analyzed against three orthologous databases: (i) EggNOG KOG, (ii) ProtozoaDB and (iii) Kegg Orthology (KO). That allowed us to create two new orthologous databases, "KO + EggNOG KOG" and "KO + EggNOG KOG + ProtozoaDB", with 16,938 and 27,701 orthologous groups, respectively. Such new orthologous databases were used for a regular OrthoSearch run. By confronting the "KO + EggNOG KOG" and "KO + EggNOG KOG + ProtozoaDB" databases and the protozoan species we detected the following numbers of orthologous groups and coverage (the ratio between the inferred orthologous groups and the species' total number of proteins): Cryptosporidium hominis: 1,821 (11 %) and 3,254 (12 %); Entamoeba histolytica: 2,245 (13 %) and 5,305 (19 %); Leishmania infantum: 2,702 (16 %) and 4,760 (17 %). Using our HMM-based methodology and the largest created orthologous database, it was possible to infer 13 orthologous groups which represent potential protozoan targets; these were found thanks to our distant homology approach. We also provide the number of species-specific, pair-to-pair and core groups from such analyses, depicted in Venn diagrams. The orthologous databases generated by our HMM-based methodology provide a broader dataset, with larger numbers of orthologous groups when compared to the original databases used as input. These may be used for several homology inference analyses, annotation tasks and protozoan target identification.
Domain repertoires as a tool to derive protein recognition rules.
Zucconi, A; Panni, S; Paoluzi, S; Castagnoli, L; Dente, L; Cesareni, G
2000-08-25
Several approaches, some of which are described in this issue, have been proposed to assemble a complete protein interaction map. These are often based on high-throughput methods that explore the ability of each gene product to bind any other element of the proteome of the organism. Here we propose that a large number of interactions can be inferred by revealing the rules underlying the recognition specificity of a small number (a few hundred) of families of protein recognition modules. This can be achieved through the construction and characterization of domain repertoires. A domain repertoire is assembled in a combinatorial fashion by allowing each amino acid position in the binding site of a given protein recognition domain to vary so as to include all the residues allowed at that position in the domain family. The repertoire is then searched by phage display techniques with any target of interest, and from the primary structure of the binding site of the selected domains one derives rules that are used to infer the formation of complexes between natural proteins in the cell.
Principal Component Analysis: A Method for Determining the Essential Dynamics of Proteins
David, Charles C.; Jacobs, Donald J.
2015-01-01
It has become commonplace to employ principal component analysis to reveal the most important motions in proteins. This method is more commonly known by its acronym, PCA. While most popular molecular dynamics packages inevitably provide PCA tools to analyze protein trajectories, researchers often make inferences of their results without having insight into how to make interpretations, and they are often unaware of limitations and generalizations of such analysis. Here we review best practices for applying standard PCA, describe useful variants, discuss why one may wish to make comparison studies, and describe a set of metrics that make comparisons possible. In practice, one will be forced to make inferences about the essential dynamics of a protein without having the desired amount of samples. Therefore, considerable time is spent on describing how to judge the significance of results, highlighting pitfalls. The topic of PCA is reviewed from the perspective of many practical considerations, and useful recipes are provided. PMID:24061923
Principal component analysis: a method for determining the essential dynamics of proteins.
David, Charles C; Jacobs, Donald J
2014-01-01
It has become commonplace to employ principal component analysis to reveal the most important motions in proteins. This method is more commonly known by its acronym, PCA. While most popular molecular dynamics packages inevitably provide PCA tools to analyze protein trajectories, researchers often make inferences of their results without having insight into how to make interpretations, and they are often unaware of limitations and generalizations of such analysis. Here we review best practices for applying standard PCA, describe useful variants, discuss why one may wish to make comparison studies, and describe a set of metrics that make comparisons possible. In practice, one will be forced to make inferences about the essential dynamics of a protein without having the desired amount of samples. Therefore, considerable time is spent on describing how to judge the significance of results, highlighting pitfalls. The topic of PCA is reviewed from the perspective of many practical considerations, and useful recipes are provided.
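A bare-bones version of the standard PCA workflow both records describe, applied to a synthetic stand-in for an aligned trajectory:

```python
# PCA of a (pre-aligned) trajectory: center coordinates, diagonalize covariance,
# project onto the top eigenvectors ("essential modes"). Synthetic data only.
import numpy as np

rng = np.random.default_rng(6)
n_frames, n_atoms = 1000, 120
X = rng.normal(size=(n_frames, 3 * n_atoms))      # flattened xyz per frame

Xc = X - X.mean(axis=0)                           # remove the mean structure
cov = Xc.T @ Xc / (n_frames - 1)
evals, evecs = np.linalg.eigh(cov)
order = np.argsort(evals)[::-1]                   # sort by descending variance
evals, evecs = evals[order], evecs[:, order]

explained = evals / evals.sum()                   # variance captured per mode
projections = Xc @ evecs[:, :2]                   # motion along the top two modes
# With few frames relative to 3N coordinates, judge significance carefully,
# e.g. by comparing against randomized or split-half trajectories.
```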
Complete fold annotation of the human proteome using a novel structural feature space
Middleton, Sarah A.; Illuminati, Joseph; Kim, Junhyong
2017-04-13
Recognition of protein structural fold is the starting point for many structure prediction tools and protein function inference. Fold prediction is computationally demanding and recognizing novel folds is difficult, such that the majority of proteins have not been annotated for fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and inference of unknown novel folds. We show that our method achieves better than 94% accuracy even when many folds have only one training example. We demonstrate the utility of this method by predicting the folds of 34,330 human protein domains and showing that these predictions can yield useful insights into potential biological function, such as prediction of RNA-binding ability. Finally, our method can be applied to de novo fold prediction of entire proteomes and identify candidate novel fold families.
Fuertes, Gustavo; Banterle, Niccolò; Ruff, Kiersten M.; Chowdhury, Aritra; Mercadante, Davide; Koehler, Christine; Kachala, Michael; Estrada Girona, Gemma; Milles, Sigrid; Mishra, Ankur; Onck, Patrick R.; Gräter, Frauke; Esteban-Martín, Santiago; Pappu, Rohit V.; Svergun, Dmitri I.; Lemke, Edward A.
2017-01-01
Unfolded states of proteins and native states of intrinsically disordered proteins (IDPs) populate heterogeneous conformational ensembles in solution. The average sizes of these heterogeneous systems, quantified by the radius of gyration (RG), can be measured by small-angle X-ray scattering (SAXS). Another parameter, the mean dye-to-dye distance (RE) for proteins with fluorescently labeled termini, can be estimated using single-molecule Förster resonance energy transfer (smFRET). A number of studies have reported inconsistencies in inferences drawn from the two sets of measurements for the dimensions of unfolded proteins and IDPs in the absence of chemical denaturants. These differences are typically attributed to the influence of fluorescent labels used in smFRET and to the impact of high concentrations and averaging features of SAXS. By measuring the dimensions of a collection of labeled and unlabeled polypeptides using smFRET and SAXS, we directly assessed the contributions of dyes to the experimental values RG and RE. For chemically denatured proteins we obtain mutual consistency in our inferences based on RG and RE, whereas for IDPs under native conditions, we find substantial deviations. Using computations, we show that discrepant inferences are neither due to methodological shortcomings of specific measurements nor due to artifacts of dyes. Instead, our analysis suggests that chemical heterogeneity in heteropolymeric systems leads to a decoupling between RE and RG that is amplified in the absence of denaturants. Therefore, joint assessments of RG and RE combined with measurements of polymer shapes should provide a consistent and complete picture of the underlying ensembles. PMID:28716919
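The RG-RE relationship at issue here has a textbook baseline that is easy to check numerically: for ideal chains the ensemble averages satisfy <RE^2> ≈ 6<RG^2>, and decoupling means departures from such fixed ratios. The freely jointed chains below are a stand-in for real conformational ensembles.

```python
# Freely jointed chains: check the ideal-chain ratio <RE^2> / <RG^2> ≈ 6.
import numpy as np

rng = np.random.default_rng(10)
n_conf, n_bonds = 2000, 100

# Unit bond vectors with random orientations; chains start at the origin.
steps = rng.normal(size=(n_conf, n_bonds, 3))
steps /= np.linalg.norm(steps, axis=2, keepdims=True)
coords = np.cumsum(steps, axis=1)

re2 = (coords[:, -1, :] ** 2).sum(axis=1)              # squared end-to-end distance
centroids = coords.mean(axis=1, keepdims=True)
rg2 = ((coords - centroids) ** 2).sum(axis=2).mean(axis=1)

print(re2.mean() / rg2.mean())   # close to 6 for ideal chains; chemically
                                 # heterogeneous ensembles can deviate (decoupling)
```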
Nanoscale lateral displacement arrays for the separation of exosomes and colloids down to 20 nm
NASA Astrophysics Data System (ADS)
Austin, Robert; Wunsch, Benjamin; Smith, Joshua; Gifford, Stacey; Wang, Chao; Brink, Markus; Bruce, Robert; Stolovitzky, Gustavo; Astier, Yann
Deterministic lateral displacement (DLD) pillar arrays are an efficient technology to sort, separate and enrich micrometre-scale particles, which include parasites, bacteria, blood cells and circulating tumour cells in blood. However, this technology has not been translated to the true nanoscale, where it could function on biocolloids, such as exosomes. Exosomes, a key target of liquid biopsies, are secreted by cells and contain nucleic acid and protein information about their originating tissue. One challenge in the study of exosome biology is to sort exosomes by size and surface markers. We use manufacturable silicon processes to produce nanoscale DLD (nano-DLD) arrays of uniform gap sizes ranging from 25 to 235 nm. We show that at low Péclet (Pe) numbers, at which diffusion and deterministic displacement compete, nano-DLD arrays separate particles between 20 to 110 nm based on size with sharp resolution. Further, we demonstrate the size-based displacement of exosomes, and so open up the potential for on-chip sorting and quantification of these important biocolloids.
Nanoscale lateral displacement arrays for the separation of exosomes and colloids down to 20 nm
NASA Astrophysics Data System (ADS)
Wunsch, Benjamin H.; Smith, Joshua T.; Gifford, Stacey M.; Wang, Chao; Brink, Markus; Bruce, Robert L.; Austin, Robert H.; Stolovitzky, Gustavo; Astier, Yann
2016-11-01
Deterministic lateral displacement (DLD) pillar arrays are an efficient technology to sort, separate and enrich micrometre-scale particles, which include parasites, bacteria, blood cells and circulating tumour cells in blood. However, this technology has not been translated to the true nanoscale, where it could function on biocolloids, such as exosomes. Exosomes, a key target of 'liquid biopsies', are secreted by cells and contain nucleic acid and protein information about their originating tissue. One challenge in the study of exosome biology is to sort exosomes by size and surface markers. We use manufacturable silicon processes to produce nanoscale DLD (nano-DLD) arrays of uniform gap sizes ranging from 25 to 235 nm. We show that at low Péclet (Pe) numbers, at which diffusion and deterministic displacement compete, nano-DLD arrays separate particles between 20 to 110 nm based on size with sharp resolution. Further, we demonstrate the size-based displacement of exosomes, and so open up the potential for on-chip sorting and quantification of these important biocolloids.
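For a feel for the geometry, the widely cited empirical rule for the DLD critical diameter (attributed to Davis; treat the constant and exponent, Dc ≈ 1.4 g ε^0.48 with gap g and row-shift fraction ε, as an assumption recalled from that literature) can be applied to the gap sizes above:

```python
# Back-of-envelope DLD critical diameter; rule of thumb fitted to microscale arrays.
def critical_diameter_nm(gap_nm: float, row_shift_fraction: float) -> float:
    return 1.4 * gap_nm * row_shift_fraction ** 0.48

for gap in (25, 60, 120, 235):
    print(gap, round(critical_diameter_nm(gap, 0.1), 1))
# A 25 nm gap with eps = 0.1 gives Dc on the order of 10 nm, the regime needed to
# displace exosome-sized (20 to 110 nm) particles. At low Peclet numbers diffusion
# competes with displacement, so the sharp cutoff blurs, as the paper discusses.
```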
A deterministic computer simulation model of life-cycle lamb and wool production.
Wang, C T; Dickerson, G E
1991-11-01
A deterministic mathematical computer model was developed to simulate effects on life-cycle efficiency of lamb and wool production from genetic improvement of performance traits under alternative management systems. Genetic input parameters can be varied for age at puberty, length of anestrus, fertility, precocity of fertility, number born, milk yield, mortality, growth rate, body fat, and wool growth. Management options include mating systems, lambing intervals, feeding levels, creep feeding, weaning age, marketing age or weight, and culling policy. Simulated growth of animals is linear from birth to inflection point, then slows asymptotically to specified mature empty BW and fat content when nutrition is not limiting. The ME intake requirement to maintain normal condition is calculated daily or weekly for maintenance, protein and fat deposition, wool growth, gestation, and lactation. Simulated feed intake is the minimum of availability, DM physical limit, or ME physiological limit. Tissue catabolism occurs when intake is below the requirement for essential functions. Mortality increases when BW is depressed. Equations developed for calculations of biological functions were validated with published and unpublished experimental data. Lifetime totals are accumulated for TDN, DM, and protein intake and for market lamb equivalent output values of empty body or carcass lean and wool from both lambs and ewes. These measures of efficiency for combinations of genetic, management, and marketing variables can provide the relative economic weighting of traits needed to derive optimal criteria for genetic selection among and within breeds under defined industry production systems.
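One accounting step of such a life-cycle model can be sketched directly from the description above: sum the ME requirement over functions and take intake as the minimum of availability, the dry-matter physical limit, and the ME physiological limit. All coefficients below are placeholders, not the model's values.

```python
# One weekly energy-accounting step of a deterministic life-cycle model (illustrative).
def weekly_me_requirement(maint, protein_gain, fat_gain, wool, gestation, lactation):
    # Energy costs (MJ ME) per unit of each function; placeholder coefficients.
    return (maint + 50.0 * protein_gain + 53.0 * fat_gain + 24.0 * wool
            + gestation + lactation)

def feed_intake(available, dm_limit, me_limit):
    # Intake is the minimum of availability, physical (DM) and physiological (ME) limits.
    return min(available, dm_limit, me_limit)

req = weekly_me_requirement(maint=35.0, protein_gain=0.15, fat_gain=0.10,
                            wool=0.05, gestation=0.0, lactation=0.0)
intake = feed_intake(available=60.0, dm_limit=55.0, me_limit=req)
catabolism = max(0.0, req - intake)   # tissue is catabolized if intake falls short
```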
USDA-ARS's Scientific Manuscript database
Inferences about lactation responses to diet have been hypothesized to be affected by the use of change-over instead of continuous experimental designs. A direct test of this hypothesis has not been well studied. Additionally, when dietary protein level is changed it must occur through dilution with...
Prediction of kinase-inhibitor binding affinity using energetic parameters
Usha, Singaravelu; Selvaraj, Samuel
2016-01-01
The combination of physicochemical properties and energetic parameters derived from protein-ligand complexes plays a vital role in determining the biological activity of a molecule. In the present work, protein-ligand interaction energy along with logP values was used to predict the experimental log(IC50) values of 25 different kinase-inhibitors using multiple regression, which gave a correlation coefficient of 0.93. The regression equation obtained was tested on 93 kinase-inhibitor complexes and showed an average deviation of 0.92 from the experimental log IC50 values. The same set of descriptors was used to predict binding affinities for a test set of five individual kinase families, with correlation values > 0.9. We show that the protein-ligand interaction energies and partition coefficient values form the major deterministic factors for the binding affinity of a ligand for its receptor. PMID:28149052
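A minimal reconstruction of the regression described above, with synthetic numbers standing in for the 25 kinase-inhibitor descriptors (only the model form, two descriptors in a linear fit, follows the text):

```python
# Two-descriptor multiple linear regression for log(IC50); synthetic data only.
import numpy as np

rng = np.random.default_rng(7)
n = 25
interaction_energy = rng.normal(-50, 10, n)   # hypothetical kcal/mol values
logp = rng.normal(2.5, 1.0, n)
log_ic50 = 0.04 * interaction_energy - 0.3 * logp + rng.normal(0, 0.2, n)

X = np.column_stack([interaction_energy, logp, np.ones(n)])  # with intercept
coef, *_ = np.linalg.lstsq(X, log_ic50, rcond=None)
pred = X @ coef
r = np.corrcoef(pred, log_ic50)[0, 1]   # correlation between fit and data
print(coef, round(r, 2))
```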
The relationship between stochastic and deterministic quasi-steady state approximations.
Kim, Jae Kyoung; Josić, Krešimir; Bennett, Matthew R
2015-11-23
The quasi steady-state approximation (QSSA) is frequently used to reduce deterministic models of biochemical networks. The resulting equations provide a simplified description of the network in terms of non-elementary reaction functions (e.g. Hill functions). Such deterministic reductions are frequently a basis for heuristic stochastic models in which non-elementary reaction functions are used to define reaction propensities. Despite their popularity, it remains unclear when such stochastic reductions are valid. It is frequently assumed that the stochastic reduction can be trusted whenever its deterministic counterpart is accurate. However, a number of recent examples show that this is not necessarily the case. Here we explain the origin of these discrepancies, and demonstrate a clear relationship between the accuracy of the deterministic and the stochastic QSSA for examples widely used in biological systems. With an analysis of a two-state promoter model, and numerical simulations for a variety of other models, we find that the stochastic QSSA is accurate whenever its deterministic counterpart provides an accurate approximation over a range of initial conditions which cover the likely fluctuations from the quasi steady-state (QSS). We conjecture that this relationship provides a simple and computationally inexpensive way to test the accuracy of reduced stochastic models using deterministic simulations. The stochastic QSSA is one of the most popular multi-scale stochastic simulation methods. While the use of QSSA, and the resulting non-elementary functions has been justified in the deterministic case, it is not clear when their stochastic counterparts are accurate. In this study, we show how the accuracy of the stochastic QSSA can be tested using their deterministic counterparts providing a concrete method to test when non-elementary rate functions can be used in stochastic simulations.
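The proposed check is cheap to sketch: run the reduced model both ways, as a deterministic ODE and as a stochastic simulation whose propensity is the same non-elementary (Hill) function, and compare across initial conditions. Parameters below are illustrative.

```python
# Reduced model with Hill-function production: deterministic ODE vs. stochastic SSA.
import math, random

beta, K, n_hill, gamma = 20.0, 10.0, 2.0, 1.0

def hill(x):
    return beta * K**n_hill / (K**n_hill + x**n_hill)

def deterministic(x0, t_end=10.0, dt=1e-3):
    x = x0
    for _ in range(int(t_end / dt)):
        x += dt * (hill(x) - gamma * x)   # forward Euler on the reduced ODE
    return x

def stochastic(x0, t_end=10.0):
    x, t = x0, 0.0
    while t < t_end:
        a1, a2 = hill(x), gamma * x       # Hill function used as a propensity
        t += -math.log(random.random()) / (a1 + a2)
        x += 1 if random.random() < a1 / (a1 + a2) else -1
    return x

random.seed(8)
for x0 in (0, 5, 20, 40):                 # sweep initial conditions, per the paper
    runs = [stochastic(x0) for _ in range(200)]
    print(x0, round(deterministic(x0), 2), round(sum(runs) / len(runs), 2))
```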
Elaziz, Mohamed Abd; Hemdan, Ahmed Monem; Hassanien, AboulElla; Oliva, Diego; Xiong, Shengwu
2017-09-07
The current economics of the fish protein industry demand rapid, accurate and expressive prediction algorithms at every step of protein production, especially given the challenge of global climate change. Such algorithms help to predict and analyze functional and nutritional quality and, consequently, to control food allergies in hyperallergic patients. It is expensive and time-consuming to determine these concentrations through laboratory experiments, especially in large-scale projects. Therefore, this paper introduces a new intelligent algorithm using an adaptive neuro-fuzzy inference system based on the whale optimization algorithm. This algorithm is used to predict the concentration levels of bioactive amino acids in fish protein hydrolysates at different times during the year. The whale optimization algorithm is used to determine the optimal parameters of the adaptive neuro-fuzzy inference system. The results of the proposed algorithm are compared with those of other methods, indicating the higher performance of the proposed algorithm.
Phylogenetic inference under varying proportions of indel-induced alignment gaps
Dwivedi, Bhakti; Gadagkar, Sudhindra R
2009-01-01
Background The effect of alignment gaps on phylogenetic accuracy has been the subject of numerous studies. In this study, we investigated the relationship between the total number of gapped sites and phylogenetic accuracy, when the gaps were introduced (by means of computer simulation) to reflect indel (insertion/deletion) events during the evolution of DNA sequences. The resulting (true) alignments were subjected to commonly used gap treatment and phylogenetic inference methods. Results (1) In general, there was a strong, almost deterministic, relationship between the amount of gap in the data and the level of phylogenetic accuracy when the alignments were very "gappy", (2) gaps resulting from deletions (as opposed to insertions) contributed more to the inaccuracy of phylogenetic inference, (3) the probabilistic methods (Bayesian, PhyML & "MLε," a method implemented in DNAML in PHYLIP) performed better at most levels of gap percentage when compared to parsimony (MP) and distance (NJ) methods, with Bayesian analysis being clearly the best, (4) methods that treat gapped sites as missing data yielded less accurate trees when compared to those that attribute phylogenetic signal to the gapped sites (by coding them as binary presence/absence characters, or as in the MLε method), and (5) in general, the accuracy of phylogenetic inference depended upon the amount of available data when the gaps resulted from mainly deletion events, and the amount of missing data when insertion events were equally likely to have caused the alignment gaps. Conclusion When gaps in an alignment are a consequence of indel events in the evolution of the sequences, the accuracy of phylogenetic analysis is likely to improve if: (1) alignment gaps are categorized as arising from insertion events or deletion events and then treated separately in the analysis, (2) the evolutionary signal provided by indels is harnessed in the phylogenetic analysis, and (3) methods that utilize the phylogenetic signal in indels are developed for distance methods too. When the true homology is known and the amount of gaps is 20 percent of the alignment length or less, the methods used in this study are likely to yield trees with 90-100 percent accuracy. PMID:19698168
NASA Astrophysics Data System (ADS)
Paasche, Hendrik
2018-01-01
Site characterization requires detailed and ideally spatially continuous information about the subsurface. Geophysical tomographic experiments allow for spatially continuous imaging of physical parameter variations, e.g., seismic wave propagation velocities. Such physical parameters are often related to typical geotechnical or hydrological target parameters, e.g., those obtained from 1D direct push or borehole logging. Here, the probabilistic inference of 2D tip resistance, sleeve friction, and relative dielectric permittivity distributions in near-surface sediments is constrained by ill-posed cross-borehole seismic P- and S-wave and radar wave traveltime tomography. In doing so, we follow a discovery-science strategy employing a fully data-driven approach capable of accounting for tomographic ambiguity and differences in spatial resolution between the geophysical tomograms and the geotechnical logging data used for calibration. We compare the outcome to results achieved employing classical hypothesis-driven approaches, i.e., deterministic transfer functions derived empirically for the inference of 2D sleeve friction from S-wave velocity tomograms and theoretically for the inference of 2D dielectric permittivity from radar wave velocity tomograms. The data-driven approach offers maximal flexibility in combination with very relaxed assumptions about the character of the expected links. This makes it a versatile tool applicable to almost any combination of data sets. However, error propagation may be critical and may justify a hypothesis-driven pre-selection of an optimal database, which goes along with the risk of excluding relevant information from the analyses. Results achieved by transfer functions rely on information about the nature of the link and on optimal calibration settings, drawn as retrospective hypotheses by other authors. Applying such transfer functions at other sites turns them into a priori valid hypotheses, which can, particularly for empirically derived transfer functions, result in poor predictions. However, a mindful utilization and critical evaluation of the consequences of turning a retrospectively drawn hypothesis into an a priori valid one can also yield good results for inference and prediction problems when using classical transfer function concepts.
Deterministic Walks with Choice
DOE Office of Scientific and Technical Information (OSTI.GOV)
Beeler, Katy E.; Berenhaut, Kenneth S.; Cooper, Joshua N.
2014-01-10
This paper studies deterministic movement over toroidal grids, integrating local information, bounded memory and choice at individual nodes. The research is motivated by recent work on deterministic random walks, and applications in multi-agent systems. Several results regarding passing tokens through toroidal grids are discussed, as well as some open questions.
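For context, the classic rotor-router walk, a standard deterministic analogue of a random walk, illustrates the kind of local-state movement rule this work builds on (the paper's bounded-memory and choice mechanisms are not reproduced here):

```python
# Rotor-router walk on an n x n torus: each node cycles through its neighbours
# in a fixed order, so the token's route is fully determined by local state.
def rotor_router_walk(n=5, steps=25):
    directions = [(0, 1), (1, 0), (0, -1), (-1, 0)]   # E, S, W, N
    rotor = {}                                        # per-node direction pointer
    x = y = 0
    path = [(x, y)]
    for _ in range(steps):
        k = rotor.get((x, y), 0)
        rotor[(x, y)] = (k + 1) % 4                   # advance the local rotor
        dx, dy = directions[k]
        x, y = (x + dx) % n, (y + dy) % n             # wrap around the torus
        path.append((x, y))
    return path

print(rotor_router_walk())
```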
MacGilvray, Matthew E; Shishkova, Evgenia; Chasman, Deborah; Place, Michael; Gitter, Anthony; Coon, Joshua J; Gasch, Audrey P
2018-05-01
Cells respond to stressful conditions by coordinating a complex, multi-faceted response that spans many levels of physiology. Much of the response is coordinated by changes in protein phosphorylation. Although the regulators of transcriptome changes during stress are well characterized in Saccharomyces cerevisiae, the upstream regulatory network controlling protein phosphorylation is less well dissected. Here, we developed a computational approach to infer the signaling network that regulates phosphorylation changes in response to salt stress. We developed an approach to link predicted regulators to groups of likely co-regulated phospho-peptides responding to stress, thereby creating new edges in a background protein interaction network. We then use integer linear programming (ILP) to integrate wild type and mutant phospho-proteomic data and predict the network controlling stress-activated phospho-proteomic changes. The network we inferred predicted new regulatory connections between stress-activated and growth-regulating pathways and suggested mechanisms coordinating metabolism, cell-cycle progression, and growth during stress. We confirmed several network predictions with co-immunoprecipitations coupled with mass-spectrometry protein identification and mutant phospho-proteomic analysis. Results show that the cAMP-phosphodiesterase Pde2 physically interacts with many stress-regulated transcription factors targeted by PKA, and that reduced phosphorylation of those factors during stress requires the Rck2 kinase that we show physically interacts with Pde2. Together, our work shows how a high-quality computational network model can facilitate discovery of new pathway interactions during osmotic stress.
Nanotransfer and nanoreplication using deterministically grown sacrificial nanotemplates
Melechko, Anatoli V [Oak Ridge, TN]; McKnight, Timothy E.; Guillorn, Michael A.; Ilic, Bojan [Ithaca, NY]; Merkulov, Vladimir I [Knoxville, TN]; Doktycz, Mitchel J [Knoxville, TN]; Lowndes, Douglas H [Knoxville, TN]; Simpson, Michael L [Knoxville, TN]
2011-05-17
Methods, manufactures, machines and compositions are described for nanotransfer and nanoreplication using deterministically grown sacrificial nanotemplates. A method includes depositing a catalyst particle on a surface of a substrate to define a deterministically located position; growing an aligned elongated nanostructure on the substrate, an end of the aligned elongated nanostructure coupled to the substrate at the deterministically located position; coating the aligned elongated nanostructure with a conduit material; removing a portion of the conduit material to expose the catalyst particle; removing the catalyst particle; and removing the elongated nanostructure to define a nanoconduit.
NASA Astrophysics Data System (ADS)
Itoh, Kosuke; Nakada, Tsutomu
2013-04-01
Deterministic nonlinear dynamical processes are ubiquitous in nature. Chaotic sounds generated by such processes may appear irregular and random in waveform, but these sounds are mathematically distinguished from random stochastic sounds in that they contain deterministic short-time predictability in their temporal fine structures. We show that the human brain distinguishes deterministic chaotic sounds from spectrally matched stochastic sounds in neural processing and perception. Deterministic chaotic sounds, even without being attended to, elicited greater cerebral cortical responses than the surrogate control sounds after about 150 ms in latency after sound onset. Listeners also clearly discriminated these sounds in perception. The results support the hypothesis that the human auditory system is sensitive to the subtle short-time predictability embedded in the temporal fine structure of sounds.
A deterministic particle method for one-dimensional reaction-diffusion equations
NASA Technical Reports Server (NTRS)
Mascagni, Michael
1995-01-01
We derive a deterministic particle method for the solution of nonlinear reaction-diffusion equations in one spatial dimension. This deterministic method is an analog of a Monte Carlo method for the solution of these problems that has been previously investigated by the author. The deterministic method leads to the consideration of a system of ordinary differential equations for the positions of suitably defined particles. We then consider explicit and implicit time-stepping methods for this system of ordinary differential equations, and we study Picard and Newton iterations for the solution of the implicit system. Next we solve this system numerically and study the discretization error both analytically and numerically. Numerical computation shows that this deterministic method is automatically adaptive to large gradients in the solution.
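As a rough illustration of what a deterministic particle method looks like in the pure-diffusion limit (the paper also treats the reaction term and implicit time stepping, which this toy omits), particles can be moved along the velocity field v = -D d(ln ρ)/dx obtained from a kernel-density reconstruction of the solution; this is an assumption-laden simplification, not the author's scheme:

```python
import numpy as np

# Sketch of a deterministic particle method for the 1D heat equation
# u_t = D u_xx. Particles move with the velocity field v = -D d/dx(log rho),
# where rho is a Gaussian kernel-density estimate of the particle density,
# advanced by an explicit (forward Euler) step in time.

def density_and_gradient(x, particles, h):
    # Kernel density estimate and its spatial derivative at the points x.
    u = (x[:, None] - particles[None, :]) / h
    k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    rho = k.sum(axis=1) / (len(particles) * h)
    drho = (-u * k).sum(axis=1) / (len(particles) * h**2)
    return rho, drho

D, h, dt = 1.0, 0.3, 1e-3
particles = np.random.normal(0.0, 0.5, size=400)  # initial condition

for _ in range(500):
    rho, drho = density_and_gradient(particles, particles, h)
    particles += dt * (-D * drho / rho)  # explicit step of the particle ODEs

print("sample variance (grows ~2*D*t under diffusion):", particles.var())
```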
Fuertes, Gustavo; Banterle, Niccolò; Ruff, Kiersten M; Chowdhury, Aritra; Mercadante, Davide; Koehler, Christine; Kachala, Michael; Estrada Girona, Gemma; Milles, Sigrid; Mishra, Ankur; Onck, Patrick R; Gräter, Frauke; Esteban-Martín, Santiago; Pappu, Rohit V; Svergun, Dmitri I; Lemke, Edward A
2017-08-01
Unfolded states of proteins and native states of intrinsically disordered proteins (IDPs) populate heterogeneous conformational ensembles in solution. The average sizes of these heterogeneous systems, quantified by the radius of gyration (RG), can be measured by small-angle X-ray scattering (SAXS). Another parameter, the mean dye-to-dye distance (RE) for proteins with fluorescently labeled termini, can be estimated using single-molecule Förster resonance energy transfer (smFRET). A number of studies have reported inconsistencies in inferences drawn from the two sets of measurements for the dimensions of unfolded proteins and IDPs in the absence of chemical denaturants. These differences are typically attributed to the influence of fluorescent labels used in smFRET and to the impact of high concentrations and averaging features of SAXS. By measuring the dimensions of a collection of labeled and unlabeled polypeptides using smFRET and SAXS, we directly assessed the contributions of dyes to the experimental values RG and RE. For chemically denatured proteins we obtain mutual consistency in our inferences based on RG and RE, whereas for IDPs under native conditions, we find substantial deviations. Using computations, we show that discrepant inferences are neither due to methodological shortcomings of specific measurements nor due to artifacts of dyes. Instead, our analysis suggests that chemical heterogeneity in heteropolymeric systems leads to a decoupling between RE and RG that is amplified in the absence of denaturants. Therefore, joint assessments of RG and RE combined with measurements of polymer shapes should provide a consistent and complete picture of the underlying ensembles.
Deterministic and Stochastic Analysis of a Prey-Dependent Predator-Prey System
ERIC Educational Resources Information Center
Maiti, Alakes; Samanta, G. P.
2005-01-01
This paper reports on studies of the deterministic and stochastic behaviours of a predator-prey system with prey-dependent response function. The first part of the paper deals with the deterministic analysis of uniform boundedness, permanence, stability and bifurcation. In the second part the reproductive and mortality factors of the prey and…
ShinyGPAS: interactive genomic prediction accuracy simulator based on deterministic formulas.
Morota, Gota
2017-12-20
Deterministic formulas for the accuracy of genomic predictions highlight the relationships among prediction accuracy and potential factors influencing prediction accuracy prior to performing computationally intensive cross-validation. Visualizing such deterministic formulas in an interactive manner may lead to a better understanding of how genetic factors control prediction accuracy. The software to simulate deterministic formulas for genomic prediction accuracy was implemented in R and encapsulated as a web-based Shiny application. Shiny genomic prediction accuracy simulator (ShinyGPAS) simulates various deterministic formulas and delivers dynamic scatter plots of prediction accuracy versus genetic factors impacting prediction accuracy, while requiring only mouse navigation in a web browser. ShinyGPAS is available at: https://chikudaisei.shinyapps.io/shinygpas/. ShinyGPAS is a Shiny-based interactive genomic prediction accuracy simulator using deterministic formulas. It can be used for interactively exploring potential factors that influence prediction accuracy in genome-enabled prediction, simulating achievable prediction accuracy prior to genotyping individuals, or supporting in-class teaching. ShinyGPAS is open source software and it is hosted online as a freely available web-based resource with an intuitive graphical user interface.
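As an example of the kind of formula ShinyGPAS visualizes, a Daetwyler-type expression relates accuracy to training size, heritability and the number of independent chromosome segments (whether ShinyGPAS uses this exact parameterization is an assumption here):

```python
import numpy as np

# A classic deterministic formula for genomic prediction accuracy
# (Daetwyler-type): r = sqrt(N * h2 / (N * h2 + Me)), where N is the number
# of training individuals, h2 the trait heritability, and Me the number of
# independent chromosome segments. Shown purely as an illustration.

def prediction_accuracy(n_train, h2, m_e):
    return np.sqrt(n_train * h2 / (n_train * h2 + m_e))

for n in (500, 2000, 10000):
    print(n, round(float(prediction_accuracy(n, h2=0.4, m_e=1000.0)), 3))
# Accuracy rises with training size and heritability, saturating toward 1.
```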
Stochastic calculus of protein filament formation under spatial confinement
NASA Astrophysics Data System (ADS)
Michaels, Thomas C. T.; Dear, Alexander J.; Knowles, Tuomas P. J.
2018-05-01
The growth of filamentous aggregates from precursor proteins is a process of central importance to both normal and aberrant biology, for instance as the driver of devastating human disorders such as Alzheimer's and Parkinson's diseases. The conventional theoretical framework for describing this class of phenomena in bulk is based upon the mean-field limit of the law of mass action, which implicitly assumes deterministic dynamics. However, protein filament formation processes under spatial confinement, such as in microdroplets or in the cellular environment, show intrinsic variability due to the molecular noise associated with small-volume effects. To account for this effect, in this paper we introduce a stochastic differential equation approach for investigating protein filament formation processes under spatial confinement. Using this framework, we study the statistical properties of stochastic aggregation curves, as well as the distribution of reaction lag-times. Moreover, we establish the gradual breakdown of the correlation between lag-time and normalized growth rate under spatial confinement. Our results establish the key role of spatial confinement in determining the onset of stochasticity in protein filament formation and offer a formalism for studying protein aggregation kinetics in small volumes in terms of the kinetic parameters describing the aggregation dynamics in bulk.
NASA Astrophysics Data System (ADS)
De Ridder, K.; Bertrand, C.; Casanova, G.; Lefebvre, W.
2012-09-01
Increasingly, mesoscale meteorological and climate models are used to predict urban weather and climate. Yet, large uncertainties remain regarding values of some urban surface properties. In particular, information concerning urban values for thermal roughness length and thermal admittance is scarce. In this paper, we present a method to estimate values for thermal admittance in combination with an optimal scheme for thermal roughness length, based on METEOSAT-8/SEVIRI thermal infrared imagery in conjunction with a deterministic atmospheric model containing a simple urbanized land surface scheme. Given the spatial resolution of the SEVIRI sensor, the resulting parameter values are applicable at scales of the order of 5 km. As a study case we focused on the city of Paris, for the day of 29 June 2006. Land surface temperature was calculated from SEVIRI thermal radiances using a new split-window algorithm specifically designed to handle urban conditions, as described in Appendix A, including a correction for anisotropy effects. Land surface temperature was also calculated in an ensemble of simulations carried out with the ARPS mesoscale atmospheric model, combining different thermal roughness length parameterizations with a range of thermal admittance values. Particular care was taken to spatially match the simulated land surface temperature with the SEVIRI field of view, using the so-called point spread function of the latter. Using Bayesian inference, the best agreement between simulated and observed land surface temperature was obtained for the Zilitinkevich (1970) and Brutsaert (1975) thermal roughness length parameterizations, the latter with the coefficients obtained by Kanda et al. (2007). The retrieved thermal admittance values associated with either thermal roughness parameterization were, respectively, 1843 ± 108 J m^-2 s^-1/2 K^-1 and 1926 ± 115 J m^-2 s^-1/2 K^-1.
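The retrieval step can be caricatured as grid-based Bayesian inference with a Gaussian likelihood comparing simulated and observed land surface temperature; the sketch below uses an invented stand-in forward model (simulate_lst) and made-up numbers rather than ARPS simulations or SEVIRI data:

```python
import numpy as np

# Toy version of the Bayesian retrieval: for a grid of candidate thermal
# admittance values, compare simulated and observed land surface
# temperatures under a Gaussian error model and form a posterior.

def simulate_lst(admittance, hours):
    # Stand-in forward model: higher admittance damps the diurnal cycle.
    return 300.0 + 1.5e4 / admittance * np.sin(np.pi * hours / 12.0)

hours = np.linspace(0, 12, 13)
obs = simulate_lst(1900.0, hours) + np.random.normal(0, 0.5, hours.size)

candidates = np.linspace(1000, 3000, 201)   # J m^-2 s^-1/2 K^-1
log_post = np.array([
    -0.5 * np.sum((simulate_lst(a, hours) - obs) ** 2) / 0.5**2
    for a in candidates
])                                          # flat prior over the grid
post = np.exp(log_post - log_post.max())
post /= post.sum()
mean = np.sum(candidates * post)
std = np.sqrt(np.sum((candidates - mean) ** 2 * post))
print(f"retrieved admittance: {mean:.0f} +/- {std:.0f}")
```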
Engineering molecular machines
NASA Astrophysics Data System (ADS)
Erman, Burak
2016-04-01
Biological molecular motors use chemical energy, mostly in the form of ATP hydrolysis, and convert it to mechanical energy. Correlated thermal fluctuations are essential for the function of a molecular machine and it is the hydrolysis of ATP that modifies the correlated fluctuations of the system. Correlations are consequences of the molecular architecture of the protein. The idea that synthetic molecular machines may be constructed by designing the proper molecular architecture is challenging. In their paper, Sarkar et al (2016 New J. Phys. 18 043006) propose a synthetic molecular motor based on the coarse grained elastic network model of proteins and show by numerical simulations that motor function is realized, ranging from deterministic to thermal, depending on temperature. This work opens up a new range of possibilities of molecular architecture based engine design.
Visualization of newt aragonitic otoconial matrices using transmission electron microscopy
NASA Technical Reports Server (NTRS)
Steyger, P. S.; Wiederhold, M. L.
1995-01-01
Otoconia are calcified protein matrices within the gravity-sensing organs of the vertebrate vestibular system. These protein matrices are thought to originate from the supporting or hair cells in the macula during development. Previous studies of mammalian calcitic, barrel-shaped otoconia revealed an organized protein matrix consisting of a thin peripheral layer, a well-defined organic core and a flocculent matrix in between. No studies have reported the microscopic organization of the aragonitic otoconial matrix, despite its protein characterization. Pote et al. (1993b) used densitometric methods and inferred that prismatic (aragonitic) otoconia have a peripheral protein distribution, compared to that described for the barrel-shaped, calcitic otoconia of birds, mammals, and the amphibian utricle. By using tannic acid as a negative stain, we observed three kinds of organic matrices in preparations of fixed, decalcified saccular otoconia from the adult newt: (1) fusiform shapes with a homogeneous electron-dense matrix; (2) singular and multiple strands of matrix; and (3) more significantly, prismatic shapes outlined by a peripheral organic matrix. These prismatic shapes remain following removal of the gelatinous matrix, revealing an internal array of organic matter. We conclude that prismatic otoconia have a largely peripheral otoconial matrix, as inferred by densitometry.
Combinatorial Labeling Method for Improving Peptide Fragmentation in Mass Spectrometry
NASA Astrophysics Data System (ADS)
Kuchibhotla, Bhanuramanand; Kola, Sankara Rao; Medicherla, Jagannadham V.; Cherukuvada, Swamy V.; Dhople, Vishnu M.; Nalam, Madhusudhana Rao
2017-06-01
Annotation of peptide sequence from tandem mass spectra constitutes the central step of mass spectrometry-based proteomics. Peptide mass spectra are obtained upon gas-phase fragmentation. Identification of the protein from a set of experimental peptide spectral matches is usually referred to as protein inference. Occurrence and intensity of these fragment ions in the MS/MS spectra are dependent on many factors such as amino acid composition, peptide basicity, activation mode, protease, etc. In particular, chemical derivatizations of peptides are known to alter their fragmentation. In this study, the influence of acetylation, guanidinylation, and their combination on peptide fragmentation was assessed initially on a lipase (LipA) from Bacillus subtilis, followed by a bovine six-protein mix digest. The dual modification improved fragment ion occurrence and intensities, resulting in an equivalent representation of b- and y-type fragment ions in an ion trap MS/MS spectrum. The improved representation allowed us to accurately annotate the peptide sequences de novo. Dual labeling also significantly reduced false positive protein identifications in the standard bovine six-protein digest. Our study suggests that combinatorial labeling of peptides is a useful method to validate protein identifications for high-confidence protein inference.
Boissinot, Sylvaine; Erdinger, Monique; Monsion, Baptiste; Ziegler-Graff, Véronique; Brault, Véronique
2014-01-01
Cucurbit aphid-borne yellows virus (CABYV) is a polerovirus (Luteoviridae family) with a capsid composed of the major coat protein and a minor component referred to as the readthrough protein (RT). Two forms of the RT were reported: a full-length protein of 74 kDa detected in infected plants and a truncated form of 55 kDa (RT*) incorporated into virions. Both forms were detected in CABYV-infected plants. To clarify the specific roles of each protein in the viral cycle, we generated by deletion a polerovirus mutant able to synthesize only the RT* which is incorporated into the particle. This mutant was unable to move systemically from inoculated leaves inferring that the C-terminal half of the RT is required for efficient long-distance transport of CABYV. Among a collection of CABYV mutants bearing point mutations in the central domain of the RT, we obtained a mutant impaired in the correct processing of the RT which does not produce the RT*. This mutant accumulated very poorly in upper non-inoculated leaves, suggesting that the RT* has a functional role in long-distance movement of CABYV. Taken together, these results infer that both RT proteins are required for an efficient CABYV movement. PMID:24691251
2012-01-01
Background: ChIP-seq provides new opportunities to study allele-specific protein-DNA binding (ASB). However, detecting allelic imbalance from a single ChIP-seq dataset often has low statistical power since only sequence reads mapped to heterozygote SNPs are informative for discriminating two alleles. Results: We develop a new method iASeq to address this issue by jointly analyzing multiple ChIP-seq datasets. iASeq uses a Bayesian hierarchical mixture model to learn correlation patterns of allele-specificity among multiple proteins. Using the discovered correlation patterns, the model allows one to borrow information across datasets to improve detection of allelic imbalance. Application of iASeq to 77 ChIP-seq samples from 40 ENCODE datasets and 1 genomic DNA sample in GM12878 cells reveals that allele-specificity of multiple proteins are highly correlated, and demonstrates the ability of iASeq to improve allelic inference compared to analyzing each individual dataset separately. Conclusions: iASeq illustrates the value of integrating multiple datasets in the allele-specificity inference and offers a new tool to better analyze ASB. PMID:23194258
Quantitative Proteomics via High Resolution MS Quantification: Capabilities and Limitations
Higgs, Richard E.; Butler, Jon P.; Han, Bomie; Knierman, Michael D.
2013-01-01
Recent improvements in the mass accuracy and resolution of mass spectrometers have led to renewed interest in label-free quantification using data from the primary mass spectrum (MS1) acquired from data-dependent proteomics experiments. The capacity for higher specificity quantification of peptides from samples enriched for proteins of biological interest offers distinct advantages for hypothesis generating experiments relative to immunoassay detection methods or prespecified peptide ions measured by multiple reaction monitoring (MRM) approaches. Here we describe an evaluation of different methods to post-process peptide level quantification information to support protein level inference. We characterize the methods by examining their ability to recover a known dilution of a standard protein in background matrices of varying complexity. Additionally, the MS1 quantification results are compared to a standard, targeted, MRM approach on the same samples under equivalent instrument conditions. We show the existence of multiple peptides with MS1 quantification sensitivity similar to the best MRM peptides for each of the background matrices studied. Based on these results we provide recommendations on preferred approaches to leveraging quantitative measurements of multiple peptides to improve protein level inference. PMID:23710359
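One common peptide-to-protein roll-up of the kind evaluated here is top-n averaging, e.g., the mean of the three most intense peptides per protein; a minimal pandas sketch with toy numbers (and not necessarily the method the paper ends up recommending):

```python
import pandas as pd

# Roll peptide-level MS1 intensities up to a protein-level estimate by
# averaging the top-3 most intense peptides per protein. This is a generic
# post-processing approach, offered purely as an illustration.

peptides = pd.DataFrame({
    "protein":   ["P1", "P1", "P1", "P1", "P2", "P2"],
    "peptide":   ["a", "b", "c", "d", "e", "f"],
    "intensity": [1.0e7, 8.0e6, 5.0e6, 1.0e6, 3.0e6, 2.0e6],
})

top3 = (peptides.sort_values("intensity", ascending=False)
                .groupby("protein").head(3))
protein_quant = top3.groupby("protein")["intensity"].mean()
print(protein_quant)   # P1 averages its three strongest peptides
```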
NASA Astrophysics Data System (ADS)
Kumar, R.; Sulaiman, E.; Soomro, H. A.; Jusoh, L. I.; Bahrim, F. S.; Omar, M. F.
2017-08-01
With recent innovations in, and applications of, high-temperature magnets, the permanent magnet flux switching machine (PMFSM) has become a suitable candidate for offshore drilling, though it is less often considered for downhole use because of the high ambient temperature. This study therefore addresses the design optimization and performance analysis of an external-rotor PMFSM for downhole applications. First, the essential design parameters required for the machine configuration are computed numerically. The design is then optimized using a deterministic technique. Finally, the initial and refined performance of the machine are compared: the output torque is raised from 16.39 Nm to 33.57 Nm, while the cogging torque and permanent magnet weight are reduced to 1.77 Nm and 0.79 kg, respectively. It is concluded that the proposed optimized 12-slot/22-pole external-rotor design is suitable for downhole applications.
Spin-Orbit-Coupled Interferometry with Ring-Trapped Bose-Einstein Condensates
NASA Astrophysics Data System (ADS)
Helm, J. L.; Billam, T. P.; Rakonjac, A.; Cornish, S. L.; Gardiner, S. A.
2018-02-01
We propose a method of atom interferometry using a spinor Bose-Einstein condensate with a time-varying magnetic field acting as a coherent beam splitter. Our protocol creates long-lived superpositional counterflow states, which are of fundamental interest and can be made sensitive to both the Sagnac effect and magnetic fields on the sub-μG scale. We split a ring-trapped condensate, initially in the mf=0 hyperfine state, into superpositions of internal mf=±1 states and condensate superflow, which are spin-orbit coupled. After interrogation, the relative phase accumulation can be inferred from a population transfer to the mf=±1 states. The counterflow generation protocol is adiabatically deterministic and does not rely on coupling to additional optical fields or mechanical stirring techniques. Our protocol can maximize the classical Fisher information for any rotation, magnetic field, or interrogation time and so has the maximum sensitivity available to uncorrelated particles. Precision can increase with the interrogation time and so is limited only by the lifetime of the condensate.
Design of an Ada expert system shell for the VHSIC avionic modular flight processor
NASA Technical Reports Server (NTRS)
Fanning, F. Jesse
1992-01-01
The Embedded Computer System Expert System Shell (ES Shell) is an Ada-based expert system shell developed at the Avionics Laboratory for use on the VHSIC Avionic Modular Processor (VAMP) running under the Ada Avionics Real-Time Software (AARTS) Operating System. The ES Shell provides the interface between the expert system and the avionics environment, and controls execution of the expert system. Testing of the ES Shell in the Avionics Laboratory's Integrated Test Bed (ITB) has demonstrated its ability to control a non-deterministic software application executing on the VAMPs, which can control the ITB's real-time closed-loop aircraft simulation. The results of these tests and the conclusions reached in the design and development of the ES Shell have played an important role in the formulation of the requirements for a production-quality expert system inference engine, an ingredient necessary for the successful use of expert systems on the VAMP embedded avionic flight processor.
Alikhani, Jamal; Takacs, Imre; Al-Omari, Ahmed; Murthy, Sudhir; Massoudieh, Arash
2017-03-01
A parameter estimation framework was used to evaluate the ability of observed data from a full-scale nitrification-denitrification bioreactor to reduce the uncertainty associated with the bio-kinetic and stoichiometric parameters of an activated sludge model (ASM). Samples collected over a period of 150 days from the effluent as well as from the reactor tanks were used. A hybrid genetic algorithm and Bayesian inference were used to perform deterministic and probabilistic parameter estimation, respectively. The main goal was to assess the ability of the data to obtain reliable parameter estimates for a modified version of the ASM. The modified ASM model includes methylotrophic processes, which play the main role in methanol-fed denitrification. Sensitivity analysis was also used to explain the ability of the data to provide information about each of the parameters. The results showed that the uncertainty in the estimates of the most sensitive parameters (including growth rate, decay rate, and yield coefficients) decreased with respect to the prior information.
Mixture models for protein structure ensembles.
Hirsch, Michael; Habeck, Michael
2008-10-01
Protein structure ensembles provide important insight into the dynamics and function of a protein and contain information that is not captured with a single static structure. However, it is not clear a priori to what extent the variability within an ensemble is caused by internal structural changes. Additional variability results from overall translations and rotations of the molecule, and most experimental data do not provide information to relate the structures to a common reference frame. To report meaningful values of intrinsic dynamics, structural precision, conformational entropy, etc., it is therefore important to disentangle local from global conformational heterogeneity. We consider the task of disentangling local from global heterogeneity as an inference problem. We use probabilistic methods to infer from the protein ensemble missing information on reference frames and stable conformational sub-states. To this end, we model a protein ensemble as a mixture of Gaussian probability distributions of either entire conformations or structural segments. We learn these models from a protein ensemble using the expectation-maximization algorithm. Our first model can be used to find multiple conformers in a structure ensemble. The second model partitions the protein chain into locally stable structural segments or core elements and less structured regions typically found in loops. Both models are simple to implement and contain only a single free parameter: the number of conformers or structural segments. Our models can be used to analyse experimental ensembles, molecular dynamics trajectories and conformational change in proteins. The Python source code for protein ensemble analysis is available from the authors upon request.
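A minimal sketch of the first model, fitting a Gaussian mixture by EM to flattened conformation vectors (synthetic coordinates stand in for a real, pre-superposed ensemble; this is not the authors' code):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Treat each conformation as a flattened coordinate vector and fit a
# Gaussian mixture by EM; the number of components (conformers) is the
# single free parameter. Random data stand in for real coordinates.

rng = np.random.default_rng(0)
n_models, n_atoms = 60, 50
ensemble = np.concatenate([
    rng.normal(0.0, 0.3, (n_models // 2, n_atoms * 3)),   # "conformer A"
    rng.normal(1.0, 0.3, (n_models // 2, n_atoms * 3)),   # "conformer B"
])

gmm = GaussianMixture(n_components=2, covariance_type="diag",
                      random_state=0).fit(ensemble)
labels = gmm.predict(ensemble)
print("structures per conformer:", np.bincount(labels))
```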
Inferring ontology graph structures using OWL reasoning.
Rodríguez-García, Miguel Ángel; Hoehndorf, Robert
2018-01-05
Ontologies are representations of a conceptualization of a domain. Traditionally, ontologies in biology were represented as directed acyclic graphs (DAG) which represent the backbone taxonomy and additional relations between classes. These graphs are widely exploited for data analysis in the form of ontology enrichment or computation of semantic similarity. More recently, ontologies are developed in a formal language such as the Web Ontology Language (OWL) and consist of a set of axioms through which classes are defined or constrained. While the taxonomy of an ontology can be inferred directly from the axioms of an ontology as one of the standard OWL reasoning tasks, creating general graph structures from OWL ontologies that exploit the ontologies' semantic content remains a challenge. We developed a method to transform ontologies into graphs using an automated reasoner while taking into account all relations between classes. Searching for (existential) patterns in the deductive closure of ontologies, we can identify relations between classes that are implied but not asserted and generate graph structures that encode for a large part of the ontologies' semantic content. We demonstrate the advantages of our method by applying it to inference of protein-protein interactions through semantic similarity over the Gene Ontology and demonstrate that performance is increased when graph structures are inferred using deductive inference according to our method. Our software and experiment results are available at http://github.com/bio-ontology-research-group/Onto2Graph. Onto2Graph is a method to generate graph structures from OWL ontologies using automated reasoning. The resulting graphs can be used for improved ontology visualization and ontology-based data analysis.
Yu, Xiaoyu; Reva, Oleg N
2018-01-01
Modern phylogenetic studies may benefit from the analysis of complete genome sequences of various microorganisms. Evolutionary inferences based on genome-scale analysis are believed to be more accurate than gene-based alternatives. However, the computational complexity of current phylogenomic procedures, the inappropriateness of standard phylogenetic tools for processing genome-wide data, and the lack of reliable substitution models suited to alignment-free phylogenomic approaches deter microbiologists from using these opportunities. For example, the super-matrix and super-tree approaches of phylogenomics use multiple integrated genomic loci or individual gene-based trees to infer an overall consensus tree. However, these approaches potentially multiply errors of gene annotation and sequence alignment, not to mention the computational complexity and laboriousness of the methods. In this article, we demonstrate that the annotation- and alignment-free comparison of genome-wide tetranucleotide frequencies, termed oligonucleotide usage patterns (OUPs), allowed a fast and reliable inference of phylogenetic trees. These were congruent to the corresponding whole genome super-matrix trees in terms of tree topology when compared with other known approaches including 16S ribosomal RNA and GyrA protein sequence comparison, complete genome-based MAUVE, and CVTree methods. A Web-based program to perform the alignment-free OUP-based phylogenomic inferences was implemented at http://swphylo.bi.up.ac.za/. Applicability of the tool was tested on different taxa from subspecies to intergeneric levels. Distinguishing between closely related taxonomic units may be enhanced by providing the program with alignments of marker protein sequences, e.g., GyrA. PMID:29511354
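The core of the OUP idea, reducing each genome to a normalized tetranucleotide-frequency vector and comparing genomes by vector distance, can be sketched in a few lines; the normalization details of the published method are omitted, and the sequences below are toy data:

```python
from itertools import product
from collections import Counter
import numpy as np

# Alignment-free comparison via tetranucleotide frequencies: each genome
# becomes a 256-dimensional normalized 4-mer frequency vector, and genomes
# are compared by pairwise Euclidean distance. Illustration only.

KMERS = ["".join(p) for p in product("ACGT", repeat=4)]

def oup_vector(sequence):
    counts = Counter(sequence[i:i + 4] for i in range(len(sequence) - 3))
    total = max(sum(counts[k] for k in KMERS), 1)
    return np.array([counts[k] / total for k in KMERS])

genomes = {"g1": "ACGTACGTTTGCA" * 50,
           "g2": "ACGTACGTTTGCC" * 50,
           "g3": "GGGCCCAATTGGA" * 50}
vecs = {name: oup_vector(seq) for name, seq in genomes.items()}
names = list(vecs)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(a, b, round(float(np.linalg.norm(vecs[a] - vecs[b])), 4))
# g1 and g2 come out far closer to each other than either is to g3.
```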
Handfield, Louis-François; Chong, Yolanda T.; Simmons, Jibril; Andrews, Brenda J.; Moses, Alan M.
2013-01-01
Protein subcellular localization has been systematically characterized in budding yeast using fluorescently tagged proteins. Based on the fluorescence microscopy images, subcellular localization of many proteins can be classified automatically using supervised machine learning approaches that have been trained to recognize predefined image classes based on statistical features. Here, we present an unsupervised analysis of protein expression patterns in a set of high-resolution, high-throughput microscope images. Our analysis is based on 7 biologically interpretable features which are evaluated on automatically identified cells, and whose cell-stage dependency is captured by a continuous model for cell growth. We show that it is possible to identify most previously identified localization patterns in a cluster analysis based on these features and that similarities between the inferred expression patterns contain more information about protein function than can be explained by a previous manual categorization of subcellular localization. Furthermore, the inferred cell stage associated with each fluorescence measurement allows us to visualize large groups of proteins entering the bud at specific stages of bud growth. These correspond to proteins localized to organelles, revealing that the organelles must be entering the bud in a stereotypical order. We also identify and organize a smaller group of proteins that show subtle differences in the way they move around the bud during growth. Our results suggest that biologically interpretable features based on explicit models of cell morphology will yield unprecedented power for pattern discovery in high-resolution, high-throughput microscopy images. PMID:23785265
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yang, Y M; Bush, K; Han, B
Purpose: Accurate and fast dose calculation is a prerequisite of precision radiation therapy in modern photon and particle therapy. While Monte Carlo (MC) dose calculation provides high dosimetric accuracy, the drastically increased computational time hinders its routine use. Deterministic dose calculation methods are fast, but problematic in the presence of tissue density inhomogeneity. We leverage the useful features of deterministic methods and MC to develop a hybrid dose calculation platform with autonomous utilization of MC and deterministic calculation depending on the local geometry, for optimal accuracy and speed. Methods: Our platform utilizes a Geant4-based "localized Monte Carlo" (LMC) method that isolates MC dose calculations to only those volumes that have potential for dosimetric inaccuracy. In our approach, additional structures are created encompassing heterogeneous volumes. Deterministic methods calculate dose and energy fluence up to the volume surfaces, where the energy fluence distribution is sampled into discrete histories and transported using MC. Histories exiting the volume are converted back into energy fluence and transported deterministically. By matching boundary conditions at both interfaces, the deterministic dose calculation accounts for dose perturbations "downstream" of localized heterogeneities. Hybrid dose calculation was performed for water and anthropomorphic phantoms. Results: We achieved <1% agreement between deterministic and MC calculations in the water benchmark for photon and proton beams, and dose differences of 2%-15% could be observed in heterogeneous phantoms. The saving in computational time (a factor of ~4-7 compared to a full Monte Carlo dose calculation) was found to be approximately proportional to the volume of the heterogeneous region. Conclusion: Our hybrid dose calculation approach takes advantage of the computational efficiency of deterministic methods and the accuracy of MC, providing a practical tool for high-performance dose calculation in modern RT. The approach is generalizable to all modalities where heterogeneities play a large role, notably particle therapy.
Orenstein, Yaron; Wang, Yuhao; Berger, Bonnie
2016-06-15
Protein-RNA interactions, which play vital roles in many processes, are mediated through both RNA sequence and structure. CLIP-based methods, which measure protein-RNA binding in vivo, suffer from experimental noise and systematic biases, whereas in vitro experiments capture a clearer signal of protein-RNA binding. Among them, RNAcompete provides binding affinities of a specific protein to more than 240,000 unstructured RNA probes in one experiment. The computational challenge is to infer RNA structure- and sequence-based binding models from these data. The state-of-the-art in sequence models, Deepbind, does not model structural preferences. RNAcontext models both sequence and structure preferences, but is outperformed by GraphProt. Unfortunately, GraphProt cannot detect structural preferences from RNAcompete data due to the unstructured nature of the data, as noted by its developers, nor can it be tractably run on the full RNAcompete dataset. We develop RCK, an efficient, scalable algorithm that infers both sequence and structure preferences based on a new k-mer based model. Remarkably, even though RNAcompete data is designed to be unstructured, RCK can still learn structural preferences from it. RCK significantly outperforms both RNAcontext and Deepbind in in vitro binding prediction for 244 RNAcompete experiments. Moreover, RCK is also faster and uses less memory, which enables scalability. While currently on par with existing methods in in vivo binding prediction on a small-scale test, we demonstrate that RCK will increasingly benefit from experimentally measured RNA structure profiles as compared to computationally predicted ones. By running RCK on the entire RNAcompete dataset, we generate and provide as a resource a set of protein-RNA structure-based models on an unprecedented scale. Software and models are freely available at http://rck.csail.mit.edu/. Contact: bab@mit.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
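The flavor of a k-mer based binding model can be conveyed with an additive scoring sketch: a probe's score is the sum of the weights of its overlapping k-mers (RCK additionally conditions each k-mer on structural context, omitted here; the weights below are invented, not learned):

```python
# Toy k-mer based sequence-binding model: the binding score of an RNA
# probe is the sum of the weights of its overlapping k-mers. Purely an
# illustration of the model class, not RCK itself.

K = 4
weights = {"UGCA": 1.2, "GCAU": 0.9, "AUGC": 0.3}  # hypothetical model

def score(rna):
    return sum(weights.get(rna[i:i + K], 0.0)
               for i in range(len(rna) - K + 1))

for probe in ("AAUGCAUU", "AAAACCCC"):
    print(probe, score(probe))
# The first probe contains high-weight k-mers and scores higher.
```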
The past, present and future of cyber-physical systems: a focus on models.
Lee, Edward A
2015-02-26
This paper is about better engineering of cyber-physical systems (CPSs) through better models. Deterministic models have historically proven extremely useful and arguably form the kingpin of the industrial revolution and the digital and information technology revolutions. Key deterministic models that have proven successful include differential equations, synchronous digital logic and single-threaded imperative programs. Cyber-physical systems, however, combine these models in such a way that determinism is not preserved. Two projects show that deterministic CPS models with faithful physical realizations are possible and practical. The first project is PRET, which shows that the timing precision of synchronous digital logic can be practically made available at the software level of abstraction. The second project is Ptides (programming temporally-integrated distributed embedded systems), which shows that deterministic models for distributed cyber-physical systems have practical faithful realizations. These projects are existence proofs that deterministic CPS models are possible and practical.
Hands-on-Entropy, Energy Balance with Biological Relevance
NASA Astrophysics Data System (ADS)
Reeves, Mark
2015-03-01
Entropy changes underlie the physics that dominates biological interactions. Indeed, introductory biology courses often begin with an exploration of the qualities of water that are important to living systems. However, one idea that is not explicitly addressed in most introductory physics or biology textbooks is the important contribution of entropy in driving fundamental biological processes towards equilibrium. From diffusion to cell-membrane formation, to electrostatic binding in protein folding, to the functioning of nerve cells, entropic effects often act to counterbalance deterministic forces such as electrostatic attraction and, in so doing, allow for effective molecular signaling. A small group of biology, biophysics and computer science faculty have worked together for the past five years to develop curricular modules (based on SCALE-UP pedagogy). This has enabled students to create models of stochastic and deterministic processes. Our students are first-year engineering and science students in the calculus-based physics course and they are not expected to know biology beyond the high-school level. In our class, they learn to reduce complex biological processes and structures in order to model them mathematically to account for both deterministic and probabilistic processes. The students test these models in simulations and in laboratory experiments that are biologically relevant, such as diffusion, ionic transport, and ligand-receptor binding. Moreover, the students confront random forces and traditional forces in problems, simulations, and in laboratory exploration throughout the year-long course as they move from traditional kinematics through thermodynamics to electrostatic interactions. This talk will present a number of these exercises, with particular focus on the hands-on experiments done by the students, and will give examples of the tangible material that our students work with throughout the two-semester sequence of their course on introductory physics with a bio focus. Supported by NSF DUE.
Djordjevic, Michael A; Chen, Han Cai; Natera, Siria; Van Noorden, Giel; Menzel, Christian; Taylor, Scott; Renard, Clotilde; Geiger, Otto; Weiller, Georg F
2003-06-01
A proteomic examination of Sinorhizobium meliloti strain 1021 was undertaken using a combination of 2-D gel electrophoresis, peptide mass fingerprinting, and bioinformatics. Our goal was to identify (i) putative symbiosis- or nutrient-stress-specific proteins, (ii) the biochemical pathways active under different conditions, (iii) potential new genes, and (iv) the extent of posttranslational modifications of S. meliloti proteins. In total, we identified the protein products of 810 genes (13.1% of the genome's coding capacity). The 810 genes generated 1,180 gene products, with chromosomal genes accounting for 78% of the gene products identified (18.8% of the chromosome's coding capacity). The activity of 53 metabolic pathways was inferred from bioinformatic analysis of proteins with assigned Enzyme Commission numbers. Of the remaining proteins that did not encode enzymes, ABC-type transporters composed 12.7% and regulatory proteins 3.4% of the total. Proteins with up to seven transmembrane domains were identified in membrane preparations. A total of 27 putative nodule-specific proteins and 35 nutrient-stress-specific proteins were identified and used as a basis to define genes and describe processes occurring in S. meliloti cells in nodules and under stress. Several nodule proteins from the plant host were present in the nodule bacteria preparations. We also identified seven potentially novel proteins not predicted from the DNA sequence. Post-translational modifications such as N-terminal processing could be inferred from the data. The posttranslational addition of UMP to the key regulator of nitrogen metabolism, PII, was demonstrated. This work demonstrates the utility of combining mass spectrometry with protein arraying or separation techniques to identify candidate genes involved in important biological processes and niche occupations that may be intransigent to other methods of gene expression profiling.
Inferring nucleosome positions with their histone mark annotation from ChIP data
Mammana, Alessandro; Vingron, Martin; Chung, Ho-Ryun
2013-01-01
Motivation: The nucleosome is the basic repeating unit of chromatin. It contains two copies each of the four core histones H2A, H2B, H3 and H4 and about 147 bp of DNA. The residues of the histone proteins are subject to numerous post-translational modifications, such as methylation or acetylation. Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is a technique that provides genome-wide occupancy data of these modified histone proteins, and it requires appropriate computational methods. Results: We present NucHunter, an algorithm that uses the data from ChIP-seq experiments directed against many histone modifications to infer positioned nucleosomes. NucHunter annotates each of these nucleosomes with the intensities of the histone modifications. We demonstrate that these annotations can be used to infer nucleosomal states with distinct correlations to underlying genomic features and chromatin-related processes, such as transcriptional start sites, enhancers, elongation by RNA polymerase II and chromatin-mediated repression. Thus, NucHunter is a versatile tool that can be used to predict positioned nucleosomes from a panel of histone modification ChIP-seq experiments and infer distinct histone modification patterns associated to different chromatin states. Availability: The software is available at http://epigen.molgen.mpg.de/nuchunter/. Contact: chung@molgen.mpg.de. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23981350
Stability analysis of multi-group deterministic and stochastic epidemic models with vaccination rate
NASA Astrophysics Data System (ADS)
Wang, Zhi-Gang; Gao, Rui-Mei; Fan, Xiao-Ming; Han, Qi-Xing
2014-09-01
We discuss in this paper a deterministic multi-group MSIR epidemic model with a vaccination rate. The basic reproduction number ℛ0, a key parameter in epidemiology, is a threshold which determines the persistence or extinction of the disease. By using Lyapunov function techniques, we show that if ℛ0 is greater than 1 and the deterministic model obeys some conditions, then the disease will prevail, the infective persists and the endemic state is asymptotically stable in a feasible region. If ℛ0 is less than or equal to 1, then the infective disappears and the disease dies out. In addition, stochastic noise around the endemic equilibrium is added to the deterministic MSIR model, extending the deterministic model to a system of stochastic ordinary differential equations. In the stochastic version, we carry out a detailed analysis of the asymptotic behavior of the stochastic model. Regarding the value of ℛ0, when the stochastic system obeys some conditions and ℛ0 is greater than 1, we deduce that the stochastic system is stochastically asymptotically stable. Finally, the deterministic and stochastic model dynamics are illustrated through computer simulations.
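A single-group toy version of the threshold result can be checked numerically: with vaccination coverage p, the disease-free state has susceptible fraction 1-p, so the effective reproduction number is (1-p)·β/(γ+μ); the sketch below (invented parameters, one group rather than many) integrates the ODEs for coverages on either side of the threshold:

```python
from scipy.integrate import solve_ivp

# Single-group SIR with vaccination of newborns: the infection dies out
# when the effective reproduction number R_eff = (1-p)*beta/(gamma+mu)
# is <= 1 and persists otherwise. Parameters are illustrative only.

def sir(t, y, beta, gamma, mu, p):
    s, i, r = y
    ds = mu * (1 - p) - beta * s * i - mu * s
    di = beta * s * i - (gamma + mu) * i
    dr = mu * p + gamma * i - mu * r
    return [ds, di, dr]

beta, gamma, mu = 0.5, 0.1, 0.01
for p in (0.2, 0.9):                       # vaccination coverage
    r_eff = (1 - p) * beta / (gamma + mu)
    sol = solve_ivp(sir, (0, 2000), [0.99, 0.01, 0.0],
                    args=(beta, gamma, mu, p), rtol=1e-8)
    print(f"p={p}: R_eff={r_eff:.2f}, final infectives={sol.y[1, -1]:.2e}")
# p=0.2 gives R_eff > 1 (endemic); p=0.9 gives R_eff < 1 (extinction).
```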
Effects of delay and noise in a negative feedback regulatory motif
NASA Astrophysics Data System (ADS)
Palassini, Matteo; Dies, Marta
2009-03-01
The small copy number of the molecules involved in gene regulation can induce nontrivial stochastic phenomena such as noise-induced oscillations. An often neglected aspect of regulation dynamics is the delays involved in transcription and translation. Delays introduce analytical and computational complications because the dynamics is non-Markovian. We study the interplay of noise and delays in a negative feedback model of the p53 core regulatory network. Recent experiments have found pronounced oscillations in the concentrations of proteins p53 and Mdm2 in individual cells subjected to DNA damage. Similar oscillations occur in the Hes-1 and NF-κB systems, and in circadian rhythms. Several mechanisms have been proposed to explain this oscillatory behaviour, such as deterministic limit cycles, with and without delay, or noise-induced excursions in excitable models. We consider a generic delayed master equation incorporating the activation of Mdm2 by p53 and the Mdm2-promoted degradation of p53. In the deterministic limit and for large delays, the model shows a Hopf bifurcation. Via exact stochastic simulations, we find strong noise-induced oscillations well outside the limit-cycle region. We propose that this may be a generic mechanism for oscillations in gene regulatory systems.
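In the deterministic limit, the mechanism reduces to a delayed negative-feedback pair of ODEs; a forward-Euler sketch with a history buffer (generic illustrative equations and parameters, not the paper's master equation) shows how a sufficiently long delay can sustain oscillations:

```python
import numpy as np

# Deterministic limit of a delayed negative-feedback loop of the p53-Mdm2
# type: p (p53) activates m (Mdm2) after a delay tau, and m degrades p.
# Integrated by forward Euler with a history buffer for the delayed term.

beta, alpha, k, gamma, tau = 1.0, 1.0, 1.0, 0.2, 5.0
dt, t_end = 0.01, 200.0
n, lag = int(t_end / dt), int(tau / dt)

p, m = np.zeros(n), np.zeros(n)
p[0], m[0] = 0.5, 0.5
for i in range(n - 1):
    p_delayed = p[i - lag] if i >= lag else p[0]   # constant history
    p[i + 1] = p[i] + dt * (beta - alpha * m[i] * p[i])
    m[i + 1] = m[i] + dt * (k * p_delayed - gamma * m[i])

tail = p[int(0.8 * n):]
print(f"late-time p53 range: {tail.min():.2f} .. {tail.max():.2f}")
# A wide late-time range signals sustained oscillations rather than a
# fixed point; shrinking tau collapses the range toward the equilibrium.
```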
Multi-Harmony: detecting functional specificity from sequence alignment
Brandt, Bernd W.; Feenstra, K. Anton; Heringa, Jaap
2010-01-01
Many protein families contain sub-families with functional specialization, such as binding different ligands or being involved in different protein–protein interactions. A small number of amino acids generally determine functional specificity. The identification of these residues can aid the understanding of protein function and help finding targets for experimental analysis. Here, we present multi-Harmony, an interactive web server for detecting sub-type-specific sites in proteins starting from a multiple sequence alignment. Combining our Sequence Harmony (SH) and multi-Relief (mR) methods in one web server allows simultaneous analysis and comparison of specificity residues; furthermore, both methods have been significantly improved and extended. SH has been extended to cope with more than two sub-groups. mR has been changed from a sampling implementation to a deterministic one, making it more consistent and user friendly. For both methods Z-scores are reported. The multi-Harmony web server produces a dynamic output page, which includes interactive connections to the Jalview and Jmol applets, thereby allowing interactive analysis of the results. Multi-Harmony is available at http://www.ibi.vu.nl/programs/shmrwww. PMID:20525785
Using neighborhood cohesiveness to infer interactions between protein domains.
Segura, Joan; Sorzano, C O S; Cuenca-Alba, Jesus; Aloy, Patrick; Carazo, J M
2015-08-01
In recent years, large-scale studies have been undertaken to describe, at least partially, protein-protein interaction maps, or interactomes, for a number of relevant organisms, including human. However, current interactomes provide a somehow limited picture of the molecular details involving protein interactions, mostly because essential experimental information, especially structural data, is lacking. Indeed, the gap between structural and interactomics information is enlarging and thus, for most interactions, key experimental information is missing. We elaborate on the observation that many interactions between proteins involve a pair of their constituent domains and, thus, the knowledge of how protein domains interact adds very significant information to any interactomic analysis. In this work, we describe a novel use of the neighborhood cohesiveness property to infer interactions between protein domains given a protein interaction network. We have shown that some clustering coefficients can be extended to measure a degree of cohesiveness between two sets of nodes within a network. Specifically, we used the meet/min coefficient to measure the proportion of interacting nodes between two sets of nodes and the fraction of common neighbors. This approach extends previous works where homolog coefficients were first defined around network nodes and later around edges. The proposed approach substantially increases both the number of predicted domain-domain interactions as well as its accuracy as compared with current methods. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
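The meet/min measure itself is simple: for two node sets it is the size of the intersection of their neighborhoods divided by the size of the smaller neighborhood; a self-contained sketch on a toy network (not the paper's interactome):

```python
# Meet/min neighborhood cohesiveness between two sets of nodes A and B
# (e.g., proteins carrying two domains of interest): the overlap of their
# neighborhoods relative to the smaller one. Toy data for illustration.

network = {                       # adjacency as a dict of sets
    "p1": {"p3", "p4"}, "p2": {"p3", "p4", "p5"},
    "p3": {"p1", "p2"}, "p4": {"p1", "p2"}, "p5": {"p2"},
}

def neighbours(nodes):
    return set().union(*(network[n] for n in nodes))

def meet_min(a, b):
    na, nb = neighbours(a), neighbours(b)
    return len(na & nb) / min(len(na), len(nb))

print(meet_min({"p1"}, {"p2"}))   # 1.0: p1's neighbourhood lies in p2's
```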
Bacterial growth laws reflect the evolutionary importance of energy efficiency.
Maitra, Arijit; Dill, Ken A
2015-01-13
We are interested in the balance of energy and protein synthesis in bacterial growth. How has evolution optimized this balance? We describe an analytical model that leverages extensive literature data on growth laws to infer the underlying fitness landscape and to draw inferences about what evolution has optimized in Escherichia coli. Is E. coli optimized for growth speed, energy efficiency, or some other property? Experimental data show that at its replication speed limit, E. coli produces about four mass equivalents of nonribosomal proteins for every mass equivalent of ribosomes. This ratio can be explained if the cell's fitness function is the energy efficiency of cells under fast growth conditions, indicating a tradeoff between the high energy costs of ribosomes under fast growth and the high energy costs of turning over nonribosomal proteins under slow growth. This model gives insight into some of the complex nonlinear relationships between energy utilization and ribosomal and nonribosomal production as a function of cell growth conditions.
NovelFam3000 – Uncharacterized human protein domains conserved across model organisms
Kemmer, Danielle; Podowski, Raf M; Arenillas, David; Lim, Jonathan; Hodges, Emily; Roth, Peggy; Sonnhammer, Erik LL; Höög, Christer; Wasserman, Wyeth W
2006-01-01
Background: Despite significant efforts from the research community, an extensive portion of the proteins encoded by human genes lack an assigned cellular function. Most metazoan proteins are composed of structural and/or functional domains, of which many appear in multiple proteins. Once a domain is characterized in one protein, the presence of a similar sequence in an uncharacterized protein serves as a basis for inference of function. Thus knowledge of a domain's function, or the protein within which it arises, can facilitate the analysis of an entire set of proteins. Description: From the Pfam domain database, we extracted uncharacterized protein domains represented in proteins from humans, worms, and flies. A data centre was created to facilitate the analysis of the uncharacterized domain-containing proteins. The centre both provides researchers with links to dispersed internet resources containing gene-specific experimental data and enables them to post relevant experimental results or comments. For each human gene in the system, a characterization score is posted, allowing users to track the progress of characterization over time or to identify for study uncharacterized domains in well-characterized genes. As a test of the system, a subset of 39 domains was selected for analysis and the experimental results posted to the NovelFam3000 system. For 25 human protein members of these 39 domain families, detailed sub-cellular localizations were determined. Specific observations are presented based on the analysis of the integrated information provided through the online NovelFam3000 system. Conclusion: Consistent experimental results between multiple members of a domain family allow for inferences of the domain's functional role. We unite bioinformatics resources and experimental data in order to accelerate the functional characterization of scarcely annotated domain families. PMID:16533400
Modeling gene regulatory networks: A network simplification algorithm
NASA Astrophysics Data System (ADS)
Ferreira, Luiz Henrique O.; de Castro, Maria Clicia S.; da Silva, Fabricio A. B.
2016-12-01
Boolean networks have been used for some time to model Gene Regulatory Networks (GRNs), which describe cell functions. Such models can help biologists make predictions, prognoses and even devise specialized treatments when a disturbance of the GRN leads to a disease state. However, the amount of information related to a GRN can be huge, making the task of inferring its Boolean network representation quite a challenge. The method shown here takes into account information about the interactome to build a network, where each node represents a protein, and uses the entropy of each node as a key to reduce the size of the network, allowing the further inference process to focus only on the main protein hubs, the ones with most potential to interfere in overall network behavior.
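A minimal sketch of the entropy-based pruning idea: estimate each node's Shannon entropy from sampled Boolean states and discard near-constant nodes, so that inference focuses on the variable hubs (the threshold and data below are invented for illustration):

```python
import numpy as np

# Entropy-guided network simplification: nodes whose Boolean state barely
# varies across samples carry little information for inference and can be
# pruned before fitting a Boolean network model.

def shannon_entropy(states):
    p1 = states.mean()
    if p1 in (0.0, 1.0):
        return 0.0
    return -(p1 * np.log2(p1) + (1 - p1) * np.log2(1 - p1))

rng = np.random.default_rng(1)
samples = {
    "hub_a": rng.integers(0, 2, 100),           # highly variable
    "hub_b": (rng.random(100) < 0.3).astype(int),
    "house": np.ones(100, dtype=int),           # constant: uninformative
}

kept = {name: round(shannon_entropy(s), 3)
        for name, s in samples.items() if shannon_entropy(s) > 0.1}
print(kept)   # the constant node is pruned from the network
```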
Song, Jia; Zheng, Sisi; Nguyen, Nhung; Wang, Youjun; Zhou, Yubin; Lin, Kui
2017-10-03
Because phylogenetic inference is an important basis for answering many evolutionary problems, a large number of algorithms have been developed. Some of these algorithms have been improved by integrating gene evolution models with the expectation of accommodating the hierarchy of evolutionary processes. To the best of our knowledge, however, there still is no single unifying model or algorithm that can take all evolutionary processes into account through a stepwise or simultaneous method. On the basis of three existing phylogenetic inference algorithms, we built an integrated pipeline for inferring the evolutionary history of a given gene family; this pipeline can model gene sequence evolution, gene duplication-loss, gene transfer and multispecies coalescent processes. As a case study, we applied this pipeline to the STIMATE (TMEM110) gene family, which has recently been reported to play an important role in store-operated Ca2+ entry (SOCE) mediated by ORAI and STIM proteins. We inferred their phylogenetic trees in 69 sequenced chordate genomes. By integrating three tree reconstruction algorithms with diverse evolutionary models, a pipeline for inferring the evolutionary history of a gene family was developed, and its application was demonstrated.
Mirroring co-evolving trees in the light of their topologies.
Hajirasouliha, Iman; Schönhuth, Alexander; de Juan, David; Valencia, Alfonso; Sahinalp, S Cenk
2012-05-01
Determining the interaction partners among protein/domain families poses hard computational problems, in particular in the presence of paralogous proteins. Available approaches aim to identify interaction partners among protein/domain families through maximizing the similarity between trimmed versions of their phylogenetic trees. Since maximization of any natural similarity score is computationally difficult, many approaches employ heuristics to evaluate the distance matrices corresponding to the tree topologies in question. In this article, we devise an efficient deterministic algorithm which directly maximizes the similarity between two leaf labeled trees with edge lengths, obtaining a score-optimal alignment of the two trees in question. Our algorithm is significantly faster than those methods based on distance matrix comparison: 1 min on a single processor versus 730 h on a supercomputer. Furthermore, we outperform the current state-of-the-art exhaustive search approach in terms of precision, while incurring acceptable losses in recall. A C implementation of the method demonstrated in this article is available at http://compbio.cs.sfu.ca/mirrort.htm
Modelling toxin effects on protein biosynthesis in eukaryotic cells.
Skakauskas, Vladas; Katauskis, Pranas
2017-08-01
We present a rather generic model for toxin (ricin) inhibition of protein biosynthesis in eukaryotic cells. We also study reduction of the toxic effects of ricin by applying antibodies against the RTB subunit of ricin molecules. Both species are initially delivered extracellularly. The model accounts for pinocytotic and receptor-mediated toxin endocytosis and for exocytotic removal of intact toxin out of the cell. The model also includes lysosomal toxin destruction, transport of intact toxin to the endoplasmic reticulum (ER), where its molecules separate into the RTA and RTB subunits, and translocation of the RTA chain into the cytosol. In the cytosol, one portion of the RTA undergoes degradation via ERAD; the remaining portion can inactivate ribosomes at a high rate. The model is based on a system of deterministic ODEs. The influence of the kinetic parameters on the protein concentration and antibody protection factor is studied in detail. Copyright © 2017 Elsevier Ltd. All rights reserved.
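A compartmental sketch in the same deterministic-ODE spirit. The compartments are condensed and every rate constant is an illustrative assumption, not the authors' parameterization:

```python
# Condensed toy: extracellular toxin T, internalized toxin E, cytosolic
# RTA A, and the fraction of active ribosomes R. Rates are hypothetical.
from scipy.integrate import solve_ivp

def rhs(t, y, k_in=0.05, k_out=0.01, k_lys=0.02, k_er=0.03,
        k_erad=0.1, k_inact=0.5):
    T, E, A, R = y
    dT = -k_in * T + k_out * E                  # endocytosis vs exocytosis
    dE = k_in * T - (k_out + k_lys + k_er) * E  # lysosomal loss, ER routing
    dA = k_er * E - k_erad * A                  # RTA to cytosol, ERAD removal
    dR = -k_inact * A * R                       # RTA inactivates ribosomes
    return [dT, dE, dA, dR]

sol = solve_ivp(rhs, (0.0, 500.0), [1.0, 0.0, 0.0, 1.0])
print("fraction of ribosomes still active:", float(sol.y[3, -1]))
```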
Lin, Xiaotong; Liu, Mei; Chen, Xue-wen
2009-04-29
Protein-protein interactions play vital roles in nearly all cellular processes and are involved in the construction of biological pathways such as metabolic and signal transduction pathways. Although large-scale experiments have enabled the discovery of thousands of previously unknown linkages among proteins in many organisms, the high-throughput interaction data is often associated with high error rates. Since protein interaction networks have been utilized in numerous biological inferences, these inherent experimental errors inevitably affect the quality of such inferences. Thus, it is essential to assess the quality of the protein interaction data. In this paper, a novel Bayesian network-based integrative framework is proposed to assess the reliability of protein-protein interactions. We develop a cross-species in silico model that assigns likelihood scores to individual protein pairs based on the information entirely extracted from model organisms. Our proposed approach integrates multiple microarray datasets and novel features derived from gene ontology. Furthermore, the confidence scores for cross-species protein mappings are explicitly incorporated into our model. Applying our model to predict protein interactions in the human genome, we achieve 80% sensitivity and 70% specificity. Finally, we assess the overall quality of the experimentally determined yeast protein-protein interaction dataset. We observe that the more high-throughput experiments confirming an interaction, the higher the likelihood score, which confirms the effectiveness of our approach. This study demonstrates that model organisms certainly provide important information for protein-protein interaction inference and assessment. The proposed method is able to assess not only the overall quality of an interaction dataset, but also the quality of individual protein-protein interactions. We expect the method to continually improve as more high-quality interaction data from more model organisms become available; it is readily scalable to a genome-wide application.
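A naive-Bayes stand-in for the scoring idea (the paper uses a full Bayesian network; the feature set and likelihood values below are hypothetical):

```python
# Combine independent evidence features into a log-likelihood ratio that a
# reported interaction is real. P(feature | true) and P(feature | false)
# are assumed values in the spirit of microarray/GO-derived features.
import math

FEATURES = {
    "coexpressed":      (0.60, 0.10),
    "shared_GO_term":   (0.70, 0.20),
    "ortholog_support": (0.50, 0.05),
}

def log_likelihood_ratio(observed):
    """Sum log-odds over observed binary features of a protein pair."""
    llr = 0.0
    for name, present in observed.items():
        p_true, p_false = FEATURES[name]
        if present:
            llr += math.log(p_true / p_false)
        else:
            llr += math.log((1 - p_true) / (1 - p_false))
    return llr

pair = {"coexpressed": True, "shared_GO_term": True, "ortholog_support": False}
print(f"log-likelihood ratio: {log_likelihood_ratio(pair):.2f}")
```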
2010-01-01
Background Comparative genomics methods such as phylogenetic profiling can mine powerful inferences from inherently noisy biological data sets. We introduce Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL), a method that applies the Partial Phylogenetic Profiling (PPP) approach locally within a protein sequence to discover short sequence signatures associated with functional sites. The approach is based on the basic scoring mechanism employed by PPP, namely the use of binomial distribution statistics to optimize sequence similarity cutoffs during searches of partitioned training sets. Results Here we illustrate and validate the ability of the SIMBAL method to find functionally relevant short sequence signatures by application to two well-characterized protein families. In the first example, we partitioned a family of ABC permeases using a metabolic background property (urea utilization). Thus, the TRUE set for this family comprised members whose genome of origin encoded a urea utilization system. By moving a sliding window across the sequence of a permease, and searching each subsequence in turn against the full set of partitioned proteins, the method found which local sequence signatures best correlated with the urea utilization trait. Mapping of SIMBAL "hot spots" onto crystal structures of homologous permeases reveals that the significant sites are gating determinants on the cytosolic face rather than, say, docking sites for the substrate-binding protein on the extracellular face. In the second example, we partitioned a protein methyltransferase family using gene proximity as a criterion. In this case, the TRUE set comprised those methyltransferases encoded near the gene for the substrate RF-1. SIMBAL identifies sequence regions that map onto the substrate-binding interface while ignoring regions involved in the methyltransferase reaction mechanism in general. Neither method for training set construction requires any prior experimental characterization. Conclusions SIMBAL shows that, in functionally divergent protein families, selected short sequences often significantly outperform their full-length parent sequence for making functional predictions by sequence similarity, suggesting avenues for improved functional classifiers. When combined with structural data, SIMBAL affords the ability to localize and model functional sites. PMID:20102603
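A sketch of the binomial scoring step that PPP/SIMBAL-style methods rely on, under simplifying assumptions; the hit counts and background rate are hypothetical:

```python
# For one sliding window, suppose its similarity search returns n top hits,
# k of which come from the TRUE partition (e.g., genomes encoding urea
# utilization); score the window by the binomial tail probability.
import math
from scipy.stats import binom

def window_score(n_hits, k_true, p_background):
    """-log10 of P(X >= k_true) for X ~ Binomial(n_hits, p_background)."""
    return -math.log10(binom.sf(k_true - 1, n_hits, p_background))

# Hypothetical window: 8 of the 10 best hits are TRUE against a 30%
# background rate of the trait among all training genomes.
print(f"window score: {window_score(10, 8, 0.3):.1f}")
```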
Bistability: Requirements on Cell-Volume, Protein Diffusion, and Thermodynamics
Endres, Robert G.
2015-01-01
Bistability is considered widespread among bacteria and eukaryotic cells, useful, e.g., for enzyme induction, bet hedging, and epigenetic switching. However, this phenomenon has mostly been described with deterministic dynamic or well-mixed stochastic models. Here, we map known biological bistable systems onto the well-characterized biochemical Schlögl model, using analytical calculations and stochastic spatiotemporal simulations. In addition to network architecture and strong thermodynamic driving away from equilibrium, we show that bistability requires fine-tuning towards small cell volumes (or compartments) and fast protein diffusion (well mixing). Bistability is thus fragile and hence may be restricted to small bacteria and eukaryotic nuclei, with switching triggered by volume changes during the cell cycle. For large volumes, single cells generally lose their ability for bistable switching and instead undergo a first-order phase transition. PMID:25874711
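The Schlögl model mentioned above is small enough to simulate directly. A Gillespie stochastic simulation sketch with a classic bistable parameter set (assumed here for illustration; species A and B are held constant):

```python
# Schlögl model: A + 2X -> 3X, 3X -> A + 2X, B -> X, X -> B.
import numpy as np

rng = np.random.default_rng(0)

def propensities(x):
    return np.array([
        0.03 * x * (x - 1) / 2,            # A + 2X -> 3X
        1e-4 * x * (x - 1) * (x - 2) / 6,  # 3X -> A + 2X
        200.0,                             # B -> X
        3.5 * x,                           # X -> B
    ])

STOICH = np.array([+1, -1, +1, -1])

def ssa(x0=250, t_end=20.0):
    t, x = 0.0, x0
    while t < t_end:
        a = propensities(x)
        a0 = a.sum()
        t += rng.exponential(1.0 / a0)           # time to next reaction
        x += STOICH[rng.choice(4, p=a / a0)]     # which reaction fired
    return x

print("final copy number:", ssa())
```

Runs started near the unstable intermediate state settle onto either the low (~85) or high (~565) copy-number branch, and the switching behavior depends on system size, echoing the volume dependence discussed in the abstract.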
Deterministic and stochastic CTMC models from Zika disease transmission
NASA Astrophysics Data System (ADS)
Zevika, Mona; Soewono, Edy
2018-03-01
Zika infection is one of the most important mosquito-borne diseases in the world. Zika virus (ZIKV) is transmitted by many Aedes-type mosquitoes including Aedes aegypti. Pregnant women with the Zika virus are at risk of having a fetus or infant with a congenital defect and suffering from microcephaly. Here, we formulate a Zika disease transmission model using two approaches, a deterministic model and a continuous-time Markov chain stochastic model. The basic reproduction ratio is constructed from a deterministic model. Meanwhile, the CTMC stochastic model yields an estimate of the probability of extinction and outbreaks of Zika disease. Dynamical simulations and analysis of the disease transmission are shown for the deterministic and stochastic models.
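A minimal host-vector sketch of the deterministic half of such a study (a generic SIR/SI system with hypothetical parameters, not the authors' equations); the basic reproduction ratio comes from the next-generation matrix:

```python
# SIR humans coupled to SI mosquitoes, all state variables as fractions.
# Parameters (transmission rates, recovery, vector mortality, vectors per
# human) are illustrative assumptions.
import numpy as np
from scipy.integrate import solve_ivp

b_hv, b_vh, gamma, mu, m = 0.3, 0.3, 0.1, 0.1, 2.0

def rhs(t, y):
    S, I, R, Sv, Iv = y
    new_h = b_hv * m * Iv * S      # human infections from vectors
    new_v = b_vh * I * Sv          # vector infections from humans
    return [-new_h, new_h - gamma * I, gamma * I,
            mu - new_v - mu * Sv, new_v - mu * Iv]

R0 = np.sqrt((b_hv * m / gamma) * (b_vh / mu))
sol = solve_ivp(rhs, (0, 400), [0.999, 0.001, 0.0, 1.0, 0.0])
print(f"R0 = {R0:.2f}, final human attack rate = {sol.y[2, -1]:.2f}")
```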
Distinguishing between stochasticity and determinism: Examples from cell cycle duration variability.
Pearl Mizrahi, Sivan; Sandler, Oded; Lande-Diner, Laura; Balaban, Nathalie Q; Simon, Itamar
2016-01-01
We describe a recent approach for distinguishing between stochastic and deterministic sources of variability, focusing on the mammalian cell cycle. Variability between cells is often attributed to stochastic noise, although it may be generated by deterministic components. Interestingly, lineage information can be used to distinguish stochastic from deterministic variability. Analysis of correlations in mammalian cell cycle duration within a lineage revealed its deterministic nature. Here, we discuss the sources of such variability and the possibility that the underlying deterministic process is due to the circadian clock. Finally, we discuss the "kicked cell cycle" model and its implications for the study of the cell cycle in healthy and cancerous tissues. © 2015 WILEY Periodicals, Inc.
Robustness of Reconstructed Ancestral Protein Functions to Statistical Uncertainty.
Eick, Geeta N; Bridgham, Jamie T; Anderson, Douglas P; Harms, Michael J; Thornton, Joseph W
2017-02-01
Hypotheses about the functions of ancient proteins and the effects of historical mutations on them are often tested using ancestral protein reconstruction (APR)-phylogenetic inference of ancestral sequences followed by synthesis and experimental characterization. Usually, some sequence sites are ambiguously reconstructed, with two or more statistically plausible states. The extent to which the inferred functions and mutational effects are robust to uncertainty about the ancestral sequence has not been studied systematically. To address this issue, we reconstructed ancestral proteins in three domain families that have different functions, architectures, and degrees of uncertainty; we then experimentally characterized the functional robustness of these proteins when uncertainty was incorporated using several approaches, including sampling amino acid states from the posterior distribution at each site and incorporating the alternative amino acid state at every ambiguous site in the sequence into a single "worst plausible case" protein. In every case, qualitative conclusions about the ancestral proteins' functions and the effects of key historical mutations were robust to sequence uncertainty, with similar functions observed even when scores of alternate amino acids were incorporated. There was some variation in quantitative descriptors of function among plausible sequences, suggesting that experimentally characterizing robustness is particularly important when quantitative estimates of ancient biochemical parameters are desired. The worst plausible case method appears to provide an efficient strategy for characterizing the functional robustness of ancestral proteins to large amounts of sequence uncertainty. Sampling from the posterior distribution sometimes produced artifactually nonfunctional proteins for sequences reconstructed with substantial ambiguity. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Network-based function prediction and interactomics: the case for metabolic enzymes.
Janga, S C; Díaz-Mejía, J Javier; Moreno-Hagelsieb, G
2011-01-01
As sequencing technologies increase in power, determining the functions of unknown proteins encoded by the DNA sequences so produced becomes a major challenge. Functional annotation is commonly done on the basis of amino-acid sequence similarity alone. Long after sequence similarity becomes undetectable by pair-wise comparison, profile-based identification of homologs can often succeed due to the conservation of position-specific patterns, important for a protein's three-dimensional folding and function. Nevertheless, prediction of protein function from homology-driven approaches is not without problems. Homologous proteins might evolve different functions and the power of homology detection has already started to reach its maximum. Computational methods for inferring protein function, which exploit the context of a protein in cellular networks, have come to be built on top of homology-based approaches. These network-based functional inference techniques provide a first-hand hint of a protein's functional role and offer complementary insights to traditional methods for understanding the function of uncharacterized proteins. Most recent network-based approaches aim to integrate diverse kinds of functional interactions to boost both coverage and confidence level. These techniques not only promise to solve the moonlighting aspect of proteins by annotating proteins with multiple functions, but also increase our understanding of the interplay between different functional classes in a cell. In this article we review the state of the art in network-based function prediction and describe some of the underlying difficulties and successes. Given the volume of high-throughput data that is being reported, the time is ripe to employ these network-based approaches, which can be used to unravel the functions of the uncharacterized proteins accumulating in the genomic databases. © 2010 Elsevier Inc. All rights reserved.
Wallqvist, Anders; Wang, Hao; Zavaljevski, Nela; Memišević, Vesna; Kwon, Keehwan; Pieper, Rembert; Rajagopala, Seesandra V; Reifman, Jaques
2017-01-01
Coxiella burnetii is an obligate Gram-negative intracellular pathogen and the etiological agent of Q fever. Successful infection requires a functional Type IV secretion system, which translocates more than 100 effector proteins into the host cytosol to establish the infection, restructure the intracellular host environment, and create a parasitophorous vacuole where the replicating bacteria reside. We used yeast two-hybrid (Y2H) screening of 33 selected C. burnetii effectors against whole genome human and murine proteome libraries to generate a map of potential host-pathogen protein-protein interactions (PPIs). We detected 273 unique interactions between 20 pathogen and 247 human proteins, and 157 between 17 pathogen and 137 murine proteins. We used orthology to combine the data and create a single host-pathogen interaction network containing 415 unique interactions between 25 C. burnetii and 363 human proteins. We further performed complementary pairwise Y2H testing of 43 out of 91 C. burnetii-human interactions involving five pathogen proteins. We used the combined data to 1) perform enrichment analyses of target host cellular processes and pathways, 2) examine effectors with known infection phenotypes, and 3) infer potential mechanisms of action for four effectors with uncharacterized functions. The host-pathogen interaction profiles supported known Coxiella phenotypes, such as adapting cell morphology through cytoskeletal re-arrangements, protein processing and trafficking, organelle generation, cholesterol processing, innate immune modulation, and interactions with the ubiquitin and proteasome pathways. The generated dataset of PPIs-the largest collection of unbiased Coxiella host-pathogen interactions to date-represents a rich source of information with respect to secreted pathogen effector proteins and their interactions with human host proteins.
Guymon, Gary L.; Yen, Chung-Cheng
1990-01-01
The applicability of a deterministic-probabilistic model for predicting water tables in southern Owens Valley, California, is evaluated. The model is based on a two-layer deterministic model that is cascaded with a two-point probability model. To reduce the potentially large number of uncertain variables in the deterministic model, lumping of uncertain variables was evaluated by sensitivity analysis to reduce the total number of uncertain variables to three variables: hydraulic conductivity, storage coefficient or specific yield, and source-sink function. Results demonstrate that lumping of uncertain parameters reduces computational effort while providing sufficient precision for the case studied. Simulated spatial coefficients of variation for water table temporal position in most of the basin are small, which suggests that deterministic models can predict water tables in these areas with good precision. However, in several important areas where pumping occurs or the geology is complex, the simulated spatial coefficients of variation are overestimated by the two-point probability method.
NASA Astrophysics Data System (ADS)
Guymon, Gary L.; Yen, Chung-Cheng
1990-07-01
The applicability of a deterministic-probabilistic model for predicting water tables in southern Owens Valley, California, is evaluated. The model is based on a two-layer deterministic model that is cascaded with a two-point probability model. To reduce the potentially large number of uncertain variables in the deterministic model, lumping of uncertain variables was evaluated by sensitivity analysis to reduce the total number of uncertain variables to three variables: hydraulic conductivity, storage coefficient or specific yield, and source-sink function. Results demonstrate that lumping of uncertain parameters reduces computational effort while providing sufficient precision for the case studied. Simulated spatial coefficients of variation for water table temporal position in most of the basin are small, which suggests that deterministic models can predict water tables in these areas with good precision. However, in several important areas where pumping occurs or the geology is complex, the simulated spatial coefficients of variation are overestimated by the two-point probability method.
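A sketch of the two-point probability (point-estimate) scheme such deterministic-probabilistic cascades use, reduced to the three uncertain variables named above; the response function and parameter values are stand-ins, not the Owens Valley model:

```python
# Rosenblueth-style two-point estimate: evaluate the deterministic model at
# mean +/- one standard deviation for each uncertain input, then combine
# the 2^n runs into output moments.
import itertools
import numpy as np

def water_table(K, S, Q):
    """Hypothetical deterministic response to conductivity, storage, source."""
    return 10.0 + 2.0 * np.log(K) - 5.0 * S + 0.1 * Q

means = {"K": 5.0, "S": 0.2, "Q": 30.0}
stds  = {"K": 1.0, "S": 0.05, "Q": 10.0}

outputs = []
for signs in itertools.product((-1, +1), repeat=3):
    args = {k: means[k] + s * stds[k] for k, s in zip(means, signs)}
    outputs.append(water_table(**args))

mean, std = np.mean(outputs), np.std(outputs)
print(f"water table: mean {mean:.2f}, coeff. of variation {std / mean:.3f}")
```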
NASA Technical Reports Server (NTRS)
Bollman, W. E.; Chadwick, C.
1982-01-01
A number of interplanetary missions now being planned involve placing deterministic maneuvers along the flight path to alter the trajectory. Lee and Boain (1973) examined the statistics of trajectory correction maneuver (TCM) magnitude with no deterministic ('bias') component. The Delta v vector magnitude statistics were generated for several values of random Delta v standard deviations using expansions in terms of infinite hypergeometric series. The present investigation uses a different technique (Monte Carlo simulation) to generate Delta v magnitude statistics for a wider selection of random Delta v standard deviations and also extends the analysis to the case of nonzero deterministic Delta v's. These Delta v magnitude statistics are plotted parametrically. The plots are useful in assisting the analyst in quickly answering questions about the statistics of Delta v magnitude for single TCMs consisting of both a deterministic and a random component. The plots provide quick insight into the nature of the Delta v magnitude distribution for the TCM.
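The Monte Carlo approach described is easy to reproduce in outline: sample the random component around the deterministic bias and read off magnitude percentiles. The bias and sigma values below are illustrative assumptions:

```python
# A TCM modeled as a fixed deterministic vector plus an isotropic random
# component; the magnitude distribution is estimated by sampling.
import numpy as np

rng = np.random.default_rng(1)

def dv_magnitude_stats(bias, sigma, n=100_000):
    """Percentiles of |bias + e|, with e ~ N(0, sigma^2 * I3)."""
    samples = bias + rng.normal(0.0, sigma, size=(n, 3))
    mags = np.linalg.norm(samples, axis=1)
    return np.percentile(mags, [50, 95, 99])

bias = np.array([2.0, 0.0, 0.0])   # deterministic component, m/s
p50, p95, p99 = dv_magnitude_stats(bias, sigma=0.5)
print(f"|dv| m/s: median {p50:.2f}, 95% {p95:.2f}, 99% {p99:.2f}")
```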
Inferring High-Confidence Human Protein-Protein Interactions
2012-01-01
comprised proteins that had the same specific function or were subunits of the same protein complex, such as branched chain keto acid E1 alpha (BCKDHA) and branched chain keto acid E1 beta (BCKDHB) [3,29], and dynein cytoplasmic 2 intermediate chain 1 (D2LIC) and dynein cytoplasmic 2 heavy chain 1. [A table fragment listing interaction scores for BCKDHA, BCKDHB, and ARTN could not be reconstructed and is omitted.]
Despite the identification of MYCN amplification as an adverse prognostic marker in neuroblastoma, MYCN inhibitors have yet to be developed. Here, by integrating evidence from a whole-genome shRNA library screen and the computational inference of master regulator proteins, we identify transcription factor activating protein 4 (TFAP4) as a critical effector of MYCN amplification in neuroblastoma, providing a novel synthetic lethal target.
NASA Astrophysics Data System (ADS)
García, Constantino A.; Otero, Abraham; Félix, Paulo; Presedo, Jesús; Márquez, David G.
2018-07-01
In the past few decades, it has been recognized that 1/f fluctuations are ubiquitous in nature. The most widely used mathematical models to capture the long-term memory properties of 1/f fluctuations have been stochastic fractal models. However, physical systems do not usually consist of just stochastic fractal dynamics, but they often also show some degree of deterministic behavior. The present paper proposes a model based on fractal stochastic and deterministic components that can provide a valuable basis for the study of complex systems with long-term correlations. The fractal stochastic component is assumed to be a fractional Brownian motion process and the deterministic component is assumed to be a band-limited signal. We also provide a method that, under the assumptions of this model, is able to characterize the fractal stochastic component and to provide an estimate of the deterministic components present in a given time series. The method is based on a Bayesian wavelet shrinkage procedure that exploits the self-similar properties of the fractal processes in the wavelet domain. This method has been validated over simulated signals and over real signals with economical and biological origin. Real examples illustrate how our model may be useful for exploring the deterministic-stochastic duality of complex systems, and uncovering interesting patterns present in time series.
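A sketch of the separation idea using a standard wavelet soft threshold rather than the paper's Bayesian shrinkage; the band-limited signal, the white-noise stand-in for the stochastic component, and all tuning choices are assumptions:

```python
# Wavelet shrinkage toy: the smooth, band-limited deterministic component
# survives thresholding while broadband noise is suppressed.
import numpy as np
import pywt

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 1024)
deterministic = np.sin(2 * np.pi * 5 * t)          # band-limited component
signal = deterministic + rng.normal(0, 0.4, t.size)

coeffs = pywt.wavedec(signal, "db4", level=6)
sigma = np.median(np.abs(coeffs[-1])) / 0.6745      # noise scale, finest level
thresh = sigma * np.sqrt(2 * np.log(signal.size))   # universal threshold
coeffs = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft")
                        for c in coeffs[1:]]
estimate = pywt.waverec(coeffs, "db4")[:t.size]
rmse = float(np.sqrt(np.mean((estimate - deterministic) ** 2)))
print(f"RMS error of recovered deterministic component: {rmse:.3f}")
```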
Ibrahim, Ahmad M.; Wilson, Paul P.H.; Sawan, Mohamed E.; ...
2015-06-30
The CADIS and FW-CADIS hybrid Monte Carlo/deterministic techniques dramatically increase the efficiency of neutronics modeling, but their use in the accurate design analysis of very large and geometrically complex nuclear systems has been limited by the large number of processors and memory requirements for their preliminary deterministic calculations and final Monte Carlo calculation. Three mesh adaptivity algorithms were developed to reduce the memory requirements of CADIS and FW-CADIS without sacrificing their efficiency improvement. First, a macromaterial approach enhances the fidelity of the deterministic models without changing the mesh. Second, a deterministic mesh refinement algorithm generates meshes that capture as much geometric detail as possible without exceeding a specified maximum number of mesh elements. Finally, a weight window coarsening algorithm decouples the weight window mesh and energy bins from the mesh and energy group structure of the deterministic calculations in order to remove the memory constraint of the weight window map from the deterministic mesh resolution. The three algorithms were used to enhance an FW-CADIS calculation of the prompt dose rate throughout the ITER experimental facility. Using these algorithms resulted in a 23.3% increase in the number of mesh tally elements in which the dose rates were calculated in a 10-day Monte Carlo calculation and, additionally, increased the efficiency of the Monte Carlo simulation by a factor of at least 3.4. The three algorithms enabled this difficult calculation to be accurately solved using an FW-CADIS simulation on a regular computer cluster, eliminating the need for a world-class supercomputer.
Improving ground-penetrating radar data in sedimentary rocks using deterministic deconvolution
Xia, J.; Franseen, E.K.; Miller, R.D.; Weis, T.V.; Byrnes, A.P.
2003-01-01
Resolution is key to confidently identifying unique geologic features using ground-penetrating radar (GPR) data. Source wavelet "ringing" (related to bandwidth) in a GPR section limits resolution because of wavelet interference, and can smear reflections in time and/or space. The resultant potential for misinterpretation limits the usefulness of GPR. Deconvolution offers the ability to compress the source wavelet and improve temporal resolution. Unlike statistical deconvolution, deterministic deconvolution is mathematically simple and stable while providing the highest possible resolution because it uses the source wavelet unique to the specific radar equipment. Source wavelets generated in, transmitted through and acquired from air allow successful application of deterministic approaches to wavelet suppression. We demonstrate the validity of using a source wavelet acquired in air as the operator for deterministic deconvolution in a field application using "400-MHz" antennas at a quarry site characterized by interbedded carbonates with shale partings. We collected GPR data on a bench adjacent to cleanly exposed quarry faces in which we placed conductive rods to provide conclusive groundtruth for this approach to deconvolution. The best deconvolution results, which are confirmed by the conductive rods for the 400-MHz antenna tests, were observed for wavelets acquired when the transmitter and receiver were separated by 0.3 m. Applying deterministic deconvolution to GPR data collected in sedimentary strata at our study site resulted in an improvement in resolution (50%) and improved spatial location (0.10-0.15 m) of geologic features compared to the same data processed without deterministic deconvolution. The effectiveness of deterministic deconvolution for increased resolution and spatial accuracy of specific geologic features is further demonstrated by comparing results of deconvolved data with nondeconvolved data acquired along a 30-m transect immediately adjacent to a fresh quarry face. The results at this site support using deterministic deconvolution, which incorporates the GPR instrument's unique source wavelet, as a standard part of routine GPR data processing. © 2003 Elsevier B.V. All rights reserved.
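A frequency-domain sketch of deterministic deconvolution with a water-level stabilizer; the ringing wavelet and two-reflector model are synthetic assumptions, not the field data:

```python
# Divide the trace spectrum by the measured source-wavelet spectrum,
# flooring the wavelet power to stabilize the division where the wavelet
# carries little energy.
import numpy as np

def deterministic_deconv(trace, wavelet, water_level=0.01):
    n = len(trace)
    T, W = np.fft.rfft(trace, n), np.fft.rfft(wavelet, n)
    power = np.abs(W) ** 2
    floor = water_level * power.max()
    return np.fft.irfft(T * np.conj(W) / np.maximum(power, floor), n)

dt = 1e-10                                   # 0.1 ns sampling, GPR-like
t = np.arange(256) * dt
wavelet = np.sin(2 * np.pi * 400e6 * t) * np.exp(-t / 2e-9)  # ringing source
refl = np.zeros(256)
refl[60], refl[80] = 1.0, -0.7               # two closely spaced interfaces
trace = np.convolve(refl, wavelet)[:256]

spikes = np.argsort(np.abs(deterministic_deconv(trace, wavelet)))[-2:]
print("recovered reflector samples:", sorted(int(i) for i in spikes))
```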
Expansion or extinction: deterministic and stochastic two-patch models with Allee effects.
Kang, Yun; Lanchier, Nicolas
2011-06-01
We investigate the impact of Allee effect and dispersal on the long-term evolution of a population in a patchy environment. Our main focus is on whether a population already established in one patch either successfully invades an adjacent empty patch or undergoes a global extinction. Our study is based on the combination of analytical and numerical results for both a deterministic two-patch model and a stochastic counterpart. The deterministic model has either two, three or four attractors. The existence of a regime with exactly three attractors only appears when patches have distinct Allee thresholds. In the presence of weak dispersal, the analysis of the deterministic model shows that a high-density and a low-density populations can coexist at equilibrium in nearby patches, whereas the analysis of the stochastic model indicates that this equilibrium is metastable, thus leading after a large random time to either a global expansion or a global extinction. Up to some critical dispersal, increasing the intensity of the interactions leads to an increase of both the basin of attraction of the global extinction and the basin of attraction of the global expansion. Above this threshold, for both the deterministic and the stochastic models, the patches tend to synchronize as the intensity of the dispersal increases. This results in either a global expansion or a global extinction. For the deterministic model, there are only two attractors, while the stochastic model no longer exhibits a metastable behavior. In the presence of strong dispersal, the limiting behavior is entirely determined by the value of the Allee thresholds as the global population size in the deterministic and the stochastic models evolves as dictated by their single-patch counterparts. For all values of the dispersal parameter, Allee effects promote global extinction in terms of an expansion of the basin of attraction of the extinction equilibrium for the deterministic model and an increase of the probability of extinction for the stochastic model.
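A minimal deterministic sketch of such a two-patch system with distinct Allee thresholds (the cubic growth form and all constants are illustrative assumptions):

```python
# Each patch has an Allee threshold a_i (growth is negative below it), and
# dispersal D couples the patches.
from scipy.integrate import solve_ivp

a1, a2, D = 0.2, 0.3, 0.05   # Allee thresholds and dispersal intensity

def rhs(t, y):
    u, v = y
    fu = u * (u - a1) * (1.0 - u)
    fv = v * (v - a2) * (1.0 - v)
    return [fu + D * (v - u), fv + D * (u - v)]

# One patch established (u = 1), the neighbor empty: does invasion succeed?
sol = solve_ivp(rhs, (0.0, 400.0), [1.0, 0.0], rtol=1e-8)
print("final densities:", sol.y[:, -1].round(3))
```

Varying D reproduces the regimes described above: weak dispersal can hold a high-density/low-density pair of equilibria, while stronger dispersal synchronizes the patches into joint expansion or extinction.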
NASA Astrophysics Data System (ADS)
Camilloni, Carlo; Broglia, Ricardo A.; Tiana, Guido
2011-01-01
The study of the mechanism which is at the basis of the phenomenon of protein folding requires the knowledge of multiple folding trajectories under biological conditions. Using a biasing molecular-dynamics algorithm based on the physics of the ratchet-and-pawl system, we carry out all-atom, explicit solvent simulations of the sequence of folding events which proteins G, CI2, and ACBP undergo in evolving from the denatured to the folded state. Starting from highly disordered conformations, the algorithm allows the proteins to reach, at the price of a modest computational effort, nativelike conformations, within a root mean square deviation (RMSD) of approximately 1 Å. A scheme is developed to extract, from the myriad of events, information concerning the sequence of native contact formation and of their eventual correlation. Such an analysis indicates that all the studied proteins fold hierarchically, through pathways which, although not deterministic, are well-defined with respect to the order of contact formation. The algorithm also allows one to study unfolding, a process which looks, to a large extent, like the reverse of the major folding pathway. This is also true in situations in which many pathways contribute to the folding process, like in the case of protein G.
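The ratchet-and-pawl bias itself is compact enough to sketch in isolation: zero energy while the reaction coordinate improves on its best value so far, harmonic otherwise. Everything else (force field, RMSD computation) is omitted here, and the spring constant and trajectory values are hypothetical:

```python
def ratchet_bias(rmsd, best_so_far, k=10.0):
    """Return (bias energy, updated best RMSD reached so far)."""
    if rmsd <= best_so_far:
        return 0.0, rmsd                        # pawl slips forward: no penalty
    return 0.5 * k * (rmsd - best_so_far) ** 2, best_so_far

best = float("inf")
for rmsd in [12.0, 10.5, 11.0, 9.8, 10.2]:      # hypothetical trajectory values
    energy, best = ratchet_bias(rmsd, best)
    print(f"rmsd {rmsd:4.1f} A -> bias energy {energy:5.2f}")
```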
Wenger, Yvan; Galliot, Brigitte
2013-01-01
Phenotypic traits derive from the selective recruitment of genetic materials over macroevolutionary times, and protein-coding genes constitute an essential component of these materials. We took advantage of the recent production of genomic scale data from sponges and cnidarians, sister groups from eumetazoans and bilaterians, respectively, to date the emergence of human proteins and to infer the timing of acquisition of novel traits through metazoan evolution. Comparing the proteomes of 23 eukaryotes, we find that 33% of human proteins have an ortholog in nonmetazoan species. This premetazoan proteome associates with 43% of all annotated human biological processes. Subsequently, four major waves of innovations can be inferred in the last common ancestors of eumetazoans, bilaterians, euteleostomi (bony vertebrates), and hominidae, largely specific to each epoch, whereas early branching deuterostome and chordate phyla show very few innovations. Interestingly, groups of proteins that act together in their modern human functions often originated concomitantly, although the corresponding human phenotypes frequently emerged later. For example, the three cnidarians Acropora, Nematostella, and Hydra express a highly similar protein inventory, and their protein innovations can be affiliated either to traits shared by all eumetazoans (gut differentiation, neurogenesis); or to bilaterian traits present in only some cnidarians (eyes, striated muscle); or to traits not identified yet in this phylum (mesodermal layer, endocrine glands). The variable correspondence between phenotypes predicted from protein enrichments and observed phenotypes suggests that a parallel mechanism repeatedly produces similar phenotypes, thanks to novel regulatory events that independently tie together preexisting conserved genetic modules. PMID:24065732
Van Coillie, Samya; Liang, Lunxi; Zhang, Yao; Wang, Huanbin; Fang, Jing-Yuan; Xu, Jie
2016-04-05
High-throughput methods such as co-immunoprecipitation mass spectrometry (coIP-MS) and yeast two-hybrid (Y2H) screening have suggested a broad range of unannotated protein-protein interactions (PPIs), and interpretation of these PPIs remains a challenging task. Advances in cancer genomics allow for the inference of "coactivation pairs" in cancer, which may facilitate the identification of PPIs involved in cancer. Here we present OncoBinder as a tool for the assessment of proteomic interaction data based on the functional synergy of oncoproteins in cancer. This decision tree-based method combines gene mutation, copy number and mRNA expression information to infer the functional status of protein-coding genes. We applied OncoBinder to evaluate the potential binders of EGFR and ERK2 proteins based on the gastric cancer dataset of The Cancer Genome Atlas (TCGA). As a result, OncoBinder identified high confidence interactions (annotated by Kyoto Encyclopedia of Genes and Genomes (KEGG) or validated by low-throughput assays) more efficiently than the co-expression based method. Taken together, our results suggest that evaluation of gene functional synergy in cancer may facilitate the interpretation of proteomic interaction data. The OncoBinder toolbox for Matlab is freely accessible online.
Identifying cooperative transcriptional regulations using protein–protein interactions
Nagamine, Nobuyoshi; Kawada, Yuji; Sakakibara, Yasubumi
2005-01-01
Cooperative transcriptional activations among multiple transcription factors (TFs) are important for understanding the mechanisms of complex transcriptional regulation in eukaryotes. Previous studies have attempted to find cooperative TFs based on gene expression data with gene expression profiles as a measure of similarity of gene regulations. In this paper, we use protein–protein interaction data to infer synergistic binding of cooperative TFs. Our fundamental idea is based on the assumption that genes contributing to a similar biological process are regulated under the same control mechanism. First, the protein–protein interaction networks are used to calculate the similarity of biological processes among genes. Second, we integrate this similarity and the chromatin immuno-precipitation data to identify cooperative TFs. Our computational experiments in yeast show that predictions made by our method have successfully identified eight pairs of cooperative TFs that have literature evidence but could not be identified by the previous method. Further, 12 new possible pairs have been inferred and we have examined their biological relevance. However, since protein–protein interaction data typically contain many false positives, we propose a method that combines various biological data to increase prediction accuracy. PMID:16126847
2013-01-01
Background In recent years, various types of cellular networks have penetrated biology and are now used ubiquitously for studying eukaryotic and prokaryotic organisms. Still, the relation and the biological overlap among phenomenological and inferential gene networks, e.g., between the protein interaction network and the gene regulatory network inferred from large-scale transcriptomic data, are largely unexplored. Results We provide in this study an in-depth analysis of the structural, functional and chromosomal relationship between a protein-protein network, a transcriptional regulatory network and an inferred gene regulatory network, for S. cerevisiae and E. coli. Further, we study global and local aspects of these networks and their biological information overlap by comparing, e.g., the functional co-occurrence of Gene Ontology terms by exploiting the available interaction structure among the genes. Conclusions Although the individual networks represent different levels of cellular interactions with global structural and functional dissimilarities, we observe crucial functions of their network interfaces for the assembly of protein complexes, proteolysis, transcription, translation, metabolic and regulatory interactions. Overall, our results shed light on the integrability of these networks and their interfacing biological processes. PMID:23663484
Cohen-Gihon, Inbar; Fong, Jessica H.; Sharan, Roded; Nussinov, Ruth
2012-01-01
Most eukaryotic proteins are composed of two or more domains. These assemble in a modular manner to create new proteins usually by the acquisition of one or more domains to an existing protein. Promiscuous domains which are found embedded in a variety of proteins and co-exist with many other domains are of particular interest and were shown to have roles in signaling pathways and mediating network communication. The evolution of domain promiscuity is still an open problem, mostly due to the lack of sequenced ancestral genomes. Here we use inferred domain architectures of ancestral genomes to trace the evolution of domain promiscuity in eukaryotic genomes. We find an increase in average promiscuity along many branches of the eukaryotic tree. Moreover, domain promiscuity can proceed at almost a steady rate over long evolutionary time or exhibit lineage-specific acceleration. We also observe that many signaling and regulatory domains gained domain promiscuity around the Bilateria divergence. In addition we show that those domains that played a role in the creation of two body axes and existed before the divergence of the bilaterians from fungi/metazoan achieve a boost in their promiscuities during the bilaterian evolution. PMID:21127809
Nadzirin, Nurul; Firdaus-Raih, Mohd
2012-10-08
Proteins of uncharacterized function form a large part of many of the currently available biological databases, and this situation exists even in the Protein Data Bank (PDB). Our analysis of recent PDB data revealed that only 42.53% of PDB entries (1084 coordinate files) that were categorized under "unknown function" are true examples of proteins of unknown function at this point in time. The remaining 1465 entries annotated as such appear amenable to annotation re-assessment, based on the availability of direct functional characterization experiments for the protein itself, or for homologous sequences or structures, thus enabling computational function inference.
Boulila, Moncef; Ben Tiba, Sawssen; Jilani, Saoussen
2013-04-01
The sequence alignments of five Tunisian isolates of Prunus necrotic ringspot virus (PNRSV) were searched for evidence of recombination and diversifying selection. Since failing to account for recombination can elevate the false positive error rate in positive selection inference, a genetic algorithm (GARD) was used first and led to the detection of potential recombination events in the coat protein-encoding gene of that virus. The Recco algorithm confirmed these results by identifying, additionally, the potential recombinants. For neutrality testing and evaluation of nucleotide polymorphism in the PNRSV CP gene, Tajima's D, and Fu and Li's D and F statistical tests were used. Regarding selection inference, eight algorithms (SLAC, FEL, IFEL, REL, FUBAR, MEME, PARRIS, and GA branch) incorporated in the HyPhy package were used to assess the selection pressure exerted on the expression of the PNRSV capsid. Inferred phylogenies pointed out, in addition to the three classical groups (PE-5, PV-32, and PV-96), the delineation of a fourth cluster having the new proposed designation SW6, and a fifth clade comprising four Tunisian PNRSV isolates which underwent recombination and selective pressure and to which the name Tunisian outgroup was allocated.
Estimating the epidemic threshold on networks by deterministic connections
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Kezan, E-mail: lkzzr@sohu.com; Zhu, Guanghu; Fu, Xinchu
2014-12-15
For many epidemic networks some connections between nodes are treated as deterministic, while the remainder are random and have different connection probabilities. By applying spectral analysis to several constructed models, we find that one can estimate the epidemic thresholds of these networks by investigating information from only the deterministic connections. Nonetheless, in these models, generic nonuniform stochastic connections and heterogeneous community structure are also considered. The estimation of epidemic thresholds is achieved via inequalities with upper and lower bounds, which are found to be in very good agreement with numerical simulations. Since these deterministic connections are easier to detect than the stochastic connections, this work provides a feasible and effective method to estimate the epidemic thresholds in real epidemic networks.
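A sketch of the spectral estimate at the core of this approach: for SIS-type dynamics the epidemic threshold scales as the inverse of the adjacency spectral radius, so evaluating it on the deterministic connections alone gives a usable bound. The small graph below is an illustrative assumption:

```python
import numpy as np

A_det = np.array([            # deterministic (always-present) connections
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

lam_max = max(np.linalg.eigvalsh(A_det))   # spectral radius (symmetric A)
print(f"epidemic threshold estimate (beta/gamma)_c ~ 1/lambda_max "
      f"= {1.0 / lam_max:.3f}")
```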
Experimental demonstration on the deterministic quantum key distribution based on entangled photons.
Chen, Hua; Zhou, Zhi-Yuan; Zangana, Alaa Jabbar Jumaah; Yin, Zhen-Qiang; Wu, Juan; Han, Yun-Guang; Wang, Shuang; Li, Hong-Wei; He, De-Yong; Tawfeeq, Shelan Khasro; Shi, Bao-Sen; Guo, Guang-Can; Chen, Wei; Han, Zheng-Fu
2016-02-10
As an important resource, entanglement light sources have been used in developing quantum information technologies, such as quantum key distribution (QKD). There are few experiments implementing entanglement-based deterministic QKD protocols since the security of existing protocols may be compromised in lossy channels. In this work, we report on a loss-tolerant deterministic QKD experiment which follows a modified "Ping-Pong" (PP) protocol. The experimental results demonstrate for the first time that a secure deterministic QKD session can be fulfilled in a channel with an optical loss of 9 dB, based on a telecom-band entangled photon source. This exhibits a conceivable prospect of utilizing entanglement light sources in real-life fiber-based quantum communications.
Experimental demonstration on the deterministic quantum key distribution based on entangled photons
Chen, Hua; Zhou, Zhi-Yuan; Zangana, Alaa Jabbar Jumaah; Yin, Zhen-Qiang; Wu, Juan; Han, Yun-Guang; Wang, Shuang; Li, Hong-Wei; He, De-Yong; Tawfeeq, Shelan Khasro; Shi, Bao-Sen; Guo, Guang-Can; Chen, Wei; Han, Zheng-Fu
2016-01-01
As an important resource, entanglement light sources have been used in developing quantum information technologies, such as quantum key distribution (QKD). There are few experiments implementing entanglement-based deterministic QKD protocols since the security of existing protocols may be compromised in lossy channels. In this work, we report on a loss-tolerant deterministic QKD experiment which follows a modified “Ping-Pong” (PP) protocol. The experimental results demonstrate for the first time that a secure deterministic QKD session can be fulfilled in a channel with an optical loss of 9 dB, based on a telecom-band entangled photon source. This exhibits a conceivable prospect of utilizing entanglement light sources in real-life fiber-based quantum communications. PMID:26860582
Estimation of the proteomic cancer co-expression sub networks by using association estimators.
Erdoğan, Cihat; Kurt, Zeyneb; Diri, Banu
2017-01-01
In this study, the association estimators, which have significant influences on the gene network inference methods and used for determining the molecular interactions, were examined within the co-expression network inference concept. By using the proteomic data from five different cancer types, the hub genes/proteins within the disease-associated gene-gene/protein-protein interaction sub networks were identified. Proteomic data from various cancer types is collected from The Cancer Proteome Atlas (TCPA). Correlation and mutual information (MI) based nine association estimators that are commonly used in the literature, were compared in this study. As the gold standard to measure the association estimators' performance, a multi-layer data integration platform on gene-disease associations (DisGeNET) and the Molecular Signatures Database (MSigDB) was used. Fisher's exact test was used to evaluate the performance of the association estimators by comparing the created co-expression networks with the disease-associated pathways. It was observed that the MI based estimators provided more successful results than the Pearson and Spearman correlation approaches, which are used in the estimation of biological networks in the weighted correlation network analysis (WGCNA) package. In correlation-based methods, the best average success rate for five cancer types was 60%, while in MI-based methods the average success ratio was 71% for James-Stein Shrinkage (Shrink) and 64% for Schurmann-Grassberger (SG) association estimator, respectively. Moreover, the hub genes and the inferred sub networks are presented for the consideration of researchers and experimentalists.
Estimation of the proteomic cancer co-expression sub networks by using association estimators
Kurt, Zeyneb; Diri, Banu
2017-01-01
In this study, the association estimators, which have significant influences on the gene network inference methods and used for determining the molecular interactions, were examined within the co-expression network inference concept. By using the proteomic data from five different cancer types, the hub genes/proteins within the disease-associated gene-gene/protein-protein interaction sub networks were identified. Proteomic data from various cancer types is collected from The Cancer Proteome Atlas (TCPA). Correlation and mutual information (MI) based nine association estimators that are commonly used in the literature, were compared in this study. As the gold standard to measure the association estimators’ performance, a multi-layer data integration platform on gene-disease associations (DisGeNET) and the Molecular Signatures Database (MSigDB) was used. Fisher's exact test was used to evaluate the performance of the association estimators by comparing the created co-expression networks with the disease-associated pathways. It was observed that the MI based estimators provided more successful results than the Pearson and Spearman correlation approaches, which are used in the estimation of biological networks in the weighted correlation network analysis (WGCNA) package. In correlation-based methods, the best average success rate for five cancer types was 60%, while in MI-based methods the average success ratio was 71% for James-Stein Shrinkage (Shrink) and 64% for Schurmann-Grassberger (SG) association estimator, respectively. Moreover, the hub genes and the inferred sub networks are presented for the consideration of researchers and experimentalists. PMID:29145449
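A sketch of why MI-based estimators can outperform correlation on nonlinear co-expression, using a simple histogram MI estimator (the binning choices and the quadratic toy dependence are assumptions; the paper's Shrink and SG estimators are more refined):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, 2000)
y = x ** 2 + rng.normal(0, 0.05, x.size)   # nonlinear: near-zero Pearson r

def mutual_information(x, y, bins=16):
    """Plug-in MI estimate (bits) from a 2D histogram."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / np.outer(px, py)[nz])).sum())

r, _ = pearsonr(x, y)
print(f"Pearson r = {r:.3f}, MI = {mutual_information(x, y):.2f} bits")
```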
Wang, Junbai; Wu, Qianqian; Hu, Xiaohua Tony; Tian, Tianhai
2016-11-01
Investigating the dynamics of genetic regulatory networks through high throughput experimental data, such as microarray gene expression profiles, is a very important but challenging task. One of the major hindrances in building detailed mathematical models for genetic regulation is the large number of unknown model parameters. To tackle this challenge, a new integrated method is proposed by combining a top-down approach and a bottom-up approach. First, the top-down approach uses probabilistic graphical models to predict the network structure of DNA repair pathway that is regulated by the p53 protein. Two networks are predicted, namely a network of eight genes with eight inferred interactions and an extended network of 21 genes with 17 interactions. Then, the bottom-up approach using differential equation models is developed to study the detailed genetic regulations based on either a fully connected regulatory network or a gene network obtained by the top-down approach. Model simulation error, parameter identifiability and robustness property are used as criteria to select the optimal network. Simulation results together with permutation tests of input gene network structures indicate that the prediction accuracy and robustness property of the two predicted networks using the top-down approach are better than those of the corresponding fully connected networks. In particular, the proposed approach reduces computational cost significantly for inferring model parameters. Overall, the new integrated method is a promising approach for investigating the dynamics of genetic regulation. Copyright © 2016 Elsevier Inc. All rights reserved.
2011-01-01
Background Bacteria have evolved a rich set of mechanisms for sensing and adapting to adverse conditions in their environment. These are crucial for their survival, which requires them to react to extracellular stresses such as heat shock, ethanol treatment or phage infection. Here we focus on studying the phage shock protein (Psp) stress response in Escherichia coli induced by a phage infection or other damage to the bacterial membrane. This system has not yet been theoretically modelled or analysed in silico. Results We develop a model of the Psp response system, and illustrate how such models can be constructed and analyzed in light of available sparse and qualitative information in order to generate novel biological hypotheses about their dynamical behaviour. We analyze this model using tools from Petri-net theory and study its dynamical range that is consistent with currently available knowledge by conditioning model parameters on the available data in an approximate Bayesian computation (ABC) framework. Within this ABC approach we analyze stochastic and deterministic dynamics. This analysis allows us to identify different types of behaviour and these mechanistic insights can in turn be used to design new, more detailed and time-resolved experiments. Conclusions We have developed the first mechanistic model of the Psp response in E. coli. This model allows us to predict the possible qualitative stochastic and deterministic dynamic behaviours of key molecular players in the stress response. Our inferential approach can be applied to stress response and signalling systems more generally: in the ABC framework we can condition mathematical models on qualitative data in order to delimit e.g. parameter ranges or the qualitative system dynamics in light of available end-point or qualitative information. PMID:21569396
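An ABC rejection sketch of the conditioning step described above, reduced to a one-parameter toy (the model, prior, summary statistic, and tolerance are all illustrative assumptions):

```python
# Keep parameter draws whose simulated summary lies within a tolerance of
# the observed summary; the accepted draws approximate the posterior.
import numpy as np

rng = np.random.default_rng(4)
observed_summary = 0.8            # e.g., fraction of cells with Psp induced

def simulate(k):
    """Toy stochastic model: induction probability saturates with rate k."""
    p = k / (k + 1.0)
    return rng.binomial(100, p) / 100.0

accepted = []
for _ in range(20_000):
    k = rng.uniform(0.0, 10.0)    # prior draw
    if abs(simulate(k) - observed_summary) < 0.02:
        accepted.append(k)

print(f"posterior mean k ~ {np.mean(accepted):.2f} "
      f"({len(accepted)} of 20000 draws accepted)")
```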
Characterization of normality of chaotic systems including prediction and detection of anomalies
NASA Astrophysics Data System (ADS)
Engler, Joseph John
Accurate prediction and control pervade domains such as engineering, physics, chemistry, and biology. Often, it is discovered that the systems under consideration cannot be well represented by linear, periodic nor random data. It has been shown that these systems exhibit deterministic chaos behavior. Deterministic chaos describes systems which are governed by deterministic rules but whose data appear to be random or quasi-periodic distributions. Deterministically chaotic systems characteristically exhibit sensitive dependence upon initial conditions manifested through rapid divergence of states initially close to one another. Due to this characterization, it has been deemed impossible to accurately predict future states of these systems for longer time scales. Fortunately, the deterministic nature of these systems allows for accurate short term predictions, given the dynamics of the system are well understood. This fact has been exploited in the research community and has resulted in various algorithms for short term predictions. Detection of normality in deterministically chaotic systems is critical in understanding the system sufficiently to be able to predict future states. Due to the sensitivity to initial conditions, the detection of normal operational states for a deterministically chaotic system can be challenging. The addition of small perturbations to the system, which may result in bifurcation of the normal states, further complicates the problem. The detection of anomalies and prediction of future states of the chaotic system allows for greater understanding of these systems. The goal of this research is to produce methodologies for determining states of normality for deterministically chaotic systems, detection of anomalous behavior, and the more accurate prediction of future states of the system. Additionally, the ability to detect subtle system state changes is discussed. The dissertation addresses these goals by proposing new representational techniques and novel prediction methodologies. The value and efficiency of these methods are explored in various case studies. Presented is an overview of chaotic systems with examples taken from the real world. A representation schema for rapid understanding of the various states of deterministically chaotic systems is presented. This schema is then used to detect anomalies and system state changes. Additionally, a novel prediction methodology which utilizes Lyapunov exponents to facilitate longer term prediction accuracy is presented and compared with other nonlinear prediction methodologies. These novel methodologies are then demonstrated on applications such as wind energy, cyber security and classification of social networks.
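A compact illustration of the quantity underlying these methodologies: the largest Lyapunov exponent, estimated here for the logistic map as the mean log-derivative along an orbit. Positive values mark deterministic chaos and bound the usable prediction horizon (roughly 1/lambda):

```python
import math

def lyapunov_logistic(r, x0=0.4, n=100_000, burn=1_000):
    """Largest Lyapunov exponent of x -> r*x*(1-x) via the orbit average
    of log|f'(x)| = log|r*(1-2x)|."""
    x, acc = x0, 0.0
    for i in range(n + burn):
        x = r * x * (1.0 - x)
        if i >= burn:
            acc += math.log(abs(r * (1.0 - 2.0 * x)))
    return acc / n

for r in (3.2, 3.9):   # periodic vs chaotic regimes
    print(f"r = {r}: lambda ~ {lyapunov_logistic(r):+.3f}")
```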
A General Model for Estimating Macroevolutionary Landscapes.
Boucher, Florian C; Démery, Vincent; Conti, Elena; Harmon, Luke J; Uyeda, Josef
2018-03-01
The evolution of quantitative characters over long timescales is often studied using stochastic diffusion models. The current toolbox available to students of macroevolution is however limited to two main models: Brownian motion and the Ornstein-Uhlenbeck process, plus some of their extensions. Here, we present a very general model for inferring the dynamics of quantitative characters evolving under both random diffusion and deterministic forces of any possible shape and strength, which can accommodate interesting evolutionary scenarios like directional trends, disruptive selection, or macroevolutionary landscapes with multiple peaks. This model is based on a general partial differential equation widely used in statistical mechanics: the Fokker-Planck equation, also known in population genetics as the Kolmogorov forward equation. We thus call the model FPK, for Fokker-Planck-Kolmogorov. We first explain how this model can be used to describe macroevolutionary landscapes over which quantitative traits evolve and, more importantly, we detail how it can be fitted to empirical data. Using simulations, we show that the model has good behavior both in terms of discrimination from alternative models and in terms of parameter inference. We provide R code to fit the model to empirical data using either maximum-likelihood or Bayesian estimation, and illustrate the use of this code with two empirical examples of body mass evolution in mammals. FPK should greatly expand the set of macroevolutionary scenarios that can be studied since it opens the way to estimating macroevolutionary landscapes of any conceivable shape. [Adaptation; bounds; diffusion; FPK model; macroevolution; maximum-likelihood estimation; MCMC methods; phylogenetic comparative data; selection.].
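For reference, a standard one-dimensional form of the equation the FPK model is named for, in generic notation (not necessarily the paper's):

```latex
% Fokker-Planck (Kolmogorov forward) equation for the trait density p(x,t)
% and its stationary solution; mu(x) is the deterministic force and
% sigma^2 the diffusion rate.
\frac{\partial p(x,t)}{\partial t}
  = -\frac{\partial}{\partial x}\bigl[\mu(x)\,p(x,t)\bigr]
  + \frac{\sigma^{2}}{2}\,\frac{\partial^{2} p(x,t)}{\partial x^{2}},
\qquad
p^{*}(x) \propto \exp\!\left(\frac{2}{\sigma^{2}}\int^{x}\mu(u)\,\mathrm{d}u\right)
```

Here mu(x) plays the role of the gradient of the macroevolutionary landscape, and the stationary density p*(x) is the equilibrium trait distribution over that landscape.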
Regional Wave Propagation in Southeastern United States
NASA Astrophysics Data System (ADS)
Jemberie, A. L.; Langston, C. A.
2003-12-01
Broadband seismograms from the April 29, 2003, M4.6 Fort Payne, Alabama earthquake are analyzed to infer mechanisms of crustal wave propagation, crust and upper mantle velocity structure in the southeastern United States, and source parameters of the event. In particular, we are interested in producing deterministic models of the distance attenuation of earthquake ground motions through computation of synthetic seismograms. The method first requires constraining the source parameters of an earthquake and then modeling the amplitude and times of broadband arrivals within the waveforms to infer appropriate layered earth models. A first look at seismograms recorded by stations outside the Mississippi Embayment (ME) shows clear body phases such as P, sP, Pnl, Sn and Lg. The ME signals are qualitatively different from others because they have longer durations and large surface waves. A straightforward interpretation of P wave arrival times shows a typical upper mantle velocity of 8.18 km/s. However, there is evidence of significantly higher P phase velocities at epicentral distances between 400 and 600 km, which may be caused by a high velocity upper mantle anomaly; triplication of P-waves is seen in these seismograms. The arrival time differences between regional P and the depth phase sP at different stations are used to constrain the depth of the earthquake. The source depth lies between 9.5 km and 13 km, which is somewhat shallower than the network location that was constrained to 15 km depth. The Fort Payne earthquake is the largest earthquake to have occurred within the Eastern Tennessee Seismic Zone.
NASA Astrophysics Data System (ADS)
Krumholz, Mark R.; Fumagalli, Michele; da Silva, Robert L.; Rendahl, Theodore; Parra, Jonathan
2015-09-01
Stellar population synthesis techniques for predicting the observable light emitted by a stellar population have extensive applications in numerous areas of astronomy. However, accurate predictions for small populations of young stars, such as those found in individual star clusters, star-forming dwarf galaxies, and small segments of spiral galaxies, require that the population be treated stochastically. Conversely, accurate deductions of the properties of such objects also require consideration of stochasticity. Here we describe a comprehensive suite of modular, open-source software tools for tackling these related problems. These include the following: a greatly enhanced version of the SLUG code introduced by da Silva et al., which computes spectra and photometry for stochastically or deterministically sampled stellar populations with nearly arbitrary star formation histories, clustering properties, and initial mass functions; CLOUDY_SLUG, a tool that automatically couples SLUG-computed spectra with the CLOUDY radiative transfer code in order to predict stochastic nebular emission; BAYESPHOT, a general-purpose tool for performing Bayesian inference on the physical properties of stellar systems based on unresolved photometry; and CLUSTER_SLUG and SFR_SLUG, a pair of tools that use BAYESPHOT on a library of SLUG models to compute the mass, age, and extinction of mono-age star clusters, and the star formation rate of galaxies, respectively. The latter two tools make use of an extensive library of pre-computed stellar population models, which are included in the software. The complete package is available at http://www.slugsps.com.
Communication: Nanoscale electrostatic theory of epistructural fields at the protein-water interface
NASA Astrophysics Data System (ADS)
Fernández, Ariel
2012-12-01
Nanoscale solvent confinement at the protein-water interface promotes dipole orientations that are not aligned with the internal electrostatic field of a protein, yielding what we term epistructural polarization. To quantify this effect, an equation is derived from first principles relating epistructural polarization with the magnitude of local distortions in water coordination causative of interfacial tension. The equation defines a nanoscale electrostatic model of water and enables an estimation of protein denaturation free energies and the inference of hot spots for protein associations. The theoretical results are validated vis-à-vis calorimetric data, revealing the destabilizing effect of epistructural polarization and its molecular origin.
Fernández, Ariel
2012-12-21
Nanoscale solvent confinement at the protein-water interface promotes dipole orientations that are not aligned with the internal electrostatic field of a protein, yielding what we term epistructural polarization. To quantify this effect, an equation is derived from first principles relating epistructural polarization with the magnitude of local distortions in water coordination causative of interfacial tension. The equation defines a nanoscale electrostatic model of water and enables an estimation of protein denaturation free energies and the inference of hot spots for protein associations. The theoretical results are validated vis-à-vis calorimetric data, revealing the destabilizing effect of epistructural polarization and its molecular origin.
Controllability of Deterministic Networks with the Identical Degree Sequence
Ma, Xiujuan; Zhao, Haixing; Wang, Binghong
2015-01-01
Controlling a complex network is an essential problem in network science and engineering. Recent advances indicate that the controllability of a complex network depends on the network's topology. Liu, Barabási, and colleagues speculated that the degree distribution was one of the most important factors affecting controllability for arbitrary complex directed networks with random link weights. In this paper, we analysed the effect of the degree distribution on the controllability of unweighted, undirected deterministic networks. We introduce a class of deterministic networks with identical degree sequence, called (x,y)-flowers. We analysed the controllability of two deterministic networks ((1,3)-flower and (2,2)-flower) in detail using exact controllability theory and give exact results for the minimum number of driver nodes for the two networks. In simulations, we compare the controllability of (x,y)-flower networks. Our results show that networks in the (x,y)-flower family have the same degree sequence, but their controllability is totally different. Thus, the degree distribution by itself is not sufficient to characterize the controllability of unweighted, undirected deterministic networks. PMID:26020920
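For the exact-controllability count the abstract refers to, a hedged sketch is given below: for an undirected, unweighted network, exact controllability theory gives the minimum number of driver nodes as the largest eigenvalue multiplicity of the adjacency matrix. The four-node graph is a hypothetical example, not an (x,y)-flower.

```python
import numpy as np

# Hedged sketch of the exact-controllability count (eigenvalue-multiplicity
# result): for an undirected, unweighted network, the minimum number of
# driver nodes N_D equals the maximum multiplicity among eigenvalues of the
# adjacency matrix A. The graph below is illustrative only.
def min_driver_nodes(A, tol=1e-8):
    eigvals = np.linalg.eigvalsh(A)        # symmetric matrix -> real, sorted
    best, run = 1, 1
    for i in range(1, len(eigvals)):
        run = run + 1 if abs(eigvals[i] - eigvals[i - 1]) < tol else 1
        best = max(best, run)
    return best

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print("minimum number of driver nodes N_D =", min_driver_nodes(A))
```

Applying the same routine to the adjacency matrices of two networks with identical degree sequences makes the paper's point concrete: equal degree sequences need not give equal N_D.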
Inverse kinematic problem for a random gradient medium in geometric optics approximation
NASA Astrophysics Data System (ADS)
Petersen, N. V.
1990-03-01
Scattering at random inhomogeneities in a gradient medium results in systematic deviations of the rays and travel times of refracted body waves from those corresponding to the deterministic velocity component. The character of the difference depends on the parameters of the deterministic and random velocity components. However, at great distances from the source, independently of the velocity parameters (weakly or strongly inhomogeneous medium), the most probable depth of the ray turning point is smaller than that corresponding to the deterministic velocity component, the most probable travel times also being lower. The relative uncertainty in the deterministic velocity component, derived from the mean travel times using methods developed for laterally homogeneous media (for instance, the Herglotz-Wiechert method), is systematic in character, but does not exceed the contrast of the velocity inhomogeneities in magnitude. The gradient of the deterministic velocity component has a significant effect on the travel-time fluctuations. The variance at great distances from the source is mainly controlled by shallow inhomogeneities. The travel-time fluctuations are studied only for weakly inhomogeneous media.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Webb-Robertson, Bobbie-Jo M.; Matzke, Melissa M.; Datta, Susmita
As the capability of mass spectrometry-based proteomics has matured, tens of thousands of peptides can be measured simultaneously, which has the benefit of offering a systems view of protein expression. However, a major challenge is that with an increase in throughput, protein quantification estimation from the native measured peptides has become a computational task. A limitation of existing computationally-driven protein quantification methods is that most ignore protein variation, such as alternate splicing of the RNA transcript and post-translational modifications or other possible proteoforms, which will affect a significant fraction of the proteome. The consequence of this assumption is that statistical inference at the protein level, and consequently downstream analyses, such as network and pathway modeling, have only limited power for biomarker discovery. Here, we describe a Bayesian model (BP-Quant) that uses statistically derived peptide signatures to identify peptides that are outside the dominant pattern, or the existence of multiple over-expressed patterns, to improve relative protein abundance estimates. It is a research-driven approach that utilizes the objectives of the experiment, defined in the context of a standard statistical hypothesis, to identify a set of peptides exhibiting similar statistical behavior relating to a protein. This approach infers that changes in relative protein abundance can be used as a surrogate for changes in function, without necessarily taking into account the effect of differential post-translational modifications, processing, or splicing in altering protein function. We verify the approach using a dilution study from mouse plasma samples and demonstrate that BP-Quant achieves similar accuracy as the current state-of-the-art methods at proteoform identification, with significantly better specificity. BP-Quant is available as MatLab® and R packages at https://github.com/PNNL-Comp-Mass-Spec/BP-Quant.
Quasi-Static Probabilistic Structural Analyses Process and Criteria
NASA Technical Reports Server (NTRS)
Goldberg, B.; Verderaime, V.
1999-01-01
Current deterministic structural methods are easily applied to substructures and components, and analysts have built great design insights and confidence in them over the years. However, deterministic methods cannot support systems risk analyses, and it was recently reported that deterministic treatment of statistical data is inconsistent with error propagation laws, which can result in unevenly conservative structural predictions. Assuming normal distributions and using statistical data formats throughout prevailing deterministic stress processes leads to a safety factor in statistical format which, integrated into the safety index, provides a safety factor and first-order reliability relationship. The embedded safety factor in the safety index expression allows a historically based risk to be determined and verified over a variety of quasi-static metallic substructures, consistent with the traditional safety factor methods and NASA Std. 5001 criteria.
Effect of Uncertainty on Deterministic Runway Scheduling
NASA Technical Reports Server (NTRS)
Gupta, Gautam; Malik, Waqar; Jung, Yoon C.
2012-01-01
Active runway scheduling involves scheduling departures for takeoffs and arrivals for runway crossing subject to numerous constraints. This paper evaluates the effect of uncertainty on a deterministic runway scheduler. The evaluation is done against a first-come-first-served (FCFS) scheme. In particular, the sequence from the deterministic scheduler is frozen and the times adjusted to satisfy all separation criteria; this approach is then compared against FCFS. The comparison covers both system performance (throughput and system delay) and predictability, and varying levels of congestion are considered. Uncertainty is modeled in two ways: as equal uncertainty in runway availability for all aircraft, and as uncertainty that increases for later aircraft. Results indicate that the deterministic approach consistently performs better than first-come-first-served in both system performance and predictability.
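A minimal sketch of the kind of comparison described here follows; the separation values and aircraft classes are hypothetical, not the paper's data. With class-dependent wake separations, an optimized-then-frozen sequence can beat FCFS on total delay.

```python
from itertools import permutations

# Toy runway-sequencing comparison (all values hypothetical): with
# class-dependent wake separations, an optimized sequence can beat
# first-come-first-served (FCFS).
sep = {("H", "H"): 90.0, ("H", "S"): 120.0, ("S", "H"): 60.0, ("S", "S"): 60.0}
ready = [0.0, 5.0, 10.0]        # earliest runway-availability times (s)
cls = ["H", "S", "S"]           # heavy / small wake classes

def total_delay(order):
    t_prev, c_prev, delay = None, None, 0.0
    for i in order:
        t = ready[i] if t_prev is None else max(ready[i], t_prev + sep[(c_prev, cls[i])])
        delay += t - ready[i]
        t_prev, c_prev = t, cls[i]
    return delay

fcfs = sorted(range(len(ready)), key=lambda i: ready[i])
best = min(permutations(range(len(ready))), key=total_delay)
print("FCFS delay:", total_delay(fcfs), "s; optimized delay:", total_delay(best), "s")
```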
Numerical Approach to Spatial Deterministic-Stochastic Models Arising in Cell Biology.
Schaff, James C; Gao, Fei; Li, Ye; Novak, Igor L; Slepchenko, Boris M
2016-12-01
Hybrid deterministic-stochastic methods provide an efficient alternative to a fully stochastic treatment of models which include components with disparate levels of stochasticity. However, general-purpose hybrid solvers for spatially resolved simulations of reaction-diffusion systems are not widely available. Here we describe fundamentals of a general-purpose spatial hybrid method. The method generates realizations of a spatially inhomogeneous hybrid system by appropriately integrating capabilities of a deterministic partial differential equation solver with a popular particle-based stochastic simulator, Smoldyn. Rigorous validation of the algorithm is detailed, using a simple model of calcium 'sparks' as a testbed. The solver is then applied to a deterministic-stochastic model of spontaneous emergence of cell polarity. The approach is general enough to be implemented within biologist-friendly software frameworks such as Virtual Cell.
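The sketch below is a toy operator-splitting illustration of the hybrid idea, not the actual PDE/Smoldyn coupling: a deterministically integrated concentration drives the propensities of a discrete, stochastically simulated copy number. All rate constants are illustrative assumptions.

```python
import random

# Toy hybrid scheme: a(t) evolves by an ODE (deterministic component) and
# feeds the birth propensity of a discrete copy number n, which is advanced
# by an exact stochastic simulation within each deterministic time slice.
random.seed(1)
a, n, t, dt = 1.0, 0, 0.0, 0.01
k_on, k_off, decay, source = 5.0, 1.0, 0.2, 0.1

while t < 10.0:
    a += dt * (source - decay * a)          # deterministic step (explicit Euler)
    tau = 0.0                               # stochastic step: SSA within [t, t+dt)
    while True:
        rate = k_on * a + k_off * n         # total propensity (a stays positive)
        tau += random.expovariate(rate)
        if tau >= dt:
            break
        if random.random() < (k_on * a) / rate:
            n += 1                          # birth, propensity k_on * a
        else:
            n -= 1                          # death, propensity k_off * n
    t += dt

print("deterministic a = %.3f, stochastic n = %d" % (a, n))
```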
Predicting Adverse Drug Effects from Literature- and Database-Mined Assertions.
La, Mary K; Sedykh, Alexander; Fourches, Denis; Muratov, Eugene; Tropsha, Alexander
2018-06-06
Given that adverse drug effects (ADEs) have led to post-market patient harm and subsequent drug withdrawal, failure of candidate agents in the drug development process, and other negative outcomes, it is essential to attempt to forecast ADEs and other relevant drug-target-effect relationships as early as possible. Current pharmacologic data sources, providing multiple complementary perspectives on the drug-target-effect paradigm, can be integrated to facilitate the inference of relationships between these entities. This study aims to identify both existing and unknown relationships between chemicals (C), protein targets (T), and ADEs (E) based on evidence in the literature. Cheminformatics and data mining approaches were employed to integrate and analyze publicly available clinical pharmacology data and literature assertions interrelating drugs, targets, and ADEs. Based on these assertions, a C-T-E relationship knowledge base was developed. Known pairwise relationships between chemicals, targets, and ADEs were collected from several pharmacological and biomedical data sources. These relationships were curated and integrated according to Swanson's paradigm to form C-T-E triangles, and missing C-E edges were then proposed as candidate C-E relationships. Unreported associations between drugs, targets, and ADEs were inferred, and the inferences were prioritized as testable hypotheses. Several C-E inferences, including testosterone → myocardial infarction, were identified from literature sources published prior to confirmatory case reports. Timestamping approaches confirmed the predictive ability of this inference strategy on a larger scale. The presented workflow, based on free-access databases and an association-based inference scheme, provided novel C-E relationships that have been validated post hoc in case reports. With refinement of prioritization schemes for the generated C-E inferences, this workflow may provide an effective computational method for the early detection of potential drug candidate ADEs that can be followed by targeted experimental investigations.
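A small sketch of the Swanson-style join described above follows; all assertions in it are toy examples, not data from the study.

```python
# Toy Swanson-style inference: known chemical->target (C-T) and
# target->effect (T-E) edges are combined to propose unreported
# chemical->effect (C-E) hypotheses, ranked by supporting-target count.
c_t = {"drugX": {"T1", "T2"}, "drugY": {"T2"}}
t_e = {"T1": {"arrhythmia"}, "T2": {"nausea", "arrhythmia"}}
known_ce = {("drugX", "nausea")}

candidates = {
    (c, e)
    for c, targets in c_t.items()
    for t in targets
    for e in t_e.get(t, set())
    if (c, e) not in known_ce
}
# prioritize hypotheses by the number of supporting targets
support = {(c, e): sum(e in t_e.get(t, set()) for t in c_t[c]) for c, e in candidates}
for (c, e), s in sorted(support.items(), key=lambda kv: -kv[1]):
    print("%s -> %s  (supported by %d target(s))" % (c, e, s))
```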
Quality of Computationally Inferred Gene Ontology Annotations
Škunca, Nives; Altenhoff, Adrian; Dessimoz, Christophe
2012-01-01
Gene Ontology (GO) has established itself as the undisputed standard for protein function annotation. Most annotations are inferred electronically, i.e. without individual curator supervision, but they are widely considered unreliable. At the same time, we crucially depend on those automated annotations, as most newly sequenced genomes are non-model organisms. Here, we introduce a methodology to systematically and quantitatively evaluate electronic annotations. By exploiting changes in successive releases of the UniProt Gene Ontology Annotation database, we assessed the quality of electronic annotations in terms of specificity, reliability, and coverage. Overall, we not only found that electronic annotations have significantly improved in recent years, but also that their reliability now rivals that of annotations inferred by curators when they use evidence other than experiments from primary literature. This work provides the means to identify the subset of electronic annotations that can be relied upon—an important outcome given that >98% of all annotations are inferred without direct curation. PMID:22693439
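The release-comparison logic can be sketched as follows; the annotation sets are toy examples, and the exact confirmation rules used in the study are richer than this.

```python
# Hedged sketch of release comparison: an electronic (protein, GO term)
# annotation from an older release counts as confirmed if a newer release
# carries the same pair with experimental evidence, and as removed if the
# pair disappears entirely.
old_electronic = {("P1", "GO:0003677"), ("P2", "GO:0005524"), ("P3", "GO:0008270")}
new_experimental = {("P1", "GO:0003677"), ("P3", "GO:0016787")}
new_all = new_experimental | {("P2", "GO:0005524")}

confirmed = old_electronic & new_experimental
removed = old_electronic - new_all
print("reliability (confirmed fraction): %.2f" % (len(confirmed) / len(old_electronic)))
print("removed annotations:", sorted(removed))
```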
Revealing networks from dynamics: an introduction
NASA Astrophysics Data System (ADS)
Timme, Marc; Casadiego, Jose
2014-08-01
What can we learn from the collective dynamics of a complex network about its interaction topology? Taking the perspective from nonlinear dynamics, we briefly review recent progress on how to infer structural connectivity (direct interactions) from accessing the dynamics of the units. Potential applications range from interaction networks in physics, to chemical and metabolic reactions, protein and gene regulatory networks as well as neural circuits in biology and electric power grids or wireless sensor networks in engineering. Moreover, we briefly mention some standard ways of inferring effective or functional connectivity.
LACTB is a filament-forming protein localized in mitochondria
Polianskyte, Zydrune; Peitsaro, Nina; Dapkunas, Arvydas; Liobikas, Julius; Soliymani, Rabah; Lalowski, Maciej; Speer, Oliver; Seitsonen, Jani; Butcher, Sarah; Cereghetti, Grazia M.; Linder, Matts D.; Merckel, Michael; Thompson, James; Eriksson, Ove
2009-01-01
LACTB is a mammalian active-site serine protein that has evolved from a bacterial penicillin-binding protein. Penicillin-binding proteins are involved in the metabolism of peptidoglycan, the major bacterial cell wall constituent, implying that LACTB has been endowed with novel biochemical properties during eukaryote evolution. Here we demonstrate that LACTB is localized in the mitochondrial intermembrane space, where it is polymerized into stable filaments with a length extending more than a hundred nanometers. We infer that LACTB, through polymerization, promotes intramitochondrial membrane organization and micro-compartmentalization. These findings have implications for our understanding of mitochondrial evolution and function. PMID:19858488
Inferring drug-disease associations based on known protein complexes.
Yu, Liang; Huang, Jianbin; Ma, Zhixin; Zhang, Jing; Zou, Yapeng; Gao, Lin
2015-01-01
Inferring drug-disease associations is critical in unveiling disease mechanisms, as well as in discovering novel functions of available drugs, i.e., drug repositioning. Previous work is primarily based on drug-gene-disease relationships, which discards much important information, since genes execute their functions through interactions with other genes. To overcome this issue, we propose a novel methodology that discovers drug-disease associations based on protein complexes. First, an integrated heterogeneous network consisting of drugs, protein complexes, and diseases is constructed, where we assign weights to the drug-disease associations by using probability. Then, from the tripartite network, we obtain the indirect weighted relationships between drugs and diseases. The larger the weight, the higher the reliability of the correlation. We apply our method to mental disorders and hypertension, and validate the results using the Comparative Toxicogenomics Database. Our ranked results can be directly reinforced by existing biomedical literature, suggesting that our proposed method obtains higher specificity and sensitivity. The proposed method offers new insight into drug-disease discovery. Our method is publicly available at http://1.complexdrug.sinaapp.com/Drug_Complex_Disease/Data_Download.html.
Novel Computational Approaches to Drug Discovery
NASA Astrophysics Data System (ADS)
Skolnick, Jeffrey; Brylinski, Michal
2010-01-01
New approaches to protein functional inference based on protein structure and evolution are described. First, FINDSITE, a threading based approach to protein function prediction, is summarized. Then, the results of large scale benchmarking of ligand binding site prediction, ligand screening, including applications to HIV protease, and GO molecular functional inference are presented. A key advantage of FINDSITE is its ability to use low resolution, predicted structures as well as high resolution experimental structures. Then, an extension of FINDSITE to ligand screening in GPCRs using predicted GPCR structures, FINDSITE/QDOCKX, is presented. This is a particularly difficult case as there are few experimentally solved GPCR structures. Thus, we first train on a subset of known binding ligands for a set of GPCRs; this is then followed by benchmarking against a large ligand library. For the virtual ligand screening of a number of Dopamine receptors, encouraging results are seen, with significant enrichment in identified ligands over those found in the training set. Thus, FINDSITE and its extensions represent a powerful approach to the successful prediction of a variety of molecular functions.
Predicting the binding preference of transcription factors to individual DNA k-mers.
Alleyne, Trevis M; Peña-Castillo, Lourdes; Badis, Gwenael; Talukder, Shaheynoor; Berger, Michael F; Gehrke, Andrew R; Philippakis, Anthony A; Bulyk, Martha L; Morris, Quaid D; Hughes, Timothy R
2009-04-15
Recognition of specific DNA sequences is a central mechanism by which transcription factors (TFs) control gene expression. Many TF-binding preferences, however, are unknown or poorly characterized, in part due to the difficulty associated with determining their specificity experimentally and an incomplete understanding of the mechanisms governing sequence specificity. New techniques that estimate the affinity of TFs to all possible k-mers provide a new opportunity to study DNA-protein interaction mechanisms, and may facilitate inference of binding preferences for members of a given TF family when such information is available for other family members. We employed a new dataset consisting of the relative preferences of mouse homeodomains for all eight-base DNA sequences in order to ask how well we can predict the binding profiles of homeodomains when only their protein sequences are given. We evaluated a panel of standard statistical inference techniques, as well as variations of the protein features considered. A nearest-neighbour approach over functionally important residues emerged as one of the most effective methods. Our results underscore the complexity of TF-DNA recognition and suggest a rational approach for future analyses of TF families.
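The best-performing approach reported here can be sketched as below; the residue strings and preference profiles are toy data, not the mouse homeodomain set.

```python
import numpy as np

# Hedged nearest-neighbour sketch: predict a homeodomain's k-mer preference
# profile by copying that of its nearest neighbour in the space of
# functionally important (DNA-contacting) residues. Data are toy examples.
train_residues = {"HoxA": "RKNQ", "HoxB": "RKNE", "Pitx": "KRQQ"}
train_profiles = {"HoxA": np.array([0.9, 0.1, 0.3]),
                  "HoxB": np.array([0.8, 0.2, 0.4]),
                  "Pitx": np.array([0.1, 0.9, 0.6])}

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def predict_profile(query_residues):
    nn = min(train_residues, key=lambda k: hamming(train_residues[k], query_residues))
    return nn, train_profiles[nn]

print(predict_profile("RKND"))   # nearest neighbour: HoxA (distance 1)
```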
Discrete kinetic models from funneled energy landscape simulations.
Schafer, Nicholas P; Hoffman, Ryan M B; Burger, Anat; Craig, Patricio O; Komives, Elizabeth A; Wolynes, Peter G
2012-01-01
A general method for facilitating the interpretation of computer simulations of protein folding with minimally frustrated energy landscapes is detailed and applied to a designed ankyrin repeat protein (4ANK). In the method, groups of residues are assigned to foldons and these foldons are used to map the conformational space of the protein onto a set of discrete macrobasins. The free energies of the individual macrobasins are then calculated, informing practical kinetic analysis. Two simple assumptions about the universality of the rate for downhill transitions between macrobasins and the natural local connectivity between macrobasins lead to a scheme for predicting overall folding and unfolding rates, generating chevron plots under varying thermodynamic conditions, and inferring dominant kinetic folding pathways. To illustrate the approach, free energies of macrobasins were calculated from biased simulations of a non-additive structure-based model using two structurally motivated foldon definitions at the full and half ankyrin repeat resolutions. The calculated chevrons have features consistent with those measured in stopped flow chemical denaturation experiments. The dominant inferred folding pathway has an "inside-out", nucleation-propagation like character.
Inferring drug-disease associations based on known protein complexes
2015-01-01
Inferring drug-disease associations is critical in unveiling disease mechanisms, as well as in discovering novel functions of available drugs, i.e., drug repositioning. Previous work is primarily based on drug-gene-disease relationships, which discards much important information, since genes execute their functions through interactions with other genes. To overcome this issue, we propose a novel methodology that discovers drug-disease associations based on protein complexes. First, an integrated heterogeneous network consisting of drugs, protein complexes, and diseases is constructed, where we assign weights to the drug-disease associations by using probability. Then, from the tripartite network, we obtain the indirect weighted relationships between drugs and diseases. The larger the weight, the higher the reliability of the correlation. We apply our method to mental disorders and hypertension, and validate the results using the Comparative Toxicogenomics Database. Our ranked results can be directly reinforced by existing biomedical literature, suggesting that our proposed method obtains higher specificity and sensitivity. The proposed method offers new insight into drug-disease discovery. Our method is publicly available at http://1.complexdrug.sinaapp.com/Drug_Complex_Disease/Data_Download.html. PMID:26044949
Efficient room-temperature source of polarized single photons
Lukishova, Svetlana G.; Boyd, Robert W.; Stroud, Carlos R.
2007-08-07
An efficient technique for producing deterministically polarized single photons uses liquid-crystal hosts, of either monomeric or oligomeric/polymeric form, to preferentially align the single emitters for maximum excitation efficiency. Deterministic molecular alignment also provides deterministically polarized output photons. Planar-aligned cholesteric liquid crystal hosts serve as 1-D photonic-band-gap microcavities, tunable to the emitter fluorescence band, to increase source efficiency, and liquid crystal technology is used to prevent emitter bleaching. Emitters comprise soluble dyes, inorganic nanocrystals, or trivalent rare-earth chelates.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Weiwen; Culley, David E.; Gritsenko, Marina A.
2006-11-03
In a previous study, the whole-genome gene expression profiles of D. vulgaris in response to oxidative stress and heat shock were determined. The results showed that 24-28% of the responsive genes encoded hypothetical proteins that have not been experimentally characterized or whose function cannot be deduced by simple sequence comparison. To further explore the protective mechanisms employed in D. vulgaris against oxidative stress and heat shock, an attempt was made in this study to infer the functions of these hypothetical proteins by phylogenomic profiling along with detailed sequence comparison against various publicly available databases. By this approach we were able to assign possible functions to 25 responsive hypothetical proteins. The findings included that DVU0725, induced by oxidative stress, may be involved in lipopolysaccharide biosynthesis, implying that alteration of the lipopolysaccharide on the cell surface might serve as a mechanism against oxidative stress in D. vulgaris. In addition, two responsive proteins, DVU0024, encoding a putative transcriptional regulator, and DVU1670, encoding a predicted redox protein, shared co-evolution patterns with rubrerythrin in Archaeoglobus fulgidus and Clostridium perfringens, respectively, implying that they might be part of the stress response and protective systems in D. vulgaris. The study demonstrated that phylogenomic profiling is a useful tool in the interpretation of experimental genomics data, and also provided further insight into the cellular response to oxidative stress and heat shock in D. vulgaris.
Intermolecular correlations are necessary to explain diffuse scattering from protein crystals
Peck, Ariana; Poitevin, Frederic; Lane, Thomas Joseph
2018-02-21
Conformational changes drive protein function, including catalysis, allostery, and signaling. X-ray diffuse scattering from protein crystals has frequently been cited as a probe of these correlated motions, with significant potential to advance our understanding of biological dynamics. However, recent work challenged this prevailing view, suggesting instead that diffuse scattering primarily originates from rigid body motions and could therefore be applied to improve structure determination. To investigate the nature of the disorder giving rise to diffuse scattering, and thus the potential applications of this signal, a diverse repertoire of disorder models was assessed for its ability to reproduce the diffuse signal reconstructed from three protein crystals. This comparison revealed that multiple models of intramolecular conformational dynamics, including ensemble models inferred from the Bragg data, could not explain the signal. Models of rigid body or short-range liquid-like motions, in which dynamics are confined to the biological unit, showed modest agreement with the diffuse maps, but were unable to reproduce experimental features indicative of long-range correlations. Extending a model of liquid-like motions to include disorder across neighboring proteins in the crystal significantly improved agreement with all three systems and highlighted the contribution of intermolecular correlations to the observed signal. These findings anticipate a need to account for intermolecular disorder in order to advance the interpretation of diffuse scattering to either extract biological motions or aid structural inference.
Rajjou, Loïc; Belghazi, Maya; Huguet, Romain; Robin, Caroline; Moreau, Adrien; Job, Claudette; Job, Dominique
2006-07-01
The influence of salicylic acid (SA) on the elicitation of defense mechanisms in Arabidopsis (Arabidopsis thaliana) seeds and seedlings was assessed by physiological measurements combined with global expression profiling (proteomics). Parallel experiments were carried out using NahG transgenic plants expressing the bacterial gene encoding SA hydroxylase, which cannot accumulate the active form of this plant defense elicitor. SA markedly improved germination under salt stress. Proteomic analyses disclosed a specific accumulation of protein spots regulated by SA, as inferred by silver-nitrate staining of two-dimensional gels, detection of carbonylated (oxidized) proteins, and labeling of neosynthesized proteins with [35S]-methionine. The combined results revealed several processes potentially affected by SA. This molecule enhanced the reinduction of the late maturation program during early stages of germination, thereby allowing the germinating seeds to reinforce their capacity to mount adaptive responses under environmental water stress. Other processes affected by SA concerned the quality of protein translation, the priming of seed metabolism, the synthesis of antioxidant enzymes, and the mobilization of seed storage proteins. All the observed effects are likely to improve seed vigor. Another aspect revealed by this study concerned the oxidative stress entailed by SA in germinating seeds, as inferred from a characterization of the carbonylated (oxidized) proteome. Finally, the proteomic data revealed a close interplay between abscisic acid signaling and SA elicitation of seed vigor.
Intermolecular correlations are necessary to explain diffuse scattering from protein crystals
DOE Office of Scientific and Technical Information (OSTI.GOV)
Peck, Ariana; Poitevin, Frederic; Lane, Thomas Joseph
Conformational changes drive protein function, including catalysis, allostery, and signaling. X-ray diffuse scattering from protein crystals has frequently been cited as a probe of these correlated motions, with significant potential to advance our understanding of biological dynamics. However, recent work challenged this prevailing view, suggesting instead that diffuse scattering primarily originates from rigid body motions and could therefore be applied to improve structure determination. To investigate the nature of the disorder giving rise to diffuse scattering, and thus the potential applications of this signal, a diverse repertoire of disorder models was assessed for its ability to reproduce the diffuse signal reconstructed from three protein crystals. This comparison revealed that multiple models of intramolecular conformational dynamics, including ensemble models inferred from the Bragg data, could not explain the signal. Models of rigid body or short-range liquid-like motions, in which dynamics are confined to the biological unit, showed modest agreement with the diffuse maps, but were unable to reproduce experimental features indicative of long-range correlations. Extending a model of liquid-like motions to include disorder across neighboring proteins in the crystal significantly improved agreement with all three systems and highlighted the contribution of intermolecular correlations to the observed signal. These findings anticipate a need to account for intermolecular disorder in order to advance the interpretation of diffuse scattering to either extract biological motions or aid structural inference.
Stochastic model of transcription factor-regulated gene expression
NASA Astrophysics Data System (ADS)
Karmakar, Rajesh; Bose, Indrani
2006-09-01
We consider a stochastic model of transcription factor (TF)-regulated gene expression. The model describes two genes, gene A and gene B, which synthesize the TFs and the target gene proteins, respectively. We show through analytic calculations that the TF fluctuations have a significant effect on the distribution of the target gene protein levels when the mean TF level falls in the highest sensitive region of the dose-response curve. We further study the effect of reducing the copy number of gene A from two to one. The enhanced TF fluctuations yield results different from those in the deterministic case. The probability that the target gene protein level exceeds a threshold value is calculated with the knowledge of the probability density functions associated with the TF and target gene protein levels. Numerical simulation results for a more detailed stochastic model are shown to be in agreement with those obtained through analytic calculations. The relevance of these results in the context of the genetic disorder haploinsufficiency is pointed out. Some experimental observations on the haploinsufficiency of the tumour suppressor gene, Nkx 3.1, are explained with the help of the stochastic model of TF-regulated gene expression.
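The paper's central point can be illustrated with a simple Monte Carlo sketch; all parameters below (Hill coefficient, noise levels, threshold) are assumed for illustration, not taken from the model.

```python
import random

# Monte Carlo caricature of TF-regulated expression: the target protein
# responds to TF level x through a Hill-type dose-response, so TF
# fluctuations matter most when the mean TF level sits in the steep part
# of the curve. All parameters are illustrative assumptions.
random.seed(0)

def response(x, K=1.0, n=4, vmax=100.0):
    return vmax * x**n / (K**n + x**n)

def prob_above(mean_tf, cv, threshold, trials=100_000):
    hits = 0
    for _ in range(trials):
        x = max(0.0, random.gauss(mean_tf, cv * mean_tf))
        hits += response(x) > threshold
    return hits / trials

# halving the copy number of gene A ~ halving the mean TF level and raising
# its relative fluctuations (a caricature of haploinsufficiency)
print("two copies:", prob_above(mean_tf=2.0, cv=0.2, threshold=50.0))
print("one copy  :", prob_above(mean_tf=1.0, cv=0.4, threshold=50.0))
```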
Modeling the evolution of protein domain architectures using maximum parsimony.
Fong, Jessica H; Geer, Lewis Y; Panchenko, Anna R; Bryant, Stephen H
2007-02-09
Domains are basic evolutionary units of proteins and most proteins have more than one domain. Advances in domain modeling and collection are making it possible to annotate a large fraction of known protein sequences by a linear ordering of their domains, yielding their architecture. Protein domain architectures link evolutionarily related proteins and underscore their shared functions. Here, we attempt to better understand this association by identifying the evolutionary pathways by which extant architectures may have evolved. We propose a model of evolution in which architectures arise through rearrangements of inferred precursor architectures and acquisition of new domains. These pathways are ranked using a parsimony principle, whereby scenarios requiring the fewest number of independent recombination events, namely fission and fusion operations, are assumed to be more likely. Using a data set of domain architectures present in 159 proteomes that represent all three major branches of the tree of life allows us to estimate the history of over 85% of all architectures in the sequence database. We find that the distribution of rearrangement classes is robust with respect to alternative parsimony rules for inferring the presence of precursor architectures in ancestral species. Analyzing the most parsimonious pathways, we find 87% of architectures to gain complexity over time through simple changes, among which fusion events account for 5.6 times as many architectures as fission. Our results may be used to compute domain architecture similarities, for example, based on the number of historical recombination events separating them. Domain architecture "neighbors" identified in this way may lead to new insights about the evolution of protein function.
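The parsimony bookkeeping described above can be made concrete with a toy enumeration of single-fusion scenarios; the domain names are illustrative, not from the 159-proteome data set.

```python
# Toy parsimony step: an extant architecture (an ordered tuple of domains)
# can be explained by a single fusion of two inferred precursor
# architectures; scenarios requiring fewer such operations rank higher.
def one_step_fusions(arch):
    """All (left, right) precursor pairs whose fusion yields `arch`."""
    return [(arch[:i], arch[i:]) for i in range(1, len(arch))]

extant = ("PK", "SH2", "SH3")
for left, right in one_step_fusions(extant):
    print("-".join(left), "+", "-".join(right), "->", "-".join(extant))
```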
Modeling the Evolution of Protein Domain Architectures Using Maximum Parsimony
Fong, Jessica H.; Geer, Lewis Y.; Panchenko, Anna R.; Bryant, Stephen H.
2007-01-01
Domains are basic evolutionary units of proteins and most proteins have more than one domain. Advances in domain modeling and collection are making it possible to annotate a large fraction of known protein sequences by a linear ordering of their domains, yielding their architecture. Protein domain architectures link evolutionarily related proteins and underscore their shared functions. Here, we attempt to better understand this association by identifying the evolutionary pathways by which extant architectures may have evolved. We propose a model of evolution in which architectures arise through rearrangements of inferred precursor architectures and acquisition of new domains. These pathways are ranked using a parsimony principle, whereby scenarios requiring the fewest number of independent recombination events, namely fission and fusion operations, are assumed to be more likely. Using a data set of domain architectures present in 159 proteomes that represent all three major branches of the tree of life allows us to estimate the history of over 85% of all architectures in the sequence database. We find that the distribution of rearrangement classes is robust with respect to alternative parsimony rules for inferring the presence of precursor architectures in ancestral species. Analyzing the most parsimonious pathways, we find 87% of architectures to gain complexity over time through simple changes, among which fusion events account for 5.6 times as many architectures as fission. Our results may be used to compute domain architecture similarities, for example, based on the number of historical recombination events separating them. Domain architecture “neighbors” identified in this way may lead to new insights about the evolution of protein function. PMID:17166515
Determination of gas phase protein ion densities via ion mobility analysis with charge reduction.
Maisser, Anne; Premnath, Vinay; Ghosh, Abhimanyu; Nguyen, Tuan Anh; Attoui, Michel; Hogan, Christopher J
2011-12-28
We use a charge reduction electrospray (ESI) source and subsequent ion mobility analysis with a differential mobility analyzer (DMA, with detection via both a Faraday cage electrometer and a condensation particle counter) to infer the densities of single and multiprotein ions of cytochrome C, lysozyme, myoglobin, ovalbumin, and bovine serum albumin produced from non-denaturing (20 mM aqueous ammonium acetate) and denaturing (1 : 49.5 : 49.5, formic acid : methanol : water) ESI. Charge reduction is achieved through use of a Po-210 radioactive source, which generates roughly equal concentrations of positive and negative ions. Ions produced by the source collide with and reduce the charge on ESI generated drops, preventing Coulombic fissions, and unlike typical protein ESI, leading to gas-phase protein ions with +1 to +3 excess charges. Therefore, charge reduction serves to effectively mitigate any role that Coulombic stretching may play on the structure of the gas phase ions. Density inference is made via determination of the mobility diameter, and correspondingly the spherical equivalent protein volume. Through this approach it is found that for both non-denaturing and denaturing ESI-generated ions, gas-phase protein ions are relatively compact, with average densities of 0.97 g cm(-3) and 0.86 g cm(-3), respectively. Ions from non-denaturing ESI are found to be slightly more compact than predicted from the protein crystal structures, suggesting that low charge state protein ions in the gas phase are slightly denser than their solution conformations. While a slight difference is detected between the ions produced with non-denaturing and denaturing ESI, the denatured ions are found to be much more dense than those examined previously by drift tube mobility analysis, in which charge reduction was not employed. This indicates that Coulombic stretching is typically what leads to non-compact ions in the gas-phase, and suggests that for gas phase measurements to be correlated to biomolecular structures in solution, low charge state ions should be analyzed. Further, to determine if different solution conditions give rise to ions of different structure, ions of similar charge state should be compared. Non-denatured protein ion densities are found to be in excellent agreement with non-denatured protein ion densities inferred from prior DMA and drift tube measurements made without charge reduction (all ions with densities in the 0.85-1.10 g cm(-3) range), showing that these ions are not strongly influenced by Coulombic stretching nor by analysis method.
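The core arithmetic of the density inference is a sphere model: with ion mass m and mobility diameter d, the density is ρ = 6m/(πd³). The worked check below uses an approximate cytochrome c mass and a hypothetical diameter, not the paper's measured values.

```python
import math

# Worked check of sphere-model density inference (values approximate or
# hypothetical): rho = 6 m / (pi d^3) for an ion of mass m and mobility
# diameter d.
DA_TO_KG = 1.66054e-27
mass_da = 12384.0          # roughly the mass of cytochrome c, in Da
d_nm = 3.4                 # hypothetical measured mobility diameter, in nm

m = mass_da * DA_TO_KG
d = d_nm * 1e-9
rho = 6.0 * m / (math.pi * d**3)            # kg m^-3
print("inferred density: %.2f g/cm^3" % (rho / 1000.0))   # ~1.0 g/cm^3
```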
Berger, Stephanie; Procko, Erik; Margineantu, Daciana; Lee, Erinna F; Shen, Betty W; Zelter, Alex; Silva, Daniel-Adriano; Chawla, Kusum; Herold, Marco J; Garnier, Jean-Marc; Johnson, Richard; MacCoss, Michael J; Lessene, Guillaume; Davis, Trisha N; Stayton, Patrick S; Stoddard, Barry L; Fairlie, W Douglas; Hockenbery, David M; Baker, David
2016-11-02
Many cancers overexpress one or more of the six human pro-survival BCL2 family proteins to evade apoptosis. To determine which BCL2 protein or proteins block apoptosis in different cancers, we computationally designed three-helix bundle protein inhibitors specific for each BCL2 pro-survival protein. Following in vitro optimization, each inhibitor binds its target with high picomolar to low nanomolar affinity and at least 300-fold specificity. Expression of the designed inhibitors in human cancer cell lines revealed unique dependencies on BCL2 proteins for survival which could not be inferred from other BCL2 profiling methods. Our results show that designed inhibitors can be generated for each member of a closely-knit protein family to probe the importance of specific protein-protein interactions in complex biological processes.
MEANS: python package for Moment Expansion Approximation, iNference and Simulation
Fan, Sisi; Geissmann, Quentin; Lakatos, Eszter; Lukauskas, Saulius; Ale, Angelique; Babtie, Ann C.; Kirk, Paul D. W.; Stumpf, Michael P. H.
2016-01-01
Motivation: Many biochemical systems require stochastic descriptions. Unfortunately these can only be solved for the simplest cases and their direct simulation can become prohibitively expensive, precluding thorough analysis. As an alternative, moment closure approximation methods generate equations for the time-evolution of the system’s moments and apply a closure ansatz to obtain a closed set of differential equations; that can become the basis for the deterministic analysis of the moments of the outputs of stochastic systems. Results: We present a free, user-friendly tool implementing an efficient moment expansion approximation with parametric closures that integrates well with the IPython interactive environment. Our package enables the analysis of complex stochastic systems without any constraints on the number of species and moments studied and the type of rate laws in the system. In addition to the approximation method our package provides numerous tools to help non-expert users in stochastic analysis. Availability and implementation: https://github.com/theosysbio/means Contacts: m.stumpf@imperial.ac.uk or e.lakatos13@imperial.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153663
MEANS: python package for Moment Expansion Approximation, iNference and Simulation.
Fan, Sisi; Geissmann, Quentin; Lakatos, Eszter; Lukauskas, Saulius; Ale, Angelique; Babtie, Ann C; Kirk, Paul D W; Stumpf, Michael P H
2016-09-15
Many biochemical systems require stochastic descriptions. Unfortunately these can only be solved for the simplest cases and their direct simulation can become prohibitively expensive, precluding thorough analysis. As an alternative, moment closure approximation methods generate equations for the time-evolution of the system's moments and apply a closure ansatz to obtain a closed set of differential equations; that can become the basis for the deterministic analysis of the moments of the outputs of stochastic systems. We present a free, user-friendly tool implementing an efficient moment expansion approximation with parametric closures that integrates well with the IPython interactive environment. Our package enables the analysis of complex stochastic systems without any constraints on the number of species and moments studied and the type of rate laws in the system. In addition to the approximation method our package provides numerous tools to help non-expert users in stochastic analysis. https://github.com/theosysbio/means m.stumpf@imperial.ac.uk or e.lakatos13@imperial.ac.uk Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
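As a generic illustration of what moment-expansion tools of this kind compute (this sketch is not the MEANS API), the first two moments of a simple birth-death process close exactly and can be integrated as ordinary differential equations:

```python
from scipy.integrate import solve_ivp

# Generic moment-equation illustration (not the MEANS API): for a
# birth-death process with constant birth rate k_b and per-molecule death
# rate k_d, the first two moment equations close exactly:
#   d<m>/dt = k_b - k_d*<m>
#   dVar/dt = k_b + k_d*<m> - 2*k_d*Var
k_b, k_d = 10.0, 1.0

def rhs(t, y):
    m, var = y
    return [k_b - k_d * m, k_b + k_d * m - 2.0 * k_d * var]

sol = solve_ivp(rhs, (0.0, 10.0), [0.0, 0.0])
m, var = sol.y[0, -1], sol.y[1, -1]
print("stationary mean ~ %.2f, variance ~ %.2f (Poisson limit: both k_b/k_d)" % (m, var))
```

For nonlinear rate laws the moment hierarchy does not close on its own, which is where the closure ansatz implemented by tools like MEANS comes in.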
Protein Structure Determination using Metagenome sequence data
Ovchinnikov, Sergey; Park, Hahnbeom; Varghese, Neha; Huang, Po-Ssu; Pavlopoulos, Georgios A.; Kim, David E.; Kamisetty, Hetunandan; Kyrpides, Nikos C.; Baker, David
2017-01-01
Despite decades of work by structural biologists, there are still ~5200 protein families with unknown structure outside the range of comparative modeling. We show that Rosetta structure prediction guided by residue-residue contacts inferred from evolutionary information can accurately model proteins that belong to large families, and that metagenome sequence data more than triples the number of protein families with sufficient sequences for accurate modeling. We then integrate metagenome data, contact based structure matching and Rosetta structure calculations to generate models for 614 protein families with currently unknown structures; 206 are membrane proteins and 137 have folds not represented in the PDB. This approach provides the representative models for large protein families originally envisioned as the goal of the protein structure initiative at a fraction of the cost. PMID:28104891
Nanotransfer and nanoreplication using deterministically grown sacrificial nanotemplates
Melechko, Anatoli V [Oak Ridge, TN; McKnight, Timothy E [Greenback, TN; Guillorn, Michael A [Ithaca, NY; Ilic, Bojan [Ithaca, NY; Merkulov, Vladimir I [Knoxville, TN; Doktycz, Mitchel J [Knoxville, TN; Lowndes, Douglas H [Knoxville, TN; Simpson, Michael L [Knoxville, TN
2011-08-23
Methods, manufactures, machines and compositions are described for nanotransfer and nanoreplication using deterministically grown sacrificial nanotemplates. An apparatus, includes a substrate and a nanoreplicant structure coupled to a surface of the substrate.
Numerical Approach to Spatial Deterministic-Stochastic Models Arising in Cell Biology
Gao, Fei; Li, Ye; Novak, Igor L.; Slepchenko, Boris M.
2016-01-01
Hybrid deterministic-stochastic methods provide an efficient alternative to a fully stochastic treatment of models which include components with disparate levels of stochasticity. However, general-purpose hybrid solvers for spatially resolved simulations of reaction-diffusion systems are not widely available. Here we describe fundamentals of a general-purpose spatial hybrid method. The method generates realizations of a spatially inhomogeneous hybrid system by appropriately integrating capabilities of a deterministic partial differential equation solver with a popular particle-based stochastic simulator, Smoldyn. Rigorous validation of the algorithm is detailed, using a simple model of calcium ‘sparks’ as a testbed. The solver is then applied to a deterministic-stochastic model of spontaneous emergence of cell polarity. The approach is general enough to be implemented within biologist-friendly software frameworks such as Virtual Cell. PMID:27959915
Stochasticity and determinism in models of hematopoiesis.
Kimmel, Marek
2014-01-01
This chapter represents a novel view of modeling in hematopoiesis, synthesizing both deterministic and stochastic approaches. Whereas the stochastic models work in situations where chance dominates, for example when the number of cells is small, or under random mutations, the deterministic models are more important for large-scale, normal hematopoiesis. New types of models are on the horizon. These models attempt to account for distributed environments such as hematopoietic niches and their impact on dynamics. Mixed effects of such structures and chance events are largely unknown and constitute both a challenge and promise for modeling. Our discussion is presented under the separate headings of deterministic and stochastic modeling; however, the connections between both are frequently mentioned. Four case studies are included to elucidate important examples. We also include a primer of deterministic and stochastic dynamics for the reader's use.
Raval, Alpan; Piana, Stefano; Eastwood, Michael P; Shaw, David E
2016-01-01
Molecular dynamics (MD) simulation is a well-established tool for the computational study of protein structure and dynamics, but its application to the important problem of protein structure prediction remains challenging, in part because extremely long timescales can be required to reach the native structure. Here, we examine the extent to which the use of low-resolution information in the form of residue-residue contacts, which can often be inferred from bioinformatics or experimental studies, can accelerate the determination of protein structure in simulation. We incorporated sets of 62, 31, or 15 contact-based restraints in MD simulations of ubiquitin, a benchmark system known to fold to the native state on the millisecond timescale in unrestrained simulations. One-third of the restrained simulations folded to the native state within a few tens of microseconds-a speedup of over an order of magnitude compared with unrestrained simulations and a demonstration of the potential for limited amounts of structural information to accelerate structure determination. Almost all of the remaining ubiquitin simulations reached near-native conformations within a few tens of microseconds, but remained trapped there, apparently due to the restraints. We discuss potential methodological improvements that would facilitate escape from these near-native traps and allow more simulations to quickly reach the native state. Finally, using a target from the Critical Assessment of protein Structure Prediction (CASP) experiment, we show that distance restraints can improve simulation accuracy: In our simulations, restraints stabilized the native state of the protein, enabling a reasonable structural model to be inferred. © 2015 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.
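A contact-based restraint of the kind described can be sketched as a flat-bottomed harmonic penalty; the functional form and constants below are assumptions for illustration, not the restraints used in the simulations.

```python
import numpy as np

# Hedged sketch of a flat-bottomed contact restraint: zero energy while a
# restrained residue pair stays within r0, quadratic penalty once the pair
# drifts apart. Form and constants are illustrative assumptions.
def restraint_energy(coords, pairs, r0=8.0, k=1.0):
    """coords: (N, 3) array of residue positions (angstroms); pairs: (i, j) list."""
    e = 0.0
    for i, j in pairs:
        r = np.linalg.norm(coords[i] - coords[j])
        if r > r0:
            e += 0.5 * k * (r - r0) ** 2    # no force inside the contact radius
    return e

coords = np.array([[0.0, 0.0, 0.0], [9.5, 0.0, 0.0], [3.0, 4.0, 0.0]])
print("restraint energy:", restraint_energy(coords, [(0, 1), (0, 2)]))
```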
Simple Math is Enough: Two Examples of Inferring Functional Associations from Genomic Data
NASA Technical Reports Server (NTRS)
Liang, Shoudan
2003-01-01
Non-random features in genomic data are usually biologically meaningful. The key is to choose the feature well. Having a p-value-based score prioritizes the findings. If two proteins share an unusually large number of common interaction partners, they tend to be involved in the same biological process. We used this finding to predict the functions of 81 un-annotated proteins in yeast.
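The "simple math" in question reduces to a hypergeometric tail probability; the counts below are toy numbers, not the yeast data.

```python
from scipy.stats import hypergeom

# The chance that two proteins share at least k interaction partners, given
# that they have n1 and n2 partners drawn from N proteins, is a
# hypergeometric tail probability. Numbers are toy examples.
N, n1, n2, k = 6000, 40, 55, 8
p_value = hypergeom.sf(k - 1, N, n1, n2)   # P(shared partners >= k)
print("P(>= %d shared partners by chance) = %.3g" % (k, p_value))
```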
Hleap, Jose Sergio; Blouin, Christian
2018-01-01
The Glycoside Hydrolase Family 13 (GH13) is both evolutionarily diverse and relevant to many industrial applications. Its members hydrolyze starch into smaller carbohydrates and members of the family have been bioengineered to improve catalytic function under industrial environments. We introduce a framework to analyze the response to selection of GH13 protein structures given some phylogenetic and simulated dynamic information. We find that the TIM-barrel (a conserved protein fold consisting of eight α-helices and eight parallel β-strands that alternate along the peptide backbone, common to all amylases) is not selectable since it is under purifying selection. We also show a method to rank important residues with higher inferred response to selection. These residues can be altered to effect change in properties. In this work, we define fitness as inferred thermodynamic stability. We show that under the developed framework, residues 112Y, 122K, 124D, 125W, and 126P are good candidates to increase the stability of the truncated α-amylase protein from Geobacillus thermoleovorans (PDB code: 4E2O; α-1,4-glucan-4-glucanohydrolase; EC 3.2.1.1). Overall, this paper demonstrates the feasibility of a framework for the analysis of protein structures for any other fitness landscape.
2018-01-01
The Glycoside Hydrolase Family 13 (GH13) is both evolutionarily diverse and relevant to many industrial applications. Its members hydrolyze starch into smaller carbohydrates and members of the family have been bioengineered to improve catalytic function under industrial environments. We introduce a framework to analyze the response to selection of GH13 protein structures given some phylogenetic and simulated dynamic information. We find that the TIM-barrel (a conserved protein fold consisting of eight α-helices and eight parallel β-strands that alternate along the peptide backbone, common to all amylases) is not selectable since it is under purifying selection. We also show a method to rank important residues with higher inferred response to selection. These residues can be altered to effect change in properties. In this work, we define fitness as inferred thermodynamic stability. We show that under the developed framework, residues 112Y, 122K, 124D, 125W, and 126P are good candidates to increase the stability of the truncated α-amylase protein from Geobacillus thermoleovorans (PDB code: 4E2O; α-1,4-glucan-4-glucanohydrolase; EC 3.2.1.1). Overall, this paper demonstrates the feasibility of a framework for the analysis of protein structures for any other fitness landscape. PMID:29698417
Failed rib region prediction in a human body model during crash events with precrash braking.
Guleyupoglu, B; Koya, B; Barnard, R; Gayzik, F S
2018-02-28
The objective of this study is twofold. We used a validated human body finite element model to study the predicted chest injury (focusing on rib fracture as a function of element strain) based on varying levels of simulated precrash braking. Furthermore, we compare deterministic and probabilistic methods of rib injury prediction in the computational model. The Global Human Body Models Consortium (GHBMC) M50-O model was gravity settled in the driver position of a generic interior equipped with an advanced 3-point belt and airbag. Twelve cases were investigated with permutations for failure, precrash braking system, and crash severity. The severities used were median (17 kph), severe (34 kph), and New Car Assessment Program (NCAP; 56.4 kph). Cases with failure enabled removed rib cortical bone elements once 1.8% effective plastic strain was exceeded. Alternatively, a probabilistic framework found in the literature was used to predict rib failure. Both the probabilistic and deterministic methods take into consideration location (anterior, lateral, and posterior). The deterministic method is based on a rubric that defines failed rib regions dependent on a threshold for contiguous failed elements. The probabilistic method depends on age-based strain and failure functions. Kinematics between both methods were similar (peak max deviation: ΔX head = 17 mm; ΔZ head = 4 mm; ΔX thorax = 5 mm; ΔZ thorax = 1 mm). Seat belt forces at the time of probabilistic failed region initiation were lower than those at deterministic failed region initiation. The probabilistic method predicted more failed regions in the rib (an analog for fracture) than the deterministic method in all but one case, where they were equal. The failed region patterns between models are similar; however, differences arise because element elimination relieves stress, so probabilistic failed regions continue to accumulate after the point at which the deterministic method predicts no further failures. Both the probabilistic and deterministic methods indicate similar trends with regard to the effect of precrash braking; however, there are tradeoffs. The deterministic failed region method is more spatially sensitive to failure and more sensitive to belt loads. The probabilistic failed region method allows for increased capability in postprocessing with respect to age. The probabilistic failed region method predicted more failed regions than the deterministic failed region method due to force distribution differences.
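The contrast between the two failure criteria can be sketched as below; the deterministic threshold is the paper's 1.8% plastic strain, while the age-dependent probability curve is a hypothetical logistic stand-in, not the published GHBMC functions.

```python
import math

# Toy contrast of deterministic vs. probabilistic rib-failure criteria:
# deterministic elimination fires at a fixed plastic-strain threshold,
# while the probabilistic rule assigns each element an age-dependent
# failure probability (logistic curve here is a hypothetical stand-in).
THRESHOLD = 0.018                            # 1.8% effective plastic strain

def p_fail(strain, age=50):
    midpoint = 0.025 - 0.0001 * (age - 50)   # older ribs fail at lower strain
    return 1.0 / (1.0 + math.exp(-(strain - midpoint) / 0.003))

for strain in (0.010, 0.017, 0.019, 0.024, 0.031):
    det = strain > THRESHOLD
    print(f"strain={strain:.3f}  deterministic fail={det}  "
          f"P(fail | age 50)={p_fail(strain):.2f}")
```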
Energy Minimization of Discrete Protein Titration State Models Using Graph Theory.
Purvine, Emilie; Monson, Kyle; Jurrus, Elizabeth; Star, Keith; Baker, Nathan A
2016-08-25
There are several applications in computational biophysics that require the optimization of discrete interacting states, for example, amino acid titration states, ligand oxidation states, or discrete rotamer angles. Such optimization can be very time-consuming as it scales exponentially in the number of sites to be optimized. In this paper, we describe a new polynomial time algorithm for optimization of discrete states in macromolecular systems. This algorithm was adapted from image processing and uses techniques from discrete mathematics and graph theory to restate the optimization problem in terms of "maximum flow-minimum cut" graph analysis. The interaction energy graph, a graph in which vertices (amino acids) and edges (interactions) are weighted with their respective energies, is transformed into a flow network in which the value of the minimum cut in the network equals the minimum free energy of the protein and the cut itself encodes the state that achieves the minimum free energy. Because of its deterministic nature and polynomial time performance, this algorithm has the potential to allow for the ionization state of larger proteins to be discovered.
Energy Minimization of Discrete Protein Titration State Models Using Graph Theory
DOE Office of Scientific and Technical Information (OSTI.GOV)
Purvine, Emilie AH; Monson, Kyle E.; Jurrus, Elizabeth R.
There are several applications in computational biophysics which require the optimization of discrete interacting states; e.g., amino acid titration states, ligand oxidation states, or discrete rotamer angles. Such optimization can be very time-consuming as it scales exponentially in the number of sites to be optimized. In this paper, we describe a new polynomial-time algorithm for optimization of discrete states in macromolecular systems. This algorithm was adapted from image processing and uses techniques from discrete mathematics and graph theory to restate the optimization problem in terms of maximum flow-minimum cut graph analysis. The interaction energy graph, a graph in which vertices (amino acids) and edges (interactions) are weighted with their respective energies, is transformed into a flow network in which the value of the minimum cut in the network equals the minimum free energy of the protein, and the cut itself encodes the state that achieves the minimum free energy. Because of its deterministic nature and polynomial-time performance, this algorithm has the potential to allow for the ionization state of larger proteins to be discovered.
Protobiological information, bidirectional recognition and reverse translation
NASA Technical Reports Server (NTRS)
Fox, S. W.; Nakashima, T.; Przybylski, A.; Vaughan, G.
1986-01-01
Emergence of protobiological information has been suggested by experiments in which heated mixtures of alpha-amino acids order themselves into a self-limited array of thermal proteins. The polymers display selective catalytic, hormonal, and other activities. Interactions of varied cationic thermal proteins with polynucleotides indicate selective recognition in both directions. Reverse translation is, in part, a missing link in the molecular evolution flowsheet. The self-ordering of amino acids serves conceptually as a deterministic evolutionary precursor of the modern coding mechanism. The possibility for the evolution of information at an early nontemplated protein stage is supported by findings of electrical signals from proteinoid microspheres prepared with no DNA/RNA in their history. The deposition of thermal copolyamino acids on lipid membranes in the Mueller-Rudin apparatus has here been found to produce electrical behavior like that evoked by bacterial EIM polypeptide. A new procedure is to make a film of membrane on the electrode; the results provide maximal repeatability. The principle of nonrandom biomacromolecular specificity identified by these studies in molecular evolution has been extrapolated to principles of the evolution of advanced organisms.
Energy Minimization of Discrete Protein Titration State Models Using Graph Theory
Purvine, Emilie; Monson, Kyle; Jurrus, Elizabeth; Star, Keith; Baker, Nathan A.
2016-01-01
There are several applications in computational biophysics which require the optimization of discrete interacting states; e.g., amino acid titration states, ligand oxidation states, or discrete rotamer angles. Such optimization can be very time-consuming as it scales exponentially in the number of sites to be optimized. In this paper, we describe a new polynomial-time algorithm for optimization of discrete states in macromolecular systems. This algorithm was adapted from image processing and uses techniques from discrete mathematics and graph theory to restate the optimization problem in terms of “maximum flow-minimum cut” graph analysis. The interaction energy graph, a graph in which vertices (amino acids) and edges (interactions) are weighted with their respective energies, is transformed into a flow network in which the value of the minimum cut in the network equals the minimum free energy of the protein, and the cut itself encodes the state that achieves the minimum free energy. Because of its deterministic nature and polynomial-time performance, this algorithm has the potential to allow for the ionization state of larger proteins to be discovered. PMID:27089174
Schrag, Yann; Tremea, Alessandro; Lagger, Cyril; Ohana, Noé; Mohr, Christine
2016-01-01
Studies indicated that people behave less responsibly after exposure to information containing deterministic statements as compared to free will statements or neutral statements. Thus, deterministic primes should lead to enhanced risk-taking behavior. We tested this prediction in two studies with healthy participants. In experiment 1, we tested 144 students (24 men) in the laboratory using the Iowa Gambling Task. In experiment 2, we tested 274 participants (104 men) online using the Balloon Analogue Risk Task. In the Iowa Gambling Task, the free will priming condition resulted in more risky decisions than both the deterministic and neutral priming conditions. We observed no priming effects on risk-taking behavior in the Balloon Analogue Risk Task. To explain these unpredicted findings, we consider the somatic marker hypothesis, a gain frequency approach, as well as attention to gains and/or inattention to losses. In addition, we highlight the necessity to consider both pro free will and deterministic priming conditions in future studies. Importantly, our and previous results indicate that the effects of pro free will and deterministic priming do not oppose each other on a frequently assumed continuum. PMID:27018854
Schrag, Yann; Tremea, Alessandro; Lagger, Cyril; Ohana, Noé; Mohr, Christine
2016-01-01
Studies indicated that people behave less responsibly after exposure to information containing deterministic statements as compared to free will statements or neutral statements. Thus, deterministic primes should lead to enhanced risk-taking behavior. We tested this prediction in two studies with healthy participants. In experiment 1, we tested 144 students (24 men) in the laboratory using the Iowa Gambling Task. In experiment 2, we tested 274 participants (104 men) online using the Balloon Analogue Risk Task. In the Iowa Gambling Task, the free will priming condition resulted in more risky decisions than both the deterministic and neutral priming conditions. We observed no priming effects on risk-taking behavior in the Balloon Analogue Risk Task. To explain these unpredicted findings, we consider the somatic marker hypothesis, a gain frequency approach, as well as attention to gains and/or inattention to losses. In addition, we highlight the necessity to consider both pro free will and deterministic priming conditions in future studies. Importantly, our and previous results indicate that the effects of pro free will and deterministic priming do not oppose each other on a frequently assumed continuum.
The CTD2 Center at Emory University has developed a computational methodology to combine high-throughput knockdown data with known protein network topologies to infer the importance of protein-protein interactions (PPIs) for the survival of cancer cells. Applying these data to the Achilles shRNA results, the CCLE cell line characterizations, and known and newly identified PPIs provides novel insights for potential new drug targets for cancer therapies and identifies important PPI hubs.
Ji, Hong-Fang; Chen, Lei; Zhang, Hong-Yu
2008-08-01
Protein redox reactions are among the most basic and important biochemical actions. As amino acids are weak redox mediators, most protein redox functions are undertaken by protein cofactors, which include organic ligands and transition metal ions. Since both kinds of redox cofactors were available in the pre-protein RNA world, it is an open question which kind was more involved in the redox processes of primitive proteins. In this paper, by examining the redox cofactor usage of putative ancient proteins, we infer that organic ligands participated more frequently than transition metals in redox reactions of primitive proteins, at least as protein cofactors. This is further supported by the relative abundance of amino acids in the primordial world. Supplementary material for this article can be found on the BioEssays website. (c) 2008 Wiley Periodicals, Inc.
Cheng, Feixiong; Li, Weihua; Wu, Zengrui; Wang, Xichuan; Zhang, Chen; Li, Jie; Liu, Guixia; Tang, Yun
2013-04-22
Prediction of the polypharmacological profiles of drugs enables us to investigate drug side effects and to find new indications for existing drugs (drug repositioning), which could reduce the costs while increasing the productivity of drug discovery. Here we describe a new computational framework to predict polypharmacological profiles of drugs by integrating chemical, side effect, and therapeutic space. On the basis of our previously developed drug side effect database, named MetaADEDB, a drug side effect similarity inference (DSESI) method was developed for drug-target interaction (DTI) prediction on a known DTI network connecting 621 approved drugs and 893 target proteins. The area under the receiver operating characteristic curve was 0.882 ± 0.011, averaged from 100 simulated tests of 10-fold cross-validation for the DSESI method, which is comparable with drug structural similarity inference and drug therapeutic similarity inference methods. Seven new predicted candidate target proteins for seven approved drugs were confirmed by published experiments, with a successful hit rate of more than 15.9%. Moreover, network visualization of drug-target interactions and off-target side effect associations provides new mechanism-of-action hypotheses for three approved antipsychotic drugs in a case study. The results indicate that the proposed methods could be helpful for the prediction of polypharmacological profiles of drugs.
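A toy sketch of the similarity-inference idea behind a method like DSESI: score a candidate drug-target pair by the side-effect similarity (here Jaccard) between the query drug and drugs already known to hit the target. All drug names, side effects, and the known-binder list are made-up illustrative data, and the scoring rule is a simplification rather than the published method.

```python
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

side_effects = {
    "drugA": {"nausea", "headache", "rash"},
    "drugB": {"nausea", "headache"},
    "drugC": {"dizziness"},
}
known_binders = {"targetX": ["drugA"]}       # toy DTI network

def similarity_inference_score(query_drug, target):
    binders = known_binders.get(target, [])
    if not binders:
        return 0.0
    sims = [jaccard(side_effects[query_drug], side_effects[d]) for d in binders]
    return sum(sims) / len(sims)

print(similarity_inference_score("drugB", "targetX"))  # high -> candidate DTI
print(similarity_inference_score("drugC", "targetX"))  # low  -> unlikely DTI
```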
Assigning protein functions by comparative genome analysis protein phylogenetic profiles
Pellegrini, Matteo; Marcotte, Edward M.; Thompson, Michael J.; Eisenberg, David; Grothe, Robert; Yeates, Todd O.
2003-05-13
A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins, A' and B', have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequences of two or more organisms to create a phylogenetic profile for each protein, indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method, a combination of the above two methods is used to predict functional links.
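The phylogenetic-profile method lends itself to a very small sketch: each protein is encoded as a presence/absence vector across genomes, and proteins with matching profiles become candidate functional partners. The profiles below are made-up toy data.

```python
from itertools import combinations

profiles = {                      # presence (1) / absence (0) in four genomes
    "P1": (1, 0, 1, 1),
    "P2": (1, 0, 1, 1),
    "P3": (0, 1, 1, 0),
}
links = [(a, b) for a, b in combinations(sorted(profiles), 2)
         if profiles[a] == profiles[b]]
print(links)                      # [('P1', 'P2')] -> inferred functional link
```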
Exploring Protein Function Using the Saccharomyces Genome Database.
Wong, Edith D
2017-01-01
Elucidating the function of individual proteins will help to create a comprehensive picture of cell biology, as well as shed light on human disease mechanisms, possible treatments, and cures. Due to its compact genome and extensive history of experimentation and annotation, the budding yeast Saccharomyces cerevisiae is an ideal model organism in which to determine protein function. This information can then be leveraged to infer functions of human homologs. Despite the large amount of research and biological data about S. cerevisiae, many proteins' functions remain unknown. Here, we explore ways to use the Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) to predict the function of proteins and gain insight into their roles in various cellular processes.
The Language of the Protein Universe
Scaiewicz, Andrea; Levitt, Michael
2015-01-01
Proteins, the main cellular machinery playing a major role in nearly every cellular process, have always been a central focus in biology. We live in the post-genomic era, and inferring information from massive data sets is a steadily growing universal challenge. The increasing availability of fully sequenced genomes can be regarded as the “Rosetta Stone” of the protein universe, allowing the understanding of genomes and their evolution, just as the original Rosetta Stone allowed Champollion to decipher the ancient Egyptian hieroglyphics. In this review, we consider aspects of the protein domain architecture repertoire that are closely related to those of human languages and aim to provide some insights about the language of proteins. PMID:26451980
Ion implantation for deterministic single atom devices
NASA Astrophysics Data System (ADS)
Pacheco, J. L.; Singh, M.; Perry, D. L.; Wendt, J. R.; Ten Eyck, G.; Manginell, R. P.; Pluym, T.; Luhman, D. R.; Lilly, M. P.; Carroll, M. S.; Bielejec, E.
2017-12-01
We demonstrate a capability of deterministic doping at the single atom level using a combination of direct write focused ion beam and solid-state ion detectors. The focused ion beam system can position a single ion to within 35 nm of a targeted location and the detection system is sensitive to single low energy heavy ions. This platform can be used to deterministically fabricate single atom devices in materials where the nanostructure and ion detectors can be integrated, including donor-based qubits in Si and color centers in diamond.
Counterfactual Quantum Deterministic Key Distribution
NASA Astrophysics Data System (ADS)
Zhang, Sheng; Wang, Jian; Tang, Chao-Jing
2013-01-01
We propose a new counterfactual quantum cryptography protocol for distributing a deterministic key. By adding a controlled blocking operation module to the original protocol [T.G. Noh, Phys. Rev. Lett. 103 (2009) 230501], the correlation between the polarizations of the two parties, Alice and Bob, is extended; therefore, one can distribute both deterministic keys and random ones using our protocol. We also give a simple proof of the security of our protocol using the technique we previously applied to the original protocol. Most importantly, our analysis produces a bound tighter than the existing ones.
Ion implantation for deterministic single atom devices
Pacheco, J. L.; Singh, M.; Perry, D. L.; ...
2017-12-04
Here, we demonstrate a capability of deterministic doping at the single atom level using a combination of direct write focused ion beam and solid-state ion detectors. The focused ion beam system can position a single ion to within 35 nm of a targeted location and the detection system is sensitive to single low energy heavy ions. This platform can be used to deterministically fabricate single atom devices in materials where the nanostructure and ion detectors can be integrated, including donor-based qubits in Si and color centers in diamond.
Deterministic quantum splitter based on time-reversed Hong-Ou-Mandel interference
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Jun; Lee, Kim Fook; Kumar, Prem
2007-09-15
By utilizing a fiber-based indistinguishable photon-pair source in the 1.55 μm telecommunications band [J. Chen et al., Opt. Lett. 31, 2798 (2006)], we present the first, to the best of our knowledge, deterministic quantum splitter based on the principle of time-reversed Hong-Ou-Mandel quantum interference. The indistinguishability of the deterministically separated identical photons is then verified by using a conventional Hong-Ou-Mandel quantum interference measurement, which exhibits a near-unity dip visibility of 94±1%, making this quantum splitter useful for various quantum information processing applications.
Eisenberg, David; Marcotte, Edward M.; Pellegrini, Matteo; Thompson, Michael J.; Yeates, Todd O.
2002-10-15
A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins, A' and B', have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequences of two or more organisms to create a phylogenetic profile for each protein, indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method, a combination of the above two methods is used to predict functional links.
Ultrasensitive detection of protein translocated through toxin pores in droplet-interface bilayers
Fischer, Audrey; Holden, Matthew A.; Pentelute, Brad L.; Collier, R. John
2011-01-01
Many bacterial toxins form proteinaceous pores that facilitate the translocation of soluble effector proteins across cellular membranes. With anthrax toxin this process may be monitored in real time by electrophysiology, where fluctuations in ionic current through these pores inserted in model membranes are used to infer the translocation of individual protein molecules. However, detecting the minute quantities of translocated proteins has been a challenge. Here, we describe use of the droplet-interface bilayer system to follow the movement of proteins across a model membrane separating two submicroliter aqueous droplets. We report the capture and subsequent direct detection of as few as 100 protein molecules that have translocated through anthrax toxin pores. The droplet-interface bilayer system offers new avenues of approach to the study of protein translocation. PMID:21949363
Måren, Inger Elisabeth; Kapfer, Jutta; Aarrestad, Per Arild; Grytnes, John-Arvid; Vandvik, Vigdis
2018-01-01
Successional dynamics in plant community assembly may result from both deterministic and stochastic ecological processes. The relative importance of different ecological processes is expected to vary over the successional sequence, between different plant functional groups, and with the disturbance levels and land-use management regimes of the successional systems. We evaluate the relative importance of stochastic and deterministic processes in bryophyte and vascular plant community assembly after fire in grazed and ungrazed anthropogenic coastal heathlands in Northern Europe. A replicated series of post-fire successions (n = 12) were initiated under grazed and ungrazed conditions, and vegetation data were recorded in permanent plots over 13 years. We used redundancy analysis (RDA) to test for deterministic successional patterns in species composition repeated across the replicate successional series and analyses of co-occurrence to evaluate to what extent species respond synchronously along the successional gradient. Change in species co-occurrences over succession indicates stochastic successional dynamics at the species level (i.e., species equivalence), whereas constancy in co-occurrence indicates deterministic dynamics (successional niche differentiation). The RDA shows high and deterministic vascular plant community compositional change, especially early in succession. Co-occurrence analyses indicate stochastic species-level dynamics the first two years, which then give way to more deterministic replacements. Grazed and ungrazed successions are similar, but the early stage stochasticity is higher in ungrazed areas. Bryophyte communities in ungrazed successions resemble vascular plant communities. In contrast, bryophytes in grazed successions showed consistently high stochasticity and low determinism in both community composition and species co-occurrence. In conclusion, stochastic and individualistic species responses early in succession give way to more niche-driven dynamics in later successional stages. Grazing reduces predictability in both successional trends and species-level dynamics, especially in plant functional groups that are not well adapted to disturbance. © 2017 The Authors. Ecology, published by Wiley Periodicals, Inc., on behalf of the Ecological Society of America.
The Transcriptional Regulator CBP Has Defined Spatial Associations within Interphase Nuclei
McManus, Kirk J; Stephens, David A; Adams, Niall M; Islam, Suhail A; Freemont, Paul S; Hendzel, Michael J
2006-01-01
It is becoming increasingly clear that nuclear macromolecules and macromolecular complexes are compartmentalized through binding interactions into an apparent three-dimensionally ordered structure. This ordering, however, does not appear to be deterministic to the extent that chromatin and nonchromatin structures maintain a strict 3-D arrangement. Rather, spatial ordering within the cell nucleus appears to conform to stochastic rather than deterministic spatial relationships. The stochastic nature of organization becomes particularly problematic when any attempt is made to describe the spatial relationship between proteins involved in the regulation of the genome. The CREB–binding protein (CBP) is one such transcriptional regulator that, when visualised by confocal microscopy, reveals a highly punctate staining pattern comprising several hundred individual foci distributed within the nuclear volume. Markers for euchromatic sequences have similar patterns. Surprisingly, in most cases, the predicted one-to-one relationship between transcription factor and chromatin sequence is not observed. Consequently, to understand whether spatial relationships that are not coincident are nonrandom and potentially biologically important, it is necessary to develop statistical approaches. In this study, we report on the development of such an approach and apply it to understanding the role of CBP in mediating chromatin modification and transcriptional regulation. We have used nearest-neighbor distance measurements and probability analyses to study the spatial relationship between CBP and other nuclear subcompartments enriched in transcription factors, chromatin, and splicing factors. Our results demonstrate that CBP has an order of spatial association with other nuclear subcompartments. We observe closer associations between CBP and RNA polymerase II–enriched foci and SC35 speckles than nascent RNA or specific acetylated histones. Furthermore, we find that CBP has a significantly higher probability of being close to its known in vivo substrate histone H4 lysine 5 compared with the closely related H4 lysine 12. This study demonstrates that complex relationships not described by colocalization exist in the interphase nucleus and can be characterized and quantified. The subnuclear distribution of CBP is difficult to reconcile with a model where chromatin organization is the sole determinant of the nuclear organization of proteins that regulate transcription but is consistent with a close link between spatial associations and nuclear functions. PMID:17054391
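The nearest-neighbor distance and probability analysis described above can be sketched with a k-d tree and a Monte Carlo null model: compare the observed mean nearest-neighbor distance from one set of foci to another against the distribution obtained after random repositioning. The uniform toy coordinates below stand in for segmented foci and are not the authors' imaging data.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
cbp = rng.uniform(0, 10, (300, 3))       # toy CBP-like foci, arbitrary units
other = rng.uniform(0, 10, (200, 3))     # toy second nuclear compartment

observed = cKDTree(other).query(cbp)[0].mean()
null = [cKDTree(rng.uniform(0, 10, other.shape)).query(cbp)[0].mean()
        for _ in range(200)]             # Monte Carlo random-repositioning null
p = (np.sum(np.array(null) <= observed) + 1) / (len(null) + 1)
print(observed, p)                       # small p -> closer than chance
```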
Deterministic multidimensional nonuniform gap sampling.
Worley, Bradley; Powers, Robert
2015-12-01
Born from empirical observations in nonuniformly sampled multidimensional NMR data relating to gaps between sampled points, the Poisson-gap sampling method has enjoyed widespread use in biomolecular NMR. While the majority of nonuniform sampling schemes are fully randomly drawn from probability densities that vary over a Nyquist grid, the Poisson-gap scheme employs constrained random deviates to minimize the gaps between sampled grid points. We describe a deterministic gap sampling method, based on the average behavior of Poisson-gap sampling, which performs comparably to its random counterpart with the additional benefit of completely deterministic behavior. We also introduce a general algorithm for multidimensional nonuniform sampling based on a gap equation, and apply it to yield a deterministic sampling scheme that combines burst-mode sampling features with those of Poisson-gap schemes. Finally, we derive a relationship between stochastic gap equations and the expectation value of their sampling probability densities. Copyright © 2015 Elsevier Inc. All rights reserved.
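A sketch in the spirit of the deterministic gap idea: replace the Poisson deviate of Poisson-gap sampling with its sinusoidally weighted mean, rescaling the weight until roughly the requested number of grid points is kept. The rescaling loop and constants are illustrative assumptions, not the published algorithm.

```python
import math

def deterministic_gaps(grid_size=128, target_points=32, n_iter=200):
    lam, points = 1.0, []
    for _ in range(n_iter):
        points, pos, k = [], 0, 0
        while pos < grid_size:
            points.append(pos)
            theta = min(k + 0.5, target_points) * math.pi / (2 * target_points)
            pos += 1 + int(round(lam * math.sin(theta)))  # gaps grow toward the end
            k += 1
        if len(points) == target_points:
            break
        lam *= 1.05 if len(points) > target_points else 0.95
    return points

schedule = deterministic_gaps()
print(len(schedule), schedule[:8])        # fully reproducible schedule
```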
A Comparison of Probabilistic and Deterministic Campaign Analysis for Human Space Exploration
NASA Technical Reports Server (NTRS)
Merrill, R. Gabe; Andraschko, Mark; Stromgren, Chel; Cirillo, Bill; Earle, Kevin; Goodliff, Kandyce
2008-01-01
Human space exploration is by its very nature an uncertain endeavor. Vehicle reliability, technology development risk, budgetary uncertainty, and launch uncertainty all contribute to stochasticity in an exploration scenario. However, traditional strategic analysis has been done in a deterministic manner, analyzing and optimizing the performance of a series of planned missions. History has shown that exploration scenarios rarely follow such a planned schedule. This paper describes a methodology to integrate deterministic and probabilistic analysis of scenarios in support of human space exploration. Probabilistic strategic analysis is used to simulate "possible" scenario outcomes, based upon the likelihood of occurrence of certain events and a set of pre-determined contingency rules. The results of the probabilistic analysis are compared to the nominal results from the deterministic analysis to evaluate the robustness of the scenario to adverse events and to test and optimize contingency planning.
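A toy sketch of the deterministic-versus-probabilistic comparison described above: simulate "possible" campaign outcomes under a per-mission launch-success probability and a one-retry contingency rule, then compare the distribution with the nominal deterministic plan. The probabilities and the contingency rule are illustrative assumptions only.

```python
import random

def simulate_campaign(p_success=0.95, n_missions=10, rng=None):
    rng = rng or random.Random()
    completed = 0
    for _ in range(n_missions):
        # mission succeeds on the first try or on a single retry
        if rng.random() < p_success or rng.random() < p_success:
            completed += 1
    return completed

deterministic_plan = 10                          # nominal: every mission flies
runs = [simulate_campaign(rng=random.Random(i)) for i in range(10000)]
print(deterministic_plan, sum(runs) / len(runs), min(runs))
```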
First Order Reliability Application and Verification Methods for Semistatic Structures
NASA Technical Reports Server (NTRS)
Verderaime, Vincent
1994-01-01
Escalating risks of aerostructures, driven by increasing size, complexity, and cost, should no longer be ignored by conventional deterministic safety design methods. The deterministic pass-fail concept is incompatible with probability and risk assessments, its stress audits are shown to be arbitrary and incomplete, and it compromises the performance of high-strength materials. A reliability method is proposed which combines first-order reliability principles with deterministic design variables and conventional test techniques to surmount current deterministic stress design and audit deficiencies. Accumulated and propagated design-uncertainty errors are defined and appropriately implemented into the classical safety index expression. The application reduces to solving for a factor that satisfies the specified reliability and compensates for uncertainty errors, and then using this factor as, and instead of, the conventional safety factor in stress analyses. The resulting method is consistent with current analytical skills and verification practices, the culture of most designers, and the pace of semistatic structural designs.
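For reference, the classical first-order (Cornell) safety index that this kind of approach builds on can be written as below; the paper's specific accumulated and propagated uncertainty-error terms are not reproduced here.

```latex
% First-order safety index for strength R and stress S with means \mu and
% standard deviations \sigma; \Phi is the standard normal CDF.
\beta = \frac{\mu_R - \mu_S}{\sqrt{\sigma_R^{2} + \sigma_S^{2}}},
\qquad P_f \approx \Phi(-\beta)
```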
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hall, David R; Bartholomew, David B; Moon, Justin
2009-09-08
An apparatus for fixing computational latency within a deterministic region on a network comprises a network interface modem, a high priority module and at least one deterministic peripheral device. The network interface modem is in communication with the network. The high priority module is in communication with the network interface modem. The at least one deterministic peripheral device is connected to the high priority module. The high priority module comprises a packet assembler/disassembler, and hardware for performing at least one operation. Also disclosed is an apparatus for executing at least one instruction on a downhole device within a deterministic region, the apparatus comprising a control device, a downhole network, and a downhole device. The control device is near the surface of a downhole tool string. The downhole network is integrated into the tool string. The downhole device is in communication with the downhole network.
Stochastic Petri Net extension of a yeast cell cycle model.
Mura, Ivan; Csikász-Nagy, Attila
2008-10-21
This paper presents the definition, solution, and validation of a stochastic model of the budding yeast cell cycle, based on Stochastic Petri Nets (SPN). A specific family of SPNs is selected for building a stochastic version of a well-established deterministic model. We describe the procedure followed in defining the SPN model from the deterministic ODE model, a procedure that can be largely automated. The validation of the SPN model is conducted with respect to both the results provided by the deterministic model and the experimental results available in the literature. The SPN model captures the behavior of wild-type budding yeast cells and a variety of mutants. We show that the stochastic model matches some characteristics of budding yeast cells that cannot be found with the deterministic model. The SPN model fine-tunes the simulation results, enriching the breadth and the quality of its outcome.
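The ODE-to-stochastic translation at the heart of such a conversion can be illustrated with a Gillespie simulation of a toy birth-death process whose deterministic limit is dX/dt = k1 - k2·X; each ODE term becomes a reaction propensity. This is a generic illustration, not the paper's SPN cell-cycle model.

```python
import math, random

def gillespie(x0=10, k1=2.0, k2=0.1, t_end=100.0, seed=1):
    rng = random.Random(seed)
    t, x = 0.0, x0
    while t < t_end:
        a_prod, a_deg = k1, k2 * x                    # reaction propensities
        a_total = a_prod + a_deg
        t += -math.log(1.0 - rng.random()) / a_total  # exponential waiting time
        x += 1 if rng.random() * a_total < a_prod else -1
    return x

samples = [gillespie(seed=s) for s in range(200)]
print(sum(samples) / len(samples))  # fluctuates around the fixed point k1/k2 = 20
```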
Effect of sample volume on metastable zone width and induction time
NASA Astrophysics Data System (ADS)
Kubota, Noriaki
2012-04-01
The metastable zone width (MSZW) and the induction time, measured for a large sample (say, >0.1 L), are reproducible and deterministic, while for a small sample (say, <1 mL) these values are irreproducible and stochastic. These behaviors of MSZW and induction time were analyzed theoretically with both stochastic and deterministic models. Equations for the distributions of the stochastic MSZW and induction time were derived. The average values of the stochastic MSZW and induction time both decreased with an increase in sample volume, while the deterministic MSZW and induction time remained unchanged. These different behaviors with variation in sample volume were explained in terms of the detection sensitivity of crystallization events. The average values of MSZW and induction time in the stochastic model were compared with the deterministic MSZW and induction time, respectively. Literature data reported for aqueous paracetamol solution were explained theoretically with the presented models.
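For a single-nucleation-event picture consistent with the stochastic model discussed above, the standard relations for a stationary nucleation rate J per unit volume make the volume dependence of the average induction time explicit:

```latex
% Probability that at least one nucleus has formed by time t in volume V,
% and the resulting mean induction time, for stationary nucleation rate J:
P(t) = 1 - \exp(-J V t), \qquad
\langle t_{\mathrm{ind}} \rangle = \frac{1}{J V}
```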
Blocksome, Michael A.; Mamidala, Amith R.
2015-07-07
Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to a deterministic data communications network through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and the deterministic data communications network; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.
Blocksome, Michael A.; Mamidala, Amith R.
2015-07-14
Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to a deterministic data communications network through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and the deterministic data communications network; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.
Berger, Stephanie; Procko, Erik; Margineantu, Daciana; Lee, Erinna F; Shen, Betty W; Zelter, Alex; Silva, Daniel-Adriano; Chawla, Kusum; Herold, Marco J; Garnier, Jean-Marc; Johnson, Richard; MacCoss, Michael J; Lessene, Guillaume; Davis, Trisha N; Stayton, Patrick S; Stoddard, Barry L; Fairlie, W Douglas; Hockenbery, David M; Baker, David
2016-01-01
Many cancers overexpress one or more of the six human pro-survival BCL2 family proteins to evade apoptosis. To determine which BCL2 protein or proteins block apoptosis in different cancers, we computationally designed three-helix bundle protein inhibitors specific for each BCL2 pro-survival protein. Following in vitro optimization, each inhibitor binds its target with high picomolar to low nanomolar affinity and at least 300-fold specificity. Expression of the designed inhibitors in human cancer cell lines revealed unique dependencies on BCL2 proteins for survival which could not be inferred from other BCL2 profiling methods. Our results show that designed inhibitors can be generated for each member of a closely-knit protein family to probe the importance of specific protein-protein interactions in complex biological processes. DOI: http://dx.doi.org/10.7554/eLife.20352.001 PMID:27805565
Realistic Simulation for Body Area and Body-To-Body Networks
Alam, Muhammad Mahtab; Ben Hamida, Elyes; Ben Arbia, Dhafer; Maman, Mickael; Mani, Francesco; Denis, Benoit; D’Errico, Raffaele
2016-01-01
In this paper, we present an accurate and realistic simulation for body area networks (BAN) and body-to-body networks (BBN) using deterministic and semi-deterministic approaches. First, in the semi-deterministic approach, a real-time measurement campaign is performed and further characterized through statistical analysis. This approach is able to generate link-correlated and time-varying realistic traces (i.e., with consistent mobility patterns) for on-body and body-to-body shadowing and fading, including body orientations and rotations, by means of stochastic channel models. The fully deterministic approach is particularly targeted at enhancing the IEEE 802.15.6 proposed channel models by introducing space and time variations (i.e., dynamic distances) through biomechanical modeling. In addition, it helps to accurately model the radio link by identifying the link types and corresponding path loss factors for line of sight (LOS) and non-line of sight (NLOS). This approach is particularly important for links that vary over time due to mobility. It is also important to add that the communication and protocol stack, including the physical (PHY), medium access control (MAC), and networking models, is developed for BAN and BBN, and IEEE 802.15.6 standard compliance is provided as a benchmark for future research by the community. Finally, the two approaches are compared in terms of successful packet delivery ratio, packet delay, and energy efficiency. The results show that the semi-deterministic approach is the best option; however, given the diversity of applicable mobility patterns and scenarios, biomechanical modeling and the deterministic approach are better choices. PMID:27104537
Realistic Simulation for Body Area and Body-To-Body Networks.
Alam, Muhammad Mahtab; Ben Hamida, Elyes; Ben Arbia, Dhafer; Maman, Mickael; Mani, Francesco; Denis, Benoit; D'Errico, Raffaele
2016-04-20
In this paper, we present an accurate and realistic simulation for body area networks (BAN) and body-to-body networks (BBN) using deterministic and semi-deterministic approaches. First, in the semi-deterministic approach, a real-time measurement campaign is performed and further characterized through statistical analysis. This approach is able to generate link-correlated and time-varying realistic traces (i.e., with consistent mobility patterns) for on-body and body-to-body shadowing and fading, including body orientations and rotations, by means of stochastic channel models. The fully deterministic approach is particularly targeted at enhancing the IEEE 802.15.6 proposed channel models by introducing space and time variations (i.e., dynamic distances) through biomechanical modeling. In addition, it helps to accurately model the radio link by identifying the link types and corresponding path loss factors for line of sight (LOS) and non-line of sight (NLOS). This approach is particularly important for links that vary over time due to mobility. It is also important to add that the communication and protocol stack, including the physical (PHY), medium access control (MAC), and networking models, is developed for BAN and BBN, and IEEE 802.15.6 standard compliance is provided as a benchmark for future research by the community. Finally, the two approaches are compared in terms of successful packet delivery ratio, packet delay, and energy efficiency. The results show that the semi-deterministic approach is the best option; however, given the diversity of applicable mobility patterns and scenarios, biomechanical modeling and the deterministic approach are better choices.
Antibody-protein interactions: benchmark datasets and prediction tools evaluation
Ponomarenko, Julia V; Bourne, Philip E
2007-01-01
Background The ability to predict antibody binding sites (also known as antigenic determinants or B-cell epitopes) for a given protein is a precursor to new vaccine design and diagnostics. Among the various methods of B-cell epitope identification, X-ray crystallography is one of the most reliable. Building on these experimental data, computational methods for B-cell epitope prediction have been developed. As the number of structures of antibody-protein complexes grows, further interest in prediction methods using 3D structure is anticipated. This work aims to establish a benchmark for 3D structure-based epitope prediction methods. Results Two B-cell epitope benchmark datasets inferred from the 3D structures of antibody-protein complexes were defined. The first is a dataset of 62 representative 3D structures of protein antigens with inferred structural epitopes. The second is a dataset of 82 structures of antibody-protein complexes containing different structural epitopes. Using these datasets, eight web-servers developed for antibody and protein binding site prediction were evaluated. No method exceeded 40% precision and 46% recall. The values of the area under the receiver operating characteristic curve for the evaluated methods were about 0.6 for the ConSurf, DiscoTope, and PPI-PRED methods, and above 0.65 but not exceeding 0.70 for protein-protein docking methods when the best of the top ten models for the bound docking were considered; the remaining methods performed close to random. The benchmark datasets are included as a supplement to this paper. Conclusion It may be possible to improve epitope prediction methods through training on datasets which include only immune epitopes and through utilizing more features characterizing epitopes, for example, the evolutionary conservation score. Notwithstanding, the overall poor performance may reflect the generality of antigenicity and hence the inability to decipher B-cell epitopes as an intrinsic feature of the protein. It is an open question whether ultimately discriminatory features can be found. PMID:17910770
The Solar System's large planets' influence on a new Maunder Minimum
NASA Astrophysics Data System (ADS)
Yndestad, Harald; Solheim, Jan-Erik
2016-04-01
In the 1890s, G. Spörer and E. W. Maunder reported that solar activity stopped for a period of 70 years, from 1645 to 1715. Later reconstructions of solar activity confirm the grand minima of Maunder (1640-1720), Spörer (1390-1550), and Wolf (1270-1340), and the minima of Oort (1010-1070) and Dalton (1785-1810) since the year 1000 A.D. (Usoskin et al. 2007). These minimum periods have been associated with less irradiation from the Sun and cold climate periods on Earth. The identification of three grand Maunder-type periods and two Dalton-type periods over a thousand years indicates that sooner or later a new Maunder- or Dalton-type period will bring a colder climate on Earth. The cause of these minimum periods is not well understood. An expectation of a new Maunder-type period rests on the properties of solar variability: if the solar variability has a deterministic element, we can better estimate a new Maunder grand minimum, whereas a purely random solar variability can only explain the past. This investigation is based on the simple idea that if the solar variability has a deterministic property, it must have a deterministic source as a first cause; if this deterministic source is known, we can compute better estimates of the next expected Maunder grand minimum period. The study is based on a TSI ACRIM data series from 1700, a TSI ACRIM data series from 1000 A.D., a sunspot data series from 1611, and a solar barycenter orbit data series from 1000. The analysis method is based on wavelet spectrum analysis to identify stationary periods, coincidence periods, and their phase relations. The result shows that the TSI variability and the sunspot variability have deterministic oscillations, controlled by the large planets Jupiter, Uranus, and Neptune as the first cause. A deterministic model of TSI variability and sunspot variability confirms the known minimum and grand minimum periods since 1000. From this deterministic model we may expect a new Maunder-type sunspot minimum period from about 2018 to 2055. The deterministic model of the TSI ACRIM data series from 1700 computes a new Maunder-type grand minimum period from 2015 to 2071. A model of the longer TSI ACRIM data series from 1000 computes a new Dalton-to-Maunder-type minimum irradiation period from 2047 to 2068.
Predicting helix–helix interactions from residue contacts in membrane proteins
Lo, Allan; Chiu, Yi-Yuan; Rødland, Einar Andreas; Lyu, Ping-Chiang; Sung, Ting-Yi; Hsu, Wen-Lian
2009-01-01
Motivation: Helix–helix interactions play a critical role in the structure assembly, stability, and function of membrane proteins. On the molecular level, the interactions are mediated by one or more residue contacts. Although previous studies focused on helix-packing patterns and sequence motifs, few of them developed methods specifically for contact prediction. Results: We present a new hierarchical framework for contact prediction, with an application to membrane proteins. The hierarchical scheme consists of two levels: in the first level, contact residues are predicted from the sequence, and their pairing relationships are further predicted in the second level. Statistical analyses of contact propensities are combined with other sequence and structural information for training the support vector machine classifiers. Evaluated on 52 protein chains using leave-one-out cross-validation (LOOCV) and an independent test set of 14 protein chains, the two-level approach consistently improves on the conventional direct approach in prediction accuracy, with an 80% reduction of input for prediction. Furthermore, the predicted contacts are used to infer interactions between pairs of helices. When at least three predicted contacts are required for an inferred interaction, the accuracy, sensitivity, and specificity are 56%, 40%, and 89%, respectively. Our results demonstrate that a hierarchical framework can be applied to eliminate false positives (FP) while reducing computational complexity in predicting contacts. Together with the estimated contact propensities, this method can be used to gain insights into helix packing in membrane proteins. Availability: http://bio-cluster.iis.sinica.edu.tw/TMhit/ Contact: tsung@iis.sinica.edu.tw Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19244388
Panzera, Alejandra; Leaché, Adam D; D'Elía, Guillermo; Victoriano, Pedro F
2017-01-01
The genus Liolaemus is one of the most ecologically diverse and species-rich genera of lizards worldwide. It currently includes more than 250 recognized species, which have been the subject of many ecological and evolutionary studies. Nevertheless, Liolaemus lizards have a complex taxonomic history, mainly due to incongruence between morphological and genetic data, incomplete taxon sampling, incomplete lineage sorting, and hybridization. In addition, many species have restricted and remote distributions, which has hampered their examination and inclusion in molecular systematic studies. The aims of this study are to infer a robust phylogeny for a subsample of lizards representing the Chilean clade (subgenus Liolaemus sensu stricto), and to test the monophyly of several of the major species groups. We use a phylogenomic approach, targeting 541 ultra-conserved elements (UCEs) and 44 protein-coding genes for 16 taxa. We conduct a comparison of phylogenetic analyses using maximum likelihood and several species tree inference methods. The UCEs provide stronger support for phylogenetic relationships compared to the protein-coding genes; however, the UCEs outnumber the protein-coding genes by 10-fold. On average, the protein-coding genes contain over twice the number of informative sites. Based on our phylogenomic analyses, all the groups sampled are polyphyletic. Liolaemus tenuis tenuis is difficult to place in the phylogeny, because only a few loci (nine) were recovered for this species. Topologies and support values did not change dramatically upon exclusion of L. t. tenuis from the analyses, suggesting that missing data did not have a significant impact on phylogenetic inference in this data set. The phylogenomic analyses provide strong support for sister-group relationships between L. fuscus, L. monticola, L. nigroviridis, and L. nitidus, and between L. platei and L. velosoi. Despite our limited taxon sampling, we have provided a reliable starting hypothesis for the relationships among many major groups of the Chilean clade of Liolaemus that will help future work aimed at resolving the Liolaemus phylogeny.
Pey, Jon; Valgepea, Kaspar; Rubio, Angel; Beasley, John E; Planes, Francisco J
2013-12-08
The study of cellular metabolism in the context of high-throughput -omics data has allowed us to decipher novel mechanisms of importance in biotechnology and health. To continue this progress, it is essential to efficiently integrate experimental data into metabolic modeling. We present here an in-silico framework to infer relevant metabolic pathways for a particular phenotype under study based on its gene/protein expression data. This framework is based on the Carbon Flux Path (CFP) approach, a mixed-integer linear program that extends classical path-finding techniques by considering additional biophysical constraints. In particular, the objective function of the CFP approach is amended to account for gene/protein expression data and to influence the obtained paths. This approach is termed integrative Carbon Flux Path (iCFP). We show that gene/protein expression data also influence the stoichiometric balancing of CFPs, which provides a more accurate picture of active metabolic pathways. This is illustrated in both a theoretical and a real scenario. Finally, we apply this approach to find novel pathways relevant to the regulation of acetate overflow metabolism in Escherichia coli. As a result, several targets that could be relevant for a better understanding of the phenomenon leading to impaired acetate overflow are proposed. A novel mathematical framework that determines functional pathways based on gene/protein expression data is presented and validated. We show that our approach is able to provide new insights into complex biological scenarios such as acetate overflow in Escherichia coli.
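To convey the flavor of expression-informed path finding (not the actual mixed-integer linear program with stoichiometric constraints), the sketch below reweights reaction edges by normalized expression and extracts a cheapest path; reaction names, metabolites, and expression levels are made up.

```python
import networkx as nx

expression = {"r1": 0.9, "r2": 0.2, "r3": 0.8, "r4": 0.7}  # normalized levels
edges = [("glc", "m1", "r1"), ("m1", "ace", "r2"),
         ("m1", "m2", "r3"), ("m2", "ace", "r4")]

G = nx.DiGraph()
for u, v, rxn in edges:
    # highly expressed reactions become cheap to traverse
    G.add_edge(u, v, weight=1.0 - expression[rxn], rxn=rxn)

path = nx.shortest_path(G, "glc", "ace", weight="weight")
print(path)   # prefers the r3/r4 route over the lowly expressed r2
```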
NASA Astrophysics Data System (ADS)
Aerts, Sven
2014-03-01
One of the problems facing any attempt to understand quantum theory is that the theory does not seem to offer an explanation of the way the probabilities arise. Moreover, it is a commonly held view that no such explanation is compatible with the mathematical structure of quantum theory, i.e., that the theory is inherently indeterministic, simply because nature is like that. We propose an abstract formalisation of the observation of a system in which the interaction between the system and the observer deterministically produces one of n possible outcomes. If the observer consistently manages to realize the outcome which maximizes the likelihood ratio that the outcome was inferred from the state of the system under study (and not from his own state), he will be called optimal. The probability for a repeated measurement on an ensemble of identical system states is then derived as a measure over observer states. If the state of the system is a statistical mixture, the optimal observer produces an unbiased estimate of the components of the mixture. In case the state space is a complex Hilbert space, the resulting probability is equal to the one given by the Born rule. The proposal offers a concise interpretation of the meaning of the occurrence of a specific outcome as the unique outcome that, relative to the state of the system, is least dependent on the state of the observer. We note that a similar paradigm is used in the perception literature to explain optical illusions in human visual perception. We argue that the result strengthens Helmholtz's view that all observation is, in fact, a form of inference.
Ensemble Kalman filter inference of spatially-varying Manning's n coefficients in the coastal ocean
NASA Astrophysics Data System (ADS)
Siripatana, Adil; Mayo, Talea; Knio, Omar; Dawson, Clint; Maître, Olivier Le; Hoteit, Ibrahim
2018-07-01
Ensemble Kalman (EnKF) filtering is an established framework for large-scale state estimation problems. EnKFs can also be used for state-parameter estimation, using the so-called "Joint-EnKF" approach. The idea is simply to augment the state vector with the parameters to be estimated and to assign invariant dynamics for the time evolution of the parameters. In this contribution, we investigate the efficiency of the Joint-EnKF for estimating spatially varying Manning's n coefficients used to define the bottom roughness in the Shallow Water Equations (SWEs) of a coastal ocean model. Observation System Simulation Experiments (OSSEs) are conducted using the ADvanced CIRCulation (ADCIRC) model, which solves a modified form of the Shallow Water Equations. A deterministic EnKF, the Singular Evolutive Interpolated Kalman (SEIK) filter, is used to estimate a vector of Manning's n coefficients defined at the model nodal points by assimilating synthetic water elevation data. It is found that, with a reasonable ensemble size (O(10)), the filter's estimate converges to the reference Manning's field. To enhance performance, we further reduced the dimension of the parameter search space through a Karhunen-Loève (KL) expansion. We also iterated on the filter update step to better account for the nonlinearity of the parameter estimation problem. We study the sensitivity of the system to the ensemble size, localization scale, dimension of retained KL modes, and number of iterations. The performance of the proposed framework in terms of estimation accuracy suggests that a well-tuned Joint-EnKF provides a promising robust approach to infer spatially varying seabed roughness parameters in the context of coastal ocean modeling.
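The augmentation trick is easy to show on a scalar toy model: stack the parameter under the state, give it persistent dynamics, and let the Kalman update correct both through their cross-covariance. The paper uses the deterministic SEIK filter on ADCIRC; for brevity this sketch uses the perturbed-observation EnKF variant on x_{k+1} = a·x_k with true a = 0.8, and all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ens, obs_err, a_true, x_true = 100, 0.01, 0.8, 1.0

# augmented ensemble Z: column 0 = state x, column 1 = parameter a
Z = np.column_stack([rng.normal(1.0, 0.2, n_ens), rng.normal(0.5, 0.3, n_ens)])

for _ in range(15):
    x_true *= a_true
    y = x_true + rng.normal(0.0, obs_err)       # synthetic observation of x
    Z[:, 0] *= Z[:, 1]                          # forecast: x <- a*x, a unchanged
    Hz = Z[:, 0]                                # observation operator picks x
    A = Z - Z.mean(axis=0)                      # ensemble anomalies
    Pzy = A.T @ (Hz - Hz.mean()) / (n_ens - 1)  # cross-covariance with the obs
    Pyy = np.var(Hz, ddof=1) + obs_err**2       # innovation variance
    Z += np.outer(y + rng.normal(0.0, obs_err, n_ens) - Hz, Pzy / Pyy)

print("estimated a:", Z[:, 1].mean())           # should move toward 0.8
```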
MaxEnt analysis of a water distribution network in Canberra, ACT, Australia
NASA Astrophysics Data System (ADS)
Waldrip, Steven H.; Niven, Robert K.; Abel, Markus; Schlegel, Michael; Noack, Bernd R.
2015-01-01
A maximum entropy (MaxEnt) method is developed to infer the state of a pipe flow network, for situations in which there is insufficient information to form a closed equation set. This approach substantially extends existing deterministic methods for the analysis of engineered flow networks (e.g. Newton's method or the Hardy Cross scheme). The network is represented as an undirected graph structure, in which the uncertainty is represented by a continuous relative entropy on the space of internal and external flow rates. The head losses (potential differences) on the network are treated as dependent variables, using specified pipe-flow resistance functions. The entropy is maximised subject to "observable" constraints on the mean values of certain flow rates and/or potential differences, and also "physical" constraints arising from the frictional properties of each pipe and from Kirchhoff's nodal and loop laws. A numerical method is developed in Matlab for solution of the integral equation system, based on multidimensional quadrature. Several nonlinear resistance functions (e.g. power-law and Colebrook) are investigated, necessitating numerical solution of the implicit Lagrangian by a double iteration scheme. The method is applied to a 1123-node, 1140-pipe water distribution network for the suburb of Torrens in the Australian Capital Territory, Australia, using network data supplied by water authority ACTEW Corporation Limited. A number of different assumptions are explored, including various network geometric representations, prior probabilities and constraint settings, yielding useful predictions of network demand and performance. We also propose this methodology be used in conjunction with in-flow monitoring systems, to obtain better inferences of user consumption without large investments in monitoring equipment and maintenance.
Controlling allosteric networks in proteins
NASA Astrophysics Data System (ADS)
Dokholyan, Nikolay
2013-03-01
We present a novel methodology based on graph theory and discrete molecular dynamics simulations for delineating allosteric pathways in proteins. We use this methodology to uncover the structural mechanisms responsible for the coupling of distal sites on proteins and utilize it for allosteric modulation of proteins. We will present examples where inference of allosteric networks and their rewiring allows us to "rescue" the cystic fibrosis transmembrane conductance regulator (CFTR), a protein associated with the fatal genetic disease cystic fibrosis. We also use our methodology to control protein function allosterically. We design a novel protein domain that can be inserted into an identified allosteric site of a target protein. Using a drug that binds to our domain, we alter the function of the target protein. We successfully tested this methodology in vitro, in living cells, and in zebrafish. We further demonstrate the transferability of our allosteric modulation methodology to other systems and extend it to become light-activatable.
Genome-Wide SNP Genotyping to Infer the Effects on Gene Functions in Tomato
Hirakawa, Hideki; Shirasawa, Kenta; Ohyama, Akio; Fukuoka, Hiroyuki; Aoki, Koh; Rothan, Christophe; Sato, Shusei; Isobe, Sachiko; Tabata, Satoshi
2013-01-01
The genotype data of 7054 single nucleotide polymorphism (SNP) loci in 40 tomato lines, including inbred lines, F1 hybrids, and wild relatives, were collected using Illumina's Infinium and GoldenGate assay platforms, the latter of which was utilized in our previous study. The dendrogram based on the genotype data corresponded well to the breeding types of tomato and wild relatives. The SNPs were classified into six categories according to their positions in the genes predicted on the tomato genome sequence. The genes with SNPs were annotated by homology searches against the nucleotide and protein databases, as well as by domain searches, and they were classified into the functional categories defined by the NCBI's eukaryotic orthologous groups (KOG). To infer the SNPs' effects on the gene functions, the three-dimensional structures of the 843 proteins that were encoded by the genes with SNPs causing missense mutations were constructed by homology modelling, and 200 of these proteins were considered to carry non-synonymous amino acid substitutions in the predicted functional sites. The SNP information obtained in this study is available at the Kazusa Tomato Genomics Database (http://plant1.kazusa.or.jp/tomato/). PMID:23482505
Deterministic Computer-Controlled Polishing Process for High-Energy X-Ray Optics
NASA Technical Reports Server (NTRS)
Khan, Gufran S.; Gubarev, Mikhail; Speegle, Chet; Ramsey, Brian
2010-01-01
A deterministic computer-controlled polishing process for large X-ray mirror mandrels is presented. Using the tool's influence function and material removal rate extracted from polishing experiments, design considerations for polishing laps and optimized operating parameters are discussed.
Palmer, Tim N.; O’Shea, Michael
2015-01-01
How is the brain configured for creativity? What is the computational substrate for ‘eureka’ moments of insight? Here we argue that creative thinking arises ultimately from a synergy between low-energy stochastic and energy-intensive deterministic processing, and is a by-product of a nervous system whose signal-processing capability per unit of available energy has become highly energy optimised. We suggest that the stochastic component has its origin in thermal (ultimately quantum decoherent) noise affecting the activity of neurons. Without this component, deterministic computational models of the brain are incomplete. PMID:26528173
Deterministic and efficient quantum cryptography based on Bell's theorem
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen Zengbing; Pan Jianwei; Physikalisches Institut, Universitaet Heidelberg, Philosophenweg 12, 69120 Heidelberg
2006-05-15
We propose a double-entanglement-based quantum cryptography protocol that is both efficient and deterministic. The proposal uses photon pairs with entanglement both in polarization and in time degrees of freedom; each measurement in which both of the two communicating parties register a photon can establish one and only one perfect correlation, and thus deterministically create a key bit. Eavesdropping can be detected by violation of local realism. A variation of the protocol shows higher security, similar to that of the six-state protocol, under individual attacks. Our scheme allows a robust implementation under current technology.
Heart rate variability as determinism with jump stochastic parameters.
Zheng, Jiongxuan; Skufca, Joseph D; Bollt, Erik M
2013-08-01
We use measured heart rate information (RR intervals) to develop a one-dimensional nonlinear map that describes short term deterministic behavior in the data. Our study suggests that there is a stochastic parameter with persistence which causes the heart rate and rhythm system to wander about a bifurcation point. We propose a modified circle map with a jump process noise term as a model which can qualitatively capture this behavior of low dimensional transient determinism with occasional (stochastically defined) jumps from one deterministic system to another within a one parameter family of deterministic systems.
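To make the idea concrete, here is a minimal sketch of a circle map whose frequency parameter undergoes rare random jumps. The specific map, parameter values, and jump statistics below are illustrative assumptions, not the ones fitted to RR-interval data in the paper.

```python
import numpy as np

def jump_circle_map(n_steps, omega0=0.16, k=0.8, jump_prob=0.01,
                    jump_scale=0.05, seed=0):
    """Iterate a circle map whose parameter omega occasionally jumps.

    Between jumps the dynamics are purely deterministic; the rare,
    stochastically timed jumps move the system to another member of the
    one-parameter family, mimicking transient determinism in RR data."""
    rng = np.random.default_rng(seed)
    x, omega = 0.1, omega0
    xs, omegas = [], []
    for _ in range(n_steps):
        if rng.random() < jump_prob:          # rare jump in the parameter
            omega = omega0 + jump_scale * rng.standard_normal()
        x = (x + omega + (k / (2 * np.pi)) * np.sin(2 * np.pi * x)) % 1.0
        xs.append(x)
        omegas.append(omega)
    return np.array(xs), np.array(omegas)

xs, omegas = jump_circle_map(5000)
```

Between jumps the trajectory is fully deterministic; plotting `xs` against the jump times of `omegas` makes the transient determinism visible.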
NASA Astrophysics Data System (ADS)
Kim, Hojin; Choi, In Ho; Lee, Sanghyun; Won, Dong-Joon; Oh, Yong Suk; Kwon, Donghoon; Sung, Hyung Jin; Jeon, Sangmin; Kim, Joonwon
2017-04-01
This paper presents a deterministic bead-in-droplet ejection (BIDE) technique that regulates the precise distribution of microbeads in an ejected droplet. The deterministic BIDE was realized through the effective integration of a microfluidic single-particle handling technique with a liquid dispensing system. The integrated bead dispenser facilitates the transfer of the desired number of beads into a dispensing volume and the on-demand ejection of bead-encapsulated droplets. Single bead-encapsulated droplets were ejected every 3 s without any failure. Multiple-bead dispensing with deterministic control of the number of beads was demonstrated to emphasize the originality and quality of the proposed dispensing technique. The dispenser was mounted using a plug-socket type connection, and the dispensing process was completely automated using a programmed sequence without any microscopic observation. To demonstrate a potential application of the technique, a bead-based streptavidin-biotin binding assay in an evaporating droplet was conducted using ultralow numbers of beads. The results evidenced that the number of beads in the droplet crucially influences the reliability of the assay. Therefore, the proposed deterministic bead-in-droplet technology can be utilized to deliver desired beads onto a reaction site, particularly to reliably and efficiently enrich and detect target biomolecules.
Kim, Hojin; Choi, In Ho; Lee, Sanghyun; Won, Dong-Joon; Oh, Yong Suk; Kwon, Donghoon; Sung, Hyung Jin; Jeon, Sangmin; Kim, Joonwon
2017-04-10
This paper presents a deterministic bead-in-droplet ejection (BIDE) technique that regulates the precise distribution of microbeads in an ejected droplet. The deterministic BIDE was realized through the effective integration of a microfluidic single-particle handling technique with a liquid dispensing system. The integrated bead dispenser facilitates the transfer of the desired number of beads into a dispensing volume and the on-demand ejection of bead-encapsulated droplets. Single bead-encapsulated droplets were ejected every 3 s without any failure. Multiple-bead dispensing with deterministic control of the number of beads was demonstrated to emphasize the originality and quality of the proposed dispensing technique. The dispenser was mounted using a plug-socket type connection, and the dispensing process was completely automated using a programmed sequence without any microscopic observation. To demonstrate a potential application of the technique, a bead-based streptavidin-biotin binding assay in an evaporating droplet was conducted using ultralow numbers of beads. The results evidenced that the number of beads in the droplet crucially influences the reliability of the assay. Therefore, the proposed deterministic bead-in-droplet technology can be utilized to deliver desired beads onto a reaction site, particularly to reliably and efficiently enrich and detect target biomolecules.
Weinberg, Seth H.; Smith, Gregory D.
2012-01-01
Cardiac myocyte calcium signaling is often modeled using deterministic ordinary differential equations (ODEs) and mass-action kinetics. However, spatially restricted “domains” associated with calcium influx are small enough (e.g., 10−17 liters) that local signaling may involve 1–100 calcium ions. Is it appropriate to model the dynamics of subspace calcium using deterministic ODEs or, alternatively, do we require stochastic descriptions that account for the fundamentally discrete nature of these local calcium signals? To address this question, we constructed a minimal Markov model of a calcium-regulated calcium channel and associated subspace. We compared the expected value of fluctuating subspace calcium concentration (a result that accounts for the small subspace volume) with the corresponding deterministic model (an approximation that assumes large system size). When subspace calcium did not regulate calcium influx, the deterministic and stochastic descriptions agreed. However, when calcium binding altered channel activity in the model, the continuous deterministic description often deviated significantly from the discrete stochastic model, unless the subspace volume is unrealistically large and/or the kinetics of the calcium binding are sufficiently fast. This principle was also demonstrated using a physiologically realistic model of calmodulin regulation of L-type calcium channels introduced by Yue and coworkers. PMID:23509597
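The question posed here, discrete stochastic versus continuous deterministic descriptions, can be illustrated with a Gillespie-style simulation of a toy calcium-regulated channel. The rates and the form of the calcium feedback below are invented for illustration and are not the Markov model of the paper.

```python
import numpy as np

def gillespie_subspace(t_end, j_in=500.0, k_out=50.0,
                       k_open=0.5, k_close=10.0, seed=1):
    """Discrete-state model: n = calcium ions in the subspace,
    s = channel state (0 closed, 1 open). The opening rate grows with n,
    giving the calcium feedback that makes the deterministic ODE limit
    deviate from the stochastic process at small subspace volumes."""
    rng = np.random.default_rng(seed)
    t, n, s = 0.0, 5, 0
    times, counts = [0.0], [n]
    while t < t_end:
        rates = np.array([
            j_in * s,                    # influx while the channel is open
            k_out * n,                   # extrusion, one ion at a time
            k_open * (1 + n) * (1 - s),  # calcium-promoted opening
            k_close * s,                 # closing
        ])
        total = rates.sum()
        t += rng.exponential(1.0 / total)
        event = rng.choice(4, p=rates / total)
        if event == 0:   n += 1
        elif event == 1: n = max(n - 1, 0)
        elif event == 2: s = 1
        else:            s = 0
        times.append(t); counts.append(n)
    return np.array(times), np.array(counts)

times, counts = gillespie_subspace(t_end=5.0)
```

Comparing the time average of `counts` with the fixed point of the corresponding mass-action ODE (dn/dt = j_in·p_open − k_out·n, with p_open coupled to n) reproduces the qualitative point of the abstract: the two agree only when the feedback is absent or the system size is large.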
Kim, Hojin; Choi, In Ho; Lee, Sanghyun; Won, Dong-Joon; Oh, Yong Suk; Kwon, Donghoon; Sung, Hyung Jin; Jeon, Sangmin; Kim, Joonwon
2017-01-01
This paper presents a deterministic bead-in-droplet ejection (BIDE) technique that regulates the precise distribution of microbeads in an ejected droplet. The deterministic BIDE was realized through the effective integration of a microfluidic single-particle handling technique with a liquid dispensing system. The integrated bead dispenser facilitates the transfer of the desired number of beads into a dispensing volume and the on-demand ejection of bead-encapsulated droplets. Single bead–encapsulated droplets were ejected every 3 s without any failure. Multiple-bead dispensing with deterministic control of the number of beads was demonstrated to emphasize the originality and quality of the proposed dispensing technique. The dispenser was mounted using a plug-socket type connection, and the dispensing process was completely automated using a programmed sequence without any microscopic observation. To demonstrate a potential application of the technique, a bead-based streptavidin–biotin binding assay in an evaporating droplet was conducted using ultralow numbers of beads. The results evidenced that the number of beads in the droplet crucially influences the reliability of the assay. Therefore, the proposed deterministic bead-in-droplet technology can be utilized to deliver desired beads onto a reaction site, particularly to reliably and efficiently enrich and detect target biomolecules. PMID:28393911
NASA Astrophysics Data System (ADS)
Mukherjee, L.; Zhai, P.; Hu, Y.; Winker, D. M.
2016-12-01
Among the primary factors which determine the polarized radiation field of a turbid medium are the single scattering properties of the medium. When multiple types of scatterers are present, the single scattering properties of the scatterers need to be properly mixed in order to find the solutions to the vector radiative transfer theory (VRT). The VRT solvers can be divided into two types: deterministic and stochastic. The deterministic solver can only accept one set of single scattering properties in its smallest discretized spatial volume. When the medium contains more than one kind of scatterer, their single scattering properties are averaged, and then used as input for the deterministic solver. The stochastic solver can work with different kinds of scatterers explicitly. In this work, two different mixing schemes are studied using the Successive Order of Scattering (SOS) method and Monte Carlo (MC) methods. One scheme is used for the deterministic SOS method and the other is used for the stochastic Monte Carlo method. It is found that the solutions from the two VRT solvers using the two different mixing schemes agree with each other extremely well. This confirms the equivalence of the two mixing schemes and also provides a benchmark for the VRT solution for the medium studied.
Serang, Oliver
2014-01-01
Exact Bayesian inference can sometimes be performed efficiently for special cases where a function has commutative and associative symmetry of its inputs (called "causal independence"). For this reason, it is desirable to exploit such symmetry on big data sets. Here we present a method to exploit a general form of this symmetry on probabilistic adder nodes by transforming those probabilistic adder nodes into a probabilistic convolution tree with which dynamic programming computes exact probabilities. A substantial speedup is demonstrated using an illustration example that can arise when identifying splice forms with bottom-up mass spectrometry-based proteomics. On this example, even state-of-the-art exact inference algorithms require a runtime more than exponential in the number of splice forms considered. By using the probabilistic convolution tree, we reduce the runtime to O(k log(k)^2) and the space to O(k log(k)), where k is the number of variables joined by an additive or cardinal operator. This approach, which can also be used with junction tree inference, is applicable to graphs with arbitrary dependency on counting variables or cardinalities and can be used on diverse problems and fields like forward error correcting codes, elemental decomposition, and spectral demixing. The approach also trivially generalizes to multiple dimensions.
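The forward pass of a probabilistic convolution tree can be sketched in a few lines: the pmfs of k independent count variables are combined pairwise in a balanced tree, so with FFT-based convolution the total work is O(k log(k)^2). This sketch omits the backward pass that the full method uses to recover posteriors for each input variable.

```python
import numpy as np
from scipy.signal import fftconvolve

def convolution_tree(pmfs):
    """Combine the pmfs of k independent count variables into the pmf of
    their sum by convolving pairs in a balanced tree. With FFT-based
    convolution each level costs O(k log k) and there are O(log k)
    levels, matching the runtime quoted in the abstract."""
    level = [np.asarray(p, dtype=float) for p in pmfs]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            c = fftconvolve(level[i], level[i + 1])
            c = np.clip(c, 0.0, None)   # FFT round-off can dip slightly below 0
            nxt.append(c / c.sum())
        if len(level) % 2:              # odd pmf out is carried up unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0]

# Sum of three Bernoulli-like indicators, e.g. presence of splice forms.
pmf_of_sum = convolution_tree([[0.3, 0.7], [0.9, 0.1], [0.5, 0.5]])
```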
Serang, Oliver
2014-01-01
Exact Bayesian inference can sometimes be performed efficiently for special cases where a function has commutative and associative symmetry of its inputs (called “causal independence”). For this reason, it is desirable to exploit such symmetry on big data sets. Here we present a method to exploit a general form of this symmetry on probabilistic adder nodes by transforming those probabilistic adder nodes into a probabilistic convolution tree with which dynamic programming computes exact probabilities. A substantial speedup is demonstrated using an illustration example that can arise when identifying splice forms with bottom-up mass spectrometry-based proteomics. On this example, even state-of-the-art exact inference algorithms require a runtime more than exponential in the number of splice forms considered. By using the probabilistic convolution tree, we reduce the runtime to O(k log(k)^2) and the space to O(k log(k)), where k is the number of variables joined by an additive or cardinal operator. This approach, which can also be used with junction tree inference, is applicable to graphs with arbitrary dependency on counting variables or cardinalities and can be used on diverse problems and fields like forward error correcting codes, elemental decomposition, and spectral demixing. The approach also trivially generalizes to multiple dimensions. PMID:24626234
The language of the protein universe.
Scaiewicz, Andrea; Levitt, Michael
2015-12-01
Proteins, the main machinery of the cell, play a major role in nearly every cellular process and have always been a central focus in biology. We live in the post-genomic era, and inferring information from massive data sets is a steadily growing universal challenge. The increasing availability of fully sequenced genomes can be regarded as the 'Rosetta Stone' of the protein universe, allowing the understanding of genomes and their evolution, just as the original Rosetta Stone allowed Champollion to decipher the ancient Egyptian hieroglyphics. In this review, we consider aspects of the protein domain architectures repertoire that are closely related to those of human languages and aim to provide some insights about the language of proteins. Copyright © 2015 Elsevier Ltd. All rights reserved.
Determining protein function and interaction from genome analysis
Eisenberg, David; Marcotte, Edward M.; Thompson, Michael J.; Pellegrini, Matteo; Yeates, Todd O.
2004-08-03
A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequences of two or more organisms to create a phylogenetic profile for each protein, indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method, a combination of the above two methods is used to predict functional links.
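A minimal sketch of the phylogenetic profile idea: encode each protein as a presence/absence vector across genomes and flag pairs with near-identical profiles as candidate functional links. The profiles and protein names below are hypothetical.

```python
import numpy as np

# Hypothetical presence/absence profiles: one entry per genome surveyed.
profiles = {
    "A": np.array([1, 0, 1, 1, 0, 1, 0, 1]),
    "B": np.array([1, 0, 1, 1, 0, 1, 0, 1]),   # identical profile to A
    "C": np.array([0, 1, 0, 0, 1, 0, 1, 0]),
}

def profile_distance(p, q):
    """Hamming distance between phylogenetic profiles; proteins with
    near-identical profiles are candidates for a functional link."""
    return int(np.sum(p != q))

for a, b in [("A", "B"), ("A", "C")]:
    print(a, b, profile_distance(profiles[a], profiles[b]))
```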
Protein Function Prediction: Problems and Pitfalls.
Pearson, William R
2015-09-03
The characterization of new genomes based on their protein sets has been revolutionized by new sequencing technologies, but biologists seeking to exploit new sequence information are often frustrated by the challenges associated with accurately assigning biological functions to newly identified proteins. Here, we highlight some of the challenges in functional inference from sequence similarity. Investigators can improve the accuracy of function prediction by (1) being conservative about the evolutionary distance to a protein of known function; (2) considering the ambiguous meaning of "functional similarity"; and (3) being aware of the limitations of annotations in functional databases. Protein function prediction does not offer "one-size-fits-all" solutions. Prediction strategies work better when the idiosyncrasies of function and functional annotation are better understood. Copyright © 2015 John Wiley & Sons, Inc.
Emerging functions of alternative splicing coupled with nonsense-mediated decay.
Hamid, Fursham M; Makeyev, Eugene V
2014-08-01
Higher eukaryotes rely on AS (alternative splicing) of pre-mRNAs (mRNA precursors) to generate more than one protein product from a single gene and to regulate mRNA stability and translational activity. An important example of the latter function involves an interplay between AS and NMD (nonsense-mediated decay), a cytoplasmic quality control mechanism eliminating mRNAs containing PTCs (premature translation termination codons). Although originally identified as an error surveillance process, AS-NMD additionally provides an efficient strategy for deterministic regulation of gene expression outputs. In this review, we discuss recently published examples of AS-NMD and delineate functional contexts where recurrent use of this mechanism orchestrates expression of important genes.
Extraction of intracellular protein from Glaciozyma antarctica for proteomics analysis
NASA Astrophysics Data System (ADS)
Faizura, S. Nor; Farahayu, K.; Faizal, A. B. Mohd; Asmahani, A. A. S.; Amir, R.; Nazalan, N.; Diba, A. B. Farah; Muhammad, M. Nor; Munir, A. M. Abdul
2013-11-01
Two preparation methods of crude extracts of the psychrophilic yeast Glaciozyma antarctica were compared in order to obtain a good recovery of intracellular proteins. Extraction by mechanical procedures using sonication was found to be more effective for obtaining a good yield compared to the alkaline treatment method. The procedure is simple, rapid, and produces a better yield. A total of 52 proteins were identified by combining both extraction methods. Most of the proteins identified in this study are involved in metabolic processes, including the glycolysis pathway, pentose phosphate pathway, pyruvate decarboxylation, and the urea cycle. Several chaperones were identified, including probable cpr1-cyclophilin (peptidylprolyl isomerase), macrolide-binding protein fkbp12, and heat shock proteins, which were postulated to accelerate proper protein folding. Characteristics of the fundamental cellular processes inferred from the expressed proteome highlight the evolutionary and functional complexity existing in this domain of life.
Cooperativity and modularity in protein folding
Sasai, Masaki; Chikenji, George; Terada, Tomoki P.
2016-01-01
A simple statistical mechanical model proposed by Wako and Saitô has explained the aspects of protein folding surprisingly well. This model was systematically applied to multiple proteins by Muñoz and Eaton and has since been referred to as the Wako-Saitô-Muñoz-Eaton (WSME) model. The success of the WSME model in explaining the folding of many proteins has verified the hypothesis that the folding is dominated by native interactions, which makes the energy landscape globally biased toward native conformation. Using the WSME and other related models, Saitô emphasized the importance of the hierarchical pathway in protein folding; folding starts with the creation of contiguous segments having a native-like configuration and proceeds as growth and coalescence of these segments. The Φ-values calculated for barnase with the WSME model suggested that segments contributing to the folding nucleus are similar to the structural modules defined by the pattern of native atomic contacts. The WSME model was extended to explain folding of multi-domain proteins having a complex topology, which opened the way to comprehensively understanding the folding process of multi-domain proteins. The WSME model was also extended to describe allosteric transitions, indicating that the allosteric structural movement does not occur as a deterministic sequential change between two conformations but as a stochastic diffusive motion over the dynamically changing energy landscape. Statistical mechanical viewpoint on folding, as highlighted by the WSME model, has been renovated in the context of modern methods and ideas, and will continue to provide insights on equilibrium and dynamical features of proteins. PMID:28409080
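For a short chain, the WSME ingredients can be sketched by brute-force enumeration: a native contact contributes energy only when the entire intervening segment is native, and each native residue pays an entropy cost. The contact map, energy, and entropy parameters below are illustrative assumptions, and real WSME implementations use transfer-matrix methods rather than enumeration.

```python
import itertools
import numpy as np

def wsme_partition(contacts, n_res, eps=-2.0, ds=1.0, beta=1.0):
    """Brute-force WSME partition function for a short chain.

    A contact (i, j) is counted only when every residue between i and j
    is native, encoding the Wako-Saito-Munoz-Eaton rule that folding
    proceeds through contiguous native segments."""
    z, p_native = 0.0, np.zeros(n_res)
    for m in itertools.product([0, 1], repeat=n_res):
        e = sum(eps for (i, j) in contacts if all(m[i:j + 1]))
        w = np.exp(-beta * (e + ds * sum(m)))  # energy plus entropy cost
        z += w
        p_native += w * np.array(m)
    return z, p_native / z

# Hypothetical native contact map of a 10-residue segment.
contacts = [(0, 3), (1, 6), (2, 5), (4, 9)]
z, p_native = wsme_partition(contacts, n_res=10)
```

The per-residue native probabilities `p_native` show the contiguous-segment growth that the abstract describes, and Φ-value-style analyses follow by perturbing `eps` contact by contact.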
Constructing an integrated gene similarity network for the identification of disease genes.
Tian, Zhen; Guo, Maozu; Wang, Chunyu; Xing, LinLin; Wang, Lei; Zhang, Yin
2017-09-20
Discovering novel genes that are involved in human diseases is a challenging task in biomedical research. In recent years, several computational approaches have been proposed to prioritize candidate disease genes. Most of these methods are mainly based on protein-protein interaction (PPI) networks. However, since these PPI networks contain false positives and cover less than half of known human genes, their reliability and coverage are very low. Therefore, it is highly necessary to fuse multiple genomic data to construct a credible gene similarity network and then infer disease genes on the whole genomic scale. We propose a novel method, named RWRB, to infer causal genes of diseases of interest. First, we construct five individual gene (protein) similarity networks based on multiple genomic data of human genes. Then, an integrated gene similarity network (IGSN) is reconstructed based on the similarity network fusion (SNF) method. Finally, we employ the random walk with restart algorithm on the phenotype-gene bilayer network, which combines the phenotype similarity network, the IGSN, and the phenotype-gene association network, to prioritize candidate disease genes. We investigate the effectiveness of RWRB through leave-one-out cross-validation in inferring phenotype-gene relationships. Results show that RWRB is more accurate than state-of-the-art methods on most evaluation metrics. Further analysis shows that the success of RWRB benefits from the IGSN, which has wider coverage and higher reliability compared with current PPI networks. Moreover, we conduct a comprehensive case study of Alzheimer's disease and predict some novel disease genes that are supported by the literature. RWRB is an effective and reliable algorithm for prioritizing candidate disease genes on the genomic scale. Software and supplementary information are available at http://nclab.hit.edu.cn/~tianzhen/RWRB/.
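The core iteration of random walk with restart, the final step of RWRB, is compact. This sketch runs on a single similarity network rather than the paper's phenotype-gene bilayer network, and the restart probability is an assumed value.

```python
import numpy as np

def random_walk_with_restart(w, seeds, restart=0.7, tol=1e-10):
    """RWR on a weighted gene similarity network.

    w       : adjacency matrix (every node assumed to have an edge)
    seeds   : indices of genes already linked to the phenotype
    restart : probability of jumping back to the seed set each step
    """
    w = np.asarray(w, dtype=float)
    col = w / w.sum(axis=0, keepdims=True)   # column-normalised transitions
    p0 = np.zeros(len(w))
    p0[list(seeds)] = 1.0 / len(seeds)
    p = p0.copy()
    while True:
        p_next = (1 - restart) * col @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next                    # steady-state relevance scores
        p = p_next
```

Genes are then ranked by their steady-state score, and leave-one-out validation removes one known gene from `seeds` and checks where it lands in the ranking.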
Deterministic models for traffic jams
NASA Astrophysics Data System (ADS)
Nagel, Kai; Herrmann, Hans J.
1993-10-01
We study several deterministic one-dimensional traffic models. For integer positions and velocities we find the typical high and low density phases separated by a simple transition. If positions and velocities are continuous variables, the model shows self-organized criticality driven by the slowest car.
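A deterministic single-lane traffic cellular automaton in this spirit can be sketched as follows; this is the deterministic (no random braking) variant of the familiar Nagel-Schreckenberg update, shown for illustration rather than as the paper's exact model.

```python
import numpy as np

def step(pos, vel, road_len, v_max=5):
    """One update of a deterministic single-lane traffic CA on a ring:
    accelerate by 1, then brake to the gap to the car ahead."""
    order = np.argsort(pos)                  # keep cars ordered along the ring
    pos, vel = pos[order], vel[order]
    gaps = (np.roll(pos, -1) - pos - 1) % road_len
    vel = np.minimum(np.minimum(vel + 1, v_max), gaps)
    return (pos + vel) % road_len, vel

road_len, n_cars = 100, 30
rng = np.random.default_rng(0)
pos = np.sort(rng.choice(road_len, n_cars, replace=False))
vel = np.zeros(n_cars, dtype=int)
for _ in range(200):
    pos, vel = step(pos, vel, road_len)
mean_flow = vel.mean()   # drops sharply once density exceeds the critical value
```

Sweeping `n_cars` reproduces the two phases of the abstract: free flow at low density and a jammed phase at high density, separated by a sharp transition.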
The penumbra of learning: a statistical theory of synaptic tagging and capture.
Gershman, Samuel J
2014-01-01
Learning in humans and animals is accompanied by a penumbra: Learning one task benefits from learning an unrelated task shortly before or after. At the cellular level, the penumbra of learning appears when weak potentiation of one synapse is amplified by strong potentiation of another synapse on the same neuron during a critical time window. Weak potentiation sets a molecular tag that enables the synapse to capture plasticity-related proteins synthesized in response to strong potentiation at another synapse. This paper describes a computational model which formalizes synaptic tagging and capture in terms of statistical learning mechanisms. According to this model, synaptic strength encodes a probabilistic inference about the dynamically changing association between pre- and post-synaptic firing rates. The rate of change is itself inferred, coupling together different synapses on the same neuron. When the inputs to one synapse change rapidly, the inferred rate of change increases, amplifying learning at other synapses.
Li, Jieyue; Xiong, Liang; Schneider, Jeff; Murphy, Robert F
2012-06-15
Knowledge of the subcellular location of a protein is crucial for understanding its functions. The subcellular pattern of a protein is typically represented as the set of cellular components in which it is located, and an important task is to determine this set from microscope images. In this article, we address this classification problem using confocal immunofluorescence images from the Human Protein Atlas (HPA) project. The HPA contains images of cells stained for many proteins; each is also stained for three reference components, but there are many other components that are invisible. Given one such cell, the task is to classify the pattern type of the stained protein. We first randomly select local image regions within the cells, and then extract various carefully designed features from these regions. This region-based approach enables us to explicitly study the relationship between proteins and different cell components, as well as the interactions between these components. To achieve these two goals, we propose two discriminative models that extend logistic regression with structured latent variables. The first model allows the same protein pattern class to be expressed differently according to the underlying components in different regions. The second model further captures the spatial dependencies between the components within the same cell so that we can better infer these components. To learn these models, we propose a fast approximate algorithm for inference, and then use gradient-based methods to maximize the data likelihood. In the experiments, we show that the proposed models help improve the classification accuracies on synthetic data and real cellular images. The best overall accuracy we report in this article for classifying 942 proteins into 13 classes of patterns is about 84.6%, which to our knowledge is the best so far. In addition, the dependencies learned are consistent with prior knowledge of cell organization. http://murphylab.web.cmu.edu/software/.
An integrative approach to inferring biologically meaningful gene modules.
Cho, Ji-Hoon; Wang, Kai; Galas, David J
2011-07-26
The ability to construct biologically meaningful gene networks and modules is critical for contemporary systems biology. Though recent studies have demonstrated the power of using gene modules to shed light on the functioning of complex biological systems, most modules in these networks have shown little association with meaningful biological function. We have devised a method which directly incorporates gene ontology (GO) annotation in construction of gene modules in order to gain better functional association. We have devised a method, Semantic Similarity-Integrated approach for Modularization (SSIM) that integrates various gene-gene pairwise similarity values, including information obtained from gene expression, protein-protein interactions and GO annotations, in the construction of modules using affinity propagation clustering. We demonstrated the performance of the proposed method using data from two complex biological responses: 1. the osmotic shock response in Saccharomyces cerevisiae, and 2. the prion-induced pathogenic mouse model. In comparison with two previously reported algorithms, modules identified by SSIM showed significantly stronger association with biological functions. The incorporation of semantic similarity based on GO annotation with gene expression and protein-protein interaction data can greatly enhance the functional relevance of inferred gene modules. In addition, the SSIM approach can also reveal the hierarchical structure of gene modules to gain a broader functional view of the biological system. Hence, the proposed method can facilitate comprehensive and in-depth analysis of high throughput experimental data at the gene network level.
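The integration step can be sketched as a weighted combination of similarity matrices fed to affinity propagation with a precomputed affinity. The weights and the random stand-in similarity matrices below are purely illustrative; SSIM derives its similarities from expression, PPI, and GO semantic similarity, and its weighting is part of the method itself.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

def fuse_similarities(mats, weights):
    """Weighted combination of gene-gene similarity matrices, e.g. from
    expression correlation, PPI confidence, and GO semantic similarity."""
    return sum(w * m for w, m in zip(weights, mats))

rng = np.random.default_rng(0)
n = 30
expr = np.corrcoef(rng.standard_normal((n, 50)))       # expression similarity
ppi = rng.random((n, n)); ppi = (ppi + ppi.T) / 2      # stand-in PPI scores
go = rng.random((n, n)); go = (go + go.T) / 2          # stand-in GO similarity

s = fuse_similarities([expr, ppi, go], [0.4, 0.3, 0.3])
labels = AffinityPropagation(affinity="precomputed", random_state=0).fit(s).labels_
```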
The meta-Gaussian Bayesian Processor of forecasts and associated preliminary experiments
NASA Astrophysics Data System (ADS)
Chen, Fajing; Jiao, Meiyan; Chen, Jing
2013-04-01
Public weather services are trending toward providing users with probabilistic weather forecasts, in place of traditional deterministic forecasts. Probabilistic forecasting techniques are continually being improved to optimize available forecasting information. The Bayesian Processor of Forecast (BPF), a new statistical method for probabilistic forecast, can transform a deterministic forecast into a probabilistic forecast according to the historical statistical relationship between observations and forecasts generated by that forecasting system. This technique accounts for the typical forecasting performance of a deterministic forecasting system in quantifying the forecast uncertainty. The meta-Gaussian likelihood model is suitable for a variety of stochastic dependence structures with monotone likelihood ratios. The meta-Gaussian BPF adopting this kind of likelihood model can therefore be applied across many fields, including meteorology and hydrology. The Bayes theorem with two continuous random variables and the normal-linear BPF are briefly introduced. The meta-Gaussian BPF for a continuous predictand using a single predictor is then presented and discussed. The performance of the meta-Gaussian BPF is tested in a preliminary experiment. Control forecasts of daily surface temperature at 0000 UTC at Changsha and Wuhan stations are used as the deterministic forecast data. These control forecasts are taken from ensemble predictions with a 96-h lead time generated by the National Meteorological Center of the China Meteorological Administration, the European Centre for Medium-Range Weather Forecasts, and the US National Centers for Environmental Prediction during January 2008. The results of the experiment show that the meta-Gaussian BPF can transform a deterministic control forecast of surface temperature from any one of the three ensemble predictions into a useful probabilistic forecast of surface temperature. These probabilistic forecasts quantify the uncertainty of the control forecast; accordingly, the performance of the probabilistic forecasts differs based on the source of the underlying deterministic control forecasts.
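For reference, the normal-linear BPF reduces to a conjugate-normal update; the meta-Gaussian version applies the same machinery after transforming predictand and predictor to standard normals via the normal quantile transform. The notation below is a standard textbook form, not copied from the paper:

```latex
% Normal-linear BPF: normal prior for the predictand W and a
% linear-Gaussian likelihood for the deterministic forecast X.
W \sim N(\mu, \sigma^2), \qquad X \mid W = w \sim N(a w + b, \tau^2)
\quad\Longrightarrow\quad
W \mid X = x \sim N\!\left(
  \frac{\sigma^{-2}\mu + a \tau^{-2} (x - b)}{\sigma^{-2} + a^{2}\tau^{-2}},\;
  \frac{1}{\sigma^{-2} + a^{2}\tau^{-2}}
\right)
```

The posterior variance quantifies the forecast uncertainty: a historically skilful deterministic forecast (small τ) yields a sharp predictive distribution, a poor one collapses toward the climatological prior.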
Benedetti-Cecchi, Lisandro; Canepa, Antonio; Fuentes, Veronica; Tamburello, Laura; Purcell, Jennifer E; Piraino, Stefano; Roberts, Jason; Boero, Ferdinando; Halpin, Patrick
2015-01-01
Jellyfish outbreaks are increasingly viewed as a deterministic response to escalating levels of environmental degradation and climate extremes. However, a comprehensive understanding of the influence of deterministic drivers and stochastic environmental variations favouring population renewal processes has remained elusive. This study quantifies the deterministic and stochastic components of environmental change that lead to outbreaks of the jellyfish Pelagia noctiluca in the Mediterranean Sea. Using data on jellyfish abundance collected at 241 sites along the Catalan coast from 2007 to 2010, we: (1) tested hypotheses about the influence of time-varying and spatial predictors of jellyfish outbreaks; (2) evaluated the relative importance of stochastic vs. deterministic forcing of outbreaks through the environmental bootstrap method; and (3) quantified return times of extreme events. Outbreaks were common in May and June and less likely in other summer months, which resulted in a negative relationship between outbreaks and SST. Cross- and along-shore advection by geostrophic flow were important concentrating forces of jellyfish, but most outbreaks occurred in the proximity of two canyons in the northern part of the study area. This result supported the recent hypothesis that canyons can funnel P. noctiluca blooms towards shore during upwelling. This can be a general, yet unappreciated mechanism leading to outbreaks of holoplanktonic jellyfish species. The environmental bootstrap indicated that stochastic environmental fluctuations have negligible effects on return times of outbreaks. Our analysis emphasized the importance of deterministic processes leading to jellyfish outbreaks compared to the stochastic component of environmental variation. A better understanding of how environmental drivers affect demographic and population processes in jellyfish species will increase the ability to anticipate jellyfish outbreaks in the future.
Isolating intrinsic noise sources in a stochastic genetic switch.
Newby, Jay M
2012-01-01
The stochastic mutual repressor model is analysed using perturbation methods. This simple model of a gene circuit consists of two genes and three promotor states. Either of the two protein products can dimerize, forming a repressor molecule that binds to the promotor of the other gene. When the repressor is bound to a promotor, the corresponding gene is not transcribed and no protein is produced. Either one of the promotors can be repressed at any given time or both can be unrepressed, leaving three possible promotor states. This model is analysed in its bistable regime in which the deterministic limit exhibits two stable fixed points and an unstable saddle, and the case of small noise is considered. On small timescales, the stochastic process fluctuates near one of the stable fixed points, and on large timescales, a metastable transition can occur, where fluctuations drive the system past the unstable saddle to the other stable fixed point. To explore how different intrinsic noise sources affect these transitions, fluctuations in protein production and degradation are eliminated, leaving fluctuations in the promotor state as the only source of noise in the system. The process without protein noise is then compared to the process with weak protein noise using perturbation methods and Monte Carlo simulations. It is found that some significant differences in the random process emerge when the intrinsic noise source is removed.
Individual Genetic Susceptibility
DOE Office of Scientific and Technical Information (OSTI.GOV)
Eric J. Hall
2008-12-08
Risk estimates derived from epidemiological studies of exposed populations, as well as the maximum permissible doses allowed for occupational exposure and exposure of the public to ionizing radiation, are all based on the assumption that the human population is uniform in its radiosensitivity, except for a small number of individuals, such as ATM homozygotes, who are easily identified by their clinical symptoms. The hypothesis upon which this proposal is based is that the human population is not homogeneous in radiosensitivity, but that radiosensitive sub-groups exist which are not easy to identify. These individuals would suffer an increased incidence of detrimental radiation effects, and distort the shape of the dose response relationship. The radiosensitivity of these groups depends on the expression levels of specific proteins. The plan was to investigate the effect of 3 relatively rare, high-penetrance genes available in mice, namely Atm, mRad9 and Brca1. The purpose of radiation protection is to prevent deterministic effects of clinical significance and limit stochastic effects to acceptable levels. We plan, therefore, to compare with wild type animals the radiosensitivity of mice heterozygous for each of the genes mentioned above, as well as double heterozygotes for pairs of genes, using two biological endpoints: a) ocular cataracts as an important and relevant deterministic effect, and b) oncogenic transformation in cultured embryo fibroblasts, as a surrogate for carcinogenesis, the most relevant stochastic effect.
Effects of intrinsic stochasticity on delayed reaction-diffusion patterning systems.
Woolley, Thomas E; Baker, Ruth E; Gaffney, Eamonn A; Maini, Philip K; Seirin-Lee, Sungrim
2012-05-01
Cellular gene expression is a complex process involving many steps, including the transcription of DNA and translation of mRNA; hence the synthesis of proteins requires a considerable amount of time, from ten minutes to several hours. Since diffusion-driven instability has been observed to be sensitive to perturbations in kinetic delays, the application of Turing patterning mechanisms to the problem of producing spatially heterogeneous differential gene expression has been questioned. In deterministic systems a small delay in the reactions can cause a large increase in the time it takes a system to pattern. Recently, it has been observed that in undelayed systems intrinsic stochasticity can cause pattern initiation to occur earlier than in the analogous deterministic simulations. Here we are interested in adding both stochasticity and delays to Turing systems in order to assess whether stochasticity can reduce the patterning time scale in delayed Turing systems. As analytical insights to this problem are difficult to attain and often limited in their use, we focus on stochastically simulating delayed systems. We consider four different Turing systems and two different forms of delay. Our results are mixed and lead to the conclusion that, although the sensitivity to delays in the Turing mechanism is not completely removed by the addition of intrinsic noise, the effects of the delays are clearly ameliorated in certain specific cases.
Protein Interaction Profile Sequencing (PIP-seq).
Foley, Shawn W; Gregory, Brian D
2016-10-10
Every eukaryotic RNA transcript undergoes extensive post-transcriptional processing from the moment of transcription up through degradation. This regulation is performed by a distinct cohort of RNA-binding proteins which recognize their target transcript by both its primary sequence and secondary structure. Here, we describe protein interaction profile sequencing (PIP-seq), a technique that uses ribonuclease-based footprinting followed by high-throughput sequencing to globally assess both protein-bound RNA sequences and RNA secondary structure. PIP-seq utilizes single- and double-stranded RNA-specific nucleases in the absence of proteins to infer RNA secondary structure. These libraries are also compared to samples that undergo nuclease digestion in the presence of proteins in order to find enriched protein-bound sequences. Combined, these four libraries provide a comprehensive, transcriptome-wide view of RNA secondary structure and RNA protein interaction sites from a single experimental technique. Copyright © 2016 John Wiley & Sons, Inc.
Predicting cancer-relevant proteins using an improved molecular similarity ensemble approach.
Zhou, Bin; Sun, Qi; Kong, De-Xin
2016-05-31
In this study, we proposed an improved algorithm for identifying proteins relevant to cancer. The algorithm was named two-layer molecular similarity ensemble approach (TL-SEA). We applied TL-SEA to analyzing the correlation between anticancer compounds (against cell lines K562, MCF7 and A549) and active compounds against separate target proteins listed in BindingDB. Several associations between cancer types and related proteins were revealed using this chemoinformatics approach. An analysis of the literature showed that 26 of 35 predicted proteins were correlated with cancer cell proliferation, apoptosis or differentiation. Additionally, interactions between proteins in BindingDB and anticancer chemicals were also predicted. We discuss the roles of the most important predicted proteins in cancer biology and conclude that TL-SEA could be a useful tool for inferring novel proteins involved in cancer and revealing underlying molecular mechanisms.
Western Blotting of the Endocannabinoid System.
Wager-Miller, Jim; Mackie, Ken
2016-01-01
Measuring expression levels of G protein-coupled receptors (GPCRs) is an important step for understanding the distribution, function, and regulation of these receptors. A common approach for detecting proteins from complex biological systems is Western blotting. In this chapter, we describe a general approach to Western blotting protein components of the endocannabinoid system using sodium dodecyl sulfate-polyacrylamide gel electrophoresis and nitrocellulose membranes, with a focus on detecting type 1 cannabinoid (CB1) receptors. When this technique is carefully used, specifically with validation of the primary antibodies, it can provide quantitative information on protein expression levels. Additional information can also be inferred from Western blotting such as potential posttranslational modifications that can be further evaluated by specific analytical techniques.
Cognitive Diagnostic Analysis Using Hierarchically Structured Skills
ERIC Educational Resources Information Center
Su, Yu-Lan
2013-01-01
This dissertation proposes two modified cognitive diagnostic models (CDMs), the deterministic, inputs, noisy, "and" gate with hierarchy (DINA-H) model and the deterministic, inputs, noisy, "or" gate with hierarchy (DINO-H) model. Both models incorporate the hierarchical structures of the cognitive skills in the model estimation…
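For context, the base DINA model that DINA-H extends can be stated in the standard CDM notation (the hierarchy variants additionally restrict the admissible attribute patterns; this statement is the textbook form, not taken from the dissertation):

```latex
% Base DINA model: examinee i with attribute vector \alpha_i, item j
% with Q-matrix entries q_{jk}, slip s_j and guess g_j parameters.
\eta_{ij} = \prod_{k=1}^{K} \alpha_{ik}^{\,q_{jk}}, \qquad
P(X_{ij} = 1 \mid \alpha_i) = (1 - s_j)^{\eta_{ij}} \, g_j^{\,1 - \eta_{ij}}
```

Here η_ij indicates whether examinee i possesses all attributes required by item j; the -H variants keep this item response function but admit only attribute vectors α_i consistent with the prespecified skill hierarchy.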
Deterministic Mean-Field Ensemble Kalman Filtering
Law, Kody J. H.; Tembine, Hamidou; Tempone, Raul
2016-05-03
The proof of convergence of the standard ensemble Kalman filter (EnKF) from Le Gland, Monbet, and Tran [Large sample asymptotics for the ensemble Kalman filter, in The Oxford Handbook of Nonlinear Filtering, Oxford University Press, Oxford, UK, 2011, pp. 598--631] is extended to non-Gaussian state-space models. In this paper, a density-based deterministic approximation of the mean-field limit EnKF (DMFEnKF) is proposed, consisting of a PDE solver and a quadrature rule. Given a certain minimal order of convergence κ between the two, this extends to the deterministic filter approximation, which is therefore asymptotically superior to standard EnKF for dimension d
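For orientation, the standard stochastic EnKF analysis step that the density-based mean-field approximation is compared against looks like this for a linear scalar observation; the toy dimensions and noise levels below are assumptions, and the paper's DMFEnKF (a PDE solver plus quadrature rule) is not implemented here.

```python
import numpy as np

def enkf_update(ens, y, h, r_var, rng):
    """One analysis step of the standard stochastic EnKF.

    ens   : (n_members, d) forecast ensemble
    y     : observed scalar, y ~ h @ x + noise
    h     : (d,) linear observation vector
    r_var : observation-noise variance
    """
    a = ens - ens.mean(axis=0)                   # ensemble anomalies
    p_h = a.T @ (a @ h) / (len(ens) - 1)         # sample covariance times h
    s = h @ p_h + r_var                          # innovation variance
    k = p_h / s                                  # Kalman gain
    y_pert = y + rng.standard_normal(len(ens)) * np.sqrt(r_var)
    return ens + np.outer(y_pert - ens @ h, k)   # perturbed-observation update

rng = np.random.default_rng(0)
ens = rng.standard_normal((100, 2))              # toy 2-d state, 100 members
ens = enkf_update(ens, y=1.2, h=np.array([1.0, 0.0]), r_var=0.25, rng=rng)
```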
Active temporal multiplexing of indistinguishable heralded single photons
Xiong, C.; Zhang, X.; Liu, Z.; Collins, M. J.; Mahendra, A.; Helt, L. G.; Steel, M. J.; Choi, D. -Y.; Chae, C. J.; Leong, P. H. W.; Eggleton, B. J.
2016-01-01
It is a fundamental challenge in quantum optics to deterministically generate indistinguishable single photons through non-deterministic nonlinear optical processes, due to the intrinsic coupling of single- and multi-photon-generation probabilities in these processes. Actively multiplexing photons generated in many temporal modes can decouple these probabilities, but key issues are to minimize resource requirements to allow scalability, and to ensure indistinguishability of the generated photons. Here we demonstrate the multiplexing of photons from four temporal modes solely using fibre-integrated optics and off-the-shelf electronic components. We show a 100% enhancement to the single-photon output probability without introducing additional multi-photon noise. Photon indistinguishability is confirmed by a fourfold Hong–Ou–Mandel quantum interference with a 91±16% visibility after subtracting multi-photon noise due to high pump power. Our demonstration paves the way for scalable multiplexing of many non-deterministic photon sources to a single near-deterministic source, which will be of benefit to future quantum photonic technologies. PMID:26996317
Frisenda, Riccardo; Navarro-Moratalla, Efrén; Gant, Patricia; Pérez De Lara, David; Jarillo-Herrero, Pablo; Gorbachev, Roman V; Castellanos-Gomez, Andres
2018-01-02
Designer heterostructures can now be assembled layer-by-layer with unmatched precision thanks to the recently developed deterministic placement methods to transfer two-dimensional (2D) materials. This possibility constitutes the birth of a very active research field on the so-called van der Waals heterostructures. Moreover, these deterministic placement methods also open the door to fabricate complex devices, which would be otherwise very difficult to achieve by conventional bottom-up nanofabrication approaches, and to fabricate fully-encapsulated devices with exquisite electronic properties. The integration of 2D materials with existing technologies such as photonic and superconducting waveguides and fiber optics is another exciting possibility. Here, we review the state-of-the-art of the deterministic placement methods, describing and comparing the different alternative methods available in the literature, and we illustrate their potential to fabricate van der Waals heterostructures, to integrate 2D materials into complex devices and to fabricate artificial bilayer structures where the layers present a user-defined rotational twisting angle.
First-order reliability application and verification methods for semistatic structures
NASA Astrophysics Data System (ADS)
Verderaime, V.
1994-11-01
Escalating risks of aerostructures stimulated by increasing size, complexity, and cost should no longer be ignored in conventional deterministic safety design methods. The deterministic pass-fail concept is incompatible with probability and risk assessments; stress audits are shown to be arbitrary and incomplete, and the concept compromises the performance of high-strength materials. A reliability method is proposed that combines first-order reliability principles with deterministic design variables and conventional test techniques to surmount current deterministic stress design and audit deficiencies. Accumulative and propagation design uncertainty errors are defined and appropriately implemented into the classical safety-index expression. The application is reduced to solving for a design factor that satisfies the specified reliability and compensates for uncertainty errors, and then using this design factor as, and instead of, the conventional safety factor in stress analyses. The resulting method is consistent with current analytical skills and verification practices, the culture of most designers, and the development of semistatic structural designs.
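The classical safety-index expression referred to here has the textbook first-order form for normally distributed strength R and stress S (the paper's contribution is to fold the accumulated design uncertainty errors into this expression, which is not reproduced here):

```latex
% First-order safety index and the corresponding failure probability
% for normal strength R and stress S.
\beta = \frac{\mu_R - \mu_S}{\sqrt{\sigma_R^{2} + \sigma_S^{2}}},
\qquad P_f = \Phi(-\beta)
```

Solving this relation for the design factor at a specified reliability, and then using that factor in place of the conventional safety factor, is the substitution the abstract describes.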
Yin, Shen; Gao, Huijun; Qiu, Jianbin; Kaynak, Okyay
2017-11-01
Data-driven fault detection plays an important role in industrial systems due to its applicability in case of unknown physical models. In fault detection, disturbances must be taken into account as an inherent characteristic of processes. Nevertheless, fault detection for nonlinear processes with deterministic disturbances still receives little attention, especially in the data-driven field. To solve this problem, a just-in-time learning-based data-driven (JITL-DD) fault detection method for nonlinear processes with deterministic disturbances is proposed in this paper. JITL-DD employs the JITL scheme for process description with local model structures to cope with process dynamics and nonlinearity. The proposed method provides a data-driven fault detection solution for nonlinear processes with deterministic disturbances, and offers inherent online adaptation and high fault detection accuracy. Two nonlinear systems, i.e., a numerical example and a sewage treatment process benchmark, are employed to show the effectiveness of the proposed method.
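The JITL scheme can be sketched as a local regression around each query sample: select the most similar historical samples, fit a local model, and monitor the residual. The neighbourhood size and the linear local model below are illustrative choices, not the paper's exact design.

```python
import numpy as np

def jitl_residual(query, y_meas, x_hist, y_hist, k=20):
    """Just-in-time learning: fit a local linear model on the k historical
    samples most similar to the query, and use the prediction residual as
    the fault-detection statistic."""
    d = np.linalg.norm(x_hist - query, axis=1)
    idx = np.argsort(d)[:k]                          # k nearest neighbours
    xk = np.hstack([x_hist[idx], np.ones((k, 1))])   # local design matrix
    coef, *_ = np.linalg.lstsq(xk, y_hist[idx], rcond=None)
    y_pred = np.append(query, 1.0) @ coef
    return abs(y_meas - y_pred)   # flag a fault when this exceeds a threshold
```

Because the local model is refit at every query, the detector adapts online as the process moves through its operating range, which is the property the abstract emphasizes.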
Deterministic Mean-Field Ensemble Kalman Filtering
DOE Office of Scientific and Technical Information (OSTI.GOV)
Law, Kody J. H.; Tembine, Hamidou; Tempone, Raul
The proof of convergence of the standard ensemble Kalman filter (EnKF) from Le Gland, Monbet, and Tran [Large sample asymptotics for the ensemble Kalman filter, in The Oxford Handbook of Nonlinear Filtering, Oxford University Press, Oxford, UK, 2011, pp. 598--631] is extended to non-Gaussian state-space models. In this paper, a density-based deterministic approximation of the mean-field limit EnKF (DMFEnKF) is proposed, consisting of a PDE solver and a quadrature rule. Given a certain minimal order of convergence κ between the two, this extends to the deterministic filter approximation, which is therefore asymptotically superior to standard EnKF for dimension d
NASA Astrophysics Data System (ADS)
Wang, Fengyu
Traditional deterministic reserve requirements rely on ad-hoc, rule of thumb methods to determine adequate reserve in order to ensure a reliable unit commitment. Since congestion and uncertainties exist in the system, both the quantity and the location of reserves are essential to ensure system reliability and market efficiency. Existing deterministic reserve requirements acquire operating reserves on a zonal basis and do not fully capture the impact of congestion. The purpose of a reserve zone is to ensure that operating reserves are spread across the network. Operating reserves are shared inside each reserve zone, but intra-zonal congestion may block the deliverability of operating reserves within a zone. Thus, improving reserve policies such as reserve zones may improve the location and deliverability of reserves. As more non-dispatchable renewable resources are integrated into the grid, it will become increasingly difficult to predict the transfer capabilities and the network congestion. At the same time, renewable resources require operators to acquire more operating reserves. With existing deterministic reserve requirements unable to ensure optimal reserve locations, the importance of reserve location and reserve deliverability will increase. While stochastic programming can be used to determine reserve by explicitly modeling uncertainties, there are still scalability as well as pricing issues. Therefore, new methods to improve existing deterministic reserve requirements are desired. One key barrier to improving existing deterministic reserve requirements is their potential market impacts. A metric, quality of service, is proposed in this thesis to evaluate the price signal and market impacts of proposed hourly reserve zones. The three main goals of this thesis are: 1) to develop a theoretical and mathematical model to better locate reserve while maintaining the deterministic unit commitment and economic dispatch structure, especially with the consideration of renewables; 2) to develop a market settlement scheme for the proposed dynamic reserve policies such that market efficiency is improved; and 3) to evaluate the market impacts and price signal of the proposed dynamic reserve policies.
Parameter Estimation in Epidemiology: from Simple to Complex Dynamics
NASA Astrophysics Data System (ADS)
Aguiar, Maíra; Ballesteros, Sebastién; Boto, João Pedro; Kooi, Bob W.; Mateus, Luís; Stollenwerk, Nico
2011-09-01
We revisit the parameter estimation framework for population biological dynamical systems, and apply it to calibrate various models in epidemiology with empirical time series, namely influenza and dengue fever. When it comes to more complex models, like multi-strain dynamics describing the virus-host interaction in dengue fever, even the most recently developed parameter estimation techniques, such as maximum likelihood iterated filtering, reach their computational limits. However, the first results of parameter estimation with data on dengue fever from Thailand indicate a subtle interplay between stochasticity and the deterministic skeleton. The deterministic system on its own already displays complex dynamics, up to deterministic chaos and coexistence of multiple attractors.
Inherent Conservatism in Deterministic Quasi-Static Structural Analysis
NASA Technical Reports Server (NTRS)
Verderaime, V.
1997-01-01
The cause of the long-suspected excessive conservatism in the prevailing structural deterministic safety factor has been identified as an inherent violation of the error propagation laws when reducing statistical data to deterministic values and then combining them algebraically through successive structural computational processes. These errors are restricted to the applied stress computations, and because mean and variations of the tolerance limit format are added, the errors are positive, serially cumulative, and excessively conservative. Reliability methods circumvent these errors and provide more efficient and uniform safe structures. The document is a tutorial on the deficiencies and nature of the current safety factor and of its improvement and transition to absolute reliability.
Statistical Inference for Big Data Problems in Molecular Biophysics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ramanathan, Arvind; Savol, Andrej; Burger, Virginia
2012-01-01
We highlight the role of statistical inference techniques in providing biological insights from the analysis of long time-scale molecular simulation data. Technological and algorithmic improvements in computation have brought molecular simulations to the forefront of techniques applied to investigating the basis of living systems. While these longer and increasingly complex simulations, presently reaching petabyte scales, promise a detailed view into microscopic behavior, teasing out the important information has now become a true challenge on its own. Mining this data for important patterns is critical to automating therapeutic intervention discovery, improving protein design, and fundamentally understanding the mechanistic basis of cellular homeostasis.
MIRA: An R package for DNA methylation-based inference of regulatory activity.
Lawson, John T; Tomazou, Eleni M; Bock, Christoph; Sheffield, Nathan C
2018-03-01
DNA methylation contains information about the regulatory state of the cell. MIRA aggregates genome-scale DNA methylation data into a DNA methylation profile for independent region sets with shared biological annotation. Using this profile, MIRA infers and scores the collective regulatory activity for each region set. MIRA facilitates regulatory analysis in situations where classical regulatory assays would be difficult and allows public sources of open chromatin and protein binding regions to be leveraged for novel insight into the regulatory state of DNA methylation datasets. R package available on Bioconductor: http://bioconductor.org/packages/release/bioc/html/MIRA.html. nsheffield@virginia.edu.
Sampled-Data Consensus of Linear Multi-agent Systems With Packet Losses.
Zhang, Wenbing; Tang, Yang; Huang, Tingwen; Kurths, Jurgen
In this paper, the consensus problem is studied for a class of multi-agent systems with sampled data and packet losses, where random and deterministic packet losses are considered, respectively. For random packet losses, a Bernoulli-distributed white sequence is used to describe packet dropouts among agents in a stochastic way. For deterministic packet losses, a switched system with stable and unstable subsystems is employed to model packet dropouts in a deterministic way. The purpose of this paper is to derive consensus criteria, such that linear multi-agent systems with sampled-data and packet losses can reach consensus. By means of the Lyapunov function approach and the decomposition method, the design problem of a distributed controller is solved in terms of convex optimization. The interplay among the allowable bound of the sampling interval, the probability of random packet losses, and the rate of deterministic packet losses are explicitly derived to characterize consensus conditions. The obtained criteria are closely related to the maximum eigenvalue of the Laplacian matrix versus the second minimum eigenvalue of the Laplacian matrix, which reveals the intrinsic effect of communication topologies on consensus performance. Finally, simulations are given to show the effectiveness of the proposed results.
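A minimal simulation of the Bernoulli packet-loss case illustrates the setup for simple single-integrator agents; the graph, step size, and drop probability below are assumptions, and the paper's agents are general linear systems with a designed sampled-data controller.

```python
import numpy as np

def consensus_step(x, adj, eps, drop_prob, rng):
    """One sampled-data consensus update where each link independently
    drops its packet with probability drop_prob (the Bernoulli model)."""
    mask = rng.random(adj.shape) > drop_prob   # links surviving this sample
    a = adj * mask
    a = np.triu(a, 1); a = a + a.T             # keep the graph undirected
    lap = np.diag(a.sum(axis=1)) - a           # Laplacian of surviving graph
    return x - eps * lap @ x

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 1],
                [1, 1, 0, 1],
                [0, 1, 1, 0]], dtype=float)
x = np.array([1.0, -2.0, 0.5, 3.0])
for _ in range(200):
    x = consensus_step(x, adj, eps=0.2, drop_prob=0.3, rng=rng)
# x approaches a common value when packet drops are not too frequent
```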
Comparison of space radiation calculations for deterministic and Monte Carlo transport codes
NASA Astrophysics Data System (ADS)
Lin, Zi-Wei; Adams, James; Barghouty, Abdulnasser; Randeniya, Sharmalee; Tripathi, Ram; Watts, John; Yepes, Pablo
For space radiation protection of astronauts or electronic equipment, it is necessary to develop and use accurate radiation transport codes. Radiation transport codes include deterministic codes, such as HZETRN from NASA and UPROP from the Naval Research Laboratory, and Monte Carlo codes such as FLUKA, the Geant4 toolkit and HETC-HEDS. The deterministic codes and Monte Carlo codes complement each other in that deterministic codes are very fast while Monte Carlo codes are more elaborate. Therefore it is important to investigate how well the results of deterministic codes compare with those of Monte Carlo transport codes and where they differ. In this study we evaluate these different codes in their space radiation applications by comparing their output results in the same given space radiation environments, shielding geometry and material. Typical space radiation environments such as the 1977 solar minimum galactic cosmic ray environment are used as the well-defined input, and simple geometries made of aluminum, water and/or polyethylene are used to represent the shielding material. We then compare various outputs of these codes, such as the dose-depth curves and the flux spectra of different fragments and other secondary particles. These comparisons enable us to learn more about the main differences between these space radiation transport codes. At the same time, they help us to learn the qualitative and quantitative features that these transport codes have in common.
NASA Astrophysics Data System (ADS)
Delvecchio, S.; Antoni, J.
2012-02-01
This paper addresses the use of a cyclostationary blind source separation algorithm (namely RRCR) to extract angle-deterministic signals from mechanical rotating machines in the presence of stationary speed fluctuations. This means that only phase fluctuations while the machine is running in steady-state conditions are considered, while run-up or run-down speed variations are not taken into account. The machine is also supposed to run in idle conditions, so non-stationary phenomena due to the load are not considered. It is theoretically assessed that in such operating conditions the deterministic (periodic) signal in the angle domain becomes cyclostationary at first and second orders in the time domain. This fact justifies the use of the RRCR algorithm, which is able to directly extract the angle-deterministic signal from the time domain without performing any kind of interpolation. This is particularly valuable when angular resampling fails because of uncontrolled speed fluctuations. The capability of the proposed approach is verified by means of simulated and actual vibration signals captured on a pneumatic screwdriver handle. In this particular case, not only can the extraction of the angle-deterministic part be performed, but also the separation of the main sources of excitation (i.e. motor shaft imbalance, epicycloidal gear meshing and air pressure forces) affecting the user's hand during operation.
Tag-mediated cooperation with non-deterministic genotype-phenotype mapping
NASA Astrophysics Data System (ADS)
Zhang, Hong; Chen, Shu
2016-01-01
Tag-mediated cooperation provides a helpful framework for resolving evolutionary social dilemmas. However, most previous studies have not taken into account the genotype-phenotype distinction in tags, which may play an important role in the process of evolution. To take this into consideration, we introduce non-deterministic genotype-phenotype mapping into a tag-based model with a spatial prisoner's dilemma. By our definition, the similarity between genotypic tags does not directly imply the similarity between phenotypic tags. We find that the non-deterministic mapping from genotypic tag to phenotypic tag has non-trivial effects on tag-mediated cooperation. Although we observe that high levels of cooperation can be established under a wide variety of conditions, especially when the decisiveness is moderate, the uncertainty in the determination of phenotypic tags may have a detrimental effect on the tag mechanism by disturbing the homophilic interaction structure which can explain the promotion of cooperation in tag systems. Furthermore, the non-deterministic mapping may undermine the robustness of the tag mechanism with respect to various factors such as the structure of the tag space and the tag flexibility. This observation warns us about the danger of applying classical tag-based models to the analysis of empirical phenomena if the genotype-phenotype distinction is significant in the real world. Non-deterministic genotype-phenotype mapping thus provides a new perspective on the understanding of tag-mediated cooperation.
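A toy numerical reading of this model class (not the paper's exact dynamics): genotypic tags are real numbers and the phenotypic tag is drawn around the genotype with a spread controlled by a hypothetical decisiveness parameter, so similarity at the phenotypic level only stochastically reflects genotypic similarity.

```python
import numpy as np

rng = np.random.default_rng(1)

def phenotype(genotype, decisiveness):
    # Non-deterministic genotype-phenotype mapping: larger decisiveness
    # concentrates the phenotypic tag around the genotypic tag.
    return genotype + rng.normal(scale=1.0 / decisiveness, size=genotype.shape)

n, tolerance = 1000, 0.1
g = rng.random(n)                              # genotypic tags in [0, 1]
for decisiveness in (1.0, 5.0, 25.0):
    p = phenotype(g, decisiveness)             # realized phenotypic tags
    i = rng.integers(n, size=20000)            # random interaction pairs
    j = rng.integers(n, size=20000)
    # donation rule of tag models: cooperate when phenotypic tags are close
    coop = np.mean(np.abs(p[i] - p[j]) < tolerance)
    print(f"decisiveness {decisiveness:5.1f}: cooperation rate {coop:.3f}")
```

Low decisiveness scrambles which pairs count as "similar", which is the disturbance of the homophilic interaction structure the abstract refers to.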
Protein Structure and Function Prediction Using I-TASSER
Yang, Jianyi; Zhang, Yang
2016-01-01
I-TASSER is a hierarchical protocol for automated protein structure prediction and structure-based function annotation. Starting from the amino acid sequence of target proteins, I-TASSER first generates full-length atomic structural models from multiple threading alignments and iterative structural assembly simulations, followed by atomic-level structure refinement. The biological functions of the protein, including ligand-binding sites, enzyme commission number, and gene ontology terms, are then inferred from known protein function databases based on sequence and structure profile comparisons. I-TASSER is freely available as both an on-line server and a stand-alone package. This unit describes how to use the I-TASSER protocol to generate structure and function predictions and how to interpret the prediction results, as well as alternative approaches for further improving the I-TASSER modeling quality for distant-homologous and multi-domain protein targets. PMID:26678386
Statistical inference in single molecule measurements of protein adsorption
NASA Astrophysics Data System (ADS)
Armstrong, Megan J.; Tsitkov, Stanislav; Hess, Henry
2018-02-01
Significant effort has been invested into understanding the dynamics of protein adsorption on surfaces, in particular to predict protein behavior at the specialized surfaces of biomedical technologies like hydrogels, nanoparticles, and biosensors. Recently, the application of fluorescent single molecule imaging to this field has permitted the tracking of individual proteins and their stochastic contribution to the aggregate dynamics of adsorption. However, the interpretation of these results is complicated by (1) the finite time available to observe effectively infinite adsorption timescales and (2) the contribution of photobleaching kinetics to adsorption kinetics. Here, we perform a protein adsorption simulation to introduce specific survival analysis methods that overcome the first complication. Additionally, we collect single molecule residence time data from the adsorption of fibrinogen to glass and use survival analysis to distinguish photobleaching kinetics from protein adsorption kinetics.
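The first complication, a finite observation window truncating effectively infinite adsorption timescales, is a right-censoring problem, and survival analysis handles it directly. Below is a minimal Kaplan-Meier sketch with invented residence times; it does not include the photobleaching deconvolution that the study also performs.

```python
import numpy as np

def kaplan_meier(times, observed):
    # observed=False marks residence times right-censored by the end of
    # the movie; censored molecules leave the risk set without an event.
    order = np.argsort(times)
    t = np.asarray(times, float)[order]
    d = np.asarray(observed, bool)[order]
    at_risk, S, curve = len(t), 1.0, []
    for ti, di in zip(t, d):
        if di:                        # a genuine desorption event
            S *= (at_risk - 1) / at_risk
        at_risk -= 1
        curve.append((ti, S))
    return curve

# hypothetical single-molecule residence times in seconds
times    = [2.1, 3.5, 3.5, 7.0, 9.9, 12.0, 12.0]
observed = [True, True, False, True, False, True, False]
for t, S in kaplan_meier(times, observed):
    print(f"t = {t:5.1f} s   S(t) = {S:.3f}")
```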
Protein-protein interactions (PPIs) mediate the transmission and regulation of oncogenic signals that are essential to cellular proliferation and survival, and thus represent potential targets for anti-cancer therapeutic discovery. Despite their significance, there is no method to experimentally disrupt and interrogate the essentiality of individual endogenous PPIs. The ability to computationally predict or infer PPI essentiality would help prioritize PPIs for drug discovery and help advance understanding of cancer biology.
Interactions in Micellar Solutions of β-Casein
NASA Astrophysics Data System (ADS)
Leclerc, E.; Calmettes, P.
1997-01-01
β-casein is a flexible amphiphilic milk protein which forms spherical micelles in very dilute solution. The magnitude of the weight-average interactions between the solute particles has been inferred from small-angle neutron scattering experiments. At relatively high protein concentrations the interactions between micelles are repulsive, whatever the temperature. At lower concentrations these interactions vanish and become more and more attractive as the critical micelle concentration is approached. Although such attractive interactions are indispensable for micelle formation, they seem not to have been reported previously.
Liu, Xuewu; Huang, Yuxiao; Liang, Jiao; Zhang, Shuai; Li, Yinghui; Wang, Jun; Shen, Yan; Xu, Zhikai; Zhao, Ya
2014-11-30
The invasion of red blood cells (RBCs) by malarial parasites is an essential step in the life cycle of Plasmodium falciparum. Human-parasite surface protein interactions play a critical role in this process. Although several interactions between human and parasite proteins have been discovered, the mechanism related to invasion remains poorly understood because numerous human-parasite protein interactions have not yet been identified. High-throughput screening experiments are not feasible for malarial parasites due to the difficulty of expressing parasite proteins. Here, we performed computational prediction of the PPIs involved in malaria parasite invasion to elucidate the mechanism by which invasion occurs. In this study, an expectation maximization algorithm was used to estimate the probabilities of domain-domain interactions (DDIs). Estimates of DDI probabilities were then used to infer PPI probabilities. We found that prediction performance when information from all six species was used was better than that based on information from D. melanogaster alone. Prediction performance was assessed using protein interaction data from S. cerevisiae, indicating that the predicted results were reliable. We then used the estimates of DDI probabilities to infer interactions between 490 parasite and 3,787 human membrane proteins. A small-scale dataset was used to illustrate the usability of our method in predicting interactions between human and parasite proteins. The positive predictive value (PPV) was lower than that observed in S. cerevisiae. We integrated gene expression data to improve prediction accuracy and to reduce false positives. We identified 80 membrane proteins highly expressed in the schizont stage by the fast Fourier transform method. Approximately 221 erythrocyte membrane proteins were identified using published mass spectral datasets. A network consisting of 205 interactions was predicted. Results of network analysis suggest that SNARE proteins of parasites and APP of humans may function in the invasion of RBCs by parasites. We predicted a small-scale PPI network that may be involved in parasite invasion of RBCs by integrating DDI information and expression profiles. Experimental studies should be conducted to validate the predicted interactions. The predicted PPIs help elucidate the mechanism of parasite invasion and provide directions for future experimental investigations.
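The step from estimated DDI probabilities to PPI probabilities is commonly done with an independence assumption over domain pairs; the sketch below shows only that inference step (the EM estimation itself is omitted), with invented domain annotations and probabilities whose names merely echo the abstract.

```python
# Hypothetical domain annotations and EM-estimated DDI probabilities.
domains = {"parasite_P1": ["SNARE"], "human_H1": ["APP_domain", "KU_domain"]}
p_ddi = {("SNARE", "APP_domain"): 0.30, ("SNARE", "KU_domain"): 0.05}

def ppi_probability(prot_a, prot_b):
    # P(PPI) = 1 - prod(1 - p_ddi) over all domain pairs, assuming the
    # domain pairs interact independently (the usual DDI-based model).
    p_none = 1.0
    for da in domains[prot_a]:
        for db in domains[prot_b]:
            p = p_ddi.get((da, db), p_ddi.get((db, da), 0.0))
            p_none *= 1.0 - p
    return 1.0 - p_none

print(ppi_probability("parasite_P1", "human_H1"))  # 1 - 0.70 * 0.95 = 0.335
```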
Dual Coordination of Post Translational Modifications in Human Protein Networks
Woodsmith, Jonathan; Kamburov, Atanas; Stelzl, Ulrich
2013-01-01
Post-translational modifications (PTMs) regulate protein activity, stability and interaction profiles and are critical for cellular functioning. Further regulation is gained through PTM interplay, whereby modifications modulate the occurrence of other PTMs or act in combination. Integration of global acetylation, ubiquitination and tyrosine or serine/threonine phosphorylation datasets with protein interaction data identified hundreds of protein complexes that selectively accumulate each PTM, indicating coordinated targeting of specific molecular functions. A second layer of PTM coordination exists in these complexes, mediated by PTM integration (PTMi) spots. PTMi spots represent very dense modification patterns in disordered protein regions and showed mutation rates in cancer as high as those of functional protein domains, suggesting comparable importance for cellular functioning. Systematic PTMi spot identification highlighted more than 300 candidate proteins for combinatorial PTM regulation. This study reveals two global PTM coordination mechanisms and emphasizes dataset integration as a requisite in proteomic PTM studies to better predict modification impact on cellular signaling. PMID:23505349
Choosing an Optimal Database for Protein Identification from Tandem Mass Spectrometry Data.
Kumar, Dhirendra; Yadav, Amit Kumar; Dash, Debasis
2017-01-01
Database searching is the preferred method for protein identification from digital spectra of mass to charge ratios (m/z) detected for protein samples through mass spectrometers. The search database is one of the major influencing factors in discovering proteins present in the sample and thus in deriving biological conclusions. In most cases the choice of search database is arbitrary. Here we describe common search databases used in proteomic studies and their impact on final list of identified proteins. We also elaborate upon factors like composition and size of the search database that can influence the protein identification process. In conclusion, we suggest that choice of the database depends on the type of inferences to be derived from proteomics data. However, making additional efforts to build a compact and concise database for a targeted question should generally be rewarding in achieving confident protein identifications.
A Unit on Deterministic Chaos for Student Teachers
ERIC Educational Resources Information Center
Stavrou, D.; Assimopoulos, S.; Skordoulis, C.
2013-01-01
A unit aiming to introduce pre-service teachers of primary education to the limited predictability of deterministic chaotic systems is presented. The unit is based on a commercial chaotic pendulum system connected with a data acquisition interface. The capabilities and difficulties in understanding the notion of limited predictability of 18…
A Deterministic Annealing Approach to Clustering AIRS Data
NASA Technical Reports Server (NTRS)
Guillaume, Alexandre; Braverman, Amy; Ruzmaikin, Alexander
2012-01-01
We will examine the validity of means and standard deviations as a basis for climate data products. We will explore the conditions under which these two simple statistics are inadequate summaries of the underlying empirical probability distributions by contrasting them with a nonparametric method called deterministic annealing.
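For reference, the core of a deterministic annealing clusterer fits in a short loop: soft (Gibbs) assignments at a temperature T, weighted centroid re-estimation, then cooling. The sketch below uses synthetic 2-D data and an invented schedule; it is the generic algorithm, not the AIRS processing pipeline.

```python
import numpy as np

def deterministic_annealing(X, k=3, T0=5.0, Tmin=0.01, cooling=0.9):
    rng = np.random.default_rng(0)
    mu = X[rng.choice(len(X), size=k, replace=False)]         # initial centroids
    T = T0
    while T > Tmin:
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        W = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / T)  # Gibbs weights
        W /= W.sum(axis=1, keepdims=True)                      # soft memberships
        mu = (W.T @ X) / (W.sum(axis=0)[:, None] + 1e-12)      # weighted centroids
        T *= cooling                                           # geometric cooling
    return mu

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
X[:100] += 4.0                    # plant one offset cluster
print(np.round(deterministic_annealing(X), 2))
```

At high temperature all points belong softly to all clusters; as T falls the memberships harden, which is what lets the method avoid many of the poor local minima that plague plain k-means.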
The Total Exposure Model (TEM) uses deterministic and stochastic methods to estimate the exposure of a person performing daily activities of eating, drinking, showering, and bathing. There were 250 time histories generated, by subject with activities, for the three exposure ro...
Integrability and Chaos: The Classical Uncertainty
ERIC Educational Resources Information Center
Masoliver, Jaume; Ros, Ana
2011-01-01
In recent years there has been a considerable increase in the publishing of textbooks and monographs covering what was formerly known as random or irregular deterministic motion, now referred to as deterministic chaos. There is still substantial interest in a matter that is included in many graduate and even undergraduate courses on classical…
The development of the deterministic nonlinear PDEs in particle physics to stochastic case
NASA Astrophysics Data System (ADS)
Abdelrahman, Mahmoud A. E.; Sohaly, M. A.
2018-06-01
In the present work, an accurate method, the Riccati-Bernoulli sub-ODE technique, is used for solving the deterministic and stochastic cases of the Phi-4 equation and the nonlinear foam drainage equation. The control of the randomness in the input is also studied with regard to the stability of the stochastic process solution.
Contemporary Genetics for Gender Researchers: Not Your Grandma's Genetics Anymore
ERIC Educational Resources Information Center
Salk, Rachel H.; Hyde, Janet S.
2012-01-01
Over the past century, much of genetics was deterministic, and feminist researchers framed justified criticisms of genetics research. However, over the past two decades, genetics research has evolved remarkably and has moved far from earlier deterministic approaches. Our article provides a brief primer on modern genetics, emphasizing contemporary…
ERIC Educational Resources Information Center
Rambe, Patient; Nel, Liezel
2015-01-01
The discourse of social media adoption in higher education has often been funnelled through utopian and dystopian perspectives, which are polarised but determinist theorisations of human engagement with educational technologies. Consequently, these determinist approaches have obscured a broadened grasp of the situated, socially constructed nature…
In silico methods for design of biological therapeutics.
Roy, Ankit; Nair, Sanjana; Sen, Neeladri; Soni, Neelesh; Madhusudhan, M S
2017-12-01
It has been twenty years since the first rationally designed small molecule drug was introduced into the market. Since then, we have progressed from designing small molecules to designing biotherapeutics. This class of therapeutics includes designed proteins, peptides and nucleic acids that could more effectively combat drug resistance and even act in cases where the disease is caused because of a molecular deficiency. Computational methods are crucial in this design exercise and this review discusses the various elements of designing biotherapeutic proteins and peptides. Many of the techniques discussed here, such as the deterministic and stochastic design methods, are generally used in protein design. We have devoted special attention to the design of antibodies and vaccines. In addition to the methods for designing these molecules, we have included a comprehensive list of all biotherapeutics approved for clinical use. Also included is an overview of methods that predict the binding affinity, cell penetration ability, half-life, solubility, immunogenicity and toxicity of the designed therapeutics. Biotherapeutics are only going to grow in clinical importance and are set to herald a new generation of disease management and cure. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Adiabatic reduction of a model of stochastic gene expression with jump Markov process.
Yvinec, Romain; Zhuge, Changjing; Lei, Jinzhi; Mackey, Michael C
2014-04-01
This paper considers adiabatic reduction in a model of stochastic gene expression with bursting transcription considered as a jump Markov process. In this model, the process of gene expression with auto-regulation is described by fast/slow dynamics. The production of mRNA is assumed to follow a compound Poisson process occurring at a rate depending on protein levels (the phenomenon called bursting in molecular biology), and the production of protein is a linear function of mRNA numbers. When the dynamics of mRNA is assumed to be a fast process (due to faster mRNA degradation than that of protein), we prove that, with appropriate scalings in the burst rate, jump size or translational rate, the bursting phenomena can be transmitted to the slow variable. We show that, depending on the scaling, the reduced equation is either a stochastic differential equation with a jump Poisson process or a deterministic ordinary differential equation. These results are significant because adiabatic reduction techniques seem not to have been rigorously justified for a stochastic differential system containing a jump Markov process. We expect that the results can be generalized to adiabatic methods in more general stochastic hybrid systems.
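A small simulation conveys the flavor of the reduced bursting model (an illustration, not the paper's proof): the protein level decays deterministically between jumps of a protein-dependent compound Poisson process, simulated here by thinning, with all rates invented.

```python
import numpy as np

rng = np.random.default_rng(2)

def burst_rate(p):
    # Burst frequency with negative auto-regulation (hypothetical form);
    # it is decreasing in p, so burst_rate(0) bounds it for thinning.
    return 2.0 / (1.0 + p / 50.0)

gamma, mean_burst, T = 0.1, 10.0, 500.0    # decay rate, burst size, horizon
t, p, levels = 0.0, 0.0, []
lam_max = burst_rate(0.0)
while t < T:
    dt = rng.exponential(1.0 / lam_max)    # candidate inter-burst time
    p *= np.exp(-gamma * dt)               # deterministic decay between jumps
    t += dt
    if rng.random() < burst_rate(p) / lam_max:   # thinning: accept a burst
        p += rng.exponential(mean_burst)         # exponential burst size
    levels.append(p)

print("mean protein level:", round(float(np.mean(levels)), 2))
```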
Algorithm, applications and evaluation for protein comparison by Ramanujan Fourier transform.
Zhao, Jian; Wang, Jiasong; Hua, Wei; Ouyang, Pingkai
2015-12-01
The amino acid sequence of a protein determines its chemical properties, chain conformation and biological functions. Protein sequence comparison is of great importance for identifying similarities of protein structures and inferring their functions. Many properties of a protein correspond to low-frequency signals within the sequence. Low-frequency modes in protein sequences are linked to the secondary structures, membrane protein types, and sub-cellular localizations of the proteins. In this paper, we present the Ramanujan Fourier transform (RFT), with a fast algorithm, to analyze the low-frequency signals of protein sequences. The RFT method is applied to similarity analysis of protein sequences with the Resonant Recognition Model (RRM). The results show that the proposed fast RFT method for protein comparison is more efficient than the commonly used discrete Fourier transform (DFT). RFT can detect common frequencies as significant features for specific protein families, and the RFT spectrum heat-map of protein sequences demonstrates the information conservation in the sequence comparison. The proposed method offers a new tool for pattern recognition, feature extraction and structural analysis of protein sequences. Copyright © 2015 Elsevier Ltd. All rights reserved.
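For orientation, the Ramanujan sums c_q(n) underlying the RFT, and a naive projection onto them, can be written down directly (normalization conventions vary across the literature, and the paper's contribution is a fast algorithm rather than this quadratic version):

```python
import numpy as np
from math import gcd

def ramanujan_sum(q, n):
    # c_q(n) = sum of cos(2*pi*k*n/q) over 1 <= k <= q with gcd(k, q) = 1.
    return sum(np.cos(2 * np.pi * k * n / q)
               for k in range(1, q + 1) if gcd(k, q) == 1)

def rft(x):
    # Naive RFT: average the signal against each c_q, normalized by
    # Euler's totient (a Carmichael-style convention; others exist).
    N = len(x)
    out = []
    for q in range(1, N + 1):
        phi_q = sum(1 for k in range(1, q + 1) if gcd(k, q) == 1)
        c = np.array([ramanujan_sum(q, n) for n in range(N)])
        out.append(float((x * c).mean() / phi_q))
    return np.array(out)

# toy numeric encoding of a short sequence (values are hypothetical)
x = np.array([0.08, 0.00, 0.10, 0.00, 0.08, 0.00, 0.10, 0.00])
print(np.round(rft(x), 4))
```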
Integrated Module and Gene-Specific Regulatory Inference Implicates Upstream Signaling Networks
Roy, Sushmita; Lagree, Stephen; Hou, Zhonggang; Thomson, James A.; Stewart, Ron; Gasch, Audrey P.
2013-01-01
Regulatory networks that control gene expression are important in diverse biological contexts including stress response and development. Each gene's regulatory program is determined by module-level regulation (e.g. co-regulation via the same signaling system), as well as gene-specific determinants that can fine-tune expression. We present a novel approach, Modular regulatory network learning with per gene information (MERLIN), that infers regulatory programs for individual genes while probabilistically constraining these programs to reveal module-level organization of regulatory networks. Using edge-, regulator- and module-based comparisons of simulated networks of known ground truth, we find MERLIN reconstructs regulatory programs of individual genes as well or better than existing approaches of network reconstruction, while additionally identifying modular organization of the regulatory networks. We use MERLIN to dissect global transcriptional behavior in two biological contexts: yeast stress response and human embryonic stem cell differentiation. Regulatory modules inferred by MERLIN capture co-regulatory relationships between signaling proteins and downstream transcription factors thereby revealing the upstream signaling systems controlling transcriptional responses. The inferred networks are enriched for regulators with genetic or physical interactions, supporting the inference, and identify modules of functionally related genes bound by the same transcriptional regulators. Our method combines the strengths of per-gene and per-module methods to reveal new insights into transcriptional regulation in stress and development. PMID:24146602
Learning Quantitative Sequence-Function Relationships from Massively Parallel Experiments
NASA Astrophysics Data System (ADS)
Atwal, Gurinder S.; Kinney, Justin B.
2016-03-01
A fundamental aspect of biological information processing is the ubiquity of sequence-function relationships—functions that map the sequence of DNA, RNA, or protein to a biochemically relevant activity. Most sequence-function relationships in biology are quantitative, but only recently have experimental techniques for effectively measuring these relationships been developed. The advent of such "massively parallel" experiments presents an exciting opportunity for the concepts and methods of statistical physics to inform the study of biological systems. After reviewing these recent experimental advances, we focus on the problem of how to infer parametric models of sequence-function relationships from the data produced by these experiments. Specifically, we retrace and extend recent theoretical work showing that inference based on mutual information, not the standard likelihood-based approach, is often necessary for accurately learning the parameters of these models. Closely connected with this result is the emergence of "diffeomorphic modes"—directions in parameter space that are far less constrained by data than likelihood-based inference would suggest. Analogous to Goldstone modes in physics, diffeomorphic modes arise from an arbitrarily broken symmetry of the inference problem. An analytically tractable model of a massively parallel experiment is then described, providing an explicit demonstration of these fundamental aspects of statistical inference. This paper concludes with an outlook on the theoretical and computational challenges currently facing studies of quantitative sequence-function relationships.
Spatial capture–recapture with partial identity: An application to camera traps
Augustine, Ben C.; Royle, J. Andrew; Kelly, Marcella J.; Satter, Christopher B.; Alonso, Robert S.; Boydston, Erin E.; Crooks, Kevin R.
2018-01-01
Camera trapping surveys frequently capture individuals whose identity is only known from a single flank. The most widely used methods for incorporating these partial identity individuals into density analyses discard some of the partial identity capture histories, reducing precision and, while not previously recognized, introducing bias. Here, we present the spatial partial identity model (SPIM), which uses the spatial location where partial identity samples are captured to probabilistically resolve their complete identities, allowing all partial identity samples to be used in the analysis. We show that the SPIM outperforms other analytical alternatives. We then apply the SPIM to an ocelot data set collected on a trapping array with double-camera stations and a bobcat data set collected on a trapping array with single-camera stations. The SPIM improves inference in both cases and, in the ocelot example, individual sex is determined from photographs used to further resolve partial identities, one of which is resolved to near certainty. The SPIM opens the door to the investigation of trapping designs that deviate from the standard two-camera design and to the combination of other data types between which identities cannot be deterministically linked, and it can be extended to the problem of partial genotypes.
Efficient computation of optimal actions.
Todorov, Emanuel
2009-07-14
Optimal choice of actions is a fundamental problem relevant to fields as diverse as neuroscience, psychology, economics, computer science, and control engineering. Despite this broad relevance, the abstract setting is similar: we have an agent choosing actions over time, an uncertain dynamical system whose state is affected by those actions, and a performance criterion that the agent seeks to optimize. Solving problems of this kind remains hard, in part, because of overly generic formulations. Here, we propose a more structured formulation that greatly simplifies the construction of optimal control laws in both discrete and continuous domains. An exhaustive search over actions is avoided and the problem becomes linear. This yields algorithms that outperform Dynamic Programming and Reinforcement Learning, and thereby solve traditional problems more efficiently. Our framework also enables computations that were not possible before: composing optimal control laws by mixing primitives, applying deterministic methods to stochastic systems, quantifying the benefits of error tolerance, and inferring goals from behavioral data via convex optimization. Development of a general class of easily solvable problems tends to accelerate progress, as linear systems theory has done, for example. Our framework may have similar impact in fields where optimal choice of actions is relevant.
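The claim that the problem "becomes linear" can be made concrete with a toy linearly-solvable MDP: with state cost q(x) and passive dynamics P, the desirability z = exp(-v) satisfies a linear fixed-point equation, and the optimal policy reweights the passive dynamics by z. The sketch below uses invented numbers and glosses over the distinctions between first-exit, finite-horizon and average-cost formulations.

```python
import numpy as np

n = 5
P = np.full((n, n), 1.0 / n)              # passive random-walk dynamics
q = np.array([1.0, 0.5, 0.0, 0.5, 1.0])   # state costs (hypothetical)

G = np.diag(np.exp(-q)) @ P               # linear operator on desirability
z = np.ones(n)
for _ in range(200):                      # power iteration: no search over actions
    z = G @ z
    z /= np.linalg.norm(z)                # eigenvector up to scale

u_star = P * z[None, :]                   # optimal transitions ~ p(x'|x) z(x')
u_star /= u_star.sum(axis=1, keepdims=True)
print("desirability z:", np.round(z, 3))
print("optimal next-state distribution from state 0:", np.round(u_star[0], 3))
```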
Probabilistic graphs as a conceptual and computational tool in hydrology and water management
NASA Astrophysics Data System (ADS)
Schoups, Gerrit
2014-05-01
Originally developed in the fields of machine learning and artificial intelligence, probabilistic graphs constitute a general framework for modeling complex systems in the presence of uncertainty. The framework consists of three components: 1. Representation of the model as a graph (or network), with nodes depicting random variables in the model (e.g. parameters, states, etc), which are joined together by factors. Factors are local probabilistic or deterministic relations between subsets of variables, which, when multiplied together, yield the joint distribution over all variables. 2. Consistent use of probability theory for quantifying uncertainty, relying on basic rules of probability for assimilating data into the model and expressing unknown variables as a function of observations (via the posterior distribution). 3. Efficient, distributed approximation of the posterior distribution using general-purpose algorithms that exploit model structure encoded in the graph. These attributes make probabilistic graphs potentially useful as a conceptual and computational tool in hydrology and water management (and beyond). Conceptually, they can provide a common framework for existing and new probabilistic modeling approaches (e.g. by drawing inspiration from other fields of application), while computationally they can make probabilistic inference feasible in larger hydrological models. The presentation explores, via examples, some of these benefits.
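A minimal hydrological toy of components 1 and 2: three binary variables (rain R, storage S, runoff Q) joined by three factors, with the joint distribution as their product and posterior inference by brute-force enumeration. All numbers are invented; real applications would use the graph-exploiting algorithms of component 3 instead of enumeration.

```python
import itertools

pR1, pS1 = 0.3, 0.5                        # prior factors P(R=1), P(S=1)
pQ1 = {(0, 0): 0.05, (0, 1): 0.20,         # factor P(Q=1 | R, S)
       (1, 0): 0.60, (1, 1): 0.95}

# joint over (R, S, Q) as the product of the three local factors
joint = {}
for r, s, q in itertools.product((0, 1), repeat=3):
    pr = pR1 if r else 1 - pR1
    ps = pS1 if s else 1 - pS1
    pq = pQ1[(r, s)] if q else 1 - pQ1[(r, s)]
    joint[(r, s, q)] = pr * ps * pq

# assimilate an observation: P(R = 1 | Q = 1) via the basic rules of probability
num = sum(p for (r, s, q), p in joint.items() if r == 1 and q == 1)
den = sum(p for (r, s, q), p in joint.items() if q == 1)
print(f"P(rain | runoff observed) = {num / den:.3f}")
```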
Opportunities for Fluid Dynamics Research in the Forensic Discipline of Bloodstain Pattern Analysis
NASA Astrophysics Data System (ADS)
Attinger, Daniel; Moore, Craig; Donaldson, Adam; Jafari, Arian; Stone, Howard
2013-11-01
This review [Forensic Science International, vol. 231, pp. 375-396, 2013] highlights research opportunities for fluid dynamics (FD) studies related to the forensic discipline of bloodstain pattern analysis (BPA). The need for better integrating FD and BPA is mentioned in a 2009 report by the US National Research Council, entitled ``Strengthening Forensic Science in the United States: A Path Forward''. BPA aims for practical answers to specific questions of the kind: ``How did a bloodletting incident happen?'' FD, on the other hand, aims to quantitatively describe the transport of fluids and the related causes, with general equations. BPA typically solves the indirect problem of inspecting stains in a crime scene to infer the most probable bloodletting incident that produced these patterns. FD typically defines the initial and boundary conditions of a fluid system and from there describe how the system evolves in time and space, most often in a deterministic manner. We review four topics in BPA with strong connections to FD: the generation of drops, their flight, their impact and the formation of stains. Future research on these topics would deliver new quantitative tools and methods for BPA, and present new multiphase flow problems for FD.
Inferring extinction risks from sighting records.
Thompson, C J; Lee, T E; Stone, L; McCarthy, M A; Burgman, M A
2013-12-07
Estimating the probability that a species is extinct based on historical sighting records is important when deciding how much effort and money to invest in conservation policies. The framework we offer is more general than others in the literature to date. Our formulation allows for definite and uncertain observations, and thus better accommodates the realities of sighting record quality. Typically, the probability of observing a species given it is extant/extinct is challenging to define, especially when the possibility of a false observation is included. As such, we assume that observation probabilities derive from a representative probability density function. We incorporate this randomness in two different ways ("quenched" versus "annealed") using a framework that is equivalent to a Bayes formulation. The two methods can lead to significantly different estimates for extinction. In the case of definite sightings only, we provide an explicit deterministic calculation (in which observation probabilities are point estimates). Furthermore, our formulation replicates previous work in certain limiting cases. In the case of uncertain sightings, we allow for the possibility of several independent observational types (specimen, photographs, etc.). The method is applied to the Caribbean monk seal, Monachus tropicalis (which has only definite sightings), and synthetic data, with uncertain sightings. © 2013 Elsevier Ltd. All rights reserved.
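For the definite-sightings-only case, the classical stationary-Poisson calculation of Solow (1993) is a useful reference point for the kind of explicit deterministic computation mentioned above (the paper's framework is more general): with n sightings on (0, T] and the last at t_n, the probability of the post-t_n silence given persistence is (t_n/T)^n.

```python
def solow_pvalue(sightings, T):
    # Under a stationary Poisson sighting model, P(no sightings in
    # (t_n, T] | species extant throughout) = (t_n / T) ** n.
    t_last, n = max(sightings), len(sightings)
    return (t_last / T) ** n

# hypothetical sighting times on a rescaled record of length T = 80
sightings = [3, 10, 21, 27, 33, 40]
print(solow_pvalue(sightings, T=80))   # ~0.016: persistence looks unlikely
```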
Quantifying and Mitigating the Effect of Preferential Sampling on Phylodynamic Inference
Karcher, Michael D.; Palacios, Julia A.; Bedford, Trevor; Suchard, Marc A.; Minin, Vladimir N.
2016-01-01
Phylodynamics seeks to estimate effective population size fluctuations from molecular sequences of individuals sampled from a population of interest. One way to accomplish this task formulates an observed sequence data likelihood exploiting a coalescent model for the sampled individuals’ genealogy and then integrating over all possible genealogies via Monte Carlo or, less efficiently, by conditioning on one genealogy estimated from the sequence data. However, when analyzing sequences sampled serially through time, current methods implicitly assume either that sampling times are fixed deterministically by the data collection protocol or that their distribution does not depend on the size of the population. Through simulation, we first show that, when sampling times do probabilistically depend on effective population size, estimation methods may be systematically biased. To correct for this deficiency, we propose a new model that explicitly accounts for preferential sampling by modeling the sampling times as an inhomogeneous Poisson process dependent on effective population size. We demonstrate that in the presence of preferential sampling our new model not only reduces bias, but also improves estimation precision. Finally, we compare the performance of the currently used phylodynamic methods with our proposed model through clinically-relevant, seasonal human influenza examples. PMID:26938243
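The preferential-sampling mechanism, sampling times as an inhomogeneous Poisson process whose intensity tracks effective population size, is easy to simulate by thinning, which also makes the induced clustering of samples visible. The Ne(t) trajectory and constants below are invented.

```python
import numpy as np

rng = np.random.default_rng(3)

def preferential_sampling_times(Ne, T, beta, lam_max):
    # Thinning: propose events at rate lam_max, accept with probability
    # beta * Ne(t) / lam_max, so accepted times follow intensity beta * Ne(t).
    times, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / lam_max)
        if t > T:
            return np.array(times)
        if rng.random() < beta * Ne(t) / lam_max:
            times.append(t)

Ne = lambda t: 10.0 + 8.0 * np.sin(2 * np.pi * t)     # seasonal Ne (hypothetical)
samples = preferential_sampling_times(Ne, T=3.0, beta=2.0, lam_max=36.0)
print(len(samples), "sampling times, clustered where Ne(t) peaks")
```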
Modeling and mitigating natural hazards: Stationarity is immortal!
NASA Astrophysics Data System (ADS)
Montanari, Alberto; Koutsoyiannis, Demetris
2014-12-01
Environmental change is a cause of significant concern, as it is occurring at an unprecedented pace and might increase natural hazards. Moreover, it is deemed to imply a reduced representativeness of past experience and data on extreme hydroclimatic events. The latter concern has been epitomized by the statement that "stationarity is dead." Setting up policies for mitigating natural hazards, including those triggered by floods and droughts, is an urgent priority in many countries, and it implies practical activities of management, engineering design, and construction. These latter necessarily need to be properly informed, and therefore the research question on the value of past data is extremely important. We herein argue that there are mechanisms in hydrological systems that are time invariant, and that these may need to be interpreted through data inference. In particular, hydrological predictions are based on assumptions which should include stationarity. In fact, any hydrological model, including deterministic and nonstationary approaches, is affected by uncertainty and therefore should include a random component that is stationary. Given that an unnecessary resort to nonstationarity may imply a reduction of predictive capabilities, a pragmatic approach based on the exploitation of past experience and data is a necessary prerequisite for setting up mitigation policies for environmental risk.
Structural connectivity asymmetry in the neonatal brain.
Ratnarajah, Nagulan; Rifkin-Graboi, Anne; Fortier, Marielle V; Chong, Yap Seng; Kwek, Kenneth; Saw, Seang-Mei; Godfrey, Keith M; Gluckman, Peter D; Meaney, Michael J; Qiu, Anqi
2013-07-15
Asymmetry of the neonatal brain is not yet understood at the level of structural connectivity. We utilized DTI deterministic tractography and structural network analysis based on graph theory to determine the pattern of structural connectivity asymmetry in 124 normal neonates. We tracted white matter axonal pathways characterizing interregional connections among brain regions and inferred asymmetry in left and right anatomical network properties. Our findings revealed that in neonates, small-world characteristics were exhibited, but did not differ between the two hemispheres, suggesting that neighboring brain regions connect tightly with each other, and that one region is only a few paths away from any other region within each hemisphere. Moreover, the neonatal brain showed greater structural efficiency in the left hemisphere than that in the right. In neonates, brain regions involved in motor, language, and memory functions play crucial roles in efficient communication in the left hemisphere, while brain regions involved in emotional processes play crucial roles in efficient communication in the right hemisphere. These findings suggest that even at birth, the topology of each cerebral hemisphere is organized in an efficient and compact manner that maps onto asymmetric functional specializations seen in adults, implying lateralized brain functions in infancy. Copyright © 2013 Elsevier Inc. All rights reserved.
A Simple Label Switching Algorithm for Semisupervised Structural SVMs.
Balamurugan, P; Shevade, Shirish; Sundararajan, S
2015-10-01
In structured output learning, obtaining labeled data for real-world applications is usually costly, while unlabeled examples are available in abundance. Semisupervised structured classification deals with a small number of labeled examples and a large number of unlabeled structured data. In this work, we consider semisupervised structural support vector machines with domain constraints. The optimization problem, which in general is not convex, contains the loss terms associated with the labeled and unlabeled examples, along with the domain constraints. We propose a simple optimization approach that alternates between solving a supervised learning problem and a constraint matching problem. Solving the constraint matching problem is difficult for structured prediction, and we propose an efficient and effective label switching method to solve it. The alternating optimization is carried out within a deterministic annealing framework, which helps in effective constraint matching and in avoiding poor, unhelpful local minima. The algorithm is simple and easy to implement. Further, it is suitable for any structured output learning problem where exact inference is available. Experiments on benchmark sequence labeling data sets and a natural language parsing data set show that the proposed approach, though simple, achieves comparable generalization performance.
Fuzzy adaptive interacting multiple model nonlinear filter for integrated navigation sensor fusion.
Tseng, Chien-Hao; Chang, Chih-Wen; Jwo, Dah-Jing
2011-01-01
In this paper, the application of the fuzzy interacting multiple model unscented Kalman filter (FUZZY-IMMUKF) approach to integrated navigation processing for a maneuvering vehicle is presented. The unscented Kalman filter (UKF) employs a set of sigma points obtained through deterministic sampling, such that a linearization process is not necessary, and therefore the errors caused by linearization, as in the traditional extended Kalman filter (EKF), can be avoided. Nonlinear filters naturally suffer, to some extent, from the same problem as the EKF: uncertainty in the process noise and measurement noise will degrade the performance. As a structural adaptation (model switching) mechanism, the interacting multiple model (IMM), which describes a set of switching models, can be utilized for determining the adequate value of the process noise covariance. The fuzzy logic adaptive system (FLAS) is employed to determine the lower and upper bounds of the system noise through the fuzzy inference system (FIS). The resulting sensor fusion strategy can efficiently deal with the nonlinear problem for vehicle navigation. The proposed FUZZY-IMMUKF algorithm shows remarkable improvement in navigation estimation accuracy as compared to relatively conventional approaches such as the UKF and IMMUKF.
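The set of sigma points obtained through deterministic sampling is the unscented transform; a standard sketch of the point and weight construction follows (the usual alpha/beta/kappa parameterization, not the full FUZZY-IMMUKF filter):

```python
import numpy as np

def sigma_points(x, P, alpha=1e-3, beta=2.0, kappa=0.0):
    # Scaled unscented transform: 2n + 1 deterministically placed points.
    n = len(x)
    lam = alpha**2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * P)      # matrix square root
    pts = np.vstack([x, x + S.T, x - S.T])     # center, plus/minus columns
    Wm = np.full(2 * n + 1, 0.5 / (n + lam))   # mean weights
    Wc = Wm.copy()                             # covariance weights
    Wm[0] = lam / (n + lam)
    Wc[0] = Wm[0] + (1 - alpha**2 + beta)
    return pts, Wm, Wc

x = np.array([0.0, 1.0])
P = np.array([[1.0, 0.2], [0.2, 0.5]])
pts, Wm, Wc = sigma_points(x, P)
f = lambda s: np.array([s[0] + 0.1 * s[1], s[1] ** 2])  # toy nonlinear map
mean = sum(w * f(p) for w, p in zip(Wm, pts))           # UT mean, no Jacobian
print(np.round(mean, 4))
```

Because the points are propagated through the nonlinearity directly, no Jacobian is ever formed, which is precisely the linearization error the abstract says the UKF avoids.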
NASA Astrophysics Data System (ADS)
García-Fornaris, I.; Millán, H.; Jardim, R. F.; Govea-Alcaide, E.
2013-06-01
We investigated transport Barkhausen-like noise (TBN) using nonlinear time series analysis. TBN signals were measured in $(Bi,Pb)_2Sr_2Ca_2Cu_3O_{10+\delta}$ ceramic samples subjected to different uniaxial compacting pressures (UCP). These samples display similar intragranular properties but different intergranular features. We found positive Lyapunov exponents in all samples, $\lambda_m \geq 0.062$, indicating the nonlinear dynamics of the experimental TBN signals. Higher values of the embedding dimension, $m > 9$, and of the Kaplan-Yorke dimension, $D_{KY} > 2.9$, were also observed. Between samples, the behavior of $\lambda_m$ and $D_{KY}$ with increasing excitation current is quite different. Such behavior is explained in terms of changes in the microstructure associated with the UCP. In addition, determinism tests indicated that the TBN masked deterministic components, as inferred from $|\vec{k}|$ values larger than 0.70 in most cases. Evidence for the existence of empirical attractors has also been found by reconstructing the phase spaces. All of the results obtained are useful indicators of the interplay between the uniaxial compacting pressure, differences in the microstructure of the samples, and the TBN signal dynamics.
ROCS: a Reproducibility Index and Confidence Score for Interaction Proteomics Studies
2012-01-01
Background: Affinity-Purification Mass-Spectrometry (AP-MS) provides a powerful means of identifying protein complexes and interactions. Several important challenges exist in interpreting the results of AP-MS experiments. First, the reproducibility of AP-MS experimental replicates can be low, due both to technical variability and the dynamic nature of protein interactions in the cell. Second, the identification of true protein-protein interactions in AP-MS experiments is subject to inaccuracy due to high false negative and false positive rates. Several experimental approaches can be used to mitigate these drawbacks, including the use of replicated and control experiments and relative quantification to sensitively distinguish true interacting proteins from false ones. Methods: To address the issues of reproducibility and accuracy of protein-protein interactions, we introduce a two-step method, called ROCS, which makes use of Indicator Prey Proteins to select reproducible AP-MS experiments, and of Confidence Scores to select specific protein-protein interactions. The Indicator Prey Proteins account for measures of protein identifiability as well as protein reproducibility, effectively allowing removal of outlier experiments that contribute noise and affect downstream inferences. The filtered set of experiments is then used in the Protein-Protein Interaction (PPI) scoring step. Prey protein scoring is done by computing a Confidence Score, which accounts for the probability of occurrence of prey proteins in the bait experiments relative to the control experiment, where the significance cutoff parameter is estimated by simultaneously controlling false positives and false negatives against metrics of false discovery rate and biological coherence respectively. In summary, the ROCS method relies on automatic, objective criteria for parameter estimation and error-controlled procedures. Results: We illustrate the performance of our method by applying it to five previously published AP-MS experiments, each containing well characterized protein interactions, allowing for systematic benchmarking of ROCS. We show that our method may be used on its own to make accurate identification of specific, biologically relevant protein-protein interactions, or in combination with other AP-MS scoring methods to significantly improve inferences. Conclusions: Our method addresses important issues encountered in AP-MS datasets, making ROCS a very promising tool for this purpose, either on its own or in conjunction with other methods. We anticipate that our methodology may be used more generally in proteomics studies and databases, where experimental reproducibility issues arise. The method is implemented in the R language, and is available as an R package called “ROCS”, freely available from the CRAN repository http://cran.r-project.org/. PMID:22682516
Phylogenetic and Protein Sequence Analysis of Bacterial Chemoreceptors.
Ortega, Davi R; Zhulin, Igor B
2018-01-01
Identifying chemoreceptors in sequenced bacterial genomes, revealing their domain architecture, inferring their evolutionary relationships, and comparing them to chemoreceptors of known function become important steps in genome annotation and chemotaxis research. Here, we describe bioinformatics procedures that enable such analyses, using two closely related bacterial genomes as examples.
Vitellogenin is often used to infer exposure of an organism to estrogenic substances. Vitellogenin gene induction and protein levels increase, up to a point, with concentration of estrogen and duration of exposure. A biomarker such as vitellogenin should exhibit sufficient sens...
Deterministic chaos in an ytterbium-doped mode-locked fiber laser
NASA Astrophysics Data System (ADS)
Mélo, Lucas B. A.; Palacios, Guillermo F. R.; Carelli, Pedro V.; Acioli, Lúcio H.; Rios Leite, José R.; de Miranda, Marcio H. G.
2018-05-01
We experimentally study the nonlinear dynamics of a femtosecond ytterbium-doped mode-locked fiber laser. With the laser operating in the pulsed regime, a route to chaos is presented, passing from stable mode-locking through period-two, period-four, chaotic and period-three regimes. Return maps and bifurcation diagrams were extracted from time series for each regime. The analysis of the time series with the laser operating in the quasi mode-locked regime reveals deterministic chaos described by a one-dimensional Rössler map. A positive Lyapunov exponent $\lambda = 0.14$ confirms the deterministic chaos of the system. We suggest an explanation of the observed map in terms of gain saturation and intra-cavity loss.
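The reported route (stable mode-locking, then period two, period four, chaos, and a period-three regime) is the classic period-doubling scenario of unimodal one-dimensional maps. A logistic-map scan is a minimal numerical stand-in for the experimentally extracted return maps; the laser itself is of course not a logistic map.

```python
import numpy as np

for r in np.linspace(2.9, 4.0, 12):      # map parameter sweep
    x = 0.5
    for _ in range(500):                 # discard the transient
        x = r * x * (1 - x)
    orbit = set()
    for _ in range(64):                  # collect the attractor (rounded)
        x = r * x * (1 - x)
        orbit.add(round(x, 4))
    label = f"period ~{len(orbit)}" if len(orbit) < 64 else "chaotic band"
    print(f"r = {r:.2f}: {label}")
```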
The viability of ADVANTG deterministic method for synthetic radiography generation
NASA Astrophysics Data System (ADS)
Bingham, Andrew; Lee, Hyoung K.
2018-07-01
Fast simulation techniques to generate synthetic radiographic images of high resolution are helpful when new radiation imaging systems are designed. However, the standard stochastic approach requires lengthy run times, with poorer statistics at higher resolution. The viability of a deterministic approach to synthetic radiography image generation was therefore explored, with the aim of quantifying the computational time savings over the stochastic method. ADVANTG was compared to MCNP in multiple scenarios, including a small radiography system prototype, to simulate high-resolution radiography images. By using the ADVANTG deterministic code to simulate radiography images, the computational time was found to decrease by a factor of 10 to 13 compared to the MCNP stochastic approach, while retaining image quality.
Structural Bioinformatics of the Interactome
Petrey, Donald; Honig, Barry
2014-01-01
The last decade has seen a dramatic expansion in the number and range of techniques available to obtain genome-wide information, and to analyze this information so as to infer both the function of individual molecules and how they interact to modulate the behavior of biological systems. Here we review these techniques, focusing on the construction of physical protein-protein interaction networks, and highlighting approaches that incorporate protein structure, which is becoming an increasingly important component of systems-level computational techniques. We also discuss how network analyses are being applied to enhance the basic understanding of biological systems and their dysregulation, and how they are being applied in drug development. PMID:24895853
2014-01-01
Background Non-small cell lung cancer (NSCLC) remains lethal despite the development of numerous drug therapy technologies. About 85% to 90% of lung cancers are NSCLC and the 5-year survival rate is at best still below 50%. Thus, it is important to find drugable target genes for NSCLC to develop an effective therapy for NSCLC. Results Integrated analysis of publically available gene expression and promoter methylation patterns of two highly aggressive NSCLC cell lines generated by in vivo selection was performed. We selected eleven critical genes that may mediate metastasis using recently proposed principal component analysis based unsupervised feature extraction. The eleven selected genes were significantly related to cancer diagnosis. The tertiary protein structure of the selected genes was inferred by Full Automatic Modeling System, a profile-based protein structure inference software, to determine protein functions and to specify genes that could be potential drug targets. Conclusions We identified eleven potentially critical genes that may mediate NSCLC metastasis using bioinformatic analysis of publically available data sets. These genes are potential target genes for the therapy of NSCLC. Among the eleven genes, TINAGL1 and B3GALNT1 are possible candidates for drug compounds that inhibit their gene expression. PMID:25521548
Turewicz, Michael; Kohl, Michael; Ahrens, Maike; Mayer, Gerhard; Uszkoreit, Julian; Naboulsi, Wael; Bracht, Thilo; Megger, Dominik A; Sitek, Barbara; Marcus, Katrin; Eisenacher, Martin
2017-11-10
The analysis of high-throughput mass spectrometry-based proteomics data must address the specific challenges of this technology. To this end, the comprehensive proteomics workflow offered by the de.NBI service center BioInfra.Prot provides indispensable components for the computational and statistical analysis of this kind of data. These components include tools and methods for spectrum identification and protein inference, protein quantification, expression analysis as well as data standardization and data publication. All particular methods of the workflow which address these tasks are state-of-the-art or cutting edge. As has been shown in previous publications, each of these methods is adequate to solve its specific task and gives competitive results. However, the methods included in the workflow are continuously reviewed, updated and improved to adapt to new scientific developments. All of these particular components and methods are available as stand-alone BioInfra.Prot services or as a complete workflow. Since BioInfra.Prot provides manifold fast communication channels to get access to all components of the workflow (e.g., via the BioInfra.Prot ticket system: bioinfraprot@rub.de) users can easily benefit from this service and get support by experts. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
Nepusz, Tamás; Sasidharan, Rajkumar; Paccanaro, Alberto
2010-03-09
An important problem in genomics is the automatic inference of groups of homologous proteins from pairwise sequence similarities. Several approaches have been proposed for this task which are "local" in the sense that they assign a protein to a cluster based only on the distances between that protein and the other proteins in the set. It was shown recently that global methods such as spectral clustering have better performance on a wide variety of datasets. However, currently available implementations of spectral clustering methods mostly consist of a few loosely coupled Matlab scripts that assume a fair amount of familiarity with Matlab programming and hence they are inaccessible for large parts of the research community. SCPS (Spectral Clustering of Protein Sequences) is an efficient and user-friendly implementation of a spectral method for inferring protein families. The method uses only pairwise sequence similarities, and is therefore practical when only sequence information is available. SCPS was tested on difficult sets of proteins whose relationships were extracted from the SCOP database, and its results were extensively compared with those obtained using other popular protein clustering algorithms such as TribeMCL, hierarchical clustering and connected component analysis. We show that SCPS is able to identify many of the family/superfamily relationships correctly and that the quality of the obtained clusters as indicated by their F-scores is consistently better than all the other methods we compared it with. We also demonstrate the scalability of SCPS by clustering the entire SCOP database (14,183 sequences) and the complete genome of the yeast Saccharomyces cerevisiae (6,690 sequences). Besides the spectral method, SCPS also implements connected component analysis and hierarchical clustering, it integrates TribeMCL, it provides different cluster quality tools, it can extract human-readable protein descriptions using GI numbers from NCBI, it interfaces with external tools such as BLAST and Cytoscape, and it can produce publication-quality graphical representations of the clusters obtained, thus constituting a comprehensive and effective tool for practical research in computational biology. Source code and precompiled executables for Windows, Linux and Mac OS X are freely available at http://www.paccanarolab.org/software/scps.
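The core of a spectral approach of this kind, embedding sequences via eigenvectors of a normalized Laplacian built from the pairwise-similarity matrix and then running an ordinary clustering step, is compact; the bare-bones sketch below (not SCPS itself) uses a toy two-family similarity matrix.

```python
import numpy as np

def spectral_clusters(S, k):
    d = S.sum(axis=1)
    L = np.eye(len(S)) - S / np.sqrt(np.outer(d, d))  # normalized Laplacian
    _, vecs = np.linalg.eigh(L)
    Y = vecs[:, :k]                                   # k smallest eigenvectors
    Y /= np.linalg.norm(Y, axis=1, keepdims=True) + 1e-12
    mu = [Y[0]]                                       # farthest-first seeding
    for _ in range(1, k):
        d2 = np.min([((Y - m) ** 2).sum(axis=1) for m in mu], axis=0)
        mu.append(Y[d2.argmax()])
    mu = np.array(mu)
    for _ in range(50):                               # naive Lloyd iterations
        lab = ((Y[:, None, :] - mu[None]) ** 2).sum(-1).argmin(axis=1)
        mu = np.array([Y[lab == c].mean(axis=0) for c in range(k)])
    return lab

S = np.kron(np.eye(2), np.full((4, 4), 0.9)) + 0.05   # two "families" of four
np.fill_diagonal(S, 1.0)
print(spectral_clusters(S, 2))        # expected: two clean groups of four
```

Because only the similarity matrix enters, the same skeleton applies whenever pairwise sequence similarities (e.g. transformed BLAST scores) are all that is available.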
In an earlier study, Puente and Obregón [Water Resour. Res. 32(1996)2825] reported on the usage of a deterministic fractal–multifractal (FM) methodology to faithfully describe an 8.3 h high-resolution rainfall time series in Boston, gathered every 15 s ...
Seed availability constrains plant species sorting along a soil fertility gradient
Bryan L. Foster; Erin J. Questad; Cathy D. Collins; Cheryl A. Murphy; Timothy L. Dickson; Val H. Smith
2011-01-01
1. Spatial variation in species composition within and among communities may be caused by deterministic, niche-based species sorting in response to underlying environmental heterogeneity as well as by stochastic factors such as dispersal limitation and variable species pools. An important goal in ecology is to reconcile deterministic and stochastic perspectives of...
The Role of Probability and Intentionality in Preschoolers' Causal Generalizations
ERIC Educational Resources Information Center
Sobel, David M.; Sommerville, Jessica A.; Travers, Lea V.; Blumenthal, Emily J.; Stoddard, Emily
2009-01-01
Three experiments examined whether preschoolers recognize that the causal properties of objects generalize to new members of the same set given either deterministic or probabilistic data. Experiment 1 found that 3- and 4-year-olds were able to make such a generalization given deterministic data but were at chance when they observed probabilistic…
ERIC Educational Resources Information Center
Moreland, James D., Jr
2013-01-01
This research investigates the instantiation of a Service-Oriented Architecture (SOA) within a hard real-time (stringent time constraints), deterministic (maximum predictability) combat system (CS) environment. There are numerous stakeholders across the U.S. Department of the Navy who are affected by this development, and therefore the system…
CPT-based probabilistic and deterministic assessment of in situ seismic soil liquefaction potential
Moss, R.E.S.; Seed, R.B.; Kayen, R.E.; Stewart, J.P.; Der Kiureghian, A.; Cetin, K.O.
2006-01-01
This paper presents a complete methodology for both probabilistic and deterministic assessment of seismic soil liquefaction triggering potential based on the cone penetration test (CPT). A comprehensive worldwide set of CPT-based liquefaction field case histories were compiled and back analyzed, and the data then used to develop probabilistic triggering correlations. Issues investigated in this study include improved normalization of CPT resistance measurements for the influence of effective overburden stress, and adjustment to CPT tip resistance for the potential influence of "thin" liquefiable layers. The effects of soil type and soil character (i.e., "fines" adjustment) for the new correlations are based on a combination of CPT tip and sleeve resistance. To quantify probability for performance-based engineering applications, Bayesian "regression" methods were used, and the uncertainties of all variables comprising both the seismic demand and the liquefaction resistance were estimated and included in the analysis. The resulting correlations were developed using a Bayesian framework and are presented in both probabilistic and deterministic formats. The results are compared to previous probabilistic and deterministic correlations. © 2006 ASCE.
Comparison of Deterministic and Probabilistic Radial Distribution Systems Load Flow
NASA Astrophysics Data System (ADS)
Gupta, Atma Ram; Kumar, Ashwani
2017-12-01
Distribution system networks today face the challenge of meeting increased load demands from the industrial, commercial and residential sectors. The pattern of load is highly dependent on consumer behavior and temporal factors such as the season of the year, the day of the week or the time of day. For deterministic radial distribution load flow studies the load is taken as constant. But load varies continually and with a high degree of uncertainty, so there is a need to model probable realistic load. Monte Carlo simulation is used to model the probable realistic load by generating random values of active and reactive power load from the mean and standard deviation of the load, and by solving a deterministic radial load flow with these values. The probabilistic solution is reconstructed from the deterministic data obtained for each simulation. The main contributions of the work are: finding the impact of probable realistic ZIP load modeling on balanced radial distribution load flow; finding the impact of probable realistic ZIP load modeling on unbalanced radial distribution load flow; and comparing the voltage profile and losses under probable realistic ZIP load modeling for balanced and unbalanced radial distribution load flow.
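The Monte Carlo wrapping described above is independent of the particular deterministic solver, so a sketch only needs a placeholder where the backward/forward-sweep load flow would go; the bus statistics below are invented.

```python
import numpy as np

rng = np.random.default_rng(4)

def radial_load_flow(P, Q):
    # Placeholder for a deterministic radial load-flow solve (e.g. a
    # backward/forward sweep); returns a crude end-voltage proxy so the
    # Monte Carlo wrapper is runnable as-is.
    return 1.0 - 0.03 * (P.sum() + 0.5 * Q.sum())

# hypothetical per-bus load statistics in p.u. (mean, standard deviation)
P_mu, P_sd = np.array([0.4, 0.6, 0.3]), np.array([0.05, 0.08, 0.04])
Q_mu, Q_sd = 0.4 * P_mu, 0.4 * P_sd

V_end = []
for _ in range(5000):                      # one deterministic solve per draw
    P = rng.normal(P_mu, P_sd)             # probable realistic active load
    Q = rng.normal(Q_mu, Q_sd)             # probable realistic reactive load
    V_end.append(radial_load_flow(P, Q))

print(f"end voltage: mean {np.mean(V_end):.4f}, std {np.std(V_end):.4f}")
```

The probabilistic voltage profile is then read off the empirical distribution of the per-draw deterministic solutions, exactly as the abstract describes.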
NASA Technical Reports Server (NTRS)
Hathaway, Michael D.
1986-01-01
Measurements of the unsteady velocity field within the stator row of a transonic axial-flow fan were acquired using a laser anemometer. Measurements were obtained on axisymmetric surfaces located at 10 and 50 percent span from the shroud, with the fan operating at maximum efficiency at design speed. The ensemble-average and variance of the measured velocities are used to identify rotor-wake-generated (deterministic) unsteadiness and turbulence, respectively. Correlations of both deterministic and turbulent velocity fluctuations provide information on the characteristics of unsteady interactions within the stator row. These correlations are derived from the Navier-Stokes equation in a manner similar to deriving the Reynolds stress terms, whereby various averaging operators are used to average the aperiodic, deterministic, and turbulent velocity fluctuations which are known to be present in multistage turbomachines. The correlations of deterministic and turbulent velocity fluctuations throughout the axial fan stator row are presented. In particular, amplification and attenuation of both types of unsteadiness are shown to occur within the stator blade passage.
Precision production: enabling deterministic throughput for precision aspheres with MRF
NASA Astrophysics Data System (ADS)
Maloney, Chris; Entezarian, Navid; Dumas, Paul
2017-10-01
Aspherical lenses offer advantages over spherical optics by improving image quality or reducing the number of elements necessary in an optical system. Aspheres are no longer used exclusively in high-end optical systems but are now replacing spherical optics in many applications. The need for a method of production manufacturing of precision aspheres has emerged and is part of the reason that the optics industry is shifting away from artisan-based techniques towards more deterministic methods. Magnetorheological Finishing (MRF) not only enables deterministic figure correction for the most demanding aspheres but also enables deterministic and efficient throughput for series production of aspheres. The Q-flex MRF platform is designed to support batch production in a simple and user-friendly manner. Thorlabs routinely exploits the capabilities of this platform and here provides results from using MRF to finish a batch of aspheres as a case study. We have developed an analysis notebook to evaluate the specifications necessary for implementing quality-control metrics. MRF brings confidence to optical manufacturing by ensuring high throughput for batch processing of aspheres.
Down to the roughness scale assessment of piston-ring/liner contacts
NASA Astrophysics Data System (ADS)
Checo, H. M.; Jaramillo, A.; Ausas, R. F.; Jai, M.; Buscaglia, G. C.
2017-02-01
The effects of surface roughness in hydrodynamic bearings have been accounted for through several approaches, the most widely used being averaging or stochastic techniques. With these, the surface is not treated “as it is” but by means of an assumed probability distribution for the roughness. So-called direct, deterministic or measured-surface simulations solve the lubrication problem with realistic surfaces down to the roughness scale, which leads to expensive computational problems. Most researchers have tackled this problem by considering non-moving surfaces and neglecting the ring dynamics to reduce the computational burden. What is proposed here is to solve the fully deterministic simulation both in space and in time, so that the actual movement of the surfaces and the ring dynamics are taken into account. This simulation is much more complex than previous ones, as it is intrinsically transient. The feasibility of these fully deterministic simulations is illustrated in two cases: simulation of liner surfaces with diverse finishings (honed and coated bores) under constant piston velocity and ring load, and also under real engine conditions.
Discrete-Time Deterministic $Q$-Learning: A Novel Convergence Analysis.
Wei, Qinglai; Lewis, Frank L; Sun, Qiuye; Yan, Pengfei; Song, Ruizhuo
2017-05-01
In this paper, a novel discrete-time deterministic Q-learning algorithm is developed. In each iteration of the developed Q-learning algorithm, the iterative Q function is updated for the entire state and control spaces, instead of for a single state and a single control as in traditional Q-learning. A new convergence criterion is established to guarantee that the iterative Q function converges to the optimum, whereby the convergence criterion on the learning rates required by traditional Q-learning algorithms is simplified. During the convergence analysis, the upper and lower bounds of the iterative Q function are analyzed to obtain the convergence criterion, instead of analyzing the iterative Q function itself. For convenience of analysis, the convergence properties of the deterministic Q-learning algorithm are first developed for the undiscounted case. Then, taking the discount factor into account, the convergence criterion for the discounted case is established. Neural networks are used to approximate the iterative Q function and to compute the iterative control law, respectively, to facilitate the implementation of the deterministic Q-learning algorithm. Finally, simulation results and comparisons are given to illustrate the performance of the developed algorithm.
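A minimal sketch of the full-space update that distinguishes this deterministic Q-learning scheme from the traditional single-pair update; the toy dynamics `f`, cost `U` and grid sizes are assumptions made for illustration, and the paper's neural-network approximation step is omitted.

```python
import numpy as np

# Toy deterministic system on small discrete grids (illustrative stand-in).
states = np.arange(5)            # state grid
controls = np.array([-1, 0, 1])  # control grid
gamma = 0.9                      # discount factor

def f(x, u):                     # deterministic dynamics x' = f(x, u)
    return int(np.clip(x + u, 0, 4))

def U(x, u):                     # stage cost (utility)
    return (x - 4) ** 2 + u ** 2

Q = np.zeros((len(states), len(controls)))  # Q0 = 0 initialization

for _ in range(200):
    Q_new = np.empty_like(Q)
    # Key feature: update the iterative Q function over the WHOLE
    # state-control space in each iteration, not one visited pair.
    for i, x in enumerate(states):
        for j, u in enumerate(controls):
            x_next = f(x, u)
            Q_new[i, j] = U(x, u) + gamma * Q[x_next].min()
    diff = np.max(np.abs(Q_new - Q))
    Q = Q_new
    if diff < 1e-9:              # convergence of the iterative Q function
        break

policy = controls[Q.argmin(axis=1)]  # greedy (cost-minimizing) control law
print("optimal controls per state:", policy)
```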
Using a mathematical model to evaluate the efficacy of TB control measures.
Gammaitoni, L.; Nucci, M. C.
1997-01-01
We evaluated the efficacy of recommended tuberculosis (TB) infection control measures by using a deterministic mathematical model for airborne contagion. We examined the percentage of purified protein derivative conversions under various exposure conditions, environmental control strategies, and respiratory protective devices. We conclude that environmental control cannot eliminate the risk for TB transmission during high-risk procedures; respiratory protective devices, and particularly high-efficiency particulate air masks, may provide nearly complete protection if used with air filtration or ultraviolet irradiation. Nevertheless, the efficiency of these control measures decreases as the infectivity of the source case increases. Therefore, administrative control measures (e.g., identifying and isolating patients with infectious TB) are the most effective because they substantially reduce the rate of infection. PMID:9284378
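For readers unfamiliar with deterministic airborne-contagion models, a minimal steady-state Wells-Riley-type calculation (the family of models this study builds on) illustrates why environmental control alone shows diminishing returns; the parameter values below are illustrative assumptions, not the paper's.

```python
import math

def wells_riley(I, q, p, t, Q):
    """Steady-state Wells-Riley probability of infection.
    I: infectious sources; q: quanta generation rate (quanta/h);
    p: breathing rate (m^3/h); t: exposure time (h);
    Q: clean-air supply from ventilation/filtration/UV (m^3/h)."""
    return 1.0 - math.exp(-I * q * p * t / Q)

# Illustrative scenario: one source case, 8-hour exposure.
for Q_clean in (100, 300, 600, 1200):   # increasing clean-air delivery
    P = wells_riley(I=1, q=13, p=0.48, t=8, Q=Q_clean)
    print(f"Q={Q_clean:5d} m^3/h -> infection risk {P:.1%}")
```

Doubling the clean-air supply halves the exponent, not the risk, which is consistent with the conclusion above that environmental control cannot by itself eliminate transmission from a highly infectious source.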
An integrative approach to inferring biologically meaningful gene modules
2011-01-01
Background: The ability to construct biologically meaningful gene networks and modules is critical for contemporary systems biology. Though recent studies have demonstrated the power of using gene modules to shed light on the functioning of complex biological systems, most modules in these networks have shown little association with meaningful biological function. We have devised a method which directly incorporates gene ontology (GO) annotation in construction of gene modules in order to gain better functional association. Results: We have devised a method, Semantic Similarity-Integrated approach for Modularization (SSIM), that integrates various gene-gene pairwise similarity values, including information obtained from gene expression, protein-protein interactions and GO annotations, in the construction of modules using affinity propagation clustering. We demonstrated the performance of the proposed method using data from two complex biological responses: 1. the osmotic shock response in Saccharomyces cerevisiae, and 2. the prion-induced pathogenic mouse model. In comparison with two previously reported algorithms, modules identified by SSIM showed significantly stronger association with biological functions. Conclusions: The incorporation of semantic similarity based on GO annotation with gene expression and protein-protein interaction data can greatly enhance the functional relevance of inferred gene modules. In addition, the SSIM approach can also reveal the hierarchical structure of gene modules to gain a broader functional view of the biological system. Hence, the proposed method can facilitate comprehensive and in-depth analysis of high-throughput experimental data at the gene network level. PMID:21791051
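A minimal sketch of the module-construction step described above: combine several gene-gene similarity sources into one matrix and cluster it with affinity propagation. The equal weighting of the three sources and the fabricated two-module data are assumptions for illustration; SSIM's actual integration scheme is more involved.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

rng = np.random.default_rng(1)
n = 30  # genes, fabricated in two modules of 15

# Stand-ins for the three pairwise similarity sources: expression
# correlation, PPI evidence and GO semantic similarity.
base = np.repeat(rng.normal(size=(2, 20)), n // 2, axis=0)
expr = np.corrcoef(base + 0.5 * rng.normal(size=(n, 20)))
block = np.kron(np.eye(2), np.ones((n // 2, n // 2)))
ppi = 0.6 * block + 0.2 * rng.random((n, n)); ppi = (ppi + ppi.T) / 2
go = 0.6 * block + 0.2 * rng.random((n, n)); go = (go + go.T) / 2

# Integrate the sources into a single similarity matrix (equal weights
# are an assumption, not SSIM's calibrated scheme).
S = (expr + ppi + go) / 3.0

ap = AffinityPropagation(affinity="precomputed", damping=0.9, random_state=0)
labels = ap.fit_predict(S)
print("modules found:", len(set(labels)), "| assignments:", labels)
```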
Competition between Primary Nucleation and Autocatalysis in Amyloid Fibril Self-Assembly
Eden, Kym; Morris, Ryan; Gillam, Jay; MacPhee, Cait E.; Allen, Rosalind J.
2015-01-01
Kinetic measurements of the self-assembly of proteins into amyloid fibrils are often used to make inferences about molecular mechanisms. In particular, the lag time—the quiescent period before aggregates are detected—is often found to scale with the protein concentration as a power law, whose exponent has been used to infer the presence or absence of autocatalytic growth processes such as fibril fragmentation. Here we show that experimental data for lag time versus protein concentration can show signs of kinks: clear changes in scaling exponent, indicating changes in the dominant molecular mechanism determining the lag time. Classical models for the kinetics of fibril assembly suggest that at least two mechanisms are at play during the lag time: primary nucleation and autocatalytic growth. Using computer simulations and theoretical calculations, we investigate whether the competition between these two processes can account for the kinks which we observe in our and others’ experimental data. We derive theoretical conditions for the crossover between nucleation-dominated and growth-dominated regimes, and analyze their dependence on system volume and autocatalysis mechanism. Comparing these predictions to the data, we find that the experimentally observed kinks cannot be explained by a simple crossover between nucleation-dominated and autocatalytic growth regimes. Our results show that existing kinetic models fail to explain detailed features of lag time versus concentration curves, suggesting that new mechanistic understanding is needed. More broadly, our work demonstrates that care is needed in interpreting lag-time scaling exponents from protein assembly data. PMID:25650930
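A brief sketch of how such a kink can be detected: fit independent power laws on either side of each candidate breakpoint of log(lag time) versus log(concentration), and keep the split with the lowest residual error. The synthetic data and crossover location below are fabricated for illustration.

```python
import numpy as np

# Synthetic lag-time data with a built-in kink:
# t_lag ~ c^-1.6 at low concentration, c^-0.4 at high concentration.
c = np.logspace(-1, 2, 40)
t_lag = np.where(c < 3, c ** -1.6, 3 ** (-1.6 + 0.4) * c ** -0.4)
t_lag *= np.exp(np.random.default_rng(2).normal(0, 0.05, c.size))

logc, logt = np.log(c), np.log(t_lag)

def two_segment_sse(k):
    """Total squared error of independent power-law fits on each side of k."""
    sse = 0.0
    for sl in (slice(None, k), slice(k, None)):
        coef = np.polyfit(logc[sl], logt[sl], 1)
        sse += np.sum((np.polyval(coef, logc[sl]) - logt[sl]) ** 2)
    return sse

k_best = min(range(5, c.size - 5), key=two_segment_sse)
g_lo = np.polyfit(logc[:k_best], logt[:k_best], 1)[0]
g_hi = np.polyfit(logc[k_best:], logt[k_best:], 1)[0]
print(f"kink near c={c[k_best]:.2f}; exponents {g_lo:.2f} -> {g_hi:.2f}")
```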
Genome-wide protein-protein interactions and protein function exploration in cyanobacteria
Lv, Qi; Ma, Weimin; Liu, Hui; Li, Jiang; Wang, Huan; Lu, Fang; Zhao, Chen; Shi, Tieliu
2015-01-01
Genome-wide network analysis is widely used to study proteins of unknown function. Here, we effectively explored protein functions and biological mechanisms based on an inferred high-confidence protein-protein interaction (PPI) network in cyanobacteria. We integrated data from seven different sources and predicted 1,997 PPIs, which were evaluated by experiments on molecular mechanism, by text mining of the literature for direct or indirect evidence, and by conservation of “interologs”. Combining the predicted PPIs with known PPIs, we obtained 4,715 non-redundant PPIs (involving 3,231 proteins and covering over 90% of the genome) to generate the PPI network. Based on the PPI network, Gene Ontology (GO) terms were assigned to function-unknown proteins. Functional modules were identified by dissecting the PPI network into sub-networks and analyzing pathway enrichment, with which we investigated novel functions of the underlying proteins in protein complexes and pathways. Examples from photosynthesis and DNA repair indicate that the network approach is a powerful tool for protein function analysis. Overall, this systems biology approach provides new insight into posterior functional analysis of PPIs in cyanobacteria. PMID:26490033
Chao, Lin; Rang, Camilla Ulla; Proenca, Audrey Menegaz; Chao, Jasper Ubirajara
2016-01-01
Non-genetic phenotypic variation is common in biological organisms. The variation is potentially beneficial if the environment is changing. If the benefit is large, selection can favor the evolution of genetic assimilation, the process by which the expression of a trait is transferred from environmental to genetic control. Genetic assimilation is an important evolutionary transition, but it is poorly understood because the fitness costs and benefits of variation are often unknown. Here we show that the partitioning of damage by a mother bacterium to its two daughters can evolve through genetic assimilation. Bacterial phenotypes are also highly variable. Because gene-regulating elements can have low copy numbers, the variation is attributed to stochastic sampling. Extant Escherichia coli partition asymmetrically and deterministically more damage to the old daughter, the one receiving the mother’s old pole. By modeling in silico damage partitioning in a population, we show that deterministic asymmetry is advantageous because it increases fitness variance and hence the efficiency of natural selection. However, we find that symmetrical but stochastic partitioning can be similarly beneficial. To examine why bacteria evolved deterministic asymmetry, we modeled the effect of damage anchored to the mother’s old pole. While anchored damage strengthens selection for asymmetry by creating additional fitness variance, it has the opposite effect on symmetry. The difference results because anchored damage reinforces the polarization of partitioning in asymmetric bacteria. In symmetric bacteria, it dilutes the polarization. Thus, stochasticity alone may have protected early bacteria from damage, but deterministic asymmetry has evolved to be equally important in extant bacteria. We estimate that 47% of damage partitioning is deterministic in E. coli. We suggest that the evolution of deterministic asymmetry from stochasticity offers an example of Waddington’s genetic assimilation. Our model is able to quantify the evolution of the assimilation because it characterizes the fitness consequences of variation. PMID:26761487
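A toy version of the in silico damage-partitioning model may help fix ideas: damage accrues each generation, is split between daughters either deterministically asymmetrically or stochastically, and selection keeps the least-damaged half of the population. All rates and the selection rule are assumptions for illustration, not the paper's calibrated model.

```python
import numpy as np

rng = np.random.default_rng(3)

def daughter_damage(d, asymmetry=0.0, noise=0.0):
    """Split a mother's damage d between old-pole and new-pole daughters.
    asymmetry: deterministic excess to the old-pole daughter;
    noise: spread of stochastic partitioning."""
    frac_old = np.clip(0.5 + asymmetry / 2 + rng.normal(0, noise, d.shape), 0, 1)
    return d * frac_old, d * (1 - frac_old)

def simulate(asymmetry, noise, generations=12, rate=0.3):
    damage = np.full(64, 1.0)            # founding cohort
    for _ in range(generations):
        damage = damage + rate           # damage accrues each generation
        old, new = daughter_damage(damage, asymmetry, noise)
        damage = np.concatenate([old, new])
        # selection: keep the fastest-growing (least-damaged) half
        damage = np.sort(damage)[: damage.size // 2]
    return damage.mean()

print("symmetric, no noise :", simulate(0.0, 0.0))
print("deterministic asymm.:", simulate(0.4, 0.0))
print("symmetric stochastic:", simulate(0.0, 0.2))
```

Both deterministic asymmetry and stochastic symmetric partitioning create the fitness variance that selection needs, mirroring the comparison drawn in the abstract.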
Chao, Lin; Rang, Camilla Ulla; Proenca, Audrey Menegaz; Chao, Jasper Ubirajara
2016-01-01
Non-genetic phenotypic variation is common in biological organisms. The variation is potentially beneficial if the environment is changing. If the benefit is large, selection can favor the evolution of genetic assimilation, the process by which the expression of a trait is transferred from environmental to genetic control. Genetic assimilation is an important evolutionary transition, but it is poorly understood because the fitness costs and benefits of variation are often unknown. Here we show that the partitioning of damage by a mother bacterium to its two daughters can evolve through genetic assimilation. Bacterial phenotypes are also highly variable. Because gene-regulating elements can have low copy numbers, the variation is attributed to stochastic sampling. Extant Escherichia coli partition asymmetrically and deterministically more damage to the old daughter, the one receiving the mother's old pole. By modeling in silico damage partitioning in a population, we show that deterministic asymmetry is advantageous because it increases fitness variance and hence the efficiency of natural selection. However, we find that symmetrical but stochastic partitioning can be similarly beneficial. To examine why bacteria evolved deterministic asymmetry, we modeled the effect of damage anchored to the mother's old pole. While anchored damage strengthens selection for asymmetry by creating additional fitness variance, it has the opposite effect on symmetry. The difference results because anchored damage reinforces the polarization of partitioning in asymmetric bacteria. In symmetric bacteria, it dilutes the polarization. Thus, stochasticity alone may have protected early bacteria from damage, but deterministic asymmetry has evolved to be equally important in extant bacteria. We estimate that 47% of damage partitioning is deterministic in E. coli. We suggest that the evolution of deterministic asymmetry from stochasticity offers an example of Waddington's genetic assimilation. Our model is able to quantify the evolution of the assimilation because it characterizes the fitness consequences of variation.
Impact of refining the assessment of dietary exposure to cadmium in the European adult population.
Ferrari, Pietro; Arcella, Davide; Heraud, Fanny; Cappé, Stefano; Fabiansson, Stefan
2013-01-01
Exposure assessment constitutes an important step in any risk assessment of potentially harmful substances present in food. The European Food Safety Authority (EFSA) first assessed dietary exposure to cadmium in Europe using a deterministic framework, resulting in mean values of exposure in the range of health-based guidance values. Since then, the characterisation of foods has been refined to better match occurrence and consumption data, and a new strategy to handle left-censoring in occurrence data was devised. A probabilistic assessment was performed and compared with deterministic estimates, using occurrence values at the European level and consumption data from 14 national dietary surveys. Mean estimates in the probabilistic assessment ranged from 1.38 (95% CI = 1.35-1.44) to 2.08 (1.99-2.23) µg kg⁻¹ bodyweight (bw) week⁻¹ across the different surveys, which were less than 10% lower than deterministic (middle bound) mean values that ranged from 1.50 to 2.20 µg kg⁻¹ bw week⁻¹. Probabilistic 95th percentile estimates of dietary exposure ranged from 2.65 (2.57-2.72) to 4.99 (4.62-5.38) µg kg⁻¹ bw week⁻¹, which were, with the exception of one survey, between 3% and 17% higher than middle-bound deterministic estimates. Overall, the proportion of subjects exceeding the tolerable weekly intake of 2.5 µg kg⁻¹ bw ranged from 14.8% (13.6-16.0%) to 31.2% (29.7-32.5%) according to the probabilistic assessment. The results of this work indicate that mean values of dietary exposure to cadmium in the European population were of similar magnitude using deterministic or probabilistic assessments. For higher exposure levels, probabilistic estimates were almost consistently larger than their deterministic counterparts, thus reflecting the impact of using the full distribution of occurrence values to determine exposure levels. It is considered prudent to use probabilistic methodology should exposure estimates be close to or exceeding health-based guidance values.
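A minimal sketch of the probabilistic approach described above: sample occurrence and consumption distributions, form the weekly exposure per kg bodyweight, and read off the mean, the 95th percentile and the fraction above the tolerable weekly intake. The lognormal parameters are illustrative assumptions, not EFSA's occurrence or consumption data.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000  # simulated consumer-weeks

# Illustrative inputs: cadmium occurrence in food (µg/kg, lognormal)
# and weekly food consumption per kg bodyweight.
occurrence = rng.lognormal(mean=np.log(25), sigma=0.6, size=n)     # µg/kg food
consumption = rng.lognormal(mean=np.log(0.07), sigma=0.4, size=n)  # kg food/kg bw/week

exposure = occurrence * consumption  # µg per kg bw per week

TWI = 2.5  # tolerable weekly intake, µg/kg bw
print(f"mean exposure  : {exposure.mean():.2f} µg/kg bw/week")
print(f"95th percentile: {np.percentile(exposure, 95):.2f}")
print(f"share above TWI: {(exposure > TWI).mean():.1%}")
```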
Stochastic switching in biology: from genotype to phenotype
NASA Astrophysics Data System (ADS)
Bressloff, Paul C.
2017-03-01
There has been a resurgence of interest in non-equilibrium stochastic processes in recent years, driven in part by the observation that the number of molecules (genes, mRNA, proteins) involved in gene expression is often of order 1-1000. This means that deterministic mass-action kinetics tends to break down, and one needs to take into account the discrete, stochastic nature of biochemical reactions. One of the major consequences of molecular noise is the occurrence of stochastic biological switching at both the genotypic and phenotypic levels. For example, individual gene regulatory networks can switch between graded and binary responses, exhibit translational/transcriptional bursting, and support metastability (noise-induced switching between states that are stable in the deterministic limit). If random switching persists at the phenotypic level then this can confer certain advantages to cell populations growing in a changing environment, as exemplified by bacterial persistence in response to antibiotics. Gene expression at the single-cell level can also be regulated by changes in cell density at the population level, a process known as quorum sensing. In contrast to noise-driven phenotypic switching, the switching mechanism in quorum sensing is stimulus-driven and thus noise tends to have a detrimental effect. A common approach to modeling stochastic gene expression is to assume a large but finite system and to approximate the discrete processes by continuous processes using a system-size expansion. However, there is a growing need to have some familiarity with the theory of stochastic processes that goes beyond the standard topics of chemical master equations, the system-size expansion, Langevin equations and the Fokker-Planck equation. Examples include stochastic hybrid systems (piecewise deterministic Markov processes), large deviations and the Wentzel-Kramers-Brillouin (WKB) method, adiabatic reductions, and queuing/renewal theory. The major aim of this review is to provide a self-contained survey of these mathematical methods, mainly within the context of biological switching processes at both the genotypic and phenotypic levels. However, applications to other examples of biological switching are also discussed, including stochastic ion channels, diffusion in randomly switching environments, bacterial chemotaxis, and stochastic neural networks.
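As a concrete instance of one class of methods named above, here is a minimal piecewise deterministic Markov process: a promoter telegraphs between OFF and ON at exponentially distributed times, while the protein level follows a linear ODE (solved exactly) between switches. All rate constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

# Telegraph promoter: OFF (0) <-> ON (1) at rates k_on, k_off; protein x
# obeys dx/dt = beta*s - gamma*x deterministically between switches.
k_on, k_off = 0.5, 0.5     # switching rates (1/h), illustrative
beta, gamma = 10.0, 1.0    # synthesis and degradation rates

t, T_end = 0.0, 200.0
s, x = 0, 0.0
ts, xs = [0.0], [0.0]

while t < T_end:
    rate = k_on if s == 0 else k_off
    dwell = rng.exponential(1.0 / rate)      # time until the promoter flips
    # exact ODE solution over the dwell: x relaxes toward beta*s/gamma
    x_inf = beta * s / gamma
    x = x_inf + (x - x_inf) * np.exp(-gamma * dwell)
    t += dwell
    s = 1 - s                                # promoter flips state
    ts.append(t); xs.append(x)

print(f"{len(ts) - 1} switching events; final protein level x = {xs[-1]:.2f}")
```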
The binary protein-protein interaction landscape of Escherichia coli
Rajagopala, Seesandra V.; Vlasblom, James; Arnold, Roland; Franca-Koh, Jonathan; Pakala, Suman B.; Phanse, Sadhna; Ceol, Arnaud; Häuser, Roman; Siszler, Gabriella; Wuchty, Stefan; Emili, Andrew; Babu, Mohan; Aloy, Patrick; Pieper, Rembert; Uetz, Peter
2014-01-01
Efforts to map the Escherichia coli interactome have identified several hundred macromolecular complexes, but direct binary protein-protein interactions (PPIs) have not been surveyed on a large scale. Here we performed yeast two-hybrid screens of 3,305 baits against 3,606 preys (~70% of the E. coli proteome) in duplicate to generate a map of 2,234 interactions, approximately doubling the number of known binary PPIs in E. coli. Integration of binary PPIs and genetic interactions revealed functional dependencies among components involved in cellular processes, including envelope integrity, flagellum assembly and protein quality control. Many of the binary interactions that could be mapped within multi-protein complexes were informative regarding internal topology and indicated that interactions within complexes are significantly more conserved than those interactions connecting different complexes. This resource will be useful for inferring bacterial gene function and provides a draft reference of the basic physical wiring network of this evolutionarily significant model microbe. PMID:24561554
Direct Maximization of Protein Identifications from Tandem Mass Spectra
Spivak, Marina; Weston, Jason; Tomazela, Daniela; MacCoss, Michael J.; Noble, William Stafford
2012-01-01
The goal of many shotgun proteomics experiments is to determine the protein complement of a complex biological mixture. For many mixtures, most methodological approaches fall significantly short of this goal. Existing solutions to this problem typically subdivide the task into two stages: first identifying a collection of peptides with a low false discovery rate and then inferring from the peptides a corresponding set of proteins. In contrast, we formulate the protein identification problem as a single optimization problem, which we solve using machine learning methods. This approach is motivated by the observation that the peptide and protein level tasks are cooperative, and the solution to each can be improved by using information about the solution to the other. The resulting algorithm directly controls the relevant error rate, can incorporate a wide variety of evidence and, for complex samples, provides 18–34% more protein identifications than current state-of-the-art approaches. PMID:22052992
A TALE-inspired computational screen for proteins that contain approximate tandem repeats.
Perycz, Malgorzata; Krwawicz, Joanna; Bochtler, Matthias
2017-01-01
TAL (transcription activator-like) effectors (TALEs) are bacterial proteins that are secreted from bacteria to plant cells to act as transcriptional activators. TALEs and related proteins (RipTALs, BurrH, MOrTL1 and MOrTL2) contain approximate tandem repeats that differ in conserved positions that define specificity. Using Perl, we screened ~47 million protein sequences for TALE-like architecture characterized by approximate tandem repeats (between 30 and 43 amino acids in length) and sequence variability in conserved positions, without requiring sequence similarity to TALEs. Candidate proteins were scored according to their propensity for nuclear localization, secondary structure, repeat sequence complexity, as well as covariation and predicted structural proximity of variable residues. Biological context was tentatively inferred from co-occurrence of other domains and interactome predictions. Approximate repeats with TALE-like features that merit experimental characterization were found in a protein of chestnut blight fungus, a eukaryotic plant pathogen.
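The screen itself was written in Perl; a condensed Python re-sketch of its core idea, comparing consecutive candidate repeat units by Hamming distance, conveys the logic. The thresholds, toy sequence and function names are assumptions for illustration, and the scoring stages described above are omitted.

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def approximate_tandem_repeats(seq, unit_min=30, unit_max=43,
                               max_mismatch_frac=0.2, min_units=3):
    """Report (start, unit_length, n_units) for arrays of near-identical
    consecutive units, TALE-style. A deliberately naive scan."""
    hits = []
    for L in range(unit_min, unit_max + 1):
        i = 0
        while i + 2 * L <= len(seq):
            n = 1
            while (i + (n + 1) * L <= len(seq) and
                   hamming(seq[i + (n - 1) * L: i + n * L],
                           seq[i + n * L: i + (n + 1) * L]) <= max_mismatch_frac * L):
                n += 1
            if n >= min_units:
                hits.append((i, L, n))
                i += n * L
            else:
                i += 1
    return hits

# Toy protein with a degenerate 34-residue repeat array (illustrative).
unit = "LTPEQVVAIASNGGGKQALETVQRLLPVLCQAHG"
protein = "M" * 10 + unit + unit[:20] + "D" + unit[21:] + unit + "K" * 15
print(approximate_tandem_repeats(protein))
```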
A TALE-inspired computational screen for proteins that contain approximate tandem repeats
Krwawicz, Joanna
2017-01-01
TAL (transcription activator-like) effectors (TALEs) are bacterial proteins that are secreted from bacteria to plant cells to act as transcriptional activators. TALEs and related proteins (RipTALs, BurrH, MOrTL1 and MOrTL2) contain approximate tandem repeats that differ in conserved positions that define specificity. Using Perl, we screened ~47 million protein sequences for TALE-like architecture characterized by approximate tandem repeats (between 30 and 43 amino acids in length) and sequence variability in conserved positions, without requiring sequence similarity to TALEs. Candidate proteins were scored according to their propensity for nuclear localization, secondary structure, repeat sequence complexity, as well as covariation and predicted structural proximity of variable residues. Biological context was tentatively inferred from co-occurrence of other domains and interactome predictions. Approximate repeats with TALE-like features that merit experimental characterization were found in a protein of chestnut blight fungus, a eukaryotic plant pathogen. PMID:28617832
Comparison of probabilistic and deterministic fiber tracking of cranial nerves.
Zolal, Amir; Sobottka, Stephan B; Podlesek, Dino; Linn, Jennifer; Rieger, Bernhard; Juratli, Tareq A; Schackert, Gabriele; Kitzler, Hagen H
2017-09-01
OBJECTIVE: The depiction of cranial nerves (CNs) using diffusion tensor imaging (DTI) is of great interest in skull base tumor surgery, and DTI used with deterministic tracking methods has been reported previously. However, there are still no good methods usable for the elimination of noise from the resulting depictions. The authors hypothesized that probabilistic tracking could lead to more accurate results, because it extracts information from the underlying data more efficiently. Moreover, the authors adapted a previously described technique for noise elimination using gradual threshold increases to probabilistic tracking. To evaluate the utility of this new approach, this work provides a comparison of the gradual threshold increase method in probabilistic and in deterministic tracking of CNs. METHODS: Both tracking methods were used to depict CNs II, III, V, and the VII+VIII bundle. Depiction of 240 CNs was attempted with each of the above methods in 30 healthy subjects, which were obtained from 2 public databases: the Kirby repository (KR) and the Human Connectome Project (HCP). Elimination of erroneous fibers was attempted by gradually increasing the respective thresholds (fractional anisotropy [FA] and probabilistic index of connectivity [PICo]). The results were compared with predefined ground truth images based on corresponding anatomical scans. Two label overlap measures (false-positive error and Dice similarity coefficient) were used to evaluate the success of both methods in depicting the CNs. Moreover, the differences between these parameters obtained from the KR and HCP (with higher angular resolution) databases were evaluated. Additionally, visualization of 10 CNs in 5 clinical cases was attempted with both methods and evaluated by comparing the depictions with intraoperative findings. RESULTS: Maximum Dice similarity coefficients were significantly higher with probabilistic tracking (p < 0.001; Wilcoxon signed-rank test). The false-positive error of the last obtained depiction was also significantly lower in probabilistic than in deterministic tracking (p < 0.001). The HCP data yielded significantly better results in terms of the Dice coefficient in probabilistic tracking (p < 0.001, Mann-Whitney U-test) and in deterministic tracking (p = 0.02). The false-positive errors were smaller in HCP data in deterministic tracking (p < 0.001) and showed a strong trend toward significance in probabilistic tracking (p = 0.06). In the clinical cases, the probabilistic method visualized 7 of 10 attempted CNs accurately, compared with 3 correct depictions with deterministic tracking. CONCLUSIONS: High angular resolution DTI scans are preferable for the DTI-based depiction of the cranial nerves. Probabilistic tracking with a gradual PICo threshold increase is more effective for this task than the previously described deterministic tracking with a gradual FA threshold increase, and it might represent a useful method for depicting cranial nerves with DTI since it eliminates erroneous fibers without manual intervention.
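A minimal sketch of the evaluation loop described in the methods: binarize a tract-probability map at gradually increasing thresholds and score each depiction against a ground-truth mask with the Dice similarity coefficient and the false-positive error. The synthetic volume is fabricated for illustration.

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient between two binary masks."""
    return 2 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def false_positive_error(a, truth):
    """Fraction of the depiction lying outside the ground-truth mask."""
    return np.logical_and(a, ~truth).sum() / max(a.sum(), 1)

rng = np.random.default_rng(6)
truth = np.zeros((40, 40, 40), bool)
truth[18:22, 5:35, 18:22] = True                    # toy "nerve" mask
pico = rng.random(truth.shape) * 0.3                # noise floor
pico[truth] = 0.3 + 0.7 * rng.random(truth.sum())   # higher values on the nerve

# Gradual threshold increase: raise the PICo cutoff step by step,
# tracking both overlap measures at each step.
for thr in np.arange(0.1, 1.0, 0.1):
    mask = pico >= thr
    print(f"thr={thr:.1f}  Dice={dice(mask, truth):.2f}  "
          f"FP={false_positive_error(mask, truth):.2f}")
```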
Bowden, Deborah L; Vargas-Caro, Carolina; Ovenden, Jennifer R; Bennett, Michael B; Bustamante, Carlos
2016-11-01
The complete mitochondrial genome of the grey nurse shark Carcharias taurus is described from 25 963 828 sequences obtained using Illumina NGS technology. Total length of the mitogenome is 16 715 bp, consisting of 2 rRNAs, 13 protein-coding regions, 22 tRNAs and 2 non-coding regions, thus updating the previously published mitogenome for this species. The phylogenomic reconstruction inferred from the mitogenome of 15 species of Lamniform and Carcharhiniform sharks supports the inclusion of C. taurus in a clade with the Lamnidae and Cetorhinidae. This complete mitogenome contributes to ongoing investigation into the monophyly of the Family Odontaspididae.
Reconstructed ancestral enzymes suggest long-term cooling of Earth's photic zone since the Archean
NASA Astrophysics Data System (ADS)
Garcia, Amanda K.; Schopf, J. William; Yokobori, Shin-ichi; Akanuma, Satoshi; Yamagishi, Akihiko
2017-05-01
Paleotemperatures inferred from the isotopic compositions (δ18O and δ30Si) of marine cherts suggest that Earth’s oceans cooled from 70 ± 15 °C in the Archean to the present ∼15 °C. This interpretation, however, has been subject to question due to uncertainties regarding oceanic isotopic compositions, diagenetic or metamorphic resetting of the isotopic record, and depositional environments. Analyses of the thermostability of reconstructed ancestral enzymes provide an independent method by which to assess the temperature history inferred from the isotopic evidence. Although previous studies have demonstrated extreme thermostability in reconstructed archaeal and bacterial proteins compatible with a hot early Earth, taxa investigated may have inhabited local thermal environments that differed significantly from average surface conditions. We here present thermostability measurements of reconstructed ancestral enzymatically active nucleoside diphosphate kinases (NDKs) derived from light-requiring prokaryotic and eukaryotic phototrophs having widely separated fossil-based divergence ages. The ancestral environmental temperatures thereby determined for these photic-zone organisms--shown in modern taxa to correlate strongly with NDK thermostability--are inferred to reflect ancient surface-environment paleotemperatures. Our results suggest that Earth's surface temperature decreased over geological time from ∼65-80 °C in the Archean, a finding consistent both with previous isotope-based and protein reconstruction-based interpretations. Interdisciplinary studies such as those reported here integrating genomic, geologic, and paleontologic data hold promise for providing new insight into the coevolution of life and environment over Earth history.
Chasman, Deborah; Walters, Kevin B.; Lopes, Tiago J. S.; Eisfeld, Amie J.; Kawaoka, Yoshihiro; Roy, Sushmita
2016-01-01
Mammalian host response to pathogenic infections is controlled by a complex regulatory network connecting regulatory proteins such as transcription factors and signaling proteins to target genes. An important challenge in infectious disease research is to understand molecular similarities and differences in mammalian host response to diverse sets of pathogens. Recently, systems biology studies have produced rich collections of omic profiles measuring host response to infectious agents such as influenza viruses at multiple levels. To gain a comprehensive understanding of the regulatory network driving host response to multiple infectious agents, we integrated host transcriptomes and proteomes using a network-based approach. Our approach combines expression-based regulatory network inference, structured-sparsity based regression, and network information flow to infer putative physical regulatory programs for expression modules. We applied our approach to identify regulatory networks, modules and subnetworks that drive host response to multiple influenza infections. The inferred regulatory network and modules are significantly enriched for known pathways of immune response and implicate apoptosis, splicing, and interferon signaling processes in the differential response of viral infections of different pathogenicities. We used the learned network to prioritize regulators and study virus and time-point specific networks. RNAi-based knockdown of predicted regulators had significant impact on viral replication and included several previously unknown regulators. Taken together, our integrated analysis identified novel module-level patterns that capture strain- and pathogenicity-specific patterns of expression and helped identify important regulators of host response to influenza infection. PMID:27403523
Systematic inference of functional phosphorylation events in yeast metabolism.
Chen, Yu; Wang, Yonghong; Nielsen, Jens
2017-07-01
Protein phosphorylation is a post-translational modification that affects proteins by changing their structure and conformation in a rapid and reversible way, and it is an important mechanism for metabolic regulation in cells. Phosphoproteomics enables high-throughput identification of phosphorylation events on metabolic enzymes, but identifying functional phosphorylation events still requires more detailed biochemical characterization. Therefore, development of computational methods for investigating unknown functions of a large number of phosphorylation events identified by phosphoproteomics has received increased attention. We developed a mathematical framework that describes the relationship between the phosphorylation level of a metabolic enzyme and the corresponding flux through the enzyme. Using this framework, it is possible to quantitatively estimate the contribution of phosphorylation events to flux changes. We showed that phosphorylation regulation analysis, combined with a systematic workflow and correlation analysis, can be used for inference of functional phosphorylation events in steady-state and dynamic conditions, respectively. Using this analysis, we assigned functionality to phosphorylation events of 17 metabolic enzymes in the yeast Saccharomyces cerevisiae, among which 10 are novel. Phosphorylation regulation analysis can not only be extended to the inference of other functional post-translational modifications but can also serve as a promising scaffold for multi-omics data integration in systems biology. Matlab codes for flux balance analysis in this study are available in the Supplementary material. Contact: yhwang@ecust.edu.cn or nielsenj@chalmers.se. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
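The abstract does not spell out the framework's equations; the sketch below is a hedged guess in the spirit of hierarchical regulation analysis, expressing the contribution of a phosphorylation change to a flux change as a ratio of log-changes. The function name and sign convention are assumptions made for illustration, not the paper's exact formulation.

```python
import numpy as np

def phospho_regulation_coefficient(p1, p2, J1, J2, sign=-1):
    """Share of a flux change attributable to a phosphorylation change,
    in the spirit of hierarchical regulation analysis:
        rho = sign * dln(p) / dln(J).
    sign encodes whether phosphorylation activates (+1) or inhibits (-1)
    the enzyme. This formula is an illustrative assumption; the paper's
    exact framework may differ."""
    return sign * np.log(p2 / p1) / np.log(J2 / J1)

# Illustrative numbers: phosphorylation level drops 0.8 -> 0.4 while the
# flux through the (inhibited) enzyme doubles.
rho = phospho_regulation_coefficient(0.8, 0.4, 1.0, 2.0, sign=-1)
print(f"rho = {rho:.2f}  (rho ~ 1: flux change fully explained by phosphorylation)")
```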
Reconstructed ancestral enzymes suggest long-term cooling of Earth's photic zone since the Archean.
Garcia, Amanda K; Schopf, J William; Yokobori, Shin-Ichi; Akanuma, Satoshi; Yamagishi, Akihiko
2017-05-02
Paleotemperatures inferred from the isotopic compositions (δ18O and δ30Si) of marine cherts suggest that Earth's oceans cooled from 70 ± 15 °C in the Archean to the present ∼15 °C. This interpretation, however, has been subject to question due to uncertainties regarding oceanic isotopic compositions, diagenetic or metamorphic resetting of the isotopic record, and depositional environments. Analyses of the thermostability of reconstructed ancestral enzymes provide an independent method by which to assess the temperature history inferred from the isotopic evidence. Although previous studies have demonstrated extreme thermostability in reconstructed archaeal and bacterial proteins compatible with a hot early Earth, taxa investigated may have inhabited local thermal environments that differed significantly from average surface conditions. We here present thermostability measurements of reconstructed ancestral enzymatically active nucleoside diphosphate kinases (NDKs) derived from light-requiring prokaryotic and eukaryotic phototrophs having widely separated fossil-based divergence ages. The ancestral environmental temperatures thereby determined for these photic-zone organisms--shown in modern taxa to correlate strongly with NDK thermostability--are inferred to reflect ancient surface-environment paleotemperatures. Our results suggest that Earth's surface temperature decreased over geological time from ∼65-80 °C in the Archean, a finding consistent both with previous isotope-based and protein reconstruction-based interpretations. Interdisciplinary studies such as those reported here integrating genomic, geologic, and paleontologic data hold promise for providing new insight into the coevolution of life and environment over Earth history.
Historical contingency and its biophysical basis in glucocorticoid receptor evolution.
Harms, Michael J; Thornton, Joseph W
2014-08-14
Understanding how chance historical events shape evolutionary processes is a central goal of evolutionary biology. Direct insights into the extent and causes of evolutionary contingency have been limited to experimental systems, because it is difficult to know what happened in the deep past and to characterize other paths that evolution could have followed. Here we combine ancestral protein reconstruction, directed evolution and biophysical analysis to explore alternative 'might-have-been' trajectories during the ancient evolution of a novel protein function. We previously found that the evolution of cortisol specificity in the ancestral glucocorticoid receptor (GR) was contingent on permissive substitutions, which had no apparent effect on receptor function but were necessary for GR to tolerate the large-effect mutations that caused the shift in specificity. Here we show that alternative mutations that could have permitted the historical function-switching substitutions are extremely rare in the ensemble of genotypes accessible to the ancestral GR. In a library of thousands of variants of the ancestral protein, we recovered historical permissive substitutions but no alternative permissive genotypes. Using biophysical analysis, we found that permissive mutations must satisfy at least three physical requirements--they must stabilize specific local elements of the protein structure, maintain the correct energetic balance between functional conformations, and be compatible with the ancestral and derived structures--thus revealing why permissive mutations are rare. These findings demonstrate that GR evolution depended strongly on improbable, non-deterministic events, and this contingency arose from intrinsic biophysical properties of the protein.
Protein analysis by time-resolved measurements with an electro-switchable DNA chip
Langer, Andreas; Hampel, Paul A.; Kaiser, Wolfgang; Knezevic, Jelena; Welte, Thomas; Villa, Valentina; Maruyama, Makiko; Svejda, Matej; Jähner, Simone; Fischer, Frank; Strasser, Ralf; Rant, Ulrich
2013-01-01
Measurements in stationary or mobile phases are fundamental principles in protein analysis. Although the immobilization of molecules on solid supports allows for the parallel analysis of interactions, properties like size or shape are usually inferred from the molecular mobility under the influence of external forces. However, as these principles are mutually exclusive, a comprehensive characterization of proteins usually involves a multi-step workflow. Here we show how these measurement modalities can be reconciled by tethering proteins to a surface via dynamically actuated nanolevers. Short DNA strands, which are switched by alternating electric fields, are employed as capture probes to bind target proteins. By swaying the proteins over nanometre amplitudes and comparing their motional dynamics to a theoretical model, the protein diameter can be quantified with Ångström accuracy. Alterations in the tertiary protein structure (folding) and conformational changes are readily detected, and even post-translational modifications are revealed by time-resolved molecular dynamics measurements. PMID:23839273
Gura Sadovsky, Rotem; Brielle, Shlomi; Kaganovich, Daniel; England, Jeremy L
2017-03-14
The fluorescence microscopy methods presently used to characterize protein motion in cells infer protein motion from indirect observables, rather than measuring protein motion directly. Operationalizing these methods requires expertise that can constitute a barrier to their broad utilization. Here, we have developed PIPE (photo-converted intensity profile expansion) to directly measure the motion of tagged proteins and quantify it using an effective diffusion coefficient. PIPE works by pulsing photo-convertible fluorescent proteins, generating a peaked fluorescence signal at the pulsed region, and analyzing the spatial expansion of the signal. We demonstrate PIPE's success in measuring accurate diffusion coefficients in silico and in vitro and compare effective diffusion coefficients of native cellular proteins and free fluorophores in vivo. We apply PIPE to measure anomalous diffusion in the cell and use it to distinguish free fluorophores from native cellular proteins. PIPE's direct measurement and ease of use make it appealing for cell biologists. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
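A minimal sketch of the PIPE idea: for free diffusion, a photo-converted one-dimensional Gaussian profile spreads as σ²(t) = σ₀² + 2Dt, so fitting the profile width at successive times and halving the slope recovers an effective diffusion coefficient. The synthetic profiles, noise level and constant peak amplitude are simplifying assumptions for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(7)
D_true, sigma0 = 5.0, 1.0          # µm^2/s and µm; illustrative values
x = np.linspace(-15, 15, 301)      # position along the profile, µm
times = np.array([0.05, 0.2, 0.4, 0.6, 0.8, 1.0])  # s after photoconversion

def gaussian(x, amp, s2):
    return amp * np.exp(-x**2 / (2 * s2))

sigma_sq = []
for t in times:
    s2 = sigma0**2 + 2 * D_true * t                           # 1-D free diffusion
    signal = gaussian(x, 1.0, s2) + rng.normal(0, 0.02, x.size)  # noisy profile
    (amp, s2_fit), _ = curve_fit(gaussian, x, signal, p0=(1.0, 5.0))
    sigma_sq.append(s2_fit)

# effective diffusion coefficient from the slope of sigma^2(t)
slope, intercept = np.polyfit(times, sigma_sq, 1)
print(f"D_eff = {slope / 2:.2f} µm^2/s (ground truth {D_true})")
```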
Hybrid regulatory models: a statistically tractable approach to model regulatory network dynamics.
Ocone, Andrea; Millar, Andrew J; Sanguinetti, Guido
2013-04-01
Computational modelling of the dynamics of gene regulatory networks is a central task of systems biology. For networks of small/medium scale, the dominant paradigm is represented by systems of coupled non-linear ordinary differential equations (ODEs). ODEs afford great mechanistic detail and flexibility, but calibrating these models to data is often an extremely difficult statistical problem. Here, we develop a general statistical inference framework for stochastic transcription-translation networks. We use a coarse-grained approach, which represents the system as a network of stochastic (binary) promoter and (continuous) protein variables. We derive an exact inference algorithm and an efficient variational approximation that allows scalable inference and learning of the model parameters. We demonstrate the power of the approach on two biological case studies, showing that the method allows a high degree of flexibility and is capable of testable novel biological predictions. Availability: http://homepages.inf.ed.ac.uk/gsanguin/software.html. Supplementary data are available at Bioinformatics online.
Steinbrück, Lars; McHardy, Alice Carolyn
2012-01-01
Distinguishing mutations that determine an organism's phenotype from (near-) neutral ‘hitchhikers’ is a fundamental challenge in genome research, and is relevant for numerous medical and biotechnological applications. For human influenza viruses, recognizing changes in the antigenic phenotype and a strain's capability to evade pre-existing host immunity is important for the production of efficient vaccines. We have developed a method for inferring ‘antigenic trees’ for the major viral surface protein hemagglutinin. In the antigenic tree, antigenic weights are assigned to all tree branches, which allows us to resolve the antigenic impact of the associated amino acid changes. Our technique predicted antigenic distances with comparable accuracy to antigenic cartography. Additionally, it identified both known and novel sites, and amino acid changes with antigenic impact in the evolution of influenza A (H3N2) viruses from 1968 to 2003. The technique can also be applied for inference of ‘phenotype trees’ and genotype–phenotype relationships from other types of pairwise phenotype distances. PMID:22532796
Robust model-based analysis of single-particle tracking experiments with Spot-On
Grimm, Jonathan B; Lavis, Luke D
2018-01-01
Single-particle tracking (SPT) has become an important method to bridge biochemistry and cell biology since it allows direct observation of protein binding and diffusion dynamics in live cells. However, accurately inferring information from SPT studies is challenging due to biases in both data analysis and experimental design. To address analysis bias, we introduce ‘Spot-On’, an intuitive web-interface. Spot-On implements a kinetic modeling framework that accounts for known biases, including molecules moving out-of-focus, and robustly infers diffusion constants and subpopulations from pooled single-molecule trajectories. To minimize inherent experimental biases, we implement and validate stroboscopic photo-activation SPT (spaSPT), which minimizes motion-blur bias and tracking errors. We validate Spot-On using experimentally realistic simulations and show that Spot-On outperforms other methods. We then apply Spot-On to spaSPT data from live mammalian cells spanning a wide range of nuclear dynamics and demonstrate that Spot-On consistently and robustly infers subpopulation fractions and diffusion constants. PMID:29300163
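A stripped-down sketch of the kind of kinetic model Spot-On fits: the histogram of single-molecule jump lengths is modeled as a mixture of a bound and a free Rayleigh-distributed subpopulation. The defocalization (out-of-focus) correction that Spot-On adds is omitted here, and all ground-truth values are fabricated for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(8)
dt = 0.01                                    # frame interval, s
D_bound, D_free, f_bound = 0.05, 3.0, 0.4    # illustrative ground truth

# Simulate 2-D jump lengths: Rayleigh-distributed with scale sqrt(2*D*dt).
n = 20_000
is_bound = rng.random(n) < f_bound
D = np.where(is_bound, D_bound, D_free)
r = rng.rayleigh(np.sqrt(2 * D * dt))

def two_state_pdf(r, f, D1, D2):
    """Mixture of two Rayleigh jump-length densities
    p(r|D) = r/(2*D*dt) * exp(-r^2/(4*D*dt)); no defocalization term."""
    def ray(D):
        return r / (2 * D * dt) * np.exp(-r**2 / (4 * D * dt))
    return f * ray(D1) + (1 - f) * ray(D2)

hist, edges = np.histogram(r, bins=100, density=True)
centers = (edges[:-1] + edges[1:]) / 2
popt, _ = curve_fit(two_state_pdf, centers, hist, p0=(0.5, 0.01, 1.0),
                    bounds=([0, 1e-4, 1e-4], [1, 10, 10]))
print(f"bound fraction {popt[0]:.2f}, D_bound {popt[1]:.3f}, D_free {popt[2]:.2f}")
```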
Robust model-based analysis of single-particle tracking experiments with Spot-On.
Hansen, Anders S; Woringer, Maxime; Grimm, Jonathan B; Lavis, Luke D; Tjian, Robert; Darzacq, Xavier
2018-01-04
Single-particle tracking (SPT) has become an important method to bridge biochemistry and cell biology since it allows direct observation of protein binding and diffusion dynamics in live cells. However, accurately inferring information from SPT studies is challenging due to biases in both data analysis and experimental design. To address analysis bias, we introduce 'Spot-On', an intuitive web-interface. Spot-On implements a kinetic modeling framework that accounts for known biases, including molecules moving out-of-focus, and robustly infers diffusion constants and subpopulations from pooled single-molecule trajectories. To minimize inherent experimental biases, we implement and validate stroboscopic photo-activation SPT (spaSPT), which minimizes motion-blur bias and tracking errors. We validate Spot-On using experimentally realistic simulations and show that Spot-On outperforms other methods. We then apply Spot-On to spaSPT data from live mammalian cells spanning a wide range of nuclear dynamics and demonstrate that Spot-On consistently and robustly infers subpopulation fractions and diffusion constants. © 2018, Hansen et al.
An experimental phylogeny to benchmark ancestral sequence reconstruction
Randall, Ryan N.; Radford, Caelan E.; Roof, Kelsey A.; Natarajan, Divya K.; Gaucher, Eric A.
2016-01-01
Ancestral sequence reconstruction (ASR) is a still-burgeoning method that has revealed many key mechanisms of molecular evolution. One criticism of the approach is an inability to validate its algorithms within a biological context as opposed to a computer simulation. Here we build an experimental phylogeny using the gene of a single red fluorescent protein to address this criticism. The evolved phylogeny consists of 19 operational taxonomic units (leaves) and 17 ancestral bifurcations (nodes) that display a wide variety of fluorescent phenotypes. The 19 leaves then serve as ‘modern’ sequences that we subject to ASR analyses using various algorithms and benchmark against the known ancestral genotypes and ancestral phenotypes. We confirm computer simulations that show all algorithms infer ancient sequences with high accuracy, yet we also reveal wide variation in the phenotypes encoded by incorrectly inferred sequences. Specifically, Bayesian methods incorporating rate variation significantly outperform the maximum parsimony criterion in phenotypic accuracy. Subsampling of extant sequences had a minor effect on the inference of ancestral sequences. PMID:27628687
Lei, Zhao; Chen, Xiao Dong
2016-01-01
N-ethylmaleimide (NEM) was used to verify that no new disulfide crosslinks were formed during the fascinating rheology of the alkali cold-gelation of whey proteins, which show Sol-Gel-Sol transitions with time at pH > 11.5. These dynamic transitions involve the formation and subsequent destruction of non-covalent interactions between soluble whey aggregates. Therefore, incubation of aggregates with NEM was expected not to affect the rheology much. Experiments show that very small additions of NEM, such as 0.5 mol per mol of protein, delayed gelation and significantly strengthened the metastable gels that formed. Interactions between whey protein aggregates were surprisingly enhanced during incubation with NEM, as inferred from oscillatory rheometry at different protein concentrations, dynamic swelling, Trp fluorescence and SDS-PAGE measurements. PMID:27732644