Bayesian phylogenetic analysis supports an agricultural origin of Japonic languages.
Lee, Sean; Hasegawa, Toshikazu
2011-12-22
Languages, like genes, evolve by a process of descent with modification. This striking similarity between biological and linguistic evolution allows us to apply phylogenetic methods to explore how languages, as well as the people who speak them, are related to one another through evolutionary history. Language phylogenies constructed with lexical data have so far revealed population expansions of Austronesian, Indo-European and Bantu speakers. However, how robustly a phylogenetic approach can chart the history of language evolution and what language phylogenies reveal about human prehistory must be investigated more thoroughly on a global scale. Here we report a phylogeny of 59 Japonic languages and dialects. We used this phylogeny to estimate time depth of its root and compared it with the time suggested by an agricultural expansion scenario for Japanese origin. In agreement with the scenario, our results indicate that Japonic languages descended from a common ancestor approximately 2182 years ago. Together with archaeological and biological evidence, our results suggest that the first farmers of Japan had a profound impact on the origins of both people and languages. On a broader level, our results are consistent with a theory that agricultural expansion is the principal factor for shaping global linguistic diversity.
Bibi, F; Vrba, E; Fack, F
2012-09-01
Given that most species that have ever existed on Earth are extinct, no evolutionary history can ever be complete without the inclusion of fossil taxa. Bovids (antelopes and relatives) are one of the most diverse clades of large mammals alive today, with over a hundred living species and hundreds of documented fossil species. With the advent of molecular phylogenetics, major advances have been made in the phylogeny of this clade; however, there has been little attempt to integrate the fossil record into the developing phylogenetic picture. We here describe a new large fossil caprin species from ca. 1.9-Ma deposits from the Middle Awash, Ethiopia. To place the new species phylogenetically, we perform a Bayesian analysis of a combined molecular (cytochrome b) and morphological (osteological) character supermatrix. We include all living species of Caprini, the new fossil species, a fossil takin from the Pliocene of Ethiopia (Budorcas churcheri), and the insular subfossil Myotragus balearicus. The combined analysis demonstrates successful incorporation of both living and fossil species within a single phylogeny based on both molecular and morphological evidence. Analysis of the combined supermatrix produces better resolution than either the molecular or the morphological data set considered alone. Parsimony and Bayesian analyses of the data set are also compared and shown to produce similar results. The combined phylogenetic analysis indicates that the new fossil species is nested within Capra, making it one of the earliest representatives of this clade, with implications for molecular clock calibration. Geographical optimization indicates no fewer than four independent dispersals into Africa by caprins since the Pliocene. © 2012 The Authors. Journal of Evolutionary Biology © 2012 European Society For Evolutionary Biology.
2016-01-01
Dengue fever is the most important arboviral disease in the tropical and sub-tropical countries of the world. Delhi, the metropolitan capital state of India, has reported many dengue outbreaks, with the last outbreak occurring in 2013. We have recently reported predominance of dengue virus serotype 2 during 2011–2014 in Delhi. In the present study, we report molecular characterization and evolutionary analysis of dengue serotype 2 viruses which were detected in 2011–2014 in Delhi. Envelope genes of 42 DENV-2 strains were sequenced in the study. All DENV-2 strains grouped within the Cosmopolitan genotype and further clustered into three lineages; Lineage I, II and III. Lineage III replaced lineage I during dengue fever outbreak of 2013. Further, a novel mutation Thr404Ile was detected in the stem region of the envelope protein of a single DENV-2 strain in 2014. Nucleotide substitution rate and time to the most recent common ancestor were determined by molecular clock analysis using Bayesian methods. A change in effective population size of Indian DENV-2 viruses was investigated through Bayesian skyline plot. The study will be a vital road map for investigation of epidemiology and evolutionary pattern of dengue viruses in India. PMID:26977703
Concordance analysis in mitogenomic phylogenetics.
Weisrock, David W
2012-10-01
Here I advocate the utility of Bayesian concordance analysis as a mechanism for exploring the magnitude and source of phylogenetic signal in concatenated mitogenomic phylogenetic studies. While typically applied to the study of independently evolving gene trees, Bayesian concordance analysis can also be applied to linked, but individually analyzed, gene regions using a prior probability that reflects the expectation of similar phylogenetic reconstructions. For true branches in the mitogenomic tree, concordance factors should represent the number of gene regions that contain phylogenetic signal for a particular clade. As a demonstration of the application of Bayesian concordance analysis to empirical data, I analyzed two different salamander (Hynobiidae and Plethodontidae) mitogenomic data sets using a gene-based partitioning strategy. The results revealed many strongly supported clades in the concatenated trees that have high concordance factors, permitting the inference that these are robustly resolved through phylogenetic signal distributed across the mitogenome. In contrast, a number of strongly supported clades in the concatenated tree received low concordance factors, indicating that their reconstruction is either driven primarily by phylogenetic signal in a small number of gene regions, or that they are inconsistent reconstructions influenced by properties of the data that can produce inaccurate trees (e.g., compositional bias, selection, etc.). Exploration of the Bayesian joint posterior distribution of trees highlighted partitions that contribute phylogenetic information to similar clade reconstructions. This approach was particularly insightful in the hynobiid data, where different combinations of genes were identified that support alternative tree reconstructions. Concatenated analysis of these different subsets of genes highlighted through Bayesian concordance analysis produced strongly supported and contrasting trees, demonstrating the potential for
Bayesian phylogenetic estimation of fossil ages.
Drummond, Alexei J; Stadler, Tanja
2016-07-19
Recent advances have allowed for both morphological fossil evidence and molecular sequences to be integrated into a single combined inference of divergence dates under the rule of Bayesian probability. In particular, the fossilized birth-death tree prior and the Lewis-Mk model of discrete morphological evolution allow for the estimation of both divergence times and phylogenetic relationships between fossil and extant taxa. We exploit this statistical framework to investigate the internal consistency of these models by producing phylogenetic estimates of the age of each fossil in turn, within two rich and well-characterized datasets of fossil and extant species (penguins and canids). We find that the estimation accuracy of fossil ages is generally high with credible intervals seldom excluding the true age and median relative error in the two datasets of 5.7% and 13.2%, respectively. The median relative standard error (RSD) was 9.2% and 7.2%, respectively, suggesting good precision, although with some outliers. In fact, in the two datasets we analyse, the phylogenetic estimate of fossil age is on average less than 2 Myr from the mid-point age of the geological strata from which it was excavated. The high level of internal consistency found in our analyses suggests that the Bayesian statistical model employed is an adequate fit for both the geological and morphological data, and provides evidence from real data that the framework used can accurately model the evolution of discrete morphological traits coded from fossil and extant taxa. We anticipate that this approach will have diverse applications beyond divergence time dating, including dating fossils that are temporally unconstrained, testing of the 'morphological clock', and for uncovering potential model misspecification and/or data errors when controversial phylogenetic hypotheses are obtained based on combined divergence dating analyses. This article is part of the themed issue 'Dating species divergences using
Cross-validation to select Bayesian hierarchical models in phylogenetics.
Duchêne, Sebastián; Duchêne, David A; Di Giallonardo, Francesca; Eden, John-Sebastian; Geoghegan, Jemma L; Holt, Kathryn E; Ho, Simon Y W; Holmes, Edward C
2016-05-26
Recent developments in Bayesian phylogenetic models have increased the range of inferences that can be drawn from molecular sequence data. Accordingly, model selection has become an important component of phylogenetic analysis. Methods of model selection generally consider the likelihood of the data under the model in question. In the context of Bayesian phylogenetics, the most common approach involves estimating the marginal likelihood, which is typically done by integrating the likelihood across model parameters, weighted by the prior. Although this method is accurate, it is sensitive to the presence of improper priors. We explored an alternative approach based on cross-validation that is widely used in evolutionary analysis. This involves comparing models according to their predictive performance. We analysed simulated data and a range of viral and bacterial data sets using a cross-validation approach to compare a variety of molecular clock and demographic models. Our results show that cross-validation can be effective in distinguishing between strict- and relaxed-clock models and in identifying demographic models that allow growth in population size over time. In most of our empirical data analyses, the model selected using cross-validation was able to match that selected using marginal-likelihood estimation. The accuracy of cross-validation appears to improve with longer sequence data, particularly when distinguishing between relaxed-clock models. Cross-validation is a useful method for Bayesian phylogenetic model selection. This method can be readily implemented even when considering complex models where selecting an appropriate prior for all parameters may be difficult.
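The marginal likelihood described above, the likelihood integrated over model parameters and weighted by the prior, can be illustrated on a toy problem. The following is a minimal sketch, not the authors' method: it assumes a binomial likelihood with a uniform prior on the success probability (chosen only because the exact answer, 1/(n+1), is known), and the function names are hypothetical. It approximates the integral by averaging the likelihood over draws from the prior.

```python
import math
import random


def binomial_likelihood(k, n, theta):
    """Probability of k successes in n trials given success probability theta."""
    return math.comb(n, k) * theta**k * (1.0 - theta) ** (n - k)


def marginal_likelihood_mc(k, n, n_draws=100_000, seed=1):
    """Monte Carlo estimate of the marginal likelihood p(data | model).

    Assumption for illustration: the prior on theta is Uniform(0, 1), so the
    integral of likelihood x prior is approximated by the mean likelihood
    over prior draws.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        theta = rng.random()  # one draw from the uniform prior
        total += binomial_likelihood(k, n, theta)
    return total / n_draws


def marginal_likelihood_exact(n):
    """Closed form under the uniform prior: 1 / (n + 1) for any k."""
    return 1.0 / (n + 1)


if __name__ == "__main__":
    estimate = marginal_likelihood_mc(k=7, n=10)
    print(estimate, marginal_likelihood_exact(10))
```

Averaging over prior draws works here only because the toy model has a single parameter and a proper prior; phylogenetic models need more elaborate estimators, and an improper prior would leave this integral undefined, which is the sensitivity the abstract mentions.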
Bayesian nonparametric clustering in phylogenetics: modeling antigenic evolution in influenza.
Cybis, Gabriela B; Sinsheimer, Janet S; Bedford, Trevor; Rambaut, Andrew; Lemey, Philippe; Suchard, Marc A
2017-01-18
Influenza is responsible for up to 500,000 deaths every year, and antigenic variability represents much of its epidemiological burden. To visualize antigenic differences across many viral strains, antigenic cartography methods use multidimensional scaling on binding assay data to map influenza antigenicity onto a low-dimensional space. Analysis of such assay data ideally leads to natural clustering of influenza strains of similar antigenicity that correlate with sequence evolution. To understand the dynamics of these antigenic groups, we present a framework that jointly models genetic and antigenic evolution by combining multidimensional scaling of binding assay data, Bayesian phylogenetic machinery and nonparametric clustering methods. We propose a phylogenetic Chinese restaurant process that extends the current process to incorporate the phylogenetic dependency structure between strains in the modeling of antigenic clusters. With this method, we are able to use the genetic information to better understand the evolution of antigenicity throughout epidemics, as shown in applications of this model to H1N1 influenza. Copyright © 2017 John Wiley & Sons, Ltd.
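For readers unfamiliar with the Chinese restaurant process that this abstract extends, the standard (non-phylogenetic) version can be sketched as follows. This is an illustration under stated assumptions, not the phylogenetic variant proposed by the authors: items are treated as exchangeable, the concentration parameter alpha and the function name are hypothetical, and no genetic dependency between strains is modeled.

```python
import random


def sample_crp(n_items, alpha, seed=0):
    """Draw one partition of n_items from a standard Chinese restaurant process.

    Item i (0-based) joins existing cluster k with probability
    size_k / (i + alpha) and opens a new cluster with probability
    alpha / (i + alpha).
    """
    rng = random.Random(seed)
    assignments = []    # cluster label for each item, in arrival order
    cluster_sizes = []  # current number of items in each cluster
    for _ in range(n_items):
        # Unnormalized weights: one per existing cluster, plus one for a new cluster.
        weights = cluster_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(cluster_sizes):
            cluster_sizes.append(1)  # open a new cluster
        else:
            cluster_sizes[k] += 1    # join an existing cluster
        assignments.append(k)
    return assignments, cluster_sizes


if __name__ == "__main__":
    labels, sizes = sample_crp(n_items=20, alpha=1.0)
    print(labels)
    print(sizes)
```

Larger values of alpha tend to produce more clusters. The number of clusters is not fixed in advance, which is what makes the prior nonparametric.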
Yuan, Ying; MacKinnon, David P.
2009-01-01
In this article, we propose Bayesian analysis of mediation effects. Compared with conventional frequentist mediation analysis, the Bayesian approach has several advantages. First, it allows researchers to incorporate prior information into the mediation analysis, thus potentially improving the efficiency of estimates. Second, under the Bayesian…
Angeletti, Silvia; Lo Presti, Alessandra; Cella, Eleonora; Dicuonzo, Giordano; Crea, Francesca; Palazzotti, Bernardetta; Dedej, Etleva; Ciccozzi, Massimo; De Florio, Lucia
2015-12-01
Clinical Candida isolates from two different hospitals in Rome were identified and clustered by MALDI-TOF MS system and their origin and evolution estimated by Bayesian phylogenetic analysis. The different species of Candida were correctly identified and clustered separately, confirming the ability of these techniques to discriminate between different Candida species. When the MALDI-TOF analysis focused on a single Candida species, Candida albicans and Candida parapsilosis strains clustered differently by hospital setting and period of isolation than Candida glabrata and Candida tropicalis isolates. The evolutionary rates of C. albicans and C. parapsilosis (1.93×10^-2 and 1.17×10^-2 substitutions/site/year, respectively) were in agreement with a higher rate of mutation of these species, even in a narrow period, than what was observed in C. glabrata and C. tropicalis strains (6.99×10^-4 and 7.52×10^-3 substitutions/site/year, respectively). C. albicans was the species with the highest between- and within-clade genetic distance values, in agreement with the temporally related clustering found by MALDI-TOF and its high evolutionary rate of 1.93×10^-2 substitutions/site/year. Copyright © 2015 Elsevier B.V. All rights reserved.
Long-Branch Attraction Bias and Inconsistency in Bayesian Phylogenetics
Kolaczkowski, Bryan; Thornton, Joseph W.
2009-01-01
Bayesian inference (BI) of phylogenetic relationships uses the same probabilistic models of evolution as its precursor maximum likelihood (ML), so BI has generally been assumed to share ML's desirable statistical properties, such as largely unbiased inference of topology given an accurate model and increasingly reliable inferences as the amount of data increases. Here we show that BI, unlike ML, is biased in favor of topologies that group long branches together, even when the true model and prior distributions of evolutionary parameters over a group of phylogenies are known. Using experimental simulation studies and numerical and mathematical analyses, we show that this bias becomes more severe as more data are analyzed, causing BI to infer an incorrect tree as the maximum a posteriori phylogeny with asymptotically high support as sequence length approaches infinity. BI's long branch attraction bias is relatively weak when the true model is simple but becomes pronounced when sequence sites evolve heterogeneously, even when this complexity is incorporated in the model. This bias—which is apparent under both controlled simulation conditions and in analyses of empirical sequence data—also makes BI less efficient and less robust to the use of an incorrect evolutionary model than ML. Surprisingly, BI's bias is caused by one of the method's stated advantages—that it incorporates uncertainty about branch lengths by integrating over a distribution of possible values instead of estimating them from the data, as ML does. Our findings suggest that trees inferred using BI should be interpreted with caution and that ML may be a more reliable framework for modern phylogenetic analysis. PMID:20011052
Phylogenetic Analyses: A Toolbox Expanding towards Bayesian Methods
Aris-Brosou, Stéphane; Xia, Xuhua
2008-01-01
The reconstruction of phylogenies is becoming an increasingly simple activity. This is mainly due to two reasons: the democratization of computing power and the increased availability of sophisticated yet user-friendly software. This review describes some of the latest additions to the phylogenetic toolbox, along with some of their theoretical and practical limitations. It is shown that Bayesian methods are under heavy development, as they offer the possibility to solve a number of long-standing issues and to integrate several steps of the phylogenetic analyses into a single framework. Specific topics include not only phylogenetic reconstruction, but also the comparison of phylogenies, the detection of adaptive evolution, and the estimation of divergence times between species. PMID:18483574
Bayesian data analysis for newcomers.
Kruschke, John K; Liddell, Torrin M
2017-04-12
This article explains the foundational concepts of Bayesian data analysis using virtually no mathematical notation. Bayesian ideas already match your intuitions from everyday reasoning and from traditional data analysis. Simple examples of Bayesian data analysis are presented that illustrate how the information delivered by a Bayesian analysis can be directly interpreted. Bayesian approaches to null-value assessment are discussed. The article clarifies misconceptions about Bayesian methods that newcomers might have acquired elsewhere. We discuss prior distributions and explain how they are not a liability but an important asset. We discuss the relation of Bayesian data analysis to Bayesian models of mind, and we briefly discuss what methodological problems Bayesian data analysis is not meant to solve. After you have read this article, you should have a clear sense of how Bayesian data analysis works and the sort of information it delivers, and why that information is so intuitive and useful for drawing conclusions from data.
Golemba, Marcelo D; Di Lello, Federico A; Bessone, Fernando; Fay, Fabian; Benetti, Silvina; Jones, Leandro R; Campos, Rodolfo H
2010-01-18
Previous studies in Argentina have documented a general prevalence of Hepatitis C Virus (HCV) infection close to 2%. In addition, a high prevalence of HCV has been recently reported in different Argentinean small rural communities. In this work, we performed a study aimed at analyzing the origins and diversification patterns of an HCV outbreak in Wheelwright, a small rural town located in Santa Fe province (Argentina). A total of 89 out of 1814 blood samples collected from people living in Wheelwright were positive for HCV infection. The highest prevalence (4.9%) was observed in people older than 50 years, with the highest level for the group aged between 70-79 years (22%). The RFLP analyses showed that 91% of the positive samples belonged to the HCV-1b genotype. The E1/E2 and NS5B genes were sequenced, and their phylogenetic analysis showed that the HCV-1b sequences from Wheelwright were monophyletic. Bayesian coalescent-based methods were used to estimate substitution rates and time of the most recent common ancestor (tMRCA). The mean estimated substitution rates and the tMRCA for E1/E2 with and without HVR1 and NS5B were 7.41E-03 s/s/y and 61 years, 5.05E-03 s/s/y and 58 years and 3.24E-03 s/s/y and 53 years, respectively. In summary, the tMRCA values, the demographic model with constant population size, and the fact that the highest prevalence of infection was observed in older people support the hypothesis that the HCV-1b introduction in Wheelwright initially occurred at least five decades ago and that the early epidemic was characterized by a fast rate of virus transmission. The epidemic seems to have been controlled later on down to the standard transmission rates observed elsewhere.
Voglmayr, Hermann; Riethmüller, Alexandra; Göker, Markus; Weiss, Michael; Oberwinkler, Franz
2004-09-01
Bayesian and maximum parsimony phylogenetic analyses of 92 collections of the genera Basidiophora, Bremia, Paraperonospora, Phytophthora and Plasmopara were performed using nuclear large subunit ribosomal DNA sequences containing the D1 and D2 regions. In the Bayesian tree, two main clades were apparent: one clade containing Plasmopara pygmaea s. lat., Pl. sphaerosperma, Basidiophora, Bremia and Paraperonospora, and a clade containing all other Plasmopara species. Plasmopara is shown to be polyphyletic, and Pl. sphaerosperma is transferred to a new genus, Protobremia, for which also the oospore characteristics are described. Within the core Plasmopara clade, all collections originating from the same host family except from Asteraceae and Geraniaceae formed monophyletic clades; however, higher-level phylogenetic relationships lack significant branch support. A sister group relationship of Pl. sphaerosperma with Bremia lactucae is highly supported. Within Bremia lactucae s. l., three distinct clades are evident, which only partly conform to the published host specificity groups. All species of the genera Basidiophora, Bremia, Paraperonospora and Plasmopara included in the present study were investigated for haustorial morphology, and all had ellipsoid to pyriform haustoria, which are regarded as a diagnostic synapomorphy of the whole clade. Aspects of coevolution and cospeciation within the downy mildew pathogens with ellipsoid to pyriform haustoria are briefly discussed.
Genealogical Working Distributions for Bayesian Model Testing with Phylogenetic Uncertainty.
Baele, Guy; Lemey, Philippe; Suchard, Marc A
2016-03-01
Marginal likelihood estimates to compare models using Bayes factors frequently accompany Bayesian phylogenetic inference. Approaches to estimate marginal likelihoods have garnered increased attention over the past decade. In particular, the introduction of path sampling (PS) and stepping-stone sampling (SS) into Bayesian phylogenetics has tremendously improved the accuracy of model selection. These sampling techniques are now used to evaluate complex evolutionary and population genetic models on empirical data sets, but considerable computational demands hamper their widespread adoption. Further, when very diffuse, but proper priors are specified for model parameters, numerical issues complicate the exploration of the priors, a necessary step in marginal likelihood estimation using PS or SS. To avoid such instabilities, generalized SS (GSS) has recently been proposed, introducing the concept of "working distributions" to facilitate--or shorten--the integration process that underlies marginal likelihood estimation. However, the need to fix the tree topology currently limits GSS in a coalescent-based framework. Here, we extend GSS by relaxing the fixed underlying tree topology assumption. To this purpose, we introduce a "working" distribution on the space of genealogies, which enables estimating marginal likelihoods while accommodating phylogenetic uncertainty. We propose two different "working" distributions that help GSS to outperform PS and SS in terms of accuracy when comparing demographic and evolutionary models applied to synthetic data and real-world examples. Further, we show that the use of very diffuse priors can lead to a considerable overestimation in marginal likelihood when using PS and SS, while still retrieving the correct marginal likelihood using both GSS approaches. The methods used in this article are available in BEAST, a powerful user-friendly software package to perform Bayesian evolutionary analyses. © The Author(s) 2015. Published by Oxford
Bayesian Exploratory Factor Analysis
Conti, Gabriella; Frühwirth-Schnatter, Sylvia; Heckman, James J.; Piatek, Rémi
2014-01-01
This paper develops and applies a Bayesian approach to Exploratory Factor Analysis that improves on ad hoc classical approaches. Our framework relies on dedicated factor models and simultaneously determines the number of factors, the allocation of each measurement to a unique factor, and the corresponding factor loadings. Classical identification criteria are applied and integrated into our Bayesian procedure to generate models that are stable and clearly interpretable. A Monte Carlo study confirms the validity of the approach. The method is used to produce interpretable low dimensional aggregates from a high dimensional set of psychological measurements. PMID:25431517
Comparative performance of Bayesian and AIC-based measures of phylogenetic model uncertainty.
Alfaro, Michael E; Huelsenbeck, John P
2006-02-01
Reversible-jump Markov chain Monte Carlo (RJ-MCMC) is a technique for simultaneously evaluating multiple related (but not necessarily nested) statistical models that has recently been applied to the problem of phylogenetic model selection. Here we use a simulation approach to assess the performance of this method and compare it to Akaike weights, a measure of model uncertainty that is based on the Akaike information criterion. Under conditions where the assumptions of the candidate models matched the generating conditions, both Bayesian and AIC-based methods perform well. The 95% credible interval contained the generating model close to 95% of the time. However, the size of the credible interval differed with the Bayesian credible set containing approximately 25% to 50% fewer models than an AIC-based credible interval. The posterior probability was a better indicator of the correct model than the Akaike weight when all assumptions were met but both measures performed similarly when some model assumptions were violated. Models in the Bayesian posterior distribution were also more similar to the generating model in their number of parameters and were less biased in their complexity. In contrast, Akaike-weighted models were more distant from the generating model and biased towards slightly greater complexity. The AIC-based credible interval appeared to be more robust to the violation of the rate homogeneity assumption. Both AIC and Bayesian approaches suggest that substantial uncertainty can accompany the choice of model for phylogenetic analyses, suggesting that alternative candidate models should be examined in analysis of phylogenetic data. [AIC; Akaike weights; Bayesian phylogenetics; model averaging; model selection; model uncertainty; posterior probability; reversible jump.].
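The Akaike weights compared in this abstract follow a standard formula: each model's weight is exp(-Δ/2), where Δ is its AIC difference from the best model, normalized so that the weights sum to one. A minimal sketch is given below. The function names and the example AIC scores are hypothetical, and the cumulative "credible set" construction is one common convention, assumed here rather than taken from the paper.

```python
import math


def akaike_weights(aic_scores):
    """Convert a list of AIC scores into Akaike weights that sum to 1."""
    best = min(aic_scores)
    # Relative likelihood of each model given its AIC difference from the best.
    rel = [math.exp(-0.5 * (aic - best)) for aic in aic_scores]
    total = sum(rel)
    return [r / total for r in rel]


def cumulative_model_set(weights, mass=0.95):
    """Indices of the smallest set of top-weighted models reaching `mass`."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    chosen, cumulative = [], 0.0
    for i in order:
        chosen.append(i)
        cumulative += weights[i]
        if cumulative >= mass:
            break
    return chosen


if __name__ == "__main__":
    # Hypothetical AIC scores for three candidate substitution models.
    weights = akaike_weights([100.0, 102.0, 110.0])
    print(weights)
    print(cumulative_model_set(weights))
```

Subtracting the best score before exponentiating leaves the weights unchanged and avoids underflow when AIC values are large.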
Bayesian modelling of compositional heterogeneity in molecular phylogenetics.
Heaps, Sarah E; Nye, Tom M W; Boys, Richard J; Williams, Tom A; Embley, T Martin
2014-10-01
In molecular phylogenetics, standard models of sequence evolution generally assume that sequence composition remains constant over evolutionary time. However, this assumption is violated in many datasets which show substantial heterogeneity in sequence composition across taxa. We propose a model which allows compositional heterogeneity across branches, and formulate the model in a Bayesian framework. Specifically, the root and each branch of the tree is associated with its own composition vector whilst a global matrix of exchangeability parameters applies everywhere on the tree. We encourage borrowing of strength between branches by developing two possible priors for the composition vectors: one in which information can be exchanged equally amongst all branches of the tree and another in which more information is exchanged between neighbouring branches than between distant branches. We also propose a Markov chain Monte Carlo (MCMC) algorithm for posterior inference which uses data augmentation of substitutional histories to yield a simple complete data likelihood function that factorises over branches and allows Gibbs updates for most parameters. Standard phylogenetic models are not informative about the root position. Therefore a significant advantage of the proposed model is that it allows inference about rooted trees. The position of the root is fundamental to the biological interpretation of trees, both for polarising trait evolution and for establishing the order of divergence among lineages. Furthermore, unlike some other related models from the literature, inference in the model we propose can be carried out through a simple MCMC scheme which does not require problematic dimension-changing moves. We investigate the performance of the model and priors in analyses of two alignments for which there is strong biological opinion about the tree topology and root position.
Tracing the roots of syntax with Bayesian phylogenetics
Maurits, Luke; Griffiths, Thomas L.
2014-01-01
The ordering of subject, verb, and object is one of the fundamental components of the syntax of natural languages. The distribution of basic word orders across the world’s languages is highly nonuniform, with the majority of languages being either subject-object-verb (SOV) or subject-verb-object (SVO). Explaining this fact using psychological accounts of language acquisition or processing requires understanding how the present distribution has resulted from ancestral distributions and the rates of change between orders. We show that Bayesian phylogenetics can provide quantitative answers to three important questions: how word orders are likely to change over time, which word orders were dominant historically, and whether strong inferences about the origins of syntax can be drawn from modern languages. We find that SOV to SVO change is more common than the reverse and VSO to SVO change is more common than VSO to SOV, and that if the seven language families we consider share a common ancestor then that common ancestor likely had SOV word order, but also that there are limits on how confidently we can make inferences about ancestral word order based on modern-day observations. These results shed new light on old questions from historical linguistics and provide clear targets for psychological explanations of word-order distributions. PMID:25192934
MATTHIAS, MICHAEL A.; DÍAZ, M. MÓNICA; CAMPOS, KALINA J.; CALDERON, MARITZA; WILLIG, MICHAEL R.; PACHECO, VICTOR; GOTUZZO, EDUARDO; GILMAN, ROBERT H.; VINETZ, JOSEPH M.
2008-01-01
The role of bats as potential sources of transmission to humans or as maintenance hosts of leptospires is poorly understood. We quantified the prevalence of leptospiral colonization in bats in the Peruvian Amazon in the vicinity of Iquitos, an area of high biologic diversity. Of 589 analyzed bats, culture (3 of 589) and molecular evidence (20 of 589) of leptospiral colonization was found in the kidneys, yielding an overall colonization rate of 3.4%. Infection rates differed with habitat and location, and among different bat species. Bayesian analysis was used to infer phylogenetic relationships of leptospiral 16S ribosomal DNA sequences. Tree topologies were consistent with groupings based on DNA-DNA hybridization studies. A diverse group of leptospires was found in peri-Iquitos bat populations including Leptospira interrogans (5 clones), L. kirschneri (1), L. borgpetersenii (4), L. fainei (1), and two previously undescribed leptospiral species (8). Although L. kirschneri and L. interrogans have been previously isolated from bats, this report is the first to describe L. borgpetersenii and L. fainei infection of bats. A wild animal reservoir of L. fainei has not been previously described. The detection in bats of the L. interrogans serovar Icterohemorrhagiae, a leptospire typically maintained by peridomestic rats, suggests a rodent-bat infection cycle. Bats in Iquitos maintain a genetically diverse group of leptospires. These results provide a solid basis for pursuing molecular epidemiologic studies of bat-associated Leptospira, a potentially new epidemiologic reservoir of transmission of leptospirosis to humans. PMID:16282313
Dembo, Mana; Radovčić, Davorka; Garvin, Heather M; Laird, Myra F; Schroeder, Lauren; Scott, Jill E; Brophy, Juliet; Ackermann, Rebecca R; Musiba, Chares M; de Ruiter, Darryl J; Mooers, Arne Ø; Collard, Mark
2016-08-01
Homo naledi is a recently discovered species of fossil hominin from South Africa. A considerable amount is already known about H. naledi but some important questions remain unanswered. Here we report a study that addressed two of them: "Where does H. naledi fit in the hominin evolutionary tree?" and "How old is it?" We used a large supermatrix of craniodental characters for both early and late hominin species and Bayesian phylogenetic techniques to carry out three analyses. First, we performed a dated Bayesian analysis to generate estimates of the evolutionary relationships of fossil hominins including H. naledi. Then we employed Bayes factor tests to compare the strength of support for hypotheses about the relationships of H. naledi suggested by the best-estimate trees. Lastly, we carried out a resampling analysis to assess the accuracy of the age estimate for H. naledi yielded by the dated Bayesian analysis. The analyses strongly supported the hypothesis that H. naledi forms a clade with the other Homo species and Australopithecus sediba. The analyses were more ambiguous regarding the position of H. naledi within the (Homo, Au. sediba) clade. A number of hypotheses were rejected, but several others were not. Based on the available craniodental data, Homo antecessor, Asian Homo erectus, Homo habilis, Homo floresiensis, Homo sapiens, and Au. sediba could all be the sister taxon of H. naledi. According to the dated Bayesian analysis, the most likely age for H. naledi is 912 ka. This age estimate was supported by the resampling analysis. Our findings have a number of implications. Most notably, they support the assignment of the new specimens to Homo, cast doubt on the claim that H. naledi is simply a variant of H. erectus, and suggest H. naledi is younger than has been previously proposed.
Matthews, Luke J; Tehrani, Jamie J; Jordan, Fiona M; Collard, Mark; Nunn, Charles L
2011-04-29
Archaeologists and anthropologists have long recognized that different cultural complexes may have distinct descent histories, but they have lacked analytical techniques capable of easily identifying such incongruence. Here, we show how Bayesian phylogenetic analysis can be used to identify incongruent cultural histories. We employ the approach to investigate Iranian tribal textile traditions. We used Bayes factor comparisons in a phylogenetic framework to test two models of cultural evolution: the hierarchically integrated system hypothesis and the multiple coherent units hypothesis. In the hierarchically integrated system hypothesis, a core tradition of characters evolves through descent with modification and characters peripheral to the core are exchanged among contemporaneous populations. In the multiple coherent units hypothesis, a core tradition does not exist. Rather, there are several cultural units consisting of sets of characters that have different histories of descent. For the Iranian textiles, the Bayesian phylogenetic analyses supported the multiple coherent units hypothesis over the hierarchically integrated system hypothesis. Our analyses suggest that pile-weave designs represent a distinct cultural unit that has a different phylogenetic history compared to other textile characters. The results from the Iranian textiles are consistent with the available ethnographic evidence, which suggests that the commercial rug market has influenced pile-rug designs but not the techniques or designs incorporated in the other textiles produced by the tribes. We anticipate that Bayesian phylogenetic tests for inferring cultural units will be of great value for researchers interested in studying the evolution of cultural traits including language, behavior, and material culture.
Combined data, Bayesian phylogenetics, and the origin of the New Zealand cicada genera.
Buckley, Thomas R; Arensburger, Peter; Simon, Chris; Chambers, Geoffrey K
2002-02-01
We have applied Bayesian and maximum likelihood methods of phylogenetic estimation to data from four mitochondrial genes (COI, COII, 12S, and 16S) and a single nuclear gene (EF1alpha) from several genera of New Zealand, Australian, and New Caledonian cicada taxa. We specifically focused on the heterogeneity of phylogenetic signal among the different data partitions and the biogeographic origins of the New Zealand cicada fauna. The Bayesian analyses circumvent many of the problems associated with other statistical tests for comparing data partitions. We took an information-theoretic approach to model selection based on the Akaike Information Criterion (AIC). This approach indicated that there was considerable uncertainty in identifying the best-fit model for some of the partitions. Additionally, a large amount of uncertainty was associated with many parameter estimates from the substitution model. However, a sensitivity analysis on the combined dataset indicated that the model selection uncertainty had little effect on estimates of topology because these estimates were largely insensitive to changes in the assumed model. This outcome suggests strong signal in our data. Our analyses support a New Caledonian affiliation of the New Zealand cicada genera Maoricicada, Kikihia, and Rhodopsalta and Australian affinities for the genera Amphipsalta and Notopsalta. This result was surprising, given that previous cicada biologists suspected a close relationship between Amphipsalta, Notopsalta, and Rhodopsalta based on genitalic characters. Relationships among the closely related genera Maoricicada, Kikihia, and Rhodopsalta were poorly resolved, the mitochondrial data and the EF1alpha data favoring different arrangements within this clade.
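The model-selection uncertainty mentioned in this abstract can be made concrete with Akaike weights, which rescale AIC differences into relative support for each candidate model. The sketch below uses the standard definition AIC = 2k - 2 ln L; the log-likelihoods and parameter counts are hypothetical and are not taken from the cicada study.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood


def akaike_weights(aic_values):
    """Convert AIC scores into weights that sum to one across the model set."""
    best = min(aic_values.values())
    relative = {m: math.exp(-0.5 * (a - best)) for m, a in aic_values.items()}
    total = sum(relative.values())
    return {m: r / total for m, r in relative.items()}


# Hypothetical (maximised log-likelihood, free substitution parameters) per model.
fits = {
    "JC69": (-5200.0, 0),
    "HKY85": (-5105.0, 4),
    "GTR": (-5101.5, 8),
    "GTR+G": (-5100.5, 9),
}

scores = {model: aic(lnl, k) for model, (lnl, k) in fits.items()}
weights = akaike_weights(scores)
best_model = min(scores, key=scores.get)

for model in sorted(weights, key=weights.get, reverse=True):
    print(f"{model:6s} AIC={scores[model]:9.1f} weight={weights[model]:.3f}")
```

When no model receives a weight close to one, as in these made-up numbers, the best-fit model is poorly identified, which is the situation in which a sensitivity analysis across models becomes informative.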
Calibrated birth-death phylogenetic time-tree priors for Bayesian inference.
Heled, Joseph; Drummond, Alexei J
2015-05-01
Here we introduce a general class of multiple calibration birth-death tree priors for use in Bayesian phylogenetic inference. All tree priors in this class separate ancestral node heights into a set of "calibrated nodes" and "uncalibrated nodes" such that the marginal distribution of the calibrated nodes is user-specified whereas the density ratio of the birth-death prior is retained for trees with equal values for the calibrated nodes. We describe two formulations, one in which the calibration information informs the prior on ranked tree topologies, through the (conditional) prior, and the other which factorizes the prior on divergence times and ranked topologies, thus allowing uniform, or any arbitrary prior distribution on ranked topologies. Although the first of these formulations has some attractive properties, the algorithm we present for computing its prior density is computationally intensive. However, the second formulation is always faster and computationally efficient for up to six calibrations. We demonstrate the utility of the new class of multiple-calibration tree priors using both small simulations and a real-world analysis and compare the results to existing schemes. The two new calibrated tree priors described in this article offer greater flexibility and control of prior specification in calibrated time-tree inference and divergence time dating, and will remove the need for indirect approaches to the assessment of the combined effect of calibration densities and tree priors in Bayesian phylogenetic inference.
Höhna, Sebastian; Landis, Michael J.
2016-01-01
Programs for Bayesian inference of phylogeny currently implement a unique and fixed suite of models. Consequently, users of these software packages are simultaneously forced to use a number of programs for a given study, while also lacking the freedom to explore models that have not been implemented by the developers of those programs. We developed a new open-source software package, RevBayes, to address these problems. RevBayes is entirely based on probabilistic graphical models, a powerful generic framework for specifying and analyzing statistical models. Phylogenetic-graphical models can be specified interactively in RevBayes, piece by piece, using a new succinct and intuitive language called Rev. Rev is similar to the R language and the BUGS model-specification language, and should be easy to learn for most users. The strength of RevBayes is the simplicity with which one can design, specify, and implement new and complex models. Fortunately, this tremendous flexibility does not come at the cost of slower computation; as we demonstrate, RevBayes outperforms competing software for several standard analyses. Compared with other programs, RevBayes has fewer black-box elements. Users need to explicitly specify each part of the model and analysis. Although this explicitness may initially be unfamiliar, we are convinced that this transparency will improve understanding of phylogenetic models in our field. Moreover, it will motivate the search for improvements to existing methods by brazenly exposing the model choices that we make to critical scrutiny. RevBayes is freely available at http://www.RevBayes.com. [Bayesian inference; Graphical models; MCMC; statistical phylogenetics.] PMID:27235697
Bayesian phylogenetics using an RNA substitution model applied to early mammalian evolution.
Jow, H; Hudelot, C; Rattray, M; Higgs, P G
2002-09-01
We study the phylogeny of the placental mammals using molecular data from all mitochondrial tRNAs and rRNAs of 54 species. We use probabilistic substitution models specific to evolution in base paired regions of RNA. A number of these models have been implemented in a new phylogenetic inference software package for carrying out maximum likelihood and Bayesian phylogenetic inferences. We describe our Bayesian phylogenetic method which uses a Markov chain Monte Carlo algorithm to provide samples from the posterior distribution of tree topologies. Our results show support for four primary mammalian clades, in agreement with recent studies of much larger data sets mainly comprising nuclear DNA. We discuss some issues arising when using Bayesian techniques on RNA sequence data.
Phylogenetic analysis of honey bee behavioral evolution.
Raffiudin, Rika; Crozier, Ross H
2007-05-01
DNA sequences from three mitochondrial (rrnL, cox2, nad2) and one nuclear gene (itpr) from all 9 known honey bee species (Apis), a 10th possible species, Apis dorsata binghami, and three outgroup species (Bombus terrestris, Melipona bicolor and Trigona fimbriata) were used to infer Apis phylogenetic relationships using Bayesian analysis. The dwarf honey bees were confirmed as basal, and the giant and cavity-nesting species were confirmed to be monophyletic. All nodes were strongly supported except the one grouping Apis cerana with A. nigrocincta. Two thousand post-burnin trees from the phylogenetic analysis were used in a Bayesian comparative analysis to explore the evolution of dance type, nest structure, comb structure and dance sound within Apis. The ancestral honey bee species was inferred with high support to have nested in the open, and to have more likely than not had a silent vertical waggle dance and a single comb. The common ancestor of the giant and cavity-dwelling bees is strongly inferred to have had a buzzing vertical directional dance. All pairwise combinations of characters showed strong association, but the multiple comparisons problem reduces the ability to infer associations between states between characters. Nevertheless, a buzzing dance is significantly associated with cavity-nesting, several vertical combs, and dancing vertically, a horizontal dance is significantly associated with a nest with a single comb wrapped around the support, and open nesting with a single pendant comb and a silent waggle dance.
Beier, B-A; Nylander, J A A; Chase, M W; Thulin, M
2004-10-01
Phylogenetic relationships within Fagonia were inferred from analyses of plastid trnL intron and nuclear ribosomal ITS DNA sequences. Sampling of the genus was nearly complete, including 32 of 34 species. Phylogenetic analysis was carried out using parsimony, and Bayesian model averaging. The latter method allows model-based inference while accounting for model-selection uncertainty, and is here used for the first time in phylogenetic analyses. All species of Fagonia in the Old World, except F. cretica, form a weakly supported clade, and all Fagonia species of the New World, except F. scoparia, are well supported as sister to the Old World clade. Fagonia scoparia, from Mexico, and F. cretica, from Northern Africa, are well supported as sisters to all other Fagonia species. Vicariance-dispersal analysis, using DIVA, indicated that the occurrences of Fagonia in South America and southern Africa are due to dispersals, and also, that the ancestor of Fagonia had a distribution compatible with the boreotropics hypothesis.
Brandley, Matthew C; Schmitz, Andreas; Reeder, Tod W
2005-06-01
Partitioned Bayesian analyses of approximately 2.2 kb of nucleotide sequence data (mtDNA) were used to elucidate phylogenetic relationships among 30 scincid lizard genera. Few partitioned Bayesian analyses exist in the literature, resulting in a lack of methods to determine the appropriate number of and identity of partitions. Thus, a criterion, based on the Bayes factor, for selecting among competing partitioning strategies is proposed and tested. Improvements in both mean -lnL and estimated posterior probabilities were observed when specific models and parameter estimates were assumed for partitions of the total data set. This result is expected given that the 95% credible intervals of model parameter estimates for numerous partitions do not overlap and it reveals that different data partitions may evolve quite differently. We further demonstrate that how one partitions the data (by gene, codon position, etc.) is a greater concern than simply the overall number of partitions. Using the criterion of the 2 ln Bayes factor > 10, the phylogenetic analysis employing the largest number of partitions was decisively better than all other strategies. Strategies that partitioned the ND1 gene by codon position performed better than other partition strategies, regardless of the overall number of partitions. Scincidae, Acontinae, Lygosominae, east Asian and North American "Eumeces" + Neoseps; North African Eumeces, Scincus, and Scincopus, and a large group primarily from sub-Saharan Africa, Madagascar, and neighboring islands are monophyletic. Feylinia, a limbless group of previously uncertain relationships, is nested within a "scincine" clade from sub-Saharan Africa. We reject the hypothesis that the nearly limbless dibamids are derived from within the Scincidae, but cannot reject the hypothesis that they represent the sister taxon to skinks. Amphiglossus, Chalcides, the acontines Acontias and Typhlosaurus, and Scincinae are paraphyletic. The globally widespread
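The decision rule quoted in the abstract above, 2 ln Bayes factor > 10, can be sketched as a comparison of log marginal likelihoods. The strategy names and the log marginal likelihood values below are hypothetical; how the marginal likelihoods are estimated is outside the scope of this sketch.

```python
def two_ln_bayes_factor(log_ml_a, log_ml_b):
    """2 ln BF for model A over model B, from their log marginal likelihoods."""
    return 2.0 * (log_ml_a - log_ml_b)


def compare_strategies(log_ml, threshold=10.0):
    """Return the best strategy and, for each rival, (2 ln BF, decisive?)."""
    ranked = sorted(log_ml, key=log_ml.get, reverse=True)
    best = ranked[0]
    report = {}
    for rival in ranked[1:]:
        stat = two_ln_bayes_factor(log_ml[best], log_ml[rival])
        report[rival] = (stat, stat > threshold)
    return best, report


# Hypothetical log marginal likelihoods for three partitioning strategies.
log_ml = {
    "unpartitioned": -25480.3,
    "by_gene": -25102.7,
    "by_gene_and_codon": -24711.9,
}

best, report = compare_strategies(log_ml)
print("preferred strategy:", best)
for rival, (stat, decisive) in report.items():
    print(f"  vs {rival}: 2 ln BF = {stat:.1f}, decisive = {decisive}")
```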
Bayesian analysis of CCDM models
2017-09-01
Creation of Cold Dark Matter (CCDM), in the context of Einstein Field Equations, produces a negative pressure term which can be used to explain the accelerated expansion of the Universe. In this work we tested six different spatially flat models for matter creation using statistical criteria, in light of SNe Ia data: Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC) and Bayesian Evidence (BE). These criteria allow models to be compared in terms of goodness of fit and number of free parameters, penalizing excess complexity. We find that the JO model is slightly favoured over the LJO/ΛCDM model; however, neither of these, nor the Γ = 3αH0 model, can be discarded from the current analysis. Three other scenarios are discarded either because of poor fit or because of an excess of free parameters. A method of increasing Bayesian evidence through reparameterization in order to reduce parameter degeneracy is also developed.
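How AIC and BIC trade goodness of fit against the number of free parameters can be shown with a small sketch. It assumes Gaussian errors, so that -2 ln L at the best fit equals the minimum chi-square up to a constant shared by all models. The model labels, chi-square values and sample size are hypothetical and do not correspond to the six CCDM models of the study.

```python
import math


def aic(chi2_min, n_params):
    """AIC under Gaussian errors: chi2_min + 2k."""
    return chi2_min + 2 * n_params


def bic(chi2_min, n_params, n_data):
    """BIC under Gaussian errors: chi2_min + k ln N."""
    return chi2_min + n_params * math.log(n_data)


def deltas(scores):
    """Differences relative to the best (lowest) score."""
    best = min(scores.values())
    return {model: score - best for model, score in scores.items()}


n_sne = 580  # hypothetical number of supernovae in the sample

# Hypothetical (minimum chi-square, number of free parameters) per model.
models = {"model_A": (562.0, 1), "model_B": (561.5, 2), "model_C": (561.4, 3)}

delta_aic = deltas({m: aic(c, k) for m, (c, k) in models.items()})
delta_bic = deltas({m: bic(c, k, n_sne) for m, (c, k) in models.items()})

for m in models:
    print(f"{m}: dAIC = {delta_aic[m]:.2f}, dBIC = {delta_bic[m]:.2f}")
```

Because ln N exceeds 2 for any sample of more than about seven points, BIC penalises each extra parameter more heavily than AIC does, so a model with a marginally better fit but more parameters is ranked lower under BIC.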
Accurate model selection of relaxed molecular clocks in Bayesian phylogenetics.
Baele, Guy; Li, Wai Lok Sibon; Drummond, Alexei J; Suchard, Marc A; Lemey, Philippe
2013-02-01
Recent implementations of path sampling (PS) and stepping-stone sampling (SS) have been shown to outperform the harmonic mean estimator (HME) and a posterior simulation-based analog of Akaike's information criterion through Markov chain Monte Carlo (AICM), in Bayesian model selection of demographic and molecular clock models. Almost simultaneously, a Bayesian model averaging approach was developed that avoids conditioning on a single model but averages over a set of relaxed clock models. This approach returns estimates of the posterior probability of each clock model through which one can estimate the Bayes factor in favor of the maximum a posteriori (MAP) clock model; however, this Bayes factor estimate may suffer when the posterior probability of the MAP model approaches 1. Here, we compare these two recent developments with the HME, stabilized/smoothed HME (sHME), and AICM, using both synthetic and empirical data. Our comparison shows reassuringly that MAP identification and its Bayes factor provide similar performance to PS and SS and that these approaches considerably outperform HME, sHME, and AICM in selecting the correct underlying clock model. We also illustrate the importance of using proper priors on a large set of empirical data sets.
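The link between posterior model probabilities and the Bayes factor mentioned in this abstract is the ratio of posterior odds to prior odds. The sketch below shows why the estimate degrades as the posterior probability of the MAP model approaches 1. The function names, the sample counts and the prior probability are hypothetical.

```python
import math


def bayes_factor_from_posterior(post_prob, prior_prob):
    """Bayes factor for a model versus the rest: posterior odds / prior odds."""
    posterior_odds = post_prob / (1.0 - post_prob)
    prior_odds = prior_prob / (1.0 - prior_prob)
    return posterior_odds / prior_odds


def bayes_factor_from_visits(n_visits, n_samples, prior_prob):
    """Estimate the Bayes factor from how often the chain visited the model."""
    if n_visits == n_samples:
        # The chain never left the model, so the posterior odds cannot be
        # estimated from this sample; only a lower bound is available.
        return math.inf
    return bayes_factor_from_posterior(n_visits / n_samples, prior_prob)


prior = 0.25  # hypothetical: four clock models with equal prior probability

for visits in (6000, 9900, 9999, 10000):
    bf = bayes_factor_from_visits(visits, 10000, prior)
    print(f"MAP model sampled {visits}/10000 times -> BF estimate {bf:.1f}")
```

With 9999 visits out of 10000 the estimate rests on a single sample spent in the other models, so it is very noisy; with 10000 out of 10000 it is undefined.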
Bayesian analysis of rare events
Straub, Daniel; Papaioannou, Iason; Betz, Wolfgang
2016-06-01
In many areas of engineering and science there is an interest in predicting the probability of rare events, in particular in applications related to safety and security. Increasingly, such predictions are made through computer models of physical systems in an uncertainty quantification framework. Additionally, with advances in IT, monitoring and sensor technology, an increasing amount of data on the performance of the systems is collected. This data can be used to reduce uncertainty, improve the probability estimates and consequently enhance the management of rare events and associated risks. Bayesian analysis is the ideal method to include the data into the probabilistic model. It ensures a consistent probabilistic treatment of uncertainty, which is central in the prediction of rare events, where extrapolation from the domain of observation is common. We present a framework for performing Bayesian updating of rare event probabilities, termed BUS. It is based on a reinterpretation of the classical rejection-sampling approach to Bayesian analysis, which enables the use of established methods for estimating probabilities of rare events. By drawing upon these methods, the framework makes use of their computational efficiency. These methods include the First-Order Reliability Method (FORM), tailored importance sampling (IS) methods and Subset Simulation (SuS). In this contribution, we briefly review these methods in the context of the BUS framework and investigate their applicability to Bayesian analysis of rare events in different settings. We find that, for some applications, FORM can be highly efficient and is surprisingly accurate, enabling Bayesian analysis of rare events with just a few model evaluations. In a general setting, BUS implemented through IS and SuS is more robust and flexible.
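The rejection-sampling view of Bayesian updating that the BUS framework builds on can be sketched with a toy problem. A parameter drawn from the prior is accepted when an auxiliary uniform variable falls below the scaled likelihood, so the acceptance event plays the role of the event whose probability is estimated. The coin-bias model, the data and all names below are hypothetical, and this plain Monte Carlo loop stands in for the FORM, IS and SuS methods named in the abstract.

```python
import random


def likelihood(theta, heads, flips):
    """Binomial likelihood kernel for a coin with head probability theta."""
    return theta ** heads * (1.0 - theta) ** (flips - heads)


def rejection_update(heads, flips, n_prior_draws, rng):
    """Draw from the posterior by accepting prior draws with u <= c * L(theta)."""
    # c is chosen so that c * L(theta) <= 1 everywhere (L peaks at heads/flips).
    c = 1.0 / likelihood(heads / flips, heads, flips)
    accepted = []
    for _ in range(n_prior_draws):
        theta = rng.random()  # uniform prior on (0, 1)
        u = rng.random()      # auxiliary uniform variable
        if u <= c * likelihood(theta, heads, flips):
            accepted.append(theta)
    return accepted


rng = random.Random(0)
posterior_draws = rejection_update(heads=7, flips=10, n_prior_draws=20000, rng=rng)
posterior_mean = sum(posterior_draws) / len(posterior_draws)
acceptance_rate = len(posterior_draws) / 20000

print("posterior mean:", round(posterior_mean, 3))
print("acceptance rate:", round(acceptance_rate, 3))
```

For this conjugate toy case the exact posterior is Beta(8, 4) with mean 8/12, which gives a check on the sampler. When the data are informative the acceptance rate becomes small, which is where rare-event methods are meant to replace the plain loop.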
Fair-balance paradox, star-tree paradox, and Bayesian phylogenetics.
Yang, Ziheng
2007-08-01
The star-tree paradox refers to the conjecture that the posterior probabilities for the three unrooted trees for four species (or the three rooted trees for three species if the molecular clock is assumed) do not approach 1/3 when the data are generated using the star tree and when the amount of data approaches infinity. It reflects the more general phenomenon of high and presumably spurious posterior probabilities for trees or clades produced by the Bayesian method of phylogenetic reconstruction, and it is perceived to be a manifestation of the deeper problem of the extreme sensitivity of Bayesian model selection to the prior on parameters. Analysis of the star-tree paradox has been hampered by the intractability of the integrals involved. In this article, I use Laplacian expansion to approximate the posterior probabilities for the three rooted trees for three species using binary characters evolving at a constant rate. The approximation enables calculation of posterior tree probabilities for arbitrarily large data sets. Both theoretical analysis of the analogous fair-coin and fair-balance problems and computer simulation for the tree problem confirmed the existence of the star-tree paradox. When the data size n → ∞, the posterior tree probabilities do not converge to 1/3 each, but they vary among data sets according to a statistical distribution. This distribution is characterized. Two strategies for resolving the star-tree paradox are explored: (1) a nonzero prior probability for the degenerate star tree and (2) an increasingly informative prior forcing the internal branch length toward zero. Both appear to be effective in resolving the paradox, but the latter is simpler to implement. The posterior tree probabilities are found to be very sensitive to the prior.
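The abstract names a fair-coin analogue without spelling it out. The sketch below assumes one common formulation: data come from a fair coin, the two hypotheses are "head probability below 1/2" and "head probability above 1/2", and the prior on the head probability is uniform. Under these assumptions the posterior probability of the first hypothesis has a closed form as a binomial tail, and its spread across simulated data sets does not shrink as the number of flips grows. All names and simulation settings are hypothetical.

```python
import random
from math import comb


def posterior_prob_below_half(heads, flips):
    """P(theta < 1/2 | data) under a uniform prior on theta.

    The posterior is Beta(heads + 1, flips - heads + 1); for integer
    parameters its CDF at 1/2 equals P(Binomial(flips + 1, 1/2) >= heads + 1).
    """
    tail = sum(comb(flips + 1, j) for j in range(heads + 1, flips + 2))
    return tail / 2 ** (flips + 1)


def spread_of_posterior(flips, replicates, rng):
    """Standard deviation of the posterior probability across fair-coin data sets."""
    probs = []
    for _ in range(replicates):
        heads = sum(rng.random() < 0.5 for _ in range(flips))
        probs.append(posterior_prob_below_half(heads, flips))
    mean = sum(probs) / replicates
    return (sum((p - mean) ** 2 for p in probs) / replicates) ** 0.5


rng = random.Random(1)
spread_small = spread_of_posterior(flips=20, replicates=300, rng=rng)
spread_large = spread_of_posterior(flips=500, replicates=300, rng=rng)

print("spread with 20 flips: ", round(spread_small, 3))
print("spread with 500 flips:", round(spread_large, 3))
```

If the posterior probabilities converged to 1/2 with more data, the spread would fall toward zero; instead it stays close to that of a uniform distribution, mirroring the behaviour the abstract reports for the three trees.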
Bayesian Model Averaging for Propensity Score Analysis
Kaplan, David; Chen, Jianshen
2013-01-01
The purpose of this study is to explore Bayesian model averaging in the propensity score context. Previous research on Bayesian propensity score analysis does not take into account model uncertainty. In this regard, an internally consistent Bayesian framework for model building and estimation must also account for model uncertainty. The…
Geometric ergodicity of a hybrid sampler for Bayesian inference of phylogenetic branch lengths.
Spade, David A; Herbei, Radu; Kubatko, Laura S
2015-10-01
One of the fundamental goals in phylogenetics is to make inferences about the evolutionary pattern among a group of individuals, such as genes or species, using present-day genetic material. This pattern is represented by a phylogenetic tree, and as computational methods have caught up to the statistical theory, Bayesian methods of making inferences about phylogenetic trees have become increasingly popular. Bayesian inference of phylogenetic trees requires sampling from intractable probability distributions. Common methods of sampling from these distributions include Markov chain Monte Carlo (MCMC) and Sequential Monte Carlo (SMC) methods, and one way that both of these methods can proceed is by first simulating a tree topology and then taking a sample from the posterior distribution of the branch lengths given the tree topology and the data set. In many MCMC methods, it is difficult to verify that the underlying Markov chain is geometrically ergodic, and thus, it is necessary to rely on output-based convergence diagnostics in order to assess convergence on an ad hoc basis. These diagnostics suffer from several important limitations, so in an effort to circumvent these limitations, this work establishes geometric convergence for a particular Markov chain that is used to sample branch lengths under a fairly general class of nucleotide substitution models and provides a numerical method for estimating the time this Markov chain takes to converge.
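Sampling a branch length from its posterior given a fixed topology can be illustrated with a toy case: two sequences, the Jukes-Cantor (JC69) model, and an exponential prior on the distance. The sketch uses a plain random-walk Metropolis sampler, not the hybrid sampler whose geometric ergodicity the article establishes, and the site counts, prior mean and tuning values are hypothetical.

```python
import math
import random


def log_posterior(t, n_sites, n_diff, prior_mean):
    """Unnormalised log posterior of the JC69 distance t between two sequences."""
    if t <= 0.0:
        return -math.inf
    p = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))  # probability that a site differs
    if p <= 0.0:
        return -math.inf
    log_lik = n_diff * math.log(p) + (n_sites - n_diff) * math.log(1.0 - p)
    log_prior = -t / prior_mean - math.log(prior_mean)  # exponential prior
    return log_lik + log_prior


def metropolis(n_iter, step, start, rng, **data):
    """Random-walk Metropolis with a sliding window reflected at zero."""
    t = start
    lp = log_posterior(t, **data)
    samples = []
    for _ in range(n_iter):
        proposal = abs(t + rng.uniform(-step, step))  # symmetric proposal
        lp_new = log_posterior(proposal, **data)
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            t, lp = proposal, lp_new
        samples.append(t)
    return samples


rng = random.Random(42)
chain = metropolis(20000, step=0.03, start=0.1, rng=rng,
                   n_sites=948, n_diff=90, prior_mean=0.2)
kept = chain[2000:]  # discard a burn-in chosen by eye, not by theory
posterior_mean = sum(kept) / len(kept)
print("posterior mean distance:", round(posterior_mean, 4))
```

The burn-in length here is an ad hoc choice, which is the kind of output-based judgement the abstract contrasts with a provable convergence rate.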
A Bayesian approach to inferring the phylogenetic structure of communities from metagenomic data.
O'Brien, John D; Didelot, Xavier; Iqbal, Zamin; Amenga-Etego, Lucas; Ahiska, Bartu; Falush, Daniel
2014-07-01
Metagenomics provides a powerful new tool set for investigating evolutionary interactions with the environment. However, an absence of model-based statistical methods means that researchers are often not able to make full use of this complex information. We present a Bayesian method for inferring the phylogenetic relationship among related organisms found within metagenomic samples. Our approach exploits variation in the frequency of taxa among samples to simultaneously infer each lineage haplotype, the phylogenetic tree connecting them, and their frequency within each sample. Applications of the algorithm to simulated data show that our method can recover a substantial fraction of the phylogenetic structure even in the presence of high rates of migration among sample sites. We provide examples of the method applied to data from green sulfur bacteria recovered from an Antarctic lake, plastids from mixed Plasmodium falciparum infections, and virulent Neisseria meningitidis samples. Copyright © 2014 by the Genetics Society of America.
Phylogenetic analysis of adenovirus sequences.
Harrach, Balázs; Benko, Mária
2007-01-01
Members of the family Adenoviridae have been isolated from a large variety of hosts, including representatives from every major vertebrate class from fish to mammals. The high prevalence, together with the fairly conserved organization of the central part of their genomes, make the adenoviruses one of (if not the) best models for studying viral evolution on a larger time scale. Phylogenetic calculation can infer the evolutionary distance among adenovirus strains on serotype, species, and genus levels, thus helping the establishment of a correct taxonomy on the one hand, and speeding up the process of typing new isolates on the other. Initially, four major lineages corresponding to four genera were recognized. Later, the demarcation criteria of lower taxon levels, such as species or types, could also be defined with phylogenetic calculations. A limited number of possible host switches have been hypothesized and convincingly supported. Application of the web-based BLAST and MultAlin programs and the freely available PHYLIP package, along with the TreeView program, enables everyone to make correct calculations. In addition to step-by-step instruction on how to perform phylogenetic analysis, critical points where typical mistakes or misinterpretation of the results might occur will be identified and hints for their avoidance will be provided.
Estimating the Effective Sample Size of Tree Topologies from Bayesian Phylogenetic Analyses
Lanfear, Robert; Hua, Xia; Warren, Dan L.
2016-01-01
Bayesian phylogenetic analyses estimate posterior distributions of phylogenetic tree topologies and other parameters using Markov chain Monte Carlo (MCMC) methods. Before making inferences from these distributions, it is important to assess their adequacy. To this end, the effective sample size (ESS) estimates how many truly independent samples of a given parameter the output of the MCMC represents. The ESS of a parameter is frequently much lower than the number of samples taken from the MCMC because sequential samples from the chain can be non-independent due to autocorrelation. Typically, phylogeneticists use a rule of thumb that the ESS of all parameters should be greater than 200. However, we have no method to calculate an ESS of tree topology samples, despite the fact that the tree topology is often the parameter of primary interest and is almost always central to the estimation of other parameters. That is, we lack a method to determine whether we have adequately sampled one of the most important parameters in our analyses. In this study, we address this problem by developing methods to estimate the ESS for tree topologies. We combine these methods with two new diagnostic plots for assessing posterior samples of tree topologies, and compare their performance on simulated and empirical data sets. Combined, the methods we present provide new ways to assess the mixing and convergence of phylogenetic tree topologies in Bayesian MCMC analyses. PMID:27435794
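The ESS idea in the abstract above can be illustrated for an ordinary scalar parameter. The following is a minimal sketch, not the authors' tree-topology method: it assumes the common estimator ESS = N / (1 + 2 Σ ρ_k), with the autocorrelation sum truncated at the first non-positive lag. The function names and the AR(1) test chains are hypothetical and used only for illustration.

```python
import random


def autocorrelation(x, lag):
    """Sample autocorrelation of the sequence x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    if var == 0.0:
        return 0.0
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var


def effective_sample_size(x, max_lag=None):
    """ESS = N / (1 + 2 * sum of positive-lag autocorrelations).

    The sum is truncated at the first non-positive autocorrelation,
    which is one simple (assumed) truncation rule among several.
    """
    n = len(x)
    if max_lag is None:
        max_lag = n // 2
    rho_sum = 0.0
    for lag in range(1, max_lag):
        rho = autocorrelation(x, lag)
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


def ar1_chain(n, phi, seed=0):
    """Hypothetical stand-in for an MCMC trace: x_t = phi * x_{t-1} + noise."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


sticky = ar1_chain(2000, phi=0.9, seed=1)  # strongly autocorrelated trace
loose = ar1_chain(2000, phi=0.0, seed=1)   # independent draws
ess_sticky = effective_sample_size(sticky)
ess_loose = effective_sample_size(loose)
print(round(ess_sticky), round(ess_loose))
```

Under these assumptions the autocorrelated trace yields a much smaller ESS than the independent one, even though both contain 2000 samples. Tree topologies are not scalars, which is why the abstract describes dedicated methods for them.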
Is probabilistic bias analysis approximately Bayesian?
MacLehose, Richard F.; Gustafson, Paul
2011-01-01
Case-control studies are particularly susceptible to differential exposure misclassification when exposure status is determined following incident case status. Probabilistic bias analysis methods have been developed as ways to adjust standard effect estimates based on the sensitivity and specificity of exposure misclassification. The iterative sampling method advocated in probabilistic bias analysis bears a distinct resemblance to a Bayesian adjustment; however, it is not identical. Furthermore, without a formal theoretical framework (Bayesian or frequentist), the results of a probabilistic bias analysis remain somewhat difficult to interpret. We describe, both theoretically and empirically, the extent to which probabilistic bias analysis can be viewed as approximately Bayesian. While the differences between probabilistic bias analysis and Bayesian approaches to misclassification can be substantial, these situations often involve unrealistic prior specifications and are relatively easy to detect. Outside of these special cases, probabilistic bias analysis and Bayesian approaches to exposure misclassification in case-control studies appear to perform equally well. PMID:22157311
RWTY (R We There Yet): An R package for examining convergence of Bayesian phylogenetic analyses.
Warren, Dan L; Geneva, Anthony J; Lanfear, Robert
2017-01-12
Bayesian inference using Markov chain Monte Carlo (MCMC) has become one of the primary methods used to infer phylogenies from sequence data. Assessing convergence is a crucial component of these analyses, as it establishes the reliability of the posterior distribution estimates of the tree topology and model parameters sampled from the MCMC. Numerous tests and visualizations have been developed for this purpose, but many of the most popular methods are implemented in ways that make them inconvenient to use for large data sets. RWTY is an R package that implements established and new methods for diagnosing phylogenetic MCMC convergence in a single convenient interface.
Bayesian Statistics for Biological Data: Pedigree Analysis
ERIC Educational Resources Information Center
Stanfield, William D.; Carlton, Matthew A.
2004-01-01
The use of Bayes' formula is applied to the biological problem of pedigree analysis to show that the Bayes' formula and non-Bayesian or "classical" methods of probability calculation give different answers. First year college students of biology can be introduced to the Bayesian statistics.
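A small worked case may help show what applying Bayes' formula to a pedigree looks like. The scenario below is a hypothetical textbook-style example, not taken from the article: a woman has prior probability 1/2 of carrying an X-linked recessive allele, each son of a carrier is unaffected with probability 1/2, and new mutations are ignored.

```python
from fractions import Fraction


def posterior_carrier(prior_carrier, n_unaffected_sons):
    """Posterior probability that a woman is a carrier, given unaffected sons.

    Assumes an X-linked recessive trait and no new mutations, so a
    non-carrier mother always has unaffected sons.
    """
    like_carrier = Fraction(1, 2) ** n_unaffected_sons
    like_noncarrier = Fraction(1)
    numerator = prior_carrier * like_carrier
    denominator = numerator + (1 - prior_carrier) * like_noncarrier
    return numerator / denominator


prior = Fraction(1, 2)  # e.g. her mother is a known carrier
post = posterior_carrier(prior, 2)
print(post)  # 1/5
```

With two unaffected sons the carrier probability drops from 1/2 to 1/5; a calculation that ignores the sons would leave it at 1/2.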
bmcmc: MCMC package for Bayesian data analysis
Sharma, Sanjib
2017-09-01
bmcmc is a general purpose Markov Chain Monte Carlo package for Bayesian data analysis. It uses an adaptive scheme for automatic tuning of proposal distributions. It can also handle Bayesian hierarchical models by making use of the Metropolis-Within-Gibbs scheme.
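For readers unfamiliar with the scheme named above, the following is a minimal sketch of Metropolis-within-Gibbs with a simple adaptive step size. It does not use or reproduce the bmcmc API; the target density, the acceptance-rate goal of 0.44, the batch size of 50, and all function names are assumptions chosen for illustration.

```python
import math
import random


def log_target(theta):
    """Hypothetical target: zero-mean bivariate normal with correlation 0.8."""
    x, y = theta
    rho = 0.8
    return -(x * x - 2.0 * rho * x * y + y * y) / (2.0 * (1.0 - rho * rho))


def metropolis_within_gibbs(log_p, start, n_iter, n_adapt, seed=0):
    """Update one coordinate at a time with a random-walk Metropolis step.

    Proposal scales are tuned only during the first n_adapt sweeps, so the
    retained samples come from a fixed (non-adaptive) Markov chain.
    """
    rng = random.Random(seed)
    theta = list(start)
    dim = len(theta)
    scale = [1.0] * dim
    accepted = [0] * dim
    current = log_p(theta)
    samples = []
    for it in range(1, n_iter + 1):
        for j in range(dim):
            proposal = list(theta)
            proposal[j] += rng.gauss(0.0, scale[j])
            candidate = log_p(proposal)
            # Metropolis acceptance probability min(1, p(proposal) / p(theta)).
            if rng.random() < math.exp(min(0.0, candidate - current)):
                theta, current = proposal, candidate
                accepted[j] += 1
        if it <= n_adapt and it % 50 == 0:
            for j in range(dim):
                rate = accepted[j] / 50.0
                # Widen the proposal if accepting too often, narrow it otherwise.
                scale[j] *= math.exp(rate - 0.44)
                accepted[j] = 0
        if it > n_adapt:
            samples.append(tuple(theta))
    return samples, scale


samples, scale = metropolis_within_gibbs(log_target, (0.0, 0.0), 6000, 1000, seed=42)
mean_x = sum(s[0] for s in samples) / len(samples)
print(len(samples), [round(s, 2) for s in scale], round(mean_x, 2))
```

Restricting adaptation to the burn-in phase is a deliberate simplification; packages that adapt throughout the run need additional conditions to keep the sampler valid.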
Phylogenetic analysis of the spirochetes.
Paster, B J; Dewhirst, F E; Weisburg, W G; Tordoff, L A; Fraser, G J; Hespell, R B; Stanton, T B; Zablen, L; Mandelco, L; Woese, C R
1991-10-01
The 16S rRNA sequences were determined for species of Spirochaeta, Treponema, Borrelia, Leptospira, Leptonema, and Serpula, using a modified Sanger method of direct RNA sequencing. Analysis of aligned 16S rRNA sequences indicated that the spirochetes form a coherent taxon composed of six major clusters or groups. The first group, termed the treponemes, was divided into two subgroups. The first treponeme subgroup consisted of Treponema pallidum, Treponema phagedenis, Treponema denticola, a thermophilic spirochete strain, and two species of Spirochaeta, Spirochaeta zuelzerae and Spirochaeta stenostrepta, with an average interspecies similarity of 89.9%. The second treponeme subgroup contained Treponema bryantii, Treponema pectinovorum, Treponema saccharophilum, Treponema succinifaciens, and rumen strain CA, with an average interspecies similarity of 86.2%. The average interspecies similarity between the two treponeme subgroups was 84.2%. The division of the treponemes into two subgroups was verified by single-base signature analysis. The second spirochete group contained Spirochaeta aurantia, Spirochaeta halophila, Spirochaeta bajacaliforniensis, Spirochaeta litoralis, and Spirochaeta isovalerica, with an average similarity of 87.4%. The Spirochaeta group was related to the treponeme group, with an average similarity of 81.9%. The third spirochete group contained borrelias, including Borrelia burgdorferi, Borrelia anserina, Borrelia hermsii, and a rabbit tick strain. The borrelias formed a tight phylogenetic cluster, with average similarity of 97%. The borrelia group shared a common branch with the Spirochaeta group and was closer to this group than to the treponemes. A single spirochete strain isolated from the shrew constituted the fourth group. The fifth group was composed of strains of Serpula (Treponema) hyodysenteriae and Serpula (Treponema) innocens. The two species of this group were closely related, with a similarity of greater than 99%. Leptonema illini
Bayesian phylogeny analysis via stochastic approximation Monte Carlo.
Cheon, Sooyoung; Liang, Faming
2009-11-01
Monte Carlo methods have received much attention in the recent literature on phylogeny analysis. However, conventional Markov chain Monte Carlo algorithms, such as the Metropolis-Hastings algorithm, tend to get trapped in a local mode when simulating from the posterior distribution of phylogenetic trees, rendering the inference ineffective. In this paper, we apply an advanced Monte Carlo algorithm, the stochastic approximation Monte Carlo (SAMC) algorithm, to Bayesian phylogeny analysis. Our method is compared with two popular Bayesian phylogeny software packages, BAMBE and MrBayes, on simulated and real datasets. The numerical results indicate that our method outperforms BAMBE and MrBayes. Among the three methods, SAMC produces the consensus trees that have the highest similarity to the true trees and the model parameter estimates that have the smallest mean square errors, while costing the least CPU time.
Molecular phylogenetic analysis of the Papionina using concatenation and species tree methods.
Guevara, Elaine E; Steiper, Michael E
2014-01-01
The Papionina is a geographically widespread subtribe of African cercopithecid monkeys whose evolutionary history is of particular interest to anthropologists. The phylogenetic relationships among arboreal mangabeys (Lophocebus), baboons (Papio), and geladas (Theropithecus) remain unresolved. Molecular phylogenetic analyses have revealed marked gene tree incongruence for these taxa, and several recent concatenated phylogenetic analyses of multilocus datasets have supported different phylogenetic hypotheses. To address this issue, we investigated the phylogeny of the Lophocebus + Papio + Theropithecus group using concatenation methods, as well as alternative methods that incorporate gene tree heterogeneity to estimate a 'species tree.' Our compiled DNA sequence dataset was ∼56 kb pairs long and included 57 independent partitions. All analyses of concatenated alignments strongly supported a Lophocebus + Papio clade and a basal position for Theropithecus. The Bayesian concordance analysis supported the same phylogeny. A coalescent-based Bayesian method resulted in a very poorly resolved species tree. The topological agreement between concatenation and the Bayesian concordance analysis offers considerable support for a Lophocebus + Papio clade as the dominant relationship across the genome. However, the results of the Bayesian concordance analysis indicate that almost half the genome has an alternative history. As such, our results offer a well-supported phylogenetic hypothesis for the Papio/Lophocebus/Theropithecus trichotomy, while at the same time providing evidence for a complex evolutionary history that likely includes hybridization among lineages.
Bayesian robust principal component analysis.
Ding, Xinghao; He, Lihan; Carin, Lawrence
2011-12-01
A hierarchical Bayesian model is considered for decomposing a matrix into low-rank and sparse components, assuming the observed matrix is a superposition of the two. The matrix is assumed noisy, with unknown and possibly non-stationary noise statistics. The Bayesian framework infers an approximate representation for the noise statistics while simultaneously inferring the low-rank and sparse-outlier contributions; the model is robust to a broad range of noise levels, without having to change model hyperparameter settings. In addition, the Bayesian framework allows exploitation of additional structure in the matrix. For example, in video applications each row (or column) corresponds to a video frame, and we introduce a Markov dependency between consecutive rows in the matrix (corresponding to consecutive frames in the video). The properties of this Markov process are also inferred based on the observed matrix, while simultaneously denoising and recovering the low-rank and sparse components. We compare the Bayesian model to a state-of-the-art optimization-based implementation of robust PCA; considering several examples, we demonstrate competitive performance of the proposed model.
Domino effect analysis using Bayesian networks.
Khakzad, Nima; Khan, Faisal; Amyotte, Paul; Cozzani, Valerio
2013-02-01
A new methodology based on Bayesian networks is introduced both to model domino effect propagation patterns and to estimate the domino effect probability at different levels. The flexible structure and the unique modeling techniques offered by Bayesian networks make it possible to analyze domino effects through a probabilistic framework, considering synergistic effects, noisy probabilities, and common cause failures. Further, the uncertainties and the complex interactions among the domino effect components are captured using Bayesian networks. The probabilities of events are updated in the light of new information, and the most probable path of the domino effect is determined on the basis of the new data gathered. This study shows how probability updating helps to update the domino effect model either qualitatively or quantitatively. The methodology is applied to a hypothetical example and also to an earlier-studied case study. These examples accentuate the effectiveness of Bayesian networks in modeling domino effects in processing facilities. © 2012 Society for Risk Analysis.
Bayesian analysis for kaon photoproduction
Marsainy, T.; Mart, T.
2014-09-25
We have investigated the contribution of the nucleon resonances in the kaon photoproduction process by using an established statistical decision-making method, i.e., the Bayesian method. This method not only evaluates the model over its entire parameter space, but also takes the prior information and experimental data into account. The result indicates that certain resonances have larger probabilities to contribute to the process.
A Gentle Introduction to Bayesian Analysis: Applications to Developmental Research
ERIC Educational Resources Information Center
van de Schoot, Rens; Kaplan, David; Denissen, Jaap; Asendorpf, Jens B.; Neyer, Franz J.; van Aken, Marcel A. G.
2014-01-01
Bayesian statistical methods are becoming ever more popular in applied and fundamental research. In this study a gentle introduction to Bayesian analysis is provided. It is shown under what circumstances it is attractive to use Bayesian estimation, and how to interpret properly the results. First, the ingredients underlying Bayesian methods are…
Bayesian Meta-Analysis of Coefficient Alpha
Brannick, Michael T.; Zhang, Nanhua
2013-01-01
The current paper describes and illustrates a Bayesian approach to the meta-analysis of coefficient alpha. Alpha is the most commonly used estimate of the reliability or consistency (freedom from measurement error) for educational and psychological measures. The conventional approach to meta-analysis uses inverse variance weights to combine…
An Integrated Bayesian Model for DIF Analysis
ERIC Educational Resources Information Center
Soares, Tufi M.; Goncalves, Flavio B.; Gamerman, Dani
2009-01-01
In this article, an integrated Bayesian model for differential item functioning (DIF) analysis is proposed. The model is integrated in the sense of modeling the responses along with the DIF analysis. This approach allows DIF detection and explanation in a simultaneous setup. Previous empirical studies and/or subjective beliefs about the item…
Gowri-Shankar, Vivek; Rattray, Magnus
2007-06-01
Nonhomogeneous substitution models have been introduced for phylogenetic inference when the substitution process is nonstationary, for example, when sequence composition differs between lineages. Existing models can have many parameters, and it is then difficult and computationally expensive to learn the parameters and to select the optimal model complexity. We extend an existing nonhomogeneous substitution model by introducing a reversible jump Markov chain Monte Carlo method for efficient Bayesian inference of the model order along with other phylogenetic parameters of interest. We also introduce a new hierarchical prior which leads to more reasonable results when only a small number of lineages share a particular substitution process. The method is implemented in the PHASE software, which includes specialized substitution models for RNA genes with conserved secondary structure. We apply an RNA-specific nonhomogeneous model to a structure-based alignment of rRNA sequences spanning the entire tree of life. A previous study of the same genes from a similar set of species found robust evidence for a mesophilic last universal common ancestor (LUCA) by inference of the G+C composition at the root of the tree. In the present study, we find that the helical GC composition at the root is strongly dependent on the root position. With a bacterial rooting, we find that there is no longer strong support for either a mesophile or a thermophile LUCA, although a hyperthermophile LUCA remains unlikely. We discuss reasons why results using only RNA helices may differ from results using all aligned sites when applying nonhomogeneous models to RNA genes.
Bayesian phylogenetic model selection using reversible jump Markov chain Monte Carlo.
Huelsenbeck, John P; Larget, Bret; Alfaro, Michael E
2004-06-01
A common problem in molecular phylogenetics is choosing a model of DNA substitution that does a good job of explaining the DNA sequence alignment without introducing superfluous parameters. A number of methods have been used to choose among a small set of candidate substitution models, such as the likelihood ratio test, the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and Bayes factors. Current implementations of any of these criteria suffer from the limitation that only a small set of models are examined, or that the test does not allow easy comparison of non-nested models. In this article, we expand the pool of candidate substitution models to include all possible time-reversible models. This set includes seven models that have already been described. We show how Bayes factors can be calculated for these models using reversible jump Markov chain Monte Carlo, and apply the method to 16 DNA sequence alignments. For each data set, we compare the model with the best Bayes factor to the best models chosen using AIC and BIC. We find that the best model under any of these criteria is not necessarily the most complicated one; models with an intermediate number of substitution types typically do best. Moreover, almost all of the models that are chosen as best do not constrain a transition rate to be the same as a transversion rate, suggesting that it is the transition/transversion rate bias that plays the largest role in determining which models are selected. Importantly, the reversible jump Markov chain Monte Carlo algorithm described here allows estimation of phylogeny (and other phylogenetic model parameters) to be performed while accounting for uncertainty in the model of DNA substitution.
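Two of the criteria mentioned in the abstract above have simple closed forms: AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of free parameters, n the number of sites, and ln L the maximized log-likelihood. The sketch below applies them to three substitution models. The log-likelihood values and alignment length are hypothetical, and the parameter counts cover only the substitution model (branch lengths are assumed to be shared across candidates).

```python
import math


def aic(log_lik, k):
    """Akaike Information Criterion: lower is better."""
    return 2.0 * k - 2.0 * log_lik


def bic(log_lik, k, n):
    """Bayesian Information Criterion: lower is better."""
    return k * math.log(n) - 2.0 * log_lik


n_sites = 1000  # hypothetical alignment length

# model name -> (hypothetical maximized log-likelihood, free substitution parameters)
candidates = {
    "JC69": (-3250.0, 0),
    "HKY85": (-3180.0, 4),
    "GTR": (-3176.5, 8),
}

aic_scores = {m: aic(ll, k) for m, (ll, k) in candidates.items()}
bic_scores = {m: bic(ll, k, n_sites) for m, (ll, k) in candidates.items()}
best_aic = min(aic_scores, key=aic_scores.get)
best_bic = min(bic_scores, key=bic_scores.get)
print(best_aic, best_bic)
```

With these made-up numbers both criteria prefer the intermediate model, which illustrates how the penalty terms can outweigh a small likelihood gain from extra parameters. The reversible jump approach in the abstract instead averages over models within a single MCMC run and is not shown here.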
Mesoamerican tree squirrels evolution (Rodentia: Sciuridae): a molecular phylogenetic analysis.
Villalobos, Federico; Gutierrez-Espeleta, Gustavo
2014-06-01
The tribe Sciurini comprises the genera Sciurus, Syntheosciurus, Microsciurus, Tamiasciurus and Rheithrosciurus. The phylogenetic relationships within Sciurus have been only partially resolved, and the relationships among Mesoamerican species remain unresolved. The phylogenetic relationships of the Mesoamerican tree squirrels were examined using molecular data. Publicly available sequence data (12S, 16S, CYTB mitochondrial genes and IRBP nuclear gene) and cytochrome B gene sequences of four previously unsampled Mesoamerican Sciurus species were analyzed under a Bayesian multispecies coalescent model. Phylogenetic analysis of the multilocus data set showed the neotropical tree squirrels as a monophyletic clade. The genus Sciurus was paraphyletic due to the inclusion of Microsciurus species (M. alfari and M. flaviventer). The South American species S. aestuans and S. stramineus showed a sister-taxon relationship. Single-locus analysis based on the most compact and complete data set (i.e., CYTB gene sequences) supported the monophyly of the South American species and recovered a Mesoamerican clade including S. aureogaster, S. granatensis and S. variegatoides. These results corroborated previous findings based on cladistic analysis of cranial and post-cranial characters. Our data support a close relationship among Mesoamerican Sciurus species and a sister relationship with South American species, and corroborate previous findings regarding the polyphyly of Microsciurus and the paraphyly of Syntheosciurus.
Bayesian Analysis of Savings from Retrofit Projects
Im, Piljae
2012-01-01
Estimates of savings from retrofit projects depend on statistical models, but because of the complicated analysis required to determine the uncertainty of the estimates, savings uncertainty is not often considered. Numerous simplified methods have been proposed to determine savings uncertainty, but in all but the simplest cases, these methods provide approximate results only. The objective of this paper is to show that Bayesian inference provides a consistent framework for estimating savings and savings uncertainty in retrofit projects. We review the mathematical background of Bayesian inference and Bayesian regression, and present two examples of estimating savings and savings uncertainty in retrofit projects. The first is a simple case where both baseline and post-retrofit monthly natural gas use can be modeled as a linear function of monthly heating degree days. The Efficiency Valuation Organization (EVO 2007) defines two methods of determining savings in such cases: reporting period savings, which is an estimate of the savings during the post-retrofit period; and normalized savings, which is an estimate of the savings that would be obtained during a typical year at the project site. For reporting period savings, classical statistical analysis provides exact analytic results for both savings and savings uncertainty in this case. We use Bayesian analysis to calculate reporting period savings and savings uncertainty and show that the results are identical to the analytical results. For normalized savings, the literature contains no exact expression for the uncertainty of normalized savings; we use Bayesian inference to calculate this quantity for the first time, and compare it with the result of an approximate formula that has been proposed. The second example concerns a problem where the baseline data exhibit nonlinearity and serial autocorrelation, both of which are common in real-world retrofit projects. No analytical solutions exist to determine savings or
Model Averaging and Bayes Factor Calculation of Relaxed Molecular Clocks in Bayesian Phylogenetics
Li, Wai Lok Sibon; Drummond, Alexei J.
2012-01-01
We describe a procedure for model averaging of relaxed molecular clock models in Bayesian phylogenetics. Our approach allows us to model the distribution of rates of substitution across branches, averaged over a set of models, rather than conditioned on a single model. We implement this procedure and test it on simulated data to show that our method can accurately recover the true underlying distribution of rates. We applied the method to a set of alignments taken from a data set of 12 mammalian species and uncovered evidence that lognormally distributed rates better describe this data set than do exponentially distributed rates. Additionally, our implementation of model averaging permits accurate calculation of the Bayes factor(s) between two or more relaxed molecular clock models. Finally, we introduce a new computational approach for sampling rates of substitution across branches that improves the convergence of our Markov chain Monte Carlo algorithms in this context. Our methods are implemented under the BEAST 1.6 software package, available at http://beast-mcmc.googlecode.com. PMID:21940644
The importance of data partitioning and the utility of Bayes factors in Bayesian phylogenetics.
Brown, Jeremy M; Lemmon, Alan R
2007-08-01
As larger, more complex data sets are being used to infer phylogenies, accuracy of these phylogenies increasingly requires models of evolution that accommodate heterogeneity in the processes of molecular evolution. We investigated the effect of improper data partitioning on phylogenetic accuracy, as well as the type I error rate and sensitivity of Bayes factors, a commonly used method for choosing among different partitioning strategies in Bayesian analyses. We also used Bayes factors to test empirical data for the need to divide data in a manner that has no expected biological meaning. Posterior probability estimates are misleading when an incorrect partitioning strategy is assumed. The error was greatest when the assumed model was underpartitioned. These results suggest that model partitioning is important for large data sets. Bayes factors performed well, giving a 5% type I error rate, which is remarkably consistent with standard frequentist hypothesis tests. The sensitivity of Bayes factors was found to be quite high when the across-class model heterogeneity reflected that of empirical data. These results suggest that Bayes factors represent a robust method of choosing among partitioning strategies. Lastly, results of tests for the inclusion of unexpected divisions in empirical data mirrored the simulation results, although the outcome of such tests is highly dependent on accounting for rate variation among classes. We conclude by discussing other approaches for partitioning data, as well as other applications of Bayes factors.
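As a rough illustration of how Bayes factors are used to choose among partitioning strategies, the sketch below compares hypothetical marginal log-likelihoods. It assumes the marginal likelihoods have already been estimated by some other means; the strategy names and numbers are invented, and the statistic 2 ln BF is simply twice the difference in marginal log-likelihood.

```python
def two_ln_bayes_factor(log_ml_1, log_ml_0):
    """2 ln BF in favour of model 1 over model 0, from marginal log-likelihoods."""
    return 2.0 * (log_ml_1 - log_ml_0)


# Hypothetical marginal log-likelihoods for three partitioning strategies.
log_marginal = {
    "unpartitioned": -10450.2,
    "by_gene": -10390.7,
    "by_codon_position": -10312.4,
}

best = max(log_marginal, key=log_marginal.get)
support = {
    name: two_ln_bayes_factor(log_marginal[best], lml)
    for name, lml in log_marginal.items()
    if name != best
}
print(best, support)
```

Larger positive values of 2 ln BF indicate stronger support for the best strategy over the alternative; the threshold used to call a difference decisive is a convention that should be stated alongside any such comparison.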
Heterogeneous Factor Analysis Models: A Bayesian Approach.
Ansari, Asim; Jedidi, Kamel; Dube, Laurette
2002-01-01
Developed Markov Chain Monte Carlo procedures to perform Bayesian inference, model checking, and model comparison in heterogeneous factor analysis. Tested the approach with synthetic data and data from a consumption emotion study involving 54 consumers. Results show that traditional psychometric methods cannot fully capture the heterogeneity in…
A Bayesian Approach for Multigroup Nonlinear Factor Analysis.
Song, Xin-Yuan; Lee, Sik-Yum
2002-01-01
Developed a Bayesian approach for a general multigroup nonlinear factor analysis model that simultaneously obtains joint Bayesian estimates of the factor scores and the structural parameters subjected to some constraints across different groups. (SLD)
Phylogenetic analysis of Maverick/Polinton giant transposons across organisms.
Haapa-Paananen, Saija; Wahlberg, Niklas; Savilahti, Harri
2014-09-01
Bayesian Correlation Analysis for Sequence Count Data
Sánchez-Taltavull, Daniel; Ramachandran, Parameswaran; Lau, Nelson; Perkins, Theodore J.
2016-01-01
Evaluating the similarity of different measured variables is a fundamental task of statistics, and a key part of many bioinformatics algorithms. Here we propose a Bayesian scheme for estimating the correlation between different entities' measurements based on high-throughput sequencing data. These entities could be different genes or miRNAs whose expression is measured by RNA-seq, different transcription factors or histone marks whose expression is measured by ChIP-seq, or even combinations of different types of entities. Our Bayesian formulation accounts for both measured signal levels and uncertainty in those levels, due to varying sequencing depth in different experiments and to varying absolute levels of individual entities, both of which affect the precision of the measurements. In comparison with a traditional Pearson correlation analysis, we show that our Bayesian correlation analysis retains high correlations when measurement confidence is high, but suppresses correlations when measurement confidence is low, especially for entities with low signal levels. In addition, we consider the influence of priors on the Bayesian correlation estimate. Perhaps surprisingly, we show that naive, uniform priors on entities' signal levels can lead to highly biased correlation estimates, particularly when different experiments have widely varying sequencing depths. However, we propose two alternative priors that provably mitigate this problem. We also prove that, like traditional Pearson correlation, our Bayesian correlation calculation constitutes a kernel in the machine learning sense, and thus can be used as a similarity measure in any kernel-based machine learning algorithm. We demonstrate our approach on two RNA-seq datasets and one miRNA-seq dataset. PMID:27701449
McGuire, Jimmy A; Witt, Christopher C; Altshuler, Douglas L; Remsen, J V
2007-10-01
Hummingbirds are an important model system in avian biology, but to date the group has been the subject of remarkably few phylogenetic investigations. Here we present partitioned Bayesian and maximum likelihood phylogenetic analyses for 151 of approximately 330 species of hummingbirds and 12 outgroup taxa based on two protein-coding mitochondrial genes (ND2 and ND4), flanking tRNAs, and two nuclear introns (AK1 and BFib). We analyzed these data under several partitioning strategies ranging from unpartitioned to a maximum of nine partitions. In order to select a statistically justified partitioning strategy following partitioned Bayesian analysis, we considered four alternative criteria including Bayes factors, modified versions of the Akaike information criterion for small sample sizes (AICc), Bayesian information criterion (BIC), and a decision-theoretic methodology (DT). Following partitioned maximum likelihood analyses, we selected a best-fitting strategy using hierarchical likelihood ratio tests (hLRTs), the conventional AICc, BIC, and DT, concluding that the most stringent criterion, the performance-based DT, was the most appropriate methodology for selecting amongst partitioning strategies. In the context of our well-resolved and well-supported phylogenetic estimate, we consider the historical biogeography of hummingbirds using ancestral state reconstructions of (1) primary geographic region of occurrence (i.e., South America, Central America, North America, Greater Antilles, Lesser Antilles), (2) Andean or non-Andean geographic distribution, and (3) minimum elevational occurrence. These analyses indicate that the basal hummingbird assemblages originated in the lowlands of South America, that most of the principal clades of hummingbirds (all but Mountain Gems and possibly Bees) originated on this continent, and that there have been many (at least 30) independent invasions of other primary landmasses, especially Central America.
[Analysis phylogenetic relationship of Gynostemma (Cucurbitaceae)].
Qin, Shuang-shuang; Li, Hai-tao; Wang, Zhou-yong; Cui, Zhan-hu; Yu, Li-ying
2015-05-01
The sequences of ITS, matK, rbcL and psbA-trnH from 38 samples representing 9 Gynostemma species or varieties were compared and analyzed by molecular phylogenetic methods. Hemsleya macrosperma was designated as the outgroup. MP and NJ phylogenetic trees of Gynostemma were built from the ITS sequences, and the PAUP phylogenetic analysis showed the following: (1) The eight individuals of G. pentaphyllum var. pentaphyllum were not supported as monophyletic in the strict consensus trees and NJ trees. (2) It is doubtful whether G. longipes and G. laxum should be classified as independent species. (3) The classification of subgenus units of Gynostemma plants is supported.
A Comprehensive Phylogenetic Analysis of Deadenylases
Pavlopoulou, Athanasia; Vlachakis, Dimitrios; Balatsos, Nikolaos A.A.; Kossida, Sophia
2013-01-01
Deadenylases catalyze the shortening of the poly(A) tail at the messenger ribonucleic acid (mRNA) 3′-end in eukaryotes. Therefore, these enzymes influence mRNA decay, and constitute a major emerging group of promising anti-cancer pharmacological targets. Herein, we conducted full phylogenetic analyses of the deadenylase homologs in all available genomes in an effort to investigate evolutionary relationships between the deadenylase families and to identify invariant residues, which probably play key roles in the function of deadenylation across species. Our study includes both major deadenylase superfamilies, Asp-Glu-Asp-Asp (DEDD) and exonuclease-endonuclease-phosphatase (EEP). The phylogenetic analysis has provided us with important information regarding conserved and invariant deadenylase amino acids across species. Knowledge of the phylogenetic properties and evolution of the domain of deadenylases provides the foundation for the targeted drug design in the pharmaceutical industry and modern exonuclease anti-cancer scientific research. PMID:24348009
A SAS Interface for Bayesian Analysis with WinBUGS
ERIC Educational Resources Information Center
Zhang, Zhiyong; McArdle, John J.; Wang, Lijuan; Hamagami, Fumiaki
2008-01-01
Bayesian methods are becoming very popular despite some practical difficulties in implementation. To assist in the practical application of Bayesian methods, we show how to implement Bayesian analysis with WinBUGS as part of a standard set of SAS routines. This implementation procedure is first illustrated by fitting a multiple regression model…
Integrative bayesian network analysis of genomic data.
Ni, Yang; Stingo, Francesco C; Baladandayuthapani, Veerabhadran
2014-01-01
Rapid development of genome-wide profiling technologies has made it possible to conduct integrative analysis on genomic data from multiple platforms. In this study, we develop a novel integrative Bayesian network approach to investigate the relationships between genetic and epigenetic alterations as well as how these mutations affect a patient's clinical outcome. We take a Bayesian network approach that admits a convenient decomposition of the joint distribution into local distributions. Exploiting the prior biological knowledge about regulatory mechanisms, we model each local distribution as linear regressions. This allows us to analyze multi-platform genome-wide data in a computationally efficient manner. We illustrate the performance of our approach through simulation studies. Our methods are motivated by and applied to a multi-platform glioblastoma dataset, from which we reveal several biologically relevant relationships that have been validated in the literature as well as new genes that could potentially be novel biomarkers for cancer progression.
Posada, David; Buckley, Thomas R
2004-10-01
Model selection is a topic of special relevance in molecular phylogenetics that affects many, if not all, stages of phylogenetic inference. Here we discuss some fundamental concepts and techniques of model selection in the context of phylogenetics. We start by reviewing different aspects of the selection of substitution models in phylogenetics from a theoretical, philosophical and practical point of view, and summarize this comparison in table format. We argue that the most commonly implemented model selection approach, the hierarchical likelihood ratio test, is not the optimal strategy for model selection in phylogenetics, and that approaches like the Akaike Information Criterion (AIC) and Bayesian methods offer important advantages. In particular, the latter two methods are able to simultaneously compare multiple nested or nonnested models, assess model selection uncertainty, and allow for the estimation of phylogenies and model parameters using all available models (model-averaged inference or multimodel inference). We also describe how the relative importance of the different parameters included in substitution models can be depicted. To illustrate some of these points, we have applied AIC-based model averaging to 37 mitochondrial DNA sequences from the subgenus Ohomopterus (genus Carabus) ground beetles described by Sota and Vogler (2001).
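The AIC-based model averaging mentioned in this abstract can be sketched in a few lines. The snippet below is only an illustration under stated assumptions: the log-likelihoods, the parameter counts (substitution parameters only, since branch lengths are shared by all models) and the per-model tree-length estimates are invented, and the function names are hypothetical rather than taken from the paper or from any phylogenetics package. It uses the standard definitions AIC = -2 ln L + 2K and Akaike weights proportional to exp(-ΔAIC/2).

```python
import math

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = -2 ln L + 2K."""
    return -2.0 * log_likelihood + 2.0 * n_params

def akaike_weights(aic_scores):
    """Convert AIC scores into weights that sum to 1 over the candidate models."""
    best = min(aic_scores.values())
    rel = {m: math.exp(-0.5 * (s - best)) for m, s in aic_scores.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}

def model_average(estimates, weights):
    """Weighted mean of a quantity that was estimated under every model."""
    return sum(weights[m] * estimates[m] for m in weights)

# Hypothetical fits: (maximized log-likelihood, free substitution parameters).
fits = {"JC69": (-2510.0, 0), "HKY85": (-2480.0, 4), "GTR": (-2478.5, 8)}
scores = {m: aic(lnl, k) for m, (lnl, k) in fits.items()}
weights = akaike_weights(scores)

# Hypothetical tree-length estimates obtained under each model.
tree_length = {"JC69": 0.80, "HKY85": 0.95, "GTR": 0.97}
averaged = model_average(tree_length, weights)

for model in sorted(weights, key=weights.get, reverse=True):
    print(f"{model:6s} AIC={scores[model]:8.1f} weight={weights[model]:.4f}")
print(f"model-averaged tree length: {averaged:.4f}")
```

In this toy case the richest model does not win: its small gain in likelihood does not offset the penalty for four extra parameters, and the weights express how much support each candidate retains instead of forcing a single choice.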
On the analysis of phylogenetically paired designs
Funk, Jennifer L; Rakovski, Cyril S; Macpherson, J Michael
2015-01-01
As phylogenetically controlled experimental designs become increasingly common in ecology, the need arises for a standardized statistical treatment of these datasets. Phylogenetically paired designs circumvent the need for resolved phylogenies and have been used to compare species groups, particularly in the areas of invasion biology and adaptation. Despite the widespread use of this approach, the statistical analysis of paired designs has not been critically evaluated. We propose a mixed model approach that includes random effects for pair and species. These random effects introduce a “two-layer” compound symmetry variance structure that captures both the correlations between observations on related species within a pair as well as the correlations between the repeated measurements within species. We conducted a simulation study to assess the effect of model misspecification on Type I and II error rates. We also provide an illustrative example with data containing taxonomically similar species and several outcome variables of interest. We found that a mixed model with species and pair as random effects performed better in these phylogenetically explicit simulations than two commonly used reference models (no or single random effect) by optimizing Type I error rates and power. The proposed mixed model produces acceptable Type I and II error rates despite the absence of a phylogenetic tree. This design can be generalized to a variety of datasets to analyze repeated measurements in clusters of related subjects/species. PMID:25750719
Bayesian Analysis of Individual Level Personality Dynamics.
Cripps, Edward; Wood, Robert E; Beckmann, Nadin; Lau, John; Beckmann, Jens F; Cripps, Sally Ann
2016-01-01
A Bayesian technique with analyses of within-person processes at the level of the individual is presented. The approach is used to examine whether the patterns of within-person responses on a 12-trial simulation task are consistent with the predictions of ITA theory (Dweck, 1999). ITA theory states that the performance of an individual with an entity theory of ability is more likely to spiral down following a failure experience than the performance of an individual with an incremental theory of ability. This is because entity theorists interpret failure experiences as evidence of a lack of ability which they believe is largely innate and therefore relatively fixed; whilst incremental theorists believe in the malleability of abilities and interpret failure experiences as evidence of more controllable factors such as poor strategy or lack of effort. The results of our analyses support ITA theory at both the within- and between-person levels of analyses and demonstrate the benefits of Bayesian techniques for the analysis of within-person processes. These include more formal specification of the theory and the ability to draw inferences about each individual, which allows for more nuanced interpretations of individuals within a personality category, such as differences in the individual probabilities of spiraling. While Bayesian techniques have many potential advantages for the analyses of processes at the level of the individual, ease of use is not one of them for psychologists trained in traditional frequentist statistical techniques. PMID:27486415
Bayesian model selection analysis of WMAP3
Parkinson, David; Mukherjee, Pia; Liddle, Andrew R.
2006-06-15
We present a Bayesian model selection analysis of WMAP3 data using our code CosmoNest. We focus on the density perturbation spectral index n_S and the tensor-to-scalar ratio r, which define the plane of slow-roll inflationary models. We find that while the Bayesian evidence supports the conclusion that n_S ≠ 1, the data are not yet powerful enough to do so at a strong or decisive level. If tensors are assumed absent, the current odds are approximately 8 to 1 in favor of n_S ≠ 1 under our assumptions, when WMAP3 data is used together with external data sets. WMAP3 data on its own is unable to distinguish between the two models. Further, inclusion of r as a parameter weakens the conclusion against the Harrison-Zel'dovich case (n_S = 1, r = 0), albeit in a prior-dependent way. In appendices we describe the CosmoNest code in detail, noting its ability to supply posterior samples as well as to accurately compute the Bayesian evidence. We make a first public release of CosmoNest, now available at www.cosmonest.org.
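To make the quoted odds concrete, the sketch below converts odds of 8 to 1 into a log Bayes factor and a posterior model probability, assuming equal prior probabilities for the two models. The cut-offs used for the verbal labels (ln B of 1, 2.5 and 5) are one commonly used version of the Jeffreys scale and are an assumption here, not something stated in the abstract; the function names are hypothetical.

```python
import math

def posterior_probability(bayes_factor, prior_odds=1.0):
    """Posterior probability of the favored model from a Bayes factor and prior odds."""
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)

def jeffreys_label(ln_bayes_factor):
    """Verbal label on an assumed Jeffreys-style scale (thresholds 1, 2.5, 5)."""
    if ln_bayes_factor < 1.0:
        return "inconclusive"
    if ln_bayes_factor < 2.5:
        return "substantial"
    if ln_bayes_factor < 5.0:
        return "strong"
    return "decisive"

odds = 8.0                        # "approximately 8 to 1" from the abstract
ln_b = math.log(odds)             # about 2.08
prob = posterior_probability(odds)  # 8/9, about 0.889
label = jeffreys_label(ln_b)

print(f"ln B = {ln_b:.2f}, P(favored model | data) = {prob:.3f}, evidence: {label}")
```

Under these assumptions ln B falls below 2.5, which is consistent with the abstract's statement that the evidence does not yet reach a strong or decisive level.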
Bayesian analysis of factorial designs.
Rouder, Jeffrey N; Morey, Richard D; Verhagen, Josine; Swagman, April R; Wagenmakers, Eric-Jan
2017-06-01
This article provides a Bayes factor approach to multiway analysis of variance (ANOVA) that allows researchers to state graded evidence for effects or invariances as determined by the data. ANOVA is conceptualized as a hierarchical model where levels are clustered within factors. The development is comprehensive in that it includes Bayes factors for fixed and random effects and for within-subjects, between-subjects, and mixed designs. Different model construction and comparison strategies are discussed, and an example is provided. We show how Bayes factors may be computed with the BayesFactor package in R and with the JASP statistical package. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
A phylogenetic analysis of the family Dermatophilaceae.
Stackebrandt, E; Kroppenstedt, R M; Fowler, V J
1983-06-01
The comparative analysis of the 16S ribosomal ribonucleic acid (rRNA) of Geodermatophilus obscurus DSM 43060 and Dermatophilus congolensis DSM 43037 revealed that these members of the family Dermatophilaceae were only remotely related. While G. obscurus represented an individual and separate line of descent within the phylogenetically defined order Actinomycetales, D. congolensis was closely related to representatives of Arthrobacter, Micrococcus, Cellulomonas, Brevibacterium, Promicromonospora and Microbacterium.
Klebsiella pneumoniae blaKPC-3 nosocomial epidemic: Bayesian and evolutionary analysis.
Angeletti, Silvia; Presti, Alessandra Lo; Cella, Eleonora; Fogolari, Marta; De Florio, Lucia; Dedej, Etleva; Blasi, Aletheia; Milano, Teresa; Pascarella, Stefano; Incalzi, Raffaele Antonelli; Coppola, Roberto; Dicuonzo, Giordano; Ciccozzi, Massimo
2016-12-01
K. pneumoniae isolates carrying the blaKPC-3 gene were collected to perform Bayesian phylogenetic and selective pressure analysis and to apply homology modeling to the KPC-3 protein. A dataset of 44 blaKPC-3 gene sequences from clinical isolates of K. pneumoniae was used for Bayesian phylogenetic analysis, selective pressure analysis and homology modeling. The mean evolutionary rate for the blaKPC-3 gene was 2.67×10^-3 substitutions/site/year (95% HPD: 3.4×10^-4 to 5.59×10^-3). The root of the Bayesian tree dated back to the year 2011 (95% HPD: 2007-2012). Two main clades (I and II) were identified. The population dynamics analysis showed exponential growth from 2011 to 2013, followed by a plateau. The phylogeographic reconstruction showed that the root of the tree had a probable common ancestor in the general surgery ward. Selective pressure analysis revealed twelve positively selected sites. Structural analysis of the KPC-3 protein predicted that the amino acid mutations are destabilizing for the protein and could alter the substrate specificity. Phylogenetic analysis and homology modeling of the blaKPC-3 gene could represent a useful tool to follow KPC spread in nosocomial settings and to identify amino acid substitutions altering the substrate specificity.
Detecting Network Communities: An Application to Phylogenetic Analysis
Andrade, Roberto F. S.; Rocha-Neto, Ivan C.; Santos, Leonardo B. L.; de Santana, Charles N.; Diniz, Marcelo V. C.; Lobão, Thierry Petit; Goés-Neto, Aristóteles; Pinho, Suani T. R.; El-Hani, Charbel N.
2011-01-01
This paper proposes a new method to identify communities in generally weighted complex networks and applies it to phylogenetic analysis. In this case, weights correspond to the similarity indexes among protein sequences, which can be used for network construction so that the network structure can be analyzed to recover phylogenetically useful information from its properties. The analyses discussed here are mainly based on the modular character of protein similarity networks, explored through the Newman-Girvan algorithm, with the help of the neighborhood matrix. The most relevant networks are found when the network topology changes abruptly revealing distinct modules related to the sets of organisms to which the proteins belong. Sound biological information can be retrieved by the computational routines used in the network approach, without using biological assumptions other than those incorporated by BLAST. Usually, all the main bacterial phyla and, in some cases, also some bacterial classes corresponded totally (100%) or to a great extent (>70%) to the modules. We checked for internal consistency in the obtained results, and we scored close to 84% of matches for community pertinence when comparisons between the results were performed. To illustrate how to use the network-based method, we employed data for enzymes involved in the chitin metabolic pathway that are present in more than 100 organisms from an original data set containing 1,695 organisms, downloaded from GenBank on May 19, 2007. A preliminary comparison between the outcomes of the network-based method and the results of methods based on Bayesian, distance, likelihood, and parsimony criteria suggests that the former is as reliable as these commonly used methods. We conclude that the network-based method can be used as a powerful tool for retrieving modularity information from weighted networks, which is useful for phylogenetic analysis. PMID:21573202
A Gentle Introduction to Bayesian Analysis: Applications to Developmental Research
van de Schoot, Rens; Kaplan, David; Denissen, Jaap; Asendorpf, Jens B; Neyer, Franz J; van Aken, Marcel AG
2014-01-01
Bayesian statistical methods are becoming ever more popular in applied and fundamental research. In this study a gentle introduction to Bayesian analysis is provided. It is shown under what circumstances it is attractive to use Bayesian estimation, and how to interpret properly the results. First, the ingredients underlying Bayesian methods are introduced using a simplified example. Thereafter, the advantages and pitfalls of the specification of prior knowledge are discussed. To illustrate Bayesian methods explained in this study, in a second example a series of studies that examine the theoretical framework of dynamic interactionism are considered. In the Discussion the advantages and disadvantages of using Bayesian statistics are reviewed, and guidelines on how to report on Bayesian statistics are provided. PMID:24116396
A Bayesian nonparametric meta-analysis model.
Karabatsos, George; Talbott, Elizabeth; Walker, Stephen G
2015-03-01
In a meta-analysis, it is important to specify a model that adequately describes the effect-size distribution of the underlying population of studies. The conventional normal fixed-effect and normal random-effects models assume a normal effect-size population distribution, conditionally on parameters and covariates. For estimating the mean overall effect size, such models may be adequate, but for prediction, they surely are not if the effect-size distribution exhibits non-normal behavior. To address this issue, we propose a Bayesian nonparametric meta-analysis model, which can describe a wider range of effect-size distributions, including unimodal symmetric distributions, as well as skewed and more multimodal distributions. We demonstrate our model through the analysis of real meta-analytic data arising from behavioral-genetic research. We compare the predictive performance of the Bayesian nonparametric model against various conventional and more modern normal fixed-effects and random-effects models. Copyright © 2014 John Wiley & Sons, Ltd.
Phylogenetic analysis of cubilin (CUBN) gene
Shaik, Abjal Pasha; Alsaeed, Abbas H; Kiranmayee, S; Bammidi, VK; Sultana, Asma
2013-01-01
Cubilin (CUBN; also known as intrinsic factor-cobalamin receptor [Homo sapiens Entrez Pubmed ref NM_001081.3; NG_008967.1; GI: 119606627]), located in the epithelium of the intestine and kidney, acts as a receptor for intrinsic factor – vitamin B12 complexes. Mutations in CUBN may play a role in autosomal recessive megaloblastic anemia. The current study investigated the possible role of CUBN in evolution using phylogenetic testing. A total of 588 BLAST hits were found for the cubilin query sequence and these hits showed a putative conserved domain, the CUB superfamily (as of 27 Nov 2012). A first-pass phylogenetic tree was constructed to identify the taxa which most often contained the CUBN sequences. Following this, we narrowed down the search by manually deleting sequences which were not CUBN. A repeat phylogenetic analysis of 25 taxa was performed using PhyML, RAxML and TreeDyn software to confirm that CUBN is a conserved protein, emphasizing its importance as an extracellular domain that is present in proteins mostly known to be involved in development in many chordate taxa but is not found in prokaryotes, plants and yeast. No horizontal gene transfers were found between different taxa. PMID:23390341
Software for Bayesian Analysis: Current Status and Additional Needs
1987-05-15
Linear Regression, Econometric models and Time Series Analysis. Program Name: BRAP [Bayesian Regression Analysis Program (Abowd/Zellner)], Version 2.0...for newer IBM compilers) Documentation: Abowd, J.M., Moulton, B.R. and Zellner, A. (1985) The Bayesian Regression Analysis Package, BRAP user’s
Bayesian analysis of polyphonic western tonal music.
Davy, Manuel; Godsill, Simon; Idier, Jérôme
2006-04-01
This paper deals with the computational analysis of musical audio from recorded audio waveforms. This general problem includes, as subtasks, music transcription, extraction of musical pitch, dynamics, timbre, instrument identity, and source separation. Analysis of real musical signals is a highly ill-posed task which is made complicated by the presence of transient sounds, background interference, or the complex structure of musical pitches in the time-frequency domain. This paper focuses on models and algorithms for computer transcription of multiple musical pitches in audio, elaborated from previous work by two of the authors. The audio data are supposedly presegmented into fixed pitch regimes such as individual chords. The models presented apply to pitched (tonal) music and are formulated via a Gabor representation of nonstationary signals. A Bayesian probabilistic structure is employed for representation of prior information about the parameters of the notes. This paper introduces a numerical Bayesian inference strategy for estimation of the pitches and other parameters of the waveform. The improved algorithm is much quicker and makes the approach feasible in realistic situations. Results are presented for estimation of a known number of notes present in randomly generated note clusters from a real musical instrument database.
Bayesian Nonparametric Models for Multiway Data Analysis.
Xu, Zenglin; Yan, Feng; Qi, Yuan
2015-02-01
Tensor decomposition is a powerful computational tool for multiway data analysis. Many popular tensor decomposition approaches, such as the Tucker decomposition and CANDECOMP/PARAFAC (CP), amount to multi-linear factorization. They are insufficient to model (i) complex interactions between data entities, (ii) various data types (e.g., missing data and binary data), and (iii) noisy observations and outliers. To address these issues, we propose tensor-variate latent nonparametric Bayesian models for multiway data analysis. We name these models InfTucker. These new models essentially conduct Tucker decomposition in an infinite feature space. Unlike classical tensor decomposition models, our new approaches handle both continuous and binary data in a probabilistic framework. Unlike previous Bayesian models on matrices and tensors, our models are based on latent Gaussian or t processes with nonlinear covariance functions. Moreover, on network data, our models reduce to nonparametric stochastic blockmodels and can be used to discover latent groups and predict missing interactions. To learn the models efficiently from data, we develop a variational inference technique and explore properties of the Kronecker product for computational efficiency. Compared with a classical variational implementation, this technique reduces both time and space complexities by several orders of magnitude. On real multiway and network data, our new models achieved significantly higher prediction accuracy than state-of-the-art tensor decomposition methods and blockmodels.
Waddell, Peter J; Kishino, Hirohisa; Ota, Rissa
2002-01-01
Evolutionary trees sit at the core of all realistic models describing a set of related sequences, including alignment, homology search, ancestral protein reconstruction and 2D/3D structural change. It is important to assess the stochastic error when estimating a tree, including models using the most realistic likelihood-based optimizations, yet computation times may be many days or weeks. If so, the bootstrap is computationally prohibitive. Here we show that the extremely fast "resampling of estimated log likelihoods" or RELL method behaves well under more general circumstances than previously examined. RELL approximates the bootstrap (BP) proportions of trees better than some bootstrap methods that rely on fast heuristics to search the tree space. The BIC approximation of the Bayesian posterior probability (BPP) of trees is made more accurate by including an additional term related to the determinant of the information matrix (which may also be obtained as a product of gradient or score vectors). Such estimates are shown to be very close to MCMC chain values. Our analysis of mammalian mitochondrial amino acid sequences suggests that when model breakdown occurs, as it typically does for sequences separated by more than a few million years, the BPP values are far too peaked and the real fluctuations in the likelihood of the data are many times larger than expected. Accordingly, several ways to incorporate the bootstrap and other types of direct resampling with MCMC procedures are outlined. Genes evolve by a process which involves some sites following a tree close to, but not identical with, the species tree. It is seen that under such a likelihood model BP (bootstrap proportions) and BPP estimates may still be reasonable estimates of the species tree. Since many of the methods studied are very fast computationally, there is no reason to ignore stochastic error even with the slowest ML or likelihood-based methods.
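The idea behind RELL can be illustrated with a short sketch. Instead of re-optimizing every tree on each bootstrap replicate, the per-site log-likelihoods computed once for each candidate tree are resampled with replacement, and the tree with the highest resampled total is recorded. The code below is a minimal illustration under stated assumptions: the per-site log-likelihoods are synthetic, the tree labels and the function name `rell_proportions` are hypothetical, and none of it reproduces the authors' implementation.

```python
import random

def rell_proportions(site_lnl, n_reps=1000, seed=0):
    """Approximate bootstrap proportions by resampling per-site log-likelihoods.

    site_lnl maps a tree label to a list of per-site log-likelihoods, all
    evaluated on the same alignment columns in the same order.
    """
    rng = random.Random(seed)
    trees = list(site_lnl)
    n_sites = len(site_lnl[trees[0]])
    wins = {t: 0 for t in trees}
    for _ in range(n_reps):
        # One pseudo-replicate: draw site indices with replacement.
        idx = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = {t: sum(site_lnl[t][i] for i in idx) for t in trees}
        wins[max(totals, key=totals.get)] += 1
    return {t: w / n_reps for t, w in wins.items()}

# Synthetic per-site log-likelihoods for three hypothetical candidate trees.
data_rng = random.Random(1)
base = [data_rng.uniform(-6.0, -2.0) for _ in range(200)]
site_lnl = {
    "tree_A": base,
    "tree_B": [x - 0.05 + data_rng.gauss(0.0, 0.3) for x in base],  # close competitor
    "tree_C": [x - 1.0 for x in base],                              # worse at every site
}

props = rell_proportions(site_lnl)
for tree, p in props.items():
    print(f"{tree}: {p:.3f}")
```

Because no likelihood is re-optimized inside the loop, each replicate costs only a sum over sites, which is why the method stays fast even when a single maximum-likelihood fit takes days.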
Bayesian analysis of multiple direct detection experiments
Arina, Chiara
2014-12-01
Bayesian methods offer a coherent and efficient framework for implementing uncertainties into induction problems. In this article, we review how this approach applies to the analysis of dark matter direct detection experiments. In particular we discuss the exclusion limit of XENON100 and the debated hints of detection under the hypothesis of a WIMP signal. Within parameter inference, marginalizing consistently over uncertainties to extract robust posterior probability distributions, we find that the claimed tension between XENON100 and the other experiments can be partially alleviated in an isospin-violating scenario, while the elastic scattering model appears to be compatible with the frequentist statistical approach. We then move to model comparison, for which Bayesian methods are particularly well suited. Firstly, we investigate the annual modulation seen in CoGeNT data, finding that there is weak evidence for a modulation. Modulation models due to other physics compare unfavorably with the WIMP models, paying the price for their excessive complexity. Secondly, we confront several coherent scattering models to determine the current best physical scenario compatible with the experimental hints. We find that exothermic and inelastic dark matter are moderately disfavored against the elastic scenario, while the isospin-violating model has similar evidence. Lastly, the Bayes factor gives inconclusive evidence for an incompatibility between the data sets of XENON100 and the hints of detection. The same question assessed with goodness of fit would indicate a 2σ discrepancy. This suggests that more data are needed to settle this question.
The Application of Bayesian Analysis to Issues in Developmental Research
ERIC Educational Resources Information Center
Walker, Lawrence J.; Gustafson, Paul; Frimer, Jeremy A.
2007-01-01
This article reviews the concepts and methods of Bayesian statistical analysis, which can offer innovative and powerful solutions to some challenging analytical problems that characterize developmental research. In this article, we demonstrate the utility of Bayesian analysis, explain its unique adeptness in some circumstances, address some…
Bayesian Model Averaging for Propensity Score Analysis.
Kaplan, David; Chen, Jianshen
2014-01-01
This article considers Bayesian model averaging as a means of addressing uncertainty in the selection of variables in the propensity score equation. We investigate an approximate Bayesian model averaging approach based on the model-averaged propensity score estimates produced by the R package BMA but that ignores uncertainty in the propensity score. We also provide a fully Bayesian model averaging approach via Markov chain Monte Carlo sampling (MCMC) to account for uncertainty in both parameters and models. A detailed study of our approach examines the differences in the causal estimate when incorporating noninformative versus informative priors in the model averaging stage. We examine these approaches under common methods of propensity score implementation. In addition, we evaluate the impact of changing the size of Occam's window used to narrow down the range of possible models. We also assess the predictive performance of both Bayesian model averaging propensity score approaches and compare it with the case without Bayesian model averaging. Overall, results show that both Bayesian model averaging propensity score approaches recover the treatment effect estimates well and generally provide larger uncertainty estimates, as expected. Both Bayesian model averaging approaches offer slightly better prediction of the propensity score compared with the Bayesian approach with a single propensity score equation. Covariate balance checks for the case study show that both Bayesian model averaging approaches offer good balance. The fully Bayesian model averaging approach also provides posterior probability intervals of the balance indices.
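As a rough illustration of the model-averaging step this abstract describes, the sketch below turns log marginal likelihoods into posterior model probabilities, applies an Occam's window cut, and averages per-model effect estimates. It is not the R package BMA and not the authors' MCMC procedure; the function names, the window ratio, and all numbers are hypothetical.

```python
import math

def posterior_model_probs(log_evidences, priors=None):
    """Posterior model probabilities from log marginal likelihoods."""
    n = len(log_evidences)
    priors = priors or [1.0 / n] * n          # assumed: equal prior model weights
    log_post = [le + math.log(p) for le, p in zip(log_evidences, priors)]
    shift = max(log_post)                     # subtract the max for numerical stability
    weights = [math.exp(lp - shift) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]

def occams_window(probs, ratio):
    """Drop models less probable than (best / ratio), then renormalize."""
    cutoff = max(probs) / ratio
    kept = [p if p >= cutoff else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]

def model_average(estimates, probs):
    """Probability-weighted average of per-model estimates."""
    return sum(e * p for e, p in zip(estimates, probs))

# Hypothetical inputs: three candidate propensity score models.
log_evidences = [math.log(1.0), math.log(3.0), math.log(6.0)]
estimates = [1.0, 2.0, 3.0]

probs = posterior_model_probs(log_evidences)
windowed = occams_window(probs, ratio=5.0)
print(probs, model_average(estimates, probs))
print(windowed, model_average(estimates, windowed))
```

A narrower window (smaller ratio) discards more models, so the averaged estimate moves toward the best-supported model.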
Puttick, Mark N; O'Reilly, Joseph E; Tanner, Alastair R; Fleming, James F; Clark, James; Holloway, Lucy; Lozano-Fernandez, Jesus; Parry, Luke A; Tarver, James E; Pisani, Davide; Donoghue, Philip C J
2017-01-11
Morphological data provide the only means of classifying the majority of life's history, but the choice between competing phylogenetic methods for the analysis of morphology is unclear. Traditionally, parsimony methods have been favoured but recent studies have shown that these approaches are less accurate than the Bayesian implementation of the Mk model. Here we expand on these findings in several ways: we assess the impact of tree shape and maximum-likelihood estimation using the Mk model, as well as analysing data composed of both binary and multistate characters. We find that all methods struggle to correctly resolve deep clades within asymmetric trees, and when analysing small character matrices. The Bayesian Mk model is the most accurate method for estimating topology, but with lower resolution than other methods. Equal weights parsimony is more accurate than implied weights parsimony, and maximum-likelihood estimation using the Mk model is the least accurate method. We conclude that the Bayesian implementation of the Mk model should be the default method for phylogenetic estimation from phenotype datasets, and we explore the implications of our simulations in reanalysing several empirical morphological character matrices. A consequence of our finding is that high levels of resolution or the ability to classify species or groups with much confidence should not be expected when using small datasets. It is now necessary to depart from the traditional parsimony paradigms of constructing character matrices, towards datasets constructed explicitly for Bayesian methods. PMID:28077778
Bayesian analysis of the solar neutrino anomaly
Bhat, C.M.
1998-02-01
We present an analysis of the recent solar neutrino data from the five experiments using a Bayesian approach. We extract quantitative and easily understandable information pertaining to the solar neutrino problem. The probability distributions for the individual neutrino fluxes and the discrepancy distribution for B and Be fluxes, which include theoretical and experimental uncertainties, have been extracted. The analysis, carried out assuming that the neutrinos are unaltered during their passage from the sun to earth, clearly indicates that the observed PP flux is consistent with the 1995 standard solar model predictions of Bahcall and Pinsonneault within 2σ (standard deviation), whereas the ⁸B flux is down by more than 12σ and the ⁷Be flux is maximally suppressed. We also deduce the experimental survival probability for the solar neutrinos as a function of their energy in a model-independent way. We find that the shape of that distribution is in qualitative agreement with the MSW oscillation predictions.
Bayesian analysis on gravitational waves and exoplanets
Deng, Xihao
Attempts to detect gravitational waves using a pulsar timing array (PTA), i.e., a collection of pulsars in our Galaxy, have become more organized over the last several years. PTAs act to detect gravitational waves generated from very distant sources by observing the small and correlated effect the waves have on pulse arrival times at the Earth. In this thesis, I present advanced Bayesian analysis methods that can be used to search for gravitational waves in pulsar timing data. These methods were also applied to analyze a set of radial velocity (RV) data collected by the Hobby-Eberly Telescope on observing a K0 giant star. They confirmed the presence of two Jupiter mass planets around a K0 giant star and also characterized the stellar p-mode oscillation. The first part of the thesis investigates the effect of wavefront curvature on a pulsar's response to a gravitational wave. In it we show that we can assume the gravitational wave phasefront is planar across the array only if the source luminosity distance ≫ 2πL²/λ, where L is the pulsar distance to the Earth (∼ kpc) and λ is the radiation wavelength (∼ pc) in the PTA waveband. Correspondingly, for a point gravitational wave source closer than ∼ 100 Mpc, we should take into account the effect of wavefront curvature across the pulsar-Earth line of sight, which depends on the luminosity distance to the source, when evaluating the pulsar timing response. As a consequence, if a PTA can detect a gravitational wave from a source closer than ∼ 100 Mpc, the effects of wavefront curvature on the response allow us to determine the source luminosity distance. The second and third parts of the thesis propose a new analysis method based on Bayesian nonparametric regression to search for gravitational wave bursts and a gravitational wave background in PTA data. Unlike the conventional Bayesian analysis that introduces a signal model with a fixed number of parameters, Bayesian nonparametric regression sets
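To make the plane-wave criterion in this abstract concrete, the short sketch below evaluates the distance scale 2πL²/λ for a few pulsar distances. The function name and the input values are illustrative assumptions, not figures taken from the thesis beyond the order-of-magnitude scales it quotes (L of order kpc, wavelength of order pc).

```python
import math

def plane_wave_threshold_mpc(pulsar_distance_kpc, wavelength_pc):
    """Distance scale 2*pi*L^2/lambda, returned in megaparsecs.

    The plane-wave approximation needs the source to lie far beyond this scale.
    """
    pulsar_distance_pc = pulsar_distance_kpc * 1.0e3
    threshold_pc = 2.0 * math.pi * pulsar_distance_pc ** 2 / wavelength_pc
    return threshold_pc / 1.0e6

# Hypothetical pulsar distances, with a 1 pc gravitational wavelength.
for distance_kpc in (1.0, 2.0, 4.0):
    scale = plane_wave_threshold_mpc(distance_kpc, wavelength_pc=1.0)
    print(f"L = {distance_kpc} kpc -> 2*pi*L^2/lambda = {scale:.1f} Mpc")
```

Under these assumed inputs the scale grows with the square of the pulsar distance, reaching roughly 100 Mpc for a pulsar about 4 kpc away.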
On the Bayesian analysis of ring-recovery data.
Brooks, S P; Catchpole, E A; Morgan, B J; Barry, S C
2000-09-01
Vounatsou and Smith (1995, Biometrics 51, 687-708) describe the modern Bayesian analysis of ring-recovery data. Here we discuss and extend their work. We draw different conclusions from two major data analyses. We emphasize the extreme sensitivity of certain parameter estimates to the choice of prior distribution and conclude that naive use of Bayesian methods in this area can be misleading. Additionally, we explain the discrepancy between the Bayesian and classical analyses when the likelihood surface has a flat ridge. In this case, when there is no unique maximum likelihood estimate, the Bayesian estimators are remarkably precise.
Bayesian Logical Data Analysis for the Physical Sciences
Gregory, Phil
2010-05-01
Preface; Acknowledgements; 1. Role of probability theory in science; 2. Probability theory as extended logic; 3. The how-to of Bayesian inference; 4. Assigning probabilities; 5. Frequentist statistical inference; 6. What is a statistic?; 7. Frequentist hypothesis testing; 8. Maximum entropy probabilities; 9. Bayesian inference (Gaussian errors); 10. Linear model fitting (Gaussian errors); 11. Nonlinear model fitting; 12. Markov Chain Monte Carlo; 13. Bayesian spectral analysis; 14. Bayesian inference (Poisson sampling); Appendix A. Singular value decomposition; Appendix B. Discrete Fourier transforms; Appendix C. Difference in two samples; Appendix D. Poisson ON/OFF details; Appendix E. Multivariate Gaussian from maximum entropy; References; Index.
Vision as Bayesian inference: analysis by synthesis?
Yuille, Alan; Kersten, Daniel
2006-07-01
We argue that the study of human vision should be aimed at determining how humans perform natural tasks with natural images. Attempts to understand the phenomenology of vision from artificial stimuli, although worthwhile as a starting point, can lead to faulty generalizations about visual systems, because of the enormous complexity of natural images. Dealing with this complexity is daunting, but Bayesian inference on structured probability distributions offers the ability to design theories of vision that can deal with the complexity of natural images, and that use 'analysis by synthesis' strategies with intriguing similarities to the brain. We examine these strategies using recent examples from computer vision, and outline some important implications for cognitive science.
Optimal sequential Bayesian analysis for degradation tests.
Rodríguez-Narciso, Silvia; Christen, J Andrés
2016-07-01
Degradation tests are especially difficult to conduct for items with high reliability. Test costs, caused mainly by prolonged item duration and item destruction costs, establish the necessity of sequential degradation test designs. We propose a methodology that sequentially selects the optimal observation times to measure the degradation, using a convenient rule that maximizes the inference precision and minimizes test costs. In particular our objective is to estimate a quantile of the time to failure distribution, where the degradation process is modelled as a linear model using Bayesian inference. The proposed sequential analysis is based on an index that measures the expected discrepancy between the estimated quantile and its corresponding prediction, using Monte Carlo methods. The procedure was successfully implemented for simulated and real data.
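The quantity targeted in this abstract, a quantile of the time-to-failure distribution under a linear degradation model, can be approximated by Monte Carlo along the following lines. This is only a sketch under stated assumptions: the normal draws stand in for posterior samples of the intercept and slope, the failure threshold is hypothetical, and the authors' sequential design index is not reproduced.

```python
import random

def failure_time_quantile(threshold, q=0.10, n_draws=20000, seed=0):
    """Monte Carlo quantile of the time at which a linear path hits `threshold`.

    Degradation is modelled as intercept + slope * t; the draws below are
    assumed stand-ins for posterior samples of the two coefficients.
    """
    rng = random.Random(seed)
    times = []
    while len(times) < n_draws:
        intercept = rng.gauss(0.0, 0.1)   # assumed posterior for the intercept
        slope = rng.gauss(1.0, 0.05)      # assumed posterior for the slope
        if slope <= 0.0:
            continue                      # a non-increasing path never fails
        times.append((threshold - intercept) / slope)
    times.sort()
    return times[int(q * (len(times) - 1))]

q10 = failure_time_quantile(threshold=10.0, q=0.10)
median = failure_time_quantile(threshold=10.0, q=0.50)
print(q10, median)
```

A sequential design would recompute such a quantile after each new measurement and choose the next observation time accordingly.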
Taxonomic review and phylogenetic analysis of Enchodontoidei.
Silva, Hilda M A; Gallo, Valéria
2011-06-01
Enchodontoidei are extinct marine teleost fishes with a long temporal range and a wide geographic distribution. As there has been no comprehensive phylogenetic study of this taxon, we performed a parsimony analysis using a data matrix with 87 characters, 31 terminal taxa for ingroup, and three taxa for outgroup. The analysis produced 93 equally parsimonious trees (L = 437 steps; CI = 0.24; RI = 0.49). The topology of the majority rule consensus tree was: (Sardinioides + Hemisaurida + (Nardorex + (Atolvorator + (Protostomias + Yabrudichthys) + (Apateopholis + (Serrilepis + (Halec + Phylactocephalus) + (Cimolichthys + (Prionolepis + ((Eurypholis + Saurorhamphus) + (Enchodus + (Paleolycus + Parenchodus))))))) + ((Ichthyotringa + Apateodus) + (Rharbichthys + (Trachinocephalus + ((Apuliadercetis + Brazilodercetis) + (Benthesikyme + (Cyranichthys + Robertichthys) + (Dercetis + Ophidercetis)) + (Caudadercetis + (Pelargorhynchus + (Nardodercetis + (Rhynchodercetis + (Dercetoides + Hastichthys)))))). The group Enchodontoidei is not monophyletic. Dercetidae form a clade supported by the presence of very reduced neural spines and possess a new composition. Enchodontidae are monophyletic by the presence of middorsal scutes, and Rharbichthys was excluded. Halecidae possess a new composition, with the exclusion of Hemisaurida. This taxon and Nardorex are Aulopiformes incertae sedis.
Exoribonuclease superfamilies: structural analysis and phylogenetic distribution
Zuo, Yuhong; Deutscher, Murray P.
2001-01-01
Exoribonucleases play an important role in all aspects of RNA metabolism. Biochemical and genetic analyses in recent years have identified many new RNases and it is now clear that a single cell can contain multiple enzymes of this class. Here, we analyze the structure and phylogenetic distribution of the known exoribonucleases. Based on extensive sequence analysis and on their catalytic properties, all of the exoribonucleases and their homologs have been grouped into six superfamilies and various subfamilies. We identify common motifs that can be used to characterize newly-discovered exoribonucleases, and based on these motifs we correct some previously misassigned proteins. This analysis may serve as a useful first step for developing a nomenclature for this group of enzymes. PMID:11222749
A Distance Measure for Genome Phylogenetic Analysis
Cao, Minh Duc; Allison, Lloyd; Dix, Trevor
Phylogenetic analyses of species based on single genes or parts of the genomes are often inconsistent because of factors such as variable rates of evolution and horizontal gene transfer. The availability of more and more sequenced genomes allows phylogeny construction from complete genomes that is less sensitive to such inconsistency. For such long sequences, construction methods like maximum parsimony and maximum likelihood are often not possible due to their intensive computational requirements. Another class of tree construction methods, namely distance-based methods, requires a measure of distance between any two genomes. Some measures, such as evolutionary edit distance of gene order and gene content, are computationally expensive or do not perform well when the gene content of the organisms is similar. This study presents an information theoretic measure of genetic distances between genomes based on the biological compression algorithm expert model. We demonstrate that our distance measure can be applied to reconstruct the consensus phylogenetic tree of a number of Plasmodium parasites from their genomes, the statistical bias of which would mislead conventional analysis methods. Our approach is also used to successfully construct a plausible evolutionary tree for the γ-Proteobacteria group whose genomes are known to contain many horizontally transferred genes.
Yamanoue, Yusuke; Miya, Masaki; Matsuura, Keiichi; Yagishita, Naoki; Mabuchi, Kohji; Sakai, Harumi; Katoh, Masaya; Nishida, Mutsumi
2007-10-01
Tetraodontiformes includes approximately 350 species assigned to nine families, sharing several reduced morphological features of higher teleosts. The order has been accepted as a monophyletic group by many authors, although several alternative hypotheses exist regarding its phylogenetic position within the higher teleosts. To date, acanthuroids, zeiforms, and lophiiforms have been proposed as sister-groups of the tetraodontiforms. The monophyly and sister-group status was investigated using whole mitochondrial genome (mitogenome) sequences from 44 purposefully-chosen species (26 sequences newly-determined during the study) that fully represent the major tetraodontiform lineages plus all the groups that have been hypothesized as being close relatives. Partitioned Bayesian analyses were conducted with the three datasets that comprised concatenated nucleotide sequences from 13 protein-coding genes (with and without, or with RY-coding, 3rd codon positions), plus 22 transfer RNA and two ribosomal RNA genes. The resultant trees were well resolved and largely congruent, with most internal branches being supported by high posterior probabilities. Mitogenomic data strongly supported the monophyly of tetraodontiform fishes, placing them as a sister-group of either Lophiiformes plus Caproidei or Caproidei only. The sister-group relationship between Acanthuroidei and Tetraodontiformes was statistically rejected using Bayes factors. These results were confirmed by a reanalysis of the previously published nuclear RAG1 gene sequences using the Bayesian method. Within the Tetraodontiformes, however, monophylies of the three superfamilies were not recovered and further taxonomic sampling and subsequent efforts should clarify these relationships.
Bayesian analysis. II. Signal detection and model selection
Bretthorst, G. Larry
In the preceding paper, Bayesian analysis was applied to the parameter estimation problem, given quadrature NMR data. Here Bayesian analysis is extended to the problem of selecting the model which is most probable in view of the data and all the prior information. In addition to the analytic calculation, two examples are given. The first example demonstrates how to use Bayesian probability theory to detect small signals in noise. The second example uses Bayesian probability theory to compute the probability of the number of decaying exponentials in simulated T1 data. The Bayesian answer to this question is essentially a microcosm of the scientific method and a quantitative statement of Ockham's razor: theorize about possible models, compare these to experiment, and select the simplest model that "best" fits the data.
A Primer on Bayesian Analysis for Experimental Psychopathologists.
Krypotos, Angelos-Miltiadis; Blanken, Tessa F; Arnaudova, Inna; Matzke, Dora; Beckers, Tom
2017-01-01
The principal goals of experimental psychopathology (EPP) research are to offer insights into the pathogenic mechanisms of mental disorders and to provide a stable ground for the development of clinical interventions. The main message of the present article is that those goals are better served by the adoption of Bayesian statistics than by the continued use of null-hypothesis significance testing (NHST). In the first part of the article we list the main disadvantages of NHST and explain why those disadvantages limit the conclusions that can be drawn from EPP research. Next, we highlight the advantages of Bayesian statistics. To illustrate, we then pit NHST and Bayesian analysis against each other using an experimental data set from our lab. Finally, we discuss some challenges when adopting Bayesian statistics. We hope that the present article will encourage experimental psychopathologists to embrace Bayesian statistics, which could strengthen the conclusions drawn from EPP research.
Phylogenetic analysis of fungal ABC transporters.
Kovalchuk, Andriy; Driessen, Arnold J M
2010-03-16
The superfamily of ABC proteins is among the largest known in nature. Its members are mainly, but not exclusively, involved in the transport of a broad range of substrates across biological membranes. Many contribute to multidrug resistance in microbial pathogens and cancer cells. The diversity of ABC proteins in fungi is comparable with those in multicellular animals, but so far fungal ABC proteins have barely been studied. We performed a phylogenetic analysis of the ABC proteins extracted from the genomes of 27 fungal species from 18 orders representing 5 fungal phyla thereby covering the most important groups. Our analysis demonstrated that some of the subfamilies of ABC proteins remained highly conserved in fungi, while others have undergone a remarkable group-specific diversification. Members of the various fungal phyla also differed significantly in the number of ABC proteins found in their genomes, which is especially reduced in the yeast S. cerevisiae and S. pombe. Data obtained during our analysis should contribute to a better understanding of the diversity of the fungal ABC proteins and provide important clues about their possible biological functions. PMID:20233411
Bayesian hypothesis testing: Editorial to the Special Issue on Bayesian data analysis.
Hoijtink, Herbert; Chow, Sy-Miin
2017-06-01
In the past 20 years, there has been a steadily increasing attention and demand for Bayesian data analysis across multiple scientific disciplines, including psychology. Bayesian methods and the related Markov chain Monte Carlo sampling techniques offered renewed ways of handling old and challenging new problems that may be difficult or impossible to handle using classical approaches. Yet, such opportunities and potential improvements have not been sufficiently explored and investigated. This is 1 of 2 special issues in Psychological Methods dedicated to the topic of Bayesian data analysis, with an emphasis on Bayesian hypothesis testing, model comparison, and general guidelines for applications in psychology. In this editorial, we provide an overview of the use of Bayesian methods in psychological research and a brief history of the Bayes factor and the posterior predictive p value. Translational abstracts that summarize the articles in this issue in very clear and understandable terms are included in the Appendix. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Bayesian data analysis in population ecology: motivations, methods, and benefits
Dorazio, Robert
2016-01-01
During the 20th century ecologists largely relied on the frequentist system of inference for the analysis of their data. However, in the past few decades ecologists have become increasingly interested in the use of Bayesian methods of data analysis. In this article I provide guidance to ecologists who would like to decide whether Bayesian methods can be used to improve their conclusions and predictions. I begin by providing a concise summary of Bayesian methods of analysis, including a comparison of differences between Bayesian and frequentist approaches to inference when using hierarchical models. Next I provide a list of problems where Bayesian methods of analysis may arguably be preferred over frequentist methods. These problems are usually encountered in analyses based on hierarchical models of data. I describe the essentials required for applying modern methods of Bayesian computation, and I use real-world examples to illustrate these methods. I conclude by summarizing what I perceive to be the main strengths and weaknesses of using Bayesian methods to solve ecological inference problems.
Spherical Harmonic Analysis via Bayesian Inference
Muir, J. B.; Tkalcic, H.
2014-12-01
The real spherical harmonics form a compact, simple and commonly used set of basis functions for describing fields in tomographic inverse problems. It is therefore often useful to perform spherical harmonic analysis on data to represent it in the spherical harmonic parametrisation. Most existing algorithms, based on Fourier transforms, require that data be interpolated to a regular grid; this is not appropriate for the sparse, irregularly distributed data found in many geophysical applications. Instead, this work casts the problem of spherical harmonic analysis as an inverse problem, and applies the methods of Bayesian inference to overcome regularization problems in the inversion. This allows irregular data to be easily handled, and directly provides error estimates for the inverted spherical harmonic parameters. Synthetic tests have shown that this method easily handles relatively large amounts of added Gaussian noise. So far, this method has been applied to estimate the power in each harmonic degree for tomographic maps of the deep mantle based on PKP-PKIKP and PcP-P differential travel times, showing that they agree at global length scales despite local heterogeneity results being heavily influenced by data coverage. This potentially allows for simple heuristic arguments to constrain the global variation in core-mantle boundary topography based on the similarity between PKP and PcP derived tomographic maps.
Bayesian Analysis of Item Response Curves.
1984-07-01
...responses are studied from a Bayesian viewpoint of estimating the item parameters. For the two-parameter logistic model with normally distributed ability, restricted bivariate beta priors are used to... responses, Bayesian estimation, EM algorithm. Introduction: We will consider dichotomous responses to a set of test items which are designed to measure the
Ockham's razor and Bayesian analysis. [statistical theory for systems evaluation]
NASA Technical Reports Server (NTRS)
Jefferys, William H.; Berger, James O.
1992-01-01
'Ockham's razor', the ad hoc principle enjoining the greatest possible simplicity in theoretical explanations, is presently shown to be justifiable as a consequence of Bayesian inference; Bayesian analysis can, moreover, clarify the nature of the 'simplest' hypothesis consistent with the given data. By choosing the prior probabilities of hypotheses, it becomes possible to quantify the scientific judgment that simpler hypotheses are more likely to be correct. Bayesian analysis also shows that a hypothesis with fewer adjustable parameters intrinsically possesses an enhanced posterior probability, due to the clarity of its predictions.
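The claim that a hypothesis with fewer adjustable parameters can gain posterior support can be seen in a toy coin-flip comparison, sketched below. The example is an assumption chosen for illustration and does not come from the paper: a "fair coin" hypothesis with no free parameter is compared against a hypothesis whose heads probability is free, with a uniform prior.

```python
import math

def bayes_factor_fair_vs_free(heads, flips):
    """Bayes factor for 'p = 0.5' against 'p unknown, uniform prior on [0, 1]'.

    Both evidences refer to one specific observed sequence of flips.
    """
    evidence_fair = 0.5 ** flips
    # Integral of p^k (1-p)^(n-k) over [0, 1] equals 1 / ((n + 1) * C(n, k)).
    evidence_free = 1.0 / ((flips + 1) * math.comb(flips, heads))
    return evidence_fair / evidence_free

print(bayes_factor_fair_vs_free(heads=5, flips=10))   # data agree with the simple hypothesis
print(bayes_factor_fair_vs_free(heads=9, flips=10))   # data disagree with it
```

With 5 heads in 10 flips the factor exceeds 1, so the sharper prediction of the simpler hypothesis is rewarded; with 9 heads it falls well below 1 and the flexible hypothesis is preferred.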
Enhancing the Modeling of PFOA Pharmacokinetics with Bayesian Analysis
Stefanović, Sasa; Olmstead, Richard G
2004-06-01
Previous findings on structural rearrangements in the chloroplast genome of Cuscuta (dodder), the only parasitic genus in the morning-glory family, Convolvulaceae, were attributed to its parasitic life style, but without proper comparison to related nonparasitic members of the family. Before molecular evolutionary questions regarding genome evolution can be answered, the phylogenetic problems within the family need to be resolved. However, the phylogenetic position of parasitic angiosperms and their precise relationship to nonparasitic relatives are difficult to infer. Problems are encountered with both morphological and molecular evidence. Molecular data have been used in numerous studies to elucidate relationships of parasitic taxa, despite accelerated rates of sequence evolution. To address the question of the position of the genus Cuscuta within Convolvulaceae, we generated a new molecular data set consisting of mitochondrial (atpA) and nuclear (RPB2) genes, and analyzed these data together with an existing chloroplast data matrix (rbcL, atpB, trnL-F, and psbE-J), to which an additional chloroplast gene (rpl2) was added. This data set was analyzed with an array of phylogenetic methods, including Bayesian analysis, maximum likelihood, and maximum parsimony. Further exploration of data was done by using methods of phylogeny hypothesis testing. At least two nonparasitic lineages are shown to diverge within the Convolvulaceae before Cuscuta. However, the exact sister group of Cuscuta could not be ascertained, even though many alternatives were rejected with confidence. Caution is therefore warranted when interpreting the causes of molecular evolution in Cuscuta. Detailed comparisons with nonparasitic Convolvulaceae are necessary before firm conclusions can be reached regarding the effects of the parasitic mode of life on patterns of molecular evolution in Cuscuta.
Bayesian analysis of the backreaction models
Kurek, Aleksandra; Bolejko, Krzysztof; Szydlowski, Marek
2010-03-15
We present a Bayesian analysis of four different types of backreaction models, which are based on the Buchert equations. In this approach, one considers a solution to the Einstein equations for a general matter distribution and then an average of various observable quantities is taken. Such an approach became of considerable interest when it was shown that it could lead to agreement with observations without resorting to dark energy. In this paper we compare the ΛCDM model and the backreaction models with type Ia supernovae, baryon acoustic oscillations, and cosmic microwave background data, and find that the former is favored. However, the tested models were based on some particular assumptions about the relation between the average spatial curvature and the backreaction, as well as the relation between the curvature and curvature index. In this paper we modified the latter assumption, leaving the former unchanged. We find that, by varying the relation between the curvature and curvature index, we can obtain a better fit. Therefore, some further work is still needed--in particular, the relation between the backreaction and the curvature should be revisited in order to fully determine the feasibility of the backreaction models to mimic dark energy.
Bayesian Analysis of the Cosmic Microwave Background
NASA Technical Reports Server (NTRS)
Jewell, Jeffrey
2007-01-01
There is a wealth of cosmological information encoded in the spatial power spectrum of temperature anisotropies of the cosmic microwave background! Experiments designed to map the microwave sky are returning a flood of data (time streams of instrument response as a beam is swept over the sky) at several different frequencies (from 30 to 900 GHz), all with different resolutions and noise properties. The resulting analysis challenge is to estimate, and quantify our uncertainty in, the spatial power spectrum of the cosmic microwave background given the complexities of "missing data", foreground emission, and complicated instrumental noise. Bayesian formulation of this problem allows consistent treatment of many complexities including complicated instrumental noise and foregrounds, and can be numerically implemented with Gibbs sampling. Gibbs sampling has now been validated as an efficient, statistically exact, and practically useful method for low-resolution (as demonstrated on WMAP 1 and 3 year temperature and polarization data). Continuing development for Planck - the goal is to exploit the unique capabilities of Gibbs sampling to directly propagate uncertainties in both foreground and instrument models to total uncertainty in cosmological parameters.
Asymptotic analysis of Bayesian generalization error with Newton diagram.
Yamazaki, Keisuke; Aoyagi, Miki; Watanabe, Sumio
2010-01-01
Statistical learning machines that have singularities in the parameter space, such as hidden Markov models, Bayesian networks, and neural networks, are widely used in the field of information engineering. Singularities in the parameter space determine the accuracy of estimation in the Bayesian scenario. The Newton diagram in algebraic geometry is recognized as an effective method by which to investigate a singularity. The present paper proposes a new technique to incorporate the diagram into Bayesian analysis. The proposed technique allows the generalization error to be clarified and provides a foundation for efficient model selection. We apply the proposed technique to mixtures of binomial distributions.
Bayesian analysis of MEG visual evoked responses
Schmidt, D.M.; George, J.S.; Wood, C.C.
1999-04-01
The authors developed a method for analyzing neural electromagnetic data that allows probabilistic inferences to be drawn about regions of activation. The method involves the generation of a large number of possible solutions which both fit the data and prior expectations about the nature of probable solutions made explicit by a Bayesian formalism. In addition, they have introduced a model for the current distributions that produce MEG (and EEG) data that allows extended regions of activity, and can easily incorporate prior information such as anatomical constraints from MRI. To evaluate the feasibility and utility of the Bayesian approach with actual data, they analyzed MEG data from a visual evoked response experiment. They compared Bayesian analyses of MEG responses to visual stimuli in the left and right visual fields, in order to examine the sensitivity of the method to detect known features of human visual cortex organization. They also examined the changing pattern of cortical activation as a function of time.
Bayesian analysis of MEG visual evoked responses
NASA Astrophysics Data System (ADS)
Schmidt, David M.; George, John S.; Wood, C. C.
1999-05-01
We have developed a method for analyzing neural electromagnetic data that allows probabilistic inferences to be drawn about regions of activation. The method involves the generation of a large number of possible solutions which both fit the data and prior expectations about the nature of probable solutions made explicit by a Bayesian formalism. In addition, we have introduced a model for the current distributions that produce MEG (and EEG) data that allows extended regions of activity, and can easily incorporate prior information such as anatomical constraints from MRI. To evaluate the feasibility and utility of the Bayesian approach with actual data, we analyzed MEG data from a visual evoked response experiment. We compared Bayesian analyses of MEG responses to visual stimuli in the left and right visual fields, in order to examine the sensitivity of the method to detect known features of human visual cortex organization. We also examined the changing pattern of cortical activation as a function of time.
Bayesian Analysis of Perceived Eye Level
Orendorff, Elaine E.; Kalesinskas, Laurynas; Palumbo, Robert T.; Albert, Mark V.
2016-01-01
To accurately perceive the world, people must efficiently combine internal beliefs and external sensory cues. We introduce a Bayesian framework that explains the role of internal balance cues and visual stimuli on perceived eye level (PEL)—a self-reported measure of elevation angle. This framework provides a single, coherent model explaining a set of experimentally observed PEL over a range of experimental conditions. Further, it provides a parsimonious explanation for the additive effect of low fidelity cues as well as the averaging effect of high fidelity cues, as also found in other Bayesian cue combination psychophysical studies. Our model accurately estimates the PEL and explains the form of previous equations used in describing PEL behavior. Most importantly, the proposed Bayesian framework for PEL is more powerful than previous behavioral modeling; it permits behavioral estimation in a wider range of cue combination and perceptual studies than models previously reported. PMID:28018204
FPGA acceleration of the phylogenetic likelihood function for Bayesian MCMC inference methods
2010-01-01
Background: Likelihood (ML)-based phylogenetic inference has become a popular method for estimating the evolutionary relationships among species based on genomic sequence data. This method is used in applications such as RAxML, GARLI, MrBayes, PAML, and PAUP. The Phylogenetic Likelihood Function (PLF) is an important kernel computation for this method. The PLF consists of a loop with no conditional behavior or dependencies between iterations. As such it contains a high potential for exploiting parallelism using micro-architectural techniques. In this paper, we describe a technique for mapping the PLF and supporting logic onto a Field Programmable Gate Array (FPGA)-based co-processor. By leveraging the FPGA's on-chip DSP modules and the high-bandwidth local memory attached to the FPGA, the resultant co-processor can accelerate ML-based methods and outperform state-of-the-art multi-core processors. Results: We use the MrBayes 3 tool as a framework for designing our co-processor. For large datasets, we estimate that our accelerated MrBayes, if run on a current-generation FPGA, achieves a 10× speedup relative to software running on a state-of-the-art server-class microprocessor. The FPGA-based implementation achieves its performance by deeply pipelining the likelihood computations, performing multiple floating-point operations in parallel, and through a natural log approximation that is chosen specifically to leverage a deeply pipelined custom architecture. Conclusions: Heterogeneous computing, which combines general-purpose processors with special-purpose co-processors such as FPGAs and GPUs, is a promising approach for high-performance phylogeny inference as shown by the growing body of literature in this field. FPGAs in particular are well-suited for this task because of their low power consumption as compared to many-core processors and Graphics Processor Units (GPUs) [1]. PMID:20385005
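To make concrete why the PLF is described as a loop with no dependencies between iterations, here is a minimal sketch of the per-node kernel in the standard pruning formulation. It is not MrBayes code or the FPGA design: it assumes the Jukes-Cantor substitution model and one-hot tip vectors, and the names `jc69_matrix`, `encode_tips`, `plf_node`, and `site_likelihoods` are hypothetical.

```python
import numpy as np

def jc69_matrix(t):
    """Jukes-Cantor transition probabilities for branch length t (substitutions/site)."""
    e = np.exp(-4.0 * t / 3.0)
    P = np.full((4, 4), 0.25 * (1.0 - e))
    np.fill_diagonal(P, 0.25 + 0.75 * e)
    return P

def encode_tips(seq):
    """One-hot conditional likelihoods for an observed DNA sequence (order A, C, G, T)."""
    idx = np.array(["ACGT".index(base) for base in seq])
    return np.eye(4)[idx]

def plf_node(L_left, L_right, P_left, P_right):
    """Conditional likelihoods at a parent node; all L arrays are (n_sites, 4).

    For every site s and parent state i:
      L[s, i] = (sum_j P_left[i, j] * L_left[s, j]) * (sum_j P_right[i, j] * L_right[s, j])
    Each site is computed independently, which is what makes the kernel parallel.
    """
    return (L_left @ P_left.T) * (L_right @ P_right.T)

def site_likelihoods(L_root):
    """Per-site likelihoods, assuming equal base frequencies (0.25 each)."""
    return L_root @ np.full(4, 0.25)
```

For a two-taxon tree, `site_likelihoods(plf_node(encode_tips(a), encode_tips(b), jc69_matrix(t1), jc69_matrix(t2)))` gives one likelihood per alignment column; larger trees apply `plf_node` repeatedly from the tips toward the root.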
Kruschke, John K; Liddell, Torrin M
2017-02-07
In the practice of data analysis, there is a conceptual distinction between hypothesis testing, on the one hand, and estimation with quantified uncertainty on the other. Among frequentists in psychology, a shift of emphasis from hypothesis testing to estimation has been dubbed "the New Statistics" (Cumming 2014). A second conceptual distinction is between frequentist methods and Bayesian methods. Our main goal in this article is to explain how Bayesian methods achieve the goals of the New Statistics better than frequentist methods. The article reviews frequentist and Bayesian approaches to hypothesis testing and to estimation with confidence or credible intervals. The article also describes Bayesian approaches to meta-analysis, randomized controlled trials, and power analysis.
Open Reading Frame Phylogenetic Analysis on the Cloud
2013-01-01
Phylogenetic analysis has become essential in researching the evolutionary relationships between viruses. These relationships are depicted on phylogenetic trees, in which viruses are grouped based on sequence similarity. Viral evolutionary relationships are identified from open reading frames rather than from complete sequences. Recently, cloud computing has become popular for developing internet-based bioinformatics tools. Biocloud is an efficient, scalable, and robust bioinformatics computing service. In this paper, we propose a cloud-based open reading frame phylogenetic analysis service. The proposed service integrates the Hadoop framework, virtualization technology, and phylogenetic analysis methods to provide a high-availability, large-scale bioservice. In a case study, we analyze the phylogenetic relationships among Norovirus. Evolutionary relationships are elucidated by aligning different open reading frame sequences. The proposed platform correctly identifies the evolutionary relationships between members of Norovirus. PMID:23671843
[Phylogenetic analysis of bacteria of extreme ecosystems].
Romanovskaia, V A; Parfenova, V V; Bel'kova, N L; Sukhanova, E V; Gladka, G V; Tashireva, A A
2014-01-01
Phylogenetic analysis of aerobic chemoorganotrophic bacteria from two extreme regions (the Dead Sea and West Antarctic) was performed on the basis of the nucleotide sequences of the 16S rRNA gene. The thermotolerant and halotolerant spore-forming bacteria 7t1 and 7t3 from terrestrial ecosystems of the Dead Sea were identified as Bacillus licheniformis and B. subtilis subsp. subtilis, respectively. Given the remote position of the thermotolerant strain 6t1 relative to closely related strains in the Staphylococcus cluster, strain 6t1 can be regarded as Staphylococcus sp. In the terrestrial ecosystems of Galindez Island (Antarctic), taxonomically diverse psychrotolerant bacteria were detected. Micrococcus luteus O-1 and Microbacterium trichothecenolyticum O-3 were isolated from ornithogenic soil. Strains 4r5, 5r5 and 40r5, isolated from grass and lichens, can be assigned to the genus Frondihabitans. These strains are taxonomically and ecologically isolated and, on the tree diagram, form a joint cluster with three Frondihabitans sp. isolates from lichen of the Austrian Alps and with the psychrotolerant, plant-associated F. cladoniiphilus CafT13(T). Isolates from black lichen at different stationary observation points on the south side of a vertical cliff were identified as Rhodococcus fascians 181n3, Sporosarcina aquimarina O-7 and Staphylococcus sp. 0-10. Arthrobacter sp. 28r5g1 was isolated from an orange fouling biofilm on top of the vertical cliff, and Serratia sp. 6r1g from moss. According to the results, Frondihabitans strains were the most frequently encountered chemoorganotrophic aerobic bacteria in the Antarctic phytocenoses.
Bayesian methods for the design and analysis of noninferiority trials.
Gamalo-Siebers, Margaret; Gao, Aijun; Lakshminarayanan, Mani; Liu, Guanghan; Natanegara, Fanni; Railkar, Radha; Schmidli, Heinz; Song, Guochen
2016-01-01
The gold standard for evaluating treatment efficacy of a medical product is a placebo-controlled trial. However, when the use of placebo is considered to be unethical or impractical, a viable alternative for evaluating treatment efficacy is through a noninferiority (NI) study where a test treatment is compared to an active control treatment. The minimal objective of such a study is to determine whether the test treatment is superior to placebo. An assumption is made that if the active control treatment remains efficacious, as was observed when it was compared against placebo, then a test treatment that has comparable efficacy with the active control, within a certain range, must also be superior to placebo. Because of this assumption, the design, implementation, and analysis of NI trials present challenges for sponsors and regulators. In designing and analyzing NI trials, substantial historical data are often required on the active control treatment and placebo. Bayesian approaches provide a natural framework for synthesizing the historical data in the form of prior distributions that can effectively be used in design and analysis of a NI clinical trial. Despite a flurry of recent research activities in the area of Bayesian approaches in medical product development, there are still substantial gaps in recognition and acceptance of Bayesian approaches in NI trial design and analysis. The Bayesian Scientific Working Group of the Drug Information Association provides a coordinated effort to target the education and implementation issues on Bayesian approaches for NI trials. In this article, we provide a review of both frequentist and Bayesian approaches in NI trials, and elaborate on the implementation for two common Bayesian methods including hierarchical prior method and meta-analytic-predictive approach. Simulations are conducted to investigate the properties of the Bayesian methods, and some real clinical trial examples are presented for illustration.
Veeramah, Krishna R; Woerner, August E; Johnstone, Laurel; Gut, Ivo; Gut, Marta; Marques-Bonet, Tomas; Carbone, Lucia; Wall, Jeff D; Hammer, Michael F
2015-05-01
Gibbons are believed to have diverged from the larger great apes ∼16.8 MYA and today reside in the rainforests of Southeast Asia. Based on their diploid chromosome number, the family Hylobatidae is divided into four genera, Nomascus, Symphalangus, Hoolock, and Hylobates. Genetic studies attempting to elucidate the phylogenetic relationships among gibbons using karyotypes, mitochondrial DNA (mtDNA), the Y chromosome, and short autosomal sequences have been inconclusive. To examine the relationships among gibbon genera in more depth, we performed second-generation whole genome sequencing (WGS) to a mean of ∼15× coverage in two individuals from each genus. We developed a coalescent-based approximate Bayesian computation (ABC) method incorporating a model of sequencing error generated by high coverage exome validation to infer the branching order, divergence times, and effective population sizes of gibbon taxa. Although Hoolock and Symphalangus are likely sister taxa, we could not confidently resolve a single bifurcating tree despite the large amount of data analyzed. Instead, our results support the hypothesis that all four gibbon genera diverged at approximately the same time. Assuming an autosomal mutation rate of 1 × 10⁻⁹/site/year, this speciation process occurred ∼5 MYA during a period in the Early Pliocene characterized by climatic shifts and fragmentation of the Sunda shelf forests. Whole genome sequencing of additional individuals will be vital for inferring the extent of gene flow among species after the separation of the gibbon genera. Copyright © 2015 by the Genetics Society of America.
Veeramah, Krishna R.; Woerner, August E.; Johnstone, Laurel; Gut, Ivo; Gut, Marta; Marques-Bonet, Tomas; Carbone, Lucia; Wall, Jeff D.; Hammer, Michael F.
2015-01-01
Gibbons are believed to have diverged from the larger great apes ∼16.8 MYA and today reside in the rainforests of Southeast Asia. Based on their diploid chromosome number, the family Hylobatidae is divided into four genera, Nomascus, Symphalangus, Hoolock, and Hylobates. Genetic studies attempting to elucidate the phylogenetic relationships among gibbons using karyotypes, mitochondrial DNA (mtDNA), the Y chromosome, and short autosomal sequences have been inconclusive. To examine the relationships among gibbon genera in more depth, we performed second-generation whole genome sequencing (WGS) to a mean of ∼15× coverage in two individuals from each genus. We developed a coalescent-based approximate Bayesian computation (ABC) method incorporating a model of sequencing error generated by high coverage exome validation to infer the branching order, divergence times, and effective population sizes of gibbon taxa. Although Hoolock and Symphalangus are likely sister taxa, we could not confidently resolve a single bifurcating tree despite the large amount of data analyzed. Instead, our results support the hypothesis that all four gibbon genera diverged at approximately the same time. Assuming an autosomal mutation rate of 1 × 10⁻⁹/site/year, this speciation process occurred ∼5 MYA during a period in the Early Pliocene characterized by climatic shifts and fragmentation of the Sunda shelf forests. Whole genome sequencing of additional individuals will be vital for inferring the extent of gene flow among species after the separation of the gibbon genera. PMID:25769979
Dediu, Dan
2011-02-07
Language is a hallmark of our species and understanding linguistic diversity is an area of major interest. Genetic factors influencing the cultural transmission of language provide a powerful and elegant explanation for aspects of the present day linguistic diversity and a window into the emergence and evolution of language. In particular, it has recently been proposed that linguistic tone (the usage of voice pitch to convey lexical and grammatical meaning) is biased by two genes involved in brain growth and development, ASPM and Microcephalin. This hypothesis predicts that tone is a stable characteristic of language because of its 'genetic anchoring'. The present paper tests this prediction using a Bayesian phylogenetic framework applied to a large set of linguistic features and language families, using multiple software implementations, data codings, stability estimations, linguistic classifications and outgroup choices. The results of these different methods and datasets show a large agreement, suggesting that this approach produces reliable estimates of the stability of linguistic data. Moreover, linguistic tone is found to be stable across methods and datasets, providing suggestive support for the hypothesis of genetic influences on its distribution.
Bayesian accelerated failure time analysis with application to veterinary epidemiology.
Bedrick, E J; Christensen, R; Johnson, W O
2000-01-30
Standard methods for analysing survival data with covariates rely on asymptotic inferences. Bayesian methods can be performed using simple computations and are applicable for any sample size. We propose a practical method for making prior specifications and discuss a complete Bayesian analysis for parametric accelerated failure time regression models. We emphasize inferences for the survival curve rather than regression coefficients. A key feature of the Bayesian framework is that model comparisons for various choices of baseline distribution are easily handled by the calculation of Bayes factors. Such comparisons between non-nested models are difficult in the frequentist setting. We illustrate diagnostic tools and examine the sensitivity of the Bayesian methods. Copyright 2000 John Wiley & Sons, Ltd.
Dealing with Reflection Invariance in Bayesian Factor Analysis.
Erosheva, Elena A; Curtis, S McKay
2017-03-13
This paper considers the reflection unidentifiability problem in confirmatory factor analysis (CFA) and the associated implications for Bayesian estimation. We note a direct analogy between the multimodality in CFA models that is due to all possible column sign changes in the matrix of loadings and the multimodality in finite mixture models that is due to all possible relabelings of the mixture components. Drawing on this analogy, we derive and present a simple approach for dealing with reflection invariance in Bayesian factor analysis. We recommend fitting Bayesian factor analysis models without rotational constraints on the loadings (allowing Markov chain Monte Carlo algorithms to explore the full posterior distribution) and then using a relabeling algorithm to pick a factor solution that corresponds to one mode. We demonstrate our approach on the case of a bifactor model; however, the relabeling algorithm is straightforward to generalize for handling multimodalities due to sign invariance in the likelihood in other factor analysis models.
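The post-processing idea described here, sampling without sign constraints and then mapping every draw to a single mode, can be sketched as follows. This is a simplified stand-in rather than the authors' relabeling algorithm: it assumes a fixed reference loading matrix is available (for example, one chosen draw) and flips each sampled column to agree with it. The function name `align_signs` is hypothetical.

```python
import numpy as np

def align_signs(draws, reference):
    """Flip column signs of sampled loading matrices to match a reference.

    draws:     array of shape (n_draws, p, k), unconstrained MCMC loadings
    reference: array of shape (p, k), defines the mode to keep
    Returns the aligned draws and the (n_draws, k) signs that were applied,
    so the same flips can be applied to the corresponding factor scores.
    """
    # Inner product of each sampled column with the matching reference column
    dots = np.einsum("npk,pk->nk", draws, reference)
    signs = np.where(dots < 0, -1.0, 1.0)
    return draws * signs[:, None, :], signs
```

After alignment, posterior means and intervals for the loadings are computed from the aligned draws; averaging the raw draws would instead pull every loading toward zero because the sign-flipped modes cancel.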
A phylogenetic analysis of the phylum Fibrobacteres.
Jewell, Kelsea A; Scott, Jarrod J; Adams, Sandra M; Suen, Garret
2013-09-01
Members of the phylum Fibrobacteres are highly efficient cellulolytic bacteria, best known for their role in rumen function and as potential sources of novel enzymes for bioenergy applications. Despite being key members of ruminants and other digestive microbial communities, our knowledge of this phylum remains incomplete, as much of our understanding is focused on two recognized species, Fibrobacter succinogenes and F. intestinalis. As a result, we lack insights regarding the environmental niche, host range, and phylogenetic organization of this phylum. Here, we analyzed over 1000 16S rRNA Fibrobacteres sequences available from public databases to establish a phylogenetic framework for this phylum. We identify both species- and genus-level clades that are suggestive of previously unknown taxonomic relationships between Fibrobacteres in addition to their putative lifestyles as host-associated or free-living. Our results shed light on this poorly understood phylum and will be useful for elucidating the function, distribution, and diversity of these bacteria in their niches.
TREEFINDER: a powerful graphical analysis environment for molecular phylogenetics
Jobb, Gangolf; von Haeseler, Arndt; Strimmer, Korbinian
2004-01-01
Background: Most analysis programs for inferring molecular phylogenies are difficult to use, in particular for researchers with little programming experience. Results: TREEFINDER is an easy-to-use integrative platform-independent analysis environment for molecular phylogenetics. In this paper the main features of TREEFINDER (version of April 2004) are described. TREEFINDER is written in ANSI C and Java and implements powerful statistical approaches for inferring gene trees and related analyses. In addition, it provides a user-friendly graphical interface and a phylogenetic programming language. Conclusions: TREEFINDER is a versatile framework for analyzing phylogenetic data across different platforms that is suited both for exploratory as well as advanced studies. PMID:15222900
A phylogenetic analysis of Aquifex pyrophilus
NASA Technical Reports Server (NTRS)
Burggraf, S.; Olsen, G. J.; Stetter, K. O.; Woese, C. R.
1992-01-01
The 16S rRNA of the bacterium Aquifex pyrophilus, a microaerophilic, oxygen-reducing hyperthermophile, has been sequenced directly from the PCR-amplified gene. Phylogenetic analyses show the Aq. pyrophilus lineage to be probably the deepest (earliest) in the (eu)bacterial tree. The addition of this deep branching to the bacterial tree further supports the argument that the Bacteria are of thermophilic ancestry.
A phylogenetic analysis of Aquifex pyrophilus.
Burggraf, S; Olsen, G J; Stetter, K O; Woese, C R
1992-08-01
The 16S rRNA of the bacterium Aquifex pyrophilus, a microaerophilic, oxygen-reducing hyperthermophile, has been sequenced directly from the PCR-amplified gene. Phylogenetic analyses show the Aq. pyrophilus lineage to be probably the deepest (earliest) in the (eu)bacterial tree. The addition of this deep branching to the bacterial tree further supports the argument that the Bacteria are of thermophilic ancestry.
Bayesian linkage and segregation analysis: factoring the problem.
Matthysse, S
2000-01-01
Complex segregation analysis and linkage methods are mathematical techniques for the genetic dissection of complex diseases. They are used to delineate complex modes of familial transmission and to localize putative disease susceptibility loci to specific chromosomal locations. The computational problem of Bayesian linkage and segregation analysis is one of integration in high-dimensional spaces. In this paper, three available techniques for Bayesian linkage and segregation analysis are discussed: Markov Chain Monte Carlo (MCMC), importance sampling, and exact calculation. The contribution of each to the overall integration will be explicitly discussed.
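As a small illustration of how importance sampling and exact calculation can be checked against each other on an integration problem, the sketch below estimates a marginal likelihood in one dimension. It is not a linkage or segregation model: it assumes a binomial likelihood with a uniform prior on the success probability, for which the exact answer is 1/(n + 1). The function names and the Beta proposal parameters are hypothetical.

```python
import math
import numpy as np

def log_beta_pdf(x, a, b):
    """Log density of a Beta(a, b) distribution at x in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * np.log(x) + (b - 1) * np.log1p(-x)

def marginal_likelihood_is(k, n, a, b, n_draws=20000, seed=0):
    """Importance-sampling estimate of the integral of Binomial(k | n, theta)
    over theta in (0, 1) under a uniform prior, using a Beta(a, b) proposal."""
    rng = np.random.default_rng(seed)
    theta = rng.beta(a, b, size=n_draws)
    log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    log_lik = log_binom + k * np.log(theta) + (n - k) * np.log1p(-theta)
    # Weight = likelihood * prior / proposal; the uniform prior density is 1
    log_w = log_lik - log_beta_pdf(theta, a, b)
    return float(np.mean(np.exp(log_w)))

def marginal_likelihood_exact(n):
    """Closed-form value of the same integral under the uniform prior."""
    return 1.0 / (n + 1)
```

The proposal should be at least as spread out as the integrand; a proposal much narrower than the posterior produces occasional very large weights and an unreliable estimate, a problem that becomes more severe as the dimension of the integral grows.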
Wang, Wei; Xia, Minxuan; Chen, Jie; Deng, Fenni; Yuan, Rui; Zhang, Xiaopei; Shen, Fafu
2016-12-01
The data presented in this paper support the research article "Genome-Wide Analysis of Superoxide Dismutase Gene Family in Gossypium raimondii and G. arboreum" [1]. In this data article, we present a phylogenetic tree showing a dichotomy with two different clusters of SODs inferred by the Bayesian method of MrBayes (version 3.2.4), "Bayesian phylogenetic inference under mixed models" [2]; Ramachandran plots of G. raimondii and G. arboreum SODs; the protein sequence used to generate the 3D structure of proteins and the template accession via the SWISS-MODEL server, "SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information." [3]; and motif sequences of SODs identified by InterProScan (version 4.8) with the Pfam database, "Pfam: the protein families database" [4].
Jiang, Xianhuan; Gao, Jun; Ni, Liju; Hu, Jianhua; Li, Kai; Sun, Fengping; Xie, Jianyun; Bo, Xiong; Gao, Chen; Xiao, Junhua; Zhou, Yuxun
2012-05-01
Microtus fortis is a distinctive rodent resource in China. It is a promising experimental animal model for studying the mechanism of Schistosoma japonicum resistance. This study reports the first complete mitochondrial genome sequence for Microtus fortis calamorum, a subspecies of M. fortis (Arvicolinae, Rodentia). The mitochondrial genome sequence of M. f. calamorum (GenBank: JF261175) showed a typical vertebrate pattern with 13 protein-coding genes, 2 ribosomal RNAs, 22 transfer RNAs and one major noncoding region (CR region). The extended termination associated sequences (ETAS-1 and ETAS-2) and conserved sequence block 1 (CSB-1) were found in the CR region. The putative origin of replication for the light strand (O(L)) of M. f. calamorum was 35 bp long and showed high conservation in the stem and adjacent sequences, but differences existed in the loop region among three species of the genus Microtus. To investigate the phylogenetic position of M. f. calamorum, phylogenetic trees (maximum likelihood and Bayesian methods) were constructed based on 12 protein-coding genes (all except the ND6 gene) on the H strand from 16 rodent species. M. f. calamorum was classified into the genus Microtus (Arvicolinae) on the basis of its close phylogenetic relationship with Microtus kikuchii (Taiwan vole). Further phylogenetic analysis based on the cytochrome b gene placed M. f. calamorum among the subspecies of M. fortis, which formed a sister group of Microtus middendorfii in the genus Microtus.
Genetic and phylogenetic analysis of feline calicivirus isolates in China.
Sun, Yaxin; Deng, Mingliang; Peng, Zhong; Hu, Ruiming; Chen, Huanchun; Wu, Bin
2017-02-01
The aim of this study was to determine the genetic diversity of Chinese feline calicivirus (FCV) isolates and their phylogenetic relationship with isolates from elsewhere in the world. Phylogenetic analysis was performed based on the partial open reading frame (ORF) 2 sequences (regions B-F) of 21 Chinese FCV isolates and 30 global isolates. The Chinese isolates included 13 isolates from Wuhan, which were isolated in this study, and eight previously published isolates. Sixteen Chinese isolates and two Japanese isolates formed a distinct phylogenetic cluster. Phylogenetic analysis based on the sequences of the complete genome, ORF1, ORF2 and ORF3 of selected isolates supported the above findings. Genogroup analysis revealed that FCV genogroup II is present in China. These findings suggest that Chinese FCV isolates are closely related to Japanese FCV isolates. Copyright © 2016 Elsevier Ltd. All rights reserved.
Steinfartz, Sebastian; Vicario, Saverio; Arntzen, J W; Caccone, Adalgisa
2007-03-15
The monophyly of European newts of the genus Triturus within the family Salamandridae has for decades rested on presumably homologous behavioral and morphological characters. Molecular data challenge this hypothesis, but the phylogenetic position of Triturus within the Salamandridae has not yet been convincingly resolved. We addressed this issue and the temporal divergence of Triturus within the Salamandridae with novel Bayesian approaches applied to DNA sequence data from three mitochondrial genes (12S, 16S and cytb). We included 38 salamandrid species comprising all 13 recognized species of Triturus and 16 out of 17 salamandrid genera. A clade comprising all the "Newts" can be separated from the "True Salamanders" and Salamandrina clades. Within the "Newts" well-supported clades are: Tylototriton-Pleurodeles, the "New World Newts" (Notophthalmus-Taricha), and the "Modern Eurasian Newts" (Cynops, Pachytriton, Paramesotriton=together the "Modern Asian Newts", Calotriton, Euproctus, Neurergus and Triturus species). We found that Triturus is a non-monophyletic species assemblage, which includes four groups that are themselves monophyletic: (i) the "Large-Bodied Triturus" (six species), (ii) the "Small-Bodied Triturus" (five species), (iii) T. alpestris and (iv) T. vittatus. We estimated that the last common ancestor of Triturus existed around 64 million years ago (mya) while the root of the Salamandridae dates back to 95 mya. This was estimated using a fossil-based molecular dating approach and an explicit framework to select calibration points that least underestimated their corresponding nodes. Using the molecular phylogeny we mapped the evolution of life history and courtship traits in Triturus and found that several Triturus-specific courtship traits evolved independently.
Aberer, Andre J; Stamatakis, Alexandros; Ronquist, Fredrik
2016-01-01
Sampling tree space is the most challenging aspect of Bayesian phylogenetic inference. The sheer number of alternative topologies is problematic by itself. In addition, the complex dependency between branch lengths and topology increases the difficulty of moving efficiently among topologies. Current tree proposals are fast but sample new trees using primitive transformations or re-mappings of old branch lengths. This reduces acceptance rates and presumably slows down convergence and mixing. Here, we explore branch proposals that do not rely on old branch lengths but instead are based on approximations of the conditional posterior. Using a diverse set of empirical data sets, we show that most conditional branch posteriors can be accurately approximated via a [Formula: see text] distribution. We empirically determine the relationship between the logarithmic conditional posterior density, its derivatives, and the characteristics of the branch posterior. We use these relationships to derive an independence sampler for proposing branches with an acceptance ratio of ~90% on most data sets. This proposal samples branches between 2× and 3× more efficiently than traditional proposals with respect to the effective sample size per unit of runtime. We also compare the performance of standard topology proposals with hybrid proposals that use the new independence sampler to update those branches that are most affected by the topological change. Our results show that hybrid proposals can sometimes noticeably decrease the number of generations necessary for topological convergence. Inconsistent performance gains indicate that branch updates are not the limiting factor in improving topological convergence for the currently employed set of proposals. However, our independence sampler might be essential for the construction of novel tree proposals that apply more radical topology changes.
Spatiotemporal Bayesian inference dipole analysis for MEG neuroimaging data.
Jun, Sung C; George, John S; Paré-Blagoev, Juliana; Plis, Sergey M; Ranken, Doug M; Schmidt, David M; Wood, C C
2005-10-15
Recently, we described a Bayesian inference approach to the MEG/EEG inverse problem that used numerical techniques to estimate the full posterior probability distributions of likely solutions upon which all inferences were based [Schmidt, D.M., George, J.S., Wood, C.C., 1999. Bayesian inference applied to the electromagnetic inverse problem. Human Brain Mapping 7, 195; Schmidt, D.M., George, J.S., Ranken, D.M., Wood, C.C., 2001. Spatial-temporal bayesian inference for MEG/EEG. In: Nenonen, J., Ilmoniemi, R. J., Katila, T. (Eds.), Biomag 2000: 12th International Conference on Biomagnetism. Espoo, Norway, p. 671]. Schmidt et al. (1999) focused on the analysis of data at a single point in time employing an extended region source model. They subsequently extended their work to a spatiotemporal Bayesian inference analysis of the full spatiotemporal MEG/EEG data set. Here, we formulate spatiotemporal Bayesian inference analysis using a multi-dipole model of neural activity. This approach is faster than the extended region model, does not require use of the subject's anatomical information, does not require prior determination of the number of dipoles, and yields quantitative probabilistic inferences. In addition, we have incorporated the ability to handle much more complex and realistic estimates of the background noise, which may be represented as a sum of Kronecker products of temporal and spatial noise covariance components. This reduces the effects of undermodeling noise. In order to reduce the rigidity of the multi-dipole formulation which commonly causes problems due to multiple local minima, we treat the given covariance of the background as uncertain and marginalize over it in the analysis. Markov Chain Monte Carlo (MCMC) was used to sample the many possible likely solutions. The spatiotemporal Bayesian dipole analysis is demonstrated using simulated and empirical whole-head MEG data.
Nested sampling applied in Bayesian room-acoustics decay analysis.
Jasa, Tomislav; Xiang, Ning
2012-11-01
Room-acoustic energy decays often exhibit single-rate or multiple-rate characteristics in a wide variety of rooms/halls. Both the energy decay order and decay parameter estimation are of practical significance in architectural acoustics applications, representing two different levels of Bayesian probabilistic inference. This paper discusses a model-based sound energy decay analysis within a Bayesian framework utilizing the nested sampling algorithm. The nested sampling algorithm is specifically developed to evaluate the Bayesian evidence required for determining the energy decay order with decay parameter estimates as a secondary result. Taking the energy decay analysis in architectural acoustics as an example, this paper demonstrates that two different levels of inference, decay model-selection and decay parameter estimation, can be cohesively accomplished by the nested sampling algorithm.
Alfaro, Michael E; Zoller, Stefan; Lutzoni, François
2003-02-01
Bayesian Markov chain Monte Carlo sampling has become increasingly popular in phylogenetics as a method for both estimating the maximum likelihood topology and for assessing nodal confidence. Despite the growing use of posterior probabilities, the relationship between the Bayesian measure of confidence and the most commonly used confidence measure in phylogenetics, the nonparametric bootstrap proportion, is poorly understood. We used computer simulation to investigate the behavior of three phylogenetic confidence methods: Bayesian posterior probabilities calculated via Markov chain Monte Carlo sampling (BMCMC-PP), maximum likelihood bootstrap proportion (ML-BP), and maximum parsimony bootstrap proportion (MP-BP). We simulated the evolution of DNA sequence on 17-taxon topologies under 18 evolutionary scenarios and examined the performance of these methods in assigning confidence to correct monophyletic and incorrect monophyletic groups, and we examined the effects of increasing character number on support value. BMCMC-PP and ML-BP were often strongly correlated with one another but could provide substantially different estimates of support on short internodes. In contrast, BMCMC-PP correlated poorly with MP-BP across most of the simulation conditions that we examined. For a given threshold value, more correct monophyletic groups were supported by BMCMC-PP than by either ML-BP or MP-BP. When threshold values were chosen that fixed the rate of accepting incorrect monophyletic relationship as true at 5%, all three methods recovered most of the correct relationships on the simulated topologies, although BMCMC-PP and ML-BP performed better than MP-BP. BMCMC-PP was usually a less biased predictor of phylogenetic accuracy than either bootstrapping method. BMCMC-PP provided high support values for correct topological bipartitions with fewer characters than was needed for nonparametric bootstrap.
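The nonparametric bootstrap compared in this abstract resamples alignment columns with replacement and reports the fraction of replicate trees that contain a given group. The sketch below shows only those two steps; tree inference on each replicate is omitted, and the function names and the representation of clades as frozensets of taxon names are assumptions made for illustration.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (same columns for every taxon)."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[c] for c in columns) for taxon, seq in alignment.items()}


def bootstrap_proportion(replicate_clade_sets, clade):
    """Fraction of replicate trees (each a set of clades) that contain `clade`."""
    clade = frozenset(clade)
    hits = sum(1 for clades in replicate_clade_sets if clade in clades)
    return hits / len(replicate_clade_sets)


# Hypothetical toy alignment with three taxa and six sites.
alignment = {"a": "ACGTAC", "b": "ACGTAC", "c": "TTTTTT"}
replicate = bootstrap_alignment(alignment, random.Random(1))
print(replicate)
```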
Bayesian Analysis of the Pattern Informatics Technique
NASA Astrophysics Data System (ADS)
Cho, N.; Tiampo, K.; Klein, W.; Rundle, J.
2007-12-01
The pattern informatics (PI) [Rundle et al., 2000; Tiampo et al., 2002; Holliday et al., 2005] is a technique that uses phase dynamics in order to quantify temporal variations in seismicity patterns. This technique has shown interesting results for forecasting earthquakes with magnitude greater than or equal to 5 in southern California from 2000 to 2010 [Rundle et al., 2002]. In this work, a Bayesian approach is used to obtain a modified updated version of the PI called Bayesian pattern informatics (BPI). This alternative method uses the PI result as a prior probability and models such as ETAS [Ogata, 1988, 2004; Helmstetter and Sornette, 2002] or BASS [Turcotte et al., 2007] in order to obtain the likelihood. Its result is similar to the one obtained by the PI: the determination of regions, known as hotspots, that are most susceptible to the occurrence of events with M=5 and larger during the forecast period. As an initial test, retrospective forecasts for the southern California region from 1990 to 2000 were made with both the BPI and the PI techniques, and the results are discussed in this work.
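The update described in this abstract, with the PI result as the prior and a seismicity model supplying the likelihood, can be sketched as a cell-wise application of Bayes' rule. The grid of cells, the normalization over cells, and all numbers below are assumptions for illustration; the abstract does not say how ETAS or BASS likelihoods are evaluated.

```python
def bayesian_update(prior, likelihood):
    """Cell-wise Bayes' rule: posterior is proportional to prior * likelihood."""
    if len(prior) != len(likelihood):
        raise ValueError("prior and likelihood must cover the same cells")
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    if total <= 0:
        raise ValueError("prior and likelihood have no overlapping support")
    return [u / total for u in unnormalized]


# Hypothetical values for three grid cells.
pi_prior = [0.5, 0.3, 0.2]          # PI hotspot probabilities used as the prior
model_likelihood = [0.1, 0.4, 0.4]  # likelihood from a seismicity model (assumed)
posterior = bayesian_update(pi_prior, model_likelihood)
print(posterior)
```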
Feldman, Sanford H; Ntenda, Abraham M
2011-01-01
Uncertainties in ozone concentrations predicted with a Lagrangian photochemical air quality model have been estimated using Bayesian Monte Carlo (BMC) analysis. Bayesian Monte Carlo analysis provides a means of combining subjective "prior" uncertainty estimates developed
Bayesian networks as a tool for epidemiological systems analysis
NASA Astrophysics Data System (ADS)
Lewis, F. I.
2012-11-01
Bayesian network analysis is a form of probabilistic modeling which derives from empirical data a directed acyclic graph (DAG) describing the dependency structure between random variables. Bayesian networks are increasingly finding application in areas such as computational and systems biology, and more recently in epidemiological analyses. The key distinction between standard empirical modeling approaches, such as generalised linear modeling, and Bayesian network analyses is that the latter attempts not only to identify statistically associated variables, but to additionally, and empirically, separate these into those directly and indirectly dependent with one or more outcome variables. Such discrimination is vastly more ambitious but has the potential to reveal far more about key features of complex disease systems. Applying Bayesian network modeling to biological and medical data has considerable computational demands, combined with the need to ensure robust model selection given the vast model space of possible DAGs. These challenges require the use of approximation techniques, such as the Laplace approximation, Markov chain Monte Carlo simulation and parametric bootstrapping, along with computational parallelization. A case study in structure discovery - identification of an optimal DAG for given data - is presented which uses additive Bayesian networks to explore veterinary disease data of industrial and medical relevance.
PhyloPat: phylogenetic pattern analysis of eukaryotic genes
Hulsen, Tim; de Vlieg, Jacob; Groenen, Peter MA
2006-01-01
Background Phylogenetic patterns show the presence or absence of certain genes or proteins in a set of species. They can also be used to determine sets of genes or proteins that occur only in certain evolutionary branches. Phylogenetic patterns analysis has routinely been applied to protein databases such as COG and OrthoMCL, but not upon gene databases. Here we present a tool named PhyloPat which allows the complete Ensembl gene database to be queried using phylogenetic patterns. Description PhyloPat is an easy-to-use webserver, which can be used to query the orthologies of all complete genomes within the EnsMart database using phylogenetic patterns. This enables the determination of sets of genes that occur only in certain evolutionary branches or even single species. We found in total 446,825 genes and 3,164,088 orthologous relationships within the EnsMart v40 database. We used a single linkage clustering algorithm to create 147,922 phylogenetic lineages, using every one of the orthologies provided by Ensembl. PhyloPat provides the possibility of querying with either binary phylogenetic patterns (created by checkboxes) or regular expressions. Specific branches of a phylogenetic tree of the 21 included species can be selected to create a branch-specific phylogenetic pattern. Users can also input a list of Ensembl or EMBL IDs to check which phylogenetic lineage any gene belongs to. The output can be saved in HTML, Excel or plain text format for further analysis. A link to the FatiGO web interface has been incorporated in the HTML output, creating easy access to functional information. Finally, lists of omnipresent, polypresent and oligopresent genes have been included. Conclusion PhyloPat is the first tool to combine complete genome information with phylogenetic pattern querying. Since we used the orthologies generated by the accurate pipeline of Ensembl, the obtained phylogenetic lineages are reliable. The completeness and reliability of these phylogenetic
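Querying lineages by binary phylogenetic pattern or by regular expression, as this abstract describes, can be sketched as follows. The species order, lineage names, and function names are hypothetical and are not PhyloPat's actual interface; a pattern is simply a string with one character per species, "1" for present and "0" for absent.

```python
import re

# Hypothetical species order; each pattern has one character per species.
SPECIES = ["human", "mouse", "chicken", "zebrafish"]


def presence_pattern(present_in, species=SPECIES):
    """Encode presence (1) or absence (0) of a lineage in each species."""
    return "".join("1" if s in present_in else "0" for s in species)


def query(lineages, pattern):
    """Return names of lineages whose pattern fully matches the regular expression."""
    regex = re.compile(pattern)
    return sorted(name for name, bits in lineages.items() if regex.fullmatch(bits))


lineages = {
    "L1": presence_pattern({"human", "mouse"}),
    "L2": presence_pattern(set(SPECIES)),
    "L3": presence_pattern({"zebrafish"}),
}
print(query(lineages, "1100"))  # exact binary pattern
print(query(lineages, "11.."))  # present in both mammals, anything elsewhere
```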
On Bayesian analysis of on-off measurements
NASA Astrophysics Data System (ADS)
Nosek, Dalibor; Nosková, Jana
2016-06-01
We propose an analytical solution to the on-off problem within the framework of Bayesian statistics. Both the statistical significance for the discovery of new phenomena and credible intervals on model parameters are presented in a consistent way. We use a large enough family of prior distributions of relevant parameters. The proposed analysis is designed to provide Bayesian solutions that can be used for any number of observed on-off events, including zero. The procedure is checked using Monte Carlo simulations. The usefulness of the method is demonstrated on examples from γ-ray astronomy.
Bayesian analysis of a disability model for lung cancer survival.
Armero, C; Cabras, S; Castellanos, M E; Perra, S; Quirós, A; Oruezábal, M J; Sánchez-Rubio, J
2016-02-01
Bayesian reasoning, survival analysis and multi-state models are used to assess survival times for Stage IV non-small-cell lung cancer patients and the evolution of the disease over time. Bayesian estimation is done using minimum informative priors for the Weibull regression survival model, leading to an automatic inferential procedure. Markov chain Monte Carlo methods have been used for approximating posterior distributions and the Bayesian information criterion has been considered for covariate selection. In particular, the posterior distribution of the transition probabilities, resulting from the multi-state model, constitutes a very interesting tool which could be useful to help oncologists and patients make efficient and effective decisions. © The Author(s) 2012.
Bayesian analysis, pattern analysis, and data mining in health care.
Lucas, Peter
2004-10-01
To discuss the current role of data mining and Bayesian methods in biomedicine and health care, in particular critical care. Bayesian networks and other probabilistic graphical models are beginning to emerge as methods for discovering patterns in biomedical data and also as a basis for the representation of the uncertainties underlying clinical decision-making. At the same time, techniques from machine learning are being used to solve biomedical and health-care problems. With the increasing availability of biomedical and health-care data with a wide range of characteristics there is an increasing need to use methods which allow modeling the uncertainties that come with the problem, are capable of dealing with missing data, allow integrating data from various sources, explicitly indicate statistical dependence and independence, and allow integrating biomedical and clinical background knowledge. These requirements have given rise to an influx of new methods into the field of data analysis in health care, in particular from the fields of machine learning and probabilistic graphical models. Copyright 2004 Lippincott Williams & Wilkins
A Comparison of Imputation Methods for Bayesian Factor Analysis Models
ERIC Educational Resources Information Center
Merkle, Edgar C.
2011-01-01
Imputation methods are popular for the handling of missing data in psychology. The methods generally consist of predicting missing data based on observed data, yielding a complete data set that is amenable to standard statistical analyses. In the context of Bayesian factor analysis, this article compares imputation under an unrestricted…
bamr: Bayesian analysis of mass and radius observations
NASA Astrophysics Data System (ADS)
Steiner, Andrew W.
2014-08-01
bamr is an MPI implementation of a Bayesian analysis of neutron star mass and radius data that determines the mass versus radius curve and the equation of state of dense matter. Written in C++, bamr provides some EOS models. This code requires O2scl (ascl:1408.019) be installed before compilation.
Exploration of phylogenetic data using a global sequence analysis method
Chapus, Charles; Dufraigne, Christine; Edwards, Scott; Giron, Alain; Fertil, Bernard; Deschavanne, Patrick
2005-01-01
Background Molecular phylogenetic methods are based on alignments of nucleic or peptidic sequences. The tremendous increase in molecular data permits phylogenetic analyses of very long sequences and of many species, but also requires methods to help manage large datasets. Results Here we explore the phylogenetic signal present in molecular data by genomic signatures, defined as the set of frequencies of short oligonucleotides present in DNA sequences. Although violating many of the standard assumptions of traditional phylogenetic analyses – in particular explicit statements of homology inherent in character matrices – the use of the signature does permit the analysis of very long sequences, even those that are unalignable, and is therefore most useful in cases where alignment is questionable. We compare the results obtained by traditional phylogenetic methods to those inferred by the signature method for two genes: RAG1, which is easily alignable, and 18S RNA, where alignments are often ambiguous for some regions. We also apply this method to a multigene data set of 33 genes for 9 bacteria and one archaeal species as well as to the whole genome of a set of 16 γ-proteobacteria. In addition to delivering phylogenetic results comparable to traditional methods, the comparison of signatures for the sequences involved in the bacterial example identified putative candidates for horizontal gene transfers. Conclusion The signature method is therefore a fast tool for exploring phylogenetic data, providing not only a pretreatment for discovering new sequence relationships, but also for identifying cases of sequence evolution that could confound traditional phylogenetic analysis. PMID:16280081
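A genomic signature, defined in this abstract as the set of frequencies of short oligonucleotides in a DNA sequence, can be computed without any alignment. The sketch below counts overlapping words of length k and compares two signatures with a Euclidean distance; the word length, the handling of ambiguous bases, and the choice of distance are assumptions for illustration and may differ from the authors' method.

```python
from collections import Counter
from itertools import product
from math import sqrt


def signature(seq, k=2):
    """Frequencies of all 4**k DNA words of length k, counted with overlap."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in words)  # words with ambiguous bases are ignored
    if total == 0:
        raise ValueError("sequence contains no valid words of length k")
    return [counts[w] / total for w in words]


def signature_distance(seq_a, seq_b, k=2):
    """Euclidean distance between the signatures of two sequences."""
    sig_a, sig_b = signature(seq_a, k), signature(seq_b, k)
    return sqrt(sum((x - y) ** 2 for x, y in zip(sig_a, sig_b)))


# Hypothetical toy sequences of different lengths; no alignment is needed.
print(signature_distance("ACGTACGTACGT", "ACGTTTTTACGTAA"))
```

A matrix of such pairwise distances could then be passed to any distance-based tree-building method.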
A new Bayesian Earthquake Analysis Tool (BEAT)
NASA Astrophysics Data System (ADS)
Vasyura-Bathke, Hannes; Dutta, Rishabh; Jónsson, Sigurjón; Mai, Martin
2017-04-01
Modern earthquake source estimation studies increasingly use non-linear optimization strategies to estimate kinematic rupture parameters, often considering geodetic and seismic data jointly. However, the optimization process is complex and consists of several steps that need to be followed in the earthquake parameter estimation procedure. These include pre-describing or modeling the fault geometry, calculating the Green's Functions (often assuming a layered elastic half-space), and estimating the distributed final slip and possibly other kinematic source parameters. Recently, Bayesian inference has become popular for estimating posterior distributions of earthquake source model parameters given measured/estimated/assumed data and model uncertainties. For instance, some research groups consider uncertainties of the layered medium and propagate these to the source parameter uncertainties. Other groups make use of informative priors to reduce the model parameter space. In addition, innovative sampling algorithms have been developed that efficiently explore the often high-dimensional parameter spaces. Compared to earlier studies, these improvements have resulted in overall more robust source model parameter estimates that include uncertainties. However, the computational demands of these methods are high and estimation codes are rarely distributed along with the published results. Even if codes are made available, it is often difficult to assemble them into a single optimization framework as they are typically coded in different programming languages. Therefore, further progress and future applications of these methods/codes are hampered, while reproducibility and validation of results have become essentially impossible. In the spirit of providing open-access and modular codes to facilitate progress and reproducible research in earthquake source estimations, we undertook the effort of producing BEAT, a Python package that comprises all the above-mentioned features in one
A phylogenetic analysis of the megadiverse Chalcidoidea (Hymenoptera)
USDA-ARS's Scientific Manuscript database
Chalcidoidea (Hymenoptera) are extremely diverse with an estimated 500,000 species. We present the first phylogenetic analysis of the superfamily based on a cladistic analysis of both morphological and molecular data. A total of 233 morphological characters were scored for 300 taxa and 265 genera, a...
Bayesian analysis of genetic differentiation between populations.
Corander, Jukka; Waldmann, Patrik; Sillanpää, Mikko J
2003-01-01
We introduce a Bayesian method for estimating hidden population substructure using multilocus molecular markers and geographical information provided by the sampling design. The joint posterior distribution of the substructure and allele frequencies of the respective populations is available in an analytical form when the number of populations is small, whereas an approximation based on a Markov chain Monte Carlo simulation approach can be obtained for a moderate or large number of populations. Using the joint posterior distribution, posteriors can also be derived for any evolutionary population parameters, such as the traditional fixation indices. A major advantage compared to most earlier methods is that the number of populations is treated here as an unknown parameter. What is traditionally considered as two genetically distinct populations, either recently founded or connected by considerable gene flow, is here considered as one panmictic population with a certain probability based on marker data and prior information. Analyses of previously published data on the Moroccan argan tree (Argania spinosa) and of simulated data sets suggest that our method is capable of estimating a population substructure, while not artificially enforcing a substructure when it does not exist. The software (BAPS) used for the computations is freely available from http://www.rni.helsinki.fi/~mjs. PMID:12586722
Phylogenetic Analysis and Epidemic History of Hepatitis C Virus Genotype 2 in Tunisia, North Africa
Rajhi, Mouna; Ghedira, Kais; Chouikha, Anissa; Djebbi, Ahlem; Cheikh, Imed; Ben Yahia, Ahlem; Sadraoui, Amel; Hammami, Walid; Azouz, Msaddek; Ben Mami, Nabil; Triki, Henda
2016-01-01
HCV genotype 2 (HCV-2) has a worldwide distribution with prevalence rates that vary from country to country. High genetic diversity and long-term endemicity were suggested in West African countries. A global dispersal of HCV-2 would have occurred during the 20th century, especially in European countries. In Tunisia, genotype 2 was the second prevalent genotype after genotype 1 and most isolates belong to subtypes 2c and 2k. In this study, phylogenetic analyses based on the NS5B genomic sequences of 113 Tunisian HCV isolates from subtypes 2c and 2k were carried out. A Bayesian coalescent-based framework was used to estimate the origin and the spread of these subtypes circulating in Tunisia. Phylogenetic analyses of HCV-2c sequences suggest the absence of country-specific or time-specific variants. In contrast, the phylogenetic grouping of HCV-2k sequences shows the existence of two major genetic clusters that may represent two distinct circulating variants. Coalescent analysis indicated a most recent common ancestor (tMRCA) of Tunisian HCV-2c around 1886 (1869–1902) before the introduction of HCV-2k in 1901 (1867–1931). Our findings suggest that the introduction of HCV-2c in Tunisia is possibly a result of population movements between Tunisia and European population following the French colonization. PMID:27100294
Zhang, Xiao; Zhou, Tao; Kanwal, Nazish; Zhao, Yuemei; Bai, Guoqing; Zhao, Guifang
2017-01-01
Gynostemma BL., belonging to the family Cucurbitaceae, is a genus containing 17 creeping herbaceous species mainly distributed in East Asia. It can be divided into two subgenera based on different fruit morphology. Herein, we report eight complete chloroplast genome sequences of the genus Gynostemma, which were obtained by Illumina paired-end sequencing, assembly, and annotation. The length of the eight complete cp genomes ranged from 157,576 bp (G. pentaphyllum) to 158,273 bp (G. laxiflorum). Each encoded 133 genes, including 87 protein-coding genes, 37 tRNA genes, eight rRNA genes, and one pseudogene. Four types of repeated sequences were identified, indicating that the repeated structure was greater in species of Subgen. Triostellum than in species of Subgen. Gynostemma. The percentage of variation in different regions of the eight cp genomes was calculated, which demonstrated that the coding and inverted repeat regions were highly conserved. Phylogenetic analysis based on Bayesian inference and maximum likelihood methods strongly supported the phylogenetic position of the genus Gynostemma as a member of the family Cucurbitaceae. The phylogenetic relationships among the eight species were clearly resolved using the complete cp genome sequences in this study. This study will also provide potential molecular markers and candidate DNA barcodes for future studies and enrich the valuable complete cp genome resources of Cucurbitaceae.
A Bayesian Analysis of the Flood Frequency Hydrology Concept
2016-02-01
ERDC/CHL CHETN-X-1 February 2016 Approved for public release; distribution is unlimited. A Bayesian Analysis of the Flood Frequency Hydrology... flood frequency hydrology concept as a formal probabilistic-based means by which to coherently combine and also evaluate the worth of different types...of additional data (i.e., temporal, spatial, and causal) in a flood frequency analysis. This approach is responsive to the stated ultimate goal of
A Bayesian Analysis of Scale-Invariant Processes
2012-01-01
Jingfeng Wang, Rafael L. Bras, Veronica Nieves; Georgia Tech Research Corporation, Office of Sponsored Programs... Veronica Nieves, Jingfeng Wang, and Rafael L. Bras. Citation: AIP Conf. Proc. 1443, 56 (2012); doi: 10.1063/1.3703620
Model-based Bayesian inference for ROC data analysis
NASA Astrophysics Data System (ADS)
Lei, Tianhu; Bae, K. Ty
2013-03-01
This paper presents a study of model-based Bayesian inference applied to Receiver Operating Characteristic (ROC) data. The model is a simple version of the general non-linear regression model. Unlike the Dorfman model, it uses a probit link function with a binary (zero-one) covariate to express the binormal distributions in a single formula. The model also includes a scale parameter. Bayesian inference is implemented by the Markov Chain Monte Carlo (MCMC) method carried out by Bayesian inference Using Gibbs Sampling (BUGS). In contrast to classical statistical theory, the Bayesian approach treats model parameters as random variables characterized by prior distributions. With a substantial number of simulated samples generated by the sampling algorithm, the posterior distributions of the parameters, as well as the parameters themselves, can be accurately estimated. MCMC-based BUGS adopts the Adaptive Rejection Sampling (ARS) protocol, which requires that the probability density function (pdf) from which samples are drawn be log-concave with respect to the targeted parameters. Our study corrects a common misconception and proves that the pdf of this regression model is log-concave with respect to its scale parameter. Therefore, the ARS requirement is satisfied, and a Gaussian prior, which is conjugate and possesses many analytic and computational advantages, is assigned to the scale parameter. A cohort of 20 simulated data sets and 20 simulations from each data set are used in our study. Output analysis and convergence diagnostics for the MCMC method are assessed with the CODA package. Models and methods using a continuous Gaussian prior and a discrete categorical prior are compared. Intensive simulations and performance measures are given to illustrate our practice in the framework of model-based Bayesian inference using the MCMC method.
An Overview of Bayesian Methods for Neural Spike Train Analysis
2013-01-01
Neural spike train analysis is an important task in computational neuroscience which aims to understand neural mechanisms and gain insights into neural circuits. With the advancement of multielectrode recording and imaging technologies, it has become increasingly demanding to develop statistical tools for analyzing large neuronal ensemble spike activity. Here we present a tutorial overview of Bayesian methods and their representative applications in neural spike train analysis, at both single neuron and population levels. On the theoretical side, we focus on various approximate Bayesian inference techniques as applied to latent state and parameter estimation. On the application side, the topics include spike sorting, tuning curve estimation, neural encoding and decoding, deconvolution of spike trains from calcium imaging signals, and inference of neuronal functional connectivity and synchrony. Some research challenges and opportunities for neural spike train analysis are discussed. PMID:24348527
Bayesian principal geodesic analysis in diffeomorphic image registration.
Zhang, Miaomiao; Fletcher, P Thomas
2014-01-01
Computing a concise representation of the anatomical variability found in large sets of images is an important first step in many statistical shape analyses. In this paper, we present a generative Bayesian approach for automatic dimensionality reduction of shape variability represented through diffeomorphic mappings. To achieve this, we develop a latent variable model for principal geodesic analysis (PGA) that provides a probabilistic framework for factor analysis on diffeomorphisms. Our key contribution is a Bayesian inference procedure for model parameter estimation and simultaneous detection of the effective dimensionality of the latent space. We evaluate our proposed model for atlas and principal geodesic estimation on the OASIS brain database of magnetic resonance images. We show that the automatically selected latent dimensions from our model are able to reconstruct unseen brain images with lower error than equivalent linear principal components analysis (LPCA) models in the image space, and it also outperforms tangent space PCA (TPCA) models in the diffeomorphism setting.
Molak, Martyna; Suchard, Marc A; Ho, Simon Y W; Beilman, David W; Shapiro, Beth
2015-01-01
Studies of DNA from ancient samples provide a valuable opportunity to gain insight into past evolutionary and demographic processes. Bayesian phylogenetic methods can estimate evolutionary rates and timescales from ancient DNA sequences, with the ages of the samples acting as calibrations for the molecular clock. Sample ages are often estimated using radiocarbon dating, but the associated measurement error is rarely taken into account. In addition, the total uncertainty quantified by converting radiocarbon dates to calendar dates is typically ignored. Here, we present a tool for incorporating both of these sources of uncertainty into Bayesian phylogenetic analyses of ancient DNA. This empirical calibrated radiocarbon sampler (ECRS) integrates the age uncertainty for each ancient sequence over the calibrated probability density function estimated for its radiocarbon date and associated error. We use the ECRS to analyse three ancient DNA data sets. Accounting for radiocarbon-dating and calibration error appeared to have little impact on estimates of evolutionary rates and related parameters for these data sets. However, analyses of other data sets, particularly those with few or only very old radiocarbon dates, might be more sensitive to using artificially precise sample ages and should benefit from use of the ECRS.
A Deliberate Practice Approach to Teaching Phylogenetic Analysis
ERIC Educational Resources Information Center
Hobbs, F. Collin; Johnson, Daniel J.; Kearns, Katherine D.
2013-01-01
One goal of postsecondary education is to assist students in developing expert-level understanding. Previous attempts to encourage expert-level understanding of phylogenetic analysis in college science classrooms have largely focused on isolated, or "one-shot," in-class activities. Using a deliberate practice instructional approach, we…
Phylogenetic analysis on the soil bacteria distributed in karst forest
Zhou, JunPei; Huang, Ying; Mo, MingHe
2009-01-01
The phylogenetic composition of the bacterial community in the soil of a karst forest was analyzed by a culture-independent molecular approach. The bacterial 16S rRNA gene was amplified directly from soil DNA and cloned to generate a library. After screening the clone library by RFLP, the 16S rRNA genes of representative clones were sequenced and the bacterial community was analyzed phylogenetically. The 16S rRNA gene inserts of 190 randomly selected clones were analyzed by RFLP and generated 126 different RFLP types. After sequencing, 126 non-chimeric sequences were obtained, generating 113 phylotypes. Phylogenetic analysis revealed that the bacteria distributed in the soil of the karst forest included members assigned to Proteobacteria, Acidobacteria, Planctomycetes, Chloroflexi (Green nonsulfur bacteria), Bacteroidetes, Verrucomicrobia, Nitrospirae, Actinobacteria (High G+C Gram-positive bacteria), Firmicutes (Low G+C Gram-positive bacteria) and candidate divisions (including the SPAM and GN08). PMID:24031430
Bayesian Variable Selection in Cost-Effectiveness Analysis
Negrín, Miguel A.; Vázquez-Polo, Francisco J.; Martel, María; Moreno, Elías; Girón, Francisco J.
2010-01-01
Linear regression models are often used to represent the cost and effectiveness of medical treatment. The covariates used may include sociodemographic variables, such as age, gender or race; clinical variables, such as initial health status, years of treatment or the existence of concomitant illnesses; and a binary variable indicating the treatment received. However, most studies estimate only one model, which usually includes all the covariates. This procedure ignores the question of uncertainty in model selection. In this paper, we examine four alternative Bayesian variable selection methods that have been proposed. In this analysis, we estimate the inclusion probability of each covariate in the real model conditional on the data. Variable selection can be useful for estimating incremental effectiveness and incremental cost, through Bayesian model averaging, as well as for subgroup analysis. PMID:20617047
Calibration of Boltzmann distribution priors in Bayesian data analysis.
Mechelke, Martin; Habeck, Michael
2012-12-01
The Boltzmann distribution is commonly used as a prior probability in Bayesian data analysis. Examples include the Ising model in statistical image analysis and the canonical ensemble based on molecular dynamics force fields in protein structure calculation. These models involve a temperature or weighting factor that needs to be inferred from the data. Bayesian inference stipulates that the temperature be determined from the model evidence. This is challenging because the model evidence, a ratio of two high-dimensional normalization integrals, cannot be calculated analytically. We outline a replica-exchange Monte Carlo scheme that allows us to estimate the model evidence by use of multiple histogram reweighting. The method is illustrated for an Ising model and examples in protein structure determination.
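To make the normalization problem mentioned in this abstract concrete, the following is a minimal sketch, not the authors' replica-exchange scheme. It assumes a zero-field one-dimensional Ising ring with coupling J = 1 (a toy case chosen here for illustration) and computes the Boltzmann normalization constant Z(beta) by brute-force enumeration over all 2^N spin configurations, then checks it against the standard transfer-matrix closed form. All function names are hypothetical.

```python
import itertools
import math


def ising_ring_energy(spins, coupling=1.0):
    """Energy of a periodic 1D Ising chain: E = -J * sum_i s_i * s_(i+1)."""
    n = len(spins)
    return -coupling * sum(spins[i] * spins[(i + 1) % n] for i in range(n))


def partition_function_brute_force(n_spins, beta, coupling=1.0):
    """Sum exp(-beta * E) over all 2**n_spins configurations (exponential cost)."""
    return sum(
        math.exp(-beta * ising_ring_energy(spins, coupling))
        for spins in itertools.product((-1, 1), repeat=n_spins)
    )


def partition_function_closed_form(n_spins, beta, coupling=1.0):
    """Transfer-matrix result for the zero-field ring: lambda_1**N + lambda_2**N."""
    k = beta * coupling
    return (2 * math.cosh(k)) ** n_spins + (2 * math.sinh(k)) ** n_spins


if __name__ == "__main__":
    for beta in (0.3, 1.0):
        brute = partition_function_brute_force(8, beta)
        exact = partition_function_closed_form(8, beta)
        print(f"beta={beta}: brute force={brute:.6f}, closed form={exact:.6f}")
```

The brute-force sum doubles in cost with every added spin, and most models of practical interest have no closed form, which is why sampling-based estimates of the normalization constant are needed.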
Mahardika, G N K; Dibia, N; Budayanti, N S; Susilawathi, N M; Subrata, K; Darwinata, A E; Wignall, F S; Richt, J A; Valdivia-Granda, W A; Sudewi, A A R
2014-06-01
The emergence of human and animal rabies in Bali since November 2008 has attracted local, national and international interest. The potential origin and time of introduction of rabies virus to Bali is described. The nucleoprotein (N) gene of rabies virus from dog brain and human clinical specimens was sequenced using an automated DNA sequencer. Phylogenetic inference with Bayesian Markov Chain Monte Carlo (MCMC) analysis using the Bayesian Evolutionary Analysis by Sampling Trees (BEAST) v. 1.7.5 software confirmed that the outbreak of rabies in Bali was caused by an Indonesian lineage virus following a single introduction. The ancestor of Bali viruses was the descendant of a virus from Kalimantan. Contact tracing showed that the event most likely occurred in early 2008. The introduction of rabies into a large unvaccinated dog population in Bali clearly demonstrates the risk of disease transmission for government agencies and should lead to an increased preparedness and efforts for sustained risk reduction to prevent such events from occurring in future.
Phylogenetic and evolutionary analysis of influenza A H7N9 virus.
Babakir-Mina, Muhammed; Dimonte, Slavatore; Lo Presti, Alessandra; Cella, Eleonora; Perno, Carlo Federico; Ciotti, Marco; Ciccozzi, Massimo
2014-07-01
Recently, human infections with the novel avian-origin influenza A H7N9 virus have been reported from various provinces in China. Human infections with avian influenza A viruses are rare and may cause a wide spectrum of clinical symptoms. This is the first time that human infection with a low pathogenic avian influenza A virus has been associated with a fatal outcome. Here, a phylogenetic and positive selective pressure analysis of haemagglutinin (HA), neuraminidase (NA), and matrix protein (MP) genes of the novel reassortant H7N9 virus was carried out. The analysis showed that both structural genes of this reassortant virus likely originated from Euro-Asiatic birds, while NA was more likely to have originated from South Korean birds. The Bayesian phylogenetic tree of the MP showed a main clade and an outside cluster including four sequences from China. The United States and Guatemala classical H7N9-isolates appeared homogeneous and clustered together, although they are distinct from other classical Euro-Asiatic and novel H7N9 viruses. Selective pressure analysis did not reveal any site under statistically significant positive selective pressure in any of the three genes analyzed. Unknown intermediate hosts might be implicated, so extensive global surveillance and bird-to-person transmission should be closely considered in the future.
Fusarium culmorum is a single phylogenetic species based on multilocus sequence analysis.
Obanor, Friday; Erginbas-Orakci, G; Tunali, B; Nicol, J M; Chakraborty, S
2010-09-01
Fusarium culmorum is a major pathogen of wheat and barley causing head blight and crown rot in cooler temperate climates of Australia, Europe, West Asia and North Africa. To better understand its evolutionary history we partially sequenced single copy nuclear genes encoding translation elongation factor 1-α (TEF), reductase (RED) and phosphate permease (PHO) in 100 F. culmorum isolates with 11 isolates of Fusarium crookwellense, Fusarium graminearum and Fusarium pseudograminearum. Phylogenetic analysis of multilocus sequence (MLS) data using Bayesian inference and maximum parsimony analysis showed that F. culmorum from wheat is a single phylogenetic species with no significant linkage disequilibrium and little or no lineage development along geographic origin. Both MLS and TEF and RED gene sequence analysis separated the four Fusarium species used and delineated three to four groups within the F. culmorum clade. But the PHO gene could not completely resolve isolates into their respective species. Fixation index and gene flow suggest significant genetic exchange between the isolates from distant geographic regions. A lack of strong lineage structure despite the geographic separation of the three collections indicates a frequently recombining species and/or widespread distribution of genotypes due to international trade, tourism and long-range dispersal of macroconidia. Moreover, the two mating type genes were present in equal proportion among the F. culmorum collection used in this study, leaving open the possibility of sexual reproduction. Copyright © 2010 The British Mycological Society. Published by Elsevier Ltd. All rights reserved.
[Comparison of Bayesian interim analysis and classical interim analysis in group sequential design].
Yuan, Lingling; Zhan, Zhiying; Tan, Xuhui
2015-11-01
To explore the differences between the Bayesian interim analysis and the classical interim analysis. To compare the means of two independent samples between control and treatment, a superiority hypothesis test was established. In line with the data requirements for group sequential design, the Type I error of Bayesian interim analysis based on various prior distributions, Power, Average Sample Size and Average Stage were estimated in the interim analysis. In the Pocock and O'Brien & Fleming designs, the Type I errors in the Bayesian interim analysis based on the skeptical prior distribution and the handicap prior distribution were controlled at around 0.05. When the powers of these two classical designs were both 80%, Bayesian powers of the skeptical prior distribution and the handicap prior distribution were markedly lower. The powers of the non-informative prior distribution and the enthusiastic prior distribution were distinctly higher than 80%. In the Bayesian interim analysis based on the skeptical prior distribution and the handicap prior distribution, the Type I errors can be well controlled. Bayesian interim analyses using these two prior distributions, compared with the analysis adopting the O'Brien & Fleming method, can markedly increase the possibility of ending the clinical trials ahead of time. The Bayesian interim analyses based on these two distributions do not have practical value for group sequential design of the Pocock method.
Bayesian methods for the analysis of inequality constrained contingency tables.
Laudy, Olav; Hoijtink, Herbert
2007-04-01
A Bayesian methodology for the analysis of inequality constrained models for contingency tables is presented. The problem of interest lies in obtaining the estimates of functions of cell probabilities subject to inequality constraints, testing hypotheses and selection of the best model. Constraints on conditional cell probabilities and on local, global, continuation and cumulative odds ratios are discussed. A Gibbs sampler to obtain a discrete representation of the posterior distribution of the inequality constrained parameters is used. Using this discrete representation, the credibility regions of functions of cell probabilities can be constructed. Posterior model probabilities are used for model selection and hypotheses are tested using posterior predictive checks. The Bayesian methodology proposed is illustrated in two examples.
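As a rough illustration of working with a discrete (sample-based) representation of a posterior over cell probabilities, here is a minimal sketch. It is not the Gibbs sampler described in the abstract: it assumes a flat Dirichlet(1, 1, 1, 1) prior on a 2x2 table, draws directly from the unconstrained Dirichlet posterior, and reports the fraction of draws that satisfy an inequality constraint (odds ratio greater than one). The counts and all function names are hypothetical.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet vector by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def posterior_prob_of_constraint(counts, constraint, n_draws=20000, seed=0):
    """Monte Carlo estimate of P(constraint holds | counts) under a flat prior."""
    rng = random.Random(seed)
    alphas = [c + 1 for c in counts]  # flat Dirichlet prior adds 1 to each cell
    hits = sum(constraint(sample_dirichlet(alphas, rng)) for _ in range(n_draws))
    return hits / n_draws


def odds_ratio_greater_than_one(p):
    """Inequality constraint on a 2x2 table given as [p11, p12, p21, p22]."""
    p11, p12, p21, p22 = p
    return p11 * p22 > p12 * p21


if __name__ == "__main__":
    hypothetical_counts = [30, 10, 10, 30]  # illustrative data, not from the paper
    prob = posterior_prob_of_constraint(hypothetical_counts, odds_ratio_greater_than_one)
    print(f"Estimated posterior probability of odds ratio > 1: {prob:.3f}")
```

The same set of draws can be reused to summarize any function of the cell probabilities, for example by taking empirical quantiles of the sampled odds ratios to form a credibility interval.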
Bayesian tomography and integrated data analysis in fusion diagnostics
Li, Dong; Dong, Y. B.; Deng, Wei; Shi, Z. B.; Fu, B. Z.; Gao, J. M.; Wang, T. B.; Zhou, Yan; Liu, Yi; Yang, Q. W.; Duan, X. R.
2016-11-15
In this article, a Bayesian tomography method using non-stationary Gaussian process for a prior has been introduced. The Bayesian formalism allows quantities which bear uncertainty to be expressed in the probabilistic form so that the uncertainty of a final solution can be fully resolved from the confidence interval of a posterior probability. Moreover, a consistency check of that solution can be performed by checking whether the misfits between predicted and measured data are reasonably within an assumed data error. In particular, the accuracy of reconstructions is significantly improved by using the non-stationary Gaussian process that can adapt to the varying smoothness of emission distribution. The implementation of this method to a soft X-ray diagnostics on HL-2A has been used to explore relevant physics in equilibrium and MHD instability modes. This project is carried out within a large size inference framework, aiming at an integrated analysis of heterogeneous diagnostics.
Bayesian analysis of structural equation models with dichotomous variables.
Lee, Sik-Yum; Song, Xin-Yuan
2003-10-15
Structural equation modelling has been used extensively in the behavioural and social sciences for studying interrelationships among manifest and latent variables. Recently, its uses have been well recognized in medical research. This paper introduces a Bayesian approach to analysing general structural equation models with dichotomous variables. In the posterior analysis, the observed dichotomous data are augmented with the hypothetical missing values, which involve the latent variables in the model and the unobserved continuous measurements underlying the dichotomous data. An algorithm based on the Gibbs sampler is developed for drawing the parameters values and the hypothetical missing values from the joint posterior distributions. Useful statistics, such as the Bayesian estimates and their standard error estimates, and the highest posterior density intervals, can be obtained from the simulated observations. A posterior predictive p-value is used to test the goodness-of-fit of the posited model. The methodology is applied to a study of hypertensive patient non-adherence to medication.
BAYESIAN ANALYSIS OF MULTIPLE HARMONIC OSCILLATIONS IN THE SOLAR CORONA
Arregui, I.; Asensio Ramos, A.; Diaz, A. J.
2013-03-01
The detection of multiple mode harmonic kink oscillations in coronal loops enables us to obtain information on coronal density stratification and magnetic field expansion using seismology inversion techniques. The inference is based on the measurement of the period ratio between the fundamental mode and the first overtone and theoretical results for the period ratio under the hypotheses of coronal density stratification and magnetic field expansion of the wave guide. We present a Bayesian analysis of multiple mode harmonic oscillations for the inversion of the density scale height and magnetic flux tube expansion under each of the hypotheses. The two models are then compared using a Bayesian model comparison scheme to assess how plausible each one is given our current state of knowledge.
A Bayesian on-off analysis of cosmic ray data
NASA Astrophysics Data System (ADS)
Nosek, Dalibor; Nosková, Jana
2017-09-01
We deal with the analysis of on-off measurements designed for the confirmation of a weak source of events whose presence is hypothesized, based on former observations. The problem of a small number of source events that are masked by an imprecisely known background is addressed from a Bayesian point of view. We examine three closely related variables, the posterior distributions of which carry relevant information about various aspects of the investigated phenomena. This information is utilized for predictions of further observations, given actual data. Backed by details of detection, we propose how to quantify disparities between different measurements. The usefulness of the Bayesian inference is demonstrated on examples taken from cosmic ray physics.
A Bayesian semiparametric factor analysis model for subtype identification.
Sun, Jiehuan; Warren, Joshua L; Zhao, Hongyu
2017-04-25
Disease subtype identification (clustering) is an important problem in biomedical research. Gene expression profiles are commonly utilized to infer disease subtypes, which often lead to biologically meaningful insights into disease. Despite many successes, existing clustering methods may not perform well when genes are highly correlated and many uninformative genes are included for clustering due to the high dimensionality. In this article, we introduce a novel subtype identification method in the Bayesian setting based on gene expression profiles. This method, called BCSub, adopts an innovative semiparametric Bayesian factor analysis model to reduce the dimension of the data to a few factor scores for clustering. Specifically, the factor scores are assumed to follow the Dirichlet process mixture model in order to induce clustering. Through extensive simulation studies, we show that BCSub has improved performance over commonly used clustering methods. When applied to two gene expression datasets, our model is able to identify subtypes that are clinically more relevant than those identified from the existing methods.
Bayesian tomography and integrated data analysis in fusion diagnostics
NASA Astrophysics Data System (ADS)
Li, Dong; Dong, Y. B.; Deng, Wei; Shi, Z. B.; Fu, B. Z.; Gao, J. M.; Wang, T. B.; Zhou, Yan; Liu, Yi; Yang, Q. W.; Duan, X. R.
2016-11-01
In this article, a Bayesian tomography method using non-stationary Gaussian process for a prior has been introduced. The Bayesian formalism allows quantities which bear uncertainty to be expressed in the probabilistic form so that the uncertainty of a final solution can be fully resolved from the confidence interval of a posterior probability. Moreover, a consistency check of that solution can be performed by checking whether the misfits between predicted and measured data are reasonably within an assumed data error. In particular, the accuracy of reconstructions is significantly improved by using the non-stationary Gaussian process that can adapt to the varying smoothness of emission distribution. The implementation of this method to a soft X-ray diagnostics on HL-2A has been used to explore relevant physics in equilibrium and MHD instability modes. This project is carried out within a large size inference framework, aiming at an integrated analysis of heterogeneous diagnostics.
Character analysis in morphological phylogenetics: problems and solutions.
Wiens, J J
2001-01-01
Many aspects of morphological phylogenetics are controversial in the theoretical systematics literature and yet are often poorly explained and justified in empirical studies. In this paper, I argue that most morphological characters describe variation that is fundamentally quantitative, regardless of whether they are coded qualitatively or quantitatively by systematists. Given this view, three fundamental problems in morphological character analysis (definition, delimitation, and ordering of character states) may have a common solution: coding morphological characters as continuous quantitative traits. A new parsimony method (step-matrix gap-weighting, a modification of Thiele's approach) is proposed that allows quantitative traits to be analyzed as continuous variables. The problem of scaling or weighting quantitative characters relative to qualitative characters (and to each other) is reviewed, and three possible solutions are described. The new coding method is applied to data from hoplocercid lizards, and the results show the sensitivity of phylogenetic conclusions to different scaling methods. Although some authors reject the use of continuous, overlapping, quantitative characters in phylogenetic analysis, quantitative data from hoplocercid lizards that are coded using the new approach contain significant phylogenetic structure and exhibit levels of homoplasy similar to those seen in data that are coded qualitatively.
A Deliberate Practice Approach to Teaching Phylogenetic Analysis
Hobbs, F. Collin; Johnson, Daniel J.; Kearns, Katherine D.
2013-01-01
One goal of postsecondary education is to assist students in developing expert-level understanding. Previous attempts to encourage expert-level understanding of phylogenetic analysis in college science classrooms have largely focused on isolated, or “one-shot,” in-class activities. Using a deliberate practice instructional approach, we designed a set of five assignments for a 300-level plant systematics course that incrementally introduces the concepts and skills used in phylogenetic analysis. In our assignments, students learned the process of constructing phylogenetic trees through a series of increasingly difficult tasks; thus, skill development served as a framework for building content knowledge. We present results from 5 yr of final exam scores, pre- and postconcept assessments, and student surveys to assess the impact of our new pedagogical materials on student performance related to constructing and interpreting phylogenetic trees. Students improved in their ability to interpret relationships within trees and improved in several aspects related to between-tree comparisons and tree construction skills. Student feedback indicated that most students believed our approach prepared them to engage in tree construction and gave them confidence in their abilities. Overall, our data confirm that instructional approaches implementing deliberate practice address student misconceptions, improve student experiences, and foster deeper understanding of difficult scientific concepts. PMID:24297294
A deliberate practice approach to teaching phylogenetic analysis.
Hobbs, F Collin; Johnson, Daniel J; Kearns, Katherine D
2013-01-01
One goal of postsecondary education is to assist students in developing expert-level understanding. Previous attempts to encourage expert-level understanding of phylogenetic analysis in college science classrooms have largely focused on isolated, or "one-shot," in-class activities. Using a deliberate practice instructional approach, we designed a set of five assignments for a 300-level plant systematics course that incrementally introduces the concepts and skills used in phylogenetic analysis. In our assignments, students learned the process of constructing phylogenetic trees through a series of increasingly difficult tasks; thus, skill development served as a framework for building content knowledge. We present results from 5 yr of final exam scores, pre- and postconcept assessments, and student surveys to assess the impact of our new pedagogical materials on student performance related to constructing and interpreting phylogenetic trees. Students improved in their ability to interpret relationships within trees and improved in several aspects related to between-tree comparisons and tree construction skills. Student feedback indicated that most students believed our approach prepared them to engage in tree construction and gave them confidence in their abilities. Overall, our data confirm that instructional approaches implementing deliberate practice address student misconceptions, improve student experiences, and foster deeper understanding of difficult scientific concepts.
Martínez-Salazar, Elizabeth A; Rosas-Valdez, Rogelio; Gregory, T Ryan; Violante-González, Juan
2016-08-01
Infidum similis Travassos, 1916 (Dicrocoeliidae: Leipertrematinae) was found in the gall bladder of Leptophis diplotropis Günther, 1872 from El Podrido, Acapulco, Guerrero, Mexico. A phylogenetic analysis based on partial sequences of the 28S ribosomal RNA using maximum likelihood (ML) and Bayesian inference (BI) analyses was carried out to assess its phylogenetic position within suborder Xiphidiata, alongside members of the superfamilies Gorgoderoidea and Plagiorchoidea. The phylogenetic trees showed that the genus is most-closely related to the Plagiorchoidea rather than to the Gorgoderoidea, in keeping with previous taxonomic designations. Phylogenies obtained from ML and BI analysis of the 28S rDNA gene revealed a well supported clade in which Choledocystus hepaticus (Lutz, 1928) Sullivan, 1977 is sister to I. similis. On the other hand, a tree obtained using a partial sequence of the cytochrome c oxidase subunit 1 (cox1) mtDNA gene (ML and BI analysis), with species supposed to be closely related to I. similis according to 28S, does not support this relatedness. Based on the independence of Infidum from the subfamily Leipertrematinae Yamaguti, 1958, our results clearly demonstrated that the genus corresponds to a different family and with species closely related to C. hepaticus within Plagiorchoidea. New data are presented about the tegumental surface of I. similis by scanning electron microscopy as well as the estimation of its haploid genome size using Feulgen Image Analysis Densitometry of sperm nuclei as part of the characterization of this species. This is the first genome size estimated for a member of Plagiorchiida, and these data will provide a new source of knowledge on helminth diversity and evolutionary studies. This constitutes the first host record, and new geographical distribution, for this species in Mexico.
Tarasov, Sergei; Dimitrov, Dimitar
2016-11-29
Dung beetles (subfamily Scarabaeinae) are popular model organisms in ecology and developmental biology, and for the last two decades they have experienced a systematics renaissance with the adoption of modern phylogenetic approaches. Within this period 16 key phylogenies and numerous additional studies with limited scope have been published, but higher-level relationships of this pivotal group of beetles remain contentious and current classifications contain many unnatural groupings. The present study provides a robust phylogenetic framework and a revised classification of dung beetles. We assembled the so far largest molecular dataset for dung beetles using sequences of 8 gene regions and 547 terminals including the outgroup taxa. This dataset was analyzed using Bayesian, maximum likelihood and parsimony approaches. In order to test the sensitivity of results to different analytical treatments, we evaluated alternative partitioning schemes based on secondary structure, domains and codon position. We assessed substitution models adequacy using Bayesian framework and used these results to exclude partitions where substitution models did not adequately depict the processes that generated the data. We show that exclusion of partitions that failed the model adequacy evaluation has a potential to improve phylogenetic inference, but efficient implementation of this approach on large datasets is problematic and awaits development of new computationally advanced software. In the class Insecta it is uncommon for the results of molecular phylogenetic analysis to lead to substantial changes in classification. However, the results presented here are congruent with recent morphological studies and support the largest change in dung beetle systematics for the last 50 years. Here we propose the revision of the concepts for the tribes Deltochilini (Canthonini), Dichotomiini and Coprini; additionally, we redefine the tribe Sisyphini. We provide and illustrate synapomorphies and
Cao, Y; Hao, J S; Sun, X Y; Zheng, B; Yang, Q
2016-12-02
Pieridae is a butterfly family whose evolutionary history is poorly understood. Due to the difficulties in identifying morphological synapomorphies within the group and the scarcity of the fossil records, only a few studies on higher phylogeny of Pieridae have been reported to date. In this study, we describe the complete mitochondrial genomes of four pierid butterfly species (Aporia martineti, Aporia hippia, Aporia bieti, and Mesapia peloria), in order to better characterize the pierid butterfly mitogenomes and perform the phylogenetic analyses using all available mitogenomic sequence data (13PCGs, rRNAs, and tRNAs) from the 18 pierid butterfly species comprising the three main subfamilies (Dismorphiinae, Coliadinae and Pierinae). Our analysis shows that the four new mitogenomes share similar features with other known pierid mitogenomes in gene order and organization. Phylogenetic analyses by maximum likelihood and Bayesian inference show that the pierid higher-level relationship is: Dismorphiinae + (Coliadinae + Pierinae), which corroborates the results of some previous molecular and morphological studies. However, we found that the Hebomoia and Anthocharis make a sister group, supporting the traditional tribe Anthocharidini; in addition, the Mesapia peloria was shown to be clustered within the Aporia group, suggesting that the genus Mesapia should be reduced to the taxonomic status of subgenus. Our molecular dating analysis indicates that the family Pieridae began to diverge during the Late Cretaceous about 92 million years ago (mya), while the subfamily Pierinae diverged from the Coliadinae at about 86 mya (Late Cretaceous).
2010-01-01
Background: Likelihood-based phylogenetic inference is generally considered to be the most reliable classification method for unknown sequences. However, traditional likelihood-based phylogenetic methods cannot be applied to large volumes of short reads from next-generation sequencing due to computational complexity issues and lack of phylogenetic signal. "Phylogenetic placement," where a reference tree is fixed and the unknown query sequences are placed onto the tree via a reference alignment, is a way to bring the inferential power offered by likelihood-based approaches to large data sets. Results: This paper introduces pplacer, a software package for phylogenetic placement and subsequent visualization. The algorithm can place twenty thousand short reads on a reference tree of one thousand taxa per hour per processor, has essentially linear time and memory complexity in the number of reference taxa, and is easy to run in parallel. Pplacer features calculation of the posterior probability of a placement on an edge, which is a statistically rigorous way of quantifying uncertainty on an edge-by-edge basis. It also can inform the user of the positional uncertainty for query sequences by calculating expected distance between placement locations, which is crucial in the estimation of uncertainty with a well-sampled reference tree. The software provides visualizations using branch thickness and color to represent number of placements and their uncertainty. A simulation study using reads generated from 631 COG alignments shows a high level of accuracy for phylogenetic placement over a wide range of alignment diversity, and the power of edge uncertainty estimates to measure placement confidence. Conclusions: Pplacer enables efficient phylogenetic placement and subsequent visualization, making likelihood-based phylogenetics methodology practical for large collections of reads; it is freely available as source code, binaries, and a web service. PMID:21034504
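The idea of an edge-by-edge posterior probability can be sketched with plain Bayes' rule. The snippet below is not pplacer's code or interface; it assumes that a log marginal likelihood for placing a query on each candidate edge has already been computed elsewhere, and that the prior over edges is uniform. The edge labels, numbers, and function name are hypothetical.

```python
import math


def edge_posteriors(log_marginal_likelihoods):
    """Turn per-edge log marginal likelihoods into posterior probabilities.

    Assumes a uniform prior over edges. The maximum is subtracted before
    exponentiating so that very negative log-likelihoods do not underflow.
    """
    peak = max(log_marginal_likelihoods.values())
    weights = {
        edge: math.exp(value - peak)
        for edge, value in log_marginal_likelihoods.items()
    }
    total = sum(weights.values())
    return {edge: weight / total for edge, weight in weights.items()}


if __name__ == "__main__":
    # Hypothetical log marginal likelihoods for three candidate edges.
    example = {"edge_12": -1520.4, "edge_13": -1521.1, "edge_40": -1530.0}
    for edge, prob in edge_posteriors(example).items():
        print(f"{edge}: {prob:.4f}")
```

A placement whose posterior mass is spread over several edges signals positional uncertainty, which is the kind of information the abstract describes visualizing on the reference tree.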
A Bayesian Nonparametric Meta-Analysis Model
ERIC Educational Resources Information Center
Karabatsos, George; Talbott, Elizabeth; Walker, Stephen G.
2015-01-01
In a meta-analysis, it is important to specify a model that adequately describes the effect-size distribution of the underlying population of studies. The conventional normal fixed-effect and normal random-effects models assume a normal effect-size population distribution, conditionally on parameters and covariates. For estimating the mean overall…
Structure-Based Phylogenetic Analysis of the Lipocalin Superfamily
Lakshmi, Balasubramanian; Mishra, Madhulika; Srinivasan, Narayanaswamy; Archunan, Govindaraju
2015-01-01
Lipocalins constitute a superfamily of extracellular proteins that are found in all three kingdoms of life. Although very divergent in their sequences and functions, they show remarkable similarity in 3-D structures. Lipocalins bind and transport small hydrophobic molecules. Earlier sequence-based phylogenetic studies of lipocalins highlighted that they have a long evolutionary history. However the molecular and structural basis of their functional diversity is not completely understood. The main objective of the present study is to understand functional diversity of the lipocalins using a structure-based phylogenetic approach. The present study with 39 protein domains from the lipocalin superfamily suggests that the clusters of lipocalins obtained by structure-based phylogeny correspond well with the functional diversity. The detailed analysis on each of the clusters and sub-clusters reveals that the 39 lipocalin domains cluster based on their mode of ligand binding though the clustering was performed on the basis of gross domain structure. The outliers in the phylogenetic tree are often from single member families. Also structure-based phylogenetic approach has provided pointers to assign putative function for the domains of unknown function in lipocalin family. The approach employed in the present study can be used in the future for the functional identification of new lipocalin proteins and may be extended to other protein families where members show poor sequence similarity but high structural similarity. PMID:26263546
Molecular identification and phylogenetic analysis of baculoviruses from Lepidoptera.
Jehle, Johannes A; Lange, Martin; Wang, Hualin; Hu, Zhihong; Wang, Yongjie; Hauschild, Rüdiger
2006-03-01
PCR amplification of the highly conserved baculovirus genes late expression factor 8 (lef-8), late expression factor 9 (lef-9) and polyhedrin/granulin (polh/gran) combined with molecular phylogenetic analyses provide a powerful tool to identify lepidopteran-specific baculoviruses and to study their diversity. In the present investigation, we have improved the degenerate oligonucleotides and corroborated the approach that was recently described by Lange et al. (Lange, M., Wang, H., Zhihong, H., Jehle, J.A., 2004. Towards a molecular identification and classification system of lepidopteran-specific baculoviruses. Virology 325, 36-47.). Baculovirus DNA was isolated from 71 uncharacterized historic baculovirus samples, and partial gene sequences were amplified by using gene-specific degenerate PCR primers. The obtained PCR products were directly sequenced, and the deduced amino acid sequences were compiled and aligned with published sequences of these target genes. A phylogenetic tree of 117 baculoviruses was inferred using maximum parsimony and distance methods. Based on the comprehensive phylogenetic analysis of the partial lef-8, lef-9 and polh/gran genes, we propose a phylogenetic species criterion for lepidopteran-specific baculoviruses that uses the genetic distances of these genes for species demarcation.
Risk analysis using a hybrid Bayesian-approximate reasoning methodology.
Bott, T. F.; Eisenhawer, S. W.
2001-01-01
Analysts are sometimes asked to make frequency estimates for specific accidents in which the accident frequency is determined primarily by safety controls. Under these conditions, frequency estimates use considerable expert belief in determining how the controls affect the accident frequency. To evaluate and document beliefs about control effectiveness, we have modified a traditional Bayesian approach by using approximate reasoning (AR) to develop prior distributions. Our method produces accident frequency estimates that separately express the probabilistic results produced in Bayesian analysis and possibilistic results that reflect uncertainty about the prior estimates. Based on our experience using traditional methods, we feel that the AR approach better documents beliefs about the effectiveness of controls than if the beliefs are buried in Bayesian prior distributions. We have performed numerous expert elicitations in which probabilistic information was sought from subject matter experts not trained in probability. We find it much easier to elicit the linguistic variables and fuzzy set membership values used in AR than to obtain the probability distributions used in prior distributions directly from these experts because it better captures their beliefs and better expresses their uncertainties.
Spectral Analysis of B Stars: An Application of Bayesian Statistics
NASA Astrophysics Data System (ADS)
Mugnes, J.-M.; Robert, C.
2012-12-01
To better understand the processes involved in stellar physics, it is necessary to obtain accurate stellar parameters (effective temperature, surface gravity, abundances…). Spectral analysis is a powerful tool for investigating stars, but it is also vital to reduce uncertainties at a decent computational cost. Here we present a spectral analysis method based on a combination of Bayesian statistics and grids of synthetic spectra obtained with TLUSTY. This method simultaneously constrains the stellar parameters by using all the lines accessible in observed spectra and thus greatly reduces uncertainties and improves the overall spectrum fitting. Preliminary results are shown using spectra from the Observatoire du Mont-Mégantic.
Acute Abdominal Pain: Bayesian Analysis in the Emergency Room
Harvey, A. C.; Moodie, P. F.
1982-01-01
A non-sequential Bayesian analysis was deemed a suitable approach to the important clinical problem of analysis of acute abdominal pain in the Emergency Room. Using series reported in the literature as a data source, complemented by expert clinical estimates of probabilities of clinical data, a program has been established in St. Boniface, Canada. Prior to implementing the program as an online, quickly available diagnostic aid, a prospective preliminary study has shown that the performance of computer plus clinician is significantly better than either clinician or computer alone. A major emphasis has been developing the acceptability of the program in real-life diagnoses in the Emergency Room.
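As a rough illustration of what a non-sequential Bayesian diagnostic calculation can look like, the sketch below applies Bayes' rule to all findings at once. It assumes the findings are conditionally independent given the diagnosis, which the abstract does not state; the function name `posterior`, the diagnosis labels, the findings, and every probability are hypothetical and are not taken from the St. Boniface program.

```python
from math import prod


def posterior(priors, likelihoods, findings):
    """Return P(diagnosis | findings), treating findings as conditionally independent.

    The independence assumption is ours, made only to keep the sketch small.
    """
    unnormalised = {
        diagnosis: priors[diagnosis]
        * prod(likelihoods[diagnosis][finding] for finding in findings)
        for diagnosis in priors
    }
    total = sum(unnormalised.values())
    return {diagnosis: value / total for diagnosis, value in unnormalised.items()}


# Hypothetical numbers for illustration only (not from the cited study).
priors = {"appendicitis": 0.25, "non_specific_pain": 0.75}
likelihoods = {
    "appendicitis": {"rlq_pain": 0.8, "rebound": 0.6},
    "non_specific_pain": {"rlq_pain": 0.2, "rebound": 0.1},
}

result = posterior(priors, likelihoods, ["rlq_pain", "rebound"])
for diagnosis, probability in result.items():
    print(f"{diagnosis}: {probability:.3f}")
```

In a setting like the one described, the priors and conditional probabilities would come from published series and expert estimates rather than from invented values such as these.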
Phylogenetic and Recombination Analysis of Tomato Spotted Wilt Virus
Yu, Jisuk; Kim, Mi-Kyeong; Choi, Hong-Soo; Kim, Kook-Hyung
2013-01-01
Tomato spotted wilt virus (TSWV) severely damages and reduces the yield of many economically important plants worldwide. In this study, we determined the whole-genome sequences of 10 TSWV isolates recently identified from various regions and hosts in Korea. Phylogenetic analysis of these 10 isolates as well as the three previously sequenced isolates indicated that the 13 Korean TSWV isolates could be divided into two groups reflecting either two different origins or divergences of Korean TSWV isolates. In addition, the complete nucleotide sequences for the 13 Korean TSWV isolates along with previously sequenced TSWV RNA segments from Korea and other countries were subjected to phylogenetic and recombination analysis. The phylogenetic analysis indicated that both the RNA L and RNA M segments of most Korean isolates might have originated in Western Europe and North America but that the RNA S segments for all Korean isolates might have originated in China and Japan. Recombination analysis identified a total of 12 recombination events among all isolates and segments and five recombination events among the 13 Korea isolates; among the five recombinants from Korea, three contained the whole RNA L segment, suggesting reassortment rather than recombination. Our analyses provide evidence that both recombination and reassortment have contributed to the molecular diversity of TSWV. PMID:23696821
Phylogenetic and recombination analysis of tomato spotted wilt virus.
Lian, Sen; Lee, Jong-Seung; Cho, Won Kyong; Yu, Jisuk; Kim, Mi-Kyeong; Choi, Hong-Soo; Kim, Kook-Hyung
2013-01-01
Tomato spotted wilt virus (TSWV) severely damages and reduces the yield of many economically important plants worldwide. In this study, we determined the whole-genome sequences of 10 TSWV isolates recently identified from various regions and hosts in Korea. Phylogenetic analysis of these 10 isolates as well as the three previously sequenced isolates indicated that the 13 Korean TSWV isolates could be divided into two groups reflecting either two different origins or divergences of Korean TSWV isolates. In addition, the complete nucleotide sequences for the 13 Korean TSWV isolates along with previously sequenced TSWV RNA segments from Korea and other countries were subjected to phylogenetic and recombination analysis. The phylogenetic analysis indicated that both the RNA L and RNA M segments of most Korean isolates might have originated in Western Europe and North America but that the RNA S segments for all Korean isolates might have originated in China and Japan. Recombination analysis identified a total of 12 recombination events among all isolates and segments and five recombination events among the 13 Korea isolates; among the five recombinants from Korea, three contained the whole RNA L segment, suggesting reassortment rather than recombination. Our analyses provide evidence that both recombination and reassortment have contributed to the molecular diversity of TSWV.
Ari, Eszter; Ittzés, Péter; Podani, János; Thi, Quynh Chi Le; Jakó, Eena
2012-04-01
Boolean analysis (or BOOL-AN; Jakó et al., 2009. BOOL-AN: A method for comparative sequence analysis and phylogenetic reconstruction. Mol. Phylogenet. Evol. 52, 887-97.), a recently developed method for sequence comparison, uses the Iterative Canonical Form of Boolean functions. It considers sequence information in a way entirely different from standard phylogenetic methods (i.e. Maximum Parsimony, Maximum-Likelihood, Neighbor-Joining, and Bayesian analysis). The performance and reliability of Boolean analysis were tested and compared with the standard phylogenetic methods, using artificially evolved - simulated - nucleotide sequences and the 22 mitochondrial tRNA genes of the great apes. At the outset, we assumed that the phylogeny of Hominidae is generally well established, and the guide tree of artificial sequence evolution can also be used as a benchmark. These offer a possibility to compare and test the performance of different phylogenetic methods. Trees were reconstructed by each method from 2500 simulated sequences and 22 mitochondrial tRNA sequences. We also introduced a special re-sampling method for Boolean analysis on permuted sequence sites, the P-BOOL-AN procedure. Considering the reliability values (branch support values of consensus trees and Robinson-Foulds distances) we used for simulated sequence trees produced by different phylogenetic methods, BOOL-AN appeared as the most reliable method. Although the mitochondrial tRNA sequences of great apes are relatively short (59-75 bases long) and the ratio of their constant characters is about 75%, BOOL-AN, P-BOOL-AN and the Bayesian approach produced the same tree-topology as the established phylogeny, while the outcomes of Maximum Parsimony, Maximum-Likelihood and Neighbor-Joining methods were equivocal. We conclude that Boolean analysis is a promising alternative to existing methods of sequence comparison for phylogenetic reconstruction and congruence analysis. Copyright © 2012 Elsevier Inc.
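The Robinson-Foulds distance mentioned above counts the bipartitions (splits) found in one tree but not the other. The following is a minimal sketch of that count for trees written as nested tuples of leaf names; the representation, the function names, and the example topologies are assumptions made for illustration and do not reproduce the software or trees used in the study.

```python
def leaf_set(tree):
    """Collect the leaf labels of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Return the non-trivial bipartitions of the tree, ignoring root position."""
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)  # each split is stored as the side without this leaf
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - clade if anchor in clade else clade
        # Keep only splits with at least two leaves on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return clade

    visit(tree)
    return found


def robinson_foulds(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must share the same leaf labels")
    return len(splits(tree_a) ^ splits(tree_b))


# Illustrative four-taxon topologies (the alternative is deliberately wrong).
established = ((("human", "chimp"), "gorilla"), "orangutan")
rerooted = (("human", "chimp"), ("gorilla", "orangutan"))
alternative = ((("human", "gorilla"), "chimp"), "orangutan")

print(robinson_foulds(established, rerooted))     # same unrooted tree
print(robinson_foulds(established, alternative))  # one split differs in each tree
```

Because splits are compared rather than rooted clades, two drawings of the same unrooted tree with different root positions are treated as identical.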
Leaché, Adam D; Crews, Sarah C; Hickerson, Michael J
2007-12-22
Many species inhabiting the Peninsular Desert of Baja California demonstrate a phylogeographic break at the mid-peninsula, and previous researchers have attributed this shared pattern to a single vicariant event, a mid-peninsular seaway. However, previous studies have not explicitly considered the inherent stochasticity associated with the gene-tree coalescence for species preceding the time of the putative mid-peninsular divergence. We use a Bayesian analysis of a hierarchical model to test for simultaneous vicariance across co-distributed sister lineages sharing a genealogical break at the mid-peninsula. This Bayesian method is advantageous over traditional phylogenetic interpretations of biogeography because it considers the genetic variance associated with the coalescent and mutational processes, as well as the among-lineage demographic differences that affect gene-tree coalescent patterns. Mitochondrial DNA data from six small mammals and six squamate reptiles do not support the perception of a shared vicariant history among lineages exhibiting a north-south divergence at the mid-peninsula, and instead support two events differentially structuring genetic diversity in this region.
PHYRN: A Robust Method for Phylogenetic Analysis of Highly Divergent Sequences
Bhardwaj, Gaurav; Ko, Kyung Dae; Hong, Yoojin; Zhang, Zhenhai; Ho, Ngai Lam; Chintapalli, Sree V.; Kline, Lindsay A.; Gotlin, Matthew; Hartranft, David Nicholas; Patterson, Morgen E.; Dave, Foram; Smith, Evan J.; Holmes, Edward C.; Patterson, Randen L.; van Rossum, Damian B.
2012-01-01
Both multiple sequence alignment and phylogenetic analysis are problematic in the “twilight zone” of sequence similarity (≤25% amino acid identity). Herein we explore the accuracy of phylogenetic inference at extreme sequence divergence using a variety of simulated data sets. We evaluate four leading multiple sequence alignment (MSA) methods (MAFFT, T-COFFEE, CLUSTAL, and MUSCLE) and six commonly used programs of tree estimation (Distance-based: Neighbor-Joining; Character-based: PhyML, RAxML, GARLI, Maximum Parsimony, and Bayesian) against a novel MSA-independent method (PHYRN) described here. Strikingly, at “midnight zone” genetic distances (∼7% pairwise identity and 4.0 gaps per position), PHYRN returns high-resolution phylogenies that outperform traditional approaches. We reason this is due to PHYRN's capability to amplify informative positions, even at the most extreme levels of sequence divergence. We also assess the applicability of the PHYRN algorithm for inferring deep evolutionary relationships in the divergent DANGER protein superfamily, for which PHYRN infers a more robust tree compared to MSA-based approaches. Taken together, these results demonstrate that PHYRN represents a powerful mechanism for mapping uncharted frontiers in highly divergent protein sequence data sets. PMID:22514627
Ross, Cody T; Strimling, Pontus; Ericksen, Karen Paige; Lindenfors, Patrik; Mulder, Monique Borgerhoff
2016-06-01
We present formal evolutionary models for the origins and persistence of the practice of Female Genital Modification (FGMo). We then test the implications of these models using normative cross-cultural data on FGMo in Africa and Bayesian phylogenetic methods that explicitly model adaptive evolution. Empirical evidence provides some support for the findings of our evolutionary models that the de novo origins of the FGMo practice should be associated with social stratification, and that social stratification should place selective pressures on the adoption of FGMo; these results, however, are tempered by the finding that FGMo has arisen in many cultures that have no social stratification, and that forces operating orthogonally to stratification appear to play a more important role in the cross-cultural distribution of FGMo. To explain these cases, one must consider cultural evolutionary explanations in conjunction with behavioral ecological ones. We conclude with a discussion of the implications of our study for policies designed to end the practice of FGMo.
2011-01-01
Background The universal common ancestry (UCA) of all known life is a fundamental component of modern evolutionary theory, supported by a wide range of qualitative molecular evidence. Nevertheless, recently both the status and nature of UCA has been questioned. In earlier work I presented a formal, quantitative test of UCA in which model selection criteria overwhelmingly choose common ancestry over independent ancestry, based on a dataset of universally conserved proteins. These model-based tests are founded in likelihoodist and Bayesian probability theory, in opposition to classical frequentist null hypothesis tests such as Karlin-Altschul E-values for sequence similarity. In a recent comment, Koonin and Wolf (K&W) claim that the model preference for UCA is "a trivial consequence of significant sequence similarity". They support this claim with a computational simulation, derived from universally conserved proteins, which produces similar sequences lacking phylogenetic structure. The model selection tests prefer common ancestry for this artificial data set. Results For the real universal protein sequences, hierarchical phylogenetic structure (induced by genealogical history) is the overriding reason for why the tests choose UCA; sequence similarity is a relatively minor factor. First, for cases of conflicting phylogenetic structure, the tests choose independent ancestry even with highly similar sequences. Second, certain models, like star trees and K&W's profile model (corresponding to their simulation), readily explain sequence similarity yet lack phylogenetic structure. However, these are extremely poor models for the real proteins, even worse than independent ancestry models, though they explain K&W's artificial data well. Finally, K&W's simulation is an implementation of a well-known phylogenetic model, and it produces sequences that mimic homologous proteins. Therefore the model selection tests work appropriately with the artificial data. Conclusions For K
Theobald, Douglas L
2011-11-24
The universal common ancestry (UCA) of all known life is a fundamental component of modern evolutionary theory, supported by a wide range of qualitative molecular evidence. Nevertheless, recently both the status and nature of UCA has been questioned. In earlier work I presented a formal, quantitative test of UCA in which model selection criteria overwhelmingly choose common ancestry over independent ancestry, based on a dataset of universally conserved proteins. These model-based tests are founded in likelihoodist and Bayesian probability theory, in opposition to classical frequentist null hypothesis tests such as Karlin-Altschul E-values for sequence similarity. In a recent comment, Koonin and Wolf (K&W) claim that the model preference for UCA is "a trivial consequence of significant sequence similarity". They support this claim with a computational simulation, derived from universally conserved proteins, which produces similar sequences lacking phylogenetic structure. The model selection tests prefer common ancestry for this artificial data set. For the real universal protein sequences, hierarchical phylogenetic structure (induced by genealogical history) is the overriding reason for why the tests choose UCA; sequence similarity is a relatively minor factor. First, for cases of conflicting phylogenetic structure, the tests choose independent ancestry even with highly similar sequences. Second, certain models, like star trees and K&W's profile model (corresponding to their simulation), readily explain sequence similarity yet lack phylogenetic structure. However, these are extremely poor models for the real proteins, even worse than independent ancestry models, though they explain K&W's artificial data well. Finally, K&W's simulation is an implementation of a well-known phylogenetic model, and it produces sequences that mimic homologous proteins. Therefore the model selection tests work appropriately with the artificial data. For K&W's artificial protein data
Bayesian Sensitivity Analysis of Statistical Models with Missing Data
ZHU, HONGTU; IBRAHIM, JOSEPH G.; TANG, NIANSHENG
2013-01-01
Methods for handling missing data depend strongly on the mechanism that generated the missing values, such as missing completely at random (MCAR) or missing at random (MAR), as well as other distributional and modeling assumptions at various stages. It is well known that the resulting estimates and tests may be sensitive to these assumptions as well as to outlying observations. In this paper, we introduce various perturbations to modeling assumptions and individual observations, and then develop a formal sensitivity analysis to assess these perturbations in the Bayesian analysis of statistical models with missing data. We develop a geometric framework, called the Bayesian perturbation manifold, to characterize the intrinsic structure of these perturbations. We propose several intrinsic influence measures to perform sensitivity analysis and quantify the effect of various perturbations to statistical models. We use the proposed sensitivity analysis procedure to systematically investigate the tenability of the non-ignorable missing at random (NMAR) assumption. Simulation studies are conducted to evaluate our methods, and a dataset is analyzed to illustrate the use of our diagnostic measures. PMID:24753718
Analysis of diversification: combining phylogenetic and taxonomic data.
Paradis, Emmanuel
2003-01-01
The estimation of diversification rates using phylogenetic data has attracted a lot of attention in the past decade. In this context, the analysis of incomplete phylogenies (e.g. phylogenies resolved at the family level but unresolved at the species level) has remained difficult. I present here a likelihood-based method to combine partly resolved phylogenies with taxonomic (species-richness) data to estimate speciation and extinction rates. This method is based on fitting a birth-and-death model to both phylogenetic and taxonomic data. Some examples of the method are presented with data on birds and on mammals. The method is compared with existing approaches that deal with incomplete phylogenies. Some applications and generalizations of the approach introduced in this paper are further discussed. PMID:14667342
A Bayesian analysis of pentaquark signals from CLAS data
David Ireland; Bryan McKinnon; Dan Protopopescu; Pawel Ambrozewicz; Marco Anghinolfi; G. Asryan; Harutyun Avakian; H. Bagdasaryan; Nathan Baillie; Jacques Ball; Nathan Baltzell; V. Batourine; Marco Battaglieri; Ivan Bedlinski; Ivan Bedlinskiy; Matthew Bellis; Nawal Benmouna; Barry Berman; Angela Biselli; Lukasz Blaszczyk; Sylvain Bouchigny; Sergey Boyarinov; Robert Bradford; Derek Branford; William Briscoe; William Brooks; Volker Burkert; Cornel Butuceanu; John Calarco; Sharon Careccia; Daniel Carman; Liam Casey; Shifeng Chen; Lu Cheng; Philip Cole; Patrick Collins; Philip Coltharp; Donald Crabb; Volker Crede; Natalya Dashyan; Rita De Masi; Raffaella De Vita; Enzo De Sanctis; Pavel Degtiarenko; Alexandre Deur; Richard Dickson; Chaden Djalali; Gail Dodge; Joseph Donnelly; David Doughty; Michael Dugger; Oleksandr Dzyubak; Hovanes Egiyan; Kim Egiyan; Lamiaa Elfassi; Latifa Elouadrhiri; Paul Eugenio; Gleb Fedotov; Gerald Feldman; Ahmed Fradi; Herbert Funsten; Michel Garcon; Gagik Gavalian; Nerses Gevorgyan; Gerard Gilfoyle; Kevin Giovanetti; Francois-Xavier Girod; John Goetz; Wesley Gohn; Atilla Gonenc; Ralf Gothe; Keith Griffioen; Michel Guidal; Nevzat Guler; Lei Guo; Vardan Gyurjyan; Kawtar Hafidi; Hayk Hakobyan; Charles Hanretty; Neil Hassall; F. Hersman; Ishaq Hleiqawi; Maurik Holtrop; Charles Hyde; Yordanka Ilieva; Boris Ishkhanov; Eugeny Isupov; D. Jenkins; Hyon-Suk Jo; John Johnstone; Kyungseon Joo; Henry Juengst; Narbe Kalantarians; James Kellie; Mahbubul Khandaker; Wooyoung Kim; Andreas Klein; Franz Klein; Mikhail Kossov; Zebulun Krahn; Laird Kramer; Valery Kubarovsky; Joachim Kuhn; Sergey Kuleshov; Viacheslav Kuznetsov; Jeff Lachniet; Jean Laget; Jorn Langheinrich; D. 
Lawrence; Kenneth Livingston; Haiyun Lu; Marion MacCormick; Nikolai Markov; Paul Mattione; Bernhard Mecking; Mac Mestayer; Curtis Meyer; Tsutomu Mibe; Konstantin Mikhaylov; Marco Mirazita; Rory Miskimen; Viktor Mokeev; Brahim Moreno; Kei Moriya; Steven Morrow; Maryam Moteabbed; Edwin Munevar Espitia; Gordon Mutchler; Pawel Nadel-Turonski; Rakhsha Nasseripour; Silvia Niccolai; Gabriel Niculescu; Maria-Ioana Niculescu; Bogdan Niczyporuk; Megh Niroula; Rustam Niyazov; Mina Nozar; Mikhail Osipenko; Alexander Ostrovidov; Kijun Park; Evgueni Pasyuk; Craig Paterson; Sergio Pereira; Joshua Pierce; Nikolay Pivnyuk; Oleg Pogorelko; Sergey Pozdnyakov; John Price; Sebastien Procureur; Yelena Prok; Brian Raue; Giovanni Ricco; Marco Ripani; Barry Ritchie; Federico Ronchetti; Guenther Rosner; Patrizia Rossi; Franck Sabatie; Julian Salamanca; Carlos Salgado; Joseph Santoro; Vladimir Sapunenko; Reinhard Schumacher; Vladimir Serov; Youri Sharabian; Dmitri Sharov; Nikolay Shvedunov; Elton Smith; Lee Smith; Daniel Sober; Daria Sokhan; Aleksey Stavinskiy; Samuel Stepanyan; Stepan Stepanyan; Burnham Stokes; Paul Stoler; Steffen Strauch; Mauro Taiuti; David Tedeschi; Ulrike Thoma; Avtandil Tkabladze; Svyatoslav Tkachenko; Clarisse Tur; Maurizio Ungaro; Michael Vineyard; Alexander Vlassov; Daniel Watts; Lawrence Weinstein; Dennis Weygand; M. Williams; Elliott Wolin; M.H. Wood; Amrit Yegneswaran; Lorenzo Zana; Jixie Zhang; Bo Zhao; Zhiwen Zhao
2008-02-01
We examine the results of two measurements by the CLAS collaboration, one of which claimed evidence for a $\Theta^{+}$ pentaquark, whilst the other found no such evidence. The unique feature of these two experiments was that they were performed with the same experimental setup. Using a Bayesian analysis we find that the results of the two experiments are in fact compatible with each other, but that the first measurement did not contain sufficient information to determine unambiguously the existence of a $\Theta^{+}$. Further, we suggest a means by which the existence of a new candidate particle can be tested in a rigorous manner.
Bayesian Analysis of Pentaquark Signals from CLAS Data
Ireland, D. G.; McKinnon, B.; Protopopescu, D.; Donnelly, J.; Hassall, N.; Johnstone, J. R.; Kellie, J. D.; Livingston, K.; Paterson, C.; Rosner, G.; Ambrozewicz, P.; Gonenc, A.; Moteabbed, M.; Anghinolfi, M.; Battaglieri, M.; De Vita, R.; Ricco, G.; Ripani, M.; Taiuti, M.; Asryan, G.
2008-02-08
We examine the results of two measurements by the CLAS collaboration, one of which claimed evidence for a $\Theta^{+}$ pentaquark, while the other found no such evidence. The unique feature of these two experiments was that they were performed with the same experimental setup. Using a Bayesian analysis, we find that the results of the two experiments are in fact compatible with each other, but that the first measurement did not contain sufficient information to determine unambiguously the existence of a $\Theta^{+}$. Further, we suggest a means by which the existence of a new candidate particle can be tested in a rigorous manner.
Variational Bayesian Learning for Wavelet Independent Component Analysis
NASA Astrophysics Data System (ADS)
Roussos, E.; Roberts, S.; Daubechies, I.
2005-11-01
In an exploratory approach to data analysis, it is often useful to consider the observations as generated from a set of latent generators or "sources" via a generally unknown mapping. For the noisy overcomplete case, where we have more sources than observations, the problem becomes extremely ill-posed. Solutions to such inverse problems can, in many cases, be achieved by incorporating prior knowledge about the problem, captured in the form of constraints. This setting is a natural candidate for the application of the Bayesian methodology, allowing us to incorporate "soft" constraints in a natural manner. The work described in this paper is mainly driven by problems in functional magnetic resonance imaging of the brain, for the neuro-scientific goal of extracting relevant "maps" from the data. This can be stated as a `blind' source separation problem. Recent experiments in the field of neuroscience show that these maps are sparse, in some appropriate sense. The separation problem can be solved by independent component analysis (ICA), viewed as a technique for seeking sparse components, assuming appropriate distributions for the sources. We derive a hybrid wavelet-ICA model, transforming the signals into a domain where the modeling assumption of sparsity of the coefficients with respect to a dictionary is natural. We follow a graphical modeling formalism, viewing ICA as a probabilistic generative model. We use hierarchical source and mixing models and apply Bayesian inference to the problem. This allows us to perform model selection in order to infer the complexity of the representation, as well as automatic denoising. Since exact inference and learning in such a model is intractable, we follow a variational Bayesian mean-field approach in the conjugate-exponential family of distributions, for efficient unsupervised learning in multi-dimensional settings. The performance of the proposed algorithm is demonstrated on some representative experiments.
[A phylogenetic analysis of plant communities of Teberda Biosphere Reserve].
Shulakov, A A; Egorov, A V; Onipchenko, V G
2016-01-01
Phylogenetic analysis of communities is based on the comparison of distances on the phylogenetic tree between species of a community under study and those distances in random samples taken out of local flora. It makes it possible to determine to what extent a community composition is formed by more closely related species (i.e., "clustered") or, on the contrary, is more even and includes species that are less related to each other. The first case is usually interpreted as a result of strong influence caused by abiotic factors, due to which species with similar ecology, a priori more closely related, would remain. In the second case, biotic factors, such as competition, may come to the fore and lead to forming a community out of distant clades due to divergence of their ecological niches. The aim of this study was to explore the phylogenetic structure in communities of the northwestern Caucasus at two spatial scales - the scale of area from 4 to 100 m² and the smaller scale within a community. The list of local flora of the alpine belt has been composed using the database of geobotanic descriptions carried out in Teberda Biosphere Reserve at true altitudes exceeding 1800 m. It includes 585 species of flowering plants belonging to 57 families. Basal groups of flowering plants are not represented in the list. At the community scale, communities of three classes, namely Thlaspietea rotundifolii (communities formed on screes and pebbles), Calluno-Ulicetea (alpine meadows), and Mulgedio-Aconitetea (subalpine meadows), have not demonstrated significant distinction of phylogenetic structure. At the intra-community level, a larger share of closely related species (clustered community) is detected for alpine meadows. Significantly clustered happen to be those communities developing on rocks (class Asplenietea trichomanis) and alpine (class Juncetea trifidi). At the same time, alpine lichen proved to have even phylogenetic structure at the small scale. Alpine (class Salicetea herbaceae) that
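The comparison described in this abstract, observed phylogenetic distances within a community against distances in random samples from the local flora, can be sketched as a standardized effect size. The sketch below uses mean pairwise distance as the summary statistic, which is an assumption (the abstract does not name the metric); the function names, the toy species pool, and the distances are all hypothetical.

```python
import random
from itertools import combinations
from statistics import mean, pstdev


def mean_pairwise_distance(species, dist):
    """Average phylogenetic distance over all species pairs in a community."""
    return mean(dist[frozenset(pair)] for pair in combinations(species, 2))


def standardized_effect(community, pool, dist, n_random=999, seed=0):
    """Compare the observed mean pairwise distance with random draws from the pool.

    Negative values suggest a clustered community (closer relatives than
    expected); positive values suggest an even one.
    """
    rng = random.Random(seed)
    observed = mean_pairwise_distance(community, dist)
    null = [
        mean_pairwise_distance(rng.sample(pool, len(community)), dist)
        for _ in range(n_random)
    ]
    return (observed - mean(null)) / pstdev(null)


# Toy local flora: two clades of three species each (hypothetical distances).
clade_a = ["a1", "a2", "a3"]
clade_b = ["b1", "b2", "b3"]
pool = clade_a + clade_b
dist = {
    frozenset((x, y)): 2.0 if (x in clade_a) == (y in clade_a) else 10.0
    for x, y in combinations(pool, 2)
}

print(standardized_effect(clade_a, pool, dist))             # one clade only
print(standardized_effect(["a1", "a2", "b1"], pool, dist))  # mixed clades
```

A real analysis would take the distances from a dated phylogeny of the 585-species flora list and would usually also report a significance level from the rank of the observed value among the random draws.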
Bayesian analysis of inflationary features in Planck and SDSS data
NASA Astrophysics Data System (ADS)
Benetti, Micol; Alcaniz, Jailson S.
2016-07-01
We perform a Bayesian analysis to study possible features in the primordial inflationary power spectrum of scalar perturbations. In particular, we analyze the possibility of detecting the imprint of these primordial features in the anisotropy temperature power spectrum of the cosmic microwave background (CMB) and also in the matter power spectrum P(k). We use the most recent CMB data provided by the Planck Collaboration and P(k) measurements from the 11th data release of the Sloan Digital Sky Survey. We focus our analysis on a class of potentials whose features are localized at different intervals of angular scales, corresponding to multipoles in the ranges 10 < ℓ < 60 (Oscill-1) and 150 < ℓ < 300 (Oscill-2). Our results show that one of the step potentials (Oscill-1) provides a better fit to the CMB data than does the featureless ΛCDM scenario, with moderate Bayesian evidence in favor of the former. Adding the P(k) data to the analysis weakens the evidence of the Oscill-1 potential relative to the standard model and strengthens the evidence of this latter scenario with respect to the Oscill-2 model.
Conditional adaptive Bayesian spectral analysis of nonstationary biomedical time series.
Bruce, Scott A; Hall, Martica H; Buysse, Daniel J; Krafty, Robert T
2017-05-08
Many studies of biomedical time series signals aim to measure the association between frequency-domain properties of time series and clinical and behavioral covariates. However, the time-varying dynamics of these associations are largely ignored due to a lack of methods that can assess the changing nature of the relationship through time. This article introduces a method for the simultaneous and automatic analysis of the association between the time-varying power spectrum and covariates, which we refer to as conditional adaptive Bayesian spectrum analysis (CABS). The procedure adaptively partitions the grid of time and covariate values into an unknown number of approximately stationary blocks and nonparametrically estimates local spectra within blocks through penalized splines. CABS is formulated in a fully Bayesian framework, in which the number and locations of partition points are random, and fit using reversible jump Markov chain Monte Carlo techniques. Estimation and inference averaged over the distribution of partitions allows for the accurate analysis of spectra with both smooth and abrupt changes. The proposed methodology is used to analyze the association between the time-varying spectrum of heart rate variability and self-reported sleep quality in a study of older adults serving as the primary caregiver for their ill spouse. © 2017, The International Biometric Society.
Implementation of a Bayesian Engine for Uncertainty Analysis
Leng Vang; Curtis Smith; Steven Prescott
2014-08-01
In probabilistic risk assessment, it is important to have an environment where analysts have access to shared and secured high-performance computing and a statistical analysis tool package. As part of the advanced small modular reactor probabilistic risk analysis framework implementation, we have identified the need for advanced Bayesian computations. However, in order to make this technology available to non-specialists, there is also a need for a simplified tool that allows users to author models and evaluate them within this framework. As a proof-of-concept, we have implemented an advanced open source Bayesian inference tool, OpenBUGS, within the browser-based cloud risk analysis framework that is under development at the Idaho National Laboratory. This development, the “OpenBUGS Scripter”, has been implemented as a client-side, visual, web-based integrated development environment for creating OpenBUGS language scripts. It depends on the shared server environment to execute the generated scripts and to transmit results back to the user. The visual models are in the form of linked diagrams, from which we automatically create the applicable OpenBUGS script that matches the diagram. These diagrams can be saved locally or stored on the server environment to be shared with other users.
Kitahara, Marcelo V; Cairns, Stephen D; Stolarski, Jarosław; Blair, David; Miller, David J
2010-07-08
Classical morphological taxonomy places the approximately 1400 recognized species of Scleractinia (hard corals) into 27 families, but many aspects of coral evolution remain unclear despite the application of molecular phylogenetic methods. In part, this may be a consequence of such studies focusing on the reef-building (shallow water and zooxanthellate) Scleractinia, and largely ignoring the large number of deep-sea species. To better understand broad patterns of coral evolution, we generated molecular data for a broad and representative range of deep sea scleractinians collected off New Caledonia and Australia during the last decade, and conducted the most comprehensive molecular phylogenetic analysis to date of the order Scleractinia. Partial (595 bp) sequences of the mitochondrial cytochrome oxidase subunit 1 (CO1) gene were determined for 65 deep-sea (azooxanthellate) scleractinians and 11 shallow-water species. These new data were aligned with 158 published sequences, generating a 234 taxon dataset representing 25 of the 27 currently recognized scleractinian families. There was a striking discrepancy between the taxonomic validity of coral families consisting predominantly of deep-sea or shallow-water species. Most families composed predominantly of deep-sea azooxanthellate species were monophyletic in both maximum likelihood and Bayesian analyses but, by contrast (and consistent with previous studies), most families composed predominantly of shallow-water zooxanthellate taxa were polyphyletic, although Acroporidae, Poritidae, Pocilloporidae, and Fungiidae were exceptions to this general pattern. One factor contributing to this inconsistency may be the greater environmental stability of deep-sea environments, effectively removing taxonomic "noise" contributed by phenotypic plasticity. Our phylogenetic analyses imply that the most basal extant scleractinians are azooxanthellate solitary corals from deep-water, their divergence predating that of the robust and
Kitahara, Marcelo V.; Cairns, Stephen D.; Stolarski, Jarosław; Blair, David; Miller, David J.
2010-01-01
Background Classical morphological taxonomy places the approximately 1400 recognized species of Scleractinia (hard corals) into 27 families, but many aspects of coral evolution remain unclear despite the application of molecular phylogenetic methods. In part, this may be a consequence of such studies focusing on the reef-building (shallow water and zooxanthellate) Scleractinia, and largely ignoring the large number of deep-sea species. To better understand broad patterns of coral evolution, we generated molecular data for a broad and representative range of deep sea scleractinians collected off New Caledonia and Australia during the last decade, and conducted the most comprehensive molecular phylogenetic analysis to date of the order Scleractinia. Methodology Partial (595 bp) sequences of the mitochondrial cytochrome oxidase subunit 1 (CO1) gene were determined for 65 deep-sea (azooxanthellate) scleractinians and 11 shallow-water species. These new data were aligned with 158 published sequences, generating a 234 taxon dataset representing 25 of the 27 currently recognized scleractinian families. Principal Findings/Conclusions There was a striking discrepancy between the taxonomic validity of coral families consisting predominantly of deep-sea or shallow-water species. Most families composed predominantly of deep-sea azooxanthellate species were monophyletic in both maximum likelihood and Bayesian analyses but, by contrast (and consistent with previous studies), most families composed predominantly of shallow-water zooxanthellate taxa were polyphyletic, although Acroporidae, Poritidae, Pocilloporidae, and Fungiidae were exceptions to this general pattern. One factor contributing to this inconsistency may be the greater environmental stability of deep-sea environments, effectively removing taxonomic “noise” contributed by phenotypic plasticity. Our phylogenetic analyses imply that the most basal extant scleractinians are azooxanthellate solitary corals from deep
Regional Frequency Analysis Based on Scaling Properties and Bayesian Models
NASA Astrophysics Data System (ADS)
Kwon, Hyun-Han; Lee, Jeong-Ju; Moon, Young-Il
2010-05-01
A regional frequency analysis based on a Hierarchical Bayesian Network (HBN) and scaling theory was developed. Many recording rain gauges over South Korea were used for the analysis. First, a scaling approach combined with an extreme distribution was employed to derive a regional formula for frequency analysis. Second, an HBN model was used to represent additional information about the regional structure of the scaling parameters, especially the location parameter and shape parameter. The location and shape parameters of the extreme distribution were estimated by utilizing scaling properties in a regression framework, and the scaling parameters linking the parameters (location and shape) to various duration times were simultaneously estimated. It was found that the regional frequency analysis combined with HBN and scaling properties shows promising results in terms of establishing regional IDF curves.
Phylogenetic analysis of uroporphyrinogen III synthase (UROS) gene.
Shaik, Abjal Pasha; Alsaeed, Abbas H; Sultana, Asma
2012-01-01
The uroporphyrinogen III synthase (UROS) enzyme (also known as hydroxymethylbilane hydrolyase) catalyzes the cyclization of hydroxymethylbilane to uroporphyrinogen III during heme biosynthesis. A deficiency of this enzyme is associated with the very rare Gunther's disease or congenital erythropoietic porphyria, an autosomal recessive inborn error of metabolism. The current study investigated the possible role of UROS (Homo sapiens [EC: 4.2.1.75; 265 aa; 1371 bp mRNA; Entrez Pubmed ref NP_000366.1, NM_000375.2]) in evolution by studying the phylogenetic relationship and divergence of this gene using computational methods. The UROS protein sequences from various taxa were retrieved from the GenBank database and were compared using Clustal-W (multiple sequence alignment) with defaults, and a first-pass phylogenetic tree was built using the neighbor-joining method as in DELTA BLAST 2.2.27+ version. A total of 163 BLAST hits were found for the uroporphyrinogen III synthase query sequence and these hits showed a putative conserved domain, HemD superfamily (as on 14th Nov 2012). We then narrowed down the search by manually deleting the proteins which were not UROS sequences, and sequences belonging to phyla other than Chordata were deleted. A repeat phylogenetic analysis of 39 taxa was performed using PhyML and TreeDyn software to confirm that UROS is a highly conserved protein with approximately 85% conserved sequences in almost all chordate taxons, emphasizing its importance in heme synthesis.
Phylogenetic analysis reveals a scattered distribution of autumn colours
Archetti, Marco
2009-01-01
Background and Aims Leaf colour in autumn is rarely considered informative for taxonomy, but there is now growing interest in the evolution of autumn colours and different hypotheses are debated. Research efforts are hindered by the lack of basic information: the phylogenetic distribution of autumn colours. It is not known when and how autumn colours evolved. Methods Data are reported on the autumn colours of 2368 tree species belonging to 400 genera of the temperate regions of the world, and an analysis is made of their phylogenetic relationships in order to reconstruct the evolutionary origin of red and yellow in autumn leaves. Key Results Red autumn colours are present in at least 290 species (70 genera), and evolved independently at least 25 times. Yellow is present independently from red in at least 378 species (97 genera) and evolved at least 28 times. Conclusions The phylogenetic reconstruction suggests that autumn colours have been acquired and lost many times during evolution. This scattered distribution could be explained by hypotheses involving some kind of coevolutionary interaction or by hypotheses that rely on the need for photoprotection. PMID:19126636
A phylogenetic analysis of the mycoplasmas: basis for their classification.
Weisburg, W G; Tully, J G; Rose, D L; Petzel, J P; Oyaizu, H; Yang, D; Mandelco, L; Sechrest, J; Lawrence, T G; Van Etten, J
1989-01-01
Small-subunit rRNA sequences were determined for almost 50 species of mycoplasmas and their walled relatives, providing the basis for a phylogenetic systematic analysis of these organisms. Five groups of mycoplasmas per se were recognized (provisional names are given): the hominis group (which included species such as Mycoplasma hominis, Mycoplasma lipophilum, Mycoplasma pulmonis, and Mycoplasma neurolyticum), the pneumoniae group (which included species such as Mycoplasma pneumoniae and Mycoplasma muris), the spiroplasma group (which included species such as Mycoplasma mycoides, Spiroplasma citri, and Spiroplasma apis), the anaeroplasma group (which encompassed the anaeroplasmas and acholeplasmas), and a group known to contain only the isolated species Asteroleplasma anaerobium. In addition to these five mycoplasma groups, a sixth group of variously named gram-positive, walled organisms (which included lactobacilli, clostridia, and other organisms) was also included in the overall phylogenetic unit. In each of these six primary groups, subgroups were readily recognized and defined. Although the phylogenetic units identified by rRNA comparisons are difficult to recognize on the basis of mutually exclusive phenotypic characters alone, phenotypic justification can be given a posteriori for a number of them. PMID:2592342
Detection and phylogenetic analysis of bacteriophage WO in spiders (Araneae).
Yan, Qian; Qiao, Huping; Gao, Jin; Yun, Yueli; Liu, Fengxiang; Peng, Yu
2015-11-01
Phage WO is a bacteriophage found in Wolbachia. Herein, we present the first phylogenetic study of WOs that infect spiders (Araneae). Seven species of spiders (Araneus alternidens, Nephila clavata, Hylyphantes graminicola, Prosoponoides sinensis, Pholcus crypticolens, Coleosoma octomaculatum, and Nurscia albofasciata) from six families were infected by Wolbachia and WO, followed by comprehensive sequence analysis. Interestingly, WO could only be detected in Wolbachia-infected spiders. The relative infection rates of those seven species of spiders were 75, 100, 88.9, 100, 62.5, 72.7, and 100%, respectively. Our results indicated that both Wolbachia and WO were found in three different body parts of N. clavata, and WO could be passed to the next generation of H. graminicola by vertical transmission. There were three different sequences for WO infected in A. alternidens and two different WO sequences from C. octomaculatum. Only one sequence of WO was found for the other five species of spiders. The discovered sequence of WO ranged from 239 to 311 bp. A phylogenetic tree was generated using maximum likelihood (ML) based on the orf7 gene sequences. According to the phylogenetic tree, WOs in N. clavata and H. graminicola were clustered in the same group. WOs from A. alternidens (WAlt1) and C. octomaculatum (WOct2) were closely related to another clade, whereas WO in P. sinensis was classified as a sole cluster.
Jacquemin, Stephen J.; Doll, Jason C.
2014-01-01
We combine evolutionary biology and community ecology to test whether two species traits, body size and geographic range, explain long term variation in local scale freshwater stream fish assemblages. Body size and geographic range are expected to influence several aspects of fish ecology, via relationships with niche breadth, dispersal, and abundance. These traits are expected to scale inversely with niche breadth or current abundance, and to scale directly with dispersal potential. However, their utility to explain long term temporal patterns in local scale abundance is not known. Comparative methods employing an existing molecular phylogeny were used to incorporate evolutionary relatedness in a test for covariation of body size and geographic range with long term (1983 – 2010) local scale population variation of fishes in West Fork White River (Indiana, USA). The Bayesian model incorporating phylogenetic uncertainty and correlated predictors indicated that neither body size nor geographic range explained significant variation in population fluctuations over a 28 year period. Phylogenetic signal data indicated that body size and geographic range were less similar among taxa than expected if trait evolution followed a purely random walk. We interpret this as evidence that local scale population variation may be influenced less by species-level traits such as body size or geographic range, and instead may be influenced more strongly by a taxon’s local scale habitat and biotic assemblages. PMID:24691075
Dolz, Roser; Valle, Rosa; Perera, Carmen L.; Bertran, Kateri; Frías, Maria T.; Majó, Natàlia; Ganges, Llilianne; Pérez, Lester J.
2013-01-01
Background Infectious bursal disease is a highly contagious and acute viral disease caused by the infectious bursal disease virus (IBDV); it affects all major poultry producing areas of the world. The current study was designed to rigorously measure the global phylogeographic dynamics of IBDV strains to gain insight into viral population expansion as well as the emergence, spread and pattern of the geographical structure of very virulent IBDV (vvIBDV) strains. Methodology/Principal Findings Sequences of the hyper-variable region of the VP2 (HVR-VP2) gene from IBDV strains isolated from diverse geographic locations were obtained from the GenBank database; Cuban sequences were obtained in the current work. All sequences were analysed by Bayesian phylogeographic analysis, implemented in the Bayesian Evolutionary Analysis Sampling Trees (BEAST), Bayesian Tip-association Significance testing (BaTS) and Spatial Phylogenetic Reconstruction of Evolutionary Dynamics (SPREAD) software packages. Selection pressure on the HVR-VP2 was also assessed. The phylogeographic association-trait analysis showed that viruses sampled from individual countries tend to cluster together, suggesting a geographic pattern for IBDV strains. Spatial analysis from this study revealed that strains carrying sequences that were linked to increased virulence of IBDV appeared in Iran in 1981 and spread to Western Europe (Belgium) in 1987, Africa (Egypt) around 1990, East Asia (China and Japan) in 1993, the Caribbean Region (Cuba) by 1995 and South America (Brazil) around 2000. Selection pressure analysis showed that several codons in the HVR-VP2 region were under purifying selection. Conclusions/Significance To our knowledge, this work is the first study applying the Bayesian phylogeographic reconstruction approach to analyse the emergence and spread of vvIBDV strains worldwide. PMID:23805195
Node Augmentation Technique in Bayesian Network Evidence Analysis and Marshaling
Keselman, Dmitry; Tompkins, George H; Leishman, Deborah A
2010-01-01
Given a Bayesian network, sensitivity analysis is an important activity. This paper begins by describing a network augmentation technique which can simplify the analysis. Next, we present two techniques which allow the user to determine the probability distribution of a hypothesis node under conditions of uncertain evidence; i.e. the state of an evidence node or nodes is described by a user specified probability distribution. Finally, we conclude with a discussion of three criteria for ranking evidence nodes based on their influence on a hypothesis node. All of these techniques have been used in conjunction with a commercial software package. A Bayesian network based on a directed acyclic graph (DAG) G is a graphical representation of a system of random variables that satisfies the following Markov property: any node (random variable) is independent of its non-descendants given the state of all its parents (Neapolitan, 2004). For simplicity's sake, we consider only discrete variables with a finite number of states, though most of the conclusions may be generalized.
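To make the idea of uncertain evidence concrete, here is a minimal sketch for a two-node network (hypothesis H with one child evidence node E). It uses Jeffrey's rule, which is one standard way to handle an evidence node whose state is given only as a probability distribution; the abstract does not say which rule the paper uses, so this is an assumption. All node names, states, and probabilities below are hypothetical.

```python
def posterior_given_state(prior_h, cpt_e_given_h, e):
    """Bayes' rule: P(H | E = e) from the prior P(H) and the table P(E | H)."""
    joint = {h: prior_h[h] * cpt_e_given_h[h][e] for h in prior_h}
    norm = sum(joint.values())
    return {h: p / norm for h, p in joint.items()}


def hypothesis_under_uncertain_evidence(prior_h, cpt_e_given_h, evidence_dist):
    """Jeffrey's rule (assumed here): average P(H | E = e) over the
    user-specified distribution q(e) on the evidence node."""
    result = {h: 0.0 for h in prior_h}
    for e, q in evidence_dist.items():
        post = posterior_given_state(prior_h, cpt_e_given_h, e)
        for h in result:
            result[h] += q * post[h]
    return result


# Hypothetical numbers, for illustration only.
prior_h = {"true": 0.2, "false": 0.8}
cpt_e_given_h = {
    "true": {"pos": 0.9, "neg": 0.1},
    "false": {"pos": 0.3, "neg": 0.7},
}
evidence_dist = {"pos": 0.7, "neg": 0.3}  # user is 70% sure E is "pos"

result = hypothesis_under_uncertain_evidence(prior_h, cpt_e_given_h, evidence_dist)
print(result)
```

With hard evidence (q puts all mass on one state) the function reduces to ordinary conditioning, which is a quick sanity check on the sketch.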
Bayesian analysis of physiologically based toxicokinetic and toxicodynamic models.
Hack, C Eric
2006-04-17
Physiologically based toxicokinetic (PBTK) and toxicodynamic (TD) models of bromate in animals and humans would improve our ability to accurately estimate the toxic doses in humans based on available animal studies. These mathematical models are often highly parameterized and must be calibrated in order for the model predictions of internal dose to adequately fit the experimentally measured doses. Highly parameterized models are difficult to calibrate and it is difficult to obtain accurate estimates of uncertainty or variability in model parameters with commonly used frequentist calibration methods, such as maximum likelihood estimation (MLE) or least squared error approaches. The Bayesian approach called Markov chain Monte Carlo (MCMC) analysis can be used to successfully calibrate these complex models. Prior knowledge about the biological system and associated model parameters is easily incorporated in this approach in the form of prior parameter distributions, and the distributions are refined or updated using experimental data to generate posterior distributions of parameter estimates. The goal of this paper is to give the non-mathematician a brief description of the Bayesian approach and Markov chain Monte Carlo analysis, how this technique is used in risk assessment, and the issues associated with this approach.
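The prior-to-posterior updating described above can be illustrated with a toy random-walk Metropolis sampler. This is only a sketch under strong simplifying assumptions: a single unknown mean with a normal likelihood and a normal prior stands in for the many parameters of a real PBTK model, and the data, prior, and step size are hypothetical.

```python
import math
import random


def log_posterior(mu, data, sigma=1.0, prior_mean=0.0, prior_sd=10.0):
    """Unnormalized log posterior: normal prior on mu plus normal likelihood."""
    log_prior = -0.5 * ((mu - prior_mean) / prior_sd) ** 2
    log_like = sum(-0.5 * ((x - mu) / sigma) ** 2 for x in data)
    return log_prior + log_like


def metropolis(data, n_steps=20000, step_sd=0.5, seed=0):
    """Random-walk Metropolis: propose a nearby value, accept it with
    probability min(1, posterior ratio), otherwise keep the current value."""
    rng = random.Random(seed)
    mu = 0.0
    current = log_posterior(mu, data)
    samples = []
    for _ in range(n_steps):
        proposal = mu + rng.gauss(0.0, step_sd)
        candidate = log_posterior(proposal, data)
        accept_prob = math.exp(min(0.0, candidate - current))
        if rng.random() < accept_prob:
            mu, current = proposal, candidate
        samples.append(mu)
    return samples[n_steps // 4:]  # discard the first quarter as burn-in


data = [2.1, 1.9, 2.4, 2.0, 1.6]  # hypothetical measurements
samples = metropolis(data)
posterior_mean = sum(samples) / len(samples)
print(f"posterior mean of mu: {posterior_mean:.2f}")
```

For this conjugate toy problem the exact posterior is known in closed form, so the sampler can be checked against it; in a highly parameterized model no such check is available, which is why convergence diagnostics matter in practice.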
Analysis of magnetic field fluctuation thermometry using Bayesian inference
NASA Astrophysics Data System (ADS)
Wübbeler, G.; Schmähling, F.; Beyer, J.; Engert, J.; Elster, C.
2012-12-01
A Bayesian approach is proposed for the analysis of magnetic field fluctuation thermometry. The approach addresses the estimation of temperature from the measurement of a noise power spectrum as well as the analysis of previous calibration measurements. A key aspect is the reliable determination of uncertainties associated with the obtained temperature estimates, and the proposed approach naturally accounts for both the uncertainties in the calibration stage and the noise in the temperature measurement. Erlang distributions are employed to model the fluctuations of thermal noise power spectra and we show that such a procedure is justified in the light of the data. We describe in detail the Bayesian approach and briefly refer to Markov Chain Monte Carlo techniques used in the numerical calculation of the results. The MATLAB® software package we used for calculating our results is provided. The proposed approach is validated using magnetic field fluctuation power spectra recorded in the sub-kelvin region for which an independently determined reference temperature is available. As a result, the obtained temperature estimates were found to be fully consistent with the reference temperature.
Phylogenetic analysis of a transfusion-transmitted hepatitis A outbreak.
Hettmann, Andrea; Juhász, Gabriella; Dencs, Ágnes; Tresó, Bálint; Rusvai, Erzsébet; Barabás, Éva; Takács, Mária
2017-02-01
A transfusion-associated hepatitis A outbreak was found for the first time in Hungary. The outbreak involved five cases. Parenteral transmission of hepatitis A is rare, but may occur during viraemia. Direct sequencing of nested PCR products was performed, and all the examined samples were identical in the VP1/2A region of the hepatitis A virus genome. HAV sequences found in recent years were compared, and phylogenetic analysis showed that the strain which caused these cases is the same as the one that had spread in Hungary recently, causing several hepatitis A outbreaks throughout the country.
A Bayesian Framework for Reliability Analysis of Spacecraft Deployments
NASA Technical Reports Server (NTRS)
Evans, John W.; Gallo, Luis; Kaminsky, Mark
2012-01-01
Deployable subsystems are essential to mission success of most spacecraft. These subsystems enable critical functions including power, communications and thermal control. The loss of any of these functions will generally result in loss of the mission. These subsystems and their components often consist of unique designs and applications for which various standardized data sources are not applicable for estimating reliability and for assessing risks. In this study, a two stage sequential Bayesian framework for reliability estimation of spacecraft deployment was developed for this purpose. This process was then applied to the James Webb Space Telescope (JWST) Sunshield subsystem, a unique design intended for thermal control of the Optical Telescope Element. Initially, detailed studies of NASA deployment history, "heritage information", were conducted, extending over 45 years of spacecraft launches. This information was then coupled to a non-informative prior and a binomial likelihood function to create a posterior distribution for deployments of various subsystems using Monte Carlo Markov Chain sampling. Select distributions were then coupled to a subsequent analysis, using test data and anomaly occurrences on successive ground test deployments of scale model test articles of JWST hardware, to update the NASA heritage data. This allowed for a realistic prediction for the reliability of the complex Sunshield deployment, with credibility limits, within this two stage Bayesian framework.
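The two-stage updating can be sketched with a conjugate Beta-binomial model. Note the swap: the study reports Markov chain sampling, whereas this sketch uses the closed-form conjugate update, which gives the same posterior when the prior is a Beta distribution and the likelihood is binomial. The Jeffreys Beta(0.5, 0.5) prior and all success and failure counts are hypothetical and are not taken from the study.

```python
def beta_binomial_update(alpha, beta, successes, failures):
    """Beta(alpha, beta) prior + binomial data -> Beta posterior parameters."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Stage 1: non-informative prior (Jeffreys, an assumption) updated with
# hypothetical heritage deployment counts.
a1, b1 = beta_binomial_update(0.5, 0.5, successes=98, failures=2)

# Stage 2: the stage-1 posterior becomes the prior and is updated with
# hypothetical ground-test results.
a2, b2 = beta_binomial_update(a1, b1, successes=10, failures=0)

print(f"stage 1 posterior: Beta({a1}, {b1}), mean {beta_mean(a1, b1):.4f}")
print(f"stage 2 posterior: Beta({a2}, {b2}), mean {beta_mean(a2, b2):.4f}")
```

Because the update is sequential, feeding both data sets in at once would yield the same final Beta parameters; the two-stage form simply makes the contribution of each information source visible.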
Bayesian Models for fMRI Data Analysis
Zhang, Linlin; Guindani, Michele; Vannucci, Marina
2015-01-01
Functional magnetic resonance imaging (fMRI), a noninvasive neuroimaging method that provides an indirect measure of neuronal activity by detecting blood flow changes, has experienced an explosive growth in the past years. Statistical methods play a crucial role in understanding and analyzing fMRI data. Bayesian approaches, in particular, have shown great promise in applications. A remarkable feature of fully Bayesian approaches is that they allow a flexible modeling of spatial and temporal correlations in the data. This paper provides a review of the most relevant models developed in recent years. We divide methods according to the objective of the analysis. We start from spatio-temporal models for fMRI data that detect task-related activation patterns. We then address the very important problem of estimating brain connectivity. We also touch upon methods that focus on making predictions of an individual's brain activity or a clinical or behavioral response. We conclude with a discussion of recent integrative models that aim at combining fMRI data with other imaging modalities, such as EEG/MEG and DTI data, measured on the same subjects. We also briefly discuss the emerging field of imaging genetics. PMID:25750690
Learning Bayesian networks for clinical time series analysis.
van der Heijden, Maarten; Velikova, Marina; Lucas, Peter J F
2014-04-01
Autonomous chronic disease management requires models that are able to interpret time series data from patients. However, construction of such models by means of machine learning requires the availability of costly health-care data, often resulting in small samples. We analysed data from chronic obstructive pulmonary disease (COPD) patients with the goal of constructing a model to predict the occurrence of exacerbation events, i.e., episodes of decreased pulmonary health status. Data from 10 COPD patients, gathered with our home monitoring system, were used for temporal Bayesian network learning, combined with bootstrapping methods for data analysis of small data samples. For comparison a temporal variant of augmented naive Bayes models and a temporal nodes Bayesian network (TNBN) were constructed. The performances of the methods were first tested with synthetic data. Subsequently, different COPD models were compared to each other using an external validation data set. The model learning methods are capable of finding good predictive models for our COPD data. Model averaging over models based on bootstrap replications is able to find a good balance between true and false positive rates on predicting COPD exacerbation events. Temporal naive Bayes offers an alternative that trades some performance for a reduction in computation time and easier interpretation. Copyright © 2013 Elsevier Inc. All rights reserved.
A Bayesian subgroup analysis using collections of ANOVA models.
Liu, Jinzhong; Sivaganesan, Siva; Laud, Purushottam W; Müller, Peter
2017-03-20
We develop a Bayesian approach to subgroup analysis using ANOVA models with multiple covariates, extending an earlier work. We assume a two-arm clinical trial with normally distributed response variable. We also assume that the covariates for subgroup finding are categorical and are a priori specified, and parsimonious easy-to-interpret subgroups are preferable. We represent the subgroups of interest by a collection of models and use a model selection approach to finding subgroups with heterogeneous effects. We develop suitable priors for the model space and use an objective Bayesian approach that yields multiplicity adjusted posterior probabilities for the models. We use a structured algorithm based on the posterior probabilities of the models to determine which subgroup effects to report. Frequentist operating characteristics of the approach are evaluated using simulation. While our approach is applicable in more general cases, we mainly focus on the 2 × 2 case of two covariates each at two levels for ease of presentation. The approach is illustrated using a real data example.
Analysis of runoff extremes using spatial hierarchical Bayesian modeling
Reza Najafi, Mohammad; Moradkhani, Hamid
2013-10-01
A spatial hierarchical Bayesian method is developed to model the extreme runoffs over two spatial domains in Columbia River Basin, USA. This method combines the limited number of data from different locations. The two spatial domains contain 31 and 20 gage stations, respectively, with daily streamflow records ranging from 30 to over 130 years. The generalized Pareto distribution (GPD) is employed for the analysis of extremes. Temporally independent data are generated using a declustering procedure, where runoff extremes are first grouped into clusters and then the maximum of each cluster is retained. The GPD scale parameter is modeled based on a Gaussian geostatistical process and additional variables including the latitude, longitude, elevation, and drainage area are incorporated by means of a hierarchy. A Metropolis-Hastings within Gibbs sampler is used to infer the parameters of the GPD and the geostatistical process to estimate the return levels across the basins. The performance of the hierarchical Bayesian model is evaluated by comparing the estimates of 100 year return level floods with the maximum likelihood estimates at sites that are not used during the parameter inference process. Various prior distributions are used to assess the sensitivity of the posterior distributions. The selected model is then employed to estimate floods with different return levels in time slices of 15 years in order to detect possible trends in runoff extremes. The results show cyclic variations in the spatial average of the 100 year return level floods across the basins with consistent increasing trends distinguishable in some areas.
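The declustering step can be sketched as a simple runs rule: exceedances of a threshold belong to the same cluster until a given number of consecutive non-exceedances is seen, and only each cluster's maximum is kept. The abstract does not specify the clustering rule, so the runs rule, the threshold, the gap length, and the flow values here are all assumptions made for illustration.

```python
def decluster(series, threshold, min_gap):
    """Return the maximum of each cluster of threshold exceedances.

    A cluster ends once `min_gap` consecutive values at or below the
    threshold have been observed (runs declustering, assumed here).
    """
    peaks = []
    current_max = None  # maximum of the cluster being built, if any
    gap = 0             # consecutive non-exceedances since the last exceedance
    for value in series:
        if value > threshold:
            current_max = value if current_max is None else max(current_max, value)
            gap = 0
        elif current_max is not None:
            gap += 1
            if gap >= min_gap:
                peaks.append(current_max)
                current_max = None
    if current_max is not None:  # close a cluster still open at the end
        peaks.append(current_max)
    return peaks


flows = [1, 5, 7, 2, 6, 1, 1, 1, 9, 8, 1]  # hypothetical daily flows
peaks = decluster(flows, threshold=4, min_gap=2)
print(peaks)  # [7, 9]
```

The retained peaks, minus the threshold, are the values a GPD would then be fitted to; a longer `min_gap` gives fewer, more plausibly independent peaks at the cost of a smaller sample.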
Bayesian analysis of U.S. hurricane climate
Elsner, James B.; Bossak, Brian H.
2001-01-01
Predictive climate distributions of U.S. landfalling hurricanes are estimated from observational records over the period 1851–2000. The approach is Bayesian, combining the reliable records of hurricane activity during the twentieth century with the less precise accounts of activity during the nineteenth century to produce a best estimate of the posterior distribution on the annual rates. The methodology provides a predictive distribution of future activity that serves as a climatological benchmark. Results are presented for the entire coast as well as for the Gulf Coast, Florida, and the East Coast. Statistics on the observed annual counts of U.S. hurricanes, both for the entire coast and by region, are similar within each of the three consecutive 50-yr periods beginning in 1851. However, evidence indicates that the records during the nineteenth century are less precise. Bayesian theory provides a rational approach for defining hurricane climate that uses all available information and that makes no assumption about whether the 150-yr record of hurricanes has been adequately or uniformly monitored. The analysis shows that the number of major hurricanes expected to reach the U.S. coast over the next 30 yr is 18 and the number of hurricanes expected to hit Florida is 20.
Phylogenetic Analysis of Human Immunodeficiency Virus Type 2 Group B
Cella, Eleonora; Lo Presti, Alessandra; Giovanetti, Marta; Veo, Carla; Lai, Alessia; Dicuonzo, Giordano; Angeletti, Silvia; Ciotti, Marco; Zehender, Gianguglielmo; Ciccozzi, Massimo
2016-01-01
Context: Human immunodeficiency virus type 2 (HIV-2) infections are mainly restricted to West Africa; however, in recent years, the prevalence of HIV-2 is a growing concern in some European countries and the Southwestern region of India. Despite the presence of different HIV-2 groups, only A and B Groups have established human-to-human transmission chains. Aims: This work aimed to evaluate the phylogeographic inference of HIV-2 Group B worldwide to estimate their date of origin and the population dynamics. Materials and Methods: The evolutionary rates, the demographic history for HIV-2 Group B dataset, and the phylogeographic analysis were estimated using a Bayesian approach. The viral gene flow analysis was used to count viral gene out/in flow among different locations. Results: The root of the Bayesian maximum clade credibility tree of HIV-2 Group B dated back to 1957. The demographic history of HIV-2 Group B showed that the epidemic remained constant up to 1970, when exponential growth started. From 1985 to early 2000s, the epidemic reached a plateau, and then it was characterized by two bottlenecks and a new plateau at the end of 2000s. Phylogeographic reconstruction showed that the most probable location for the root of the tree was Ghana. Regarding the viral gene flow of HIV-2 Group B, the only observed viral gene flow was from Africa to France, Belgium, and Luxembourg. Conclusions: The study gives insights into the origin, history, and phylogeography of HIV-2 Group B epidemic. The growing number of infections of HIV-2 worldwide indicates the need for strengthening surveillance. PMID:27621561
BaTMAn: Bayesian Technique for Multi-image Analysis
Casado, J.; Ascasibar, Y.; García-Benito, R.; Guidi, G.; Choudhury, O. S.; Bellocchi, E.; Sánchez, S. F.; Díaz, A. I.
2016-12-01
Bayesian Technique for Multi-image Analysis (BaTMAn) characterizes any astronomical dataset containing spatial information and performs a tessellation based on the measurements and errors provided as input. The algorithm iteratively merges spatial elements as long as they are statistically consistent with carrying the same information (i.e. identical signal within the errors). The output segmentations successfully adapt to the underlying spatial structure, regardless of its morphology and/or the statistical properties of the noise. BaTMAn identifies (and keeps) all the statistically-significant information contained in the input multi-image (e.g. an IFS datacube). The main aim of the algorithm is to characterize spatially-resolved data prior to their analysis.
Developing and Testing a Bayesian Analysis of Fluorescence Lifetime Measurements
Needleman, Daniel J.
2017-01-01
FRET measurements can provide dynamic spatial information on length scales smaller than the diffraction limit of light. Several methods exist to measure FRET between fluorophores, including Fluorescence Lifetime Imaging Microscopy (FLIM), which relies on the reduction of fluorescence lifetime when a fluorophore is undergoing FRET. FLIM measurements take the form of histograms of photon arrival times, containing contributions from a mixed population of fluorophores both undergoing and not undergoing FRET, with the measured distribution being a mixture of exponentials of different lifetimes. Here, we present an analysis method based on Bayesian inference that rigorously takes into account several experimental complications. We test the precision and accuracy of our analysis on controlled experimental data and verify that we can faithfully extract model parameters, both in the low-photon and low-fraction regimes. PMID:28060890
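As a rough illustration of inferring a mixture of exponentials from photon arrival times, the sketch below computes a grid posterior over the fraction of fluorophores undergoing FRET. It is not the authors' method: it assumes two known lifetimes, unbinned arrival times, a flat prior on the fraction, and none of the experimental complications the abstract mentions. The lifetimes and the simulated data are hypothetical.

```python
import math
import random


def log_likelihood(fraction, times, tau_fret, tau_free):
    """Log-likelihood of arrival times under a two-exponential mixture."""
    total = 0.0
    for t in times:
        density = (fraction / tau_fret) * math.exp(-t / tau_fret) \
            + ((1.0 - fraction) / tau_free) * math.exp(-t / tau_free)
        total += math.log(density)
    return total


def grid_posterior(times, tau_fret, tau_free, n_grid=101):
    """Posterior over the FRET fraction on a uniform grid, flat prior."""
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    logs = [log_likelihood(f, times, tau_fret, tau_free) for f in grid]
    peak = max(logs)  # subtract the peak before exponentiating, for stability
    weights = [math.exp(v - peak) for v in logs]
    norm = sum(weights)
    return grid, [w / norm for w in weights]


# Simulate hypothetical arrival times: 30% of photons from the short lifetime.
rng = random.Random(1)
tau_fret, tau_free = 1.0, 3.0
times = [
    rng.expovariate(1.0 / tau_fret) if rng.random() < 0.3
    else rng.expovariate(1.0 / tau_free)
    for _ in range(500)
]

grid, posterior = grid_posterior(times, tau_fret, tau_free)
best = grid[posterior.index(max(posterior))]
print(f"posterior mode of the FRET fraction: {best:.2f}")
```

The width of the resulting posterior grows as the photon count drops, which is one simple way to see why the low-photon regime is the hard case.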
Risk analysis of dust explosion scenarios using Bayesian networks.
Yuan, Zhi; Khakzad, Nima; Khan, Faisal; Amyotte, Paul
2015-02-01
In this study, a methodology has been proposed for risk analysis of dust explosion scenarios based on Bayesian network. Our methodology also benefits from a bow-tie diagram to better represent the logical relationships existing among contributing factors and consequences of dust explosions. In this study, the risks of dust explosion scenarios are evaluated, taking into account common cause failures and dependencies among root events and possible consequences. Using a diagnostic analysis, dust particle properties, oxygen concentration, and safety training of staff are identified as the most critical root events leading to dust explosions. The probability adaptation concept is also used for sequential updating and thus learning from past dust explosion accidents, which is of great importance in dynamic risk assessment and management. We also apply the proposed methodology to a case study to model dust explosion scenarios, to estimate the envisaged risks, and to identify the vulnerable parts of the system that need additional safety measures.
Bayesian imperfect information analysis for clinical recurrent data.
Chang, Chih-Kuang; Chang, Chi-Chang
2015-01-01
In medical research, clinical practice must often be undertaken with imperfect information from limited resources. This study applied Bayesian imperfect information-value analysis to a clinical decision-making problem for recurrent events, producing likelihood functions and posterior distributions for realistic situations. In this study, three kinds of failure models are considered, and our methods are illustrated with an analysis of imperfect information from a trial of immunotherapy in the treatment of chronic granulomatous disease. In addition, we present evidence toward a better understanding of the differing behaviors along with concomitant variables. Based on the results of simulations, the imperfect information value of the concomitant variables was evaluated and different realistic situations were compared to see which could yield more accurate results for medical decision-making.
Bayesian data analysis: estimating the efficacy of T'ai Chi as a case study.
Carpenter, Jacque; Gajewski, Byron; Teel, Cynthia; Aaronson, Lauren S
2008-01-01
Bayesian inference provides a formal framework for updating knowledge by combining prior knowledge with current data. Over the past 10 years, the Bayesian paradigm has become a popular analytic tool in health research. Although the nursing literature contains examples of Bayes' theorem applications to clinical decision making, it lacks an adequate introduction to Bayesian data analysis. Bayesian data analysis is introduced through a fully Bayesian model for determining the efficacy of tai chi as an illustrative example. The mechanics of using Bayesian models to combine prior knowledge, or data from previous studies, with observed data from a current study are discussed. The primary outcome in the illustrative example was physical function. Three prior probability distributions (priors) were generated for physical function using data from a similar study found in the literature. Each prior was combined with the likelihood from observed data in the current study to obtain a posterior probability distribution. In each case, the posterior distribution showed that the probability that the control group is better than the tai chi treatment group was low. Bayesian analysis is a valid technique that allows the researcher to manage varying amounts of data appropriately. As advancements in computer software continue, Bayesian techniques will become more accessible. Researchers must educate themselves on applications for Bayesian inference, as well as its methods and implications for future research.
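The prior-plus-likelihood update described in this abstract can be sketched with a conjugate normal model. This is only an illustration under stated assumptions: the three priors, the observed effect, and its standard error below are hypothetical numbers, not values from the tai chi study, and the function name `normal_posterior` is invented for this sketch.

```python
from statistics import NormalDist


def normal_posterior(prior_mean, prior_sd, data_mean, data_se):
    """Combine a normal prior with a normal likelihood (known standard error).

    Precisions (1 / variance) add, and the posterior mean is the
    precision-weighted average of the prior mean and the observed mean.
    """
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = 1.0 / data_se ** 2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * data_mean) / post_prec
    return post_mean, post_prec ** -0.5


# Hypothetical priors on the treatment-minus-control difference (mean, sd).
priors = {"vague": (0.0, 10.0), "moderate": (1.0, 2.0), "sceptical": (0.0, 1.0)}
data_mean, data_se = 2.0, 1.0  # hypothetical observed difference and its SE

for name, (m0, s0) in priors.items():
    m, s = normal_posterior(m0, s0, data_mean, data_se)
    # Posterior probability that the difference is below zero,
    # i.e. that the control group does better than the treatment group.
    p_control_better = NormalDist(m, s).cdf(0.0)
    print(f"{name}: mean={m:.3f} sd={s:.3f} P(control better)={p_control_better:.4f}")
```

Running the loop with several priors mirrors the sensitivity check in the abstract: if the probability stays low under each prior, the conclusion does not hinge on the prior chosen.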
A Bayesian Hierarchical Approach to Regional Frequency Analysis of Extremes
Renard, B.
2010-12-01
Rainfall and runoff frequency analysis is a major issue for the hydrological community. The distribution of hydrological extremes varies in space and possibly in time. Describing and understanding this spatiotemporal variability are primary challenges to improve hazard quantification and risk assessment. This presentation proposes a general approach based on a Bayesian hierarchical model, following previous work by Cooley et al. [2007], Micevski [2007], Aryal et al. [2009] or Lima and Lall [2009; 2010]. Such a hierarchical model is made up of two levels: (1) a data level modeling the distribution of observations, and (2) a process level describing the fluctuation of the distribution parameters in space and possibly in time. At the first level of the model, at-site data (e.g., annual maxima series) are modeled with a chosen distribution (e.g., a GEV distribution). Since data from several sites are considered, the joint distribution of a vector of (spatial) observations needs to be derived. This is challenging because data are in general not spatially independent, especially for nearby sites. An elliptical copula is therefore used to formally account for spatial dependence between at-site data. This choice might be questionable in the context of extreme value distributions. However, it is motivated by its applicability in spatial highly dimensional problems, where the joint pdf of a vector of n observations is required to derive the likelihood function (with n possibly amounting to hundreds of sites). At the second level of the model, parameters of the chosen at-site distribution are then modeled by a Gaussian spatial process, whose mean may depend on covariates (e.g. elevation, distance to sea, weather pattern, time). In particular, this spatial process allows estimating parameters at ungauged sites, and deriving the predictive distribution of rainfall/runoff at every pixel/catchment of the studied domain. An application to extreme rainfall series from the French
Phylogenetic analysis of diprotodontian marsupials based on complete mitochondrial genomes.
Munemasa, Maruo; Nikaido, Masato; Donnellan, Stephen; Austin, Christopher C; Okada, Norihiro; Hasegawa, Masami
2006-06-01
Australidelphia is the cohort, originally named by Szalay, of all Australian marsupials and the South American Dromiciops. Many mitochondrial and nuclear genome studies support the monophyly of Australidelphia, but some familial relationships within Australidelphia are still unclear. In particular, the familial relationships within the order Diprotodontia (koala, wombat, kangaroos and possums) are ambiguous. These Diprotodontian families are largely grouped into two suborders, Vombatiformes, which contains Phascolarctidae (koala) and Vombatidae (wombat), and Phalangerida, which contains Macropodidae, Potoroidae, Phalangeridae, Petauridae, Pseudocheiridae, Acrobatidae, Tarsipedidae and Burramyidae. Morphological evidence and some molecular analyses strongly support monophyly of the two families in Vombatiformes. The monophyly of Phalangerida as well as the phylogenetic relationships of families in Phalangerida remains uncertain, however, despite searches for morphological synapomorphy and mitochondrial DNA sequence analyses. Moreover, phylogenetic relationships among possum families (Phalangeridae, Petauridae, Pseudocheiridae, Acrobatidae, Tarsipedidae and Burramyidae) as well as a sister group of Macropodoidea (Macropodidae and Potoroidae) remain unclear. To evaluate familial relationships among Dromiciops and Australian marsupials as well as the familial relationships in Diprotodontia, we determined the complete mitochondrial sequence of six Diprotodontian species. We used Maximum Likelihood analyses with concatenated amino acid and codon sequences of 12 mitochondrial protein genomes. Our analysis of mitochondrial amino acid sequences supports monophyly of Australian marsupials+Dromiciops and monophyly of Phalangerida. The close relatedness between Macropodidae and Phalangeridae is also weakly supported by our analysis.
Molecular detection and phylogenetic analysis of bovine astrovirus in Brazil.
Candido, Marcelo; Alencar, Anna Luiza Farias; Almeida-Queiroz, Sabrina R; Buzinaro, Maria da Glória; Munin, Flavia Simone; de Godoy, Silvia Helena Seraphin; Livonesi, Marcia Cristina; Fernandes, Andrezza Maria; de Sousa, Ricardo Luiz Moro
2015-06-01
Bovine astrovirus (BoAstV) is associated with gastroenteric disorders such as diarrhea, particularly in neonates and immunocompromised animals. Its prevalence is >60 % in the first five weeks of the animal's life. The aim of this study was to detect and perform a phylogenetic analysis of BoAstV in Brazilian cattle. A prevalence of 14.3 % of BoAstV in fecal samples from 272 head of cattle from different Brazilian states was detected, and 11 samples were analyzed by nucleotide sequencing. The majority of positive samples were obtained from diarrheic animals (p < 0.01). Phylogenetic analysis revealed that Brazilian samples were grouped in clades along with other BoAstV isolates. There was 74.3 %-96.5 % amino acid sequence similarity between the samples in this study and >74.8 % when compared with reference samples for enteric BoAstV. Our results indicate, for the first time, the occurrence of BoAstV circulation in cattle from different regions of Brazil, predominantly in diarrheic calves.
BaTMAn: Bayesian Technique for Multi-image Analysis
Casado, J.; Ascasibar, Y.; García-Benito, R.; Guidi, G.; Choudhury, O. S.; Bellocchi, E.; Sánchez, S. F.; Díaz, A. I.
2017-04-01
This paper describes the Bayesian Technique for Multi-image Analysis (BaTMAn), a novel image-segmentation technique based on Bayesian statistics that characterizes any astronomical data set containing spatial information and performs a tessellation based on the measurements and errors provided as input. The algorithm iteratively merges spatial elements as long as they are statistically consistent with carrying the same information (i.e. identical signal within the errors). We illustrate its operation and performance with a set of test cases including both synthetic and real integral-field spectroscopic data. The output segmentations adapt to the underlying spatial structure, regardless of its morphology and/or the statistical properties of the noise. The quality of the recovered signal represents an improvement with respect to the input, especially in regions with low signal-to-noise ratio. However, the algorithm may be sensitive to small-scale random fluctuations, and its performance in the presence of spatial gradients is limited. Due to these effects, errors may be underestimated by as much as a factor of 2. Our analysis reveals that the algorithm prioritizes conservation of all the statistically significant information over noise reduction, and that the precise choice of the input data has a crucial impact on the results. Hence, the philosophy of BaTMAn is not to be used as a 'black box' to improve the signal-to-noise ratio, but as a new approach to characterize spatially resolved data prior to its analysis. The source code is publicly available at http://astro.ft.uam.es/SELGIFS/BaTMAn.
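The idea of merging neighbouring elements while they carry the same signal within the errors can be illustrated with a one-dimensional toy. This is not the published algorithm, which compares Bayesian posterior probabilities over tessellations of an image; the simple n-sigma consistency test, the inverse-variance averaging, and the name `merge_consistent` are all assumptions made for this sketch.

```python
import math


def merge_consistent(values, errors, n_sigma=1.0):
    """Greedily merge adjacent 1D elements whose values agree within errors.

    Each segment is stored as (value, error, number_of_elements).
    Two neighbours merge when |v1 - v2| <= n_sigma * sqrt(e1^2 + e2^2);
    the merged value is the inverse-variance weighted mean.
    """
    segs = [(v, e, 1) for v, e in zip(values, errors)]
    merged = True
    while merged:
        merged = False
        for i in range(len(segs) - 1):
            (v1, e1, n1), (v2, e2, n2) = segs[i], segs[i + 1]
            if abs(v1 - v2) <= n_sigma * math.hypot(e1, e2):
                w1, w2 = 1.0 / e1 ** 2, 1.0 / e2 ** 2
                value = (w1 * v1 + w2 * v2) / (w1 + w2)
                error = (w1 + w2) ** -0.5
                segs[i:i + 2] = [(value, error, n1 + n2)]
                merged = True
                break  # restart the scan after every merge
    return segs


# Hypothetical measurements: two flat regions with equal errors.
segments = merge_consistent([1.0, 1.1, 5.0, 5.2], [0.2, 0.2, 0.2, 0.2])
for value, error, count in segments:
    print(f"value={value:.3f} error={error:.3f} elements={count}")
```

Because merged errors shrink, a smooth gradient can be cut into steps by a scheme like this, which is in line with the limitation on spatial gradients noted in the abstract.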
Three case studies in the Bayesian analysis of cognitive models.
Lee, Michael D
2008-02-01
Bayesian statistical inference offers a principled and comprehensive approach for relating psychological models to data. This article presents Bayesian analyses of three influential psychological models: multidimensional scaling models of stimulus representation, the generalized context model of category learning, and a signal detection theory model of decision making. In each case, the model is recast as a probabilistic graphical model and is evaluated in relation to a previously considered data set. In each case, it is shown that Bayesian inference is able to provide answers to important theoretical and empirical questions easily and coherently. The generality of the Bayesian approach and its potential for the understanding of models and data in psychology are discussed.
Bayesian Library for the Analysis of Neutron Diffraction Data
Ratcliff, William; Lesniewski, Joseph; Quintana, Dylan
During this talk, I will introduce the Bayesian Library for the Analysis of Neutron Diffraction Data. In this library we make use of the DREAM algorithm to effectively sample parameter space. This offers several advantages over traditional least squares fitting approaches. It gives us more robust estimates of the fitting parameters, their errors, and their correlations. It is also more stable than least squares methods and provides more confidence in finding a global minimum. I will discuss the algorithm and its application to several materials. I will show applications to both structural and magnetic diffraction patterns. I will present examples of fitting both powder and single crystal data. We would like to acknowledge support from the Department of Commerce and the NSF.
Testing Hardy-Weinberg equilibrium: an objective Bayesian analysis.
Consonni, Guido; Moreno, Elías; Venturini, Sergio
2011-01-15
We analyze the general (multiallelic) Hardy-Weinberg equilibrium problem from an objective Bayesian testing standpoint. We argue that for small or moderate sample sizes the answer is rather sensitive to the prior chosen, which suggests carrying out a sensitivity analysis with respect to the prior. This goal is achieved through the identification of a class of priors specifically designed for this testing problem. In this paper, we consider the class of intrinsic priors under the full model, indexed by a tuning quantity, the training sample size. These priors are objective, satisfy Savage's continuity condition and have proved to behave extremely well for many statistical testing problems. We compute the posterior probability of the Hardy-Weinberg equilibrium model for the class of intrinsic priors, and assess robustness over the range of plausible answers as well as stability of the decision in favor of either hypothesis. Copyright © 2010 John Wiley & Sons, Ltd.
Objective Bayesian Comparison of Constrained Analysis of Variance Models.
Consonni, Guido; Paroli, Roberta
2016-10-04
In the social sciences we are often interested in comparing models specified by parametric equality or inequality constraints. For instance, when examining three group means [Formula: see text] through an analysis of variance (ANOVA), a model may specify that [Formula: see text], while another one may state that [Formula: see text], and finally a third model may instead suggest that all means are unrestricted. This is a challenging problem, because it involves a combination of nonnested models, as well as nested models having the same dimension. We adopt an objective Bayesian approach, requiring no prior specification from the user, and derive the posterior probability of each model under consideration. Our method is based on the intrinsic prior methodology, suitably modified to accommodate equality and inequality constraints. Focussing on normal ANOVA models, a comparative assessment is carried out through simulation studies. We also present an application to real data collected in a psychological experiment.
Bayesian analysis of factors associated with fibromyalgia syndrome subjects
Jayawardana, Veroni; Mondal, Sumona; Russek, Leslie
2015-01-01
Factors contributing to movement-related fear were assessed by Russek et al. (2014) for subjects with fibromyalgia (FM), based on data collected in a national internet survey of community-based individuals. The study focused on the following variables: Activities-Specific Balance Confidence scale (ABC), Primary Care Post-Traumatic Stress Disorder screen (PC-PTSD), Tampa Scale of Kinesiophobia (TSK), a Joint Hypermobility Syndrome screen (JHS), Vertigo Symptom Scale (VSS-SF), Obsessive-Compulsive Personality Disorder (OCPD), and pain, work status and physical activity drawn from the "Revised Fibromyalgia Impact Questionnaire" (FIQR). The study presented in this paper revisits the same data with a Bayesian analysis in which appropriate priors were introduced for the variables selected in the paper by Russek et al.
Bayesian Analysis of Peak Ground Acceleration Attenuation Relationship
Mu Heqing; Yuen Kaveng
2010-05-21
Estimation of peak ground acceleration is one of the main issues in civil and earthquake engineering practice. The Boore-Joyner-Fumal empirical formula is well known for this purpose. In this paper we propose to use the Bayesian probabilistic model class selection approach to obtain the most suitable prediction model class for the seismic attenuation formula. The optimal model class is robust in the sense that it strikes a balance between data-fitting capability and sensitivity to noise. A database of strong-motion records is utilized for the analysis. It turns out that the optimal model class is simpler than the full-order attenuation model suggested by Boore, Joyner and Fumal (1993).
Bayesian analysis of galaxy SEDs from FUV to FIR
Noll, S.; Burgarella, D.; Marcillac, D.; Giovannoli, E.; Buat, V.
2008-11-01
Photometric data of galaxies ranging from rest-frame far-UV to far-IR allow galaxy properties to be derived in a robust way by fitting the attenuated stellar emission and the related dust emission at the same time. For this purpose we have written a code which uses model spectra composed of the Maraston stellar population models, synthetic attenuation functions based on a modified Calzetti law, spectral line templates, and the Dale & Helou dust emission models. Depending on the input redshifts, filter fluxes are computed for the model set and compared to the galaxy photometry by carrying out a Bayesian analysis. The code is tested by analysing a subset of the SINGS sample of nearby galaxies. We illustrate the quality of the results by comparing them to literature data and discuss the importance of IR data for the reliability of the fitting.
BASE-9: Bayesian Analysis for Stellar Evolution with nine variables
Robinson, Elliot; von Hippel, Ted; Stein, Nathan; Stenning, David; Wagner-Kaiser, Rachel; Si, Shijing; van Dyk, David
2016-08-01
The BASE-9 (Bayesian Analysis for Stellar Evolution with nine variables) software suite recovers star cluster and stellar parameters from photometry and is useful for analyzing single-age, single-metallicity star clusters, binaries, or single stars, and for simulating such systems. BASE-9 uses a Markov chain Monte Carlo (MCMC) technique along with brute force numerical integration to estimate the posterior probability distribution for the age, metallicity, helium abundance, distance modulus, line-of-sight absorption, and parameters of the initial-final mass relation (IFMR) for a cluster, and for the primary mass, secondary mass (if a binary), and cluster probability for every potential cluster member. The MCMC technique is used for the cluster quantities (the first six items listed above) and numerical integration is used for the stellar quantities (the last three items in the above list).
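The MCMC step mentioned in this abstract can be illustrated with a generic random-walk Metropolis sampler. This is a minimal sketch and not BASE-9's implementation: the one-dimensional Gaussian target stands in for a real cluster posterior, and the function name, step size, chain length, and burn-in are assumptions chosen for the example.

```python
import math
import random


def metropolis(log_post, start, n_steps, step_sd, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional log posterior."""
    rng = random.Random(seed)
    x, lp = start, log_post(start)
    chain = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_sd)
        lp_prop = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if lp_prop >= lp or rng.random() < math.exp(lp_prop - lp):
            x, lp = proposal, lp_prop
        chain.append(x)
    return chain


def toy_log_post(theta):
    """Toy target: unnormalised Gaussian log density, mean 3.0 and sd 0.5."""
    return -0.5 * ((theta - 3.0) / 0.5) ** 2


chain = metropolis(toy_log_post, start=0.0, n_steps=20000, step_sd=0.5)
kept = chain[2000:]  # discard burn-in samples
post_mean = sum(kept) / len(kept)
print(f"posterior mean estimate: {post_mean:.3f}")
```

In a cluster analysis the scalar `theta` would be replaced by the vector of cluster quantities listed in the abstract, with the stellar quantities handled by numerical integration inside the likelihood.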
Bayesian analysis of clustered interval-censored data.
Wong, M C M; Lam, K F; Lo, E C M
2005-09-01
The recording of multiple interval-censored failure times is common in dental research. Modeling multilevel data has been a difficult task. This paper aims to use the Bayesian approach to analyze a set of multilevel clustered interval-censored data from a clinical study to investigate the effectiveness of silver diamine fluoride and sodium fluoride varnish in arresting active dentin caries in Chinese pre-school children. The time to arrest dentin caries on a surface was measured. A three-level random-effects Weibull regression model was used. Analysis was performed with WinBUGS. Results revealed a strong positive correlation (0.596) among the caries lesions' arrest times on different surfaces from the same child. The software WinBUGS made the above complicated estimation simple. In conclusion, the annual application of silver diamine fluoride on caries lesions, and caries removal before the application, were found to shorten the arrest time.
Bayesian Spectral Analysis of Chorus Sub-Elements
Crabtree, C. E.; Tejero, E. M.; Ganguli, G.; Hospodarsky, G. B.; Kletzing, C.
2016-12-01
We develop a Bayesian spectral analysis technique that calculates the probability distribution functions of a superposition of wave-modes each described by a linear growth rate, a frequency and a chirp rate. The Bayesian framework has a number of advantages, including 1) reducing the parameter space by integrating over the amplitude and phase of the wave, 2) incorporating the data from each channel to determine the model parameters such as frequency which leads to high resolution results in frequency and time, 3) the ability to consider the superposition of waves where the wave-parameters are closely spaced, 4) the ability to directly calculate the expectation value of wave parameters without resorting to ensemble averages, 5) the ability to calculate error bars on model parameters. We examine one rising-tone chorus element in detail from a disturbed time on November 14, 2012 using burst mode waveform data of the three components of the electric and magnetic field from the EMFISIS instrument on board NASA's Van Allen Probes. The results of the analysis demonstrate that whistler mode chorus sub-elements are composed of almost linear waves that are nearly parallel propagating with continuously changing wave parameters such as frequency and wave-vector. The change of wave-vector as a function of time is a three-dimensional phenomenon suggesting that 2D simulations may not accurately represent chorus. The initial parts of the sub-elements are in good agreement with the analytical theory of Omura et al. 2008. However, between sub-elements the wave parameters of the dominant mode undergo discrete changes in frequency and wave-vector. Near the boundary of sub-elements multiple waves are observed such that the evolution of the waves is reminiscent of wave-wave processes such as parametric decay or induced scattering by particles. These nonlinear processes are signatures of weak turbulence and may affect the saturation of the whistler-mode chorus instability.
Discrete Dynamic Bayesian Network Analysis of fMRI Data
Burge, John; Lane, Terran; Link, Hamilton; Qiu, Shibin; Clark, Vincent P.
2010-01-01
We examine the efficacy of using discrete Dynamic Bayesian Networks (dDBNs), a data-driven modeling technique employed in machine learning, to identify functional correlations among neuroanatomical regions of interest. Unlike many neuroimaging analysis techniques, this method is not limited by linear and/or Gaussian noise assumptions. It achieves this by modeling the time series of neuroanatomical regions as discrete, as opposed to continuous, random variables with multinomial distributions. We demonstrate this method using an fMRI dataset collected from healthy and demented elderly subjects and identify correlates based on a diagnosis of dementia. The results are validated in three ways. First, the elicited correlates are shown to be robust over leave-one-out cross-validation and, via a Fourier bootstrapping method, unlikely to be due to random chance. Second, the dDBNs identified correlates that would be expected given the experimental paradigm. Third, the dDBN's ability to predict dementia is competitive with two commonly employed machine-learning classifiers: the support vector machine and the Gaussian naïve Bayesian network. We also verify that the dDBN selects correlates based on non-linear criteria. Finally, we provide a brief analysis of the correlates elicited from Buckner et al.'s data that suggests that demented elderly subjects have reduced involvement of entorhinal and occipital cortex and greater involvement of the parietal lobe and amygdala in brain activity compared with healthy elderly (as measured via functional correlations among BOLD measurements). Limitations and extensions to the dDBN method are discussed. PMID:17990301
A Bayesian Seismic Hazard Analysis for the city of Naples
NASA Astrophysics Data System (ADS)
Faenza, Licia; Pierdominici, Simona; Hainzl, Sebastian; Cinti, Francesca R.; Sandri, Laura; Selva, Jacopo; Tonini, Roberto; Perfetti, Paolo
2016-04-01
In recent years many studies have focused on determining and defining the seismic, volcanic and tsunamigenic hazard in the city of Naples. The reason is that the town of Naples, with its neighboring area, is one of the most densely populated places in Italy. In addition, the risk is increased by the type and condition of buildings and monuments in the city. It is therefore crucial to assess which active faults in Naples and the surrounding area could trigger an earthquake able to shake and damage the urban area. We collect data from the most reliable and complete databases of macroseismic intensity records (from 79 AD to present). Each seismic event has been associated with an active tectonic structure. Furthermore, a set of active faults around the study area that are well known from geological investigations and could shake the city, but are not associated with any earthquake, has been taken into account in our study. This geological framework is the starting point for our Bayesian seismic hazard analysis for the city of Naples. We show the feasibility of formulating the hazard assessment procedure to include the information of past earthquakes in the probabilistic seismic hazard analysis. This strategy makes it possible, on the one hand, to enlarge the information used in the evaluation of the hazard, from alternative models for the earthquake generation process to past shaking, and, on the other hand, to explicitly account for all kinds of information and their uncertainties. The Bayesian scheme we propose is applied to evaluate the seismic hazard of Naples. We implement five different spatio-temporal models to parameterize the occurrence of earthquakes potentially dangerous for Naples. Subsequently we combine these hazard curves with ShakeMaps of past earthquakes that have been felt in Naples. The results are posterior hazard assessments for three exposure times (50, 10 and 5 years) on a dense grid that covers the municipality of Naples, considering bedrock soil
Phylogenetic analysis and characterization of Korean bovine viral diarrhea viruses.
Oem, Jae-Ku; Hyun, Bang-Hun; Cha, Sang-Ho; Lee, Kyoung-Ki; Kim, Seong-Hee; Kim, Hye-Ryoung; Park, Choi-Kyu; Joo, Yi-Seok
2009-11-18
Thirty-six bovine viral diarrhea viruses (BVDVs) were identified in bovine feces (n=16), brains (n=2), and aborted fetuses (n=18) in Korea. To reveal the genetic diversity and characteristics of these Korean strains, the sequences of their 5'-untranslated regions (5'-UTRs) were determined and then compared with published reference sequences. Neighbor-joining phylogenetic analysis revealed that most of the Korean viruses were of the BVDV subtypes 1a (n=17) or 2a (n=17). The remaining strains were of subtypes 1b (n=1) and 1n (n=1). This analysis indicates that the 1a and 2a BVDV subtypes are predominant and widespread in Korea. In addition, the prevalence of BVDV-2 was markedly higher in aborted fetuses than in other samples and was more often associated with reproductive problems and significant mortality in cattle.
Diagnosis and phylogenetic analysis of ovine pulmonary adenocarcinoma in China.
Zhang, Keshan; Kong, Hanjin; Liu, Yongjie; Shang, Youjun; Wu, Bin; Liu, Xiangtao
2014-02-01
Ovine pulmonary adenocarcinoma (OPA) is a lung tumor of sheep caused by jaagsiekte sheep retrovirus (JSRV). OPA is common in sheep, and it is most commonly observed in China. Without preventative vaccines and serological diagnostic tools for assay of OPA, identification of JSRV based on reverse transcription polymerase chain reaction (RT-PCR) is very important for prevention and control measures for OPA in practice management. In this study, the diagnosis of OPA was made from analysis of clinical signs, pathological observations, JSRV-like particle discovery, and RT-PCR of the target env gene. The phylogenetic analysis showed that the China Shandong (SD) strain studied in this article belonged to exogenous JSRV, and it was very similar to 92k3, which was isolated from sheep in Kenya (Y18305). The current study reported a severe outbreak of OPA in Shandong Province, China. The observations could offer a comparative view of the env gene of JSRV.
A phylogenetic transform enhances analysis of compositional microbiota data.
Silverman, Justin D; Washburne, Alex D; Mukherjee, Sayan; David, Lawrence A
2017-02-15
Surveys of microbial communities (microbiota), typically measured as relative abundance of species, have illustrated the importance of these communities in human health and disease. Yet, statistical artifacts commonly plague the analysis of relative abundance data. Here, we introduce the PhILR transform, which incorporates microbial evolutionary models with the isometric log-ratio transform to allow off-the-shelf statistical tools to be safely applied to microbiota surveys. We demonstrate that analyses of community-level structure can be applied to PhILR transformed data with performance on benchmarks rivaling or surpassing standard tools. Additionally, by decomposing distance in the PhILR transformed space, we identified neighboring clades that may have adapted to distinct human body sites. Decomposing variance revealed that covariation of bacterial clades within human body sites increases with phylogenetic relatedness. Together, these findings illustrate how the PhILR transform combines statistical and phylogenetic models to overcome compositional data challenges and enable evolutionary insights relevant to microbial communities.
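The isometric log-ratio (ILR) balance at a single node of a tree can be sketched as follows. This is a minimal illustration of the unweighted balance formula, not the PhILR package itself: the four-taxon tree ((A,B),(C,D)), the abundance values, and the function names are hypothetical, and zero counts are assumed to have been replaced beforehand (for example with a pseudocount), because the logarithm is undefined at zero.

```python
import math


def geometric_mean(xs):
    """Geometric mean of strictly positive numbers."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def ilr_balance(numerator, denominator):
    """ILR balance between two groups of parts of a composition.

    For groups of sizes r and s the balance is
    sqrt(r * s / (r + s)) * ln(gm(numerator) / gm(denominator)),
    where gm is the geometric mean.
    """
    r, s = len(numerator), len(denominator)
    ratio = geometric_mean(numerator) / geometric_mean(denominator)
    return math.sqrt(r * s / (r + s)) * math.log(ratio)


# Hypothetical relative abundances for taxa on the tree ((A,B),(C,D)).
abund = {"A": 0.10, "B": 0.30, "C": 0.20, "D": 0.40}

# One balance per internal node: each compares the two descendant clades.
balances = {
    "root (A,B) vs (C,D)": ilr_balance([abund["A"], abund["B"]],
                                       [abund["C"], abund["D"]]),
    "node A vs B": ilr_balance([abund["A"]], [abund["B"]]),
    "node C vs D": ilr_balance([abund["C"]], [abund["D"]]),
}
for node, value in balances.items():
    print(f"{node}: {value:.4f}")
```

The four relative abundances map to three unconstrained coordinates, one per internal node, and the result does not change if all abundances are rescaled by a common factor, which is what lets standard statistical tools be applied to the transformed data.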
Phylogenetic analysis of cichlid fishes using nuclear DNA markers.
Sültmann, H; Mayer, W E; Figueroa, F; Tichy, H; Klein, J
1995-11-01
The recent explosive adaptive radiation of cichlids in the great lakes of Africa has attracted the attention of both morphologists and molecular biologists. Deciphering the phylogenetic relationships among the various taxa within the family Cichlidae is a prerequisite for answering some fundamental questions about the nature of the speciation process. In the present study, we used the random amplification of polymorphic DNA (RAPD) technique to obtain sequence differences between selected cichlid species. We then designed specific primers based on these sequences and used them to amplify template DNA from a large number of species by the polymerase chain reaction (PCR). We sequenced the amplified products and searched the sequences for indels and shared substitutions. We identified a number of such characters at three loci--DXTU1, DXTU2, and DXTU3--and used them for phylogenetic and cladistic analysis of the relationships among the various cichlid groups. Our studies assign an outgroup position to Neotropical cichlids in relation to African cichlids, provide evidence for a sister-group relationship of tilapiines to the haplochromines, group Cyphotilapia frontosa with the lamprologines of Lake Tanganyika, place Astatoreochromis alluaudi in an outgroup position with respect to other haplochromines of Lakes Victoria and Malawi, and provide additional support for the monophyly of the remaining Lake Victoria haplochromines and the Lake Malawi haplochromines. The described approach holds great promise for further resolution of cichlid phylogeny.
Networks in phylogenetic analysis: new tools for population biology.
Morrison, David A
2005-04-30
Phylogenetic analysis has changed greatly in the past decade, including the more widespread appreciation of the idea that evolutionary histories are not always tree-like, and may, thus, be best represented as reticulated networks rather than as strictly dichotomous trees. Reconstructing such histories in the absence of a bifurcating speciation process is even more difficult than the usual procedure, and a range of alternative strategies have been developed. There seem to be two basic uses for a network model of evolution: the display of real but unobservable evolutionary events (i.e. a hypothesis of the true phylogenetic history), and the display of character conflict within the data itself (i.e. a summary of the data). These two general approaches are briefly reviewed here, and the strengths and weaknesses of the different implementations are compared and contrasted. Each network methodology seems to have limitations in terms of how it responds to increasing complexity (e.g. conflict) in the data, and therefore each is likely to be more appropriate for one of the two uses than for the other. Several examples using parasitological data sets illustrate the uses of networks within the context of population biology.
Phylogenetic analysis of the Argonaute protein family in platyhelminths.
Zheng, Yadong
2013-03-01
Argonaute proteins (AGOs) are mediators of gene silencing via recruitment of small regulatory RNAs to induce translational repression or degradation of targeted molecules. Platyhelminths have been reported to express microRNAs but the diversity of AGOs in the phylum has not been explored. Phylogenetic relationships of members of this protein family were studied using data from six platyhelminth genomes. Phylogenetic analysis showed that all cestode and trematode AGOs, along with some triclad planarian AGOs, were grouped into the Ago subfamily and its novel sister clade, here referred to as Cluster 1. These were very distant from Piwi and Class 3 subfamilies. By contrast, a number of planarian Piwi-like AGOs formed a novel sister clade to the Piwi subfamily. Extensive sequence searching revealed the presence of an additional locus for AGO2 in the cestode Echinococcus granulosus and exon expansion in this species and E. multilocularis. The current study suggests the absence of the Piwi subfamily and Class 3 AGOs in cestodes and trematodes, the Piwi-like AGO expansion in a free-living triclad planarian, and the occurrence of exon expansion prior to or during the evolution of the most-recent common ancestor of the Echinococcus species studied. Copyright © 2012 Elsevier Inc. All rights reserved.
A phylogenetic transform enhances analysis of compositional microbiota data
Silverman, Justin D; Washburne, Alex D; Mukherjee, Sayan; David, Lawrence A
2017-01-01
Surveys of microbial communities (microbiota), typically measured as relative abundance of species, have illustrated the importance of these communities in human health and disease. Yet, statistical artifacts commonly plague the analysis of relative abundance data. Here, we introduce the PhILR transform, which incorporates microbial evolutionary models with the isometric log-ratio transform to allow off-the-shelf statistical tools to be safely applied to microbiota surveys. We demonstrate that analyses of community-level structure can be applied to PhILR transformed data with performance on benchmarks rivaling or surpassing standard tools. Additionally, by decomposing distance in the PhILR transformed space, we identified neighboring clades that may have adapted to distinct human body sites. Decomposing variance revealed that covariation of bacterial clades within human body sites increases with phylogenetic relatedness. Together, these findings illustrate how the PhILR transform combines statistical and phylogenetic models to overcome compositional data challenges and enable evolutionary insights relevant to microbial communities. DOI: http://dx.doi.org/10.7554/eLife.21887.001 PMID:28198697
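The isometric log-ratio idea behind the abstract above can be illustrated with a single "balance" at one internal node of a tree: the scaled log-ratio of the geometric means of the abundances on either side of the node. The following is a minimal, unweighted sketch, not the PhILR implementation; the function names and the toy abundances are hypothetical, and zero abundances are assumed to have been replaced beforehand (for example by a pseudocount).

```python
import math

def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def ilr_balance(left, right):
    """Unweighted ILR balance between two groups of taxa.

    `left` and `right` hold the (relative) abundances of the taxa that
    descend from the two children of one internal node.
    """
    r, s = len(left), len(right)
    scale = math.sqrt(r * s / (r + s))
    return scale * math.log(geometric_mean(left) / geometric_mean(right))

# Hypothetical abundances for four taxa, two under each child of a node.
balance_from_proportions = ilr_balance([0.4, 0.1], [0.3, 0.2])
# The same sample expressed as raw counts gives the same balance,
# because only ratios between parts enter the calculation.
balance_from_counts = ilr_balance([40, 10], [30, 20])
```

Because a balance depends only on ratios, rescaling a sample (counts versus proportions) leaves it unchanged, which is the property that makes log-ratio coordinates suitable for relative abundance data.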
Priya, R; Siva, Ramamoorthy
2015-07-01
During different environmental stress conditions, plant growth is regulated by the hormone abscisic acid (an apocarotenoid). In the biosynthesis of abscisic acid, the oxidative cleavage of cis-epoxycarotenoid catalyzed by 9-cis-epoxycarotenoid dioxygenase (NCED) is the crucial step. The NCED genes were isolated in numerous plant species and those genes were phylogenetically investigated to understand the evolution of NCED genes in various plant lineages comprising lycophyte, gymnosperm, dicot and monocot. A total of 93 genes were obtained from 48 plant species to statistically estimate their sequence conservation and functional divergence. The NCED genes of Selaginella moellendorffii appeared to be evolutionarily distinct from those of the angiosperms, suggesting a substantial influence of natural selection pressure on NCED genes. Further, using exon-intron structure analysis, the gene structures of NCED were found to be conserved across some species. In addition, the ratio of non-synonymous (Ka) to synonymous (Ks) substitution rates, estimated using a Bayesian inference approach, indicated the critical amino acid residues for functional divergence. A significant functional divergence was found between some subgroups through the coefficient of type-I functional divergence. Our results suggest that the evolution of NCED genes occurred by duplication, diversification and exon-intron loss events. The site-specific profile and functional divergence analysis revealed that NCED genes might facilitate tissue-specific functional divergence in NCED sub-families, which could help combat different environmental stress conditions and aid plant survival.
Chao, Q J; Li, Y D; Geng, X X; Zhang, L; Dai, X; Zhang, X; Li, J; Zhang, H J
2014-04-14
This is the first report of a complete mitochondrial genome sequence from Himalayan marmot (Marmota himalayana, genus Marmota). We determined the M. himalayana mitochondrial (mt) genome sequence by using long-PCR methods and a primer-walking sequencing strategy with genus-specific primers. The complete mt genome of M. himalayana was 16,443 bp in length and comprised 13 protein-coding genes, 2 ribosomal RNA (rRNA) genes, 22 transfer RNA (tRNA) genes, and a typical control region (CR). Gene order and orientation were identical to those in mt genomes of most vertebrates. The heavy strand showed an overall A+T content of 63.49%. AT and GC skews for the mt genome of M. himalayana were 0.012 and -0.300, respectively, indicating a nucleotide bias against T and G. The control region was 997 bp in size and displayed some unusual features, including absence of repeated motifs and two conserved sequence blocks (CSB2 and CSB3), which is consistent with observations from two other rodent species, Sciurus vulgaris and Myoxus glis. Phylogenetic analysis of complete mt DNA sequences without the control region including 30 taxa of Rodentia was performed with Maximum-Likelihood (ML) and Bayesian Inference (BI) methods and provided strong support for Sciurognathi polyphyly and Hystricognathi monophyly. This analysis also provided evidence that M. himalayana mt DNA was closely related to that from Sciurus vulgaris (Sciuridae) and was similar to mt DNA from Myoxus glis.
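The AT and GC skews reported in the abstract above are conventionally defined as (A − T)/(A + T) and (G − C)/(G + C) on one strand. A minimal sketch of that calculation follows; the function name and the toy sequence are hypothetical and are not data from the study.

```python
from collections import Counter

def nucleotide_skews(sequence):
    """Return (AT skew, GC skew) for a DNA sequence on a single strand.

    AT skew = (A - T) / (A + T); GC skew = (G - C) / (G + C).
    Characters other than A, C, G, T (e.g. N) are ignored.
    """
    counts = Counter(sequence.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    at_skew = (a - t) / (a + t)
    gc_skew = (g - c) / (g + c)
    return at_skew, gc_skew

# Hypothetical toy sequence: 4 A, 3 T, 1 G, 3 C.
toy_sequence = "AAAATTTGCCC"
at_skew, gc_skew = nucleotide_skews(toy_sequence)
```

A positive AT skew means A outnumbers T, and a negative GC skew means C outnumbers G, which matches the abstract's reading of 0.012 and -0.300 as a bias against T and G.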
Chung, Gregory K. W. K.; Dionne, Gary B.; Kaiser, William J.
2006-01-01
Our research question was whether we could develop a feasible technique, using Bayesian networks, to diagnose gaps in student knowledge. Thirty-four college-age participants completed tasks designed to measure conceptual knowledge, procedural knowledge, and problem-solving skills related to circuit analysis. A Bayesian network was used to model…
Li, Xiaoxu; Liu, Cheng; Li, Wei; Zhang, Zenglin; Gao, Xiaoming; Zhou, Hui; Guo, Yongfeng
2016-05-01
Members of the plant-specific WOX transcription factor family have been reported to play important roles in cell to cell communication as well as other physiological and developmental processes. In this study, ten members of the WOX transcription factor family were identified in Solanum lycopersicum with HMMER. Neighbor-joining phylogenetic tree, maximum-likelihood tree and Bayesian-inference tree were constructed and similar topologies were shown using the protein sequences of the homeodomain. Phylogenetic study revealed that the 25 WOX family members from Arabidopsis and tomato fall into three clades and nine subfamilies. The patterns of exon-intron structures and organization of conserved domains in Arabidopsis and tomato were consistent based on the phylogenetic results. Transcriptome analysis showed that the expression patterns of SlWOXs were different in different tissue types. Gene Ontology (GO) analysis suggested that, as transcription factors, the SlWOX family members could be involved in a number of biological processes including cell to cell communication and tissue development. Our results are useful for future studies on WOX family members in tomato and other plant species.
Evans, Jason; Sullivan, Jack
2011-01-01
A priori selection of models for use in phylogeny estimation from molecular sequence data is increasingly important as the number and complexity of available models increases. The Bayesian information criterion (BIC) and the derivative decision-theoretic (DT) approaches rely on a conservative approximation to estimate the posterior probability of a given model. Here, we extended the DT method by using reversible jump Markov chain Monte Carlo approaches to directly estimate model probabilities for an extended candidate pool of all 406 special cases of the general time reversible + Γ family. We analyzed 250 diverse data sets in order to evaluate the effectiveness of the BIC approximation for model selection under the BIC and DT approaches. Model choice under DT differed between the BIC approximation and direct estimation methods for 45% of the data sets (113/250), and differing model choice resulted in significantly different sets of trees in the posterior distributions for 26% of the data sets (64/250). The model with the lowest BIC score differed from the model with the highest posterior probability in 30% of the data sets (76/250). When the data indicate a clear model preference, the BIC approximation works well enough to result in the same model selection as with directly estimated model probabilities, but a substantial proportion of biological data sets lack this characteristic, which leads to selection of underparametrized models.
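For orientation, the BIC approximation discussed in the abstract above can be sketched as follows: BIC = −2 ln L + k ln n, and, assuming equal prior model probabilities, approximate posterior model probabilities are proportional to exp(−BIC/2). The log-likelihoods, parameter counts (substitution-model parameters only, ignoring branch lengths) and alignment length below are hypothetical illustration values, not results from the study.

```python
import math

def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: -2 ln L + k ln n."""
    return -2.0 * log_likelihood + n_params * math.log(n_sites)

def approx_model_probabilities(bic_scores):
    """Approximate posterior model probabilities from BIC scores.

    Assumes equal prior probability for every model. The best (lowest)
    score is subtracted first so the exponentials stay well scaled.
    """
    best = min(bic_scores.values())
    weights = {m: math.exp(-0.5 * (s - best)) for m, s in bic_scores.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}

# Hypothetical (log-likelihood, free substitution parameters) per model.
N_SITES = 1000
candidates = {
    "JC69": (-5230.0, 0),
    "HKY85": (-5105.0, 4),
    "GTR+G": (-5098.0, 9),
}
scores = {m: bic(lnl, k, N_SITES) for m, (lnl, k) in candidates.items()}
probs = approx_model_probabilities(scores)
best_model = min(scores, key=scores.get)
```

In this toy case the richest model has the highest likelihood but not the lowest BIC, because the k ln n penalty outweighs its small likelihood gain.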
RFLP analysis of mtDNA from six platyrrhine genera: phylogenetic inferences.
Ruiz-García, M; Alvarez, D
2003-01-01
This study investigates the phylogenetic relationships of 10 species of platyrrhine primates using RFLP analysis of mtDNA. Three restriction enzymes were used to determine the restriction site haplotypes for a total of 276 individuals. Phylogenetic analysis using maximum parsimony was employed to construct phylogenetic trees. We found close phylogenetic relationships between Alouatta, Lagothrix and Ateles. We also found a close relationship between Cebus and Aotus, with Saimiri clustering with the atelines. Haplotype diversity was found in four of the species studied, in Cebus albifrons, Saimiri sciureus, Lagothrix lagotricha and Ateles fusciceps. These data provide additional information concerning the phylogenetic relationships between these platyrrhine genera and species.
Fresia, Pablo; Azeredo-Espin, Ana Maria L; Lyra, Mariana L
2013-01-01
Insect pest phylogeography might be shaped both by biogeographic events and by human influence. Here, we conducted an approximate Bayesian computation (ABC) analysis to investigate the phylogeography of the New World screwworm fly, Cochliomyia hominivorax, with the aim of understanding its population history and its order and time of divergence. Our ABC analysis supports that populations spread from North to South in the Americas, in at least two different moments. The first split occurred between the North/Central American and South American populations in the end of the Last Glacial Maximum (15,300-19,000 YBP). The second split occurred between the North and South Amazonian populations in the transition between the Pleistocene and the Holocene eras (9,100-11,000 YBP). The species also experienced population expansion. Phylogenetic analysis likewise suggests this north to south colonization and Maxent models suggest an increase in the number of suitable areas in South America from the past to present. We found that the phylogeographic patterns observed in C. hominivorax cannot be explained only by climatic oscillations and can be connected to host population histories. Interestingly we found these patterns are very coincident with general patterns of ancient human movements in the Americas, suggesting that humans might have played a crucial role in shaping the distribution and population structure of this insect pest. This work presents the first hypothesis test regarding the processes that shaped the current phylogeographic structure of C. hominivorax and represents an alternate perspective on investigating the problem of insect pests. PMID:24098436
A Bayesian Analysis of Finite Mixtures in the LISREL Model.
Zhu, Hong-Tu; Lee, Sik-Yum
2001-01-01
Proposes a Bayesian framework for estimating finite mixtures of the LISREL model. The model augments the observed data of the manifest variables with the latent variables and allocation variables and uses the Gibbs sampler to obtain the Bayesian solution. Discusses other associated statistical inferences. (SLD)
Multivariate meta-analysis of mixed outcomes: a Bayesian approach
Bujkiewicz, Sylwia; Thompson, John R; Sutton, Alex J; Cooper, Nicola J; Harrison, Mark J; Symmons, Deborah PM; Abrams, Keith R
2013-01-01
Multivariate random effects meta-analysis (MRMA) is an appropriate way of synthesizing data from studies reporting multiple correlated outcomes. In a Bayesian framework, it has great potential for integrating evidence from a variety of sources. In this paper, we propose a Bayesian model for MRMA of mixed outcomes, which extends previously developed bivariate models to the trivariate case and also allows for combination of multiple outcomes that are both continuous and binary. We have constructed informative prior distributions for the correlations by using external evidence. Prior distributions for the within-study correlations were constructed by employing external individual patient data and using a double bootstrap method to obtain the correlations between mixed outcomes. The between-study model of MRMA was parameterized in the form of a product of a series of univariate conditional normal distributions. This allowed us to place explicit prior distributions on the between-study correlations, which were constructed using external summary data. Traditionally, independent ‘vague’ prior distributions are placed on all parameters of the model. In contrast to this approach, we constructed prior distributions for the between-study model parameters in a way that takes into account the inter-relationship between them. This is a flexible method that can be extended to incorporate mixed outcomes other than continuous and binary and beyond the trivariate case. We have applied this model to a motivating example in rheumatoid arthritis with the aim of incorporating all available evidence in the synthesis and potentially reducing uncertainty around the estimate of interest. © 2013 The Authors. Statistics in Medicine Published by John Wiley & Sons, Ltd. PMID:23630081
Ungvári, Ildikó; Hullám, Gábor; Antal, Péter; Kiszel, Petra Sz; Gézsi, András; Hadadi, Éva; Virág, Viktor; Hajós, Gergely; Millinghoffer, András; Nagy, Adrienne; Kiss, András; Semsei, Ágnes F; Temesi, Gergely; Melegh, Béla; Kisfali, Péter; Széll, Márta; Bikov, András; Gálffy, Gabriella; Tamási, Lilla; Falus, András; Szalai, Csaba
2012-01-01
Genetic studies indicate a high number of potential factors related to asthma. Based on earlier linkage analyses we selected the 11q13 and 14q22 asthma susceptibility regions, for which we designed a partial genome screening study using 145 SNPs in 1201 individuals (436 asthmatic children and 765 controls). The results were evaluated with traditional frequentist methods and we applied a new statistical method, called Bayesian network based Bayesian multilevel analysis of relevance (BN-BMLA). This method uses Bayesian network representation to provide detailed characterization of the relevance of factors, such as joint significance, the type of dependency, and multi-target aspects. We estimated posteriors for these relations within the Bayesian statistical framework, in order to estimate whether a variable is directly relevant or its association is only mediated. With frequentist methods one SNP (rs3751464 in the FRMD6 gene) provided evidence for an association with asthma (OR = 1.43 (1.2-1.8); p = 3×10^-4). The possible role of the FRMD6 gene in asthma was also confirmed in an animal model and human asthmatics. In the BN-BMLA analysis altogether 5 SNPs in 4 genes were found relevant in connection with asthma phenotype: PRPF19 on chromosome 11, and FRMD6, PTGER2 and PTGDR on chromosome 14. In a subsequent step a partial dataset containing rhinitis and further clinical parameters was used, which allowed the analysis of relevance of SNPs for asthma and multiple targets. These analyses suggested that SNPs in the AHNAK and MS4A2 genes were indirectly associated with asthma. This paper indicates that BN-BMLA explores the relevant factors more comprehensively than traditional statistical methods and extends the scope of strong relevance based methods to include partial relevance, global characterization of relevance and multi-target relevance.
A Bayesian Analysis of the Cepheid Distance Scale
Barnes, Thomas G., III; Jefferys, W. H.; Berger, J. O.; Mueller, Peter J.; Orr, K.; Rodriguez, R.
2003-07-01
We develop and describe a Bayesian statistical analysis to solve the surface brightness equations for Cepheid distances and stellar properties. Our analysis provides a mathematically rigorous and objective solution to the problem, including immunity from Lutz-Kelker bias. We discuss the choice of priors, show the construction of the likelihood distribution, and give sampling algorithms in a Markov chain Monte Carlo approach for efficiently and completely sampling the posterior probability distribution. Our analysis averages over the probabilities associated with several models rather than attempting to pick the "best model" from several possible models. Using a sample of 13 Cepheids we demonstrate the method. We discuss diagnostics of the analysis and the effects of the astrophysical choices going into the model. We show that we can objectively model the order of Fourier polynomial fits to the light and velocity data. By comparison with theoretical models of Bono et al. we find that EU Tau and SZ Tau are overtone pulsators, most likely without convective overshoot. The period-radius and period-luminosity relations we obtain are shown to be compatible with those in the recent literature. Specifically, we find $\log\langle R\rangle = (0.693 \pm 0.037)[\log(P) - 1.2] + (2.042 \pm 0.047)$ and $\langle M_v\rangle = -(2.690 \pm 0.169)[\log(P) - 1.2] - (4.699 \pm 0.216)$.
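The two fitted relations quoted at the end of the abstract above can be evaluated directly. The sketch below uses only the central coefficients and ignores the quoted uncertainties. Reading the left-hand sides as the logarithm of the mean radius and the mean absolute magnitude follows the abstract's "period-radius and period-luminosity" wording; base-10 logarithms, periods in days and radii in solar units are assumptions, and the function names are hypothetical.

```python
import math

PIVOT_LOG_PERIOD = 1.2  # pivot term appearing in both quoted relations

def log_mean_radius(period_days):
    """Period-radius relation: central value of log<R> (assumed solar radii)."""
    return 0.693 * (math.log10(period_days) - PIVOT_LOG_PERIOD) + 2.042

def mean_absolute_magnitude(period_days):
    """Period-luminosity relation: central value of the mean absolute magnitude."""
    return -2.690 * (math.log10(period_days) - PIVOT_LOG_PERIOD) - 4.699

# At the pivot period (log P = 1.2, about 15.8 days under the stated
# assumptions) both relations reduce to their intercepts.
pivot_period = 10.0 ** PIVOT_LOG_PERIOD
radius_at_pivot = 10.0 ** log_mean_radius(pivot_period)
magnitude_at_pivot = mean_absolute_magnitude(pivot_period)
```

Longer periods give larger radii and more negative (brighter) magnitudes, as the signs of the two slopes imply.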
Bayesian principal geodesic analysis for estimating intrinsic diffeomorphic image variability.
Zhang, Miaomiao; Fletcher, P Thomas
2015-10-01
In this paper, we present a generative Bayesian approach for estimating the low-dimensional latent space of diffeomorphic shape variability in a population of images. We develop a latent variable model for principal geodesic analysis (PGA) that provides a probabilistic framework for factor analysis in the space of diffeomorphisms. A sparsity prior in the model results in automatic selection of the number of relevant dimensions by driving unnecessary principal geodesics to zero. To infer model parameters, including the image atlas, principal geodesic deformations, and the effective dimensionality, we introduce an expectation maximization (EM) algorithm. We evaluate our proposed model on 2D synthetic data and the 3D OASIS brain database of magnetic resonance images, and show that the automatically selected latent dimensions from our model are able to reconstruct unobserved testing images with lower error than both linear principal component analysis (LPCA) in the image space and tangent space principal component analysis (TPCA) in the diffeomorphism space. Copyright © 2015 Elsevier B.V. All rights reserved.
[Cloning, expression and phylogenetic analysis of Schistosoma japonicum calcyphosine gene].
Ju, Chuan; Peng, Jian-xin; Xu, Bin; Wang, Wei; Feng, Zheng; Hu, Wei
2006-10-01
To clone and express the Schistosoma japonicum (Sj) calcyphosine gene, and purify the expressed protein. The encoding sequence selected from an Sj cDNA library was amplified by PCR. After subcloning into the prokaryotic expression vector pET-28a, the expressed protein was purified with His-Tag affinity chromatography. Western blotting was used to detect the immunogenicity. The structure and functions of the protein were analyzed by bioinformatics methods, and the phylogenetic tree of the protein was drawn. The recombinant protein was specifically recognized by Sj-infected rabbit serum. The bioinformatics analysis showed 4 EF-hand domains. In addition, Sj calcyphosine was predicted to contain two phosphorylation sites for protein kinase C, eight phosphorylation sites for casein kinase II and one N-myristoylation site. Sj calcyphosine belonged to type-II calcyphosine. The calcyphosine gene encodes a calcium-binding protein and might be a potential candidate for diagnosis, vaccine or drug target.
Bayesian Analysis of Evolutionary Divergence with Genomic Data Under Diverse Demographic Models.
Chung, Yujin; Hey, Jody
2017-02-25
We present a new Bayesian method for estimating demographic and phylogenetic history using population genomic data. Several key innovations are introduced that allow the study of diverse models within an Isolation with Migration framework. The new method implements a 2-step analysis, with an initial Markov chain Monte Carlo (MCMC) phase that samples simple coalescent trees, followed by the calculation of the joint posterior density for the parameters of a demographic model. In step 1, the MCMC sampling phase, the method uses a reduced state space, consisting of coalescent trees without migration paths, and a simple importance sampling distribution without the demography of interest. Once obtained, a single sample of trees can be used in step 2 to calculate the joint posterior density for model parameters under multiple diverse demographic models, without having to repeat MCMC runs. Because migration paths are not included in the state space of the MCMC phase, but rather are handled by analytic integration in step 2 of the analysis, the method is scalable to a large number of loci with excellent MCMC mixing properties. With an implementation of the new method in the computer program MIST, we demonstrate the method's accuracy, scalability and other advantages using simulated data and DNA sequences of two common chimpanzee subspecies: Pan troglodytes troglodytes (P. t.) and P. t. verus.
Phylogenetic and Structural Analysis of Polyketide Synthases in Aspergilli
Bhetariya, Preetida J.; Prajapati, Madhvi; Bhaduri, Asani; Mandal, Rahul Shubhra; Varma, Anupam; Madan, Taruna; Singh, Yogendra; Sarma, P. Usha
2016-01-01
Polyketide synthases (PKSs) of Aspergillus species are multidomain and multifunctional megaenzymes that play an important role in the synthesis of diverse polyketide compounds. Putative PKS protein sequences from medically, agriculturally, and industrially important Aspergillus species were chosen and screened for in silico studies. Six candidate Aspergillus species, Aspergillus fumigatus Af293, Aspergillus flavus NRRL3357, Aspergillus niger CBS 513.88, Aspergillus terreus NIH2624, Aspergillus oryzae RIB40, and Aspergillus clavatus NRRL1, were selected to study the PKS phylogeny. Full-length PKS proteins and ketosynthase (KS) domain sequences alone were retrieved for independent phylogenetic analysis from the aforementioned species, and phylogenetic analysis was performed with characterized fungal PKS. This resulted in the grouping of Aspergilli PKSs into nonreducing (NR), partially reducing (PR), and highly reducing (HR) PKS enzymes. Eight distinct clades with unique domain arrangements were classified based on homology with functionally characterized PKS enzymes. Conserved motif signatures corresponding to each type of PKS were observed. Three proteins from Protein Data Bank corresponding to NR, PR, and HR type of PKS (XP_002384329.1, XP_753141.2, and XP_001402408.2, respectively) were selected for mapping of conserved motifs on three-dimensional structures of KS domain. Structural variations were found at the active sites on modeled NR, PR, and HR enzymes of Aspergillus. It was observed that the number of iteration cycles was dependent on the size of the cavity in the active site of the PKS enzyme correlating with a type with reducing or NR products, such as pigment, 6MSA, and lovastatin. The current study reports the grouping and classification of PKS proteins of Aspergilli for possible exploration of novel polyketides based on sequence homology; this information can be useful for selection of PKS for polyketide exploration and
Phylogenetic analysis of dissimilatory Fe(III)-reducing bacteria
Lonergan, D.J.; Jenter, H.L.; Coates, J.D.; Phillips, E.J.P.; Schmidt, T.M.; Lovley, D.R.
1996-01-01
Evolutionary relationships among strictly anaerobic dissimilatory Fe(III)- reducing bacteria obtained from a diversity of sedimentary environments were examined by phylogenetic analysis of 16S rRNA gene sequences. Members of the genera Geobacter, Desulfuromonas, Pelobacter, and Desulfuromusa formed a monophyletic group within the delta subdivision of the class Proteobacteria. On the basis of their common ancestry and the shared ability to reduce Fe(III) and/or S0, we propose that this group be considered a single family, Geobacteraceae. Bootstrap analysis, characteristic nucleotides, and higher- order secondary structures support the division of Geobacteraceae into two subgroups, designated the Geobacter and Desulfuromonas clusters. The genus Desulfuromusa and Pelobacter acidigallici make up a distinct branch with the Desulfuromonas cluster. Several members of the family Geobacteraceae, none of which reduce sulfate, were found to contain the target sequences of probes that have been previously used to define the distribution of sulfate-reducing bacteria and sulfate-reducing bacterium-like microorganisms. The recent isolations of Fe(III)-reducing microorganisms distributed throughout the domain Bacteria suggest that development of 16S rRNA probes that would specifically target all Fe(III) reducers may not be feasible. However, all of the evidence suggests that if a 16S rRNA sequence falls within the family Geobacteraceae, then the organism has the capacity for Fe(III) reduction. The suggestion, based on geological evidence, that Fe(III) reduction was the first globally significant process for oxidizing organic matter back to carbon dioxide is consistent with the finding that acetate-oxidizing Fe(III) reducers are phylogenetically diverse.
Distribution and Phylogenetic Analysis of Family 19 Chitinases in Actinobacteria
Kawase, Tomokazu; Saito, Akihiro; Sato, Toshiya; Kanai, Ryo; Fujii, Takeshi; Nikaidou, Naoki; Miyashita, Kiyotaka; Watanabe, Takeshi
2004-01-01
In organisms other than higher plants, family 19 chitinase was first discovered in Streptomyces griseus HUT6037, and later, the general occurrence of this enzyme in Streptomyces species was demonstrated. In the present study, the distribution of family 19 chitinases in the class Actinobacteria and the phylogenetic relationship of Actinobacteria family 19 chitinases with family 19 chitinases of other organisms were investigated. Forty-nine strains were chosen to cover almost all the suborders of the class Actinobacteria, and chitinase production was examined. Of the 49 strains, 22 formed cleared zones on agar plates containing colloidal chitin and thus appeared to produce chitinases. These 22 chitinase-positive strains were subjected to Southern hybridization analysis by using a labeled DNA fragment corresponding to the catalytic domain of ChiC, and the presence of genes similar to chiC of S. griseus HUT6037 in at least 13 strains was suggested by the results. PCR amplification and sequencing of the DNA fragments corresponding to the major part of the catalytic domains of the family 19 chitinase genes confirmed the presence of family 19 chitinase genes in these 13 strains. The strains possessing family 19 chitinase genes belong to 6 of the 10 suborders in the order Actinomycetales, which account for the greatest part of the Actinobacteria. Phylogenetic analysis suggested that there is a close evolutionary relationship between family 19 chitinases found in Actinobacteria and plant class IV chitinases. The general occurrence of family 19 chitinase genes in Streptomycineae and the high sequence similarity among the genes found in Actinobacteria suggest that the family 19 chitinase gene was first acquired by an ancestor of the Streptomycineae and spread among the Actinobacteria through horizontal gene transfer. PMID:14766598
2013-01-01
Background Dendropsophus is a monophyletic anuran genus with a diploid number of 30 chromosomes as an important synapomorphy. However, the internal phylogenetic relationships of this genus are poorly understood. Interestingly, an intriguing interspecific variation in the telocentric chromosome number has been useful in species identification. To address certain uncertainties related to one of the species groups of Dendropsophus, the D. microcephalus group, we carried out a cytogenetic analysis combined with phylogenetic inferences based on mitochondrial sequences, which aimed to aid in the analysis of chromosomal characters. Populations of Dendropsophus nanus, Dendropsophus walfordi, Dendropsophus sanborni, Dendropsophus jimi and Dendropsophus elianeae, ranging from the extreme south to the north of Brazil, were cytogenetically compared. A mitochondrial region of the ribosomal 12S gene from these populations, as well as from 30 other species of Dendropsophus, was used for the phylogenetic inferences. Phylogenetic relationships were inferred using maximum parsimony and Bayesian analyses. Results The species D. nanus and D. walfordi exhibited identical karyotypes (2n = 30; FN = 52), with four pairs of telocentric chromosomes and a NOR located on metacentric chromosome pair 13. In all of the phylogenetic hypotheses, the paraphyly of D. nanus and D. walfordi was inferred. D. sanborni from Botucatu-SP and Torres-RS showed the same karyotype as D. jimi, with 5 pairs of telocentric chromosomes (2n = 30; FN = 50) and a terminal NOR in the long arm of the telocentric chromosome pair 12. Despite their karyotypic similarity, these species were not found to compose a monophyletic group. Finally, the phylogenetic and cytogenetic analyses did not cluster the specimens of D. elianeae according to their geographical occurrence or recognized morphotypes. Conclusions We suggest that a taxonomic revision of the taxa D. nanus and D. walfordi is quite necessary. We also
Zhou, Tai-Cheng; Sha, Tao; Irwin, David M; Zhang, Ya-Ping
2015-01-01
Pavo cristatus, known as the Indian peafowl, is endemic to India and Sri Lanka and has been domesticated for its ornamental and food value. However, its phylogenetic status is still debated. Here, to clarify the phylogenetic status of P. cristatus within Phasianidae, we analyzed its mitochondrial genome (mtDNA). The complete mitochondrial DNA (mtDNA) genome was determined using 34 pairs of primers. Our data show that the mtDNA genome of P. cristatus is 16,686 bp in length. Molecular phylogenetic analyses of P. cristatus were performed along with 22 complete mtDNA genomes belonging to other species in Phasianidae using Bayesian and maximum likelihood methods, where Aythya americana and Anas platyrhynchos were used as outgroups. Our results show that P. cristatus has its closest genetic affinity with Pavo muticus and belongs to a clade that contains Gallus, Bambusicola and Francolinus.
Buckley, Christopher D.
2012-01-01
The warp ikat method of making decorated textiles is one of the most geographically widespread in southeast Asia, being used by Austronesian peoples in Indonesia, Malaysia and the Philippines, and Daic peoples on the Asian mainland. In this study a dataset consisting of the decorative characters of 36 of these warp ikat weaving traditions is investigated using Bayesian and Neighbornet techniques, and the results are used to construct a phylogenetic tree and taxonomy for warp ikat weaving in southeast Asia. The results and analysis show that these diverse traditions have a common ancestor amongst neolithic cultures of the Asian mainland, and parallels exist between the patterns of textile weaving descent and linguistic phylogeny for the Austronesian group. Ancestral state analysis is used to reconstruct some of the features of the ancestral weaving tradition. The widely held theory that weaving motifs originated in the late Bronze Age Dong-Son culture is shown to be inconsistent with the data. PMID:23272211
Hörandl, Elvira; Paun, Ovidiu; Johansson, Jan T; Lehnebach, Carlos; Armstrong, Tristan; Chen, Lixue; Lockhart, Peter
2005-08-01
Ranunculus is a large genus with a worldwide distribution. Phylogenetic analyses of c. 200 species of Ranunculus s.l. based on sequences of the nrITS using maximum parsimony and Bayesian inference yielded high congruence with previous cpDNA restriction site analyses, but strongly contradicted previous classifications. A large core clade including Ranunculus subg. Ranunculus, subg. Batrachium, subg. Crymodes p.p., Ceratocephala, Myosurus, and Aphanostemma is separated from R. subg. Ficaria, subg. Pallasiantha, subg. Coptidium, subg. Crymodes p.p., Halerpestes, Peltocalathos, Callianthemoides, and Arcteranthis. Within the core clade, 19 clades can be described with morphological and karyological features. Several sections are not monophyletic. Parallel evolution of morphological characters in adaptation to climatic conditions may be a reason for incongruence of molecular data and morphology-based classifications. In some mountainous regions, groups of closely related species may have originated from adaptive radiation and rapid speciation. Split decomposition analysis indicated complex patterns of relationship and suggested hybridization in the apomictic R. auricomus complex, R. subg. Batrachium, and the white-flowering European alpines. The evolutionary success of the genus might be due to a combination of morphological plasticity and adaptations, hybridization and polyploidy as important factors for regional diversification, and a broad range of reproductive strategies.
Rex, Martina; Schulte, Katharina; Zizka, Georg; Peters, Jule; Vásquez, Roberto; Ibisch, Pierre L; Weising, Kurt
2009-06-01
The approximately 31 species of Fosterella L.B. Sm. (Bromeliaceae) are terrestrial herbs with a centre of diversity in the central South American Andes. To resolve infra- and intergeneric relationships among Fosterella and their putative allies, we conducted a phylogenetic analysis based on sequence data from four chloroplast DNA regions (matK gene, rps16 intron, atpB-rbcL and psbB-psbH intergenic spacers). Sequences were generated for 96 accessions corresponding to 60 species from 18 genera. Among these, 57 accessions represented 22 of the 31 recognized Fosterella species and one undescribed morphospecies. Maximum parsimony and Bayesian inference methods yielded well-resolved phylogenies. The monophyly of Fosterella was strongly supported, as was its sister relationship with a clade comprising Deuterocohnia, Dyckia and Encholirium. Six distinct evolutionary lineages were distinguished within Fosterella. Character mapping indicated that parallel evolution of identical character states is common in the genus. Relationships between species and lineages are discussed in the context of morphological, ecological and biogeographical data as well as the results of a previous amplified fragment length polymorphism (AFLP) study.
Bayesian Inference for NASA Probabilistic Risk and Reliability Analysis
Dezfuli, Homayoon; Kelly, Dana; Smith, Curtis; Vedros, Kurt; Galyean, William
2009-01-01
This document, Bayesian Inference for NASA Probabilistic Risk and Reliability Analysis, is intended to provide guidelines for the collection and evaluation of risk and reliability-related data. It is aimed at scientists and engineers familiar with risk and reliability methods and provides a hands-on approach to the investigation and application of a variety of risk and reliability data assessment methods, tools, and techniques. This document provides both a broad perspective on data analysis collection and evaluation issues and a narrow focus on the methods to implement a comprehensive information repository. The topics addressed herein cover the fundamentals of how data and information are to be used in risk and reliability analysis models and their potential role in decision making. Understanding these topics is essential to attaining the risk-informed decision-making environment that is being sought by NASA requirements and procedures such as 8000.4 (Agency Risk Management Procedural Requirements), NPR 8705.05 (Probabilistic Risk Assessment Procedures for NASA Programs and Projects), and the System Safety requirements of NPR 8715.3 (NASA General Safety Program Requirements).
BEAST 2: a software platform for Bayesian evolutionary analysis.
Bouckaert, Remco; Heled, Joseph; Kühnert, Denise; Vaughan, Tim; Wu, Chieh-Hsi; Xie, Dong; Suchard, Marc A; Rambaut, Andrew; Drummond, Alexei J
2014-04-01
We present a new open source, extensible and flexible software platform for Bayesian evolutionary analysis called BEAST 2. This software platform is a re-design of the popular BEAST 1 platform to correct structural deficiencies that became evident as the BEAST 1 software evolved. Key among those deficiencies was the lack of post-deployment extensibility. BEAST 2 now has a fully developed package management system that allows third party developers to write additional functionality that can be directly installed to the BEAST 2 analysis platform via a package manager without requiring a new software release of the platform. This package architecture is showcased with a number of recently published new models encompassing birth-death-sampling tree priors, phylodynamics and model averaging for substitution models and site partitioning. A second major improvement is the ability to read/write the entire state of the MCMC chain to/from disk allowing it to be easily shared between multiple instances of the BEAST software. This facilitates checkpointing and better support for multi-processor and high-end computing extensions. Finally, the functionality in new packages can be easily added to the user interface (BEAUti 2) by a simple XML template-based mechanism because BEAST 2 has been re-designed to provide greater integration between the analysis engine and the user interface so that, for example BEAST and BEAUti use exactly the same XML file format.
Bayesian survival analysis in clinical trials: What methods are used in practice?
Brard, Caroline; Le Teuff, Gwénaël; Le Deley, Marie-Cécile; Hampson, Lisa V
2017-02-01
Background Bayesian statistics are an appealing alternative to the traditional frequentist approach to designing, analysing, and reporting of clinical trials, especially in rare diseases. Time-to-event endpoints are widely used in many medical fields. There are additional complexities to designing Bayesian survival trials which arise from the need to specify a model for the survival distribution. The objective of this article was to critically review the use and reporting of Bayesian methods in survival trials. Methods A systematic review of clinical trials using Bayesian survival analyses was performed through PubMed and Web of Science databases. This was complemented by a full text search of the online repositories of pre-selected journals. Cost-effectiveness, dose-finding studies, meta-analyses, and methodological papers using clinical trials were excluded. Results In total, 28 articles met the inclusion criteria, 25 were original reports of clinical trials and 3 were re-analyses of a clinical trial. Most trials were in oncology (n = 25), were randomised controlled (n = 21) phase III trials (n = 13), and half considered a rare disease (n = 13). Bayesian approaches were used for monitoring in 14 trials and for the final analysis only in 14 trials. In the latter case, Bayesian survival analyses were used for the primary analysis in four cases, for the secondary analysis in seven cases, and for the trial re-analysis in three cases. Overall, 12 articles reported fitting Bayesian regression models (semi-parametric, n = 3; parametric, n = 9). Prior distributions were often incompletely reported: 20 articles did not define the prior distribution used for the parameter of interest. Over half of the trials used only non-informative priors for monitoring and the final analysis (n = 12) when it was specified. Indeed, no articles fitting Bayesian regression models placed informative priors on the parameter of interest. The prior for the treatment
A computational analysis of the neural bases of Bayesian inference.
Kolossa, Antonio; Kopp, Bruno; Fingscheidt, Tim
2015-02-01
Empirical support for the Bayesian brain hypothesis, although of major theoretical importance for cognitive neuroscience, is surprisingly scarce. This hypothesis posits simply that neural activities code and compute Bayesian probabilities. Here, we introduce an urn-ball paradigm to relate event-related potentials (ERPs) such as the P300 wave to Bayesian inference. Bayesian model comparison is conducted to compare various models in terms of their ability to explain trial-by-trial variation in ERP responses at different points in time and over different regions of the scalp. Specifically, we are interested in dissociating specific ERP responses in terms of Bayesian updating and predictive surprise. Bayesian updating refers to changes in probability distributions given new observations, while predictive surprise equals the surprise about observations under current probability distributions. Components of the late positive complex (P3a, P3b, Slow Wave) provide dissociable measures of Bayesian updating and predictive surprise. Specifically, the updating of beliefs about hidden states yields the best fit for the anteriorly distributed P3a, whereas the updating of predictions of observations accounts best for the posteriorly distributed Slow Wave. In addition, parietally distributed P3b responses are best fit by predictive surprise. These results indicate that the three components of the late positive complex reflect distinct neural computations. As such they are consistent with the Bayesian brain hypothesis, but these neural computations seem to be subject to nonlinear probability weighting. We integrate these findings with the free-energy principle that instantiates the Bayesian brain hypothesis. Copyright © 2014 Elsevier Inc. All rights reserved.
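The distinction drawn above between Bayesian updating and predictive surprise can be made concrete with a small sketch. It assumes a hypothetical two-urn task (the urn compositions are illustrative, not taken from the study) and uses the Kullback-Leibler divergence between prior and posterior as one common way to quantify the size of a belief update; the paper's own model definitions may differ.

```python
import math

def posterior(prior, likelihoods):
    """Bayes' rule over a discrete set of hypotheses."""
    unnorm = [p * l for p, l in zip(prior, likelihoods)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def predictive_surprise(prior, likelihoods):
    """-log2 of the probability that current beliefs assigned to the observation."""
    return -math.log2(sum(p * l for p, l in zip(prior, likelihoods)))

def bayesian_surprise(prior, post):
    """KL divergence (in bits) of the posterior from the prior: size of the update."""
    return sum(q * math.log2(q / p) for p, q in zip(prior, post) if q > 0)

# Hypothetical urns: P(red | urn A) = 0.8, P(red | urn B) = 0.2
P_RED = [0.8, 0.2]
beliefs = [0.5, 0.5]  # uniform prior over which urn is being sampled

for ball in ["red", "red", "blue"]:
    lik = [p if ball == "red" else 1.0 - p for p in P_RED]
    ps = predictive_surprise(beliefs, lik)
    updated = posterior(beliefs, lik)
    bs = bayesian_surprise(beliefs, updated)
    print(f"{ball:4s} predictive surprise={ps:.3f} bits, update size={bs:.3f} bits")
    beliefs = updated
```

The two quantities can dissociate: an observation may be improbable under current beliefs yet change those beliefs only slightly, or the reverse, which is what makes it possible to fit them separately to trial-by-trial signals.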
Phylogenetic analysis of bovine astrovirus in Korean cattle.
Oem, Jae-Ku; An, Dong-Jun
2014-04-01
Bovine astrovirus (BAstV) belongs to a genetically divergent lineage within the genus Mamastrovirus. The present study showed that BAstV was associated with the gastroenteric tracts of cattle in nine positive fecal samples from 115 cattle, whereas no positive samples were found in the brain tissues of 14 downer cattle. Interestingly, the positive diarrheal samples were obtained mainly from calves aged 14 days-3 months. Bayesian inference tree analysis of the partial ORF1ab and capsid (ORF2) gene sequences of BAstVs identified four divergent groups. Eleven BAstVs, four porcine astroviruses, and two deer astroviruses (DAstVs; CcAstV-1 and -2) belonged to group 1; group 2 contained two BAstVs (BAstK08-51 and BAstK10-96) with another two in group 3 (BAstK08-2 and BAstK08-53); and group 4 comprised the BAstV-NeuroS1 strain derived from a cattle brain tissue sample and an ovine astrovirus. The same divergent groups were obtained when the pairwise alignments were produced using both amino acid and nucleotide sequences. The Korean BAstVs isolated from infected cattle had a nationwide distribution and they belonged to groups 1, 2, and 3.
A phylogenetic analysis of rissooidean and cingulopsoidean families (Gastropoda: Caenogastropoda).
Criscione, Francesco; Ponder, Winston Frank
2013-03-01
The Rissooidea is one of the largest and most diverse molluscan superfamilies, with 23 recognized Recent families including marine, freshwater and terrestrial members. The Cingulopsoidea are a group of three marine families previously included within the Rissooidea. A previous molecular analysis including two rissooideans and one cingulopsoidean, indicated the possibility that the Rissooidea is at least diphyletic. We use new molecular data to investigate the polyphyly of Rissooidea and test the monophyly of Cingulopsoidea with a greatly increased taxon set. This study includes the greatest sampling to date with 43 species of 14 families of Rissooidea and all families of Cingulopsoidea. Bayesian and maximum likelihood analyses of 16S and 28S show that there are two major clades encompassing taxa previously included in Rissooidea. These are the Rissooidea s.s. containing Rissoidae and Barleeiidae and the Truncatelloidea containing Anabathridae, Assimineidae, Falsicingulidae, Truncatellidae, Pomatiopsidae, Hydrobiidae s.l., Hydrococcidae, Stenothyridae, Calopiidae, Clenchiellidae, Caecidae, Tornidae, and Iravadiidae. Rissoidae is not monophyletic, with Lironoba grouping with Emblanda (Emblandidae) and Rissoina forming a separate clade with Barleeiidae. Iravadiidae is not monophyletic, with Nozeba being sister to the Tornidae. Tatea, usually included within Hydrobiidae, is distinct from that family and Nodulus, previously included in Anabathridae, groups with the hydrobiids. Copyright © 2012 Elsevier Inc. All rights reserved.
Sallam, Hesham M; Seiffert, Erik R
2016-01-01
The Fayum Depression of Egypt has yielded fossils of hystricognathous rodents from multiple Eocene and Oligocene horizons that range in age from ∼37 to ∼30 Ma and document several phases in the early evolution of crown Hystricognathi and one of its major subclades, Phiomorpha. Here we describe two new genera and species of basal phiomorphs, Birkamys korai and Mubhammys vadumensis, based on rostra and maxillary and mandibular remains from the terminal Eocene (∼34 Ma) Fayum Locality 41 (L-41). Birkamys is the smallest known Paleogene hystricognath, has very simple molars, and, like derived Oligocene-to-Recent phiomorphs (but unlike contemporaneous and older taxa) apparently retained dP(4)∕4 late into life, with no evidence for P(4)∕4 eruption or formation. Mubhammys is very similar in dental morphology to Birkamys, and also shows no evidence for P(4)∕4 formation or eruption, but is considerably larger. Though parsimony analysis with all characters equally weighted places Birkamys and Mubhammys as sister taxa of extant Thryonomys to the exclusion of much younger relatives of that genus, all other methods (standard Bayesian inference, Bayesian "tip-dating," and parsimony analysis with scaled transitions between "fixed" and polymorphic states) place these species in more basal positions within Hystricognathi, as sister taxa of Oligocene-to-Recent phiomorphs. We also employ tip-dating as a means for estimating the ages of early hystricognath-bearing localities, many of which are not well-constrained by geological, geochronological, or biostratigraphic evidence. By simultaneously taking into account phylogeny, evolutionary rates, and uniform priors that appropriately encompass the range of possible ages for fossil localities, dating of tips in this Bayesian framework allows paleontologists to move beyond vague and assumption-laden "stage of evolution" arguments in biochronology to provide relatively rigorous age assessments of poorly-constrained faunas. This
Phylogenetic Analysis of Mitochondrial Outer Membrane β-Barrel Channels
Wojtkowska, Małgorzata; Jąkalski, Marcin; Pieńkowska, Joanna R.; Stobienia, Olgierd; Karachitos, Andonis; Przytycka, Teresa M.; Weiner, January; Kmita, Hanna; Makałowski, Wojciech
2012-01-01
Transport of molecules across the mitochondrial outer membrane is pivotal for the proper function of mitochondria. The transport pathways across the membrane are formed by ion channels that participate in metabolite exchange between mitochondria and cytoplasm (voltage-dependent anion-selective channel, VDAC) as well as in import of proteins encoded by nuclear genes (Tom40 and Sam50/Tob55). VDAC, Tom40, and Sam50/Tob55 are present in all eukaryotic organisms, encoded in the nuclear genome, and have β-barrel topology. We have compiled data sets of these protein sequences and studied their phylogenetic relationships with a special focus on the position of Amoebozoa. Additionally, we identified these protein-coding genes in Acanthamoeba castellanii and Dictyostelium discoideum to complement our data set and verify the phylogenetic position of these model organisms. Our analysis shows that mitochondrial β-barrel channels from Archaeplastida (plants) and Opisthokonta (animals and fungi) experienced many duplication events that resulted in multiple paralogous isoforms and form well-defined monophyletic clades that match the current model of eukaryotic evolution. However, in representatives of Amoebozoa, Chromalveolata, and Excavata (former Protista), they do not form clearly distinguishable clades, although they locate basally to the plant and algae branches. In most cases, they do not possess paralogs and their sequences appear to have evolved quickly or degenerated. Consequently, the obtained phylogenies of mitochondrial outer membrane β-channels do not entirely reflect the recent eukaryotic classification system involving the six supergroups: Chromalveolata, Excavata, Archaeplastida, Rhizaria, Amoebozoa, and Opisthokonta. PMID:22155732
Dembo, Mana; Matzke, Nicholas J; Mooers, Arne Ø; Collard, Mark
2015-08-07
The phylogenetic relationships of several hominin species remain controversial. Two methodological issues contribute to the uncertainty: use of partial, inconsistent datasets and reliance on phylogenetic methods that are ill-suited to testing competing hypotheses. Here, we report a study designed to overcome these issues. We first compiled a supermatrix of craniodental characters for all widely accepted hominin species. We then took advantage of recently developed Bayesian methods for building trees of serially sampled tips to test among hypotheses that have been put forward in three of the most important current debates in hominin phylogenetics: the relationship between Australopithecus sediba and Homo, the taxonomic status of the Dmanisi hominins, and the place of the so-called hobbit fossils from Flores, Indonesia, in the hominin tree. Based on our results, several published hypotheses can be statistically rejected. For example, the data do not support the claim that Dmanisi hominins and all other early Homo specimens represent a single species, nor that the hobbit fossils are the remains of small-bodied modern humans, one of whom had Down syndrome. More broadly, our study provides a new baseline dataset for future work on hominin phylogeny and illustrates the promise of Bayesian approaches for understanding hominin phylogenetic relationships.
Bayesian network models in brain functional connectivity analysis
Zhang, Sheng; Li, Chiang-shan R.
2013-01-01
Much effort has been made to better understand the complex integration of distinct parts of the human brain using functional magnetic resonance imaging (fMRI). Altered functional connectivity between brain regions is associated with many neurological and mental illnesses, such as Alzheimer and Parkinson diseases, addiction, and depression. In computational science, Bayesian networks (BN) have been used in a broad range of studies to model complex data sets in the presence of uncertainty and when expert prior knowledge is needed. However, little has been done to explore the use of BN in connectivity analysis of fMRI data. In this paper, we present an up-to-date literature review and methodological details of connectivity analyses using BN, while highlighting caveats in a real-world application. We present a BN model of an fMRI dataset obtained from sixty healthy subjects performing the stop-signal task (SST), a paradigm widely used to investigate response inhibition. Connectivity results are validated with the extant literature including our previous studies. By exploring the link strengths of the learned BNs and correlating them with behavioral performance measures, this novel use of BN in connectivity analysis provides new insights into the functional neural pathways underlying response inhibition. PMID:24319317
Bayesian Model Selection with Network Based Diffusion Analysis
Whalen, Andrew; Hoppitt, William J. E.
2016-01-01
A number of recent studies have used Network Based Diffusion Analysis (NBDA) to detect the role of social transmission in the spread of a novel behavior through a population. In this paper we present a unified framework for performing NBDA in a Bayesian setting, and demonstrate how the Watanabe Akaike Information Criteria (WAIC) can be used for model selection. We present a specific example of applying this method to Time to Acquisition Diffusion Analysis (TADA). To examine the robustness of this technique, we performed a large scale simulation study and found that NBDA using WAIC could recover the correct model of social transmission under a wide range of cases, including under the presence of random effects, individual level variables, and alternative models of social transmission. This work suggests that NBDA is an effective and widely applicable tool for uncovering whether social transmission underpins the spread of a novel behavior, and may still provide accurate results even when key model assumptions are relaxed. PMID:27092089
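As a rough illustration of the model-selection step described above, the sketch below computes WAIC from a matrix of pointwise log-likelihoods evaluated at posterior draws, using the standard definition (log pointwise predictive density minus a variance-based penalty, on the deviance scale). The function name `waic` and the toy numbers are illustrative assumptions; they are not taken from the paper or from any NBDA software.

```python
import math
from statistics import variance

def waic(log_lik):
    """WAIC on the deviance scale (lower is better).

    log_lik[s][i] is the log-likelihood of observation i under posterior draw s.
    """
    n_draws = len(log_lik)
    n_obs = len(log_lik[0])
    lppd = 0.0    # log pointwise predictive density
    p_waic = 0.0  # effective number of parameters
    for i in range(n_obs):
        col = [log_lik[s][i] for s in range(n_draws)]
        m = max(col)
        # log of the mean likelihood across draws, computed stably
        lppd += m + math.log(sum(math.exp(v - m) for v in col) / n_draws)
        # sample variance of the log-likelihood across draws
        p_waic += variance(col)
    return -2.0 * (lppd - p_waic)

# Toy log-likelihood matrices (3 draws x 4 observations), made up for illustration
model_a = [[-1.0, -1.2, -0.9, -1.1], [-1.1, -1.1, -1.0, -1.0], [-0.9, -1.3, -1.0, -1.2]]
model_b = [[-2.0, -2.5, -1.8, -2.2], [-2.1, -2.2, -2.0, -2.1], [-1.9, -2.6, -2.1, -2.3]]
print("WAIC a:", waic(model_a), "WAIC b:", waic(model_b))
```

In practice the matrix would come from the MCMC output of each candidate transmission model, and the model with the lower WAIC would be preferred.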
A procedure for seiche analysis with Bayesian information criterion
Aichi, Masaatsu
2016-04-01
A seiche is a standing wave in an enclosed or semi-enclosed water body. Its amplitude changes irregularly over time owing to weather conditions and other factors, so extracting the seiche signal is not easy with the usual methods of time series analysis such as the fast Fourier transform (FFT). In this study, a new method for time series analysis with a Bayesian information criterion was developed to decompose time series data from tide stations into seiche, tide, long-term trend and residual components. The method is based on maximum marginal likelihood estimation of the tide amplitudes, seiche amplitude, and trend components. The seiche amplitude and trend components were assumed to change gradually, with their second derivatives in time close to zero. These assumptions were incorporated as prior distributions. The variances of the prior distributions were estimated by minimizing the Akaike-Bayes information criterion (ABIC). The frequency of the seiche was determined by Newton's method with an initial guess from the FFT. The accuracy of the proposed method was checked by analyzing synthetic time series data composed of known components. The original components were reproduced quite well. The proposed method was also applied to actual time series data of sea level observed at a tide station and of the strain of coastal rock masses observed by a fiber Bragg grating sensor in Aburatsubo Bay, Japan. The seiche in the bay and the response of the rock masses to it were successfully extracted.
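One plausible reading of the smoothness assumption above is a Gaussian prior on the discrete second difference of each slowly varying component. The sketch below evaluates such a log-prior; the function name `smoothness_log_prior`, the discretization, and the parameter `tau` (prior standard deviation) are assumptions made for illustration, not the author's formulation.

```python
import math

def smoothness_log_prior(x, tau):
    """Log-density of a second-difference smoothness prior.

    Each second difference x[t+1] - 2*x[t] + x[t-1] is treated as an
    independent N(0, tau**2) variable, so sequences that are nearly
    linear in time receive high prior density.
    """
    log_p = 0.0
    for t in range(1, len(x) - 1):
        d2 = x[t + 1] - 2.0 * x[t] + x[t - 1]
        log_p += -0.5 * math.log(2.0 * math.pi * tau ** 2) - d2 ** 2 / (2.0 * tau ** 2)
    return log_p

smooth = [0.0, 1.0, 2.0, 3.0]  # linear trend: all second differences are zero
rough = [0.0, 1.0, 0.0, 1.0]   # oscillating sequence: large second differences
print(smoothness_log_prior(smooth, tau=1.0), smoothness_log_prior(rough, tau=1.0))
```

A smaller `tau` enforces a smoother component; in the approach described, this variance is not hand-tuned but chosen by minimizing ABIC.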
Evans, Margaret E K; Hearn, David J; Hahn, William J; Spangle, Jennifer M; Venable, D Lawrence
2005-09-01
Evolutionary ecologists have long sought to understand the conditions under which perennial (iteroparous) versus annual (semelparous) plant life histories are favored. We evaluated the idea that aridity and variation in the length of droughts should favor the evolution of an annual life history, both by decreasing adult survival and by increasing the potential for high seedling survival via reduced plant cover. We calculated phylogenetically independent contrasts of climate with respect to life history in a clade of winter-establishing evening primroses (sections Anogra and Kleinia; Oenothera; Onagraceae), which includes seven annuals, 12 perennials, and two variable taxa. Climate variables were quantified from long-term records at weather stations near collection localities. To explicitly account for phylogenetic uncertainty, contrasts were calculated on a random sample of phylogenetic trees from the posterior distribution of a Bayesian analysis of DNA sequence data. Statements of association are based on comparing the per-tree mean contrast, which has a null expectation of zero, to a set of per-tree mean contrasts calculated on the same trees, after randomizing the climate data. As predicted, increased annual aridity, increased annual potential evapotranspiration, and decreased annual precipitation were associated with transitions to the annual habit, but these trends were not significantly different from the null pattern. Transitions to the annual habit were not significantly associated with increases in one measure of aridity in summer nor with increased summer drought, but they were associated with significantly increased maximum summer temperatures. In winter, increased aridity and decreased precipitation were significantly associated with transitions to the annual habit. Changes in life history were not significantly associated with changes in the coefficient of variation of precipitation, either on an annual or seasonal (summer vs. winter) basis. Though we
Hepatitis E Virus Circulation in Italy: Phylogenetic and Evolutionary Analysis
Montesano, Carla; Giovanetti, Marta; Ciotti, Marco; Cella, Eleonora; Lo Presti, Alessandra; Grifoni, Alba; Zehender, Gianguglielmo; Angeletti, Silvia; Ciccozzi, Massimo
2016-01-01
Background Hepatitis E virus (HEV), a major cause of acute viral hepatitis in developing countries, has been classified into four main genotypes and a number of subtypes. New genotypes have been recently identified in various mammals, including HEV genotype 3, which has a worldwide distribution. It is widespread among pigs in developed countries. Objectives This study investigated the genetic diversity of HEV among humans and swine in Italy. The date of origin and the demographic history of the HEV were also estimated. Materials and Methods A total of 327 HEV sequences of swine and humans from Italy were downloaded from the National Center for Biotechnology Information. Three different data sets were constructed. The first and the second data set were used to confirm the genotype of the sequences analyzed. The third data set was used to estimate the mean evolutionary rate and to determine the time-scaled phylogeny and demographic history. Results The Bayesian maximum clade credibility tree and the time to the most recent common ancestor estimates showed that the root of the tree dated back to the year 1907 (95% HPD: 1811 - 1975). Two main clades were found, divided into two subclades. Skyline plot analysis, performed separately for human and swine sequences, demonstrated the presence of a bottleneck only in the skyline plot from the swine sequences. Selective pressure analysis revealed only negatively selected sites. Conclusions This study provides support for the hypothesis that humans are probably infected after contact with swine sources. The findings emphasize the importance of checking the country of origin of swine and of improving sanitary control measures from the veterinary standpoint to prevent the spread of HEV infection in Italy. PMID:27226798
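The root date above is reported with a 95% highest posterior density (HPD) interval. As a minimal sketch of what that summary means, the function below finds the shortest interval containing a given fraction of a set of posterior samples. The function name `hpd_interval` and the sample values are hypothetical; dedicated phylogenetic tools compute this from the actual MCMC log.

```python
import math

def hpd_interval(samples, mass=0.95):
    """Shortest interval containing at least `mass` of the samples.

    This empirical estimate assumes a unimodal posterior.
    """
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples the interval must cover
    # slide a window of k consecutive sorted samples and keep the narrowest one
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]

# Made-up, skewed sample of root ages, for illustration only
draws = [0, 10, 11, 12, 13, 14, 30, 50, 80, 100]
print(hpd_interval(draws, mass=0.5))
```

Unlike an equal-tailed interval, the HPD interval shifts toward the region of highest density, which matters for skewed posteriors such as node ages.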
Using Bayesian analysis in repeated preclinical in vivo studies for a more effective use of animals.
Walley, Rosalind; Sherington, John; Rastrick, Joe; Detrait, Eric; Hanon, Etienne; Watt, Gillian
2016-05-01
Whilst innovative Bayesian approaches are increasingly used in clinical studies, in the preclinical area Bayesian methods appear to be rarely used in the reporting of pharmacology data. This is particularly surprising in the context of regularly repeated in vivo studies where there is a considerable amount of data from historical control groups, which has potential value. This paper describes our experience with introducing Bayesian analysis for such studies using a Bayesian meta-analytic predictive approach. This leads naturally either to an informative prior for a control group as part of a full Bayesian analysis of the next study or using a predictive distribution to replace a control group entirely. We use quality control charts to illustrate study-to-study variation to the scientists and describe informative priors in terms of their approximate effective numbers of animals. We describe two case studies of animal models: the lipopolysaccharide-induced cytokine release model used in inflammation and the novel object recognition model used to screen cognitive enhancers, both of which show the advantage of a Bayesian approach over the standard frequentist analysis. We conclude that using Bayesian methods in stable repeated in vivo studies can result in a more effective use of animals, either by reducing the total number of animals used or by increasing the precision of key treatment differences. This will lead to clearer results and supports the "3Rs initiative" to Refine, Reduce and Replace animals in research. Copyright © 2016 John Wiley & Sons, Ltd.
Markov Chain Monte Carlo Methods for Bayesian Data Analysis in Astronomy
Sharma, Sanjib
2017-08-01
Markov Chain Monte Carlo based Bayesian data analysis has now become the method of choice for analyzing and interpreting data in almost all disciplines of science. In astronomy, over the last decade, we have also seen a steady increase in the number of papers that employ Monte Carlo based Bayesian analysis. New, efficient Monte Carlo based methods are continuously being developed and explored. In this review, we first explain the basics of Bayesian theory and discuss how to set up data analysis problems within this framework. Next, we provide an overview of various Monte Carlo based methods for performing Bayesian data analysis. Finally, we discuss advanced ideas that enable us to tackle complex problems and thus hold great promise for the future. We also distribute downloadable computer software (available at https://github.com/sanjibs/bmcmc/ ) that implements some of the algorithms and examples discussed here.
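To make the basic idea of Monte Carlo based Bayesian analysis concrete, here is a minimal random-walk Metropolis sampler, one of the simplest MCMC algorithms. It is a generic textbook sketch and not the code distributed with the review; the data values, the flat prior, and the known unit noise level are assumptions chosen for illustration.

```python
import math
import random

def metropolis(log_post, x0, n_steps, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional log-posterior."""
    rng = random.Random(seed)
    x, lp = x0, log_post(x0)
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step)
        lp_prop = log_post(proposal)
        # accept with probability min(1, exp(lp_prop - lp));
        # 1 - random() lies in (0, 1], so the logarithm is always defined
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            x, lp = proposal, lp_prop
        samples.append(x)
    return samples

# Hypothetical measurements with known unit noise; flat prior on the mean mu
DATA = [4.8, 5.1, 5.3, 4.9, 5.4]

def log_post(mu):
    """Unnormalized log-posterior of mu under a Gaussian likelihood."""
    return -0.5 * sum((d - mu) ** 2 for d in DATA)

samples = metropolis(log_post, x0=0.0, n_steps=20000)
kept = samples[2000:]  # discard burn-in
print("posterior mean estimate:", sum(kept) / len(kept))
```

For this conjugate setup the posterior is Gaussian and centred on the sample mean of the data, so the estimate can be checked analytically; the step size controls the trade-off between acceptance rate and how quickly the chain explores the posterior.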
Guidance on the implementation and reporting of a drug safety Bayesian network meta-analysis.
Ohlssen, David; Price, Karen L; Xia, H Amy; Hong, Hwanhee; Kerman, Jouni; Fu, Haoda; Quartey, George; Heilmann, Cory R; Ma, Haijun; Carlin, Bradley P
2014-01-01
The Drug Information Association Bayesian Scientific Working Group (BSWG) was formed in 2011 with a vision to ensure that Bayesian methods are well understood and broadly utilized for design and analysis and throughout the medical product development process, and to improve industrial, regulatory, and economic decision making. The group, composed of individuals from academia, industry, and regulatory, has as its mission to facilitate the appropriate use and contribute to the progress of Bayesian methodology. In this paper, the safety sub-team of the BSWG explores the use of Bayesian methods when applied to drug safety meta-analysis and network meta-analysis. Guidance is presented on the conduct and reporting of such analyses. We also discuss different structural model assumptions and provide discussion on prior specification. The work is illustrated through a case study involving a network meta-analysis related to the cardiovascular safety of non-steroidal anti-inflammatory drugs.
Bayesian analysis of multimodal data and brain imaging
Assadi, Amir H.; Eghbalnia, Hamid; Backonja, Miroslav; Wakai, Ronald T.; Rutecki, Paul; Haughton, Victor
2000-06-01
It is often the case that information about a process can be obtained using a variety of methods. Each method is employed because of specific advantages over the competing alternatives. An example in medical neuro-imaging is the choice between fMRI and MEG modes where fMRI can provide high spatial resolution in comparison to the superior temporal resolution of MEG. The combination of data from varying modes provides the opportunity to infer results that may not be possible by means of any one mode alone. We discuss a Bayesian and learning theoretic framework for enhanced feature extraction that is particularly suited to multi-modal investigations of massive data sets from multiple experiments. In the following Bayesian approach, acquired knowledge (information) regarding various aspects of the process is directly incorporated into the formulation. This information can come from a variety of sources. In our case, it represents statistical information obtained from other modes of data collection. The information is used to train a learning machine to estimate a probability distribution, which is used in turn by a second machine as a prior, in order to produce a more refined estimation of the distribution of events. The computational demand of the algorithm is handled by proposing a distributed parallel implementation on a cluster of workstations that can be scaled to address real-time needs if required. We provide a simulation of these methods on a set of synthetically generated MEG and EEG data. We show how spatial and temporal resolutions improve by using prior distributions. The method on fMRI signals permits one to construct the probability distribution of the non-linear hemodynamics of the human brain (real data). These computational results are in agreement with biologically based measurements of other labs, as reported to us by researchers from the UK. We also provide preliminary analysis involving multi-electrode cortical recording that accompanies
A Bayesian analysis of plutonium exposures in Sellafield workers.
Puncher, M; Riddell, A E
2016-03-01
The joint Russian (Mayak Production Association) and British (Sellafield) plutonium worker epidemiological analysis, undertaken as part of the European Union Framework Programme 7 (FP7) SOLO project, aims to investigate potential associations between cancer incidence and occupational exposures to plutonium using estimates of organ/tissue doses. The dose reconstruction protocol derived for the study makes best use of the most recent biokinetic models derived by the International Commission on Radiological Protection (ICRP) including a recent update to the human respiratory tract model (HRTM). This protocol was used to derive the final point estimates of absorbed doses for the study. Although uncertainties on the dose estimates were not included in the final epidemiological analysis, a separate Bayesian analysis has been performed for each of the 11 808 Sellafield plutonium workers included in the study in order to assess: A. The reliability of the point estimates provided to the epidemiologists and B. The magnitude of the uncertainty on dose estimates. This analysis, which accounts for uncertainties in biokinetic model parameters, intakes and measurement uncertainties, is described in the present paper. The results show that there is excellent agreement between the point estimates of dose and posterior mean values of dose. However, it is also evident that there are significant uncertainties associated with these dose estimates: the geometric range of the 97.5%:2.5% posterior values are a factor of 100 for lung dose, 30 for doses to liver and red bone marrow, and 40 for intakes: these uncertainties are not reflected in estimates of risk when point doses are used to assess them. It is also shown that better estimates of certain key HRTM absorption parameters could significantly reduce the uncertainties on lung dose in future studies.
Phylogenetic analysis of the evolution of lactose digestion in adults.
Holden, C; Mace, R
1997-10-01
In most of the world's population the ability to digest lactose declines sharply after infancy. High lactose digestion capacity in adults is common only in populations of European and circum-Mediterranean origin and is thought to be an evolutionary adaptation to millennia of drinking milk from domestic livestock. Milk can also be consumed in a processed form, such as cheese or soured milk, which has a reduced lactose content. Two other selective pressures for drinking fresh milk with a high lactose content have been proposed: promotion of calcium uptake in high-latitude populations prone to vitamin-D deficiency and maintenance of water and electrolytes in the body in highly arid environments. These three hypotheses are all supported by the geographic distribution of high lactose digestion capacity in adults. However, the relationships between environmental variables and adult lactose digestion capacity are highly confounded by the shared ancestry of many populations whose lactose digestion capacity has been tested. The three hypotheses for the evolution of high adult lactose digestion capacity are tested here using a comparative method of analysis that takes the problem of phylogenetic confounding into account. This analysis supports the hypothesis that high adult lactose digestion capacity is an adaptation to dairying but does not support the hypotheses that lactose digestion capacity is additionally selected for either at high latitudes or in highly arid environments. Furthermore, methods using maximum likelihood are used to show that the evolution of milking preceded the evolution of high lactose digestion.
Kinetic and phylogenetic analysis of plant polyamine uptake transporters.
Mulangi, Vaishali; Chibucos, Marcus C; Phuntumart, Vipaporn; Morris, Paul F
2012-10-01
The rice gene Polyamine Uptake Transporter1 (PUT1) was originally identified based on its homology to the polyamine uptake transporters LmPOT1 and TcPAT12 in Leishmania major and Trypanosoma cruzi, respectively. Here we show that five additional transporters from rice and Arabidopsis that cluster in the same clade as PUT1 all function as high affinity spermidine uptake transporters. Yeast expression assays of these genes confirmed that uptake of spermidine was minimally affected by 166 fold or greater concentrations of amino acids. Characterized polyamine transporters from both Arabidopsis thaliana and Oryza sativa along with the two polyamine transporters from L. major and T. cruzi were aligned and used to generate a hidden Markov model. This model was used to identify significant matches to proteins in other angiosperms, bryophytes, chlorophyta, discicristates, excavates, stramenopiles and amoebozoa. No significant matches were identified in fungal or metazoan genomes. Phylogenetic analysis showed that some sequences from the haptophyte, Emiliania huxleyi, as well as sequences from oomycetes and diatoms clustered closer to sequences from plant genomes than from a homologous sequence in the red algal genome Galdieria sulphuraria, consistent with the hypothesis that these polyamine transporters were acquired by horizontal transfer from green algae. Leishmania and Trypanosoma formed a separate cluster with genes from other Discicristates and two Entamoeba species. We surmise that the genes in Entamoeba species were acquired by phagotrophy of Discicristates. In summary, phylogenetic and functional analysis has identified two clades of genes that are predictive of polyamine transport activity.
Molecular analysis and phylogenetic characterization of HIV in Iran.
Sarrami-Forooshani, Ramin; Das, Suman Ranjan; Sabahi, Farzaneh; Adeli, Ahmad; Esmaeili, Rezvan; Wahren, Britta; Mohraz, Minoo; Haji-Abdolbaghi, Mahboubeh; Rasoolinejad, Mehrnaz; Jameel, Shahid; Mahboudi, Fereidoun
2006-07-01
The rate of human immunodeficiency virus type 1 (HIV-1) infection in Iran has increased dramatically in the last few years. While the earliest cases were found in hemophiliacs, intravenous drug users are now fueling the outbreak. In this study, both the 122 clones of HIV-1 gag p17 and the 131 clones of env V1-V5 region were obtained from 61 HIV-1 seropositives belonging to these two groups in Iran. HIV-1 subtyping and phylogenetic analysis was done by heteroduplex mobility assays (HMA) and multiple clone sequencing. The result indicated all hemophiliacs are infected with HIV-1 subtype B and all intravenous drug users are infected with HIV-1 subtype A. Since intravenous drug abuse is the major transmission route in Iran, HIV-1 subtype A is likely to be the dominant viral subtype circulating in the country. The analysis of genetic distances showed subtype B viruses in Iran to be twice as heterogeneous as the subtype A viruses. In conclusion, this first molecular study of HIV-1 genotypes in Iran suggests two parallel outbreaks in distinct high-risk populations and may offer clues to the origin and spread of infection in Iran.
Bayesian analysis of input uncertainty in hydrological modeling: 2. Application
Kavetski, Dmitri; Kuczera, George; Franks, Stewart W.
2006-03-01
The Bayesian total error analysis (BATEA) methodology directly addresses both input and output errors in hydrological modeling, requiring the modeler to make explicit, rather than implicit, assumptions about the likely extent of data uncertainty. This study considers a BATEA assessment of two North American catchments: (1) French Broad River and (2) Potomac basins. It assesses the performance of the conceptual Variable Infiltration Capacity (VIC) model with and without accounting for input (precipitation) uncertainty. The results show the considerable effects of precipitation errors on the predicted hydrographs (especially the prediction limits) and on the calibrated parameters. In addition, the performance of BATEA in the presence of severe model errors is analyzed. While BATEA allows a very direct treatment of input uncertainty and yields some limited insight into model errors, it requires the specification of valid error models, which are currently poorly understood and require further work. Moreover, it leads to computationally challenging highly dimensional problems. For some types of models, including the VIC implemented using robust numerical methods, the computational cost of BATEA can be reduced using Newton-type methods.
Bayesian Angular Power Spectrum Analysis of Interferometric Data
Sutter, P. M.; Wandelt, Benjamin D.; Malu, Siddarth S.
2012-09-01
We present a Bayesian angular power spectrum and signal map inference engine which can be adapted to interferometric observations of anisotropies in the cosmic microwave background (CMB), 21 cm emission line mapping of galactic brightness fluctuations, or 21 cm absorption line mapping of neutral hydrogen in the dark ages. The method uses Gibbs sampling to generate a sampled representation of the angular power spectrum posterior and the posterior of signal maps given a set of measured visibilities in the uv-plane. We use a mock interferometric CMB observation to demonstrate the validity of this method in the flat-sky approximation when adapted to take into account arbitrary coverage of the uv-plane, mode-mode correlations due to observations on a finite patch, and heteroscedastic visibility errors. The computational requirements scale as O(n_p log n_p), where n_p measures the ratio of the size of the detector array to the inter-detector spacing, meaning that Gibbs sampling is a promising technique for meeting the data analysis requirements of future cosmology missions.
BAYESIAN ANGULAR POWER SPECTRUM ANALYSIS OF INTERFEROMETRIC DATA
Sutter, P. M.; Wandelt, Benjamin D.; Malu, Siddarth S.
2012-09-15
We present a Bayesian angular power spectrum and signal map inference engine which can be adapted to interferometric observations of anisotropies in the cosmic microwave background (CMB), 21 cm emission line mapping of galactic brightness fluctuations, or 21 cm absorption line mapping of neutral hydrogen in the dark ages. The method uses Gibbs sampling to generate a sampled representation of the angular power spectrum posterior and the posterior of signal maps given a set of measured visibilities in the uv-plane. We use a mock interferometric CMB observation to demonstrate the validity of this method in the flat-sky approximation when adapted to take into account arbitrary coverage of the uv-plane, mode-mode correlations due to observations on a finite patch, and heteroscedastic visibility errors. The computational requirements scale as O(n_p log n_p), where n_p measures the ratio of the size of the detector array to the inter-detector spacing, meaning that Gibbs sampling is a promising technique for meeting the data analysis requirements of future cosmology missions.
A Bayesian Model for the Analysis of Transgenerational Epigenetic Variation
Varona, Luis; Munilla, Sebastián; Mouresan, Elena Flavia; González-Rodríguez, Aldemar; Moreno, Carlos; Altarriba, Juan
2015-01-01
Epigenetics has become one of the major areas of biological research. However, the degree of phenotypic variability that is explained by epigenetic processes still remains unclear. From a quantitative genetics perspective, the estimation of variance components is achieved by means of the information provided by the resemblance between relatives. In a previous study, this resemblance was described as a function of the epigenetic variance component and a reset coefficient that indicates the rate of dissipation of epigenetic marks across generations. Given these assumptions, we propose a Bayesian mixed model methodology that allows the estimation of epigenetic variance from a genealogical and phenotypic database. The methodology is based on the development of a T matrix of epigenetic relationships that depends on the reset coefficient. In addition, we present a simple procedure for the calculation of the inverse of this matrix (T−1) and a Gibbs sampler algorithm that obtains posterior estimates of all the unknowns in the model. The new procedure was used with two simulated data sets and with a beef cattle database. In the simulated populations, the results of the analysis provided marginal posterior distributions that included the population parameters in the regions of highest posterior density. In the case of the beef cattle dataset, the posterior estimate of transgenerational epigenetic variability was very low and a model comparison test indicated that a model that did not include it was the most plausible. PMID:25617408
Studies in Astronomical Time Series Analysis. VI. Bayesian Block Representations
NASA Technical Reports Server (NTRS)
Scargle, Jeffrey D.; Norris, Jay P.; Jackson, Brad; Chiang, James
2013-01-01
This paper addresses the problem of detecting and characterizing local variability in time series and other forms of sequential data. The goal is to identify and characterize statistically significant variations, at the same time suppressing the inevitable corrupting observational errors. We present a simple nonparametric modeling technique and an algorithm implementing it, an improved and generalized version of Bayesian Blocks [Scargle 1998], that finds the optimal segmentation of the data in the observation interval. The structure of the algorithm allows it to be used in either a real-time trigger mode, or a retrospective mode. Maximum likelihood or marginal posterior functions to measure model fitness are presented for events, binned counts, and measurements at arbitrary times with known error distributions. Problems addressed include those connected with data gaps, variable exposure, extension to piecewise linear and piecewise exponential representations, multivariate time series data, analysis of variance, data on the circle, other data modes, and dispersed data. Simulations provide evidence that the detection efficiency for weak signals is close to a theoretical asymptotic limit derived by [Arias-Castro, Donoho and Huo 2003]. In the spirit of Reproducible Research [Donoho et al. (2008)] all of the code and data necessary to reproduce all of the figures in this paper are included as auxiliary material.
Using Bayesian Population Viability Analysis to Define Relevant Conservation Objectives
Green, Adam W.; Bailey, Larissa L.
2015-01-01
Adaptive management provides a useful framework for managing natural resources in the face of uncertainty. An important component of adaptive management is identifying clear, measurable conservation objectives that reflect the desired outcomes of stakeholders. A common objective is to have a sustainable population, or metapopulation, but it can be difficult to quantify a threshold above which such a population is likely to persist. We performed a Bayesian metapopulation viability analysis (BMPVA) using a dynamic occupancy model to quantify the characteristics of two wood frog (Lithobates sylvatica) metapopulations resulting in sustainable populations, and we demonstrate how the results could be used to define meaningful objectives that serve as the basis of adaptive management. We explored scenarios involving metapopulations with different numbers of patches (pools) using estimates of breeding occurrence and successful metamorphosis from two study areas to estimate the probability of quasi-extinction and calculate the proportion of vernal pools producing metamorphs. Our results suggest that ≥50 pools are required to ensure long-term persistence with approximately 16% of pools producing metamorphs in stable metapopulations. We demonstrate one way to incorporate the BMPVA results into a utility function that balances the trade-offs between ecological and financial objectives, which can be used in an adaptive management framework to make optimal, transparent decisions. Our approach provides a framework for using a standard method (i.e., PVA) and available information to inform a formal decision process to determine optimal and timely management policies. PMID:26658734
A Bayesian model for the analysis of transgenerational epigenetic variation.
Varona, Luis; Munilla, Sebastián; Mouresan, Elena Flavia; González-Rodríguez, Aldemar; Moreno, Carlos; Altarriba, Juan
2015-01-23
Epigenetics has become one of the major areas of biological research. However, the degree of phenotypic variability that is explained by epigenetic processes still remains unclear. From a quantitative genetics perspective, the estimation of variance components is achieved by means of the information provided by the resemblance between relatives. In a previous study, this resemblance was described as a function of the epigenetic variance component and a reset coefficient that indicates the rate of dissipation of epigenetic marks across generations. Given these assumptions, we propose a Bayesian mixed model methodology that allows the estimation of epigenetic variance from a genealogical and phenotypic database. The methodology is based on the development of a T matrix of epigenetic relationships that depends on the reset coefficient. In addition, we present a simple procedure for the calculation of the inverse of this matrix (T−1) and a Gibbs sampler algorithm that obtains posterior estimates of all the unknowns in the model. The new procedure was used with two simulated data sets and with a beef cattle database. In the simulated populations, the results of the analysis provided marginal posterior distributions that included the population parameters in the regions of highest posterior density. In the case of the beef cattle dataset, the posterior estimate of transgenerational epigenetic variability was very low and a model comparison test indicated that a model that did not include it was the most plausible.
Using Bayesian Population Viability Analysis to Define Relevant Conservation Objectives.
Green, Adam W; Bailey, Larissa L
2015-01-01
Adaptive management provides a useful framework for managing natural resources in the face of uncertainty. An important component of adaptive management is identifying clear, measurable conservation objectives that reflect the desired outcomes of stakeholders. A common objective is to have a sustainable population, or metapopulation, but it can be difficult to quantify a threshold above which such a population is likely to persist. We performed a Bayesian metapopulation viability analysis (BMPVA) using a dynamic occupancy model to quantify the characteristics of two wood frog (Lithobates sylvatica) metapopulations resulting in sustainable populations, and we demonstrate how the results could be used to define meaningful objectives that serve as the basis of adaptive management. We explored scenarios involving metapopulations with different numbers of patches (pools) using estimates of breeding occurrence and successful metamorphosis from two study areas to estimate the probability of quasi-extinction and calculate the proportion of vernal pools producing metamorphs. Our results suggest that ≥50 pools are required to ensure long-term persistence with approximately 16% of pools producing metamorphs in stable metapopulations. We demonstrate one way to incorporate the BMPVA results into a utility function that balances the trade-offs between ecological and financial objectives, which can be used in an adaptive management framework to make optimal, transparent decisions. Our approach provides a framework for using a standard method (i.e., PVA) and available information to inform a formal decision process to determine optimal and timely management policies.
STUDIES IN ASTRONOMICAL TIME SERIES ANALYSIS. VI. BAYESIAN BLOCK REPRESENTATIONS
Scargle, Jeffrey D.; Norris, Jay P.; Jackson, Brad; Chiang, James
2013-02-20
This paper addresses the problem of detecting and characterizing local variability in time series and other forms of sequential data. The goal is to identify and characterize statistically significant variations, at the same time suppressing the inevitable corrupting observational errors. We present a simple nonparametric modeling technique and an algorithm implementing it, an improved and generalized version of Bayesian Blocks, that finds the optimal segmentation of the data in the observation interval. The structure of the algorithm allows it to be used in either a real-time trigger mode, or a retrospective mode. Maximum likelihood or marginal posterior functions to measure model fitness are presented for events, binned counts, and measurements at arbitrary times with known error distributions. Problems addressed include those connected with data gaps, variable exposure, extension to piecewise linear and piecewise exponential representations, multivariate time series data, analysis of variance, data on the circle, other data modes, and dispersed data. Simulations provide evidence that the detection efficiency for weak signals is close to a theoretical asymptotic limit derived by Arias-Castro et al. In the spirit of Reproducible Research all of the code and data necessary to reproduce all of the figures in this paper are included as supplementary material.
A Bayesian analysis of the 2016 Pedernales (Ecuador) earthquake
Gombert, Baptiste; Duputel, Zacharie; Jolivet, Romain; Rivera, Luis; Simons, Mark; Jiang, Junle; Liang, Cunren; Fielding, Eric
2017-04-01
A Mw 7.8 earthquake struck Ecuador on April 16, 2016, causing significant damage and casualties. Long period W-phase and Global CMT solutions suggest that fault slip for this event agrees with the convergence obliquity of the Ecuadorian subduction. We present a new co-seismic kinematic slip model obtained from the joint inversion of multiple observations in an unregularized and fully Bayesian framework. We use a comprehensive static dataset composed of several SAR interferograms, GPS static offsets, and tsunami waveforms from two nearby DART stations. The kinematic component of the rupture process is constrained by an extensive set of high-rate GPS and seismic data. Our solution includes the ensemble of all plausible slip models that are consistent with our prior information and fit the available observations within data and prediction uncertainties. We analyze the source process in light of the historical seismicity, in particular the Mw 7.8 1942 earthquake for which the rupture extent overlaps with the 2016 event. In addition, we conduct a probabilistic comparison of co-seismic slip with a stochastic interseismic coupling model obtained from GPS data. This analysis gives new insights on the processes at play within the Ecuadorian subduction margin.
Bayesian analysis of a reduced-form air quality model.
Foley, Kristen M; Reich, Brian J; Napelenok, Sergey L
2012-07-17
Numerical air quality models are being used for assessing emission control strategies for improving ambient pollution levels across the globe. This paper applies probabilistic modeling to evaluate the effectiveness of emission reduction scenarios aimed at lowering ground-level ozone concentrations. A Bayesian hierarchical model is used to combine air quality model output and monitoring data in order to characterize the impact of emissions reductions while accounting for different degrees of uncertainty in the modeled emissions inputs. The probabilistic model predictions are weighted based on population density in order to better quantify the societal benefits/disbenefits of four hypothetical emission reduction scenarios in which domain-wide NO(x) emissions from various sectors are reduced individually and then simultaneously. Cross validation analysis shows the statistical model performs well compared to observed ozone levels. Accounting for the variability and uncertainty in the emissions and atmospheric systems being modeled is shown to impact how emission reduction scenarios would be ranked, compared to standard methodology.
Spatial Hierarchical Bayesian Analysis of the Historical Extreme Streamflow
Najafi, M. R.; Moradkhani, H.
2012-04-01
Analysis of the climate change impact on extreme hydro-climatic events is crucial for future hydrologic/hydraulic designs and water resources decision making. The purpose of this study is to investigate the changes of the extreme value distribution parameters with respect to time to reflect upon the impact of climate change. We develop a statistical model using the observed streamflow data of the Columbia River Basin in the USA to estimate the changes of high flows as a function of time as well as other variables. The Generalized Pareto Distribution (GPD) is used to model the upper 95% flows during December through March for 31 gauge stations. In the process layer of the model, covariates including time, latitude, longitude, elevation and basin area are considered to assess the sensitivity of the model to each variable. A Markov Chain Monte Carlo (MCMC) method is used to estimate the parameters. The Spatial Hierarchical Bayesian technique models the GPD parameters spatially and borrows strength from other locations by pooling data together, while providing an explicit estimation of the uncertainties in all stages of modeling.
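For readers unfamiliar with fitting a GPD to high flows, the sketch below extracts exceedances over a quantile threshold and estimates the GPD shape and scale for a single station. It deliberately swaps in a simple method-of-moments estimator in place of the MCMC-based spatial hierarchical model described in the abstract; the 0.95 quantile threshold, the function names and the absence of covariates are assumptions made for illustration only.

```python
def exceedances(values, quantile=0.95):
    """Return the threshold at the given quantile and the excesses above it."""
    ordered = sorted(values)
    threshold = ordered[int(quantile * len(ordered))]
    return threshold, [v - threshold for v in ordered if v > threshold]


def gpd_moments(excess):
    """Method-of-moments estimates of the GPD shape and scale.

    Uses mean = scale / (1 - shape) and
    variance = scale^2 / ((1 - shape)^2 * (1 - 2 * shape)),
    which are valid only when shape < 0.5.
    """
    n = len(excess)
    mean = sum(excess) / n
    var = sum((x - mean) ** 2 for x in excess) / (n - 1)
    ratio = mean * mean / var  # equals 1 - 2 * shape in the GPD
    shape = 0.5 * (1.0 - ratio)
    scale = 0.5 * mean * (ratio + 1.0)
    return shape, scale
```

Applied to exponentially distributed synthetic flows, the estimated shape should be close to zero and the scale close to the exponential mean, since the exponential distribution is the zero-shape special case of the GPD.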
A Bayesian Analysis of Regularised Source Inversions in Gravitational Lensing
Suyu, Sherry H.; Marshall, P.J.; Hobson, M.P.; Blandford, R.D.; /Caltech /KIPAC, Menlo Park
2006-01-25
Strong gravitational lens systems with extended sources are of special interest because they provide additional constraints on the models of the lens systems. To use a gravitational lens system for measuring the Hubble constant, one would need to determine the lens potential and the source intensity distribution simultaneously. A linear inversion method to reconstruct a pixellated source distribution of a given lens potential model was introduced by Warren and Dye. In the inversion process, a regularization on the source intensity is often needed to ensure a successful inversion with a faithful resulting source. In this paper, we use Bayesian analysis to determine the optimal regularization constant (strength of regularization) of a given form of regularization and to objectively choose the optimal form of regularization given a selection of regularizations. We consider and compare quantitatively three different forms of regularization previously described in the literature for source inversions in gravitational lensing: zeroth-order, gradient and curvature. We use simulated data with the exact lens potential to demonstrate the method. We find that the preferred form of regularization depends on the nature of the source distribution.
NASA Astrophysics Data System (ADS)
Loredo, Thomas J.; Hendry, Martin; Kowal, Daniel; Ruppert, David
2016-01-01
Synoptic time-domain surveys provide astronomers, not simply more data, but a different kind of data: large ensembles of multivariate, irregularly and asynchronously sampled light curves. We describe a statistical framework for light curve demography—optimal accumulation and extraction of information, not only along individual light curves as conventional methods do, but also across large ensembles of related light curves. We build the framework using tools from functional data analysis (FDA), a rapidly growing area of statistics that addresses inference from datasets that sample ensembles of related functions. Our Bayesian FDA framework builds hierarchical models that describe light curve ensembles using multiple levels of randomness: upper levels describe the source population, and lower levels describe the observation process, including measurement errors and selection effects. Roughly speaking, a particular object's light curve is modeled as the sum of a parameterized template component (modeling population-averaged behavior) and a peculiar component (modeling variability across the population), subsequently subjected to an observation model. A functional shrinkage adjustment to individual light curves emerges—an adaptive, functional generalization of the kind of adjustments made for Eddington or Malmquist bias in single-epoch photometric surveys. We describe ongoing work applying the framework to improved estimation of Cepheid variable star luminosities via FDA-based refinement and generalization of the Cepheid period-luminosity relation.
Light curve demography via Bayesian functional data analysis
Loredo, Thomas; Budavari, Tamas; Hendry, Martin A.; Kowal, Daniel; Ruppert, David
2015-08-01
Synoptic time-domain surveys provide astronomers, not simply more data, but a different kind of data: large ensembles of multivariate, irregularly and asynchronously sampled light curves. We describe a statistical framework for light curve demography—optimal accumulation and extraction of information, not only along individual light curves as conventional methods do, but also across large ensembles of related light curves. We build the framework using tools from functional data analysis (FDA), a rapidly growing area of statistics that addresses inference from datasets that sample ensembles of related functions. Our Bayesian FDA framework builds hierarchical models that describe light curve ensembles using multiple levels of randomness: upper levels describe the source population, and lower levels describe the observation process, including measurement errors and selection effects. Schematically, a particular object's light curve is modeled as the sum of a parameterized template component (modeling population-averaged behavior) and a peculiar component (modeling variability across the population), subsequently subjected to an observation model. A functional shrinkage adjustment to individual light curves emerges—an adaptive, functional generalization of the kind of adjustments made for Eddington or Malmquist bias in single-epoch photometric surveys. We are applying the framework to a variety of problems in synoptic time-domain survey astronomy, including optimal detection of weak sources in multi-epoch data, and improved estimation of Cepheid variable star luminosities from detailed demographic modeling of ensembles of Cepheid light curves.
Mugosa, Boban; Cella, Eleonora; Lai, Alessia; Lo Presti, Alessandra; Blasi, Aletheia; Vratnica, Zoran; Vujoševic, Danijela; Ebranati, Erika; Lauševic, Dragan; Guarino, Michele; Zehender, Gianguglielmo; Milano, Teresa; Pascarella, Stefano; Spoto, Silvia; Angeletti, Silvia; Ciccozzi, Massimo
2017-06-01
Few reports are available on HCV molecular epidemiology among IDUs in Eastern Europe, and none in Montenegro. The aim of this study was to investigate the HCV genotype distribution in Montenegro among IDUs and to perform Bayesian and evolutionary analysis of the most prevalent HCV genotype circulating in this population. Sixty-four HCV-positive IDUs in Montenegro were enrolled between 2013 and 2014, and the NS5B gene was sequenced. The Bayesian analysis showed that the most prevalent subtype was HCV-3a. Phylogenetic data showed that HCV-3a reached Montenegro in the late 1990s, causing an epidemic that grew exponentially between 1995 and 2005. In the dated tree, four different entries, from 1990 (clade D), 1994 (clade A) to 1999 (clade B) and 2001 (clade C), were identified. In the NS5B protein model, the amino acid variations were located mainly in the palm domain, which contains most of the conserved structural elements of the active site. This study provides an analysis of the virus transmission pathway and the evolution of HCV genotype 3a among IDUs in Montenegro. These data could represent the basis for further strategies aimed at improving disease management and surveillance program development in high-risk populations.
Nuclear stockpile stewardship and Bayesian image analysis (DARHT and the BIE)
Carroll, James L
2011-01-11
Since the end of nuclear testing, the reliability of our nation's nuclear weapon stockpile has been assessed using sub-critical hydrodynamic testing. These tests involve some pretty 'extreme' radiography. We will be discussing the challenges of these tests and the solutions provided by DARHT (the world's premier hydrodynamic testing facility) and the BIE or Bayesian Inference Engine (a powerful radiography analysis software tool). We will discuss the application of Bayesian image analysis techniques to this important and difficult problem.
Erosion of phylogenetic signal in tunicate mitochondrial genomes on different levels of analysis.
Stach, Thomas; Braband, Anke; Podsiadlowski, Lars
2010-06-01
The molecular phylogenetic position of Tunicata and the internal interrelationships of higher tunicate taxa are controversial. High substitution rates and extreme gene order variability hamper phylogenetic analyses. We describe the sequence and organization of the mitochondrial genome of the aplousobranch ascidian Clavelina lepadiformis and use mitochondrial genomes to investigate phylogenetic information content on different molecular levels of comparison. Despite agreement in phylogenetic analyses of nucleotide and amino acid sequences, split analyses revealed little phylogenetic signal. Split analyses on molecular data sets deemed increasingly conservative demonstrated that the lack of signal pervades all levels and that it is Tunicata, the taxon of interest, that introduces noise into the data sets. The strongest signal present in our molecular data sets as revealed by split analyses is not present in the optimal cladograms and supports a sister group relationship between cephalochordates and craniates. Phylogenetic analysis of gene order using common interval algorithms shows that phylogenetic signal is also eroded with respect to gene positions. Even functional constraints, such as partial gene overlap as exemplified by the commonly observed adjacency between cox2 and cytb, are subject to homoplasy. However, rare phylogenetic events like this hold some promise to retain phylogenetic information even in such cases of extreme variability. We therefore caution against relying on sequence analysis alone and recommend investigation into the signal content of molecular data sets in order to assess the strength of phylogenetic signal.
Reporting of Bayesian analysis in epidemiologic research should become more transparent.
Rietbergen, Charlotte; Debray, Thomas P A; Klugkist, Irene; Janssen, Kristel J M; Moons, Karel G M
2017-06-01
The objective of this systematic review is to investigate the use of Bayesian data analysis in epidemiology in the past decade and particularly to evaluate the quality of research papers reporting the results of these analyses. Complete volumes of five major epidemiological journals in the period 2005-2015 were searched via PubMed. In addition, we performed an extensive within-manuscript search using a specialized Java application. Details of reporting on Bayesian statistics were examined in the original research papers with primary Bayesian data analyses. The number of studies in which Bayesian techniques were used for primary data analysis remains constant over the years. Though many authors presented thorough descriptions of the analyses they performed and the results they obtained, several reports presented incomplete method sections and even some incomplete result sections. Especially, information on the process of prior elicitation, specification, and evaluation was often lacking. Though available guidance papers concerned with reporting of Bayesian analyses emphasize the importance of transparent prior specification, the results obtained in this systematic review show that these guidance papers are often not used. Additional efforts should be made to increase the awareness of the existence and importance of these checklists to overcome the controversy with respect to the use of Bayesian techniques. The reporting quality in epidemiological literature could be improved by updating existing guidelines on the reporting of frequentist analyses to address issues that are important for Bayesian data analyses. Copyright © 2017 Elsevier Inc. All rights reserved.
Reidenbach, Kyanne R; Cook, Shelley; Bertone, Matthew A; Harbach, Ralph E; Wiegmann, Brian M; Besansky, Nora J
2009-12-22
Phylogenetic analyses provide a framework for examining the evolution of morphological and molecular diversity, interpreting patterns in biogeography, and achieving a stable classification. The generic and suprageneric relationships within mosquitoes (Diptera: Culicidae) are poorly resolved, making these subjects difficult to address. We carried out maximum parsimony and maximum likelihood, including Bayesian, analyses on a data set consisting of six nuclear genes and 80 morphological characters to assess their ability to resolve relationships among 25 genera. We also estimated divergence times based on sequence data and fossil calibration points, using Bayesian relaxed clock methods. Strong support was recovered for the basal position and monophyly of the subfamily Anophelinae and the tribes Aedini and Sabethini of subfamily Culicinae. Divergence times for major culicid lineages date to the early Cretaceous. Deeper relationships within the family remain poorly resolved, suggesting the need for additional taxonomic sampling. Our results support the notion of rapid radiations early in the diversification of mosquitoes.
2009-01-01
[Molecular phylogenetic analysis of Paecilomyces hepiali and Cordyceps sinensis].
Yang, Jin-Ling; Xiao, Wei; He, Hui-Xia; Zhu, Hui-Xin; Wang, Shu-Fang; Cheng, Ke-Di; Zhu, Ping
2008-04-01
Phylogenetic relationship between Paecilomyces hepiali and Cordyceps sinensis was studied by analyzing the sequence of rDNA-ITS. The samples of C. sinensis were collected from Hualong County in Qinghai Province and Kangding County in Sichuan Province in May and June, respectively. The rDNA-ITS fragments were obtained by PCR amplification with the template genomic DNA of the fresh stroma or caterpillar body of the collected samples and the cultured mycelium of P. hepiali, with the universal fungal primers ITS1/ITS4. The amplified fragments were cloned into pMD18-T Vector and sequenced. Phylogenetic analysis was performed with these sequences and those from GenBank. The result showed that all of the 46 clones randomly chosen from the amplification of C. sinensis shared identical or almost identical rDNA-ITS regions and had over 99% identity with some rDNA-ITS sequences of Hirsutella sinensis and C. sinensis registered in GenBank, but all of them had only about 72% identity with that of P. hepiali. Two pairs of specific primers were designed based on the rDNA-ITS sequence of P. hepiali, then PCR and Nest-PCR were performed with the template genomic DNA of the stroma or caterpillar body of C. sinensis samples mentioned above. The apparent bands amplified by Nest-PCR were obtained from all of the samples, and the sequences showed 100% identity with the rDNA-ITS sequence of P. hepiali. In addition, another pair of specific primers were designed based on the rDNA-ITS sequence registered in GenBank as the marker of C. sinensis (accession no. AB067740) but the latter only shared 87.3% identity with that of H. sinensis (accession no. AJ309353). This pair of primers was used to amplify the C. sinensis samples by PCR, and the amplified sequence showed 100% identity with that of AB067740. The result indicated that H. sinensis is the main body of C. sinensis, while some other endoparasitic fungi such as P. hepiali commonly exist in the natural C. sinensis.
Molecular phylogenetic analysis of mango mealybug, Drosicha mangiferae from Punjab.
Banta, Geetika; Jindal, Vikas; Mohindru, Bharathi; Sharma, Sachin; Kaur, Jaimeet; Gupta, V K
2016-01-01
Mealybugs (Hemiptera: Pseudococcidae) are major pests of a wide range of crops and ornamental plants worldwide. Their high degree of morphological similarity makes them difficult to identify and limits their study and management. In the present study, four Indian populations of mango mealybug (from mango, litchi and guava in Gurdaspur and from mango in Jalandhar) were analyzed. The mtCOI region was amplified and cloned, and the nucleotide sequences were determined and analysed. All four populations were found to be D. mangiferae. The populations from litchi and mango in Gurdaspur showed 100% homologous sequences. The Guava-Gurdaspur and Mango-Jalandhar populations each showed a single mutation of 'C' instead of 'T', at the 18th and 196th positions, respectively. Indian populations were compared with populations from Pakistan (21) and Japan (1). The phylogenetic tree resolved into two main clusters. Cluster 1 comprised all 4 populations from Punjab, India, and 20 from Pakistan (Punjab, Sind, Lahore, Multan, Faisalabad and Karak districts), with homologous sequences. The two populations collected from Faisalabad district of Pakistan and from Japan formed a separate cluster 2 because the gene sequence used in the analysis was from the COI-3p region. However, all the other sequences of D. mangiferae samples under study showed low nucleotide divergence. The homologous mtCOI sequences of the Indian and Pakistani populations indicate that genetic diversity in the mealybug population is low over a large geographical area.
Inventory and phylogenetic analysis of meiotic genes in monogonont rotifers.
Hanson, Sara J; Schurko, Andrew M; Hecox-Lea, Bette; Welch, David B Mark; Stelzer, Claus-Peter; Logsdon, John M
2013-01-01
A long-standing question in evolutionary biology is how sexual reproduction has persisted in eukaryotic lineages. As cyclical parthenogens, monogonont rotifers are a powerful model for examining this question, yet the molecular nature of sexual reproduction in this lineage is currently understudied. To examine genes involved in meiosis, we generated partial genome assemblies for 2 distantly related monogonont species, Brachionus calyciflorus and B. manjavacas. Here we present an inventory of 89 meiotic genes, of which 80 homologs were identified and annotated from these assemblies. Using phylogenetic analysis, we show that several meiotic genes have undergone relatively recent duplication events that appear to be specific to the monogonont lineage. Further, we compare the expression of "meiosis-specific" genes involved in recombination and all annotated copies of the cell cycle regulatory gene CDC20 between obligate parthenogenetic (OP) and cyclical parthenogenetic (CP) strains of B. calyciflorus. We show that "meiosis-specific" genes are expressed in both CP and OP strains, whereas the expression of one of the CDC20 genes is specific to cyclical parthenogenesis. The data presented here provide insights into mechanisms of cyclical parthenogenesis and establish expectations for studies of obligate asexual relatives of monogononts, the bdelloid rotifer lineage.
Inventory and Phylogenetic Analysis of Meiotic Genes in Monogonont Rotifers
2013-01-01
A long-standing question in evolutionary biology is how sexual reproduction has persisted in eukaryotic lineages. As cyclical parthenogens, monogonont rotifers are a powerful model for examining this question, yet the molecular nature of sexual reproduction in this lineage is currently understudied. To examine genes involved in meiosis, we generated partial genome assemblies for 2 distantly related monogonont species, Brachionus calyciflorus and B. manjavacas. Here we present an inventory of 89 meiotic genes, of which 80 homologs were identified and annotated from these assemblies. Using phylogenetic analysis, we show that several meiotic genes have undergone relatively recent duplication events that appear to be specific to the monogonont lineage. Further, we compare the expression of “meiosis-specific” genes involved in recombination and all annotated copies of the cell cycle regulatory gene CDC20 between obligate parthenogenetic (OP) and cyclical parthenogenetic (CP) strains of B. calyciflorus. We show that “meiosis-specific” genes are expressed in both CP and OP strains, whereas the expression of one of the CDC20 genes is specific to cyclical parthenogenesis. The data presented here provide insights into mechanisms of cyclical parthenogenesis and establish expectations for studies of obligate asexual relatives of monogononts, the bdelloid rotifer lineage. PMID:23487324
Bayesian analysis of anisotropic cosmologies: Bianchi VIIh and WMAP
McEwen, J. D.; Josset, T.; Feeney, S. M.; Peiris, H. V.; Lasenby, A. N.
2013-12-01
We perform a definitive analysis of Bianchi VIIh cosmologies with Wilkinson Microwave Anisotropy Probe (WMAP) observations of the cosmic microwave background (CMB) temperature anisotropies. Bayesian analysis techniques are developed to study anisotropic cosmologies using full-sky and partial-sky masked CMB temperature data. We apply these techniques to analyse the full-sky internal linear combination (ILC) map and a partial-sky masked W-band map of WMAP 9 yr observations. In addition to the physically motivated Bianchi VIIh model, we examine phenomenological models considered in previous studies, in which the Bianchi VIIh parameters are decoupled from the standard cosmological parameters. In the two phenomenological models considered, Bayes factors of 1.7 and 1.1 units of log-evidence favouring a Bianchi component are found in full-sky ILC data. The corresponding best-fitting Bianchi maps recovered are similar for both phenomenological models and are very close to those found in previous studies using earlier WMAP data releases. However, no evidence for a phenomenological Bianchi component is found in the partial-sky W-band data. In the physical Bianchi VIIh model, we find no evidence for a Bianchi component: WMAP data thus do not favour Bianchi VIIh cosmologies over the standard Λ cold dark matter (ΛCDM) cosmology. It is not possible to discount Bianchi VIIh cosmologies in favour of ΛCDM completely, but we are able to constrain the vorticity of physical Bianchi VIIh cosmologies at (ω/H)_0 < 8.6 × 10^-10 with 95 per cent confidence.
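To give a feel for what Bayes factors of 1.7 and 1.1 units of log-evidence mean, the sketch below converts a log-evidence difference into posterior odds and a posterior model probability. It assumes the log is the natural logarithm and that the two models have equal prior odds unless stated otherwise; both are assumptions of this illustration, and the function name is hypothetical.

```python
import math

def posterior_model_probability(delta_ln_evidence, prior_odds=1.0):
    """Posterior probability of model A over model B.

    delta_ln_evidence : ln(evidence_A) - ln(evidence_B), assumed natural log
    prior_odds        : prior odds of A against B (1.0 means no prior preference)
    """
    odds = prior_odds * math.exp(delta_ln_evidence)
    return odds / (1.0 + odds)

# Log-evidence differences quoted in the abstract for the two phenomenological models.
for delta in (1.7, 1.1):
    print(delta, round(posterior_model_probability(delta), 3))
```

Under these assumptions the quoted values correspond to odds of only a few to one, which is why they read as weak rather than decisive support.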
Oliveira, Claudio; Avelino, Gleisy S; Abe, Kelly T; Mariguela, Tatiane C; Benine, Ricardo C; Ortí, Guillermo; Vari, Richard P; Corrêa e Castro, Ricardo M
2011-09-26
With nearly 1,100 species, the fish family Characidae represents more than half of the species of Characiformes, and is a key component of Neotropical freshwater ecosystems. The composition, phylogeny, and classification of Characidae is currently uncertain, despite significant efforts based on analysis of morphological and molecular data. No consensus about the monophyly of this group or its position within the order Characiformes has been reached, challenged by the fact that many key studies to date have non-overlapping taxonomic representation and focus only on subsets of this diversity. In the present study we propose a new definition of the family Characidae and a hypothesis of relationships for the Characiformes based on phylogenetic analysis of DNA sequences of two mitochondrial and three nuclear genes (4,680 base pairs). The sequences were obtained from 211 samples representing 166 genera distributed among all 18 recognized families in the order Characiformes, all 14 recognized subfamilies in the Characidae, plus 56 of the genera so far considered incertae sedis in the Characidae. The phylogeny obtained is robust, with most lineages significantly supported by posterior probabilities in Bayesian analysis, and high bootstrap values from maximum likelihood and parsimony analyses. A monophyletic assemblage strongly supported in all our phylogenetic analysis is herein defined as the Characidae and includes the characiform species lacking a supraorbital bone and with a derived position of the emergence of the hyoid artery from the anterior ceratohyal. To recognize this and several other monophyletic groups within characiforms we propose changes in the limits of several families to facilitate future studies in the Characiformes and particularly the Characidae. This work presents a new phylogenetic framework for a speciose and morphologically diverse group of freshwater fishes of significant ecological and evolutionary importance across the Neotropics and portions
2011-01-01
Background With nearly 1,100 species, the fish family Characidae represents more than half of the species of Characiformes, and is a key component of Neotropical freshwater ecosystems. The composition, phylogeny, and classification of Characidae is currently uncertain, despite significant efforts based on analysis of morphological and molecular data. No consensus about the monophyly of this group or its position within the order Characiformes has been reached, challenged by the fact that many key studies to date have non-overlapping taxonomic representation and focus only on subsets of this diversity. Results In the present study we propose a new definition of the family Characidae and a hypothesis of relationships for the Characiformes based on phylogenetic analysis of DNA sequences of two mitochondrial and three nuclear genes (4,680 base pairs). The sequences were obtained from 211 samples representing 166 genera distributed among all 18 recognized families in the order Characiformes, all 14 recognized subfamilies in the Characidae, plus 56 of the genera so far considered incertae sedis in the Characidae. The phylogeny obtained is robust, with most lineages significantly supported by posterior probabilities in Bayesian analysis, and high bootstrap values from maximum likelihood and parsimony analyses. Conclusion A monophyletic assemblage strongly supported in all our phylogenetic analysis is herein defined as the Characidae and includes the characiform species lacking a supraorbital bone and with a derived position of the emergence of the hyoid artery from the anterior ceratohyal. To recognize this and several other monophyletic groups within characiforms we propose changes in the limits of several families to facilitate future studies in the Characiformes and particularly the Characidae. This work presents a new phylogenetic framework for a speciose and morphologically diverse group of freshwater fishes of significant ecological and evolutionary importance
A New Orchid Genus, Danxiaorchis, and Phylogenetic Analysis of the Tribe Calypsoeae
Zhai, Jun-Wen; Zhang, Guo-Qiang; Chen, Li-Jun; Xiao, Xin-Ju; Liu, Ke-Wei; Tsai, Wen-Chieh; Hsiao, Yu-Yun; Tian, Huai-Zhen; Zhu, Jia-Qiang; Wang, Mei-Na; Wang, Fa-Guo; Xing, Fu-Wu; Liu, Zhong-Jian
2013-01-01
Background Orchids have numerous species, and their speciation rates are presumed to be exceptionally high, suggesting that orchids are continuously and actively evolving. The wide diversity of orchids has attracted the interest of evolutionary biologists. In this study, a new orchid was discovered on Danxia Mountain in Guangdong, China. However, the phylogenetic clarification of this new orchid requires further molecular, morphological, and phytogeographic analyses. Methodology/Principal Findings A new orchid possesses a labellum with a large Y-shaped callus and two sacs at the base, and cylindrical, fleshy seeds, which make it distinct from all known orchid genera. Phylogenetic methods were applied to a matrix of morphological and molecular characters based on the fragments of the nuclear internal transcribed spacer, chloroplast matK, and rbcL genes of Orchidaceae (74 genera) and Calypsoeae (13 genera). The strict consensus Bayesian inference phylogram strongly supports the division of the Calypsoeae alliance (not including Dactylostalix and Ephippianthus) into seven clades with 11 genera. The sequence data of each species and the morphological characters of each genus were combined into a single dataset. The inferred Bayesian phylogram supports the division of the 13 genera of Calypsoeae into four clades with 13 subclades (genera). Based on the results of our phylogenetic analyses, Calypsoeae, under which the new orchid is classified, represents an independent lineage in the Epidendroideae subfamily. Conclusions Analyses of the combined datasets using Bayesian methods revealed strong evidence that Calypsoeae is a monophyletic tribe consisting of eight well-supported clades with 13 subclades (genera), which are all in agreement with the phytogeography of Calypsoeae. The Danxia orchid represents an independent lineage under the tribe Calypsoeae of the subfamily Epidendroideae. This lineage should be treated as a new genus, which we have named Danxiaorchis, that is
Bayesian Estimation and Testing in Random Effects Meta-analysis of Rare Binary Adverse Events.
Bai, Ou; Chen, Min; Wang, Xinlei
Meta-analysis has been widely applied to rare adverse event data because it is very difficult to reliably detect the effect of a treatment on such events in an individual clinical study. However, it is known that standard meta-analysis methods are often biased, especially when the background incidence rate is very low. A recent work by Bhaumik et al. (2012) proposed new moment-based approaches under a natural random effects model, to improve estimation and testing of the treatment effect and the between-study heterogeneity parameter. It has been demonstrated that for rare binary events, their methods have superior performance to commonly-used meta-analysis methods. However, their comparison does not include any Bayesian methods, although Bayesian approaches are a natural and attractive choice under the random-effects model. In this paper, we study a Bayesian hierarchical approach to estimation and testing in meta-analysis of rare binary events using the random effects model in Bhaumik et al. (2012). We develop Bayesian estimators of the treatment effect and the heterogeneity parameter, as well as hypothesis testing methods based on Bayesian model selection procedures. We compare them with the existing methods through simulation. A data example is provided to illustrate the Bayesian approach as well.
Dembo, Mana; Matzke, Nicholas J.; Mooers, Arne Ø.; Collard, Mark
2015-01-01
The phylogenetic relationships of several hominin species remain controversial. Two methodological issues contribute to the uncertainty—use of partial, inconsistent datasets and reliance on phylogenetic methods that are ill-suited to testing competing hypotheses. Here, we report a study designed to overcome these issues. We first compiled a supermatrix of craniodental characters for all widely accepted hominin species. We then took advantage of recently developed Bayesian methods for building trees of serially sampled tips to test among hypotheses that have been put forward in three of the most important current debates in hominin phylogenetics—the relationship between Australopithecus sediba and Homo, the taxonomic status of the Dmanisi hominins, and the place of the so-called hobbit fossils from Flores, Indonesia, in the hominin tree. Based on our results, several published hypotheses can be statistically rejected. For example, the data do not support the claim that Dmanisi hominins and all other early Homo specimens represent a single species, nor that the hobbit fossils are the remains of small-bodied modern humans, one of whom had Down syndrome. More broadly, our study provides a new baseline dataset for future work on hominin phylogeny and illustrates the promise of Bayesian approaches for understanding hominin phylogenetic relationships. PMID:26202999
Wu, Yu-Peng; Zhao, Jin-Liang; Su, Tian-Juan; Luo, A-Rong; Zhu, Chao-Dong
2016-10-10
To better understand the diversity and phylogeny of Lepidoptera, the complete mitochondrial genome of Choristoneura longicellana (=Hoshinoa longicellana) was determined. It is a typical circular duplex molecule 15,759 bp in length, containing the standard metazoan set of 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes, and an A+T-rich region. All of the inferred tRNA secondary structures show the common cloverleaf pattern, with the exception of trnS1(AGN), which lacks the DHU arm. The rrnL of C. longicellana is the longest among sequenced lepidopterans. C. longicellana has the same gene order as all lepidopteran species currently available in GenBank. There are 5 overlapping regions ranging from 1 bp to 8 bp and 14 intergenic spacers ranging from 1 bp to 48 bp. In addition, there are four similar tandem macro-satellite regions, with lengths of 101 bp, 98 bp, 92 bp, and 92 bp respectively, in the A+T-rich region of C. longicellana. We sampled 89 species representing 13 superfamilies and reconstructed their relationships within Lepidoptera by Bayesian Inference and Maximum Likelihood analysis. The topologies of the two phylogenetic trees are roughly identical, except for the placement of Cossoidea; the positions of Cossoidea, Copromorphoidea, Gelechioidea, and Zygaenoidea could not be determined based on the limited sampling. (Geometroidea+(Noctuoidea+Bombycoidea)) form the Macrolepidoptera "core". Pyraloidea group with the "core" Macrolepidoptera. Papilionoidea are not Macrolepidoptera. The Hesperiidae (representing Hesperioidea) are nested in the Papilionoidea and closely related to Pieridae and Papilionidae. The well-known relationship of (Nymphalidae+(Riodinidae+Lycaenidae)) is recovered in this paper.
Huang, Jie; Yang, Bo; Yan, Chaochao; Yang, Chengzhong; Tu, Feiyun; Zhang, Xiuyue; Yue, Bisong
2014-08-01
The mountain weasel (Mustela altaica), a member of the family Mustelidae, is listed as a near threatened species in the IUCN Red List. In this study, the complete mitochondrial genome of M. altaica was sequenced and characterized. The genome is 16,521 bases in length (GenBank accession no. KC815122). The nucleotide sequence data of 12 heavy-strand protein-coding genes of M. altaica and 20 other Mustelidae species were used for phylogenetic analyses. Trees constructed using Bayesian inference, maximum parsimony and maximum likelihood demonstrated that M. altaica was close to Mustela nivalis and that together they were sister to Mustela putorius and Mustela sibirica.
Both Bayesian analysis assuming independence and discriminant function analysis have been used to estimate probabilities of coronary disease. To compare their relative accuracy, we submitted 303 subjects referred for coronary angiography to stress electrocardiography, thallium scintigraphy, and cine fluoroscopy. Severe angiographic disease was defined as at least one greater than 50% occlusion of a major vessel. Four calculations were done: (1) Bayesian analysis using literature estimates of pretest probabilities, sensitivities, and specificities was applied to the clinical and test data of a randomly selected subgroup (group I, 151 patients) to calculate posttest probabilities. (2) Bayesian analysis using literature estimates of pretest probabilities (but with sensitivities and specificities derived from the remaining 152 subjects [group II]) was applied to group I data to estimate posttest probabilities. (3) A discriminant function with logistic regression coefficients derived from the clinical and test variables of group II was used to calculate posttest probabilities of group I. (4) A discriminant function derived with the use of test results from group II and pretest probabilities from the literature was used to calculate posttest probabilities of group I. Receiver operating characteristic curve analysis showed that all four calculations could equivalently rank the disease probabilities for our patients. A goodness-of-fit analysis suggested the following relationship between the accuracies of the four calculations: (1) less than (2) approximately equal to (4) less than (3). Our results suggest that data-based discriminant functions are more accurate than literature-based Bayesian analysis assuming independence in predicting severe coronary disease based on clinical and noninvasive test results.
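The "Bayesian analysis assuming independence" compared above can be sketched with the odds form of Bayes' theorem: the pretest odds are multiplied by one likelihood ratio per test, which is valid only if the tests are conditionally independent given disease status. The function name and the numbers in the usage lines are hypothetical and are not taken from the study.

```python
def posttest_probability(pretest, tests):
    """Posttest disease probability from a pretest probability and test results.

    pretest : pretest probability of disease, strictly between 0 and 1
    tests   : iterable of (sensitivity, specificity, is_positive) tuples,
              treated as conditionally independent given disease status
    """
    odds = pretest / (1.0 - pretest)
    for sensitivity, specificity, is_positive in tests:
        if is_positive:
            odds *= sensitivity / (1.0 - specificity)      # positive likelihood ratio
        else:
            odds *= (1.0 - sensitivity) / specificity      # negative likelihood ratio
    return odds / (1.0 + odds)

# Hypothetical patient: 50% pretest probability, one positive and one negative test.
results = [(0.80, 0.90, True), (0.80, 0.80, False)]
print(posttest_probability(0.50, results))
```

A discriminant function fitted to the data does not need the independence assumption, which is one plausible reason the abstract reports better calibration for the data-based approach.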
Background The phylogeny of Arthropoda is still a matter of harsh debate among systematists, and significant disagreement exists between morphological and molecular studies. In particular, while the taxon joining hexapods and crustaceans (the Pancrustacea) is now widely accepted among zoologists, the relationships among its basal lineages, and particularly the supposed reciprocal paraphyly of Crustacea and Hexapoda, continues to represent a challenge. Several genes, as well as different molecular markers, have been used to tackle this problem in molecular phylogenetic studies, with the mitochondrial DNA being one of the molecules of choice. In this study, we have assembled the largest data set available so far for Pancrustacea, consisting of 100 complete (or almost complete) sequences of mitochondrial genomes. After removal of unalignable sequence regions and highly rearranged genomes, we used nucleotide and inferred amino acid sequences of the 13 protein coding genes to reconstruct the phylogenetic relationships among major lineages of Pancrustacea. The analysis was performed with Bayesian inference, and for the amino acid sequences a new, Pancrustacea-specific, matrix of amino acid replacement was developed and used in this study. Results Two largely congruent trees were obtained from the analysis of nucleotide and amino acid datasets. In particular, the best tree obtained based on the new matrix of amino acid replacement (MtPan) was preferred over those obtained using previously available matrices (MtArt and MtRev) because of its higher likelihood score. The most remarkable result is the reciprocal paraphyly of Hexapoda and Crustacea, with some lineages of crustaceans (namely the Malacostraca, Cephalocarida and, possibly, the Branchiopoda) being more closely related to the Insecta s.s. (Ectognatha) than two orders of basal hexapods, Collembola and Diplura. Our results confirm that the mitochondrial genome, unlike analyses based on morphological data or nuclear
The dhole (Cuon alpinus) is the only extant species in the genus Cuon (Carnivora: Canidae). In the present study, the complete mitochondrial genome of the dhole was sequenced. The total length is 16,672 base pairs, which is the shortest in Canidae. Sequence analysis revealed that most mitochondrial genomic functional regions were highly consistent among canid animals except the CSB domain of the control region. The difference in length among the Canidae mitochondrial genome sequences is mainly due to the number of short tandem repeat segments in the CSB domain. Phylogenetic analysis was performed based on the concatenated data set of 14 mitochondrial genes of 8 canid animals using maximum parsimony (MP), maximum likelihood (ML) and Bayesian inference (BI) methods. The genera Vulpes and Nyctereutes formed a sister group and split first within Canidae, followed by Cuon. The divergence in the genus Canis was the latest. The divergence of domestic dogs after that of Canis lupus laniger is completely supported by all three topologies. Pairwise sequence divergence data of different mitochondrial genes among canid animals were also determined. Except for the synonymous substitutions in protein-coding genes, the control region exhibits the highest sequence divergences. The synonymous rates are approximately two to six times higher than those of the non-synonymous sites except for a slightly higher rate in the non-synonymous substitution between Cuon alpinus and Vulpes vulpes. 16S rRNA genes have a slightly faster sequence divergence than 12S rRNA and tRNA genes. Based on nucleotide substitutions of tRNA genes and rRNA genes, the times since divergence between dhole and other canid animals, and between domestic dogs and three subspecies of wolves, were evaluated. The result indicates that Vulpes and Nyctereutes have a close phylogenetic relationship and the divergence of Nyctereutes is a little earlier. The Tibetan wolf may be an archaic
We describe an approximate method for the analysis of quantitative trait loci (QTL) based on model selection from multiple regression models with trait values regressed on marker genotypes, using a modification of the easily calculated Bayesian information criterion to estimate the posterior probability of models with various subsets of markers as variables. The BIC-delta criterion, with the parameter delta increasing the penalty for additional variables in a model, is further modified to incorporate prior information, and missing values are handled by multiple imputation. Marginal probabilities for model sizes are calculated, and the posterior probability of nonzero model size is interpreted as the posterior probability of existence of a QTL linked to one or more markers. The method is demonstrated on analysis of associations between wood density and markers on two linkage groups in Pinus radiata. Selection bias, which is the bias that results from using the same data to both select the variables in a model and estimate the coefficients, is shown to be a problem for commonly used non-Bayesian methods for QTL mapping, which do not average over alternative possible models that are consistent with the data.
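As a rough illustration of the model-selection step described above, the following Python sketch scores a few candidate marker subsets with a delta-penalized BIC and converts the scores into approximate posterior model probabilities. The penalized form n*ln(RSS/n) + delta*k*ln(n), the equal prior weight per model, the function names, and the residual sums of squares are all assumptions made for illustration; they are not taken from the paper.

```python
import math

def bic_delta(rss, n, k, delta=1.0):
    """Gaussian-regression BIC with the penalty on k variables scaled by delta (assumed form)."""
    return n * math.log(rss / n) + delta * k * math.log(n)

def model_posteriors(models, n, delta=1.0):
    """Approximate posterior model probabilities, assuming equal prior weight per model.

    `models` maps a label to (residual sum of squares, number of markers in the model).
    """
    scores = {m: bic_delta(rss, n, k, delta) for m, (rss, k) in models.items()}
    best = min(scores.values())
    # Subtract the best score before exponentiating for numerical stability.
    weights = {m: math.exp(-0.5 * (s - best)) for m, s in scores.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}

# Hypothetical residual sums of squares for n = 100 individuals.
models = {"null": (120.0, 0), "M1": (100.0, 1), "M1+M2": (99.0, 2)}
post = model_posteriors(models, n=100, delta=2.0)

# Posterior probability of nonzero model size, read here as evidence for a linked QTL.
p_qtl = 1.0 - post["null"]
```

Raising delta in this sketch shifts probability away from larger models, which mirrors the role the abstract gives to the delta parameter.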
Spectrum estimation is a problem common to many fields of physics, science, and engineering, and it has thus received a great deal of attention from the Bayesian data analysis community. In room acoustics, the modal or frequency response of a room is important for diagnosing and remedying acoustical defects. The physics of a sound field in a room dictates a model composed of exponentially decaying sinusoids. Continuing in the tradition of the seminal work of Bretthorst and Jaynes, this work contributes an approach to analyzing the modal responses of rooms with a time-domain model. Room acoustic spectra are constructed of damped sinusoids, and the model-based approach allows estimation of the number of sinusoids in the signal as well as their frequencies, amplitudes, damping constants, and phase delays. The frequency-amplitude spectrum may be most useful for characterizing a room, but in some settings the damping constants are of primary interest. This is the case for measuring the absorptive properties of materials, for example. A further challenge of the room acoustic spectrum problem is that modal density increases quadratically with frequency. At a point called the Schroeder frequency, adjacent modes overlap enough that the spectrum - particularly when estimated with the discrete Fourier transform - can be treated as a continuum. The time-domain, model-based approach can resolve overlapping modes and in some cases be used to estimate the Schroeder frequency. The proposed approach addresses the issue of filtering and preprocessing in order for the sampling to accurately identify all present room modes with their quadratically increasing density.
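The time-domain model named above, a sum of exponentially decaying sinusoids, can be sketched in Python as follows. The parameterization (amplitude, frequency in Hz, damping constant in 1/s, phase in radians), the function name, and the two example modes are hypothetical choices for illustration and are not the authors' code.

```python
import math

def modal_response(t, modes):
    """Evaluate a sum of damped sinusoids at time t (seconds).

    Each mode is a tuple (amplitude, frequency_hz, damping_per_s, phase_rad).
    """
    return sum(
        amp * math.exp(-damp * t) * math.cos(2.0 * math.pi * freq * t + phase)
        for amp, freq, damp, phase in modes
    )

# Two hypothetical room modes, closely spaced in frequency.
modes = [(1.0, 50.0, 3.0, 0.0), (0.5, 52.0, 4.0, 0.0)]
signal = [modal_response(i / 1000.0, modes) for i in range(1000)]  # 1 s at 1 kHz
```

In a model-based analysis of this kind, the number of modes and the four parameters of each mode are the unknowns to be estimated from a measured signal.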
Pediatric Anesthesia and Neurodevelopmental Impairments: A Bayesian Meta-Analysis
Experimental evidence of anesthesia-induced neurotoxicity has caused serious concern about the long-term effect of commonly used volatile anesthetic agents on young children. Several observational studies based on existing data have been conducted to address this concern with inconsistent results. We conducted a meta-analysis to synthesize the epidemiologic evidence on the association of anesthesia/surgery with neurodevelopmental outcomes in children. Using Bayesian meta-analytic approaches, we estimated the synthesized odds ratios (OR) and 95% credible interval (CrI) as well as the predictive distribution of a future study given the synthesized evidence. Data on 7 unadjusted and 6 adjusted measures of association were abstracted from 7 studies. The synthesized OR based on the 7 unadjusted measures for the association of anesthesia/surgery with an adverse behavioral or developmental outcome was 1.9 (95% CrI 1.2, 3.0). The most likely unadjusted OR from a future study was estimated to be 2.2 (95% CrI 0.6, 6.1). The synthesized OR based on the 6 adjusted measures for the association of anesthesia/surgery with an adverse behavioral or developmental outcome was 1.4 (95% CrI 0.9, 2.2). The most likely adjusted OR from a future study was estimated to be 1.5 (95% CrI 0.5, 4.0). We conclude that the existent epidemiologic evidence suggests a modestly elevated risk of adverse behavioral or developmental outcomes in children who were exposed to anesthesia/surgery during early childhood. The uncertainty with the existent epidemiologic evidence, however, is considerable, implying that the value of additional research using existent data sources to enhance the evidence base is diminishing. PMID:23076225
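Odds ratios are usually summarized on the log scale, and a quick arithmetic check on the synthesized intervals reported above can be sketched in Python. The helper name is hypothetical, and treating a 95% credible interval as roughly symmetric and normal on the log scale is an assumption made only for this illustration; a posterior need not have that shape.

```python
import math

def log_scale_summary(lower, upper, z=1.96):
    """Return (geometric midpoint, approximate log-scale SE) of a 95% interval.

    Assumes the interval is approximately symmetric and normal on the log scale.
    """
    log_lo, log_hi = math.log(lower), math.log(upper)
    midpoint = math.exp((log_lo + log_hi) / 2.0)
    se = (log_hi - log_lo) / (2.0 * z)
    return midpoint, se

# Synthesized intervals quoted in the abstract.
unadjusted_mid, unadjusted_se = log_scale_summary(1.2, 3.0)
adjusted_mid, adjusted_se = log_scale_summary(0.9, 2.2)
```

Under that assumption the geometric midpoints land close to the reported synthesized ORs of 1.9 and 1.4, which suggests the reported intervals are close to symmetric on the log scale.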
We use GO 13297 Cycle 21 Hubble Space Telescope (HST) observations and archival GO 10775 Cycle 14 HST ACS Treasury observations of Galactic Globular Clusters to find and characterize multiple stellar populations. Determining how globular clusters are able to create and retain enriched material to produce several generations of stars is key to understanding how these objects formed and how they have affected the structural, kinematic, and chemical evolution of the Milky Way. We employ a sophisticated Bayesian technique with an adaptive MCMC algorithm to simultaneously fit the age, distance, absorption, and metallicity for each cluster. At the same time, we also fit unique helium values to two distinct populations of the cluster and determine the relative proportions of those populations. Our unique numerical approach allows objective and precise analysis of these complicated clusters, providing posterior distribution functions for each parameter of interest. We use these results to gain a better understanding of multiple populations in these clusters and their role in the history of the Milky Way. Support for this work was provided by NASA through grant numbers HST-GO-10775 and HST-GO-13297 from the Space Telescope Science Institute, which is operated by AURA, Inc., under NASA contract NAS5-26555. This material is based upon work supported by the National Aeronautics and Space Administration under Grant NNX11AF34G issued through the Office of Space Science. This project was supported by the National Aeronautics & Space Administration through the University of Central Florida's NASA Florida Space Grant Consortium.
Motivation: Rapid advances in genotyping and genome-wide association studies have enabled the discovery of many new genotype–phenotype associations at the resolution of individual markers. However, these associations explain only a small proportion of theoretically estimated heritability of most diseases. In this work, we propose an integrative mixture model called JBASE: joint Bayesian analysis of subphenotypes and epistasis. JBASE explores two major reasons of missing heritability: interactions between genetic variants, a phenomenon known as epistasis and phenotypic heterogeneity, addressed via subphenotyping. Results: Our extensive simulations in a wide range of scenarios repeatedly demonstrate that JBASE can identify true underlying subphenotypes, including their associated variants and their interactions, with high precision. In the presence of phenotypic heterogeneity, JBASE has higher Power and lower Type 1 Error than five state-of-the-art approaches. We applied our method to a sample of individuals from Mexico with Type 2 diabetes and discovered two novel epistatic modules, including two loci each, that define two subphenotypes characterized by differences in body mass index and waist-to-hip ratio. We successfully replicated these subphenotypes and epistatic modules in an independent dataset from Mexico genotyped with a different platform. Availability and implementation: JBASE is implemented in C++, supported on Linux and is available at http://www.cs.toronto.edu/∼goldenberg/JBASE/jbase.tar.gz. The genotype data underlying this study are available upon approval by the ethics review board of the Medical Centre Siglo XXI. Please contact Dr Miguel Cruz at mcruzl@yahoo.com for assistance with the application. Contact: anna.goldenberg@utoronto.ca Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26411870
The Black Stork, Ciconia nigra, belongs to the family Ciconiidae and is assessed as Least Concern by the IUCN. In this study, the complete mitochondrial genome of C. nigra was sequenced and characterized for the first time; it is 17,795 bp in length. The mt-genome has tandem repeats with 80 bp and 78 bp repeat units, as well as AAACAAC and AAACAAACAAC tandem repeats in the D-loop region. Notably, a single extra base "C" is inserted at position 174 of the ND3 gene. Bayesian inference and maximum likelihood methods were used to construct phylogenetic trees based on 12 heavy-strand protein-coding genes. Phylogenetic analyses showed that Ardeidae diverged earlier than Ciconiidae, Cathartida and Threskiornithidae, and that Ciconiidae was most closely related to Cathartida. C. nigra diverged first among the three Ciconia species.
We illustrate how the Bayesian approach can be used to provide a simple but powerful way to analyze data from solar neutrino experiments. The data are analyzed assuming that the neutrinos are unaltered during their passage from the Sun to the Earth. We derive quantitative and easily understood information pertaining to the solar neutrino problem.
PhyloOncology: Understanding cancer through phylogenetic analysis.
Despite decades of research and an enormity of resultant data, cancer remains a significant public health problem. New tools and fresh perspectives are needed to obtain fundamental insights, to develop better prognostic and predictive tools, and to identify improved therapeutic interventions. With increasingly common genome-scale data, one suite of algorithms and concepts with potential to shed light on cancer biology is phylogenetics, a scientific discipline used in diverse fields. From grouping subsets of cancer samples to tracing subclonal evolution during cancer progression and metastasis, the use of phylogenetics is a powerful systems biology approach. Well-developed phylogenetic applications provide fast, robust approaches to analyze high-dimensional, heterogeneous cancer data sets. This article is part of a Special Issue entitled: Evolutionary principles - heterogeneity in cancer?, edited by Dr. Robert A. Gatenby. Copyright © 2016 Elsevier B.V. All rights reserved.
Phylogenetic Analysis of Genome Rearrangements among Five Mammalian Orders
Evolutionary relationships among placental mammalian orders have been controversial. Whole genome sequencing and new computational methods offer opportunities to resolve the relationships among 10 genomes belonging to the mammalian orders Primates, Rodentia, Carnivora, Perissodactyla and Artiodactyla. By application of the double cut and join distance metric, where gene order is the phylogenetic character, we computed genomic distances among the sampled mammalian genomes. With a marsupial outgroup, the gene order tree supported a topology in which Rodentia fell outside the cluster of Primates, Carnivora, Perissodactyla, and Artiodactyla. Results of breakpoint reuse rate and synteny block length analyses were consistent with the prediction of the random breakage model, which provided a diagnostic test to support use of gene order as an appropriate phylogenetic character in this study. We discuss the influence of rate differences among lineages and other factors that may contribute to different resolutions of mammalian ordinal relationships by different methods of phylogenetic reconstruction. PMID:22929217
If damped Lyman alpha systems (DLAs) contain even modest amounts of dust, the ultraviolet luminosity of the background quasar can be severely diminished. When the spectrum is redshifted, this leads to a bias in optical surveys for DLAs. Previous estimates of the magnitude of this effect are in some tension; in particular, the distribution of DLAs in the (NHI, Z) (i.e. column density-metallicity) plane has led to claims that we may be missing a considerable fraction of metal-rich, high column density DLAs, whereas radio surveys do not unveil a substantial population of otherwise hidden systems. Motivated by this tension, we perform a Bayesian parameter estimation analysis of a simple dust obscuration model. We include radio and optical observations of DLAs in our overall likelihood analysis and show that these do not, in fact, constitute conflicting constraints. Our model gives statistical limits on the biasing effects of dust, predicting that only 7 per cent of DLAs are missing from optical samples due to dust obscuration; at 2σ confidence, this figure takes a maximum value of 17 per cent. This contrasts with recent claims that DLA incidence rates are underestimated by 30-50 per cent. Optical measures of the mean metallicities of DLAs are found to underestimate the true value by just 0.1 dex (or at most 0.4 dex, 2σ confidence limit), in agreement with the radio survey results of Akerman et al. As an independent test, we use our model to make a rough prediction for dust reddening of the background quasar. We find a mean reddening in the DLA rest frame of log10
The evolutionary origins of extraintestinal pathogenic Escherichia coli (ExPEC) remain uncertain despite these organisms' relevance to human disease. A valid understanding of ExPEC phylogeny is needed as a framework against which the observed distribution of virulence factors and clinical associations can be analyzed. Accordingly, phylogenetic relationships were defined by multi-locus sequence analysis among 44 representatives of selected ExPEC clonal groups and the E. coli Reference (ECOR) collection. Recombination, which significantly obscured the phylogenetic signal for several strains, was dealt with by excluding strains or specific sequences. Conflicting overall phylogenies, and internal phylogenies for virulence-associated phylogenetic group B2, were inferred depending on the specific dataset (i.e., how extensively purged of recombination), outgroup (Salmonella enterica and/or Escherichia fergusonii), and analysis method (neighbor joining, maximum parsimony, maximum likelihood, or Bayesian likelihood). Nonetheless, the major E. coli phylogenetic groups A, B1, and B2 were consistently well resolved, as was a major sub-component of group D and an ECOR 37-O157:H7 clade. Moreover, nine important ExPEC clonal groups within groups B2 and D, characterized by serotypes O6:K2:H1, O18:K1:H7, O6:H31, and O4:K+:H+ (from group B2), and O1:K1:H-, O7:K1:H-, O157:K+:H (non-7), O15:K52:H1, and O11/17/77:K52:H18 ("clonal group A") (from group D), were consistently well resolved, regardless of clinical background (cystitis, pyelonephritis, neonatal meningitis, sepsis, or fecal), host group, geographical origin, and virulence profile. Among the group B2-derived clonal groups the O6:K2:H1 clade appeared basal. Within group D, "clonal group A" and the O15:K52:H1 clonal group were consistently placed with ECOR 47 and ECOR 44, respectively, as nearest neighbors. These findings clarify phylogenetic relationships among key ExPEC clonal groups but also emphasize that recombination
A Bayesian approach to meta-analysis of plant pathology studies.
Bayesian statistical methods are used for meta-analysis in many disciplines, including medicine, molecular biology, and engineering, but have not yet been applied for quantitative synthesis of plant pathology studies. In this paper, we illustrate the key concepts of Bayesian statistics and outline the differences between Bayesian and classical (frequentist) methods in the way parameters describing population attributes are considered. We then describe a Bayesian approach to meta-analysis and present a plant pathological example based on studies evaluating the efficacy of plant protection products that induce systemic acquired resistance for the management of fire blight of apple. In a simple random-effects model assuming a normal distribution of effect sizes and no prior information (i.e., a noninformative prior), the results of the Bayesian meta-analysis are similar to those obtained with classical methods. Implementing the same model with a Student's t distribution and a noninformative prior for the effect sizes, instead of a normal distribution, yields similar results for all but acibenzolar-S-methyl (Actigard) which was evaluated only in seven studies in this example. Whereas both the classical (P = 0.28) and the Bayesian analysis with a noninformative prior (95% credibility interval [CRI] for the log response ratio: -0.63 to 0.08) indicate a nonsignificant effect for Actigard, specifying a t distribution resulted in a significant, albeit variable, effect for this product (CRI: -0.73 to -0.10). These results confirm the sensitivity of the analytical outcome (i.e., the posterior distribution) to the choice of prior in Bayesian meta-analyses involving a limited number of studies. We review some pertinent literature on more advanced topics, including modeling of among-study heterogeneity, publication bias, analyses involving a limited number of studies, and methods for dealing with missing data, and show how these issues can be approached in a Bayesian framework
Background Chemosensory receptors, which are all G-protein-coupled receptors (GPCRs), come in four types: odorant receptors (ORs), vomeronasal receptors, trace-amine associated receptors and formyl peptide receptor-like proteins. The ORs are the most important receptors for detecting a wide range of environmental chemicals in daily life. Most fish OR genes have been identified from genome databases following the completion of the genome sequencing projects of many fishes. However, it remains unclear whether these OR genes from the genome databases are actually expressed in the fish olfactory epithelium. Thus, it is necessary to clone the OR mRNAs directly from the olfactory epithelium and to examine their expression status. Results Eighty-nine full-length and 22 partial OR cDNA sequences were isolated from the olfactory epithelium of the large yellow croaker, Larimichthys crocea. Bayesian phylogenetic analysis classified the vertebrate OR genes into two types, with several clades within each type, and showed that the L. crocea OR genes of each type are more closely related to those of fugu, pufferfish and stickleback than they are to those of medaka, zebrafish and frog. The reconciled tree showed 178 duplications and 129 losses. The evolutionary relationships among OR genes in these fishes accords with their evolutionary history. The fish OR genes have experienced functional divergence, and the different clades of OR genes have evolved different functions. The result of real-time PCR shows that different clades of ORs have distinct expression levels. Conclusion We have shown about 100 OR genes to be expressed in the olfactory epithelial tissues of L. crocea. The OR genes of modern fishes duplicated from their common ancestor, and were expanded over evolutionary time. The OR genes of L. crocea are closely related to those of fugu, pufferfish and stickleback, which is consistent with its evolutionary position. The different expression levels of OR genes of large
Phylogenetic analysis of β-defensin-like genes of Bothrops, Crotalus and Lachesis snakes.
Defensins are components of the vertebrate innate immune system; they comprise a diverse group of small cationic antimicrobial peptides. Among them, β-defensins have a characteristic β-sheet-rich fold plus six conserved cysteines with particular spacing and intramolecular bonds. They have been fully studied in mammals, but there is little information about them in snakes. Using a PCR approach, we described 13 β-defensin-like sequences in Bothrops and Lachesis snakes. The genes are organized in three exons and two introns, with exception of B.atrox_defensinB_01 which has only two exons. They show high similarities in exon 1, intron 1 and intron 2, but exons 2 and 3 have undergone accelerated evolution. The theoretical translated sequences encode a pre-β-defensin-like molecule with a conserved signal peptide and a mature peptide. The signal peptides are leucine-rich and the mature β-defensin-like molecules have a size around 4.5 kDa, a net charge from +2 to +11, and the conserved cysteine motif. Phylogenetic analysis was done using maximum parsimony, maximum likelihood and Bayesian analyses, and all resulted in similar topologies with slight differences. The genus Bothrops displayed two separate lineages. The reconciliation of gene trees and species tree indicated eight to nine duplications and 23 to 29 extinctions depending on the gene tree used. Our results together with previously published data indicate that the ancestral β-defensin-like gene may have three exons in vertebrates and that their evolution occurred according to a birth-and-death model.
Phylogenetic relationships within the bryozoan order Cheilostomata are currently uncertain, with many morphological hypotheses proposed but scarcely tested by independent means of molecular analysis. This research uses DNA sequence data across five loci of both mitochondrial and nuclear origin from 91 species of cheilostome Bryozoa (34 species newly sequenced). This vastly improved the taxonomic coverage and number of loci used in a molecular analysis of this order and allowed a more in-depth look into the evolutionary history of Cheilostomata. Maximum likelihood and Bayesian analyses of individual loci were carried out along with a partitioned multi-locus approach, plus a range of topology tests based on morphological hypotheses. Together, these provide a comprehensive set of phylogenetic analyses of the order Cheilostomata. From these results inferences are made about the evolutionary history of this order and proposed morphological hypotheses are discussed in light of the independent evidence gained from the molecular data. Infraorder Ascophorina was demonstrated to be non-monophyletic, and there appear to be multiple origins of the ascus and associated structures involved in lophophore extension. This was further supported by the lack of monophyly within each of the four ascophoran grades (acanthostegomorph/spinocystal, hippothoomorph/gymnocystal, umbonulomorph/umbonuloid, lepraliomorph/lepralioid) defined by frontal-shield morphology. Chorizopora, currently classified in the ascophoran grade Hippothoomorpha, is phylogenetically distinct from Hippothoidae, providing strong evidence for multiple origins of the gymnocystal frontal shield type. Further evidence is produced to support the morphological hypothesis of multiple umbonuloid origins of lepralioid frontal shields, using a step-wise set of topological hypothesis tests combined with examination of multi-locus phylogenies.
The hoverflies Episyrphus balteatus and Eupeodes corollae (Diptera: Muscomorpha: Syrphidae) are important natural aphid predators. We obtained mitochondrial genome sequences from these two species using methods of PCR amplification and sequencing. The complete Episyrphus mitochondrial genome is 16,175 bp long while the incomplete one of Eupeodes is 15,326 bp long. All 37 typical mitochondrial genes are present in both species and arranged in ancestral positions and directions. The two mitochondrial genomes showed a biased A/T usage versus G/C. The cox1, cox2, cox3, cob and nad1 showed relatively low level of nucleotide diversity among protein-coding genes, while the trnM was the most conserved one without any nucleotide variation in stem regions within Muscomorpha. Phylogenetic relationships among the major lineages of Muscomorpha were reconstructed using a complete set of mitochondrial genes. Bayesian and maximum likelihood analyses generated congruent topologies. Our results supported the monophyly of five species within the Syrphidae (Syrphoidea). The Platypezoidea was sister to all other species of Muscomorpha in our phylogeny. Our study demonstrated the power of the complete mitochondrial gene set for phylogenetic analysis in Muscomorpha. PMID:28276531
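The biased A/T usage mentioned above is typically quantified as A+T content, often alongside AT skew. A minimal Python sketch follows; the function names and the short sequence fragment are hypothetical and are not data from the study.

```python
from collections import Counter

def at_content(seq):
    """Fraction of A and T among the unambiguous bases (A, C, G, T) of a sequence."""
    counts = Counter(seq.upper())
    acgt = sum(counts[base] for base in "ACGT")
    return (counts["A"] + counts["T"]) / acgt

def at_skew(seq):
    """AT skew, (A - T) / (A + T); positive values mean more A than T."""
    counts = Counter(seq.upper())
    return (counts["A"] - counts["T"]) / (counts["A"] + counts["T"])

# Hypothetical A/T-rich fragment, for illustration only.
fragment = "ATTTAATAAGCTATTAGGCTAATAAT"
fragment_at = at_content(fragment)
fragment_skew = at_skew(fragment)
```

Ambiguity codes such as N are excluded from the denominator in this sketch, which is one common convention among several.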
Phylogenetic Analysis of the Bifidobacterium Genus Using Glycolysis Enzyme Sequences
Bifidobacteria are important members of the human gastrointestinal tract that promote the establishment of a healthy microbial consortium in the gut of infants. Recent studies have established that the Bifidobacterium genus is a polymorphic phylogenetic clade, which encompasses a diversity of species and subspecies that encode a broad range of proteins implicated in complex and non-digestible carbohydrate uptake and catabolism, ranging from human breast milk oligosaccharides, to plant fibers. Recent genomic studies have created a need to properly place Bifidobacterium species in a phylogenetic tree. Current approaches, based on core-genome analyses come at the cost of intensive sequencing and demanding analytical processes. Here, we propose a typing method based on sequences of glycolysis genes and the proteins they encode, to provide insights into diversity, typing, and phylogeny in this complex and broad genus. We show that glycolysis genes occur broadly in these genomes, to encode the machinery necessary for the biochemical spine of the cell, and provide a robust phylogenetic marker. Furthermore, glycolytic sequences-based trees are congruent with both the classical 16S rRNA phylogeny, and core genome-based strain clustering. Furthermore, these glycolysis markers can also be used to provide insights into the adaptive evolution of this genus, especially with regards to trends toward a high GC content. This streamlined method may open new avenues for phylogenetic studies on a broad scale, given the widespread occurrence of the glycolysis pathway in bacteria, and the diversity of the sequences they encode. PMID:27242688
The objectives of this study were to determine the antigenic relationship among ruminant adenoviruses and determine their phylogenetic relationship based on the deduced hexon gene amino acid sequence.
Phylogenetic relationships within the Actinidia were investigated using randomly amplified polymorphic DNA (RAPD) markers.
This article illustrates a simplified time series analysis for use by the counseling researcher practitioner in single-case baseline plus intervention studies with a Bayesian probability analysis to integrate findings from replications. The C statistic is recommended as a primary analysis tool with particular relevance in the context of actual…
Yang, Jingjing; Cox, Dennis D; Lee, Jong Soo; Ren, Peng; Choi, Taeryon
Functional data are defined as realizations of random functions (mostly smooth functions) varying over a continuum, which are usually collected on discretized grids with measurement errors. In order to accurately smooth noisy functional observations and deal with the issue of high-dimensional observation grids, we propose a novel Bayesian method based on the Bayesian hierarchical model with a Gaussian-Wishart process prior and basis function representations. We first derive an induced model for the basis-function coefficients of the functional data, and then use this model to conduct posterior inference through Markov chain Monte Carlo methods. Compared to the standard Bayesian inference that suffers serious computational burden and instability in analyzing high-dimensional functional data, our method greatly improves the computational scalability and stability, while inheriting the advantage of simultaneously smoothing raw observations and estimating the mean-covariance functions in a nonparametric way. In addition, our method can naturally handle functional data observed on random or uncommon grids. Simulation and real studies demonstrate that our method produces similar results to those obtainable by the standard Bayesian inference with low-dimensional common grids, while efficiently smoothing and estimating functional data with random and high-dimensional observation grids when the standard Bayesian inference fails. In conclusion, our method can efficiently smooth and estimate high-dimensional functional data, providing one way to resolve the curse of dimensionality for Bayesian functional data analysis with Gaussian-Wishart processes.
A Bayesian statistical framework is presented for the Zimmerman and Weissenburger flutter margin method that considers the uncertainties in aeroelastic modal parameters. The proposed methodology overcomes the limitations of the previously developed least-squares-based estimation technique, which relies on the Gaussian approximation of the flutter margin probability density function (pdf). Using the measured free-decay responses at subcritical (preflutter) airspeeds, the joint non-Gaussian posterior pdf of the modal parameters is sampled using the Metropolis-Hastings (MH) Markov chain Monte Carlo (MCMC) algorithm. The posterior MCMC samples of the modal parameters are then used to obtain the flutter margin pdfs and finally the flutter speed pdf. The usefulness of the Bayesian flutter margin method is demonstrated using synthetic data generated from a two-degree-of-freedom pitch-plunge aeroelastic model. The robustness of the statistical framework is demonstrated using different sets of measurement data. It will be shown that the probabilistic (Bayesian) approach reduces the number of test points required in providing a flutter speed estimate for a given accuracy and precision.
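The sampling step named above can be illustrated with a minimal random-walk Metropolis-Hastings sampler in Python. This is a generic sketch, not the authors' implementation: the one-dimensional skewed target (an unnormalized Gamma density with shape 3 and rate 1) stands in for a non-Gaussian posterior, and the step size, seed, and function names are arbitrary assumptions.

```python
import math
import random

def metropolis_hastings(log_target, x0, n_samples, step, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    x, lp = x0, log_target(x0)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        lp_prop = log_target(proposal)
        # Symmetric proposal, so the acceptance probability is min(1, p(prop) / p(x)).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = proposal, lp_prop
        samples.append(x)
    return samples

def log_gamma_target(x):
    """Unnormalized log-density of a Gamma(shape=3, rate=1) variable (hypothetical target)."""
    return 2.0 * math.log(x) - x if x > 0.0 else float("-inf")

chain = metropolis_hastings(log_gamma_target, x0=1.0, n_samples=20000, step=2.0)
kept = chain[2000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
```

In the setting of the abstract, each retained sample of the modal parameters would be pushed through the flutter margin calculation to build up a pdf of the flutter margin rather than a single point estimate.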
Ultrametric networks: a new tool for phylogenetic analysis
2013-01-01
In this article, the authors demonstrate a time-series analysis based on a hierarchical Bayesian model of a Poisson outcome with an excessive number of zeroes. The motivating example for this analysis comes from the intensive care unit (ICU) of an urban university teaching hospital (New Haven, Connecticut, 2002-2004). Studies of medication use among older patients in the ICU are complicated by statistical factors such as an excessive number of zero doses, periodicity, and within-person autocorrelation. Whereas time-series techniques adjust for autocorrelation and periodicity in outcome measurements, Bayesian analysis provides greater precision for small samples and the flexibility to conduct posterior predictive simulations. By applying elements of time-series analysis within both frequentist and Bayesian frameworks, the authors evaluate differences in shift-based dosing of medication in a medical ICU. From a small sample and with adjustment for excess zeroes, linear trend, autocorrelation, and clinical covariates, both frequentist and Bayesian models provide evidence of a significant association between a specific nursing shift and dosing level of a sedative medication. Furthermore, the posterior distributions from a Bayesian random-effects Poisson model permit posterior predictive simulations of related results that are potentially difficult to model.
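One common way to represent a Poisson outcome with excess zeroes is a zero-inflated Poisson (ZIP) distribution, sketched below in Python. The abstract does not state which zero-adjusted form the authors used, so the ZIP form, the function name, and the parameter values here are assumptions for illustration only.

```python
import math

def zip_pmf(k, lam, pi_zero):
    """Probability of count k under a zero-inflated Poisson.

    With probability pi_zero the outcome is a structural zero; otherwise it is
    drawn from a Poisson distribution with mean lam.
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi_zero + (1.0 - pi_zero) * poisson
    return (1.0 - pi_zero) * poisson

# Hypothetical parameters: 30% structural zeroes, Poisson mean of 2 doses per shift.
probabilities = [zip_pmf(k, lam=2.0, pi_zero=0.3) for k in range(60)]
mean_count = sum(k * p for k, p in enumerate(probabilities))
```

The mean of this distribution is (1 - pi_zero) * lam, so ignoring the structural zeroes would bias an ordinary Poisson rate estimate downward.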
Functional MRI (fMRI) used for neurosurgical planning delineates functionally eloquent brain areas by time-series analysis of task-induced BOLD signal changes. Commonly used frequentist statistics protect against false positive results based on a p-value threshold. In surgical planning, false negative results are equally if not more harmful, potentially masking true brain activity leading to erroneous resection of eloquent regions. Bayesian statistics provides an alternative framework, categorizing areas as activated, deactivated, non-activated or with low statistical confidence. This approach has not yet found wide clinical application partly due to the lack of a method to objectively define an effect size threshold. We implemented a Bayesian analysis framework for neurosurgical planning fMRI. It entails an automated effect-size threshold selection method for posterior probability maps accounting for inter-individual BOLD response differences, which was calibrated based on the frequentist results maps thresholded by two clinical experts. We compared Bayesian and frequentist analysis of passive-motor fMRI data from 10 healthy volunteers measured on a pre-operative 3T and an intra-operative 1.5T MRI scanner. As a clinical case study, we tested passive motor task activation in a brain tumor patient at 3T under clinical conditions. With our novel effect size threshold method, the Bayesian analysis revealed regions of all four categories in the 3T data. Activated region foci and extent were consistent with the frequentist analysis results. In the lower signal-to-noise ratio 1.5T intra-operative scanner data, Bayesian analysis provided improved brain-activation detection sensitivity compared with the frequentist analysis, albeit the spatial extents of the activations were smaller than at 3T. Bayesian analysis of fMRI data using operator-independent effect size threshold selection may improve the sensitivity and certainty of information available to guide neurosurgery.
We examined a partial SSU-rDNA sequence from 20 Acanthamoeba isolates associated with keratitis infections. The phylogenetic tree inferred from this partial sequence allowed isolates to be assigned to genotypes. Among the 20 isolates examined, 16 were found to be of the T4 genotype, 2 were T3, 1 was a T5, and 1 was a T2, confirming the predominance of T4 in infections. However, the study highlighted other genotypes more rarely associated with infections, particularly the T2 genotype. Our study is the second one to detect that this genotype is associated with keratitis. Additionally, the phylogenetic analyses showed five main emerging clusters, T4/T3/T11, T2/T6, T10/T12/T14, T13/T16, and T7/T8/T9/T17, regularly obtained whichever method was used. A similar branching pattern was found when the full rDNA sequence was investigated.
In recent years, the advent of Markov chain Monte Carlo (MCMC) techniques, coupled with modern computational capabilities, has enabled the study of evolutionary models without a closed form solution of the likelihood function. However, current Bayesian MCMC applications can incur significant computational costs, as they are based on a full sampling from the posterior probability distribution of the parameters of interest. Here, we draw attention to how MCMC techniques can be embedded within normal approximation strategies for more economical statistical computation. The overall procedure is based on an estimate of the first and second moments of the likelihood function, as well as a maximum likelihood estimate. Through examples, we review several MCMC-based methods used in the statistical literature for such estimation, applying the approaches to constructing posterior distributions under non-analytical evolutionary models relaxing the assumptions of rate homogeneity, and of independence between sites. Finally, we use the procedures for conducting Bayesian model selection, based on Laplace approximations of Bayes factors, which we find to be accurate and computationally advantageous. Altogether, the methods we expound here, as well as other related approaches from the statistical literature, should prove useful when investigating increasingly complex descriptions of molecular evolution, alleviating some of the difficulties associated with nonanalytical models.
We compiled published values of mammalian maximum oxygen consumption during exercise (VO2max) and supplemented these data with new measurements of VO2max for the largest rodent (capybara), 20 species of smaller-bodied rodents, two species of weasels and one small marsupial. Many of the new data were obtained with running-wheel respirometers instead of the treadmill systems used in most previous measurements of mammalian VO2max. We used both conventional and phylogenetically informed allometric regression models to analyze VO2max of 77 'species' (including subspecies or separate populations within species) in relation to body size, phylogeny, diet and measurement method. Both body mass and allometrically mass-corrected VO2max showed highly significant phylogenetic signals (i.e. related species tended to resemble each other). The Akaike information criterion corrected for sample size was used to compare 27 candidate models predicting VO2max (all of which included body mass). In addition to mass, the two best-fitting models (cumulative Akaike weight=0.93) included dummy variables coding for three species previously shown to have high VO2max (pronghorn, horse and a bat), and incorporated a transformation of the phylogenetic branch lengths under an Ornstein-Uhlenbeck model of residual variation (thus indicating phylogenetic signal in the residuals). We found no statistical difference between wheel- and treadmill-elicited values, and diet had no predictive ability for VO2max. Averaged across all models, the allometric scaling exponent was 0.839, with 95% confidence limits of 0.795 and 0.883, which does not provide support for a scaling exponent of 0.67, 0.75 or unity.
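The model comparison described above rests on two standard quantities: the Akaike information criterion corrected for sample size (AICc) and Akaike weights. The sketch below shows how they are commonly computed from a model's maximized log-likelihood; the function names and the numbers in the usage lines are hypothetical and are not taken from the study.

```python
import math


def aicc(log_likelihood, k, n):
    """AIC with the small-sample correction.

    k: number of estimated parameters, n: sample size.
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def akaike_weights(scores):
    """Convert a list of AICc scores into weights that sum to one."""
    best = min(scores)
    relative = [math.exp(-0.5 * (s - best)) for s in scores]
    total = sum(relative)
    return [r / total for r in relative]


# Hypothetical log-likelihoods and parameter counts for three candidate models.
candidates = [(-120.4, 3), (-118.9, 5), (-118.7, 8)]
scores = [aicc(ll, k, n=77) for ll, k in candidates]
weights = akaike_weights(scores)
```

A cumulative weight such as the 0.93 quoted above is obtained by summing the weights of the models of interest.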
Many problems in comparative biology are, or are thought to be, best expressed as phylogenetic "networks" as opposed to trees. In trees, vertices may have only a single parent (ancestor), while networks allow for multiple parent vertices. There are two main interpretive types of networks, "softwired" and "hardwired." The parsimony cost of hardwired networks is based on all changes over all edges, hence must be greater than or equal to the best tree cost contained ("displayed") by the network. This is in contrast to softwired, where each character follows the lowest parsimony cost tree displayed by the network, resulting in costs which are less than or equal to the best display tree. Neither situation is ideal since hardwired networks are not generally biologically attractive (since individual heritable characters can have more than one parent) and softwired networks can be trivially optimized (containing the best tree for each character). Furthermore, given the alternate cost scenarios of trees and these two flavors of networks, hypothesis testing among these explanatory scenarios is impossible. A network cost adjustment (penalty) is proposed to allow phylogenetic trees and softwired phylogenetic networks to compete equally on a parsimony optimality basis. This cost is demonstrated for several real and simulated datasets. In each case, the favored graph representation (tree or network) matched expectation or simulation scenario. The softwired network cost regime proposed here presents a quantitative criterion for an optimality-based search procedure where trees and networks can participate in hypothesis testing simultaneously.
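The inequality between the softwired cost and the best display-tree cost can be made concrete with a small sketch. It assumes the network is represented simply by the list of trees it displays, uses Fitch's small-parsimony count for unordered characters, and works on a made-up four-taxon example; the proposed penalty term and the hardwired cost are not implemented.

```python
def fitch_cost(tree, leaf_state):
    """Fitch parsimony for one character on a rooted binary tree.

    tree is a leaf name (str) or a pair (left, right) of subtrees.
    Returns (candidate state set at this node, number of changes below it).
    """
    if isinstance(tree, str):
        return {leaf_state[tree]}, 0
    left_set, left_cost = fitch_cost(tree[0], leaf_state)
    right_set, right_cost = fitch_cost(tree[1], leaf_state)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # Disjoint state sets: one change is needed at this node.
    return left_set | right_set, left_cost + right_cost + 1


# Two trees assumed to be displayed by one hypothetical network.
display_trees = [(("A", "B"), ("C", "D")), (("A", "C"), ("B", "D"))]
# Two made-up binary characters, each favouring a different display tree.
characters = [
    {"A": 0, "B": 0, "C": 1, "D": 1},
    {"A": 0, "B": 1, "C": 0, "D": 1},
]

# Best single tree: every character must follow the same tree.
best_tree = min(sum(fitch_cost(t, c)[1] for c in characters)
                for t in display_trees)
# Softwired: each character follows its own cheapest display tree.
softwired = sum(min(fitch_cost(t, c)[1] for t in display_trees)
                for c in characters)
```

Here `softwired` is 2 while `best_tree` is 3, which shows why an unpenalized softwired network can always match or beat its best display tree.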
Anopheles darlingi Root, 1926 and Anopheles gambiae (Diptera: Culicidae) are the most important human malaria vectors in South America and Africa, respectively. The two species are estimated to have diverged 100 million years ago. Studies on the phylogenetics and evolution of gene sequences, such as glutathione S-transferase (GST), in disease-transmitting mosquitoes are scarce. The sigma class GST (KC890767) from the transcriptome of An. darlingi captured in the Brazilian Amazon was studied by in silico hybridization, and mapped to chromosome 3 of An. gambiae. The sigma class GST of An. darlingi was used for phylogenetic analyses to understand the GST base composition of the most recent common ancestor between An. darlingi, An. gambiae, Aedes aegypti and Culex quinquefasciatus. The GST (KC890767) of An. darlingi was studied to generate the main divergence branches using Neighbor-Joining and bootstrapping approaches to confirm confidence levels on the tree nodes that separate An. darlingi and the other mosquito species. The results showed divergence between An. gambiae, Ae. aegypti, Cx. quinquefasciatus, and Phlebotomus papatasi as outgroup, and the homology relationship between sigma class GST of An. darlingi and GSTS1_1 gene of An. gambiae was valuable for phylogenetic and evolutionary studies. Copyright © 2014 Elsevier B.V. All rights reserved.
Elymus L. is often planted in temperate and subtropical regions as forage. Species in the genus have 5 allopolyploid genomes that are found in the grass tribe Triticeae. To determine the phylogenetic relationships in Elymus species from western China, we estimated phylogenetic trees using sequences from the nuclear ribosomal internal transcribed spacer and non-coding chloroplast DNA sequences from 56 accessions (871 samples) of 9 polyploid Elymus species and 42 accessions from GenBank. Tetraploid and hexaploid Elymus species from western China had independent origins, and Elymus species from the same area or neighboring geographic regions were the most closely related. Based on the phylogenetic tree topology, the St- and Y-genomes were not derived from the same donor and Y-genome likely originated from the H-genome of Hordeum species, or they shared the same origin or underwent introgression. The maternal genome of tetraploid and hexaploid Elymus species originated from species of Hordeum or Pseudoroegneria. Additionally, Elymus species in western China began diverging 17-8.5 million years ago, during a period of increased aridification as a consequence of the Messinian salinity crisis. Elymus species adapted to drought and high salinity may have developed based on the environmental conditions during this period. Elymus evolution in western China may have been affected by the uplift of the Qinghai-Tibetan Plateau (5 million years ago), when Elymus seeds were dispersed by gravity or wind into a newly heterogeneous habitat, resulting in isolation.
Background and Aims Myrcia section Aulomyrcia includes ∼120 species that are endemic to the Neotropics and disjunctly distributed in the moist Amazon and Atlantic coastal forests of Brazil. This paper presents the first comprehensive phylogenetic study of this group and this phylogeny is used as a basis to evaluate recent classification systems and to test alternative hypotheses associated with the history of this clade. Methods Fifty-three taxa were sampled out of the 120 species currently recognized, plus 40 outgroup taxa, for one nuclear marker (ribosomal internal transcribed spacer) and four plastid markers (psbA-trnH, trnL-trnF, trnQ-rpS16 and ndhF). The relationships were reconstructed based on Bayesian and maximum likelihood analyses. Additionally, a likelihood approach, ‘geographic state speciation and extinction’, was used to estimate region-dependent rates of speciation, extinction and dispersal, comparing historically climatically stable areas (refugia) and unstable areas. Key Results Maximum likelihood and Bayesian inferences indicate that Myrcia and Marlierea are polyphyletic, and the internal groupings recovered are characterized by combinations of morphological characters. Phylogenetic relationships support a link between Amazonian and north-eastern species and between north-eastern and south-eastern species. Lower extinction rates within glacial refugia suggest that these areas were important in maintaining diversity in the Atlantic forest biodiversity hotspot. Conclusions This study provides a robust phylogenetic framework to address important ecological questions for Myrcia s.l. within an evolutionary context, and supports the need to unite taxonomically the two traditional genera Myrcia and Marlierea in an expanded Myrcia s.l. Furthermore, this study offers valuable insights into the diversification of plant species in the highly impacted Atlantic forest of South America; evidence is presented that the lowest extinction rates are found inside
1. A Bayesian analysis of site-occupancy data containing covariates of species occurrence and species detection probabilities is usually completed using Markov chain Monte Carlo methods in conjunction with software programs that can implement those methods for any statistical model, not just site-occupancy models. Although these software programs are quite flexible, considerable experience is often required to specify a model and to initialize the Markov chain so that summaries of the posterior distribution can be estimated efficiently and accurately. 2. As an alternative to these programs, we develop a Gibbs sampler for Bayesian analysis of site-occupancy data that include covariates of species occurrence and species detection probabilities. This Gibbs sampler is based on a class of site-occupancy models in which probabilities of species occurrence and detection are specified as probit-regression functions of site- and survey-specific covariate measurements. 3. To illustrate the Gibbs sampler, we analyse site-occupancy data of the blue hawker, Aeshna cyanea (Odonata, Aeshnidae), a common dragonfly species in Switzerland. Our analysis includes a comparison of results based on Bayesian and classical (non-Bayesian) methods of inference. We also provide code (based on the R software program) for conducting Bayesian and classical analyses of site-occupancy data.
Bayesian uncertainty analysis compared with the application of the GUM and its supplements
The Guide to the Expression of Uncertainty in Measurement (GUM) has proven to be a major step towards the harmonization of uncertainty evaluation in metrology. Its procedures contain elements from both classical and Bayesian statistics. The recent supplements 1 and 2 to the GUM appear to move the guidelines towards the Bayesian point of view, and they produce a probability distribution that shall encode one's state of knowledge about the measurand. In contrast to a Bayesian uncertainty analysis, however, Bayes' theorem is not applied explicitly. Instead, a distribution is assigned for the input quantities which is then ‘propagated’ through a model that relates the input quantities to the measurand. The resulting distribution for the measurand may coincide with a distribution obtained by the application of Bayes' theorem, but this is not true in general. The relation between a Bayesian uncertainty analysis and the application of the GUM and its supplements is investigated. In terms of a simple example, similarities and differences in the approaches are illustrated. Then a general class of models is considered and conditions are specified for which the distribution obtained by supplement 1 to the GUM is equivalent to a posterior distribution resulting from the application of Bayes' theorem. The corresponding prior distribution is identified and assessed. Finally, we briefly compare the GUM approach with a Bayesian uncertainty analysis in the context of regression problems.
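The 'propagation' step described above, where distributions assigned to the input quantities are pushed through the measurement model, is typically carried out by Monte Carlo sampling. The sketch below is a minimal stand-alone illustration of that idea, not the GUM's normative procedure: the additive model, the Gaussian input distributions, their parameters, and the names `propagate` and `inputs` are all assumptions chosen for the example.

```python
import math
import random


def propagate(model, input_samplers, n_draws=100_000, seed=0):
    """Monte Carlo propagation of input distributions through a model.

    Returns the mean, the standard uncertainty, and a central 95 %
    coverage interval of the output quantity.
    """
    rng = random.Random(seed)
    draws = sorted(model(*[sample(rng) for sample in input_samplers])
                   for _ in range(n_draws))
    mean = sum(draws) / n_draws
    variance = sum((d - mean) ** 2 for d in draws) / (n_draws - 1)
    low = draws[int(0.025 * n_draws)]
    high = draws[int(0.975 * n_draws) - 1]
    return mean, math.sqrt(variance), (low, high)


# Assumed inputs: X1 ~ N(10, 0.1) and X2 ~ N(5, 0.2); assumed model Y = X1 + X2.
inputs = [lambda rng: rng.gauss(10.0, 0.1), lambda rng: rng.gauss(5.0, 0.2)]
mean, std_unc, interval = propagate(lambda x1, x2: x1 + x2, inputs)
```

No prior for the measurand and no likelihood appear anywhere in this calculation, which is the sense in which Bayes' theorem is "not applied explicitly".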
Many-core algorithms for statistical phylogenetics
Motivation: Statistical phylogenetics is computationally intensive, resulting in considerable attention meted on techniques for parallelization. Codon-based models allow for independent rates of synonymous and replacement substitutions and have the potential to more adequately model the process of protein-coding sequence evolution with a resulting increase in phylogenetic accuracy. Unfortunately, due to the high number of codon states, computational burden has largely thwarted phylogenetic reconstruction under codon models, particularly at the genomic-scale. Here, we describe novel algorithms and methods for evaluating phylogenies under arbitrary molecular evolutionary models on graphics processing units (GPUs), making use of the large number of processing cores to efficiently parallelize calculations even for large state-size models. Results: We implement the approach in an existing Bayesian framework and apply the algorithms to estimating the phylogeny of 62 complete mitochondrial genomes of carnivores under a 60-state codon model. We see a near 90-fold speed increase over an optimized CPU-based computation and a >140-fold increase over the currently available implementation, making this the first practical use of codon models for phylogenetic inference over whole mitochondrial or microorganism genomes. Availability and implementation: Source code provided in BEAGLE: Broad-platform Evolutionary Analysis General Likelihood Evaluator, a cross-platform/processor library for phylogenetic likelihood computation (http://beagle-lib.googlecode.com/). We employ a BEAGLE-implementation using the Bayesian phylogenetics framework BEAST (http://beast.bio.ed.ac.uk/). Contact: msuchard@ucla.edu; a.rambaut@ed.ac.uk PMID:19369496
In this article, we present a Bayesian spatial factor analysis model. We extend previous work on confirmatory factor analysis by including geographically distributed latent variables and accounting for heterogeneity and spatial autocorrelation. The simulation study shows excellent recovery of the model parameters and demonstrates the consequences…
In traditional factor analysis, the variance-covariance matrix or the correlation matrix has often been a form of inputting data. In contrast, in Bayesian factor analysis, the entire data set is typically required to compute the posterior estimates, such as Bayes factor loadings and Bayes unique variances. We propose a simple method for computing…
Expressed sequence tags (ESTs) analyses are a fundamental tool for gene identification in organisms. Given a preliminary EST sample from a certain library, several statistical prediction problems arise. In particular, it is of interest to estimate how many new genes can be detected in a future EST sample of given size and also to determine the gene discovery rate: these estimates represent the basis for deciding whether to proceed sequencing the library and, in case of a positive decision, a guideline for selecting the size of the new sample. Such information is also useful for establishing sequencing efficiency in experimental design and for measuring the degree of redundancy of an EST library. In this work we propose a Bayesian nonparametric approach for tackling statistical problems related to EST surveys. In particular, we provide estimates for: a) the coverage, defined as the proportion of unique genes in the library represented in the given sample of reads; b) the number of new unique genes to be observed in a future sample; c) the discovery rate of new genes as a function of the future sample size. The Bayesian nonparametric model we adopt conveys, in a statistically rigorous way, the available information into prediction. Our proposal has appealing properties over frequentist nonparametric methods, which become unstable when prediction is required for large future samples. EST libraries, previously studied with frequentist methods, are analyzed in detail. The Bayesian nonparametric approach we undertake yields valuable tools for gene capture and prediction in EST libraries. The estimators we obtain do not feature the kind of drawbacks associated with frequentist estimators and are reliable for any size of the additional sample.
The extraction of physical information from data has generally been done by fitting the data through a χ^2 minimization procedure. However, as pointed out in the pioneering work of Sivia D. S. et al., another way to analyze the data is possible using a probabilistic approach based on Bayes' theorem. Expressed in a practical way, the main difference between the classical (χ^2 minimization) and the Bayesian approach is the way of expressing the final results of the fitting procedure: in the first case the result is expressed by values of parameters and a merit figure such as χ^2, while in the second case results are presented as probability distribution functions (PDFs) of both. In the method presented here we obtain the final probability distribution functions by exploring the combinations of parameters compatible with the experimental error, i.e. allowing the fitting procedure to wander in the parameter space with a probability of visiting a certain point P=exp(-χ^2/2), the so-called Gibbs sampling. Among the advantages of this method, we would like to emphasize three. First, correlation between parameters is automatically taken into account with the Bayesian method. This implies, for example, that parameter errors are correctly calculated, correlations show up in a natural way and ill-defined parameters are immediately recognized from their PDF (i.e. parameters for which data only support the calculation of lower or upper bounds). Second, it is possible to calculate the likelihood of a given physical model, and therefore to select the one which best fits the data with the minimum number of parameters, in a correctly defined probabilistic way. Finally, and not least, in the case of a low count rate, where the well-known law error=√{counts} fails because the Poisson distribution can no longer be approximated as a Gaussian, the Bayesian method can also be used by simply redefining χ^2, which is not possible with the usual fitting procedure.
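The idea of letting the fit wander through parameter space with visiting probability proportional to exp(-χ^2/2) can be sketched with a random-walk Metropolis rule. This is one common way to realise such a sampler and is an assumption here; the authors' own scheme may differ in its proposal and update details. The data, the constant model, and all function names below are hypothetical.

```python
import math
import random


def chi_square(params, xs, ys, sigmas, model):
    """Sum of squared, error-weighted residuals."""
    return sum(((y - model(x, params)) / s) ** 2
               for x, y, s in zip(xs, ys, sigmas))


def metropolis(xs, ys, sigmas, model, start, step, n_samples, seed=0):
    """Random-walk Metropolis sampler targeting exp(-chi^2 / 2)."""
    rng = random.Random(seed)
    current = list(start)
    chi2_current = chi_square(current, xs, ys, sigmas, model)
    samples = []
    for _ in range(n_samples):
        proposal = [p + rng.gauss(0.0, step) for p in current]
        chi2_proposal = chi_square(proposal, xs, ys, sigmas, model)
        delta = chi2_proposal - chi2_current
        # Accept with probability min(1, exp(-delta / 2)).
        if delta <= 0 or rng.random() < math.exp(-0.5 * delta):
            current, chi2_current = proposal, chi2_proposal
        samples.append(list(current))
    return samples


# Hypothetical data fitted with a one-parameter constant model y = p[0].
xs, ys, sigmas = [0.0, 1.0, 2.0], [1.0, 2.0, 3.0], [1.0, 1.0, 1.0]
chain = metropolis(xs, ys, sigmas, lambda x, p: p[0],
                   start=[0.0], step=1.0, n_samples=20_000, seed=1)
kept = [s[0] for s in chain[2_000:]]  # discard an initial burn-in
posterior_mean = sum(kept) / len(kept)
```

A histogram of `kept` approximates the PDF of the parameter. With several parameters, the same chain also exposes their correlations, which is the first advantage listed above.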
To analyze the dynamic structure in China's economic growth during the period 1952-1998, we introduce a model of the aggregate production function for the Chinese economy that considers total factor productivity (TFP) and output elasticities as time-varying parameters. Specifically, this paper is concerned with the relationship between the rate of economic growth in China and the trend in TFP. Here, we consider the time-varying parameters as random variables and introduce smoothness priors to construct a set of Bayesian linear models for parameter estimation. The results of the estimation are in agreement with the movements in China's social economy, thus illustrating the validity of the proposed methods.
In the Bayesian approach to effective field theory (EFT) expansions, truncation errors are derived from degree-of-belief (DOB) intervals for EFT predictions. By encoding expectations about the naturalness of EFT expansion coefficients for observables, this framework provides a statistical interpretation of the standard EFT procedure where truncation errors are estimated using the order-by-order convergence of the expansion. We extend and test previous calculations of DOB intervals for chiral EFT observables, examine correlations between contributions at different orders and energies, and explore methods to validate the statistical consistency of the EFT expansion parameter. Supported in part by the NSF and the DOE.
Maximum likelihood and Bayesian inference analyses of seven concatenated fragments of nuclear-encoded housekeeping genes indicate that Lophotrochozoa is monophyletic, i.e., the lophophorate groups Bryozoa, Brachiopoda and Phoronida are more closely related to molluscs and annelids than to Deuterostomia or Ecdysozoa. Lophophorates themselves, however, form a polyphyletic assemblage. The hypotheses that they are monophyletic and more closely allied to Deuterostomia than to Protostomia can be ruled out with both the approximately unbiased test and the expected likelihood weights test. The existence of Phoronozoa, a putative clade including Brachiopoda and Phoronida, has also been rejected. According to our analyses, phoronids instead share a more recent common ancestor with bryozoans than with brachiopods. Platyhelminthes is the sister group of Lophotrochozoa. Together these two constitute Spiralia. Although Chaetognatha appears as the sister group of Priapulida within Ecdysozoa in our analyses, alternative hypotheses concerning chaetognath relationships could not be rejected.
Representing hydrologic analysis under climate change is a challenging task. Hydrologic outputs of regional climate models (RCMs) driven by general circulation models (GCMs) are difficult to represent because of several uncertainties in the hydrologic impacts of climate change. To overcome this problem, this research presents practical options for hydrological climate change analysis using Bayesian and neural network approaches to regional adaptation to climate change. Applying Bayesian and neural network analysis to climate hydrologic components is a new research frontier for anticipating climate change. A strong advantage of Bayesian neural networks is their ability to detect time series in hydrologic components, which is complicated by data, parameter, and model hypotheses in climate change scenarios, by stepwise removal and addition of connections in a neural network process that incorporates the Bayesian concept of parameter, predict, and update steps. As an example study, the Mekong River Watershed, which is surrounded by four countries (Myanmar, Laos, Thailand and Cambodia), is selected. The results will show the trends of hydrologic components in climate model simulations obtained through Bayesian neural networks.
The aim of this study was to determine the accuracy of Bayesian networks in supporting breast cancer diagnoses. Systematic review and meta-analysis were carried out, including articles and papers published between January 1990 and March 2013. We included prospective and retrospective cross-sectional studies of the accuracy of diagnoses of breast lesions (target conditions) made using Bayesian networks (index test). Four primary studies that included 1,223 breast lesions were analyzed; 89.52% (444/496) of the breast cancer cases and 6.33% (46/727) of the benign lesions were positive based on the Bayesian network analysis. The area under the curve (AUC) for the summary receiver operating characteristic curve (SROC) was 0.97, with a Q* value of 0.92. Using Bayesian networks to diagnose malignant lesions increased the pretest probability of a true positive from 40.03% to 90.05% and decreased the probability of a false negative to 6.44%. Therefore, our results demonstrated that Bayesian networks provide an accurate and non-invasive method to support breast cancer diagnosis.
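The step from a pretest to a post-test probability follows from Bayes' theorem applied to the pooled counts. The sketch below uses the counts and the pretest probability quoted in the abstract; the function name is hypothetical, and this single-table calculation is a simplification that need not reproduce the meta-analytic estimates to the last digit.

```python
def post_test_probability(pretest, sensitivity, false_positive_rate):
    """Probability of disease given a positive test, by Bayes' theorem."""
    true_positive = sensitivity * pretest
    false_positive = false_positive_rate * (1.0 - pretest)
    return true_positive / (true_positive + false_positive)


sensitivity = 444 / 496          # positive results among breast cancer cases
false_positive_rate = 46 / 727   # positive results among benign lesions
pretest = 0.4003                 # pretest probability as quoted in the abstract

posterior = post_test_probability(pretest, sensitivity, false_positive_rate)
```

Under these inputs the post-test probability comes out at roughly 0.90, in line with the increase reported above.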
Case-crossover designs are widely used to study short-term exposure effects on the risk of acute adverse health events. While the frequentist literature on this topic is vast, there is no Bayesian work in this general area. The contribution of this paper is twofold. First, the paper establishes Bayesian equivalence results that require characterization of the set of priors under which the posterior distributions of the risk ratio parameters based on a case-crossover and time-series analysis are identical. Second, the paper studies inferential issues under case-crossover designs in a Bayesian framework. Traditionally, a conditional logistic regression is used for inference on risk-ratio parameters in case-crossover studies. We consider instead a more general full likelihood-based approach which makes less restrictive assumptions on the risk functions. Formulation of a full likelihood leads to growth in the number of parameters proportional to the sample size. We propose a semi-parametric Bayesian approach using a Dirichlet process prior to handle the random nuisance parameters that appear in a full likelihood formulation. We carry out a simulation study to compare the Bayesian methods based on full and conditional likelihood with the standard frequentist approaches for case-crossover and time-series analysis. The proposed methods are illustrated through the Detroit Asthma Morbidity, Air Quality and Traffic study, which examines the association between acute asthma risk and ambient air pollutant concentrations.
We have constructed the first ever phylogeny for the New Zealand earthworm fauna (Megascolecinae and Acanthodrilinae) including representatives from other major continental regions. Bayesian and maximum likelihood phylogenetic trees were constructed from 427 base pairs from the mitochondrial large subunit (16S) rRNA gene and 661 base pairs from the nuclear large subunit (28S) rRNA gene. Within the Acanthodrilinae we were able to identify a number of well-supported clades that were restricted to continental landmasses. Estimates of nodal support for these major clades were generally high, but relationships among clades were poorly resolved. The phylogenetic analyses revealed several independent lineages in New Zealand, some of which had a comparable phylogenetic depth to monophyletic groups sampled from Madagascar, Africa, North America and Australia. These results are consistent with at least some of these clades having inhabited New Zealand since rifting from Gondwana in the Late Cretaceous. Within the New Zealand Acanthodrilinae, major clades tended to be restricted to specific regions of New Zealand, with the central North Island and Cook Strait representing major biogeographic boundaries. Our field surveys of New Zealand and subsequent identification has also revealed extensive cryptic taxonomic diversity with approximately 48 new species sampled in addition to the 199 species recognized by previous authors. Our results indicate that further survey and taxonomic work is required to establish a foundation for future biogeographic and ecological research on this vitally important component of the New Zealand biota. Copyright © 2010 Elsevier Inc. All rights reserved.
Marker gene studies often use short amplicons spanning one or more hypervariable regions from an rRNA gene to interrogate the community structure of uncultured environmental samples. Target regions are chosen for their discriminatory power, but the limited phylogenetic signal of short high-throughput sequencing reads precludes accurate phylogenetic analysis. This is particularly unfortunate in the study of microscopic eukaryotes where horizontal gene flow is limited and the rRNA gene is expected to accurately reflect the species phylogeny. A promising alternative to full phylogenetic analysis is phylogenetic placement, where a reference phylogeny is inferred using the complete marker gene and iteratively extended with the short sequences from a metagenetic sample under study. Based on the phylogenetic placement approach we built Séance, a community analysis pipeline focused on the analysis of 18S marker gene data. Séance combines the alignment extension and phylogenetic placement capabilities of the Pagan multiple sequence alignment program with a suite of tools to preprocess, cluster and visualise datasets composed of many samples. We showcase Séance by analysing 454 data from a longitudinal study of intestinal parasite communities in wild rufous mouse lemurs (Microcebus rufus) as well as in simulation. We demonstrate both improved OTU picking at higher levels of sequence similarity for 454 data and show the accuracy of phylogenetic placement to be comparable to maximum likelihood methods for lower numbers of taxa. Séance is an open source community analysis pipeline that provides reference-based phylogenetic analysis for rRNA marker gene studies. Whilst in this article we focus on studying nematodes using the 18S marker gene, the concepts are generic and reference data for alternative marker genes can be easily created. Séance can be downloaded from http://wasabiapp.org/software/seance/ .
A culture-independent phylogenetic survey for an anaerobic trichlorobenzene-transforming microbial community was carried out. Small-subunit rRNA genes were PCR amplified from community DNA by using primers specific for Bacteria or Euryarchaeota and were subsequently cloned. Application of a new hybridization-based screening approach revealed 51 bacterial clone families, one of which was closely related to dechlorinating Dehalobacter species. Several clone sequences clustered with rDNA sequences obtained from a molecular study of an anaerobic aquifer contaminated with hydrocarbons and chlorinated solvents (Dojka et al., Appl. Env. Microbiol. 64:3869–3877, 1998). PMID:9872791
We compare the performances of well-known frequentist model fit indices (MFIs) and several Bayesian model selection criteria (MCC) as tools for cross-loading selection in factor analysis under low to moderate sample sizes, cross-loading sizes, and possible violations of distributional assumptions. The Bayesian criteria considered include the Bayes factor (BF), Bayesian Information Criterion (BIC), Deviance Information Criterion (DIC), a Bayesian leave-one-out with Pareto smoothed importance sampling (LOO-PSIS), and a Bayesian variable selection method using the spike-and-slab prior (SSP; Lu, Chow, & Loken, 2016). Simulation results indicate that of the Bayesian measures considered, the BF and the BIC showed the best balance between true positive rates and false positive rates, followed closely by the SSP. The LOO-PSIS and the DIC showed the highest true positive rates among all the measures considered, but with elevated false positive rates. In comparison, likelihood ratio tests (LRTs) are still the preferred frequentist model comparison tool, except for their higher false positive detection rates compared to the BF, BIC and SSP under violations of distributional assumptions. The root mean squared error of approximation (RMSEA) and the Tucker-Lewis index (TLI) at the conventional cut-off of approximate fit impose much more stringent "penalties" on model complexity under conditions with low cross-loading size, low sample size, and high model complexity compared with the LRTs and all other Bayesian MCC. Nevertheless, they provided a reasonable alternative to the LRTs in cases where the models cannot be readily constructed as nested within each other. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Elopomorpha is one of the three main clades of living teleost fishes and includes a range of disparate lineages including eels, tarpons, bonefishes, and halosaurs. Elopomorphs were among the first groups of fishes investigated using Hennigian phylogenetic methods and continue to be the object of intense phylogenetic scrutiny due to their economic significance, diversity, and crucial evolutionary status as the sister group of all other teleosts. While portions of the phylogenetic backbone for Elopomorpha are consistent between studies, the relationships among Albula, Pterothrissus, Notacanthiformes, and Anguilliformes remain contentious and difficult to evaluate. This lack of phylogenetic resolution is problematic as fossil lineages are often described and placed taxonomically based on an assumed sister group relationship between Albula and Pterothrissus. In addition, phylogenetic studies using morphological data that sample elopomorph fossil lineages often do not include notacanthiform or anguilliform lineages, potentially introducing a bias toward interpreting fossils as members of the common stem of Pterothrissus and Albula. Here we provide a phylogenetic analysis of DNA sequences sampled from multiple nuclear genes that include representative taxa from Albula, Pterothrissus, Notacanthiformes and Anguilliformes. We integrate our molecular dataset with a morphological character matrix that spans both living and fossil elopomorph lineages. Our results reveal substantial uncertainty in the placement of Pterothrissus as well as all sampled fossil lineages, questioning the stability of the taxonomy of fossil Elopomorpha. However, despite topological uncertainty, our integration of fossil lineages into a Bayesian time calibrated framework provides divergence time estimates for the clade that are consistent with previously published age estimates based on the elopomorph fossil record and molecular estimates resulting from traditional node-dating methods. Copyright
OBJECTIVE BAYESIAN ANALYSIS OF "ON/OFF" MEASUREMENTS
In high-energy astrophysics, it is common practice to account for the background overlaid with counts from the source of interest with the help of auxiliary measurements carried out by pointing off-source. In this "on/off" measurement, one knows the number of photons detected while pointing toward the source, the number of photons collected while pointing away from the source, and how to estimate the background counts in the source region from the flux observed in the auxiliary measurements. For very faint sources, the number of photons detected is so low that the approximations that hold asymptotically are not valid. On the other hand, an analytical solution exists for the Bayesian statistical inference, which is valid at low and high counts. Here we illustrate the objective Bayesian solution based on the reference posterior and compare the result with the approach very recently proposed by Knoetig, and discuss its most delicate points. In addition, we propose to compute the significance of the excess with respect to the background-only expectation with a method that is able to account for any uncertainty on the background and is valid for any photon count. This method is compared to the widely used significance formula by Li and Ma, which is based on asymptotic properties.
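For orientation, the asymptotic Li and Ma significance that the abstract uses as its point of comparison can be sketched as below. This is only the classical formula (Eq. 17 of Li and Ma, 1983), not the objective Bayesian method proposed in the paper. The function name, the symbol `alpha` for the on/off exposure ratio, and the signed return value are assumptions made for this illustration.

```python
import math


def li_ma_significance(n_on: int, n_off: int, alpha: float) -> float:
    """Asymptotic significance of an on/off excess (Li & Ma 1983, Eq. 17).

    n_on  : counts while pointing at the source
    n_off : counts while pointing off-source
    alpha : ratio of on-source to off-source exposure (assumed notation)

    The sign convention (negative when n_on is below the expected
    background alpha * n_off) is an addition for this sketch.
    """
    if n_on <= 0 or n_off <= 0 or alpha <= 0:
        raise ValueError("the formula needs positive counts and alpha")
    total = n_on + n_off
    term_on = n_on * math.log((1.0 + alpha) / alpha * n_on / total)
    term_off = n_off * math.log((1.0 + alpha) * n_off / total)
    # The bracket is a log-likelihood ratio and is non-negative in exact
    # arithmetic; max() guards against tiny negative rounding errors.
    magnitude = math.sqrt(2.0 * max(0.0, term_on + term_off))
    return magnitude if n_on >= alpha * n_off else -magnitude
```

Because the formula relies on asymptotic properties, it should not be trusted at the very low counts that motivate the Bayesian treatment described above.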
Reactive transport modeling is often used in support of bioremediation and chemical treatment planning and design. There remains a pressing need for practical and efficient models that do not require (or assume attainable) the high level of characterization needed by complex numerical models. We focus on a linear systems or transfer function approach to the problem of reactive tracer transport in a heterogeneous saprolite aquifer. Transfer functions are obtained through the Bayesian geostatistical inverse method applied to tracer injection histories and breakthrough curves. We employ nonparametric transfer functions, which require minimal assumptions about shape and structure. The resulting flexibility empowers the data to determine the nature of the transfer function with minimal prior assumptions. Nonnegativity is enforced through a reflected Brownian motion stochastic model. The inverse method enables us to quantify uncertainty and to generate conditional realizations of the transfer function. Complex information about a hydrogeologic system is distilled into a relatively simple but rigorously obtained function that describes the transport behavior of the system between two wells. The resulting transfer functions are valuable in reactive transport models based on traveltime and streamline methods. The information contained in the data, particularly in the case of strong heterogeneity, is not overextended but is fully used. This is the first application of Bayesian geostatistical inversion to transfer functions in hydrogeology but the methodology can be extended to any linear system.
In this paper we apply a Bayesian technique to determine the best fit of stellar evolution models to find the main sequence turn-off age and other cluster parameters of four intermediate-age open clusters: NGC 2360, NGC 2477, NGC 2660, and NGC 3960. Our algorithm utilizes a Markov chain Monte Carlo technique to fit these various parameters, objectively finding the best-fit isochrone for each cluster. The result is a high-precision isochrone fit. We compare these results with those of traditional “by-eye” isochrone fitting methods. By applying this Bayesian technique to NGC 2360, NGC 2477, NGC 2660, and NGC 3960, we determine the ages of these clusters to be 1.35 ± 0.05, 1.02 ± 0.02, 1.64 ± 0.04, and 0.860 ± 0.04 Gyr, respectively. The results of this paper continue our effort to determine cluster ages to a higher precision than that offered by these traditional methods of isochrone fitting.
A new species of Pythium collected from grapevine roots (Vitis vinifera) in South Africa and roots of common beet (Beta vulgaris) in Majorca, Spain, is described. The phylogenetic position of the new species was investigated by multigene sequence analyses of the internal transcribed spacers (ITS1 and ITS2) of the rDNA region, as well as three other nuclear and three mitochondrial coding genes. Maximum likelihood phylogenetic analyses based on ITS rDNA and concatenated beta-tubulin and cytochrome c oxidase II alignment place Pythium recalcitrans together with P. sylvaticum and P. intermedium. Pythium recalcitrans sp. nov. is morphologically almost indistinguishable from other Pythium species that only form hyphal swellings in culture. However, its species status is justified by the distinctiveness of the DNA sequences in all the genes examined. In culture P. recalcitrans exhibits fast radial growth and abundant spherical to subglobose hyphal swellings but produces no zoosporangia. Sexual structures are not seen in agar media but form in autoclaved grass blades floated on water. Multiple antheridia (1-7) are encountered with most of them diclinous and crook-necked. Oospores are thin-walled and either aplerotic or plerotic. P. recalcitrans was pathogenic to seedlings of Beta vulgaris and Solanum lycopersicum.
Evolution of climatic niche specialization: a phylogenetic analysis in amphibians
Bonetti, Maria Fernanda; Wiens, John J.
2014-01-01
The evolution of climatic niche specialization has important implications for many topics in ecology, evolution and conservation. The climatic niche reflects the set of temperature and precipitation conditions where a species can occur. Thus, specialization to a limited set of climatic conditions can be important for understanding patterns of biogeography, species richness, community structure, allopatric speciation, spread of invasive species and responses to climate change. Nevertheless, the factors that determine climatic niche width (level of specialization) remain poorly explored. Here, we test whether species that occur in more extreme climates are more highly specialized for those conditions, and whether there are trade-offs between niche widths on different climatic niche axes (e.g. do species that tolerate a broad range of temperatures tolerate only a limited range of precipitation regimes?). We test these hypotheses in amphibians, using phylogenetic comparative methods and global-scale datasets, including 2712 species with both climatic and phylogenetic data. Our results do not support either hypothesis. Rather than finding narrower niches in more extreme environments, niches tend to be narrower on one end of a climatic gradient but wider on the other. We also find that temperature and precipitation niche breadths are positively related, rather than showing trade-offs. Finally, our results suggest that most amphibian species occur in relatively warm and dry environments and have relatively narrow climatic niche widths on both of these axes. Thus, they may be especially imperilled by anthropogenic climate change. PMID:25274369
Expert Prior Elicitation and Bayesian Analysis of the Mycotic Ulcer Treatment Trial I
Purpose. To perform a Bayesian analysis of the Mycotic Ulcer Treatment Trial I (MUTT I) using expert opinion as a prior belief. Methods. MUTT I was a randomized clinical trial comparing topical natamycin or voriconazole for treating filamentous fungal keratitis. A questionnaire elicited expert opinion on the best treatment of fungal keratitis before MUTT I results were available. A Bayesian analysis was performed using the questionnaire data as a prior belief and the MUTT I primary outcome (3-month visual acuity) by frequentist analysis as a likelihood. Results. Corneal experts had a 41.1% prior belief that natamycin improved 3-month visual acuity compared with voriconazole. The Bayesian analysis found a 98.4% belief for natamycin treatment compared with voriconazole treatment for filamentous cases as a group (mean improvement 1.1 Snellen lines, 95% credible interval 0.1–2.1). The Bayesian analysis estimated a smaller treatment effect than the MUTT I frequentist analysis result of 1.8-line improvement with natamycin versus voriconazole (95% confidence interval 0.5–3.0, P = 0.006). For Fusarium cases, the posterior demonstrated a 99.7% belief for natamycin treatment, whereas non-Fusarium cases had a 57.3% belief. Conclusions. The Bayesian analysis suggests that natamycin is superior to voriconazole when filamentous cases are analyzed as a group. Subgroup analysis of Fusarium cases found improvement with natamycin compared with voriconazole, whereas there was almost no difference between treatments for non-Fusarium cases. These results were consistent with, though smaller in effect size than, the MUTT I primary outcome by frequentist analysis. The accordance between analyses further validates the trial results. (ClinicalTrials.gov number, NCT00996736.) PMID:23702779
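The way a prior belief and a frequentist estimate combine can be sketched with a conjugate normal update. This assumes that both the prior and the likelihood are approximately normal, which the abstract does not state, so it should be read as an illustration rather than the trial's actual model. The likelihood standard error is backed out from the reported 95% confidence interval (0.5 to 3.0 lines); the prior mean and standard deviation below are hypothetical placeholders, not the elicited expert values, so the output is not expected to reproduce the published posterior.

```python
import math


def normal_posterior(prior_mean, prior_sd, estimate, estimate_se):
    """Combine a normal prior with a normal likelihood by precision weighting."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / estimate_se ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean
                            + data_precision * estimate)
    return post_mean, math.sqrt(post_var)


def prob_positive(mean, sd):
    """P(effect > 0) under a normal distribution with the given mean and sd."""
    return 0.5 * (1.0 + math.erf(mean / (sd * math.sqrt(2.0))))


# Likelihood from the abstract: 1.8-line improvement, 95% CI 0.5 to 3.0.
estimate = 1.8
estimate_se = (3.0 - 0.5) / (2 * 1.96)

# Hypothetical, mildly sceptical prior (NOT the elicited expert prior).
prior_mean, prior_sd = -0.2, 0.9

post_mean, post_sd = normal_posterior(prior_mean, prior_sd, estimate, estimate_se)
print(f"posterior mean {post_mean:.2f} lines, sd {post_sd:.2f}")
print(f"P(natamycin better) = {prob_positive(post_mean, post_sd):.3f}")
```

The posterior mean always lies between the prior mean and the trial estimate, which is why a sceptical prior yields a smaller treatment effect than the frequentist analysis alone.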
Molecular phylogenetic analyses are mainly based on the small ribosomal RNA subunit (18S rRNA), internal transcribed spacer regions, and other molecular markers. We compared the phylogenetic relationships of Babesia spp. using large subunit ribosomal RNA, i.e., 28S rRNA, and the united 28S + 18S rRNA sequence fragments from 11 isolates of Babesia spp. collected in China. Due to sequence length and variability, the 28S rRNA gene contained more information than the 18S rRNA gene and could be used to elucidate the phylogenetic relationships of B. motasi, B. major, and B. bovis. Thus, 28S rRNA is another candidate marker that can be used for the phylogenetic analysis of Babesia spp. However, the united fragment (28S + 18S) analysis provided better supported phylogenetic relationships than single genes for Babesia spp. in China.
In this study, we describe the development of a fast and accurate molecular identification system for human-associated liver fluke species (Opisthorchis viverrini, Opisthorchis felineus, and Clonorchis sinensis) using the PCR-RFLP analysis of the 18S-ITS1-5.8S nuclear ribosomal DNA region. Based on sequence variation in the target rDNA region, we selected three species-specific restriction enzymes within the ITS1 regions, generating different restriction profiles among the species: MunI for O. viverrini, NheI for O. felineus, and XhoI for C. sinensis, respectively. Each restriction enzyme generated different-sized fragments specific to the species examined, but no intraspecific polymorphism or cross-reaction between the species was detected in their restriction pattern. These results indicate that PCR-linked restriction analysis of the ITS1 region allows for the rapid and reliable molecular identification among these opisthorchid taxa. In addition, phylogenetic analysis of rDNA sequences using different methods (MP, ML, NJ, and Bayesian inference) displayed O. viverrini and O. felineus as a sister group, but this relationship was not strongly supported. The failure to recover a robust phylogeny may be due to the relatively small number of synapomorphic characters shared among the species, yielding weak phylogenetic signal. Alternatively, rapid speciation within a very short period of time could be another explanation for the relatively poorly resolved relationships among these species. Our data are insufficient for discriminating between sudden cladogenesis and other potential causes of poor resolution. Further information from independent loci might help resolve this phylogeny.
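The logic of a PCR-RFLP assay can be sketched as an in-silico digest: find each enzyme's recognition site in an amplicon and report the resulting fragment lengths. The recognition sequences below are the standard ones for MunI (C^AATTG), NheI (G^CTAGC) and XhoI (C^TCGAG); the function name and the toy amplicon in the usage note are hypothetical and are not the ITS1 sequences from the study.

```python
# Recognition sites of the three enzymes named in the abstract.
# All three are palindromic and cut after the first base of the site.
SITES = {"MunI": "CAATTG", "NheI": "GCTAGC", "XhoI": "CTCGAG"}


def fragment_lengths(amplicon: str, enzyme: str, cut_offset: int = 1) -> list:
    """Return fragment lengths after digesting a linear amplicon.

    cut_offset is the position of the cut inside the recognition site
    (1 for all three enzymes above). An uncut amplicon yields one fragment.
    """
    seq = amplicon.upper()
    site = SITES[enzyme]
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]
```

For a made-up amplicon such as `"AAAA" + "CAATTG" + "TTTT"`, only MunI cuts, so `fragment_lengths(..., "MunI")` gives two fragments while the other two enzymes leave a single band; a species-specific assay relies on exactly this kind of contrast.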
Leishmaniasis is a worldwide epidemic disease caused by the genus Leishmania, which is still endemic in the west and northwest areas of China. Some viewpoints of the traditional taxonomy of Chinese Leishmania have been challenged by recent phylogenetic research based on different molecular markers. However, the taxonomic positions and phylogenetic relationships of Chinese Leishmania isolates remain controversial, which calls for more data and further analysis. In this study, the heat shock protein 70 (HSP70) gene and cytochrome b (cyt b) gene were used for phylogenetic analysis of Chinese Leishmania isolates from patients, dogs, gerbils, and sand flies in different geographic origins. In addition, for the Leishmania sp. of interest in China, the ultrastructure of three Chinese Leishmania sp. strains (MHOM/CN/90/SC10H2, SD, GL) was observed by transmission electron microscopy. Bayesian trees from HSP70 and cyt b congruently indicated that the 14 Chinese Leishmania isolates belong to three Leishmania species including L. donovani complex, L. gerbilli, and L. (Sauroleishmania) sp. Their identity further confirmed that the undescribed Leishmania species causing visceral Leishmaniasis (VL) in China is closely related to L. tarentolae. The phylogenetic results from HSP70 also suggested the classification of subspecies within L. donovani complex: KXG-918, KXG-927, KXG-Liu, KXG-Xu, 9044, SC6, and KXG-65 belong to L. donovani; Cy, WenChuan, and 801 were proposed to be L. infantum. Through transmission electron microscopy, unexpectedly, the Golgi apparatus was not observed in SC10H2, SD, and GL, which was similar to previous reports of reptilian Leishmania. The statistical analysis of microtubule counts separated SC10H2, SD, and GL as one group from any other reference strain (L. donovani MHOM/IN/80/DD8; L. tropica MHOM/SU/74/K27; L. gerbilli MRHO/CN/60/GERBILLI). The ultrastructural characteristics of Leishmania sp. partly lend support to the phylogenetic inference that
The phylogenetic structure and community composition were analysed in an existing data set of marine bacterioplankton communities to elucidate the evolutionary and ecological processes dictating the assembly. The communities were sampled from coastal waters at nine locations distributed worldwide and were examined through the use of comprehensive clone libraries of 16S ribosomal RNA genes. The analyses show that the local communities are phylogenetically different from each other and that a majority of them are phylogenetically clustered, i.e. the species (operational taxonomic units) were more related to each other than expected by chance. Accordingly, the local communities were assembled non-randomly from the global pool of available bacterioplankton. Further, the phylogenetic structures of the communities were related to the water temperature at the locations. In agreement with similar studies, including both macroorganisms and bacteria, these results suggest that marine bacterial communities are structured by “habitat filtering”, i.e. through non-random colonization and invasion determined by environmental characteristics. Different bacterial types seem to have different ecological niches that dictate their survival in different habitats. Other eco-evolutionary processes that may contribute to the observed phylogenetic patterns are discussed. The results also imply a mapping between phenotype and phylogenetic relatedness which facilitates the use of community phylogenetic structure analysis to infer ecological and evolutionary assembly processes.
The growing availability of spatial datasets (observations, reanalysis, and regional and global climate models) demands efficient multivariate spatial modeling techniques for many problems of interest (e.g. teleconnection analysis, multi-site downscaling, etc.). Complex networks have been recently applied in this context using graphs built from pairwise correlations between the different stations (or grid boxes) forming the dataset. However, this analysis does not take into account the full dependence structure underlying the data, given by all possible marginal and conditional dependencies among the stations, and does not allow a probabilistic analysis of the dataset. In this talk we introduce Bayesian networks as an alternative data-driven multivariate analysis and modeling technique which allows building a joint probability distribution of the stations including all relevant dependencies in the dataset. Bayesian networks are a sound machine learning technique that uses a graph 1) to encode the main dependencies among the variables and 2) to obtain a factorization of the joint probability distribution of the stations given by a reduced number of parameters. For a particular problem, the resulting graph provides a qualitative analysis of the spatial relationships in the dataset (alternative to complex network analysis), and the resulting model allows for a probabilistic analysis of the dataset. Bayesian networks have been widely applied in many fields, but their use in climate problems is hampered by the large number of variables (stations) involved in this field, since the complexity of the existing algorithms for learning the graphical structure from data grows nonlinearly with the number of variables. In this contribution we present a modified local learning algorithm for Bayesian networks adapted to this problem, which allows inferring the graphical structure for thousands of stations (from observations) and/or gridboxes (from model simulations) thus providing new
This study aimed to develop a real-time crash risk model with limited data in China by using Bayesian meta-analysis and Bayesian inference approach. A systematic review was first conducted by using three different Bayesian meta-analyses, including the fixed effect meta-analysis, the random effect meta-analysis, and the meta-regression. The meta-analyses provided a numerical summary of the effects of traffic variables on crash risks by quantitatively synthesizing results from previous studies. The random effect meta-analysis and the meta-regression produced a more conservative estimate for the effects of traffic variables compared with the fixed effect meta-analysis. Then, the meta-analyses results were used as informative priors for developing crash risk models with limited data. Three different meta-analyses significantly affect model fit and prediction accuracy. The model based on meta-regression can increase the prediction accuracy by about 15% as compared to the model that was directly developed with limited data. Finally, the Bayesian predictive densities analysis was used to identify the outliers in the limited data. It can further improve the prediction accuracy by 5.0%.
A Bayesian analysis of stochastic volatility (SV) models using the class of symmetric scale mixtures of normal (SMN) distributions is considered. In the face of non-normality, this provides an appealing robust alternative to the routine use of the normal distribution. Specific distributions examined include the normal, student-t, slash and the variance gamma distributions. Using a Bayesian paradigm, an efficient Markov chain Monte Carlo (MCMC) algorithm is introduced for parameter estimation. Moreover, the mixing parameters obtained as a by-product of the scale mixture representation can be used to identify outliers. The methods developed are applied to analyze daily stock returns data on the S&P500 index. Bayesian model selection criteria as well as out-of-sample forecasting results reveal that the SV models based on heavy-tailed SMN distributions provide significant improvement in model fit as well as prediction to the S&P500 index data over the usual normal model. PMID:20730043
Regression adjustment for the propensity score is a statistical method that reduces confounding from measured variables in observational data. A Bayesian propensity score analysis extends this idea by using simultaneous estimation of the propensity scores and the treatment effect. In this article, we conduct an empirical investigation of the performance of Bayesian propensity scores in the context of an observational study of the effectiveness of beta-blocker therapy in heart failure patients. We study the balancing properties of the estimated propensity scores. Traditional Frequentist propensity scores focus attention on balancing covariates that are strongly associated with treatment. In contrast, we demonstrate that Bayesian propensity scores can be used to balance the association between covariates and the outcome. This balancing property has the effect of reducing confounding bias because it reduces the degree to which covariates are outcome risk factors.
Prokaryotic 16S ribosomal RNA (rRNA) sequences are widely used in environmental microbiology and molecular evolution as reliable markers for the taxonomic classification and phylogenetic analysis of microbes. Restricted by current sequencing techniques, the massive sequencing of 16S rRNA gene amplicons encompassing the full length of genes is not yet feasible. Thus, the selection of the most efficient hypervariable regions for phylogenetic analysis and taxonomic classification is still debated. In the present study, several bioinformatics tools were integrated to build an in silico pipeline to evaluate the phylogenetic sensitivity of the hypervariable regions compared with the corresponding full-length sequences. The correlation of seven sub-regions was inferred from the geodesic distance, a parameter that is applied to quantitatively compare the topology of different phylogenetic trees constructed using the sequences from different sub-regions. The relationship between different sub-regions based on the geodesic distance indicated that V4-V6 were the most reliable regions for representing the full-length 16S rRNA sequences in the phylogenetic analysis of most bacterial phyla, while V2 and V8 were the least reliable regions. Our results suggest that V4-V6 might be optimal sub-regions for the design of universal primers with superior phylogenetic resolution for bacterial phyla. A potential relationship between function and the evolution of 16S rRNA is also discussed.
The fit of data using a mathematical model is the standard way to know if the model describes data correctly and to obtain parameters that describe the physical processes hidden behind the experimental results. This is usually done by means of a χ² minimization procedure. Although this procedure is fast and quite reliable for simple models, it has many drawbacks when dealing with complicated problems such as models with many or correlated parameters. We present here a Bayesian method to explore the parameter space guided only by the probability laws underlying the χ² figure of merit. The presented method does not get stuck in local minima of the χ² landscape, as usually happens with classical minimization procedures. Moreover, correlations between parameters are taken into account in a natural way. Finally, parameters are obtained as probability distribution functions so that all the complexity of the parameter space is shown.
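One common way to explore a parameter space using the probability law behind χ² is a random-walk Metropolis sampler with target density proportional to exp(−χ²/2). The sketch below assumes Gaussian errors, flat priors and a straight-line model y = a·x + b; these choices, the function names and the step size are illustrative assumptions and not necessarily the algorithm of the paper.

```python
import math
import random


def chi2(params, xs, ys, sigma):
    """Chi-square of a straight line y = a*x + b with constant error sigma."""
    a, b = params
    return sum(((y - (a * x + b)) / sigma) ** 2 for x, y in zip(xs, ys))


def metropolis(xs, ys, sigma, start, step, n_steps, seed=0):
    """Random-walk Metropolis sampling of exp(-chi2/2) under flat priors."""
    rng = random.Random(seed)
    current = list(start)
    current_chi2 = chi2(current, xs, ys, sigma)
    samples = []
    for _ in range(n_steps):
        proposal = [p + rng.gauss(0.0, step) for p in current]
        proposal_chi2 = chi2(proposal, xs, ys, sigma)
        delta = proposal_chi2 - current_chi2
        # Always accept improvements; accept worse points with
        # probability exp(-delta/2), which lets the chain leave local minima.
        if delta <= 0 or rng.random() < math.exp(-0.5 * delta):
            current, current_chi2 = proposal, proposal_chi2
        samples.append(tuple(current))
    return samples
```

After discarding an initial burn-in, histograms of the sampled `a` and `b` approximate their probability distributions, and a scatter plot of the pairs shows the parameter correlation directly.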
The ability to accurately estimate effective connectivity among brain regions from neuroimaging data could help answer many open questions in neuroscience. We propose a method which uses causality to obtain a measure of effective connectivity from fMRI data. The method uses a vector autoregressive model for the latent variables describing neuronal activity in combination with a linear observation model based on a convolution with a hemodynamic response function. Due to the employed modeling, it is possible to efficiently estimate all latent variables of the model using a variational Bayesian inference algorithm. The computational efficiency of the method enables us to apply it to large scale problems with high sampling rates and several hundred regions of interest. We use a comprehensive empirical evaluation with synthetic and real fMRI data to evaluate the performance of our method under various conditions. PMID:24847244
A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequences of two or more organisms to create a phylogenetic profile for each protein indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method a combination of the above two methods is used to predict functional links.
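The phylogenetic profile idea can be sketched as follows: each protein gets a presence/absence vector over a fixed list of genomes, and proteins that share a profile become candidates for a functional link. Grouping by exactly identical profiles is a simplifying assumption for this sketch (the text only says the profile "provides information" about links), and the function names and toy data are hypothetical.

```python
from collections import defaultdict


def phylogenetic_profiles(presence, genomes):
    """Build a presence/absence tuple for each protein.

    presence : dict mapping protein name -> set of genomes with a homolog
    genomes  : ordered list of genome names defining the profile columns
    """
    return {protein: tuple(g in found for g in genomes)
            for protein, found in presence.items()}


def linked_groups(profiles):
    """Group proteins whose profiles are identical (candidate functional links)."""
    groups = defaultdict(list)
    for protein, profile in profiles.items():
        groups[profile].append(protein)
    return [sorted(members) for members in groups.values() if len(members) > 1]


# Toy data: which of three genomes contain a homolog of each protein.
genomes = ["genome1", "genome2", "genome3"]
presence = {
    "P1": {"genome1", "genome3"},
    "P2": {"genome1", "genome3"},
    "P3": {"genome2"},
}
profiles = phylogenetic_profiles(presence, genomes)
print(linked_groups(profiles))
```

In this toy case P1 and P2 share a profile and are reported together, while P3 stands alone. A practical variant would likely tolerate a few mismatches between profiles rather than demand exact equality.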
Phylogenetic Analysis of Eastern Equine Encephalitis Virus Isolates from Florida
Florida has the highest degree of endemicity for eastern equine encephalitis virus (EEEV) of any state in the United States and is the only state with year-round transmission of EEEV. To further understand the viral population dynamics in Florida, the genome sequences of six EEEV isolates from central Florida were determined. These data were used to identify the most polymorphic regions of the EEEV genome from viruses isolated in Florida. The sequence of these polymorphic regions was then determined for 18 additional Florida isolates collected in four geographically distinct regions over a 20-year period. Phylogenetic analyses of these data suggested a rough temporal association of the Florida isolates, but no clustering by region or by source of the isolate. Some clustering of northeastern isolates with Florida isolates was seen, providing support for the hypothesis that Florida serves as a reservoir for the periodic introduction of EEEV into the northeastern United States. PMID:21540379
In this Special feature, we assemble studies that illustrate phylogenetic approaches to studying salient questions regarding the effect of specialization on lineage diversification. The studies use an array of techniques involving a wide-ranging collection of biological systems (plants, butterflies, fish and amphibians are all represented). Their results reveal that macroevolutionary examination of specialization provides insight into the patterns of trade-offs in specialized systems; in particular, the genetic mechanisms of trade-offs appear to extend to very different aspects of life history in different groups. In turn, because a species may be a specialist from one perspective and a generalist in others, these trade-offs influence whether we perceive specialization to have effects on the evolutionary success of a lineage when we examine specialization only along a single axis. Finally, how geographical range influences speciation and extinction of specialist lineages remains a question offering much potential for further insight. PMID:25274367
wolfPAC is an AppleScript-based software package that facilitates the use of numerous, remotely located Macintosh computers to perform computationally-intensive phylogenetic analyses using the popular application PAUP* (Phylogenetic Analysis Using Parsimony). It has been designed to utilise readily available, inexpensive processors and to encourage sharing of computational resources within the worldwide phylogenetics community.
The evolution of climatic niche specialization has important implications for many topics in ecology, evolution and conservation. The climatic niche reflects the set of temperature and precipitation conditions where a species can occur. Thus, specialization to a limited set of climatic conditions can be important for understanding patterns of biogeography, species richness, community structure, allopatric speciation, spread of invasive species and responses to climate change. Nevertheless, the factors that determine climatic niche width (level of specialization) remain poorly explored. Here, we test whether species that occur in more extreme climates are more highly specialized for those conditions, and whether there are trade-offs between niche widths on different climatic niche axes (e.g. do species that tolerate a broad range of temperatures tolerate only a limited range of precipitation regimes?). We test these hypotheses in amphibians, using phylogenetic comparative methods and global-scale datasets, including 2712 species with both climatic and phylogenetic data. Our results do not support either hypothesis. Rather than finding narrower niches in more extreme environments, niches tend to be narrower on one end of a climatic gradient but wider on the other. We also find that temperature and precipitation niche breadths are positively related, rather than showing trade-offs. Finally, our results suggest that most amphibian species occur in relatively warm and dry environments and have relatively narrow climatic niche widths on both of these axes. Thus, they may be especially imperilled by anthropogenic climate change. © 2014 The Author(s) Published by the Royal Society. All rights reserved.
Fifteen small-subunit rRNAs from methylotrophic bacteria have been sequenced. Comparisons of these sequences with 22 previously published sequences further defined the phylogenetic relationships among these bacteria and illustrated the agreement between phylogeny and physiological characteristics of the bacteria. Phylogenetic trees were constructed with 16S rRNA sequences from methylotrophic bacteria and representative organisms from subdivisions within the class Proteobacteria on the basis of sequence similarities by using a weighted least-mean-square difference method. The methylotrophs have been separated into coherent clusters in which bacteria shared physiological characteristics. The clusters distinguished bacteria which used either the ribulose monophosphate or serine pathway for carbon assimilation. In addition, methanotrophs and methylotrophs which do not utilize methane were found to form distinct clusters within these groups. Five new deoxyoligonucleotide probes were designed, synthesized, labelled with digoxigenin-11-ddUTP, and tested for the ability to hybridize to RNA extracted from the bacteria represented in the unique clusters and for the ability to detect RNAs purified from soils enriched for methanotrophs by exposure to a methane-air atmosphere for one month. The 16S rRNA purified from soil hybridized to the probe which was complementary to sequences present in 16S rRNA from serine pathway methanotrophs and hybridized to a lesser extent with a probe complementary to sequences in 16S rRNAs of ribulose monophosphate pathway methanotrophs. The nonradioactive detection system used performed reliably at amounts of RNA from pure cultures as small as 10 ng. PMID:7510941
Copula models have become increasingly popular for modelling the dependence structure in multivariate survival data. The two-parameter Archimedean family of Power Variance Function (PVF) copulas includes the Clayton, Positive Stable (Gumbel) and Inverse Gaussian copulas as special or limiting cases, and thus offers a unified approach to fitting these important copulas. Two-stage frequentist procedures for estimating the marginal distributions and the PVF copula have been suggested by Andersen (Lifetime Data Anal 11:333-350, 2005), Massonnet et al. (J Stat Plann Inference 139(11):3865-3877, 2009) and Prenen et al. (J R Stat Soc Ser B 79(2):483-505, 2017); these first estimate the marginal distributions and then, conditional on these, estimate the PVF copula parameters in a second step. Here we explore a one-stage Bayesian approach that simultaneously estimates the marginal and the PVF copula parameters. For the marginal distributions, we consider both parametric and semiparametric models. We propose a new method to simulate uniform pairs with PVF dependence structure based on conditional sampling for copulas and on numerical approximation to solve a target equation. In a simulation study, small sample properties of the Bayesian estimators are explored. We illustrate the usefulness of the methodology using data on times to appendectomy for adult twins in the Australian NH&MRC Twin registry. Parameters of the marginal distributions and the PVF copula are simultaneously estimated in a parametric as well as a semiparametric approach, where the marginal distributions are modelled using Weibull and piecewise exponential distributions, respectively.
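Conditional sampling for copulas can be illustrated with the Clayton copula, which the abstract names as a special case of the PVF family. For Clayton the target equation has a closed-form solution, so no numerical root-finding is needed; the general PVF case, where the abstract uses numerical approximation, is not covered here. The function names and the choice `theta=2.0` are assumptions made for this sketch.

```python
import random

def clayton_conditional_cdf(u, v, theta):
    """P(V <= v | U = u) for the Clayton copula with theta > 0."""
    return u ** (-theta - 1.0) * (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta - 1.0)

def clayton_conditional_inverse(u, w, theta):
    """Solve clayton_conditional_cdf(u, v, theta) = w for v in closed form."""
    return ((w ** (-theta / (1.0 + theta)) - 1.0) * u ** -theta + 1.0) ** (-1.0 / theta)

def sample_clayton(n, theta, seed=0):
    """Draw n uniform pairs (u, v) with Clayton dependence by conditional sampling."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        u = 1.0 - rng.random()  # in (0, 1], avoids raising zero to a negative power
        w = 1.0 - rng.random()  # uniform level at which the conditional CDF is inverted
        pairs.append((u, clayton_conditional_inverse(u, w, theta)))
    return pairs

pairs = sample_clayton(1000, theta=2.0)
```

For a copula without a closed-form inverse, the call to `clayton_conditional_inverse` would be replaced by a one-dimensional numerical solver applied to the conditional CDF.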
Micronutrients in HIV: A Bayesian Meta-Analysis
Background Approximately 28.5 million people living with HIV are eligible for treatment (CD4<500), but currently have no access to antiretroviral therapy. Reduced serum levels of micronutrients are common in HIV disease. Micronutrient supplementation (MNS) may mitigate disease progression and mortality. Objectives We synthesized evidence on the effect of micronutrient supplementation on mortality and rate of disease progression in HIV disease. Methods We searched MEDLINE, EMBASE, the Cochrane Central, AMED and CINAHL databases through December 2014, without language restriction, for studies of more than 3 micronutrients versus any or no comparator. We built a hierarchical Bayesian random effects model to synthesize results. Inferences are based on the posterior distribution of the population effects; posterior distributions were approximated by Markov chain Monte Carlo in OpenBUGS. Principal Findings From 2166 initial references, we selected 49 studies for full review and identified eight reporting on disease progression and/or mortality. Bayesian synthesis of data from 2,249 adults in three studies estimated the relative risk of disease progression in subjects on MNS vs. control as 0.62 (95% credible interval, 0.37, 0.96). The median number needed to treat is 8.4 (4.8, 29.9) and the Bayes Factor is 53.4. Based on data from 4,095 adults in 7 randomized controlled studies reporting mortality, the RR was 0.84 (0.38, 1.85) and the NNT is 25 (4.3, ∞). Conclusions MNS significantly and substantially slows disease progression in HIV+ adults not on ARV, and possibly reduces mortality. Micronutrient supplements are effective in reducing progression with a posterior probability of 97.9%. Considering the low cost of MNS and its lack of adverse effects, MNS should be standard of care for HIV+ adults not yet on ARV. PMID:25830916
Bayesian Network Meta-Analysis for Unordered Categorical Outcomes with Incomplete Data
We develop a Bayesian multinomial network meta-analysis model for unordered (nominal) categorical outcomes that allows for partially observed data in which exact event counts may not be known for each category. This model properly accounts for correlations of counts in mutually exclusive categories and enables proper comparison and ranking of…
This study integrated Bayesian hierarchical modeling and receiver operating characteristic analysis (BROCA) to evaluate how interest strength (IS) and interest differentiation (ID) predicted low–socioeconomic status (SES) youth's interest-major congruence (IMC). Using large-scale Kuder Career Search online-assessment data, this study fit three…
Factor analysis is a popular statistical technique for multivariate data analysis. Developments in the structural equation modeling framework have enabled the use of hybrid confirmatory/exploratory approaches in which factor-loading structures can be explored relatively flexibly within a confirmatory factor analysis (CFA) framework. Recently, Muthén & Asparouhov proposed a Bayesian structural equation modeling (BSEM) approach to explore the presence of cross loadings in CFA models. We show that the issue of determining factor-loading patterns may be formulated as a Bayesian variable selection problem in which Muthén and Asparouhov's approach can be regarded as a BSEM approach with ridge regression prior (BSEM-RP). We propose another Bayesian approach, denoted herein as the Bayesian structural equation modeling with spike-and-slab prior (BSEM-SSP), which serves as a one-stage alternative to the BSEM-RP. We review the theoretical advantages and disadvantages of both approaches and compare their empirical performance relative to two modification indices-based approaches and exploratory factor analysis with target rotation. A teacher stress scale data set is used to demonstrate our approach.
In certain data analyses (e.g., multiple discriminant analysis and multinomial log-linear modeling), classification decisions are made based on the estimated posterior probabilities that individuals belong to each of several distinct categories. In the Bayesian network literature, this type of classification is often accomplished by assigning…
Application of a data-mining method based on Bayesian networks to lesion-deficit analysis
Although lesion-deficit analysis (LDA) has provided extensive information about structure-function associations in the human brain, LDA has suffered from the difficulties inherent to the analysis of spatial data, i.e., there are many more variables than subjects, and data may be difficult to model using standard distributions, such as the normal distribution. We herein describe a Bayesian method for LDA; this method is based on data-mining techniques that employ Bayesian networks to represent structure-function associations. These methods are computationally tractable, and can represent complex, nonlinear structure-function associations. When applied to the evaluation of data obtained from a study of the psychiatric sequelae of traumatic brain injury in children, this method generates a Bayesian network that demonstrates complex, nonlinear associations among lesions in the left caudate, right globus pallidus, right side of the corpus callosum, right caudate, and left thalamus, and subsequent development of attention-deficit hyperactivity disorder, confirming and extending our previous statistical analysis of these data. Furthermore, analysis of simulated data indicates that methods based on Bayesian networks may be more sensitive and specific for detecting associations among categorical variables than methods based on chi-square and Fisher exact statistics.
The validity of family background variables instrumenting education in income regressions has been much criticized. In this paper, we use data from the 2004 German Socio-Economic Panel and Bayesian analysis to analyze to what degree violations of the strict validity assumption affect the estimation results. We show that, in case of moderate direct…
This paper proposes a new method to evaluate informative hypotheses for meta-analysis of Cronbach's coefficient alpha using a Bayesian approach. The coefficient alpha is one of the most widely used reliability indices. In meta-analyses of reliability, researchers typically form specific informative hypotheses beforehand, such as "alpha of…
Clinicians need to know the likelihood of a condition given a positive or negative diagnostic test. In this study a Bayesian analysis of the Clinical Behavior Checklist for Persons with Intellectual Disabilities (CBCPID) to predict depression in people with intellectual disability was conducted. The CBCPID was administered to 92 adults with…
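The quantity clinicians need here, the probability of the condition given a test result, follows from Bayes' theorem once prevalence, sensitivity, and specificity are known. The sketch below uses made-up numbers purely for illustration; they are not CBCPID results, and the function name is hypothetical.

```python
def post_test_probabilities(prevalence, sensitivity, specificity):
    """Return (P(condition | positive test), P(condition | negative test))."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    false_neg = prevalence * (1.0 - sensitivity)
    true_neg = (1.0 - prevalence) * specificity
    p_given_positive = true_pos / (true_pos + false_pos)
    p_given_negative = false_neg / (false_neg + true_neg)
    return p_given_positive, p_given_negative

# Illustrative inputs only (assumed, not taken from the study).
p_pos, p_neg = post_test_probabilities(prevalence=0.10, sensitivity=0.80, specificity=0.90)
```

With these assumed inputs a positive test raises the probability of the condition from 10% to roughly 47%, and a negative test lowers it to roughly 2.4%.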
A cross-prefectural production function (CPPF) in Japan is constructed in a set of Bayesian models to examine the performance of Japan's post-war economy. The parameters in the model are estimated by using the procedure of a Monte Carlo filter together with the method of maximum likelihood. The estimated results are applied to regional and historical analysis of the Japanese economy.
A viewgraph presentation is given that reviews the Bayesian approach to Cosmic Microwave Background (CMB) analysis and its numerical implementation with Gibbs sampling, summarizes the application to WMAP I, and describes work in progress on generalizations to polarization, foregrounds, asymmetric beams, and 1/f noise.
In this paper, Genetic Algorithms (GA) and Bayesian model averaging (BMA) were combined to simultaneously conduct calibration and uncertainty analysis for the Soil and Water Assessment Tool (SWAT).
Tomato yellow leaf curl virus (TYLCV) is a member of the genus Begomovirus of the family Geminiviridae, members of which are characterized by closed circular single-stranded DNA genomes of 2.7-2.8 kb in length, and include viruses transmitted by the Bemisia tabaci whitefly. No reports of TYLCV in Korea are available prior to 2008, after which TYLCV spread rapidly to most regions of the southern Korean peninsula (Gyeongsang-Do, Jeolla-Do and Jeju-Do). Fifty full sequences of TYLCV were analyzed in this study, and the AC1, AV1, IR, and full sequences were analyzed via the MUSCLE program and Bayesian analysis. Phylogenetic analysis demonstrated that the Korea TYLCVs were divided into two subgroups. The TYLCV Korea 1 group (Masan) originated from TYLCV Japan (Miyazaki) and the TYLCV Korea 2 group (Jeju/Jeonju) from TYLCV Japan (Tosa/Haruno). A B. tabaci phylogenetic tree was constructed with 16S rRNA and mitochondrial cytochrome oxidase I (MtCOI) sequences using the MUSCLE program and MEGA 4.0 with the neighbor-joining algorithm. The sequence data of 16S rRNA revealed that Korea B. tabaci was closely aligned to B. tabaci isolated in Iran and Nigeria. The Q type of B. tabaci, which was originally identified as a viruliferous insect in 2008, was initially isolated in Korea as a non-viruliferous insect in 2005. Therefore, we suggest that two TYLCV Japan isolates were introduced to Korea via different routes, and then transmitted by native B. tabaci.
Meerow, Alan W.; Noblick, Larry; Borrone, James W.; Couvreur, Thomas L. P.; Mauro-Herrera, Margarita; Hahn, William J.; Kuhn, David N.; Nakamura, Kyoko; Oleas, Nora H.; Schnell, Raymond J.
2009-01-01
Background The Cocoseae is one of 13 tribes of Arecaceae subfam. Arecoideae, and contains a number of palms with significant economic importance, including the monotypic and pantropical Cocos nucifera L., the coconut, the origins of which have been one of the “abominable mysteries” of palm systematics for decades. Previous studies with predominantly plastid genes weakly supported American ancestry for the coconut but ambiguous sister relationships. In this paper, we use multiple single copy nuclear loci to address the phylogeny of the Cocoseae subtribe Attaleinae, and resolve the closest extant relative of the coconut. Methodology/Principal Findings We present the results of combined analysis of DNA sequences of seven WRKY transcription factor loci across 72 samples of Arecaceae tribe Cocoseae subtribe Attaleinae, representing all genera classified within the subtribe, and three outgroup taxa with maximum parsimony, maximum likelihood, and Bayesian approaches, producing highly congruent and well-resolved trees that robustly identify the genus Syagrus as sister to Cocos and resolve novel and well-supported relationships among the other genera of the Attaleinae. We also address incongruence among the gene trees with gene tree reconciliation analysis, and assign estimated ages to the nodes of our tree. Conclusions/Significance This study represents the most extensive phylogenetic analysis of Cocoseae subtribe Attaleinae to date. We present a well-resolved and supported phylogeny of the subtribe that robustly indicates a sister relationship between Cocos and Syagrus. This is not only of biogeographic interest, but will also open fruitful avenues of inquiry regarding evolution of functional genes useful for crop improvement. Establishment of two major clades of American Attaleinae occurred in the Oligocene (ca. 37 MYBP) in Eastern Brazil. The divergence of Cocos from Syagrus is estimated at 35 MYBP. The biogeographic and morphological congruence that we see for
We apply the Bayesian framework to assess the presence of a correlation between two quantities. To do so, we estimate the probability distribution of the parameter of interest, ρ, characterizing the strength of the correlation. We provide an implementation of these ideas and concepts using the Python programming language and the pyMC module in a very short (~130 lines of code, heavily commented) and user-friendly program. We used this tool to assess the presence and properties of the correlation between planetary surface gravity and stellar activity level as measured by the log(R'_HK) indicator. The results of the Bayesian analysis are qualitatively similar to those obtained via p-value analysis, and support the presence of a correlation in the data. The results are more robust in their derivation and more informative, revealing interesting features such as asymmetric posterior distributions or markedly different credible intervals, and allowing for a deeper exploration. We encourage readers interested in this kind of problem to apply our code to their own scientific problems. A full understanding of the Bayesian framework can only be gained through the insight that comes from handling priors, assessing the convergence of Monte Carlo runs, and dealing with a multitude of other practical problems. We hope to contribute so that Bayesian analysis becomes a tool in the toolkit of researchers, and they understand by experience its advantages and limitations.
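The idea of estimating a full posterior distribution for ρ can be sketched without any sampling library. The block below is not the authors' pyMC program: it swaps in a simple grid approximation, assumes the data are standardized and bivariate normal with unit variances, and uses a flat prior on ρ over (-1, 1). The synthetic data (true ρ of 0.8, 200 points) and all function names are assumptions for illustration.

```python
import math
import random

def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def rho_posterior(x, y, grid_size=1999):
    """Grid posterior for rho under a flat prior on (-1, 1)."""
    zx, zy = standardize(x), standardize(y)
    n = len(zx)
    sxx = sum(a * a for a in zx)
    syy = sum(b * b for b in zy)
    sxy = sum(a * b for a, b in zip(zx, zy))
    grid = [-0.999 + 1.998 * i / (grid_size - 1) for i in range(grid_size)]
    # Bivariate normal log-likelihood with zero means and unit variances.
    log_post = [
        -0.5 * n * math.log(1.0 - r * r)
        - (sxx - 2.0 * r * sxy + syy) / (2.0 * (1.0 - r * r))
        for r in grid
    ]
    peak = max(log_post)  # subtract the maximum to avoid underflow
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return grid, [w / total for w in weights]

def summarize(grid, probs, mass=0.95):
    """Posterior mean and an equal-tailed credible interval."""
    mean = sum(r * p for r, p in zip(grid, probs))
    tail = (1.0 - mass) / 2.0
    cum, lo, hi, lo_found = 0.0, grid[0], grid[-1], False
    for r, p in zip(grid, probs):
        cum += p
        if not lo_found and cum >= tail:
            lo, lo_found = r, True
        if cum >= 1.0 - tail:
            hi = r
            break
    return mean, lo, hi

# Synthetic data with an assumed true correlation of 0.8.
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [0.8 * a + 0.6 * rng.gauss(0.0, 1.0) for a in x]
grid, probs = rho_posterior(x, y)
mean, lo, hi = summarize(grid, probs)
```

The credible interval `(lo, hi)` need not be symmetric around `mean`, which is the kind of feature the abstract points to as an advantage over a single p-value.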
Aiming at iron-related T2-hypointensity, which is related to normal aging and neurodegenerative processes, we here present two practicable approaches, based on Bayesian inference, for preprocessing and statistical analysis of a complex set of structural MRI data. In particular, Markov Chain Monte Carlo methods were used to simulate posterior distributions. First, we rendered a segmentation algorithm that uses outlier detection based on model checking techniques within a Bayesian mixture model. Second, we rendered an analytical tool comprising a Bayesian regression model with smoothness priors (in the form of Gaussian Markov random fields) mitigating the necessity to smooth data prior to statistical analysis. For validation, we used simulated data and MRI data of 27 healthy controls (age: [Formula: see text]; range, [Formula: see text]). We first observed robust segmentation of both simulated T2-hypointensities and gray-matter regions known to be T2-hypointense. Second, simulated data and images of segmented T2-hypointensity were analyzed. We found not only robust identification of simulated effects but also a biologically plausible age-related increase of T2-hypointensity primarily within the dentate nucleus but also within the globus pallidus, substantia nigra, and red nucleus. Our results indicate that fully Bayesian inference can successfully be applied for preprocessing and statistical analysis of structural MRI data.
Kwon, Deukwoo; Hoffman, F Owen; Moroz, Brian E; Simon, Steven L
MrBayes is a widely used phylogenetic inference tool harnessing empirical evolutionary models and Bayesian statistics. However, the likelihood estimation is computationally very expensive, resulting in undesirably long execution times. Although a number of multi-threaded optimizations have been proposed to speed up MrBayes, there are bottlenecks that severely limit the GPU thread-level parallelism of likelihood estimations. This study proposes a high-performance and resource-efficient method for GPU-oriented parallelization of likelihood estimations. Instead of having to rely on empirical programming, the proposed novel decomposition storage model implements high-performance data transfers implicitly. In terms of performance improvement, a speedup factor of up to 178 can be achieved on the analysis of simulated datasets by four Tesla K40 cards. In comparison to the other publicly available GPU-oriented MrBayes implementations, the tgMC(3)++ method (proposed herein) outperforms the tgMC(3) (v1.0), nMC(3) (v2.1.1) and oMC(3) (v1.00) methods by speedup factors of up to 1.6, 1.9 and 2.9, respectively. Moreover, tgMC(3)++ supports more evolutionary models and gamma categories, which previous GPU-oriented methods do not support.
Recent developments in high field imaging have made possible the acquisition of high quality, low noise relaxographic data in reasonable imaging times. The datasets comprise a huge amount of information (>>1 million points) which makes rigorous analysis daunting. Here, the authors present results demonstrating that Principal Component Analysis (PCA) and Bayesian Decomposition (BD) provide powerful methods for relaxographic analysis of T1 recovery curves and editing of tissue type in resulting images.
Toward an ecological analysis of Bayesian inferences: how task characteristics influence responses
In research on Bayesian inferences, the specific tasks, with their narratives and characteristics, are typically seen as exchangeable vehicles that merely transport the structure of the problem to research participants. In the present paper, we explore whether, and possibly how, task characteristics that are usually ignored influence participants’ responses in these tasks. We focus on both quantitative dimensions of the tasks, such as their base rates, hit rates, and false-alarm rates, as well as qualitative characteristics, such as whether the task involves a norm violation or not, whether the stakes are high or low, and whether the focus is on the individual case or on the numbers. Using a data set of 19 different tasks presented to 500 different participants who provided a total of 1,773 responses, we analyze these responses in two ways: first, on the level of the numerical estimates themselves, and second, on the level of various response strategies, Bayesian and non-Bayesian, that might have produced the estimates. We identified various contingencies, and most of the task characteristics had an influence on participants’ responses. Typically, this influence has been stronger when the numerical information in the tasks was presented in terms of probabilities or percentages, compared to natural frequencies – and this effect cannot be fully explained by a higher proportion of Bayesian responses when natural frequencies were used. One characteristic that did not seem to influence participants’ response strategy was the numerical value of the Bayesian solution itself. Our exploratory study is a first step toward an ecological analysis of Bayesian inferences, and highlights new avenues for future research. PMID:26300791
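The two presentation formats compared in the study, probabilities versus natural frequencies, lead to the same Bayesian solution. The sketch below makes that equivalence concrete with exact fractions. The task numbers (1,000 people, 10 with the condition, 8 hits, 95 false alarms) are invented for illustration and are not taken from the 19 tasks in the data set.

```python
from fractions import Fraction

def posterior_from_probabilities(base_rate, hit_rate, false_alarm_rate):
    """Bayes' theorem in probability format."""
    hits = base_rate * hit_rate
    false_alarms = (1 - base_rate) * false_alarm_rate
    return hits / (hits + false_alarms)

def posterior_from_natural_frequencies(n_hits, n_false_alarms):
    """Same quantity in natural-frequency format: hits among all positives."""
    return Fraction(n_hits, n_hits + n_false_alarms)

# Hypothetical task: of 1,000 people, 10 have the condition and 8 of them test
# positive; of the 990 without it, 95 also test positive.
prob_format = posterior_from_probabilities(
    Fraction(10, 1000), Fraction(8, 10), Fraction(95, 990)
)
freq_format = posterior_from_natural_frequencies(8, 95)
```

Both calls give 8/103, about 7.8%. The natural-frequency version needs one division of two counts, which is one common explanation for why that format is easier for participants.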
In reliability theory, the most important problem is to determine the reliability of a complex system from the reliability of its components. The weakness of most reliability theories is that systems are described simply as functioning or failed. In many real situations, failures may arise from many causes, depending on the age and the environment of the system and its components. Another problem in reliability theory is estimating the parameters of the assumed failure models. The estimation may be based on data collected from censored or uncensored life tests. In many reliability problems, the failure data are quantitatively inadequate, especially in engineering design and maintenance systems. Bayesian analyses are more beneficial than classical ones in such cases. Bayesian estimation allows us to combine past knowledge or experience, in the form of a prior distribution, with life test data to make inferences about the parameter of interest. In this paper, we investigate the application of Bayesian estimation to competing risk systems. The cases are limited to models with independent causes of failure, using the Weibull distribution as our model. A simulation is conducted for this distribution with the objectives of verifying the models and the estimators and investigating the performance of the estimators for varying sample sizes. The simulation data are analyzed using Bayesian and maximum likelihood analyses. The simulation results show that a change in the true value of one parameter relative to another changes the value of the standard deviation in the opposite direction. With perfect information on the prior distribution, the Bayesian estimation methods perform better than maximum likelihood. The sensitivity analyses show some sensitivity to shifts in the prior locations. They also show the robustness of the Bayesian analysis within the range
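The competing-risk setup with independent Weibull causes can be sketched as a small simulation: each cause draws its own failure time, the system fails at the earliest one, and the system survival function is the product of the cause-specific survival functions. This covers only the data-generating side; the Bayesian and maximum likelihood estimators studied in the paper are not reproduced. The two (scale, shape) pairs and the sample size are hypothetical.

```python
import math
import random

def simulate_competing_risks(n, causes, seed=0):
    """causes: list of (scale, shape). Returns (failure_time, cause_index) pairs."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        # random.weibullvariate takes the scale first, then the shape.
        times = [rng.weibullvariate(scale, shape) for scale, shape in causes]
        t = min(times)  # the system fails at the earliest cause
        records.append((t, times.index(t)))
    return records

def system_survival(t, causes):
    """P(system survives past t) = product over causes of exp(-(t/scale)**shape)."""
    return math.exp(-sum((t / scale) ** shape for scale, shape in causes))

# Two hypothetical independent causes of failure.
causes = [(2.0, 1.5), (3.0, 0.8)]
records = simulate_competing_risks(10000, causes)
empirical = sum(1 for t, _ in records if t > 1.0) / len(records)
analytic = system_survival(1.0, causes)
```

Comparing `empirical` with `analytic` at a few time points is a quick check that the simulated data follow the assumed model before any estimator is applied to them.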
In amphioxus, we found a mesoderm-related gene, tropomyosin, which encodes a protein of 284 amino acid residues that shares high identity with other known Tropomyosin proteins in both vertebrates and invertebrates. Phylogenetically, amphioxus Tropomyosin fell outside the invertebrate clade and was at the base of the vertebrate protein family clade, indicating that it may represent an independent branch. From the early neurula to the larva stage, whole-mount in situ hybridization and histological sections detected transcripts of the amphioxus tropomyosin gene. Weak tropomyosin expression was first detected in the wall of the archenteron at the neurula stage, about 10 hours post-fertilization, while intense expression was revealed in the differentiating presumptive notochord and the muscle. Transcripts of tropomyosin were then expressed in the formed notochord and somites. Gene expression seemed to continue in these developing organs throughout the neurula stages and persisted until 72 hours, during the early larval stages. In situ hybridization also showed that tropomyosin was expressed in the neural tube, hepatic diverticulum, notochord and the spaces between myotomes in adult amphioxus. Our results indicate that tropomyosin may play an important role in both embryonic development and adult life.
The causative agent of fasciolosis in South America is thought to be Fasciola hepatica. In this study, Fasciola flukes from Peru were analyzed to investigate their genetic structure and phylogenetic relationships with those from other countries. Fasciola flukes were collected from three definitive host species: cattle, sheep, and pigs. They were identified as F. hepatica because mature sperm were observed in their seminal vesicles, and they also displayed the Fh type, which has a fragment pattern identical to that of F. hepatica in the nuclear internal transcribed spacer 1. Eight haplotypes were obtained from the mitochondrial NADH dehydrogenase subunit 1 (nad1) sequences of Peruvian F. hepatica; however, no notable difference in genetic structure was observed among the three host species. The extremely low genetic diversity of the Peruvian population suggests that it was introduced from other regions. Nad1 haplotypes identical to those of Peruvian F. hepatica were detected in China, Uruguay, Italy, Iran, and Australia. Our results indicate that F. hepatica rapidly expanded its range due to human migration. Future studies are required to elucidate the dispersal route of F. hepatica from Europe, its probable origin, to other areas, including Peru.
