Sample records for Bayesian clustering analyses

  1. Bayesian hierarchical models for cost-effectiveness analyses that use data from cluster randomized trials.

    PubMed

    Grieve, Richard; Nixon, Richard; Thompson, Simon G

    2010-01-01

    Cost-effectiveness analyses (CEA) may be undertaken alongside cluster randomized trials (CRTs) where randomization is at the level of the cluster (for example, the hospital or primary care provider) rather than the individual. Costs (and outcomes) within clusters may be correlated so that the assumption made by standard bivariate regression models, that observations are independent, is incorrect. This study develops a flexible modeling framework to acknowledge the clustering in CEA that use CRTs. The authors extend previous Bayesian bivariate models for CEA of multicenter trials to recognize the specific form of clustering in CRTs. They develop new Bayesian hierarchical models (BHMs) that allow mean costs and outcomes, and also variances, to differ across clusters. They illustrate how each model can be applied using data from a large (1732 cases, 70 primary care providers) CRT evaluating alternative interventions for reducing postnatal depression. The analyses compare cost-effectiveness estimates from BHMs with standard bivariate regression models that ignore the data hierarchy. The BHMs show high levels of cost heterogeneity across clusters (intracluster correlation coefficient, 0.17). Compared with standard regression models, the BHMs yield substantially increased uncertainty surrounding the cost-effectiveness estimates, and altered point estimates. The authors conclude that ignoring clustering can lead to incorrect inferences. The BHMs that they present offer a flexible modeling framework that can be applied more generally to CEA that use CRTs.
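    The intracluster correlation coefficient (ICC) quoted above is the share of total variance in costs that lies between clusters. The sketch below uses the classical one-way ANOVA estimator for unbalanced clusters. That estimator is an assumption chosen for illustration and is not the authors' Bayesian hierarchical model. The function name `anova_icc` is hypothetical.

```python
from statistics import mean

def anova_icc(clusters):
    """One-way ANOVA estimate of the intracluster correlation coefficient.

    clusters: list of lists, one inner list of observations (e.g. costs) per cluster.
    Note: this is a frequentist estimator, used here only to illustrate the ICC.
    """
    k = len(clusters)                      # number of clusters
    sizes = [len(c) for c in clusters]
    n = sum(sizes)                         # total number of observations
    grand = mean(x for c in clusters for x in c)

    # Between- and within-cluster sums of squares and mean squares
    ssb = sum(m * (mean(c) - grand) ** 2 for m, c in zip(sizes, clusters))
    ssw = sum((x - mean(c)) ** 2 for c in clusters for x in c)
    msb = ssb / (k - 1)
    msw = ssw / (n - k)

    # Adjusted average cluster size, which handles unequal cluster sizes
    n0 = (n - sum(m * m for m in sizes) / n) / (k - 1)
    return (msb - msw) / (msb + (n0 - 1) * msw)
```

    For example, `anova_icc([[1, 1], [3, 3]])` returns 1.0, because all variation lies between clusters. The estimator can be negative when clusters are more alike than chance would predict.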

  2. Hierarchical structure of the Sicilian goats revealed by Bayesian analyses of microsatellite information.

    PubMed

    Siwek, M; Finocchiaro, R; Curik, I; Portolano, B

    2011-02-01

    Genetic structure and relationships among the main goat populations in Sicily (Girgentana, Derivata di Siria, Maltese and Messinese) were analysed using information from 19 microsatellite markers genotyped on 173 individuals. A Bayesian approach implemented in the program STRUCTURE revealed a hierarchical structure with two clusters at the first level (Girgentana vs. Messinese, Derivata di Siria and Maltese), explaining 4.8% of variation (AMOVA Φ_ST estimate). Seven clusters nested within these first two clusters (further differentiations of Girgentana, Derivata di Siria and Maltese) explained 8.5% of variation (AMOVA Φ_SC estimate). The analyses and methods applied in this study demonstrate their power to detect subtle population structure. © 2010 The Authors, Animal Genetics © 2010 Stichting International Foundation for Animal Genetics.

  3. Bayesian Decision Theoretical Framework for Clustering

    ERIC Educational Resources Information Center

    Chen, Mo

    2011-01-01

    In this thesis, we establish a novel probabilistic framework for the data clustering problem from the perspective of Bayesian decision theory. The Bayesian decision theory view justifies the important questions: what is a cluster and what a clustering algorithm should optimize. We prove that the spectral clustering (to be specific, the…

  4. Potential of SNP markers for the characterization of Brazilian cassava germplasm.

    PubMed

    de Oliveira, Eder Jorge; Ferreira, Cláudia Fortes; da Silva Santos, Vanderlei; de Jesus, Onildo Nunes; Oliveira, Gilmara Alvarenga Fachardo; da Silva, Maiane Suzarte

    2014-06-01

    High-throughput markers such as SNPs, along with different methodologies, were used to evaluate the applicability of the Bayesian approach and of multivariate analysis in structuring the genetic diversity of cassava. The objective of the present work was to evaluate the diversity and genetic structure of the largest cassava germplasm bank in Brazil. Complementary methodological approaches such as discriminant analysis of principal components (DAPC), Bayesian analysis and analysis of molecular variance (AMOVA) were used to understand the structure and diversity of 1,280 accessions genotyped using 402 single nucleotide polymorphism markers. The genetic diversity (0.327) and the average observed heterozygosity (0.322) were high considering that the markers are bi-allelic (for which gene diversity cannot exceed 0.5). At the population level, a complex genetic structure was observed, indicating the formation of 30 clusters by DAPC and 34 clusters by Bayesian analysis. Both methodologies presented difficulties and controversies in the allocation of some accessions to specific clusters. However, the clusters suggested by the DAPC analysis seemed more consistent, because they showed a higher probability of allocation of the accessions within the clusters. Prior information related to breeding patterns and geographic origins of the accessions was not sufficient to provide clear differentiation between the clusters according to the AMOVA. In contrast, F_ST was maximized when considering the clusters suggested by the Bayesian and DAPC analyses. The frequent exchange of germplasm between producers and the subsequent renaming of the same material may be one of the causes of the low association between genetic diversity and geographic origin. The results of this study may benefit cassava germplasm conservation programs and contribute to the maximization of genetic gains in breeding programs.
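    For a bi-allelic SNP, expected heterozygosity 2pq peaks at 0.5 when both alleles are equally common, so values near 0.32 are high on that scale. The minimal sketch below computes expected and observed heterozygosity from genotype calls. The genotype coding (0, 1 or 2 copies of the alternative allele) and the function names are assumptions for illustration. No small-sample correction is applied, so results may differ slightly from the estimator used in the study.

```python
def heterozygosity(genotypes):
    """Return (expected, observed) heterozygosity for one bi-allelic SNP.

    genotypes: list of 0/1/2 counts of the alternative allele per individual.
    """
    n = len(genotypes)
    p = sum(genotypes) / (2 * n)           # alternative-allele frequency
    expected = 2 * p * (1 - p)             # Hardy-Weinberg expectation, at most 0.5
    observed = sum(g == 1 for g in genotypes) / n   # fraction of heterozygotes
    return expected, observed

def mean_over_loci(loci):
    """Average expected and observed heterozygosity over several loci."""
    pairs = [heterozygosity(g) for g in loci]
    return (sum(e for e, _ in pairs) / len(pairs),
            sum(o for _, o in pairs) / len(pairs))
```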

  5. Species-richness of the Anopheles annulipes Complex (Diptera: Culicidae) Revealed by Tree and Model-Based Allozyme Clustering Analyses

    DTIC Science & Technology

    2007-01-01

    including tree-based methods such as the unweighted pair group method with arithmetic mean (UPGMA) and Neighbour-joining (NJ) (Saitou & Nei, 1987). By...based Bayesian approach and the tree-based UPGMA and NJ clustering methods. The results obtained suggest that far more species occur in the An...unlikely that groups that differ by more than these levels are conspecific. Genetic distances were clustered using the UPGMA and NJ algorithms in MEGA

  6. Bayesian Nonparametric Ordination for the Analysis of Microbial Communities.

    PubMed

    Ren, Boyu; Bacallado, Sergio; Favaro, Stefano; Holmes, Susan; Trippa, Lorenzo

    2017-01-01

    Human microbiome studies use sequencing technologies to measure the abundance of bacterial species or Operational Taxonomic Units (OTUs) in samples of biological material. Typically the data are organized in contingency tables with OTU counts across heterogeneous biological samples. In the microbial ecology community, ordination methods are frequently used to investigate latent factors or clusters that capture and describe variations of OTU counts across biological samples. It remains important to evaluate how uncertainty in estimates of each biological sample's microbial distribution propagates to ordination analyses, including visualization of clusters and projections of biological samples on low-dimensional spaces. We propose a Bayesian analysis for dependent distributions to endow frequently used ordinations with estimates of uncertainty. A Bayesian nonparametric prior for dependent normalized random measures is constructed, which is marginally equivalent to the normalized generalized Gamma process, a well-known prior for nonparametric analyses. In our prior, the dependence and similarity between microbial distributions are represented by latent factors that concentrate in a low-dimensional space. We use a shrinkage prior to tune the dimensionality of the latent factors. The resulting posterior samples of model parameters can be used to evaluate uncertainty in analyses routinely applied in microbiome studies. Specifically, by combining them with multivariate data analysis techniques we can visualize credible regions in ecological ordination plots. The characteristics of the proposed model are illustrated through a simulation study and applications in two microbiome datasets.

  7. Detecting cancer clusters in a regional population with local cluster tests and Bayesian smoothing methods: a simulation study

    PubMed Central

    2013-01-01

    Background There is a rising public and political demand for prospective cancer cluster monitoring. But there is little empirical evidence on the performance of established cluster detection tests under conditions of small and heterogeneous sample sizes and varying spatial scales, as is the case for most existing population-based cancer registries. Therefore, this simulation study aims to evaluate different cluster detection methods, implemented in the open-source environment R, in their ability to identify clusters of lung cancer using real-life data from an epidemiological cancer registry in Germany. Methods Risk surfaces were constructed with two different spatial cluster types, representing a relative risk of RR = 2.0 or of RR = 4.0 in relation to the overall background incidence of lung cancer, separately for men and women. Lung cancer cases were sampled from this risk surface as geocodes using an inhomogeneous Poisson process. The realisations of the cancer cases were analysed at a small spatial scale (census tracts, N = 1983) and at an aggregated large spatial scale (communities, N = 78). Subsequently, they were submitted to the cluster detection methods. The test accuracy for cluster location was determined in terms of detection rates (DR), false-positive (FP) rates and positive predictive values. The Bayesian smoothing models were evaluated using ROC curves. Results With a moderate risk increase (RR = 2.0), local cluster tests showed better DR (> 0.90 at both spatial aggregation scales) and lower FP rates (< 0.05 at both scales) than the Bayesian smoothing methods. When the cluster RR was raised to 4.0, the local cluster tests showed better DR with lower FP rates only at the small spatial scale. At the large spatial scale, the Bayesian smoothing methods, especially those implementing a spatial neighbourhood, showed a substantially lower FP rate than the cluster tests. However, the risk increases at this scale were mostly diluted by data aggregation.
Conclusion High-resolution spatial scales seem more appropriate than the commonly used aggregated scales as a data basis for cancer cluster testing and monitoring. We suggest the development of a two-stage approach that combines methods with high detection rates for first-line screening with methods of higher predictive ability at the second stage. PMID:24314148
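    The three accuracy measures named above can be computed by comparing the spatial units a method flags with the units inside the simulated cluster. The sketch below assumes unit-level definitions: detection rate is the flagged share of true cluster units, the false-positive rate is the flagged share of non-cluster units, and PPV is the true share of flagged units. The study's exact operational definitions may differ. The function name is hypothetical.

```python
def cluster_test_accuracy(flagged, true_cluster, all_units):
    """Unit-level accuracy of a cluster detection method.

    flagged: set of spatial units (e.g. census tracts) the method flagged.
    true_cluster: set of units inside the simulated high-risk cluster.
    all_units: set of every unit in the study region.
    Returns (detection_rate, false_positive_rate, positive_predictive_value).
    """
    tp = len(flagged & true_cluster)                 # correctly flagged units
    fp = len(flagged - true_cluster)                 # flagged units outside the cluster
    negatives = len(all_units - true_cluster)        # units outside the cluster
    detection_rate = tp / len(true_cluster) if true_cluster else float("nan")
    fp_rate = fp / negatives if negatives else float("nan")
    ppv = tp / len(flagged) if flagged else float("nan")
    return detection_rate, fp_rate, ppv
```

    For example, flagging units {0, 1, 2, 5} when the true cluster is {0, 1, 2, 3} out of ten units gives DR = 0.75, FP rate = 1/6 and PPV = 0.75.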

  8. Evidence of new species for malaria vector Anopheles nuneztovari sensu lato in the Brazilian Amazon region.

    PubMed

    Scarpassa, Vera Margarete; Cunha-Machado, Antonio Saulo; Saraiva, José Ferreira

    2016-04-12

    Anopheles nuneztovari sensu lato comprises cryptic species in northern South America, and the Brazilian populations encompass distinct genetic lineages within the Brazilian Amazon region. This study investigated, based on two molecular markers, whether these lineages might actually deserve species status. Specimens were collected in five localities of the Brazilian Amazon, including Manaus, Careiro Castanho and Autazes, in the State of Amazonas; Tucuruí, in the State of Pará; and Abacate da Pedreira, in the State of Amapá, and analysed for the COI gene (Barcode region) and 12 microsatellite loci. Phylogenetic analyses were performed using the maximum likelihood (ML) approach. Intra- and inter-sample genetic diversity was estimated using population genetics analyses, and the genetic groups were identified by means of the ML, Bayesian and factorial correspondence analyses and the Bayesian analysis of population structure. The Barcode region dataset (N = 103) generated 27 haplotypes. The haplotype network suggested three lineages. The ML tree retrieved five monophyletic groups. Group I clustered all specimens from Manaus and Careiro Castanho, the majority of Autazes and a few from Abacate da Pedreira. Group II clustered most of the specimens from Abacate da Pedreira and a few from Autazes and Tucuruí. Group III clustered only specimens from Tucuruí (lineage III) and was strongly supported (97%). Groups IV and V clustered specimens of A. nuneztovari s.s. and A. dunhami, strongly (98%) and weakly (70%) supported, respectively. In the second phylogenetic analysis, the sequences from GenBank identified as A. goeldii clustered with groups I and II, but not with group III. Genetic distances (Kimura 2-parameter) among the groups ranged from 1.60% (between I and II) to 2.32% (between I and III). Microsatellite data revealed very high intra-population genetic variability.
Genetic distances were highest, and significant (P = 0.005), between Tucuruí and all the other samples and between Abacate da Pedreira and all the other samples. Genetic distances, Bayesian (Structure and BAPS) analyses and FCA suggested three distinct biological groups, supporting the barcode region results. The two markers revealed three genetic lineages for A. nuneztovari s.l. in the Brazilian Amazon region. Lineages I and II may represent genetically distinct groups or species within A. goeldii. Lineage III may represent a new species, distinct from the A. goeldii group, and may be the most ancestral in the Brazilian Amazon. These lineages may differ in Plasmodium susceptibility and should therefore be investigated further.

  9. Bayesian multivariate hierarchical transformation models for ROC analysis.

    PubMed

    O'Malley, A James; Zou, Kelly H

    2006-02-15

    A Bayesian multivariate hierarchical transformation model (BMHTM) is developed for receiver operating characteristic (ROC) curve analysis based on clustered continuous diagnostic outcome data with covariates. Two special features of this model are that it incorporates non-linear monotone transformations of the outcomes and that multiple correlated outcomes may be analysed. The mean, variance, and transformation components are all modelled parametrically, enabling a wide range of inferences. The general framework is illustrated by focusing on two problems: (1) analysis of the diagnostic accuracy of a covariate-dependent univariate test outcome requiring a Box-Cox transformation within each cluster to map the test outcomes to a common family of distributions; (2) development of an optimal composite diagnostic test using multivariate clustered outcome data. In the second problem, the composite test is estimated using discriminant function analysis and compared to the test derived from logistic regression analysis where the gold standard is a binary outcome. The proposed methodology is illustrated on prostate cancer biopsy data from a multi-centre clinical trial.

  10. Bayesian multivariate hierarchical transformation models for ROC analysis

    PubMed Central

    O'Malley, A. James; Zou, Kelly H.

    2006-01-01

    SUMMARY A Bayesian multivariate hierarchical transformation model (BMHTM) is developed for receiver operating characteristic (ROC) curve analysis based on clustered continuous diagnostic outcome data with covariates. Two special features of this model are that it incorporates non-linear monotone transformations of the outcomes and that multiple correlated outcomes may be analysed. The mean, variance, and transformation components are all modelled parametrically, enabling a wide range of inferences. The general framework is illustrated by focusing on two problems: (1) analysis of the diagnostic accuracy of a covariate-dependent univariate test outcome requiring a Box–Cox transformation within each cluster to map the test outcomes to a common family of distributions; (2) development of an optimal composite diagnostic test using multivariate clustered outcome data. In the second problem, the composite test is estimated using discriminant function analysis and compared to the test derived from logistic regression analysis where the gold standard is a binary outcome. The proposed methodology is illustrated on prostate cancer biopsy data from a multi-centre clinical trial. PMID:16217836
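    The Box-Cox transformation maps a positive outcome y to (y^λ - 1)/λ, or to ln y when λ = 0. It is strictly increasing, so it leaves the empirical, rank-based AUC unchanged. In a parametric ROC model such as the one described, the transformation matters because it maps outcomes onto a common distributional family. The minimal sketch below shows both pieces. The function names are hypothetical, and the code is not the BMHTM itself.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a positive value y with parameter lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive values")
    return math.log(y) if lam == 0 else (y ** lam - 1) / lam

def empirical_auc(diseased, healthy):
    """Mann-Whitney estimate of the area under the ROC curve (ties count 1/2)."""
    wins = 0.0
    for d in diseased:
        for h in healthy:
            wins += 1.0 if d > h else 0.5 if d == h else 0.0
    return wins / (len(diseased) * len(healthy))
```

    For example, `empirical_auc([2, 3, 4], [1, 2])` equals 5.5/6. Applying `box_cox(v, 0.3)` to every value first gives the same AUC.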

  11. A Hierarchical Bayesian Procedure for Two-Mode Cluster Analysis

    ERIC Educational Resources Information Center

    DeSarbo, Wayne S.; Fong, Duncan K. H.; Liechty, John; Saxton, M. Kim

    2004-01-01

    This manuscript introduces a new Bayesian finite mixture methodology for the joint clustering of row and column stimuli/objects associated with two-mode asymmetric proximity, dominance, or profile data. That is, common clusters are derived which partition both the row and column stimuli/objects simultaneously into the same derived set of clusters.…

  12. Analysis of Genetic Diversity and Structure Pattern of Indigofera Pseudotinctoria in Karst Habitats of the Wushan Mountains Using AFLP Markers.

    PubMed

    Fan, Yan; Zhang, Chenglin; Wu, Wendan; He, Wei; Zhang, Li; Ma, Xiao

    2017-10-16

    Indigofera pseudotinctoria Mats is an agronomically and economically important perennial legume shrub with high forage yield, high protein content and strong adaptability; it is subject to natural habitat fragmentation and serious human disturbance. Until now, knowledge of the genetic relationships and intraspecific genetic diversity of its wild collections has been poor, especially at small spatial scales. Here, amplified fragment length polymorphism (AFLP) technology was employed to analyse the genetic diversity, differentiation, and structure of 364 genotypes of I. pseudotinctoria from 15 natural locations in the Wushan Mountains, a highly structured mountain range with typical karst landforms in Southwest China. We also tested whether eco-climatic factors have affected genetic structure by correlating genetic diversity with habitat features. A total of 515 distinctly scoreable bands were generated, 324 of which were polymorphic. The polymorphic information content (PIC) ranged from 0.694 to 0.890, with an average of 0.789 per primer pair. At the species level, Nei's gene diversity (H_j), the Bayesian genetic diversity index (H_B) and the Shannon information index (I) were 0.2465, 0.2363 and 0.3772, respectively. High differentiation was detected among all sampling sites (F_ST = 0.2217, G_ST = 0.1746, G'_ST = 0.2060, θ_B = 0.1844), whereas gene flow among accessions (N_m = 1.1819) was restricted. The population genetic structure resolved by the UPGMA tree, principal coordinate analysis, and Bayesian-based cluster analyses consistently grouped all accessions into two distinct clusters, i.e., lowland and highland groups. This structure pattern may indicate the joint effects of neutral evolution and natural selection.
Restricted N_m was observed across all accessions, and genetic barriers were detected between adjacent accessions owing to the specific geographical landforms.
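    Gene flow (N_m) is often derived from a differentiation index through Wright's island model, N_m = (1 - F)/(4F). The minimal sketch below applies that relation. Plugging in the reported G_ST = 0.1746 gives about 1.18, close to the reported N_m = 1.1819. This suggests, but does not confirm, that N_m was obtained this way. The island model's assumptions (equal population sizes, symmetric migration, equilibrium) rarely hold exactly.

```python
def island_model_nm(fst):
    """Migrants per generation under Wright's island model: Nm = (1 - F) / (4F)."""
    if not 0 < fst < 1:
        raise ValueError("differentiation index must lie strictly between 0 and 1")
    return (1 - fst) / (4 * fst)

print(round(island_model_nm(0.1746), 2))  # G_ST from the abstract; prints 1.18
```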

  13. Postglacial recolonization history of the European crabapple (Malus sylvestris Mill.), a wild contributor to the domesticated apple.

    PubMed

    Cornille, A; Giraud, T; Bellard, C; Tellier, A; Le Cam, B; Smulders, M J M; Kleinschmit, J; Roldan-Ruiz, I; Gladieux, P

    2013-04-01

    Understanding the way in which the climatic oscillations of the Quaternary Period have shaped the distribution and genetic structure of extant tree species provides insight into the processes driving species diversification, distribution and survival. Deciphering the genetic consequences of past climatic change is also critical for the conservation and sustainable management of forest and tree genetic resources, a timely endeavour as the Earth heads into a period of fast climate change. We used a combination of genetic data and ecological niche models to investigate the historical patterns of biogeographic range expansion of a wild fruit tree, the European crabapple (Malus sylvestris), a wild contributor to the domesticated apple. Both climatic predictions for the last glacial maximum and analyses of microsatellite variation indicated that M. sylvestris experienced range contraction and fragmentation. Bayesian clustering analyses revealed a clear pattern of genetic structure, with one genetic cluster spanning a large area in Western Europe and two other genetic clusters with a more limited distribution range in Eastern Europe, one around the Carpathian Mountains and the other restricted to the Balkan Peninsula. Approximate Bayesian computation appeared to be a powerful technique for inferring the history of these clusters, supporting a scenario of simultaneous differentiation of three separate glacial refugia. Admixture between these three populations was found in their suture zones. A weak isolation by distance pattern was detected within each population, indicating a high extent of historical gene flow for the European crabapple. © 2013 Blackwell Publishing Ltd.

  14. Multilocus microsatellite typing shows three different genetic clusters of Leishmania major in Iran.

    PubMed

    Mahnaz, Tashakori; Al-Jawabreh, Amer; Kuhls, Katrin; Schönian, Gabriele

    2011-10-01

    Ten polymorphic microsatellite markers were used to analyse 25 strains of Leishmania major collected from cutaneous leishmaniasis cases in different endemic areas in Iran. Nine of the markers were polymorphic, revealing 21 different genotypes. The data displayed significant microsatellite polymorphism with rare allelic heterozygosity. Bayesian statistical and distance-based analyses identified three genetic clusters among the 25 strains analysed. Cluster I represented mainly strains isolated in the west and south-west of Iran, with the exception of four strains originating from central Iran. Cluster II comprised strains from the central part of Iran, and cluster III included only strains from northern Iran. The geographical distribution of L. major in Iran was supported by comparing the microsatellite profiles of the 25 Iranian strains to those of 105 strains collected in 19 Asian and African countries. The Iranian clusters I and II were separated from three previously described populations comprising strains from Africa, the Middle East and Central Asia, whereas cluster III grouped together with the Central Asian population. The considerable genetic variability of L. major might be related to the existence of different populations of Phlebotomus papatasi and/or to differences in reservoir host abundance in different parts of Iran. Copyright © 2011 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.

  15. Analyses of amplified fragment length polymorphisms (AFLP) indicate rapid radiation of Diospyros species (Ebenaceae) endemic to New Caledonia

    PubMed Central

    2013-01-01

    Background Radiation in some plant groups has occurred on islands, and because of the characteristically rapid pace of phenotypic evolution, standard molecular markers often provide insufficient variation for phylogenetic reconstruction. To resolve relationships within a clade of 21 closely related New Caledonian Diospyros species and to evaluate species boundaries, we analysed genome-wide DNA variation via amplified fragment length polymorphisms (AFLP). Results A neighbour-joining (NJ) dendrogram based on Dice distances shows all species except D. minimifolia, D. parviflora and D. vieillardii forming unique clusters of genetically similar accessions. However, there was little variation between these species clusters, resulting in unresolved species relationships and a star-like general NJ topology. Correspondingly, analyses of molecular variance showed more variation within species than between them. A Bayesian analysis with BEAST produced a similar result. Structure, a Bayesian clustering method, demonstrated the presence of two groups, highly congruent with those observed in a principal coordinate analysis (PCO). Molecular divergence between the two groups is low and does not correspond to any hypothesised taxonomic, ecological or geographical patterns. Conclusions We hypothesise that such a pattern could have been produced by rapid and complex evolution involving a widespread progenitor, in which an initial split into two groups was followed by fragmentation into many diverging populations and then by range expansion of the by-then divergent entities. Overall, this process resulted in an opportunistic pattern of phenotypic diversification. The time since divergence was probably insufficient for some species to become genetically well differentiated, resulting in progenitor/derivative relationships being exhibited in a few cases.
In other cases, our analyses may have revealed evidence for the existence of cryptic species, for which more study of morphology and ecology is now required. PMID:24330478
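    The Dice distance behind the NJ dendrogram treats AFLP bands as dominant presence/absence characters and ignores shared absences. The similarity is 2a/(2a + b + c), where a counts shared bands and b and c count bands present in only one profile. The minimal sketch below computes the corresponding distance. Returning 0.0 when neither profile has any band is an arbitrary convention chosen for illustration.

```python
def dice_distance(profile_a, profile_b):
    """Dice distance between two AFLP band profiles (1 = band present, 0 = absent).

    Dice similarity = 2a / (2a + b + c); shared absences are ignored.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must score the same loci")
    a = sum(1 for x, y in zip(profile_a, profile_b) if x and y)      # shared bands
    b = sum(1 for x, y in zip(profile_a, profile_b) if x and not y)  # only in A
    c = sum(1 for x, y in zip(profile_a, profile_b) if y and not x)  # only in B
    if a + b + c == 0:
        return 0.0                         # neither profile has any band
    return 1 - 2 * a / (2 * a + b + c)
```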

  16. Comparison of Bayesian clustering and edge detection methods for inferring boundaries in landscape genetics

    USGS Publications Warehouse

    Safner, T.; Miller, M.P.; McRae, B.H.; Fortin, M.-J.; Manel, S.

    2011-01-01

    Recently, the techniques available for identifying clusters of individuals, or boundaries between clusters, using genetic data from natural populations have expanded rapidly. Consequently, there is a need to evaluate these different techniques. We used spatially explicit simulation models to compare three spatial Bayesian clustering programs and two edge detection methods. Spatially structured populations were simulated in which a continuous population was subdivided by barriers. We evaluated the ability of each method to correctly identify boundary locations while varying (i) time after divergence, (ii) strength of isolation by distance, (iii) level of genetic diversity, and (iv) amount of gene flow across barriers. To further evaluate the methods' effectiveness in detecting genetic clusters in natural populations, we used previously published data on North American pumas and a European shrub. Our results show that, with both simulated and empirical data, the Bayesian spatial clustering algorithms outperformed direct edge detection methods. All methods incorrectly detected boundaries in the presence of strong patterns of isolation by distance. Based on this finding, we support the application of Bayesian spatial clustering algorithms for boundary detection in empirical datasets, with the necessary tests for the influence of isolation by distance. © 2011 by the authors; licensee MDPI, Basel, Switzerland.
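    A common check for isolation by distance is a Mantel test. It correlates pairwise genetic distances with pairwise geographic distances and judges significance by permuting individuals. The minimal pure-Python sketch below is a generic illustration and not a procedure taken from the study. The one-sided p-value and the seed are choices made for reproducibility.

```python
import random

def mantel_test(dist_x, dist_y, permutations=999, seed=0):
    """Simple Mantel test: Pearson correlation between the upper triangles of two
    square distance matrices, with a one-sided permutation p-value."""
    n = len(dist_x)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]

    def corr(order):
        # Reordering rows and columns of dist_y together keeps it a valid distance matrix
        xs = [dist_x[i][j] for i, j in pairs]
        ys = [dist_y[order[i]][order[j]] for i, j in pairs]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx = sum((x - mx) ** 2 for x in xs)
        syy = sum((y - my) ** 2 for y in ys)
        return sxy / (sxx * syy) ** 0.5

    rng = random.Random(seed)
    identity = list(range(n))
    observed = corr(identity)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)
        if corr(order) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)
```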

  17. Genomic signature of successful colonization of Eurasia by the allopolyploid shepherd's purse (Capsella bursa-pastoris).

    PubMed

    Cornille, A; Salcedo, A; Kryvokhyzha, D; Glémin, S; Holm, K; Wright, S I; Lascoux, M

    2016-01-01

    Polyploidization is a dominant feature of flowering plant evolution. However, detailed genomic analyses of the interpopulation diversification of polyploids following genome duplication are still in their infancy, mainly because of methodological limits, both in terms of sequencing and computational analyses. The shepherd's purse (Capsella bursa-pastoris) is one of the most common weed species in the world. It is highly self-fertilizing, and recent genomic data indicate that it is an allopolyploid, resulting from hybridization between the ancestors of the diploid species Capsella grandiflora and Capsella orientalis. Here, we investigated the genomic diversity of C. bursa-pastoris, its population structure and demographic history, following allopolyploidization in Eurasia. To that end, we genotyped 261 C. bursa-pastoris accessions spread across Europe, the Middle East and Asia, using genotyping-by-sequencing, leading to a total of 4274 SNPs after quality control. Bayesian clustering analyses revealed three distinct genetic clusters in Eurasia: one cluster grouping samples from Western Europe and Southeastern Siberia, the second one centred on Eastern Asia and the third one in the Middle East. Approximate Bayesian computation (ABC) supported the hypothesis that C. bursa-pastoris underwent a typical colonization history involving low gene flow among colonizing populations, likely starting from the Middle East towards Europe and followed by successive human-mediated expansions into Eastern Asia. Altogether, these findings bring new insights into the recent multistage colonization history of the allotetraploid C. bursa-pastoris and highlight ABC and genotyping-by-sequencing data as promising but still challenging tools to infer demographic histories of selfing allopolyploids. © 2015 John Wiley & Sons Ltd.
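    Approximate Bayesian computation (ABC), used here and in the crabapple study to compare demographic scenarios, replaces an intractable likelihood with simulation. Parameters are drawn from the prior, data are simulated, and a draw is kept when simulated summaries fall close to the observed ones. The minimal rejection-ABC sketch below shows only this accept/reject logic. It uses a toy binomial model as a stand-in for the population-genetic simulators that real ABC analyses rely on, and the tolerance and draw count are arbitrary.

```python
import random

def abc_rejection(observed, n_trials, n_draws=20000, tolerance=2, seed=1):
    """Rejection ABC for a binomial success probability with a Uniform(0, 1) prior.

    A draw is accepted when the simulated success count is within `tolerance`
    of the observed count; accepted draws approximate the posterior.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        p = rng.random()                                            # draw from the prior
        simulated = sum(rng.random() < p for _ in range(n_trials))  # simulate data
        if abs(simulated - observed) <= tolerance:                  # compare summaries
            accepted.append(p)
    return accepted

draws = abc_rejection(observed=30, n_trials=100)
print(len(draws), sum(draws) / len(draws))
```

    Shrinking the tolerance makes the accepted draws closer to the exact posterior but accepts fewer of them. This trade-off governs how much simulation an ABC analysis needs.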

  18. Slicing cluster mass functions with a Bayesian razor

    NASA Astrophysics Data System (ADS)

    Sealfon, C. D.

    2010-08-01

    We apply a Bayesian "razor" to forecast Bayes factors between different parameterizations of the galaxy cluster mass function. To demonstrate this approach, we calculate the minimum N-body simulation size needed for strong evidence favoring a two-parameter mass function over one-parameter mass functions and vice versa, as a function of the minimum cluster mass.
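    The "razor" is the automatic Occam penalty built into Bayes factors. A model with an extra free parameter spreads its prior predictive probability over more possible datasets, so it is penalized when the simpler model already explains the data. The toy sketch below is not the mass-function calculation. It compares a zero-parameter binomial model (p fixed at p0) with a one-parameter model (p uniform). The latter's marginal likelihood is exactly 1/(n + 1).

```python
from math import comb

def bayes_factor_fixed_vs_uniform(k, n, p0=0.5):
    """Bayes factor B01 for k successes in n trials.

    M0: p is fixed at p0 (no free parameter).
    M1: p ~ Uniform(0, 1) (one free parameter); its marginal likelihood is 1 / (n + 1).
    B01 > 1 favours the simpler model M0.
    """
    evidence_m0 = comb(n, k) * p0 ** k * (1 - p0) ** (n - k)
    evidence_m1 = 1 / (n + 1)
    return evidence_m0 / evidence_m1
```

    For k = n/2 the factor grows roughly like the square root of n. Evidence for the simpler model therefore accumulates slowly as the sample grows, which is why forecasts of the simulation size needed for strong evidence are useful.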

  19. A Bayesian hierarchical model for mortality data from cluster-sampling household surveys in humanitarian crises.

    PubMed

    Heudtlass, Peter; Guha-Sapir, Debarati; Speybroeck, Niko

    2018-05-31

    The crude death rate (CDR) is one of the defining indicators of humanitarian emergencies. When data from vital registration systems are not available, it is common practice to estimate the CDR from household surveys with a cluster-sampling design. However, sample sizes are often too small to compare mortality estimates to emergency thresholds, at least in a frequentist framework. Several authors have proposed Bayesian methods for health surveys in humanitarian crises. Here, we develop an approach specifically for mortality data and cluster-sampling surveys. We describe a Bayesian hierarchical Poisson-Gamma mixture model with generic (weakly informative) priors that could be used as a default in the absence of any specific prior knowledge, and compare Bayesian and frequentist CDR estimates using five different mortality datasets. We provide an interpretation of the Bayesian estimates in the context of an emergency threshold and demonstrate how to interpret parameters at the cluster level and ways in which informative priors can be introduced. With the same set of weakly informative priors, Bayesian CDR estimates are equivalent to frequentist estimates for all practical purposes. The probability that the CDR surpasses the emergency threshold can be derived directly from the posterior of the mean of the mixing distribution. All observations in the datasets contribute to the cluster-level estimates through the hierarchical structure of the model. In a context of sparse data, Bayesian mortality assessments have advantages over frequentist ones even when using only weakly informative priors. More informative priors offer a formal and transparent way of combining new data with existing data and expert knowledge and can help to improve decision-making in humanitarian crises by complementing frequentist estimates.
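    The probability that the CDR exceeds an emergency threshold can be read directly from a posterior distribution. The minimal sketch below makes a deliberate simplification. It pools all clusters into one conjugate Gamma-Poisson model and omits the Gamma mixing distribution across clusters that the hierarchical model adds. It also assumes the commonly cited threshold of 1 death per 10,000 persons per day. The prior values a = b = 0.001 are placeholder weakly informative choices, not the authors'.

```python
import random

def cdr_posterior(deaths, person_days, a=0.001, b=0.001, threshold=1e-4,
                  n_samples=100_000, seed=2):
    """Conjugate Gamma-Poisson posterior for a crude death rate (deaths per person-day).

    Prior: rate ~ Gamma(shape=a, rate=b) (placeholder weakly informative values).
    Likelihood: total deaths ~ Poisson(rate * person_days).
    Returns the posterior mean and a Monte Carlo estimate of P(rate > threshold).
    """
    shape = a + deaths
    rate = b + person_days
    posterior_mean = shape / rate
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale); the scale is 1 / rate
    draws = (rng.gammavariate(shape, 1 / rate) for _ in range(n_samples))
    p_exceed = sum(d > threshold for d in draws) / n_samples
    return posterior_mean, p_exceed

mean_rate, p = cdr_posterior(deaths=30, person_days=200_000)
print(mean_rate * 10_000, p)   # CDR per 10,000 per day, and P(CDR > threshold)
```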

  20. Evaluation of the procedure 1A component of the 1980 US/Canada wheat and barley exploratory experiment

    NASA Technical Reports Server (NTRS)

    Chapman, G. M. (Principal Investigator); Carnes, J. G.

    1981-01-01

    Several techniques that use clusters generated by a new clustering algorithm, CLASSY, are proposed as alternatives to random sampling to obtain greater precision in crop proportion estimation: (1) Proportional Allocation/Relative Count Estimator (PA/RCE) uses proportional allocation of dots to clusters on the basis of cluster size and a relative count cluster-level estimate; (2) Proportional Allocation/Bayes Estimator (PA/BE) uses proportional allocation of dots to clusters and a Bayesian cluster-level estimate; and (3) Bayes Sequential Allocation/Bayesian Estimator (BSA/BE) uses sequential allocation of dots to clusters and a Bayesian cluster-level estimate. Clustering is an effective method for making proportion estimates. It is estimated that, to obtain with random sampling the same precision as the proportional sampling of 50 dots with an unbiased estimator, samples of 85 or 166 dots would be needed if dot sets with AI labels (integrated procedure) or ground truth labels, respectively, were input. Dot reallocation provides dot sets that are unbiased. It is recommended that these proportion estimation techniques be maintained, particularly PA/BE, because it provides the greatest precision.
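    Proportional allocation assigns sample dots to each cluster in proportion to its size. The crop proportion is then estimated by weighting each cluster's sampled crop fraction by that cluster's share of the segment. The abstract does not specify the CLASSY-based estimators, so this minimal sketch rests on two assumptions: largest-remainder rounding for the allocation, and a standard stratified estimator in place of PA/RCE or PA/BE. Both function names are hypothetical.

```python
def proportional_allocation(cluster_sizes, n_dots):
    """Allocate n_dots sample dots to clusters in proportion to cluster size,
    rounding with the largest-remainder method so the total is exactly n_dots."""
    total = sum(cluster_sizes)
    quotas = [n_dots * s / total for s in cluster_sizes]
    alloc = [int(q) for q in quotas]
    # Hand out the leftover dots to the clusters with the largest fractional parts
    remaining = n_dots - sum(alloc)
    order = sorted(range(len(quotas)), key=lambda i: quotas[i] - alloc[i], reverse=True)
    for i in order[:remaining]:
        alloc[i] += 1
    return alloc

def stratified_proportion(cluster_sizes, crop_counts, sampled_counts):
    """Size-weighted crop proportion estimate.

    Assumes every cluster received at least one dot (sampled_counts > 0).
    """
    total = sum(cluster_sizes)
    return sum(size / total * crop / n
               for size, crop, n in zip(cluster_sizes, crop_counts, sampled_counts))
```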

  1. Estimating relative risks in multicenter studies with a small number of centers - which methods to use? A simulation study.

    PubMed

    Pedroza, Claudia; Truong, Van Thi Thanh

    2017-11-02

    Analyses of multicenter studies often need to account for center clustering to ensure valid inference. For binary outcomes, it is particularly challenging to properly adjust for center when the number of centers or total sample size is small, or when there are few events per center. Our objective was to evaluate the performance of generalized estimating equation (GEE) log-binomial and Poisson models, generalized linear mixed models (GLMMs) assuming binomial and Poisson distributions, and a Bayesian binomial GLMM to account for center effect in these scenarios. We conducted a simulation study with few centers (≤30) and 50 or fewer subjects per center, using both a randomized controlled trial and an observational study design to estimate relative risk. We compared the GEE and GLMM models with a log-binomial model without adjustment for clustering in terms of bias, root mean square error (RMSE), and coverage. For the Bayesian GLMM, we used informative neutral priors that are skeptical of large treatment effects, which are almost never observed in studies of medical interventions. All frequentist methods exhibited little bias, and the RMSE was very similar across the models. The binomial GLMM had poor convergence rates, ranging from 27% to 85%, but performed well otherwise. The results show that both GEE models need small-sample corrections for robust SEs to achieve proper coverage of 95% CIs. The Bayesian GLMM had similar convergence rates but resulted in slightly more biased estimates for the smallest sample sizes. However, it had the smallest RMSE and good coverage across all scenarios. These results were very similar for both study designs. For the analyses of multicenter studies with a binary outcome and few centers, we recommend adjustment for center with either a GEE log-binomial or Poisson model with appropriate small-sample corrections or a Bayesian binomial GLMM with informative priors.
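    A GEE Poisson model with an independence working correlation and a center-clustered sandwich SE (often called "modified Poisson" regression) can be sketched for the simplest case of one binary treatment, where the fitted means are just the two group risks. This is a minimal illustration under assumptions, not the study's code: it applies the simple G/(G-1) multiplier as its small-sample correction, which is only one of several corrections in use and not necessarily one the study evaluated. The function name and trial data are hypothetical, and the 1.96 Wald interval is for illustration only.

```python
import math
from collections import defaultdict


def clustered_rr(y, x, center, small_sample=True):
    """Relative risk for binary treatment x with a center-clustered SE of log(RR).

    Poisson working model: log E[y] = b0 + b1 * x. With one binary covariate
    the fitted means equal the observed group risks p0 and p1.
    Returns (rr, se_log_rr).
    """
    n1 = sum(x)
    n0 = len(x) - n1
    p1 = sum(yi for yi, xi in zip(y, x) if xi == 1) / n1
    p0 = sum(yi for yi, xi in zip(y, x) if xi == 0) / n0

    # Bread A = sum_i mu_i * X_i X_i^T with X_i = (1, x_i); x_i is 0/1
    a00 = n0 * p0 + n1 * p1
    a01 = a11 = n1 * p1
    det = a00 * a11 - a01 * a01
    inv01, inv11 = -a01 / det, a00 / det   # second row of A^{-1}

    # Meat B = sum over centers of outer products of summed score vectors
    score = defaultdict(lambda: [0.0, 0.0])
    for yi, xi, ci in zip(y, x, center):
        resid = yi - (p1 if xi == 1 else p0)
        score[ci][0] += resid
        score[ci][1] += resid * xi
    b00 = sum(s[0] * s[0] for s in score.values())
    b01 = sum(s[0] * s[1] for s in score.values())
    b11 = sum(s[1] * s[1] for s in score.values())

    # Element [1, 1] of the sandwich A^{-1} B A^{-1}
    var = inv01 ** 2 * b00 + 2 * inv01 * inv11 * b01 + inv11 ** 2 * b11
    if small_sample:
        g = len(score)
        var *= g / (g - 1)   # simple finite-cluster (G / (G - 1)) correction
    return p1 / p0, math.sqrt(var)


# Hypothetical trial: 4 centers, 5 treated and 5 control patients per center
y, x, center = [], [], []
for c, (treated_events, control_events) in enumerate([(0, 1), (1, 2), (1, 2), (2, 3)]):
    for j in range(5):
        x.append(1)
        center.append(c)
        y.append(int(j < treated_events))
    for j in range(5):
        x.append(0)
        center.append(c)
        y.append(int(j < control_events))

rr, se = clustered_rr(y, x, center)
lo, hi = rr * math.exp(-1.96 * se), rr * math.exp(1.96 * se)
print(f"RR = {rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

    With covariates, the closed-form risks would be replaced by an iterative fit of the Poisson working model, but the bread-meat-bread structure of the variance stays the same.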

  2. Genetic Structure of Bluefin Tuna in the Mediterranean Sea Correlates with Environmental Variables

    PubMed Central

    Riccioni, Giulia; Stagioni, Marco; Landi, Monica; Ferrara, Giorgia; Barbujani, Guido; Tinti, Fausto

    2013-01-01

    Background. Atlantic Bluefin Tuna (ABFT) shows complex demography and ecological variation in the Mediterranean Sea. Genetic surveys have detected significant, although weak, signals of population structuring; catch series analyses and tagging programs identified complex ABFT spatial dynamics and migration patterns. Here, we tested the hypothesis that the genetic structure of the ABFT in the Mediterranean is correlated with mean surface temperature and salinity. Methodology. We used six samples collected from the Western and Central Mediterranean, integrated with a new sample collected from the recently identified easternmost reproductive area in the Levantine Sea. To assess population structure in the Mediterranean, we used a multidisciplinary framework combining classical population genetics, spatial and Bayesian clustering methods, and a multivariate approach based on factor analysis. Conclusions. FST analysis and Bayesian clustering methods detected several subpopulations in the Mediterranean, a result also supported by multivariate analyses. In addition, we identified significant correlations of genetic diversity with mean salinity and surface temperature values, revealing that ABFT is genetically structured along two environmental gradients. These results suggest that a preference for certain spawning habitat conditions could contribute to shaping ABFT genetic structuring in the Mediterranean. However, further studies are needed to assess to what extent ABFT spawning behaviour in the Mediterranean Sea can be affected by environmental variation. PMID:24260341

  3. Application of Multiple Imputation for Missing Values in Three-Way Three-Mode Multi-Environment Trial Data

    PubMed Central

    Tian, Ting; McLachlan, Geoffrey J.; Dieters, Mark J.; Basford, Kaye E.

    2015-01-01

    It is a common occurrence in plant breeding programs to observe missing values in three-way three-mode multi-environment trial (MET) data. We proposed modifications of models for estimating missing observations for these data arrays, and developed a novel approach based on hierarchical clustering. Multiple imputation (MI) was used in four ways: multiple agglomerative hierarchical clustering, a normal distribution model, a normal regression model, and predictive mean matching. The latter three models used both Bayesian and non-Bayesian analysis, while the first approach used a clustering procedure with randomly selected attributes and assigned real values from the nearest neighbour to the one with missing observations. Different proportions of data entries in six complete datasets were randomly selected to be missing, and the MI methods were compared based on the efficiency and accuracy of estimating those values. The results indicated that the models using Bayesian analysis had slightly higher estimation accuracy than those using non-Bayesian analysis, but they were more time-consuming. However, the novel approach of multiple agglomerative hierarchical clustering demonstrated the best overall performance. PMID:26689369
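    Predictive mean matching, one of the four MI approaches listed above, can be sketched for a single incomplete variable. This is a generic, simplified version and not the authors' three-way adaptation: it uses one fully observed predictor, fixes the regression coefficients at their least-squares values (a Bayesian variant would redraw them from their posterior for each imputation), and fills each missing entry with the observed value of a randomly chosen donor among the k nearest predicted means. Function names, defaults, and data are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def pmm_impute(x, y, n_donors=3, n_imputations=5, seed=0):
    """Return n_imputations completed copies of y (missing entries are None)."""
    rng = random.Random(seed)
    observed = [i for i, v in enumerate(y) if v is not None]
    missing = [i for i, v in enumerate(y) if v is None]
    intercept, slope = fit_line([x[i] for i in observed], [y[i] for i in observed])
    pred = [intercept + slope * xi for xi in x]
    completed = []
    for _ in range(n_imputations):
        filled = list(y)
        for i in missing:
            # Donors: observed cases whose predicted means are closest to case i
            donors = sorted(observed, key=lambda j: abs(pred[j] - pred[i]))[:n_donors]
            filled[i] = y[rng.choice(donors)]   # impute a real, observed value
        completed.append(filled)
    return completed


# Hypothetical yields with two missing plots
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [2.1, 3.9, None, 8.2, 9.8, None, 14.1]
for filled in pmm_impute(x, y, n_imputations=3):
    print(filled)
```

    Each completed copy would then be analysed separately and the results pooled, for example with Rubin's rules.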

  4. Application of Multiple Imputation for Missing Values in Three-Way Three-Mode Multi-Environment Trial Data.

    PubMed

    Tian, Ting; McLachlan, Geoffrey J; Dieters, Mark J; Basford, Kaye E

    2015-01-01

    It is a common occurrence in plant breeding programs to observe missing values in three-way three-mode multi-environment trial (MET) data. We proposed modifications of models for estimating missing observations for these data arrays, and developed a novel approach based on hierarchical clustering. Multiple imputation (MI) was used in four ways: multiple agglomerative hierarchical clustering, a normal distribution model, a normal regression model, and predictive mean matching. The latter three models used both Bayesian and non-Bayesian analysis, while the first approach used a clustering procedure with randomly selected attributes and assigned real values from the nearest neighbour to the one with missing observations. Different proportions of data entries in six complete datasets were randomly selected to be missing, and the MI methods were compared based on the efficiency and accuracy of estimating those values. The results indicated that the models using Bayesian analysis had slightly higher estimation accuracy than those using non-Bayesian analysis, but they were more time-consuming. However, the novel approach of multiple agglomerative hierarchical clustering demonstrated the best overall performance.

  5. Multiple independent introductions of Plasmodium falciparum in South America

    PubMed Central

    Yalcindag, Erhan; Elguero, Eric; Arnathau, Céline; Durand, Patrick; Akiana, Jean; Anderson, Timothy J.; Aubouy, Agnes; Balloux, François; Besnard, Patrick; Bogreau, Hervé; Carnevale, Pierre; D'Alessandro, Umberto; Fontenille, Didier; Gamboa, Dionicia; Jombart, Thibaut; Le Mire, Jacques; Leroy, Eric; Maestre, Amanda; Mayxay, Mayfong; Ménard, Didier; Musset, Lise; Newton, Paul N.; Nkoghé, Dieudonné; Noya, Oscar; Ollomo, Benjamin; Rogier, Christophe; Veron, Vincent; Wide, Albina; Zakeri, Sedigheh; Carme, Bernard; Legrand, Eric; Chevillon, Christine; Ayala, Francisco J.; Renaud, François; Prugnolle, Franck

    2012-01-01

    The origin of Plasmodium falciparum in South America is controversial. Some studies suggest a recent introduction during the European colonizations and the transatlantic slave trade. Other evidence, both archeological and genetic, suggests a much older origin. We collected and analyzed P. falciparum isolates from different regions of the world, encompassing the distribution range of the parasite, including populations from sub-Saharan Africa, the Middle East, Southeast Asia, and South America. Analyses of microsatellite and SNP polymorphisms show that the populations of P. falciparum in South America are subdivided into two main genetic clusters (northern and southern). Phylogenetic analyses, as well as Approximate Bayesian Computation methods, suggest independent introductions of the two clusters from African sources. Our estimates of divergence time between the South American populations and their likely sources favor an introduction from Africa during the transatlantic slave trade. PMID:22203975

  6. Clinical Outcome Prediction in Aneurysmal Subarachnoid Hemorrhage Using Bayesian Neural Networks with Fuzzy Logic Inferences

    PubMed Central

    Lo, Benjamin W. Y.; Macdonald, R. Loch; Baker, Andrew; Levine, Mitchell A. H.

    2013-01-01

    Objective. The novel clinical prediction approach of Bayesian neural networks with fuzzy logic inferences is created and applied to derive prognostic decision rules in cerebral aneurysmal subarachnoid hemorrhage (aSAH). Methods. The approach of Bayesian neural networks with fuzzy logic inferences was applied to data from five trials of Tirilazad for aneurysmal subarachnoid hemorrhage (3551 patients). Results. Bayesian meta-analyses of observational studies on aSAH prognostic factors gave generalizable posterior distributions of population mean log odds ratios (ORs). Similar trends were noted in Bayesian and linear regression ORs. Significant outcome predictors include normal motor response, cerebral infarction, history of myocardial infarction, cerebral edema, history of diabetes mellitus, fever on day 8, prior subarachnoid hemorrhage, admission angiographic vasospasm, neurological grade, intraventricular hemorrhage, ruptured aneurysm size, history of hypertension, vasospasm day, age and mean arterial pressure. Heteroscedasticity was present in the nontransformed dataset. Artificial neural networks found nonlinear relationships with 11 hidden variables in 1 layer, using the multilayer perceptron model. Fuzzy logic decision rules (centroid defuzzification technique) denoted cut-off points for poor prognosis at greater than 2.5 clusters. Discussion. This aSAH prognostic system makes use of existing knowledge, recognizes unknown areas, incorporates one's clinical reasoning, and compensates for uncertainty in prognostication. PMID:23690884
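    Centroid defuzzification, mentioned above, turns an aggregated fuzzy output into one crisp value: the membership-weighted average of the output universe, z* = Σ μ(z)·z / Σ μ(z). The sketch below is a generic illustration; the triangular membership functions, rule strengths, Mamdani min-max aggregation, and 0 to 5 scale are invented for the example and are not taken from the study.

```python
def triangle(a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    def mu(z):
        if z <= a or z >= c:
            return 0.0
        return (z - a) / (b - a) if z <= b else (c - z) / (c - b)
    return mu


def centroid(membership, lo, hi, steps=1000):
    """Centroid defuzzification on a discretised universe [lo, hi]."""
    zs = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    weights = [membership(z) for z in zs]
    return sum(w * z for w, z in zip(weights, zs)) / sum(weights)


# Hypothetical output sets on a 0-5 "poor prognosis" scale
low, high = triangle(0.0, 1.0, 2.5), triangle(2.0, 4.0, 5.0)


def aggregated(z):
    """Mamdani min-max aggregation: rule 1 fires at 0.3 ("low"), rule 2 at 0.7 ("high")."""
    return max(min(0.3, low(z)), min(0.7, high(z)))


print(round(centroid(aggregated, 0.0, 5.0), 2))
```

    A cut-off rule such as the one reported would then compare this crisp value with a threshold.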

  7. Identifying and characterizing hepatitis C virus hotspots in Massachusetts: a spatial epidemiological approach.

    PubMed

    Stopka, Thomas J; Goulart, Michael A; Meyers, David J; Hutcheson, Marga; Barton, Kerri; Onofrey, Shauna; Church, Daniel; Donahue, Ashley; Chui, Kenneth K H

    2017-04-20

    Hepatitis C virus (HCV) infections have increased during the past decade, but little is known about geographic clustering patterns. We used a unique analytical approach, combining geographic information systems (GIS), spatial epidemiology, and statistical modeling to identify and characterize HCV hotspots, statistically significant clusters of census tracts with elevated HCV counts and rates. We compiled sociodemographic and HCV surveillance data (n = 99,780 cases) for Massachusetts census tracts (n = 1464) from 2002 to 2013. We used a five-step spatial epidemiological approach, calculating incremental spatial autocorrelations and Getis-Ord Gi* statistics to identify clusters. We conducted logistic regression analyses to determine factors associated with the HCV hotspots. We identified nine HCV clusters, with the largest in Boston, New Bedford/Fall River, Worcester, and Springfield (p < 0.05). In multivariable analyses, we found that HCV hotspots were independently and positively associated with the percent of the population that was Hispanic (adjusted odds ratio [AOR]: 1.07; 95% confidence interval [CI]: 1.04, 1.09) and the percent of households receiving food stamps (AOR: 1.83; 95% CI: 1.22, 2.74). HCV hotspots were independently and negatively associated with the percent of the population with a high school education or higher (AOR: 0.91; 95% CI: 0.89, 0.93) and the percent of the population in the "other" race/ethnicity category (AOR: 0.88; 95% CI: 0.85, 0.91). We identified locations where HCV clusters were a concern, and where enhanced HCV prevention, treatment, and care can help combat the HCV epidemic in Massachusetts. GIS, spatial epidemiological and statistical analyses provided a rigorous approach to identify hotspot clusters of disease, which can inform public health policy and intervention targeting. Further studies that incorporate spatiotemporal cluster analyses, Bayesian spatial and geostatistical models, spatially weighted regression analyses, and assessment of associations between HCV clustering and the built environment are needed to expand upon our combined spatial epidemiological and statistical methods.
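    The Getis-Ord Gi* statistic used above compares the weighted sum of values around each location with what would be expected under spatial randomness and returns a z-score, Gi* = (Σ_j w_ij x_j − X̄ Σ_j w_ij) / (S · sqrt((n Σ_j w_ij² − (Σ_j w_ij)²) / (n − 1))), where X̄ and S are the mean and standard deviation of all values and each location includes itself (w_ii ≠ 0). A minimal sketch, assuming a dense weight matrix; the 1-D strip of neighbouring tracts and the rates are hypothetical and do not reflect the study's census-tract weights.

```python
import math


def getis_ord_gi_star(values, weights):
    """Gi* z-scores. weights[i][j] is the spatial weight between i and j;
    weights[i][i] must be non-zero so each location counts itself."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for i in range(n):
        w = weights[i]
        sum_w = sum(w)
        sum_w2 = sum(wij * wij for wij in w)
        local = sum(wij * xj for wij, xj in zip(w, values))
        num = local - mean * sum_w
        den = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(num / den)
    return scores


def line_weights(n):
    """Binary weights for a hypothetical strip of tracts: self plus adjacent tracts."""
    return [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(n)] for i in range(n)]


rates = [1, 1, 1, 1, 10, 10, 10, 1, 1, 1]   # hypothetical rates along the strip
gi = getis_ord_gi_star(rates, line_weights(len(rates)))
print([round(z, 2) for z in gi])
```

    Scores above about 1.96 would be read as hotspots at roughly the 5% level; the statistic is undefined if all values are equal (S = 0).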

  8. A semiparametric Bayesian proportional hazards model for interval censored data with frailty effects.

    PubMed

    Henschel, Volkmar; Engel, Jutta; Hölzel, Dieter; Mansmann, Ulrich

    2009-02-10

    Multivariate analysis of interval-censored event data based on classical likelihood methods is notoriously cumbersome. Likelihood inference for models that additionally include random effects is not available at all. Existing algorithms pose problems for practical users, such as matrix inversion, slow convergence, and no assessment of statistical uncertainty. MCMC procedures combined with imputation are used to implement hierarchical models for interval-censored data within a Bayesian framework. Two examples from clinical practice demonstrate the handling of clustered interval-censored event times as well as multilayer random effects for inter-institutional quality assessment. The software developed is called survBayes and is freely available on CRAN. The proposed software supports the solution of complex analyses in many fields of clinical epidemiology as well as health services research.
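    The combination of MCMC and imputation described above can be illustrated with a much simpler model than survBayes fits. This sketch assumes exponential event times with a single rate, no covariates and no frailty, and a Gamma prior; it alternates between imputing each exact event time inside its censoring interval and drawing the rate from its conjugate posterior (a Gibbs sampler with data augmentation). It does not reproduce the survBayes proportional hazards model or its interface, and all names and data are hypothetical.

```python
import math
import random


def sample_truncated_exp(rate, lo, hi, rng):
    """Draw T ~ Exponential(rate) conditioned on lo <= T <= hi (hi may be inf)."""
    if math.isinf(hi):
        return lo + rng.expovariate(rate)   # memorylessness for right censoring
    # Inverse-CDF sampling restricted to [lo, hi]
    u = rng.random()
    a, b = math.exp(-rate * lo), math.exp(-rate * hi)
    return -math.log(a - u * (a - b)) / rate


def gibbs_interval_censored(intervals, n_iter=3000, burn_in=500,
                            prior_shape=0.001, prior_rate=0.001, seed=1):
    """Posterior draws of an exponential event rate from interval-censored data."""
    rng = random.Random(seed)
    rate = 1.0
    draws = []
    for it in range(n_iter):
        # 1) Imputation step: exact event times given the current rate
        times = [sample_truncated_exp(rate, lo, hi, rng) for lo, hi in intervals]
        # 2) Parameter step: conjugate Gamma update given the completed data
        #    (gammavariate takes shape and *scale*, so pass 1 / rate)
        rate = rng.gammavariate(prior_shape + len(times), 1.0 / (prior_rate + sum(times)))
        if it >= burn_in:
            draws.append(rate)
    return draws


inf = float("inf")
intervals = [(1.0, 2.0)] * 12 + [(2.0, 4.0)] * 6 + [(3.0, inf)] * 2   # hypothetical
draws = gibbs_interval_censored(intervals)
print(sum(draws) / len(draws))
```

    Clustering or random effects would enter as extra latent variables updated in the same alternating fashion.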

  9. Evaluating Mixture Modeling for Clustering: Recommendations and Cautions

    ERIC Educational Resources Information Center

    Steinley, Douglas; Brusco, Michael J.

    2011-01-01

    This article provides a large-scale investigation into several of the properties of mixture-model clustering techniques (also referred to as latent class cluster analysis, latent profile analysis, model-based clustering, probabilistic clustering, Bayesian classification, unsupervised learning, and finite mixture models; see Vermunt & Magidson,…

  10. Three sympatric clusters of the malaria vector Anopheles culicifacies E (Diptera: Culicidae) detected in Sri Lanka.

    PubMed

    Harischandra, Iresha Nilmini; Dassanayake, Ranil Samantha; De Silva, Bambaranda Gammacharige Don Nissanka Kolitha

    2016-01-04

    The disease re-emergence threat from the major malaria vector in Sri Lanka, Anopheles culicifacies, is currently increasing. To predict malaria vector dynamics, knowledge of population genetics and gene flow is required, but this information is unavailable for Sri Lanka. This study was carried out to determine the population structure of An. culicifacies E in Sri Lanka. Eight microsatellite markers were used to examine An. culicifacies E collected from six sites in Sri Lanka during 2010-2012. Standard population genetic tests and analyses (genetic differentiation, Hardy-Weinberg equilibrium, linkage disequilibrium, Bayesian cluster analysis, AMOVA, SAMOVA and isolation-by-distance) were conducted using five polymorphic loci. Five microsatellite loci were highly polymorphic with high allelic richness. Hardy-Weinberg equilibrium (HWE) was significantly rejected for four loci with positive F(IS) values in the pooled population (p < 0.0100). Three loci showed high deviations at all sites except Kataragama, which was in agreement with HWE for all loci except one (p < 0.0016). Observed heterozygosity was lower than expected at all sites except Kataragama, where negative F(IS) values indicated a heterozygote excess. Genetic differentiation was observed for all sampling-site pairs and was not supported by the isolation-by-distance model. Bayesian clustering analysis identified three sympatric clusters (gene pools) in the studied population. Significant genetic differentiation was detected between cluster pairs, with low gene flow, and isolation by distance was not detected between clusters. Furthermore, the results suggested the presence of a barrier to gene flow that divided the populations into two parts, with the central hill region of Sri Lanka as the dividing line. Three sympatric clusters were detected among An. culicifacies E specimens collected in Sri Lanka. There was no effect of geographic distance on genetic differentiation, and the central mountain ranges in Sri Lanka appeared to be a barrier to gene flow.
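    The F(IS) values reported above summarise departures from Hardy-Weinberg proportions within a sample as F_IS = 1 - Ho/He, where Ho is the observed heterozygosity and He = 1 - Σ p_k² is the heterozygosity expected under HWE from the allele frequencies p_k. Positive values indicate a heterozygote deficit and negative values an excess, as at Kataragama. A minimal single-locus sketch, assuming diploid genotypes and omitting the small-sample corrections that population genetics software usually applies; the genotypes are hypothetical.

```python
from collections import Counter


def f_is(genotypes):
    """Inbreeding coefficient F_IS = 1 - Ho / He for one locus.

    genotypes: list of (allele1, allele2) tuples for diploid individuals.
    """
    n = len(genotypes)
    h_obs = sum(1 for a, b in genotypes if a != b) / n
    counts = Counter(allele for g in genotypes for allele in g)
    total = 2 * n
    h_exp = 1.0 - sum((c / total) ** 2 for c in counts.values())
    return 1.0 - h_obs / h_exp


# Hypothetical microsatellite locus (alleles given as repeat lengths)
deficit = [(150, 150)] * 6 + [(152, 152)] * 3 + [(150, 152)] * 1
excess = [(150, 152)] * 8 + [(150, 150)] * 1 + [(152, 152)] * 1
print(round(f_is(deficit), 3), round(f_is(excess), 3))
```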

  11. Astrostatistical Analysis in Solar and Stellar Physics

    NASA Astrophysics Data System (ADS)

    Stenning, David Craig

    This dissertation focuses on developing statistical models and methods to address data-analytic challenges in astrostatistics, a growing interdisciplinary field fostering collaborations between statisticians and astrophysicists. The astrostatistics projects we tackle can be divided into two main categories: modeling solar activity and Bayesian analysis of stellar evolution. These categories form Part I and Part II of this dissertation, respectively. The first line of research we pursue involves classification and modeling of evolving solar features. Advances in space-based observatories are increasing both the quality and quantity of solar data, primarily in the form of high-resolution images. To analyze massive streams of solar image data, we develop a science-driven dimension reduction methodology to extract scientifically meaningful features from images. This methodology utilizes mathematical morphology to produce a concise numerical summary of the magnetic flux distribution in solar "active regions" that (i) is far easier to work with than the source images, (ii) encapsulates scientifically relevant information in a more informative manner than existing schemes (i.e., manual classification schemes), and (iii) is amenable to sophisticated statistical analyses. In a related line of research, we perform a Bayesian analysis of the solar cycle using multiple proxy variables, such as sunspot numbers. We take advantage of patterns and correlations among the proxy variables to model solar activity using data from proxies that have become available more recently, while also taking advantage of the long history of observations of sunspot numbers. This model is an extension of the Yu et al. (2012) Bayesian hierarchical model for the solar cycle that used the sunspot numbers alone. Since proxies have different temporal coverage, we devise a multiple imputation scheme to account for missing data. We find that incorporating multiple proxies reveals important features of the solar cycle that are missed when the model is fit using only the sunspot numbers. In Part II of this dissertation, we focus on two related lines of research involving Bayesian analysis of stellar evolution. We first focus on modeling multiple stellar populations in star clusters. It has long been assumed that all star clusters are composed of single stellar populations: stars that formed at roughly the same time from a common molecular cloud. However, recent studies have produced evidence that some clusters host multiple populations, which has far-reaching scientific implications. We develop a Bayesian hierarchical model for multiple-population star clusters, extending earlier statistical models of stellar evolution (e.g., van Dyk et al. 2009, Stein et al. 2013). We also devise an adaptive Markov chain Monte Carlo algorithm to explore the complex posterior distribution. We use numerical studies to demonstrate that our method can recover parameters of multiple-population clusters, and also show how model misspecification can be diagnosed. Our model and computational tools are incorporated into an open-source software suite known as BASE-9. We also explore statistical properties of the estimators and determine that the influence of the prior distribution does not diminish with larger sample sizes, leading to non-standard asymptotics. In a final line of research, we present the first-ever attempt to estimate the carbon fraction of white dwarfs. This quantity has important implications for both astrophysics and fundamental nuclear physics, but is currently unknown. We use a numerical study to demonstrate that assuming an incorrect value for the carbon fraction leads to incorrect white-dwarf ages of star clusters. Finally, we present our attempt to estimate the carbon fraction of the white dwarfs in the well-studied star cluster 47 Tucanae.

  12. Learning Bayesian Networks from Correlated Data

    NASA Astrophysics Data System (ADS)

    Bae, Harold; Monti, Stefano; Montano, Monty; Steinberg, Martin H.; Perls, Thomas T.; Sebastiani, Paola

    2016-05-01

    Bayesian networks are probabilistic models that represent complex distributions in a modular way and have become very popular in many fields. There are many methods to build Bayesian networks from a random sample of independent and identically distributed observations. However, many observational studies are designed using some form of clustered sampling that introduces correlations between observations within the same cluster, and ignoring this correlation typically inflates the rate of false positive associations. We describe a novel parameterization of Bayesian networks that uses random effects to model the correlation within sample units and can be used for structure and parameter learning from correlated data without inflating the Type I error rate. We compare different learning metrics using simulations and illustrate the method in two real examples: an analysis of genetic and non-genetic factors associated with human longevity from a family-based study, and an example of risk factors for complications of sickle cell anemia from a longitudinal study with repeated measures.

  13. Robust Bayesian clustering.

    PubMed

    Archambeau, Cédric; Verleysen, Michel

    2007-01-01

    A new variational Bayesian learning algorithm for Student-t mixture models is introduced. This algorithm leads to (i) robust density estimation, (ii) robust clustering and (iii) robust automatic model selection. Gaussian mixture models are learning machines which are based on a divide-and-conquer approach. They are commonly used for density estimation and clustering tasks, but are sensitive to outliers. The Student-t distribution has heavier tails than the Gaussian distribution and is therefore less sensitive to any departure of the empirical distribution from Gaussianity. As a consequence, the Student-t distribution is suitable for constructing robust mixture models. In this work, we formalize the Bayesian Student-t mixture model as a latent variable model in a different way from Svensén and Bishop [Svensén, M., & Bishop, C. M. (2005). Robust Bayesian mixture modelling. Neurocomputing, 64, 235-252]. The main difference resides in the fact that it is not necessary to assume a factorized approximation of the posterior distribution on the latent indicator variables and the latent scale variables in order to obtain a tractable solution. Not neglecting the correlations between these unobserved random variables leads to a Bayesian model having an increased robustness. Furthermore, it is expected that the lower bound on the log-evidence is tighter. Based on this bound, the model complexity, i.e. the number of components in the mixture, can be inferred with a higher confidence.
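    The robustness argument above shows up directly in the E-step of an EM algorithm for a Student-t model: each point receives a weight u = (ν + 1)/(ν + δ²), where δ² is its squared standardised distance from the current location, so outliers are down-weighted instead of dragging the estimate. A minimal one-dimensional, single-component sketch with ν held fixed (an assumption for brevity). This is plain EM for a location-scale t model, not the variational Bayesian mixture algorithm of the paper; the data are hypothetical.

```python
def t_location_scale(data, nu=3.0, n_iter=100):
    """EM estimates of location and scale for a 1-D Student-t with fixed nu."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    for _ in range(n_iter):
        # E-step: latent precision weights; large residuals get small weights
        u = [(nu + 1.0) / (nu + (x - mu) ** 2 / var) for x in data]
        # M-step: weighted mean, then weighted variance around the new mean
        mu = sum(ui * x for ui, x in zip(u, data)) / sum(u)
        var = sum(ui * (x - mu) ** 2 for ui, x in zip(u, data)) / n
    return mu, var ** 0.5


# Hypothetical measurements with one gross outlier
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 50.0]
print(sum(data) / len(data))       # Gaussian (sample-mean) estimate, pulled by the outlier
print(t_location_scale(data)[0])   # Student-t location, close to the bulk of the data
```

    In a mixture, the same weights multiply the usual responsibilities, which is what makes Student-t components less sensitive to outliers than Gaussian ones.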

  14. Pan-genome and phylogeny of Bacillus cereus sensu lato.

    PubMed

    Bazinet, Adam L

    2017-08-02

    Bacillus cereus sensu lato (s. l.) is an ecologically diverse bacterial group of medical and agricultural significance. In this study, I use publicly available genomes and novel bioinformatic workflows to characterize the B. cereus s. l. pan-genome and perform the largest phylogenetic and population genetic analyses of this group to date in terms of the number of genes and taxa included. With these fundamental data in hand, I identify genes associated with particular phenotypic traits (i.e., "pan-GWAS" analysis), and quantify the degree to which taxa sharing common attributes are phylogenetically clustered. A rapid k-mer based approach (Mash) was used to create reduced representations of selected Bacillus genomes, and a fast distance-based phylogenetic analysis of this data (FastME) was performed to determine which species should be included in B. cereus s. l. The complete genomes of eight B. cereus s. l. species were annotated de novo with Prokka, and these annotations were used by Roary to produce the B. cereus s. l. pan-genome. Scoary was used to associate gene presence and absence patterns with various phenotypes. The orthologous protein sequence clusters produced by Roary were filtered and used to build HaMStR databases of gene models that were used in turn to construct phylogenetic data matrices. Phylogenetic analyses used RAxML, DendroPy, ClonalFrameML, PAUP*, and SplitsTree. Bayesian model-based population genetic analysis assigned taxa to clusters using hierBAPS. The genealogical sorting index was used to quantify the phylogenetic clustering of taxa sharing common attributes. The B. cereus s. l. pan-genome currently consists of ≈60,000 genes, ≈600 of which are "core" (common to at least 99% of taxa sampled). Pan-GWAS analysis revealed genes associated with phenotypes such as isolation source, oxygen requirement, and ability to cause diseases such as anthrax or food poisoning. Extensive phylogenetic analyses using an unprecedented amount of data produced phylogenies that were largely concordant with each other and with previous studies. Phylogenetic support as measured by bootstrap probabilities increased markedly when all suitable pan-genome data was included in phylogenetic analyses, as opposed to when only core genes were used. Bayesian population genetic analysis recommended subdividing the three major clades of B. cereus s. l. into nine clusters. Taxa sharing common traits and species designations exhibited varying degrees of phylogenetic clustering. All phylogenetic analyses recapitulated two previously used classification systems, and taxa were consistently assigned to the same major clade and group. By including accessory genes from the pan-genome in the phylogenetic analyses, I produced an exceptionally well-supported phylogeny of 114 complete B. cereus s. l. genomes. The best-performing methods were used to produce a phylogeny of all 498 publicly available B. cereus s. l. genomes, which was in turn used to compare three different classification systems and to test the monophyly status of various B. cereus s. l. species. The majority of the methodology used in this study is generic and could be leveraged to produce pan-genome estimates and similarly robust phylogenetic hypotheses for other bacterial groups.
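    The core/accessory split described above (core genes are those common to at least 99% of sampled taxa) can be computed from a gene presence/absence table. A minimal sketch, assuming the table has already been parsed into a dictionary mapping each gene cluster to the set of genomes carrying it; file parsing (for example of Roary output) is deliberately omitted rather than guessed, and the gene and genome names are hypothetical.

```python
def split_pan_genome(presence, n_genomes, core_fraction=0.99):
    """Partition gene clusters into core and accessory sets.

    presence: dict mapping gene-cluster name -> set of genome IDs carrying it.
    A cluster is 'core' if it occurs in at least core_fraction of the genomes.
    """
    core = {g for g, genomes in presence.items()
            if len(genomes) / n_genomes >= core_fraction}
    accessory = set(presence) - core
    return core, accessory


# Hypothetical presence/absence data for four genomes
presence = {
    "geneA": {"g1", "g2", "g3", "g4"},
    "geneB": {"g1", "g2", "g3"},
    "geneC": {"g4"},
}
core, accessory = split_pan_genome(presence, n_genomes=4)
print(sorted(core), sorted(accessory))   # ['geneA'] ['geneB', 'geneC']
```

    With fewer than 100 genomes, a 99% threshold means a core gene must be present in every genome; the pan-genome size is simply the number of gene clusters, len(presence).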

  15. SOMBI: Bayesian identification of parameter relations in unstructured cosmological data

    NASA Astrophysics Data System (ADS)

    Frank, Philipp; Jasche, Jens; Enßlin, Torsten A.

    2016-11-01

    This work describes the implementation and application of a correlation determination method based on self-organizing maps and Bayesian inference (SOMBI). SOMBI aims to automatically identify relations between different observed parameters in unstructured cosmological or astrophysical surveys by automatically identifying data clusters in high-dimensional datasets via the self-organizing map neural network algorithm. Parameter relations are then revealed by means of Bayesian inference within the respective identified data clusters. Specifically, such relations are assumed to be parametrized as a polynomial of unknown order. The Bayesian approach results in a posterior probability distribution function for the respective polynomial coefficients. To decide which polynomial order suffices to describe correlation structures in the data, we include a method for model selection, the Bayesian information criterion, in the analysis. The performance of the SOMBI algorithm is tested with mock data. As an illustration, we also provide applications of our method to cosmological data. In particular, we present results of a correlation analysis between galaxy and active galactic nucleus (AGN) properties provided by the SDSS catalog and the cosmic large-scale structure (LSS). The results indicate that the combined galaxy and LSS dataset is indeed clustered into several sub-samples of data with different average properties (for example, different stellar masses or web-type classifications). The majority of data clusters appear to have a similar correlation structure between galaxy properties and the LSS. In particular, we revealed a positive and linear dependency of the stellar mass, the absolute magnitude and the color of a galaxy on the corresponding cosmic density field. A remaining subset of data shows inverted correlations, which might be an artifact of non-linear redshift distortions.
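    The polynomial-order selection step can be sketched with least squares and the Bayesian information criterion, which for Gaussian errors is BIC = n ln(RSS/n) + k ln n up to an additive constant, with k = d + 1 coefficients for degree d. This is a stand-in, not SOMBI's implementation: SOMBI infers a posterior over the coefficients within each SOM-identified cluster, whereas the sketch fits one cluster's data by ordinary least squares with NumPy. The function name and the synthetic data are assumptions for illustration.

```python
import numpy as np


def select_poly_order(x, y, max_degree=5):
    """Return (best_degree, bic_by_degree) using BIC for least-squares polynomial fits."""
    n = len(x)
    bics = {}
    for d in range(max_degree + 1):
        coeffs = np.polyfit(x, y, d)
        rss = float(np.sum((y - np.polyval(coeffs, x)) ** 2))
        bics[d] = n * np.log(rss / n) + (d + 1) * np.log(n)
    best = min(bics, key=bics.get)
    return best, bics


# Synthetic data for one cluster: a linear relation plus small alternating noise
x = np.linspace(-1.0, 1.0, 100)
y = 0.5 + 2.0 * x + 0.1 * (-1.0) ** np.arange(100)
best, _ = select_poly_order(x, y)
print(best)
```

    The ln n penalty per coefficient is what stops higher orders from being chosen when they only reduce the residuals marginally.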

  16. Spatiotemporal Phylogenetic Analysis and Molecular Characterisation of Infectious Bursal Disease Viruses Based on the VP2 Hyper-Variable Region

    PubMed Central

    Dolz, Roser; Valle, Rosa; Perera, Carmen L.; Bertran, Kateri; Frías, Maria T.; Majó, Natàlia; Ganges, Llilianne; Pérez, Lester J.

    2013-01-01

    Background. Infectious bursal disease is a highly contagious and acute viral disease caused by the infectious bursal disease virus (IBDV); it affects all major poultry producing areas of the world. The current study was designed to rigorously measure the global phylogeographic dynamics of IBDV strains to gain insight into viral population expansion as well as the emergence, spread and pattern of the geographical structure of very virulent IBDV (vvIBDV) strains. Methodology/Principal Findings. Sequences of the hyper-variable region of the VP2 (HVR-VP2) gene from IBDV strains isolated from diverse geographic locations were obtained from the GenBank database; Cuban sequences were obtained in the current work. All sequences were analysed by Bayesian phylogeographic analysis, implemented in the Bayesian Evolutionary Analysis Sampling Trees (BEAST), Bayesian Tip-association Significance testing (BaTS) and Spatial Phylogenetic Reconstruction of Evolutionary Dynamics (SPREAD) software packages. Selection pressure on the HVR-VP2 was also assessed. The phylogeographic association-trait analysis showed that viruses sampled from individual countries tend to cluster together, suggesting a geographic pattern for IBDV strains. Spatial analysis from this study revealed that strains carrying sequences that were linked to increased virulence of IBDV appeared in Iran in 1981 and spread to Western Europe (Belgium) in 1987, Africa (Egypt) around 1990, East Asia (China and Japan) in 1993, the Caribbean Region (Cuba) by 1995 and South America (Brazil) around 2000. Selection pressure analysis showed that several codons in the HVR-VP2 region were under purifying selection. Conclusions/Significance. To our knowledge, this work is the first study applying the Bayesian phylogeographic reconstruction approach to analyse the emergence and spread of vvIBDV strains worldwide. PMID:23805195

  17. Spatiotemporal Phylogenetic Analysis and Molecular Characterisation of Infectious Bursal Disease Viruses Based on the VP2 Hyper-Variable Region.

    PubMed

    Alfonso-Morales, Abdulahi; Martínez-Pérez, Orlando; Dolz, Roser; Valle, Rosa; Perera, Carmen L; Bertran, Kateri; Frías, Maria T; Majó, Natàlia; Ganges, Llilianne; Pérez, Lester J

    2013-01-01

    Infectious bursal disease is a highly contagious and acute viral disease caused by the infectious bursal disease virus (IBDV); it affects all major poultry producing areas of the world. The current study was designed to rigorously measure the global phylogeographic dynamics of IBDV strains to gain insight into viral population expansion as well as the emergence, spread and pattern of the geographical structure of very virulent IBDV (vvIBDV) strains. Sequences of the hyper-variable region of the VP2 (HVR-VP2) gene from IBDV strains isolated from diverse geographic locations were obtained from the GenBank database; Cuban sequences were obtained in the current work. All sequences were analysed by Bayesian phylogeographic analysis, implemented in the Bayesian Evolutionary Analysis Sampling Trees (BEAST), Bayesian Tip-association Significance testing (BaTS) and Spatial Phylogenetic Reconstruction of Evolutionary Dynamics (SPREAD) software packages. Selection pressure on the HVR-VP2 was also assessed. The phylogeographic association-trait analysis showed that viruses sampled from individual countries tend to cluster together, suggesting a geographic pattern for IBDV strains. Spatial analysis from this study revealed that strains carrying sequences that were linked to increased virulence of IBDV appeared in Iran in 1981 and spread to Western Europe (Belgium) in 1987, Africa (Egypt) around 1990, East Asia (China and Japan) in 1993, the Caribbean Region (Cuba) by 1995 and South America (Brazil) around 2000. Selection pressure analysis showed that several codons in the HVR-VP2 region were under purifying selection. To our knowledge, this work is the first study applying the Bayesian phylogeographic reconstruction approach to analyse the emergence and spread of vvIBDV strains worldwide.

  18. Assessing Genetic Structure in Common but Ecologically Distinct Carnivores: The Stone Marten and Red Fox.

    PubMed

    Basto, Mafalda P; Santos-Reis, Margarida; Simões, Luciana; Grilo, Clara; Cardoso, Luís; Cortes, Helder; Bruford, Michael W; Fernandes, Carlos

    2016-01-01

    The identification of populations and spatial genetic patterns is important for ecological and conservation research, and spatially explicit individual-based methods have been recognised as powerful tools in this context. Mammalian carnivores are intrinsically vulnerable to habitat fragmentation but not much is known about the genetic consequences of fragmentation in common species. Stone martens (Martes foina) and red foxes (Vulpes vulpes) share a widespread Palearctic distribution and are considered habitat generalists, but in the Iberian Peninsula stone martens tend to occur in higher quality habitats. We compared their genetic structure in Portugal to see if they are consistent with their differences in ecological plasticity, and also to illustrate an approach to explicitly delineate the spatial boundaries of consistently identified genetic units. We analysed microsatellite data using spatial Bayesian clustering methods (implemented in the software BAPS, GENELAND and TESS), a progressive partitioning approach and a multivariate technique (Spatial Principal Components Analysis-sPCA). Three consensus Bayesian clusters were identified for the stone marten. No consensus was achieved for the red fox, but one cluster was the most probable clustering solution. Progressive partitioning and sPCA suggested additional clusters in the stone marten but they were not consistent among methods and were geographically incoherent. The contrasting results between the two species are consistent with the literature reporting stricter ecological requirements of the stone marten in the Iberian Peninsula. The observed genetic structure in the stone marten may have been influenced by landscape features, particularly rivers, and fragmentation. We suggest that an approach based on a consensus clustering solution of multiple different algorithms may provide an objective and effective means to delineate potential boundaries of inferred subpopulations. 
sPCA and progressive partitioning offer further verification of possible population structure and may be useful for revealing cryptic spatial genetic patterns worth further investigation.

  19. Genome-wide SNP discovery and population structure analysis in pepper (Capsicum annuum) using genotyping by sequencing.

    PubMed

    Taranto, F; D'Agostino, N; Greco, B; Cardi, T; Tripodi, P

    2016-11-21

    Knowledge of population structure and genetic diversity in vegetable crops is essential for association mapping studies and genomic selection. Genotyping by sequencing (GBS) represents an innovative method for large-scale SNP detection and genotyping of genetic resources. Herein we used the GBS approach for the genome-wide identification of SNPs in a collection of Capsicum spp. accessions and for the assessment of the level of genetic diversity in a subset of 222 cultivated pepper (Capsicum annuum) genotypes. GBS analysis generated a total of 7,568,894 master tags, of which 43.4% uniquely aligned to the reference genome CM334. A total of 108,591 SNP markers were identified, of which 105,184 were in C. annuum accessions. In order to explore the genetic diversity of C. annuum and to select a minimal core set representing most of the total genetic variation with minimum redundancy, a subset of 222 C. annuum accessions was analysed using 32,950 high-quality SNPs. Based on Bayesian and hierarchical clustering, it was possible to divide the collection into three clusters. Cluster I had the majority of varieties and landraces, mainly from Southern and Northern Italy and from Eastern Europe, whereas clusters II and III comprised accessions of different geographical origins. Considering the genome-wide genetic variation among the accessions included in cluster I, a second round of Bayesian (K = 3) and hierarchical (K = 2) clustering was performed. These analyses showed that genotypes were grouped not only by geographical origin but also by fruit-related features. GBS data have proven useful for assessing the genetic diversity in a collection of C. annuum accessions. The high number of SNP markers, uniformly distributed on the 12 chromosomes, allowed the accessions to be distinguished according to geographical origin and fruit-related features.
SNP markers and information on population structure developed in this study will undoubtedly support genome-wide association mapping studies and marker-assisted selection programs.

  20. Assessing Genetic Structure in Common but Ecologically Distinct Carnivores: The Stone Marten and Red Fox

    PubMed Central

    Basto, Mafalda P.; Santos-Reis, Margarida; Simões, Luciana; Grilo, Clara; Cardoso, Luís; Cortes, Helder; Bruford, Michael W.; Fernandes, Carlos

    2016-01-01

    The identification of populations and spatial genetic patterns is important for ecological and conservation research, and spatially explicit individual-based methods have been recognised as powerful tools in this context. Mammalian carnivores are intrinsically vulnerable to habitat fragmentation but not much is known about the genetic consequences of fragmentation in common species. Stone martens (Martes foina) and red foxes (Vulpes vulpes) share a widespread Palearctic distribution and are considered habitat generalists, but in the Iberian Peninsula stone martens tend to occur in higher quality habitats. We compared their genetic structure in Portugal to see if they are consistent with their differences in ecological plasticity, and also to illustrate an approach to explicitly delineate the spatial boundaries of consistently identified genetic units. We analysed microsatellite data using spatial Bayesian clustering methods (implemented in the software BAPS, GENELAND and TESS), a progressive partitioning approach and a multivariate technique (Spatial Principal Components Analysis-sPCA). Three consensus Bayesian clusters were identified for the stone marten. No consensus was achieved for the red fox, but one cluster was the most probable clustering solution. Progressive partitioning and sPCA suggested additional clusters in the stone marten but they were not consistent among methods and were geographically incoherent. The contrasting results between the two species are consistent with the literature reporting stricter ecological requirements of the stone marten in the Iberian Peninsula. The observed genetic structure in the stone marten may have been influenced by landscape features, particularly rivers, and fragmentation. We suggest that an approach based on a consensus clustering solution of multiple different algorithms may provide an objective and effective means to delineate potential boundaries of inferred subpopulations. 
sPCA and progressive partitioning offer further verification of possible population structure and may be useful for revealing cryptic spatial genetic patterns worth further investigation. PMID:26727497

  1. False Discovery Control in Large-Scale Spatial Multiple Testing

    PubMed Central

    Sun, Wenguang; Reich, Brian J.; Cai, T. Tony; Guindani, Michele; Schwartzman, Armin

    2014-01-01

    Summary: This article develops a unified theoretical and computational framework for false discovery control in multiple testing of spatial signals. We consider both point-wise and cluster-wise spatial analyses, and derive oracle procedures which optimally control the false discovery rate, false discovery exceedance and false cluster rate, respectively. A data-driven finite approximation strategy is developed to mimic the oracle procedures on a continuous spatial domain. Our multiple testing procedures are asymptotically valid and can be effectively implemented using Bayesian computational algorithms for analysis of large spatial data sets. Numerical results show that the proposed procedures lead to more accurate error control and better power performance than conventional methods. We demonstrate our methods by analyzing time trends in tropospheric ozone in the eastern US. PMID:25642138
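A minimal, non-spatial sketch of point-wise false discovery control by thresholding local false discovery rates (lfdr), the kind of quantity the oracle procedures above build on. It assumes each location already has an lfdr value (the posterior probability that its null hypothesis is true); it rejects locations in increasing order of lfdr for as long as the running average lfdr, which estimates the FDR of the rejected set, stays at or below alpha. The function name `lfdr_step_up` is hypothetical, and the sketch does not cover the paper's continuous-domain approximation, FDX control or cluster-wise procedures.

```python
from typing import List


def lfdr_step_up(lfdr: List[float], alpha: float) -> List[int]:
    """Return indices of rejected hypotheses, given per-location lfdr values."""
    # Visit locations from most to least significant (smallest lfdr first).
    order = sorted(range(len(lfdr)), key=lambda i: lfdr[i])
    running_sum = 0.0
    k = 0
    for rank, idx in enumerate(order, start=1):
        running_sum += lfdr[idx]
        # The mean lfdr of the first `rank` locations estimates the FDR of that set.
        if running_sum / rank <= alpha:
            k = rank
    return sorted(order[:k])


print(lfdr_step_up([0.01, 0.02, 0.5, 0.03, 0.9], alpha=0.05))
```

Because the lfdr values are visited in ascending order, the running mean never decreases, so the largest admissible k is simply the last rank before the mean first exceeds alpha.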

  2. Determining open cluster membership. A Bayesian framework for quantitative member classification

    NASA Astrophysics Data System (ADS)

    Stott, Jonathan J.

    2018-01-01

    Aims: My goal is to develop a quantitative algorithm for assessing open cluster membership probabilities. The algorithm is designed to work with single-epoch observations. In its simplest form, only one set of program images and one set of reference images are required. Methods: The algorithm is based on a two-stage joint astrometric and photometric assessment of cluster membership probabilities. The probabilities were computed within a Bayesian framework using any available prior information. Where possible, the algorithm emphasizes simplicity over mathematical sophistication. Results: The algorithm was implemented and tested against three observational fields using published survey data. M 67 and NGC 654 were selected as cluster examples while a third, cluster-free, field was used for the final test data set. The algorithm shows good quantitative agreement with the existing surveys and has a false-positive rate significantly lower than the astrometric or photometric methods used individually.
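As a rough illustration of combining astrometric and photometric evidence within a Bayesian framework, the posterior membership probability can be written as prior × cluster likelihood, normalised against the field alternative. This sketch assumes the two stages contribute independent likelihood factors and that the cluster and field likelihoods have already been evaluated for each star; the function name is hypothetical and this is not the author's actual two-stage algorithm.

```python
import math
from typing import Sequence


def membership_probability(prior: float,
                           cluster_likelihoods: Sequence[float],
                           field_likelihoods: Sequence[float]) -> float:
    """Posterior probability that a star belongs to the cluster (hypothetical helper).

    Each sequence holds one likelihood per stage (e.g. astrometric, photometric),
    assumed independent so that they multiply.
    """
    lc = math.prod(cluster_likelihoods)
    lf = math.prod(field_likelihoods)
    numerator = prior * lc
    denominator = numerator + (1.0 - prior) * lf
    return numerator / denominator if denominator > 0 else 0.0


# A star twice as likely under the cluster model astrometrically and three times photometrically:
print(membership_probability(0.5, [2.0, 3.0], [1.0, 1.0]))
```

The prior is where "any available prior information" would enter, for example the expected fraction of cluster stars in the field of view.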

  3. Buried landmine detection using multivariate normal clustering

    NASA Astrophysics Data System (ADS)

    Duston, Brian M.

    2001-10-01

    A Bayesian classification algorithm is presented for discriminating buried land mines from buried and surface clutter in Ground Penetrating Radar (GPR) signals. This algorithm is based on multivariate normal (MVN) clustering, where feature vectors are used to identify populations (clusters) of mines and clutter objects. The features are extracted from two-dimensional images created from ground penetrating radar scans. MVN clustering is used to determine the number of clusters in the data and to create probability density models for target and clutter populations, producing the MVN clustering classifier (MVNCC). The Bayesian Information Criterion (BIC) is used to evaluate each model to determine the number of clusters in the data. An extension of the MVNCC allows the model to adapt to local clutter distributions by treating each of the MVN cluster components as a Poisson process and adaptively estimating the intensity parameters. The algorithm is developed using data collected by the Mine Hunter/Killer Close-In Detector (MH/K CID) at prepared mine lanes. The Mine Hunter/Killer is a prototype mine detecting and neutralizing vehicle developed for the U.S. Army to clear roads of anti-tank mines.
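The BIC step described above can be sketched as follows: for each candidate number of clusters K, a fitted MVN mixture with full covariances has (K − 1) mixing weights, K mean vectors and K covariance matrices, and BIC trades the maximised log-likelihood against that parameter count. This sketch assumes the mixtures have already been fitted and only their log-likelihoods are supplied; the function names are hypothetical. It uses the "lower is better" sign convention (−2 log L + p ln n); some software reports the negated value and maximises it instead.

```python
import math
from typing import Dict


def mvn_mixture_param_count(k: int, d: int) -> int:
    """Free parameters of a K-component, d-dimensional full-covariance MVN mixture."""
    weights = k - 1                   # mixing proportions sum to one
    means = k * d                     # one mean vector per component
    covariances = k * d * (d + 1) // 2  # symmetric d x d matrix per component
    return weights + means + covariances


def bic(log_likelihood: float, n: int, k: int, d: int) -> float:
    """BIC with the 'lower is better' convention."""
    return -2.0 * log_likelihood + mvn_mixture_param_count(k, d) * math.log(n)


def choose_num_clusters(log_likelihoods: Dict[int, float], n: int, d: int) -> int:
    """Pick the K whose fitted mixture has the smallest BIC."""
    return min(log_likelihoods, key=lambda k: bic(log_likelihoods[k], n, k, d))


# Hypothetical maximised log-likelihoods for K = 1, 2, 3 on n = 100 two-dimensional features.
print(choose_num_clusters({1: -500.0, 2: -420.0, 3: -415.0}, n=100, d=2))
```

Here the third component improves the log-likelihood by only 5, which does not pay for its 6 extra parameters, so K = 2 is selected.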

  4. MC²: A Deeper Look at ZwCl 2341.1+0000 with Bayesian Galaxy Clustering and Weak Lensing Analyses

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Benson, B.; Wittman, D. M.; Golovich, N.

    ZwCl 2341.1+0000, a merging galaxy cluster with disturbed X-ray morphology and widely separated (~3 Mpc) double radio relics, was thought to be an extremely massive (10-30 × 10¹⁴ M⊙) and complex system with little known about its merger history. We present JVLA 2-4 GHz observations of the cluster, along with new spectroscopy from our Keck/DEIMOS survey, and apply Gaussian Mixture Modeling to the three-dimensional distribution of 227 confirmed cluster galaxies. After adopting the Bayesian Information Criterion to avoid overfitting, which we discover can bias total dynamical mass estimates high, we find that a three-substructure model with a total dynamical mass estimate of 9.39 ± 0.81 × 10¹⁴ M⊙ is favored. We also present deep Subaru imaging and perform the first weak lensing analysis on this system, obtaining a weak lensing mass estimate of 5.57 ± 2.47 × 10¹⁴ M⊙. This is a more robust estimate because it does not depend on the dynamical state of the system, which is disturbed due to the merger. Our results indicate that ZwCl 2341.1+0000 is a multiple merger system composed of at least three substructures, with the main merger that produced the radio relics occurring near the plane of the sky, and a younger merger in the North occurring closer to the line of sight. Dynamical modeling of the main merger reproduces observed quantities (relic positions and polarizations, subcluster separation and radial velocity difference) if the merger axis angle is ~10 (+34/-6) degrees and the collision speed at pericenter is ~1900 (+300/-200) km/s.

  5. MC²: A Deeper Look at ZwCl 2341.1+0000 with Bayesian Galaxy Clustering and Weak Lensing Analyses

    DOE PAGES

    Benson, B.; Wittman, D. M.; Golovich, N.; ...

    2017-05-16

    ZwCl 2341.1+0000, a merging galaxy cluster with disturbed X-ray morphology and widely separated (~3 Mpc) double radio relics, was thought to be an extremely massive (10-30 × 10¹⁴ M⊙) and complex system with little known about its merger history. We present JVLA 2-4 GHz observations of the cluster, along with new spectroscopy from our Keck/DEIMOS survey, and apply Gaussian Mixture Modeling to the three-dimensional distribution of 227 confirmed cluster galaxies. After adopting the Bayesian Information Criterion to avoid overfitting, which we discover can bias total dynamical mass estimates high, we find that a three-substructure model with a total dynamical mass estimate of 9.39 ± 0.81 × 10¹⁴ M⊙ is favored. We also present deep Subaru imaging and perform the first weak lensing analysis on this system, obtaining a weak lensing mass estimate of 5.57 ± 2.47 × 10¹⁴ M⊙. This is a more robust estimate because it does not depend on the dynamical state of the system, which is disturbed due to the merger. Our results indicate that ZwCl 2341.1+0000 is a multiple merger system composed of at least three substructures, with the main merger that produced the radio relics occurring near the plane of the sky, and a younger merger in the North occurring closer to the line of sight. Dynamical modeling of the main merger reproduces observed quantities (relic positions and polarizations, subcluster separation and radial velocity difference) if the merger axis angle is ~10 (+34/-6) degrees and the collision speed at pericenter is ~1900 (+300/-200) km/s.

  6. New Mycobacterium tuberculosis LAM sublineage with geographical specificity for the Old World revealed by phylogenetical and Bayesian analyses.

    PubMed

    Reynaud, Yann; Rastogi, Nalin

    2016-12-01

    We recently showed that the Mycobacterium tuberculosis sublineage LAM9 could be subdivided into two distinct subpopulations, each reflecting its unique biogeographical structure and evolutionary history. We subsequently attempted to verify whether this genetic structure could be traced in an enlarged global sample. For this purpose, we analyzed global evolutionary relationships of LAM strains in a large dataset (n = 1923 isolates from 35 countries worldwide) with concomitant spoligotyping and MIRU-VNTR data, followed by a deeper analysis of the LAM9 sublineage (n = 851 isolates). Based on a combination of phylogenetical analysis and Bayesian statistics, a total of three different clusters, tentatively named LAM9C1, C2 and C3, were described in this dataset. Closer inspection of the phylogenetic tree, together with data on the origin of the isolates in each genetic cluster, revealed that LAM9C3 was the most tightly knit group and was found exclusively in the Old World, whereas LAM9C2 was a loosely knit group without any phylogeographical specificity; LAM9C1 had a majority of well-clustered strains, although some isolates intermixed with unrelated LAM clusters. Accordingly, we describe a new M. tuberculosis LAM sublineage, named LAM9C3, with phylogeographical specificity for the Old World. These findings open new perspectives for studying the migration histories and adaptation to human hosts of specific M. tuberculosis clones during the exploration and conquest of the New World. We therefore plan to reevaluate the nomenclature and evolutionary history of various LAM sublineages using Whole Genome Sequencing (WGS). Copyright © 2016 Elsevier Ltd. All rights reserved.

  7. Analysis of genetic population structure in Acacia caven (Leguminosae, Mimosoideae), comparing one exploratory and two Bayesian-model-based methods.

    PubMed

    Pometti, Carolina L; Bessega, Cecilia F; Saidman, Beatriz O; Vilardi, Juan C

    2014-03-01

    Bayesian clustering as implemented in STRUCTURE or GENELAND software is widely used to form genetic groups of populations or individuals. On the other hand, in order to satisfy the need for less computer-intensive approaches, multivariate analyses are specifically devoted to extracting information from large datasets. In this paper, we report the use of a dataset of AFLP markers belonging to 15 sampling sites of Acacia caven for studying the genetic structure and comparing the consistency of three methods: STRUCTURE, GENELAND and DAPC. Of these methods, DAPC was the fastest and accurately inferred the number of populations K (K = 12 using the find.clusters option and K = 15 with a priori information on populations). GENELAND, in turn, provides information on the spatial distribution of membership probabilities for individuals or populations when coordinates are specified (K = 12). STRUCTURE also inferred the number of populations K and the membership probabilities of individuals based on ancestry, giving K = 11 without prior information on populations and K = 15 using the LOCPRIOR option. Finally, all three methods showed high consistency in estimating the population structure, inferring similar numbers of populations and similar membership probabilities of individuals to each group, with high correlations among them.

  8. Population Structure of Sclerotinia subarctica and Sclerotinia sclerotiorum in England, Scotland and Norway

    PubMed Central

    Clarkson, John P.; Warmington, Rachel J.; Walley, Peter G.; Denton-Giles, Matthew; Barbetti, Martin J.; Brodal, Guro; Nordskog, Berit

    2017-01-01

    Sclerotinia species are important fungal pathogens of a wide range of crops and wild host plants. While the biology and population structure of Sclerotinia sclerotiorum has been well-studied, little information is available for the related species S. subarctica. In this study, Sclerotinia isolates were collected from different crop plants and the wild host Ranunculus ficaria (meadow buttercup) in England, Scotland, and Norway to determine the incidence of Sclerotinia subarctica and examine the population structure of this pathogen for the first time. Incidence was very low in England, comprising only 4.3% of isolates, while moderate and high incidence of S. subarctica was identified in Scotland and Norway, comprising 18.3 and 48.0% of isolates, respectively. Characterization with eight microsatellite markers identified 75 haplotypes within a total of 157 isolates over the three countries, with a few haplotypes in Scotland and Norway sampled at a higher frequency than the rest across multiple locations and host plants. In total, eight microsatellite haplotypes were shared between Scotland and Norway, while none were shared with England. Bayesian and principal component analyses revealed common ancestry and clustering of Scottish and Norwegian S. subarctica isolates, while English isolates were assigned to a separate population cluster and exhibited low diversity indicative of isolation. Population structure was also examined for S. sclerotiorum isolates from England, Scotland, Norway, and Australia using microsatellite data, including some from a previous study in England. In total, 484 haplotypes were identified within 800 S. sclerotiorum isolates, with just 15 shared between England and Scotland and none shared between any other countries. Bayesian and principal component analyses revealed a common ancestry and clustering of the English and Scottish isolates, while Norwegian and Australian isolates were assigned to separate clusters.
Furthermore, sequencing part of the intergenic spacer (IGS) region of the rRNA gene resulted in 26 IGS haplotypes within 870 S. sclerotiorum isolates, nine of which had not been previously identified and two of which were also widely distributed across different countries. S. subarctica therefore has a multiclonal population structure similar to S. sclerotiorum, but has a different ancestry and distribution across England, Scotland, and Norway. PMID:28421039

  9. Matched Filter Stochastic Background Characterization for Hyperspectral Target Detection

    DTIC Science & Technology

    2005-09-30

    ... Pre-Clustering MVN Test; 4.2.3 Pre-Clustering Detection Results; 4.2.4 Pre-Clustering Target Influence; 4.2.5 Statistical Distance Exclusion and Low Contrast ... Figure 2.7: ROC Curve Comparison of RX, K-Means, and Bayesian Pre-Clustering Applied to Anomaly Detection [Ashton, 1998] ...

  10. Manual hierarchical clustering of regional geochemical data using a Bayesian finite mixture model

    USGS Publications Warehouse

    Ellefsen, Karl J.; Smith, David

    2016-01-01

    Interpretation of regional scale, multivariate geochemical data is aided by a statistical technique called “clustering.” We investigate a particular clustering procedure by applying it to geochemical data collected in the State of Colorado, United States of America. The clustering procedure partitions the field samples for the entire survey area into two clusters. The field samples in each cluster are partitioned again to create two subclusters, and so on. This manual procedure generates a hierarchy of clusters, and the different levels of the hierarchy show geochemical and geological processes occurring at different spatial scales. Although there are many different clustering methods, we use Bayesian finite mixture modeling with two probability distributions, which yields two clusters. The model parameters are estimated with Hamiltonian Monte Carlo sampling of the posterior probability density function, which usually has multiple modes. Each mode has its own set of model parameters; each set is checked to ensure that it is consistent both with the data and with independent geologic knowledge. The set of model parameters that is most consistent with the independent geologic knowledge is selected for detailed interpretation and partitioning of the field samples.
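The manual hierarchy described above (split everything into two clusters, then split each cluster again) can be sketched as a recursive bisection. To keep the sketch short and self-contained it makes several assumptions that differ from the study: the data are one-dimensional, and each two-way split uses maximum-likelihood EM for a two-component Gaussian mixture rather than the authors' Bayesian finite mixture sampled with Hamiltonian Monte Carlo, so there is no posterior mode checking against geologic knowledge. `split_in_two` and `partition` are hypothetical names.

```python
import math
from typing import List


def _normal_pdf(x: float, mu: float, sd: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def split_in_two(xs: List[float], iters: int = 200) -> List[int]:
    """Label each value 0 or 1 with EM for a 1-D two-component Gaussian mixture."""
    lo, hi = min(xs), max(xs)
    mu = [lo, hi]                              # start the components at the extremes
    sd = [max((hi - lo) / 4.0, 1e-6)] * 2
    w = [0.5, 0.5]
    resp = [0.0] * len(xs)                     # responsibility of component 1
    for _ in range(iters):
        # E-step
        for i, x in enumerate(xs):
            p0 = w[0] * _normal_pdf(x, mu[0], sd[0])
            p1 = w[1] * _normal_pdf(x, mu[1], sd[1])
            resp[i] = p1 / (p0 + p1) if p0 + p1 > 0 else float(x > (mu[0] + mu[1]) / 2)
        n1 = sum(resp)
        n0 = len(xs) - n1
        if n0 < 1e-9 or n1 < 1e-9:
            break                              # one component has collapsed
        # M-step: weighted means, standard deviations and mixing weights
        mu = [sum((1 - r) * x for r, x in zip(resp, xs)) / n0,
              sum(r * x for r, x in zip(resp, xs)) / n1]
        sd = [max(math.sqrt(sum((1 - r) * (x - mu[0]) ** 2 for r, x in zip(resp, xs)) / n0), 1e-6),
              max(math.sqrt(sum(r * (x - mu[1]) ** 2 for r, x in zip(resp, xs)) / n1), 1e-6)]
        w = [n0 / len(xs), n1 / len(xs)]
    return [1 if r > 0.5 else 0 for r in resp]


def partition(xs: List[float], depth: int) -> List[List[float]]:
    """Recursively split each cluster in two, down to the requested hierarchy depth."""
    if depth == 0 or len(xs) < 4:
        return [xs]
    labels = split_in_two(xs)
    left = [x for x, lab in zip(xs, labels) if lab == 0]
    right = [x for x, lab in zip(xs, labels) if lab == 1]
    if not left or not right:
        return [xs]
    return partition(left, depth - 1) + partition(right, depth - 1)


data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0, 10.1, 10.2, 15.0, 15.1, 15.2]
print(partition(data, depth=2))
```

Each level of recursion corresponds to one level of the hierarchy, so the coarse first split and the finer second split can be inspected separately, mirroring the idea that different levels reflect processes at different spatial scales.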

  11. Assessing population genetic structure via the maximisation of genetic distance

    PubMed Central

    2009-01-01

    Background: The inference of the hidden structure of a population is an essential issue in population genetics, and several methods have recently been proposed to infer it. Methods: In this study, a new method (MGD, maximisation of genetic distance) to infer the number of clusters and to assign individuals to the inferred populations is proposed. This approach makes no assumption about Hardy-Weinberg or linkage equilibrium. The implemented criterion is the maximisation (via a simulated annealing algorithm) of the averaged genetic distance between a predefined number of clusters. The performance of this method is compared with two Bayesian approaches, STRUCTURE and BAPS, using simulated data and also a real human data set. Results: The simulations show that with a reduced number of markers, BAPS overestimates the number of clusters and presents a reduced proportion of correct groupings. The accuracy of the new method is approximately the same as for STRUCTURE. Also, in cases of Hardy-Weinberg and linkage disequilibrium, BAPS performs incorrectly. In these situations, STRUCTURE and the new method show equivalent behaviour with respect to the number of inferred clusters, although the proportion of correct groupings is slightly better with the new method. Re-establishing equilibrium with the randomisation procedures improves the precision of the Bayesian approaches. All methods have good precision for FST ≥ 0.03, but only STRUCTURE estimates the correct number of clusters for FST as low as 0.01. In situations with a high number of clusters or a more complex population structure, MGD performs better than STRUCTURE and BAPS. The results for a human data set analysed with the new method are congruent with the geographical regions previously found.
Conclusion: This new method to infer the hidden structure in a population, based on the maximisation of the genetic distance and making no assumption about Hardy-Weinberg or linkage equilibrium, performs well under different simulated scenarios and with real data. Therefore, it could be a useful tool to determine genetically homogeneous groups, especially in situations where the number of clusters is high, the population structure is complex, and Hardy-Weinberg and/or linkage disequilibrium are present. PMID:19900278
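A minimal sketch of the criterion and search described above: individuals are assigned to a fixed number of clusters, and simulated annealing moves one individual at a time to maximise the average distance between clusters. As stated assumptions, "genetic distance" is replaced by the Euclidean distance between cluster mean vectors (for example, mean allele-count vectors), empty clusters are forbidden, and the cooling schedule, step count and function names are hypothetical choices, not the published MGD implementation.

```python
import math
import random
from typing import List, Sequence


def centroid(rows: List[Sequence[float]]) -> List[float]:
    return [sum(col) / len(rows) for col in zip(*rows)]


def mean_between_cluster_distance(data: List[Sequence[float]], labels: List[int], k: int) -> float:
    """Average pairwise distance between cluster means (stand-in for genetic distance)."""
    groups = [[row for row, lab in zip(data, labels) if lab == c] for c in range(k)]
    if any(not g for g in groups):
        return -math.inf                      # empty clusters are not allowed
    cents = [centroid(g) for g in groups]
    pairs = [(a, b) for a in range(k) for b in range(a + 1, k)]
    return sum(math.dist(cents[a], cents[b]) for a, b in pairs) / len(pairs)


def anneal_clusters(data: List[Sequence[float]], k: int, steps: int = 5000,
                    t0: float = 1.0, cooling: float = 0.999, seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    labels = [i % k for i in range(len(data))]  # round-robin start, no empty cluster
    score = mean_between_cluster_distance(data, labels, k)
    best_labels, best_score = labels[:], score
    t = t0
    for _ in range(steps):
        i = rng.randrange(len(data))
        new_label = rng.randrange(k)
        if new_label != labels[i]:
            cand = labels[:]
            cand[i] = new_label
            cand_score = mean_between_cluster_distance(data, cand, k)
            # Always accept improvements; accept worse moves with Boltzmann probability.
            if cand_score >= score or rng.random() < math.exp((cand_score - score) / t):
                labels, score = cand, cand_score
                if score > best_score:
                    best_labels, best_score = labels[:], score
        t *= cooling                           # geometric cooling schedule
    return best_labels


print(anneal_clusters([(0.0,), (0.2,), (0.1,), (9.9,), (10.0,), (10.1,)], k=2, seed=1))
```

Keeping the best assignment seen, rather than the final one, guards against the walk drifting away from a good partition late in the schedule. Note that, like the method described above, this criterion uses no Hardy-Weinberg or linkage equilibrium model.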

  12. Spatial cluster detection using dynamic programming.

    PubMed

    Sverchkov, Yuriy; Jiang, Xia; Cooper, Gregory F

    2012-03-25

    The task of spatial cluster detection involves finding spatial regions where some property deviates from the norm or the expected value. In a probabilistic setting this task can be expressed as finding a region where some event is significantly more likely than usual. Spatial cluster detection is of interest in fields such as biosurveillance, mining of astronomical data, military surveillance, and analysis of fMRI images. In almost all such applications, we want to know both whether a cluster exists in the data and, if it does, its most accurate characterization. We present a general dynamic programming algorithm for grid-based spatial cluster detection. The algorithm can be used both for Bayesian maximum a posteriori (MAP) estimation of the most likely spatial distribution of clusters and for Bayesian model averaging over a large space of spatial cluster distributions to compute the posterior probability of an unusual spatial clustering. The algorithm is explained and evaluated in the context of a biosurveillance application, specifically the detection and identification of influenza outbreaks based on emergency department visits. A relatively simple underlying model is constructed for the purpose of evaluating the algorithm, and the algorithm is evaluated using the model and semi-synthetic test data. When compared to baseline methods, tests indicate that the new algorithm can improve MAP estimates under certain conditions: the greedy algorithm used for comparison was more sensitive to smaller outbreaks, but as outbreak size increases (in terms of area affected and proportion of individuals affected), our method overtakes the greedy algorithm in spatial precision and recall. The new algorithm performs on par with baseline methods in the task of Bayesian model averaging.
We conclude that the dynamic programming algorithm performs on par with other available methods for spatial cluster detection, and we point to its low computational cost and extensibility as advantages in favor of further research and use of the algorithm.

  13. Spatial cluster detection using dynamic programming

    PubMed Central

    2012-01-01

    Background: The task of spatial cluster detection involves finding spatial regions where some property deviates from the norm or the expected value. In a probabilistic setting this task can be expressed as finding a region where some event is significantly more likely than usual. Spatial cluster detection is of interest in fields such as biosurveillance, mining of astronomical data, military surveillance, and analysis of fMRI images. In almost all such applications, we want to know both whether a cluster exists in the data and, if it does, its most accurate characterization. Methods: We present a general dynamic programming algorithm for grid-based spatial cluster detection. The algorithm can be used both for Bayesian maximum a posteriori (MAP) estimation of the most likely spatial distribution of clusters and for Bayesian model averaging over a large space of spatial cluster distributions to compute the posterior probability of an unusual spatial clustering. The algorithm is explained and evaluated in the context of a biosurveillance application, specifically the detection and identification of influenza outbreaks based on emergency department visits. A relatively simple underlying model is constructed for the purpose of evaluating the algorithm, and the algorithm is evaluated using the model and semi-synthetic test data. Results: When compared to baseline methods, tests indicate that the new algorithm can improve MAP estimates under certain conditions: the greedy algorithm used for comparison was more sensitive to smaller outbreaks, but as outbreak size increases (in terms of area affected and proportion of individuals affected), our method overtakes the greedy algorithm in spatial precision and recall. The new algorithm performs on par with baseline methods in the task of Bayesian model averaging.
Conclusions: We conclude that the dynamic programming algorithm performs on par with other available methods for spatial cluster detection, and we point to its low computational cost and extensibility as advantages in favor of further research and use of the algorithm. PMID:22443103

  14. Conformational Transition Pathways of Epidermal Growth Factor Receptor Kinase Domain from Multiple Molecular Dynamics Simulations and Bayesian Clustering.

    PubMed

    Li, Yan; Li, Xiang; Ma, Weiya; Dong, Zigang

    2014-08-12

    The epidermal growth factor receptor (EGFR) is aberrantly activated in various cancer cells and is an important target for cancer treatment. A deep understanding of EGFR conformational changes between the active and inactive states is of pharmaceutical interest. Here we present a strategy combining multiple targeted molecular dynamics simulations, unbiased molecular dynamics simulations, and Bayesian clustering to investigate transition pathways during the activation/inactivation process of the EGFR kinase domain. Two distinct pathways between the active and inactive forms are designed, explored, and compared. Based on Bayesian clustering and rough two-dimensional free energy surfaces, the energy-favorable pathway is recognized, though DFG-flip happens in both pathways. In addition, another pathway with different intermediate states appears in our simulations. Comparison of distinct pathways also indicates that disruption of the Lys745-Glu762 interaction is critically important in DFG-flip, while movement of the A-loop significantly facilitates the conformational change. Our simulations yield new insights into EGFR conformational transitions. Moreover, our results verify that this approach is valid and efficient for sampling protein conformational changes and comparing distinct pathways.

  15. Population Structure of Two Rabies Hosts Relative to the Known Distribution of Rabies Virus Variants in Alaska

    PubMed Central

    Goldsmith, Elizabeth W.; Renshaw, Benjamin; Clement, Christopher J.; Himschoot, Elizabeth A.; Hundertmark, Kris J.; Hueffer, Karsten

    2015-01-01

    For pathogens that infect multiple species, the distinction between reservoir hosts and spillover hosts is often difficult. In Alaska, three variants of the arctic rabies virus exist with distinct spatial distributions. We test the hypothesis that rabies virus variant distribution corresponds to the population structure of the primary rabies hosts in Alaska, arctic foxes (Vulpes lagopus) and red foxes (V. vulpes), in order to possibly distinguish reservoir and spillover hosts. We used mitochondrial DNA (mtDNA) sequence and nine microsatellites to assess population structure in those two species. mtDNA structure did not correspond to rabies virus variant structure in either species. Microsatellite analyses gave varying results. Bayesian clustering found 2 groups of arctic foxes in the coastal tundra region, but for red foxes it identified tundra and boreal types. Spatial Bayesian clustering and spatial principal components analysis identified 3 and 4 groups of arctic foxes, respectively, closely matching the distribution of rabies virus variants in the state. Red foxes, conversely, showed eight clusters comprising 2 regions (boreal and tundra) with much admixture. These results run contrary to previous beliefs that arctic foxes show no fine-scale spatial population structure. While we cannot rule out that the red fox is part of the maintenance host community for rabies in Alaska, the distribution of virus variants appears to be driven primarily by the arctic fox. We therefore show that host population genetics can be utilized to distinguish between maintenance and spillover hosts when used in conjunction with other approaches. PMID:26661691

  16. A two step Bayesian approach for genomic prediction of breeding values.

    PubMed

    Shariati, Mohammad M; Sørensen, Peter; Janss, Luc

    2012-05-21

    In genomic models that assign an individual variance to each marker, the contribution of one marker to the posterior distribution of the marker variance is only one degree of freedom (df), which introduces many variance parameters with only little information per variance parameter. A better alternative could be to form clusters of markers with similar effects, where markers in a cluster share a common variance. The influence of each marker group of size p on the posterior distribution of the marker variances will then be p df. The simulated data from the 15th QTL-MAS workshop were analyzed such that SNP markers were ranked based on their effects and markers with similar estimated effects were grouped together. In step 1, all markers with a minor allele frequency greater than 0.01 were included in a SNP-BLUP prediction model. In step 2, markers were ranked based on their estimated variance on the trait in step 1, and each consecutive block of 150 markers was assigned to one group with a common variance. In further analyses, subsets of the 1500 and 450 markers with the largest effects in step 2 were kept in the prediction model. Grouping markers outperformed the SNP-BLUP model in terms of accuracy of predicted breeding values. However, the accuracies of predicted breeding values were lower than those of Bayesian methods with marker-specific variances. Grouping markers is less flexible than allowing each marker to have a specific marker variance but, by grouping, the power to estimate marker variances increases. Prior knowledge of the genetic architecture of the trait is necessary for clustering markers and appropriate prior parameterization.
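    The grouping step in the abstract above (rank markers by their step-1 variance, then give each block of 150 a shared variance) can be sketched as follows. In the study the group variances are estimated within the step-2 model; pooling by the arithmetic mean of the step-1 variances is only a stand-in assumption, and the function and input names are hypothetical.

```python
def group_marker_variances(marker_variances, group_size=150):
    """Rank markers by estimated variance and pool them into fixed-size groups.

    marker_variances : dict marker_id -> variance estimated in step 1 (hypothetical input)
    group_size       : markers per group (150 in the abstract)
    Returns a dict marker_id -> common variance of its group.
    """
    # Highest-variance markers first, so similar effects end up in the same block.
    ranked = sorted(marker_variances, key=marker_variances.get, reverse=True)
    common = {}
    for start in range(0, len(ranked), group_size):
        block = ranked[start:start + group_size]
        # Assumption: the shared variance is the mean of the block's step-1 variances.
        shared = sum(marker_variances[m] for m in block) / len(block)
        for m in block:
            common[m] = shared
    return common
```

    Each block of p markers now informs a single variance parameter, which is the p-df argument made in the abstract; the last block may be smaller than `group_size`.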

  17. Population structure of two rabies hosts relative to the known distribution of rabies virus variants in Alaska.

    PubMed

    Goldsmith, Elizabeth W; Renshaw, Benjamin; Clement, Christopher J; Himschoot, Elizabeth A; Hundertmark, Kris J; Hueffer, Karsten

    2016-02-01

    For pathogens that infect multiple species, the distinction between reservoir hosts and spillover hosts is often difficult. In Alaska, three variants of the arctic rabies virus exist with distinct spatial distributions. We tested the hypothesis that rabies virus variant distribution corresponds to the population structure of the primary rabies hosts in Alaska, arctic foxes (Vulpes lagopus) and red foxes (Vulpes vulpes) to possibly distinguish reservoir and spillover hosts. We used mitochondrial DNA (mtDNA) sequence and nine microsatellites to assess population structure in those two species. mtDNA structure did not correspond to rabies virus variant structure in either species. Microsatellite analyses gave varying results. Bayesian clustering found two groups of arctic foxes in the coastal tundra region, but for red foxes it identified tundra and boreal types. Spatial Bayesian clustering and spatial principal components analysis identified 3 and 4 groups of arctic foxes, respectively, closely matching the distribution of rabies virus variants in the state. Red foxes, conversely, showed eight clusters comprising two regions (boreal and tundra) with much admixture. These results run contrary to previous beliefs that arctic fox show no fine-scale spatial population structure. While we cannot rule out that the red fox is part of the maintenance host community for rabies in Alaska, the distribution of virus variants appears to be driven primarily by the arctic fox. Therefore, we show that host population genetics can be utilized to distinguish between maintenance and spillover hosts when used in conjunction with other approaches. © 2015 John Wiley & Sons Ltd.

  18. Quasi-Likelihood Techniques in a Logistic Regression Equation for Identifying Simulium damnosum s.l. Larval Habitats Intra-cluster Covariates in Togo.

    PubMed

    Jacob, Benjamin G; Novak, Robert J; Toe, Laurent; Sanfo, Moussa S; Afriyie, Abena N; Ibrahim, Mohammed A; Griffith, Daniel A; Unnasch, Thomas R

    2012-01-01

    The standard methods for regression analyses of clustered riverine larval habitat data of Simulium damnosum s.l., a major black-fly vector of onchocerciasis, postulate models relating observational ecological-sampled parameter estimators to prolific habitats without accounting for residual intra-cluster error correlation effects. Generally, this correlation comes from two sources: (1) the design of the random effects and their assumed covariance from the multiple levels within the regression model; and (2) the correlation structure of the residuals. Unfortunately, inconspicuous errors in residual intra-cluster correlation estimates can overstate precision in forecasted S. damnosum s.l. riverine larval habitat explanatory attributes regardless of how they are treated (e.g., independent, autoregressive, Toeplitz, etc.). In this research, the geographical locations for multiple riverine-based S. damnosum s.l. larval ecosystem habitats sampled from 2 pre-established epidemiological sites in Togo were identified and recorded from July 2009 to June 2010. Initially the data were aggregated in PROC GENMOD. An agglomerative hierarchical residual cluster-based analysis was then performed. The sampled clustered study site data were then analyzed for statistical correlations using Monthly Biting Rates (MBR). Euclidean distance measurements and terrain-related geomorphological statistics were then generated in ArcGIS. A digital overlay was then performed, also in ArcGIS, using the georeferenced ground coordinates of high- and low-density clusters stratified by Annual Biting Rates (ABR). These data were overlaid onto multitemporal sub-meter pixel resolution satellite data (i.e., QuickBird 0.61 m wavebands). Orthogonal spatial filter eigenvectors were then generated in SAS/GIS. Univariate and non-linear regression-based models (i.e., logistic, Poisson and negative binomial) were also employed to determine probability distributions and to identify statistically significant parameter estimators from the sampled data. Thereafter, Durbin-Watson test statistics were used in PROC AUTOREG to test the null hypothesis that the regression residuals were not autocorrelated against the alternative that the residuals followed an autoregressive process. Bayesian uncertainty matrices were also constructed employing normal priors for each of the sampled estimators in PROC MCMC. The residuals revealed both spatially structured and unstructured error effects in the high- and low-ABR-stratified clusters. The analyses also revealed that the estimators for turbidity level and presence of rocks were statistically significant for the high-ABR-stratified clusters, while the estimators for distance between habitats and floating vegetation were important for the low-ABR-stratified cluster. Varying and constant coefficient regression models, ABR-stratified GIS-generated clusters, sub-meter resolution satellite imagery, a robust residual intra-cluster diagnostic test, MBR-based histograms, eigendecomposition spatial filter algorithms and Bayesian matrices can enable accurate autoregressive estimation of latent uncertainty effects and other residual error probabilities (i.e., heteroskedasticity) for testing correlations between georeferenced S. damnosum s.l. riverine larval habitat estimators. The asymptotic distribution of the resulting residual-adjusted intra-cluster predictor error autocovariate coefficients can thereafter be established, while estimates of the asymptotic variance can lead to the construction of approximate confidence intervals for accurately targeting productive S. damnosum s.l. habitats based on spatiotemporal field-sampled count data.
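    The Durbin-Watson statistic used above is the sum of squared differences between successive residuals divided by the residual sum of squares. The sketch below computes only the statistic itself; it does not reproduce the p-values or the autoregressive model fitting that PROC AUTOREG provides, and the function name is our own.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic d = sum((e_t - e_{t-1})^2) / sum(e_t^2).

    d near 2 suggests no first-order autocorrelation; d toward 0 suggests positive
    and d toward 4 negative autocorrelation of the regression residuals.
    """
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den
```

    For example, perfectly alternating residuals such as `[1, -1, 1, -1]` give d = 3.0, pointing toward negative autocorrelation.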

  19. Prediction of community prevalence of human onchocerciasis in the Amazonian onchocerciasis focus: Bayesian approach.

    PubMed Central

    Carabin, Hélène; Escalona, Marisela; Marshall, Clare; Vivas-Martínez, Sarai; Botto, Carlos; Joseph, Lawrence; Basáñez, María-Gloria

    2003-01-01

    OBJECTIVE: To develop a Bayesian hierarchical model for human onchocerciasis with which to explore the factors that influence prevalence of microfilariae in the Amazonian focus of onchocerciasis and predict the probability of any community being at least mesoendemic (>20% prevalence of microfilariae), and thus in need of priority ivermectin treatment. METHODS: Models were developed with data from 732 individuals aged ≥15 years who lived in 29 Yanomami communities along four rivers of the south Venezuelan Orinoco basin. The models' abilities to predict prevalences of microfilariae in communities were compared. The deviance information criterion, Bayesian P-values, and residual values were used to select the best model with an approximate cross-validation procedure. FINDINGS: A three-level model that acknowledged clustering of infection within communities performed best, with host age and sex included at the individual level, a river-dependent altitude effect at the community level, and additional clustering of communities along rivers. This model correctly classified 25/29 (86%) villages with respect to their need for priority ivermectin treatment. CONCLUSION: Bayesian methods are a flexible and useful approach for public health research and control planning. Our model acknowledges the clustering of infection within communities, allows investigation of links between individual- or community-specific characteristics and infection, incorporates additional uncertainty due to missing covariate data, and informs policy decisions by predicting the probability that a new community is at least mesoendemic. PMID:12973640
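    The decision quantity in this study is the posterior probability that a community's prevalence exceeds 20%. The sketch below estimates that probability for a single community with a Beta prior and binomial likelihood. This is a deliberate simplification: it ignores the individual-, community- and river-level structure of the three-level model described above, and the prior and function name are assumptions.

```python
import random

def prob_mesoendemic(positives, examined, threshold=0.20, a=1.0, b=1.0,
                     draws=20_000, seed=1):
    """Monte Carlo estimate of P(prevalence > threshold | data) for one community.

    A Beta(a, b) prior with a binomial likelihood gives the posterior
    Beta(a + positives, b + examined - positives). Single-level sketch only.
    """
    rng = random.Random(seed)
    alpha = a + positives
    beta = b + examined - positives
    # Fraction of posterior draws above the mesoendemicity threshold.
    hits = sum(rng.betavariate(alpha, beta) > threshold for _ in range(draws))
    return hits / draws
```

    A community would then be flagged for priority treatment when this probability is high; the cut-off used for that decision is not given in the abstract.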

  20. Prediction of community prevalence of human onchocerciasis in the Amazonian onchocerciasis focus: Bayesian approach.

    PubMed

    Carabin, Hélène; Escalona, Marisela; Marshall, Clare; Vivas-Martínez, Sarai; Botto, Carlos; Joseph, Lawrence; Basáñez, María-Gloria

    2003-01-01

    To develop a Bayesian hierarchical model for human onchocerciasis with which to explore the factors that influence prevalence of microfilariae in the Amazonian focus of onchocerciasis and predict the probability of any community being at least mesoendemic (>20% prevalence of microfilariae), and thus in need of priority ivermectin treatment. Models were developed with data from 732 individuals aged ≥15 years who lived in 29 Yanomami communities along four rivers of the south Venezuelan Orinoco basin. The models' abilities to predict prevalences of microfilariae in communities were compared. The deviance information criterion, Bayesian P-values, and residual values were used to select the best model with an approximate cross-validation procedure. A three-level model that acknowledged clustering of infection within communities performed best, with host age and sex included at the individual level, a river-dependent altitude effect at the community level, and additional clustering of communities along rivers. This model correctly classified 25/29 (86%) villages with respect to their need for priority ivermectin treatment. Bayesian methods are a flexible and useful approach for public health research and control planning. Our model acknowledges the clustering of infection within communities, allows investigation of links between individual- or community-specific characteristics and infection, incorporates additional uncertainty due to missing covariate data, and informs policy decisions by predicting the probability that a new community is at least mesoendemic.

  1. Finding Groups Using Model-Based Cluster Analysis: Heterogeneous Emotional Self-Regulatory Processes and Heavy Alcohol Use Risk

    ERIC Educational Resources Information Center

    Mun, Eun Young; von Eye, Alexander; Bates, Marsha E.; Vaschillo, Evgeny G.

    2008-01-01

    Model-based cluster analysis is a new clustering procedure to investigate population heterogeneity utilizing finite mixture multivariate normal densities. It is an inferentially based, statistically principled procedure that allows comparison of nonnested models using the Bayesian information criterion to compare multiple models and identify the…
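    To illustrate how the Bayesian information criterion selects the number of mixture components, here is a minimal sketch that fits one- and two-component Gaussian mixtures by EM and compares their BIC. It simplifies the multivariate normal mixtures described above to one dimension, uses a deterministic quantile initialisation, and reports BIC in the "lower is better" form; some software reports the negated value. All names are our own.

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(xs, k, iters=200):
    """Fit a k-component 1D Gaussian mixture by EM; return (log-likelihood, n_params)."""
    n = len(xs)
    srt = sorted(xs)
    # Deterministic initialisation: means at spread-out data quantiles, pooled variance.
    mus = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    mean = sum(xs) / n
    var0 = sum((x - mean) ** 2 for x in xs) / n
    vars_ = [var0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, mus, vars_)]
            s = sum(p)
            resp.append([pj / s for pj in p])
        # M-step: update mixing weights, means and variances (variance floored).
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            vars_[j] = max(sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj, 1e-6)
    loglik = sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, mus, vars_)))
        for x in xs
    )
    n_params = 3 * k - 1  # k means, k variances, k - 1 free mixing weights
    return loglik, n_params

def bic(loglik, n_params, n):
    """BIC = -2 log L + p ln n (lower is better in this convention)."""
    return -2 * loglik + n_params * math.log(n)
```

    With two well-separated groups of values, the two-component model earns a much lower BIC despite its three extra parameters.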

  2. Microsatellite variation and genetic structuring in Mugil liza (Teleostei: Mugilidae) populations from Argentina and Brazil

    NASA Astrophysics Data System (ADS)

    Mai, Ana C. G.; Miño, Carolina I.; Marins, Luis F. F.; Monteiro-Neto, Cassiano; Miranda, Laura; Schwingel, Paulo R.; Lemos, Valéria M.; Gonzalez-Castro, Mariano; Castello, Jorge P.; Vieira, João P.

    2014-08-01

    The mullet Mugil liza is distributed along the Atlantic coast of South America, from Argentina to Venezuela, and it is heavily exploited in Brazil. We assessed patterns of distribution of neutral nuclear genetic variation in 250 samples from the Brazilian states of Rio de Janeiro, São Paulo, Santa Catarina and Rio Grande do Sul (latitudinal range of 23-31°S) and from Buenos Aires Province in Argentina (36°S). Nine microsatellite loci revealed 131 total alleles, 3-23 alleles per locus, He: 0.69 and Ho: 0.67. Significant genetic differentiation was observed between Rio de Janeiro samples (23°S) and those from all other locations, as indicated by FST, hierarchical analyses of genetic structure, Bayesian cluster analyses and assignment tests. The presence of two different demographic clusters better explains the allelic diversity observed in mullets from the southernmost portion of the Atlantic coast of Brazil and from Argentina. This may be taken into account when designing fisheries management plans involving Brazilian, Uruguayan and Argentinean M. liza populations.
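    The diversity figures above (He and Ho) are expected and observed heterozygosity. A minimal per-locus sketch follows, using the simple He = 1 - Σp² form without a small-sample correction; multilocus values such as those reported are typically averages over loci, and the input format and function name are assumptions.

```python
from collections import Counter

def heterozygosities(genotypes):
    """Observed (Ho) and expected (He) heterozygosity at one locus.

    genotypes : list of (allele1, allele2) tuples, one per diploid individual.
    He = 1 - sum(p_i^2) over allele frequencies p_i, with no sample-size correction.
    """
    n = len(genotypes)
    # Ho: fraction of individuals carrying two different alleles.
    ho = sum(a != b for a, b in genotypes) / n
    counts = Counter(allele for g in genotypes for allele in g)
    total = 2 * n
    he = 1.0 - sum((c / total) ** 2 for c in counts.values())
    return ho, he
```

    An observed value below the expected one, as in the abstract (0.67 vs 0.69), is a small heterozygote deficit.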

  3. Bayesian Multi-Trait Analysis Reveals a Useful Tool to Increase Oil Concentration and to Decrease Toxicity in Jatropha curcas L.

    PubMed Central

    Silva Junqueira, Vinícius; de Azevedo Peixoto, Leonardo; Galvêas Laviola, Bruno; Lopes Bhering, Leonardo; Mendonça, Simone; Agostini Costa, Tania da Silveira; Antoniassi, Rosemar

    2016-01-01

    The biggest challenge for jatropha breeding is to identify superior genotypes that present high seed yield and seed oil content with reduced toxicity levels. Therefore, the objective of this study was to estimate genetic parameters for three important traits (100-seed weight, seed oil content, and phorbol ester concentration) and to select superior genotypes to be used as progenitors in jatropha breeding. Additionally, the genotypic values and the genetic parameters estimated under the Bayesian multi-trait approach were used to evaluate different selection index scenarios for 179 half-sib families. Three different scenarios and economic weights were considered. It was possible to simultaneously reduce toxicity and increase seed oil content and 100-seed weight by using index selection based on genotypic values estimated by the Bayesian multi-trait approach. Indeed, we identified two families that present these characteristics by evaluating genetic diversity using the Ward clustering method, which suggested nine homogeneous clusters. Future research should integrate the Bayesian multi-trait methods with a realized relationship matrix, aiming to build accurate selection index models. PMID:27281340
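    Index selection of the kind described above ranks families by a weighted sum of their genotypic values. The sketch below uses standardized values and a negative weight to select against toxicity. The standardization step, the trait names and the weights are illustrative assumptions; the abstract does not give the index form or the economic weights used in its three scenarios.

```python
def selection_index(genotypic_values, weights):
    """Linear selection index I = sum_k w_k * z_k over standardized genotypic values.

    genotypic_values : dict family -> dict trait -> estimated genotypic value
    weights          : dict trait -> economic weight (negative selects against a trait)
    Returns (family, index) pairs sorted from highest to lowest index.
    """
    traits = list(weights)
    families = list(genotypic_values)
    stats = {}
    for t in traits:
        vals = [genotypic_values[f][t] for f in families]
        mean = sum(vals) / len(vals)
        # Population standard deviation; fall back to 1.0 if the trait does not vary.
        sd = (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5 or 1.0
        stats[t] = (mean, sd)
    index = {
        f: sum(weights[t] * (genotypic_values[f][t] - stats[t][0]) / stats[t][1] for t in traits)
        for f in families
    }
    return sorted(index.items(), key=lambda kv: kv[1], reverse=True)
```

    Usage, with hypothetical family values: `selection_index(values, {"seed_weight": 1, "oil": 1, "phorbol": -1})` puts families with heavy, oil-rich, low-phorbol seed at the top.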

  4. Micro- and macro-geographic scale effect on the molecular imprint of selection and adaptation in Norway spruce.

    PubMed

    Scalfi, Marta; Mosca, Elena; Di Pierro, Erica Adele; Troggio, Michela; Vendramin, Giovanni Giuseppe; Sperisen, Christoph; La Porta, Nicola; Neale, David B

    2014-01-01

    Forest tree species of temperate and boreal regions have undergone a long history of demographic changes and evolutionary adaptations. The main objective of this study was to detect signals of selection in Norway spruce (Picea abies [L.] Karst) at different sampling scales and to investigate, accounting for population structure, the effect of environment on species genetic diversity. A total of 384 single nucleotide polymorphisms (SNPs) representing 290 genes were genotyped at two geographic scales: across 12 populations distributed along two altitudinal transects in the Alps (micro-geographic scale), and across 27 populations belonging to the range of Norway spruce in central and south-east Europe (macro-geographic scale). At the macro-geographic scale, principal component analysis combined with Bayesian clustering revealed three major clusters, corresponding to the main areas of southern spruce occurrence, i.e. the Alps, Carpathians, and Hercynia. The populations along the altitudinal transects were not differentiated. To assess the role of selection in structuring genetic variation, we applied a Bayesian and coalescent-based FST-outlier method and tested for correlations between allele frequencies and climatic variables using regression analyses. At the macro-geographic scale, the FST-outlier methods together detected 11 FST outliers. Six outliers were detected when the same analyses were carried out taking into account the genetic structure. Regression analyses with population structure correction resulted in the identification of two (micro-geographic scale) and 38 SNPs (macro-geographic scale) significantly correlated with temperature and/or precipitation. Six of these loci overlapped with FST outliers, among them two loci encoding an enzyme involved in riboflavin biosynthesis and a sucrose synthase. The results of this study indicate a strong relationship between genetic and environmental variation at both geographic scales. They also suggest that an integrative approach combining different outlier detection methods and population sampling at different geographic scales is useful to identify loci potentially involved in adaptation.
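    FST-outlier tests such as the one above flag loci whose between-population differentiation is unusually high. The quantity being compared can be illustrated with Nei's G_ST = (H_T - H_S) / H_T, a simple FST analogue. This is not the Bayesian, coalescent-based outlier test used in the study; it weights populations equally, omits sample-size corrections, and the function name is our own.

```python
def nei_gst(pop_allele_freqs):
    """Nei's G_ST = (H_T - H_S) / H_T for one locus.

    pop_allele_freqs : list of dicts, one per population, mapping allele -> frequency.
    Populations are weighted equally; no sample-size correction is applied.
    """
    k = len(pop_allele_freqs)
    alleles = set().union(*pop_allele_freqs)
    # H_S: mean expected heterozygosity within populations.
    h_s = sum(1 - sum(p ** 2 for p in freqs.values()) for freqs in pop_allele_freqs) / k
    # H_T: expected heterozygosity of the pooled (total) population.
    pooled = {a: sum(freqs.get(a, 0.0) for freqs in pop_allele_freqs) / k for a in alleles}
    h_t = 1 - sum(p ** 2 for p in pooled.values())
    return (h_t - h_s) / h_t if h_t > 0 else 0.0
```

    Fixed differences between two populations give G_ST = 1, identical allele frequencies give 0; an outlier scan compares each locus's value with its expectation under neutrality.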

  5. Micro- and Macro-Geographic Scale Effect on the Molecular Imprint of Selection and Adaptation in Norway Spruce

    PubMed Central

    Scalfi, Marta; Mosca, Elena; Di Pierro, Erica Adele; Troggio, Michela; Vendramin, Giovanni Giuseppe; Sperisen, Christoph; La Porta, Nicola; Neale, David B.

    2014-01-01

    Forest tree species of temperate and boreal regions have undergone a long history of demographic changes and evolutionary adaptations. The main objective of this study was to detect signals of selection in Norway spruce (Picea abies [L.] Karst) at different sampling scales and to investigate, accounting for population structure, the effect of environment on species genetic diversity. A total of 384 single nucleotide polymorphisms (SNPs) representing 290 genes were genotyped at two geographic scales: across 12 populations distributed along two altitudinal transects in the Alps (micro-geographic scale), and across 27 populations belonging to the range of Norway spruce in central and south-east Europe (macro-geographic scale). At the macro-geographic scale, principal component analysis combined with Bayesian clustering revealed three major clusters, corresponding to the main areas of southern spruce occurrence, i.e. the Alps, Carpathians, and Hercynia. The populations along the altitudinal transects were not differentiated. To assess the role of selection in structuring genetic variation, we applied a Bayesian and coalescent-based FST-outlier method and tested for correlations between allele frequencies and climatic variables using regression analyses. At the macro-geographic scale, the FST-outlier methods together detected 11 FST outliers. Six outliers were detected when the same analyses were carried out taking into account the genetic structure. Regression analyses with population structure correction resulted in the identification of two (micro-geographic scale) and 38 SNPs (macro-geographic scale) significantly correlated with temperature and/or precipitation. Six of these loci overlapped with FST outliers, among them two loci encoding an enzyme involved in riboflavin biosynthesis and a sucrose synthase. The results of this study indicate a strong relationship between genetic and environmental variation at both geographic scales. They also suggest that an integrative approach combining different outlier detection methods and population sampling at different geographic scales is useful to identify loci potentially involved in adaptation. PMID:25551624

  6. Origin, Migration Routes and Worldwide Population Genetic Structure of the Wheat Yellow Rust Pathogen Puccinia striiformis f.sp. tritici

    PubMed Central

    Ali, Sajid; Gladieux, Pierre; Leconte, Marc; Gautier, Angélique; Justesen, Annemarie F.; Hovmøller, Mogens S.; Enjalbert, Jérôme; de Vallavieille-Pope, Claude

    2014-01-01

    Analyses of large-scale population structure of pathogens enable the identification of migration patterns, diversity reservoirs or longevity of populations, the understanding of current evolutionary trajectories and the anticipation of future ones. This is particularly important for long-distance migrating fungal pathogens such as Puccinia striiformis f.sp. tritici (PST), capable of rapid spread to new regions and crop varieties. Although a range of recent PST invasions at continental scales are well documented, the worldwide population structure and the center of origin of the pathogen were still unknown. In this study, we used multilocus microsatellite genotyping to infer the worldwide population structure of PST and the origin of new invasions based on 409 isolates representative of the distribution of the fungus on six continents. Bayesian and multivariate clustering methods partitioned the set of multilocus genotypes into six distinct genetic groups associated with their geographical origin. Analyses of linkage disequilibrium and genotypic diversity indicated a strong regional heterogeneity in levels of recombination, with clear signatures of recombination in the Himalayan (Nepal and Pakistan) and near-Himalayan regions (China) and a predominantly clonal population structure in other regions. The higher genotypic diversity, recombinant population structure and high sexual reproduction ability in the Himalayan and neighboring regions suggest this area as the putative center of origin of PST. We used clustering methods and approximate Bayesian computation (ABC) to compare different competing scenarios describing ancestral relationships among ancestral populations and more recently founded populations. Our analyses confirmed the Middle East-East Africa as the most likely source of newly spreading, high-temperature-adapted strains; Europe as the source of South American, North American and Australian populations; and Mediterranean-Central Asian populations as the origin of South African populations. Although most geographic populations are not markedly affected by recent dispersal events, this study emphasizes the influence of human activities on recent long-distance spread of the pathogen. PMID:24465211
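    Comparing competing scenarios with approximate Bayesian computation can be reduced, in its simplest rejection form, to: pick a scenario from its prior, simulate a summary statistic, and keep the draw if it lands close to the observed value. The sketch below shows that loop with a single numeric summary statistic and equal scenario priors. Real population-genetic ABC uses coalescent simulators and many summary statistics; the simulator interface, tolerance and names here are assumptions.

```python
import random

def abc_model_choice(observed, simulators, n_sims=20_000, tolerance=1.0, seed=0):
    """Rejection ABC for choosing among competing scenarios.

    observed   : observed summary statistic (a single number here)
    simulators : dict scenario name -> function(rng) returning a simulated summary
                 statistic; each function draws its own parameters from its prior
    Scenarios get equal prior probability; the share of accepted simulations from
    each scenario approximates its posterior probability.
    """
    rng = random.Random(seed)
    names = list(simulators)
    accepted = {name: 0 for name in names}
    for _ in range(n_sims):
        name = rng.choice(names)
        # Accept the simulation if its summary statistic is within the tolerance.
        if abs(simulators[name](rng) - observed) <= tolerance:
            accepted[name] += 1
    total = sum(accepted.values())
    if total == 0:
        raise ValueError("no simulations accepted; increase tolerance or n_sims")
    return {name: count / total for name, count in accepted.items()}
```

    Usage with two toy scenarios: `abc_model_choice(4.8, {"low": lambda r: r.gauss(0, 1), "high": lambda r: r.gauss(5, 1)}, tolerance=0.5)` assigns almost all posterior mass to `"high"`.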

  7. Progressive colonization and restricted gene flow shape island-dependent population structure in Galápagos marine iguanas (Amblyrhynchus cristatus)

    PubMed Central

    2009-01-01

    Background: Marine iguanas (Amblyrhynchus cristatus) inhabit the coastlines of large and small islands throughout the Galápagos archipelago, providing a rich system to study the spatial and temporal factors influencing the phylogeographic distribution and population structure of a species. Here, we analyze the microevolution of marine iguanas using the complete mitochondrial control region (CR) as well as 13 microsatellite loci representing more than 1200 individuals from 13 islands. Results: CR data show that marine iguanas occupy three general clades: one that is widely distributed across the northern archipelago and likely spread from east to west by way of the South Equatorial current; a second that is found mostly on the older eastern and central islands; and a third that is limited to the younger northern and western islands. Generally, the CR haplotype distribution pattern supports the colonization of the archipelago from the older, eastern islands to the younger, western islands. However, there are also signatures of recurrent, historical gene flow between islands after population establishment. Bayesian cluster analysis of microsatellite genotypes indicates the existence of twenty distinct genetic clusters generally following a one-cluster-per-island pattern. However, two well-differentiated clusters were found on the easternmost island of San Cristóbal, while nine distinct and highly intermixed clusters were found on the youngest, westernmost islands of Isabela and Fernandina. High mtDNA and microsatellite genetic diversity was observed for populations on Isabela and Fernandina, which may be the result of a recent population expansion and founder events from multiple sources. Conclusions: While a past genetic study based on pure FST analysis suggested that marine iguana populations display high levels of nuclear (but not mitochondrial) gene flow due to male-biased dispersal, the results of our sex-biased dispersal tests and the finding of strong genetic differentiation between islands do not support this view. Therefore, our study is a good example of how recently developed analytical tools such as Bayesian clustering analysis and DNA sequence-based demographic analyses can overcome potential biases introduced by simply relying on FST estimates from markers with different inheritance patterns. PMID:20028547

  8. Modular analysis of the probabilistic genetic interaction network.

    PubMed

    Hou, Lin; Wang, Lin; Qian, Minping; Li, Dong; Tang, Chao; Zhu, Yunping; Deng, Minghua; Li, Fangting

    2011-03-15

    Epistatic Miniarray Profiles (EMAP) has enabled the mapping of large-scale genetic interaction networks; however, the quantitative information gained from EMAP cannot be fully exploited since the data are usually interpreted as a discrete network based on an arbitrary hard threshold. To address such limitations, we adopted a mixture modeling procedure to construct a probabilistic genetic interaction network and then implemented a Bayesian approach to identify densely interacting modules in the probabilistic network. Mixture modeling has been shown to be an effective soft-threshold technique for EMAP measures. The Bayesian approach was applied to an EMAP dataset studying the early secretory pathway in Saccharomyces cerevisiae. Twenty-seven modules were identified, and 14 of those were enriched by gold standard functional gene sets. We also conducted a detailed comparison with state-of-the-art algorithms, hierarchical clustering and Markov clustering. The experimental results show that the Bayesian approach outperforms the others in efficiently recovering biologically significant modules.
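    The "soft threshold" idea above replaces a hard cut on the interaction score with the posterior probability that the score belongs to an interaction component of a mixture. A minimal two-component Gaussian version is sketched below, assuming the mixture parameters have already been fitted; the component layout, the example parameter values and the function name are hypothetical and not taken from the paper.

```python
import math

def interaction_probability(score, weights, means, sds):
    """Posterior probability that a score comes from the 'interaction' component.

    Two-component Gaussian mixture with already-fitted parameters:
    index 0 = background (no interaction), index 1 = interaction.
    """
    def pdf(x, mu, sd):
        return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

    # Weighted component densities at this score; Bayes' rule gives the posterior.
    dens = [w * pdf(score, m, s) for w, m, s in zip(weights, means, sds)]
    return dens[1] / sum(dens)
```

    With hypothetical parameters `weights=[0.5, 0.5]`, `means=[0, -4]`, `sds=[1, 1]`, a score of -2 sits exactly between the components and gets probability 0.5, while a score of -4 gets about 0.9997; these probabilities become edge weights instead of a 0/1 network.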

  9. Analysis of genetic population structure in Acacia caven (Leguminosae, Mimosoideae), comparing one exploratory and two Bayesian-model-based methods

    PubMed Central

    Pometti, Carolina L.; Bessega, Cecilia F.; Saidman, Beatriz O.; Vilardi, Juan C.

    2014-01-01

    Bayesian clustering as implemented in STRUCTURE or GENELAND software is widely used to form genetic groups of populations or individuals. On the other hand, to satisfy the need for less computer-intensive approaches, multivariate analyses are specifically devoted to extracting information from large datasets. In this paper, we report the use of a dataset of AFLP markers belonging to 15 sampling sites of Acacia caven for studying the genetic structure and comparing the consistency of three methods: STRUCTURE, GENELAND and DAPC. Of these methods, DAPC was the fastest and accurately inferred the number K of populations (K = 12 using the find.clusters option and K = 15 with a priori information on populations). GENELAND, in turn, provides information on the spatial area of membership probabilities for individuals or populations when coordinates are specified (K = 12). STRUCTURE also inferred the number K of populations and the membership probabilities of individuals based on ancestry, giving K = 11 without prior information on populations and K = 15 using the LOCPRIOR option. Finally, all three methods showed high consistency in estimating the population structure, inferring similar numbers of populations and similar membership probabilities of individuals to each group, and their results were highly correlated with one another. PMID:24688293

  10. Local differentiation amidst extensive allele sharing in Oryza nivara and O. rufipogon

    PubMed Central

    Banaticla-Hilario, Maria Celeste N; van den Berg, Ronald G; Hamilton, Nigel Ruaraidh Sackville; McNally, Kenneth L

    2013-01-01

    Genetic variation patterns within and between species may change along geographic gradients and at different spatial scales. Such patterns were revealed by microsatellite data at 29 loci obtained from 119 accessions of three Oryza series Sativae species in Asia Pacific: Oryza nivara Sharma and Shastry, O. rufipogon Griff., and O. meridionalis Ng. Genetic similarities between O. nivara and O. rufipogon across their distribution are evident in the clustering and ordination results and in the large proportion of shared alleles between these taxa. However, local-level species separation is recognized by Bayesian clustering and neighbor-joining analyses. At the regional scale, the two species seem more differentiated in South Asia than in Southeast Asia, as revealed by FST analysis. The presence of strong gene flow barriers in smaller spatial units is also suggested by the analysis of molecular variance (AMOVA) results, where 64% of the genetic variation is contained among populations (as compared to 26% within populations and 10% among species). Oryza nivara (HE = 0.67) exhibits slightly lower diversity and greater population differentiation than O. rufipogon (HE = 0.70). Bayesian inference identified four, and at a finer structural level eight, genetically distinct population groups that correspond to geographic populations within the three taxa. Oryza meridionalis and the Nepalese O. nivara appeared to have diverged from all the population groups of the series, whereas the Australasian O. rufipogon appeared distinct from the rest of the species. PMID:24101993

  11. Understanding the Scalability of Bayesian Network Inference Using Clique Tree Growth Curves

    NASA Technical Reports Server (NTRS)

    Mengshoel, Ole J.

    2010-01-01

    One of the main approaches to performing computation in Bayesian networks (BNs) is clique tree clustering and propagation. The clique tree approach consists of propagation in a clique tree compiled from a Bayesian network, and while it was introduced in the 1980s, there is still a lack of understanding of how clique tree computation time depends on variations in BN size and structure. In this article, we improve this understanding by developing an approach to characterizing clique tree growth as a function of parameters that can be computed in polynomial time from BNs, specifically: (i) the ratio of the number of a BN's non-root nodes to the number of root nodes, and (ii) the expected number of moral edges in their moral graphs. Analytically, we partition the set of cliques in a clique tree into different sets, and introduce a growth curve for the total size of each set. For the special case of bipartite BNs, there are two sets and two growth curves, a mixed clique growth curve and a root clique growth curve. In experiments, where random bipartite BNs generated using the BPART algorithm are studied, we systematically increase the out-degree of the root nodes in bipartite Bayesian networks by increasing the number of leaf nodes. Surprisingly, root clique growth is well-approximated by Gompertz growth curves, an S-shaped family of curves that has previously been used to describe growth processes in biology, medicine, and neuroscience. We believe that this research improves the understanding of the scaling behavior of clique tree clustering for a certain class of Bayesian networks; presents an aid for trade-off studies of clique tree clustering using growth curves; and ultimately provides a foundation for benchmarking and developing improved BN inference and machine learning algorithms.
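    For reference, one common parameterisation of the Gompertz curve is g(x) = a·exp(-b·exp(-c·x)), with upper asymptote a and an inflection point at x = ln(b)/c where g = a/e. The sketch below evaluates it; the parameter names are ours, and in the setting above x would stand for the growth parameter being varied (for example, the ratio of non-root to root nodes). The abstract does not state which parameterisation was fitted.

```python
import math

def gompertz(x, a, b, c):
    """Gompertz growth curve g(x) = a * exp(-b * exp(-c * x)).

    a : upper asymptote; b : displacement along x; c : growth rate (b, c > 0).
    """
    return a * math.exp(-b * math.exp(-c * x))

def gompertz_inflection(a, b, c):
    """Inflection point (x, g(x)) of the curve: x = ln(b) / c, g = a / e."""
    return math.log(b) / c, a / math.e
```

    The S shape comes from the double exponential: growth is slow for small x, fastest at the inflection point, and then saturates toward a.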

  12. Bayesian survival analysis in clinical trials: What methods are used in practice?

    PubMed

    Brard, Caroline; Le Teuff, Gwénaël; Le Deley, Marie-Cécile; Hampson, Lisa V

    2017-02-01

    Background Bayesian statistics are an appealing alternative to the traditional frequentist approach to the design, analysis, and reporting of clinical trials, especially in rare diseases. Time-to-event endpoints are widely used in many medical fields. Designing Bayesian survival trials involves additional complexities because a model for the survival distribution must be specified. The objective of this article was to critically review the use and reporting of Bayesian methods in survival trials. Methods A systematic review of clinical trials using Bayesian survival analyses was performed through the PubMed and Web of Science databases. This was complemented by a full-text search of the online repositories of pre-selected journals. Cost-effectiveness and dose-finding studies, meta-analyses, and methodological papers using clinical trials were excluded. Results In total, 28 articles met the inclusion criteria: 25 were original reports of clinical trials and 3 were re-analyses of a clinical trial. Most trials were in oncology (n = 25), were randomised controlled (n = 21) phase III trials (n = 13), and half considered a rare disease (n = 13). Bayesian approaches were used for monitoring in 14 trials and for the final analysis only in 14 trials. In the latter case, Bayesian survival analyses were used for the primary analysis in four cases, for the secondary analysis in seven cases, and for the trial re-analysis in three cases. Overall, 12 articles reported fitting Bayesian regression models (semi-parametric, n = 3; parametric, n = 9). Prior distributions were often incompletely reported: 20 articles did not define the prior distribution used for the parameter of interest. When the prior was specified, over half of the trials used only non-informative priors for monitoring and the final analysis (n = 12). Indeed, no article fitting a Bayesian regression model placed an informative prior on the parameter of interest. The prior for the treatment effect was based on historical data in only four trials. Decision rules were pre-defined in eight cases when trials used Bayesian monitoring, and in only one case when trials adopted a Bayesian approach to the final analysis. Conclusion Few trials implemented a Bayesian survival analysis and few incorporated external data into priors. There is scope to improve the quality of reporting of Bayesian methods in survival trials. Extension of the Consolidated Standards of Reporting Trials statement for reporting Bayesian clinical trials is recommended.

  13. Social network-based recruitment successfully reveals HIV-1 transmission networks among high-risk individuals in El Salvador.

    PubMed

    Dennis, Ann M; Murillo, Wendy; de Maria Hernandez, Flor; Guardado, Maria Elena; Nieto, Ana Isabel; Lorenzana de Rivera, Ivette; Eron, Joseph J; Paz-Bailey, Gabriela

    2013-05-01

    HIV in Central America is concentrated among certain groups such as men who have sex with men (MSM) and female sex workers (FSWs). We compared social recruitment chains and HIV transmission clusters from 699 MSM and 787 FSWs to better understand factors contributing to ongoing HIV transmission in El Salvador. Phylogenies were reconstructed using pol sequences from 119 HIV-positive individuals recruited by respondent-driven sampling (RDS) and compared with RDS chains in 3 cities in El Salvador. Transmission clusters with a mean pairwise genetic distance ≤ 0.015 and Bayesian posterior probabilities = 1 were identified. Factors associated with cluster membership were evaluated among MSM. Sequences from 34 (43%) MSM and 4 (10%) FSW grouped in 14 transmission clusters. Clusters were defined by risk group (12 MSM clusters) and geographic residence (only 1 spanned separate cities). In 4 MSM clusters (all n = 2), individuals were also members of the same RDS chain, but only 2 had members directly linked through recruitment. All large clusters (n ≥ 3) spanned >1 RDS chain. Among MSM, factors independently associated with cluster membership included recent infection by BED assay (P = 0.02), sex with stable male partners (P = 0.02), and sex with ≥ 3 male partners in the past year (P = 0.04). We found few HIV transmissions corresponding directly with the social recruitment. However, we identified clustering in nearly one-half of MSM, suggesting that RDS recruitment was indirectly but successfully uncovering transmission networks, particularly among recent infections. Interrogating RDS chains with phylogenetic analyses may help refine methods for identifying transmission clusters.
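
The cluster definition above combines two thresholds. The following is a minimal Python sketch of that screen. It assumes that a candidate clade's member IDs, a table of pairwise genetic distances, and the clade's Bayesian posterior probability are already available from the phylogenetic analysis. The function and argument names are hypothetical, and the sketch does not reconstruct the phylogeny itself.

```python
from itertools import combinations
from statistics import mean


def is_transmission_cluster(members, distance, posterior,
                            max_mean_distance=0.015, min_posterior=1.0):
    """Apply the two cluster criteria to one candidate clade.

    members:   sequence IDs in the clade (hypothetical input)
    distance:  dict mapping frozenset({id1, id2}) -> pairwise genetic distance
    posterior: Bayesian posterior probability of the clade
    """
    if len(members) < 2:
        return False  # a single sequence cannot form a transmission pair
    pairwise = [distance[frozenset(pair)] for pair in combinations(members, 2)]
    return mean(pairwise) <= max_mean_distance and posterior >= min_posterior


if __name__ == "__main__":
    d = {frozenset({"A", "B"}): 0.010,
         frozenset({"A", "C"}): 0.012,
         frozenset({"B", "C"}): 0.020}
    print(is_transmission_cluster(["A", "B", "C"], d, posterior=1.0))   # True (mean 0.014)
    print(is_transmission_cluster(["A", "B", "C"], d, posterior=0.98))  # False
```

Because the criterion uses the mean pairwise distance, one distant pair (B and C above) does not disqualify a clade on its own.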

  14. A Poisson nonnegative matrix factorization method with parameter subspace clustering constraint for endmember extraction in hyperspectral imagery

    NASA Astrophysics Data System (ADS)

    Sun, Weiwei; Ma, Jun; Yang, Gang; Du, Bo; Zhang, Liangpei

    2017-06-01

    A new Bayesian method named Poisson Nonnegative Matrix Factorization with Parameter Subspace Clustering Constraint (PNMF-PSCC) is presented to extract endmembers from Hyperspectral Imagery (HSI). First, the method integrates the linear spectral mixture model with the Bayesian framework and formulates endmember extraction as a Bayesian inference problem. Second, the Parameter Subspace Clustering Constraint (PSCC) is incorporated into the statistical model to account for the clustering of all pixels in the parameter subspace. The PSCC can enlarge differences among ground objects and helps find endmembers with smaller spectral divergences. Meanwhile, the PNMF-PSCC method uses the Poisson distribution as prior knowledge of spectral signals to better explain the quantum nature of light in the imaging spectrometer. Third, the optimization problem of PNMF-PSCC is formulated as maximizing the joint density via the Maximum A Posteriori (MAP) estimator. The problem is finally solved by iteratively optimizing two sub-problems within the Alternating Direction Method of Multipliers (ADMM) framework, using the FURTHESTSUM initialization scheme. Five state-of-the-art methods are implemented for comparison with PNMF-PSCC on both synthetic and real HSI datasets. Experimental results show that PNMF-PSCC outperforms all five methods in Spectral Angle Distance (SAD) and Root-Mean-Square Error (RMSE); in particular, it identifies good endmembers for ground objects with smaller spectral divergences.
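
The two evaluation metrics named above are standard and easy to state. The following minimal Python sketch assumes that spectra are given as equal-length lists of values. It does not implement PNMF-PSCC itself, only the comparison metrics:

- The spectral angle distance (SAD) is the angle between an estimated and a reference vector.
- RMSE is the root-mean-square difference between two equal-length vectors.

```python
import math


def spectral_angle(estimated, reference):
    """Spectral angle distance (radians) between two spectra; 0 means identical shape."""
    dot = sum(e * r for e, r in zip(estimated, reference))
    norm = math.sqrt(sum(e * e for e in estimated)) * math.sqrt(sum(r * r for r in reference))
    cosine = max(-1.0, min(1.0, dot / norm))  # clamp rounding error before acos
    return math.acos(cosine)


def rmse(estimated, reference):
    """Root-mean-square error between two equal-length vectors."""
    if len(estimated) != len(reference):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((e - r) ** 2 for e, r in zip(estimated, reference)) / len(reference))


if __name__ == "__main__":
    ref = [0.10, 0.35, 0.60, 0.40]
    est = [0.20, 0.70, 1.20, 0.80]  # same shape, twice the brightness
    print(spectral_angle(est, ref))  # ~0: SAD ignores overall scale
    print(rmse(est, ref))            # nonzero: RMSE does not
```

The example shows why the two metrics are reported together. SAD measures only spectral shape, whereas RMSE also penalizes differences in magnitude.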

  15. Bayesian methods including nonrandomized study data increased the efficiency of postlaunch RCTs.

    PubMed

    Schmidt, Amand F; Klugkist, Irene; Klungel, Olaf H; Nielen, Mirjam; de Boer, Anthonius; Hoes, Arno W; Groenwold, Rolf H H

    2015-04-01

    Findings from nonrandomized studies on safety or efficacy of treatment in patient subgroups may trigger postlaunch randomized clinical trials (RCTs). In the analysis of such RCTs, results from nonrandomized studies are typically ignored. This study explores the trade-off between bias and power of Bayesian RCT analysis incorporating information from nonrandomized studies. A simulation study was conducted to compare frequentist with Bayesian analyses using noninformative and informative priors in their ability to detect interaction effects. In simulated subgroups, the effect of a hypothetical treatment differed between subgroups (odds ratio 1.00 vs. 2.33). Simulations varied in sample size, proportions of the subgroups, and specification of the priors. As expected, the results for the informative Bayesian analyses were more biased than those from the noninformative Bayesian analysis or frequentist analysis. However, because of a reduction in posterior variance, informative Bayesian analyses were generally more powerful to detect an effect. In scenarios where the informative priors were in the opposite direction of the RCT data, type 1 error rates could be 100% and power 0%. Bayesian methods incorporating data from nonrandomized studies can meaningfully increase power of interaction tests in postlaunch RCTs. Copyright © 2015 Elsevier Inc. All rights reserved.
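
The trade-off between bias and variance described above can be seen in the simplest conjugate case. The following minimal Python sketch assumes a normal approximation for the estimated log odds ratio of the interaction and a normal prior summarizing the nonrandomized evidence. The authors' simulation design is not reproduced here, and all numbers are hypothetical. Because the posterior precision is the sum of the prior and data precisions, an informative prior always shrinks the posterior variance while pulling the mean toward the prior.

```python
import math


def normal_posterior(prior_mean, prior_var, estimate, std_error):
    """Conjugate normal update for a single parameter (e.g. a log odds ratio).

    Posterior precision = prior precision + data precision; the posterior mean is
    the precision-weighted average of the prior mean and the data estimate.
    """
    prior_prec = 1.0 / prior_var
    data_prec = 1.0 / std_error ** 2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * estimate)
    return post_mean, post_var


if __name__ == "__main__":
    rct_log_or, rct_se = math.log(2.33), 0.5  # hypothetical RCT interaction estimate
    results = {
        "vague": normal_posterior(0.0, 1e6, rct_log_or, rct_se),
        "informative": normal_posterior(math.log(2.0), 0.1, rct_log_or, rct_se),
        "conflicting": normal_posterior(math.log(0.5), 0.1, rct_log_or, rct_se),
    }
    for name, (m, v) in results.items():
        print(f"{name:12s} OR={math.exp(m):.2f}  sd={math.sqrt(v):.3f}")
```

The "conflicting" prior points in the opposite direction to the RCT data. It still reduces the posterior standard deviation, but it drags the estimate toward no effect, which mirrors the loss of power the abstract reports for that scenario.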

  16. Patterns of population structure for inshore bottlenose dolphins along the eastern United States.

    PubMed

    Richards, Vincent P; Greig, Thomas W; Fair, Patricia A; McCulloch, Stephen D; Politz, Christine; Natoli, Ada; Driscoll, Carlos A; Hoelzel, A Rus; David, Victor; Bossart, Gregory D; Lopez, Jose V

    2013-01-01

    Globally distributed, the bottlenose dolphin (Tursiops truncatus) is found in a range of offshore and coastal habitats. Using 15 microsatellite loci and mtDNA control region sequences, we investigated patterns of genetic differentiation among putative populations along the eastern US shoreline (the Indian River Lagoon, Florida, and Charleston Harbor, South Carolina) (microsatellite analyses: n = 125, mtDNA analyses: n = 132). We further utilized the mtDNA to compare these populations with those from the Northwest Atlantic, Gulf of Mexico, and Caribbean. Results showed strong differentiation among inshore, alongshore, and offshore habitats (ΦST = 0.744). In addition, Bayesian clustering analyses revealed the presence of 2 genetic clusters (populations) within the 250 km Indian River Lagoon. Habitat heterogeneity is likely an important force diversifying bottlenose dolphin populations through its influence on social behavior and foraging strategy. We propose that the spatial pattern of genetic variation within the lagoon reflects both its steep longitudinal transition of climate and also its historical discontinuity and recent connection as part of Intracoastal Waterway development. These findings have important management implications as they emphasize the role of habitat and the consequence of its modification in shaping bottlenose dolphin population structure and highlight the possibility of multiple management units existing in discrete inshore habitats along the entire eastern US shoreline.

  17. Patterns of Population Structure for Inshore Bottlenose Dolphins along the Eastern United States

    PubMed Central

    2013-01-01

    Globally distributed, the bottlenose dolphin (Tursiops truncatus) is found in a range of offshore and coastal habitats. Using 15 microsatellite loci and mtDNA control region sequences, we investigated patterns of genetic differentiation among putative populations along the eastern US shoreline (the Indian River Lagoon, Florida, and Charleston Harbor, South Carolina) (microsatellite analyses: n = 125, mtDNA analyses: n = 132). We further utilized the mtDNA to compare these populations with those from the Northwest Atlantic, Gulf of Mexico, and Caribbean. Results showed strong differentiation among inshore, alongshore, and offshore habitats (ΦST = 0.744). In addition, Bayesian clustering analyses revealed the presence of 2 genetic clusters (populations) within the 250 km Indian River Lagoon. Habitat heterogeneity is likely an important force diversifying bottlenose dolphin populations through its influence on social behavior and foraging strategy. We propose that the spatial pattern of genetic variation within the lagoon reflects both its steep longitudinal transition of climate and also its historical discontinuity and recent connection as part of Intracoastal Waterway development. These findings have important management implications as they emphasize the role of habitat and the consequence of its modification in shaping bottlenose dolphin population structure and highlight the possibility of multiple management units existing in discrete inshore habitats along the entire eastern US shoreline. PMID:24129993

  18. Model-based Clustering of Categorical Time Series with Multinomial Logit Classification

    NASA Astrophysics Data System (ADS)

    Frühwirth-Schnatter, Sylvia; Pamminger, Christoph; Winter-Ebmer, Rudolf; Weber, Andrea

    2010-09-01

    A common problem in many areas of applied statistics is to identify groups of similar time series in a panel of time series. However, distance-based clustering methods cannot easily be extended to time series data, where an appropriate distance measure is rather difficult to define, particularly for discrete-valued time series. Markov chain clustering, proposed by Pamminger and Frühwirth-Schnatter [6], is an approach for clustering discrete-valued time series obtained by observing a categorical variable with several states. This model-based clustering method is based on finite mixtures of first-order time-homogeneous Markov chain models. To further explain group membership, we present an extension to the approach of Pamminger and Frühwirth-Schnatter [6] by formulating a probabilistic model for the latent group indicators within the Bayesian classification rule, using a multinomial logit model. The parameters are estimated for a fixed number of clusters within a Bayesian framework using a Markov chain Monte Carlo (MCMC) sampling scheme representing a (full) Gibbs-type sampler, which involves only draws from standard distributions. Finally, an application to a panel of Austrian wage mobility data is presented, which leads to an interesting segmentation of the Austrian labour market.
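
The classification rule described above can be illustrated for fixed parameters. The following minimal Python sketch assumes known group-specific transition matrices and multinomial logit coefficients; in the paper these are estimated by MCMC, which is not shown here. All names are hypothetical. Each sequence's prior group probabilities come from a multinomial logit in its covariates and are then updated by the first-order Markov chain likelihood of the observed states.

```python
import math


def _softmax(values):
    """Normalize log-scale values into probabilities (numerically stable)."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def logit_weights(x, gammas):
    """Multinomial logit prior: eta_k(x) proportional to exp(x . gamma_k)."""
    return _softmax([sum(xi * gi for xi, gi in zip(x, g)) for g in gammas])


def markov_loglik(states, P):
    """Log-likelihood of a categorical sequence under transition matrix P,
    conditional on the first observed state."""
    return sum(math.log(P[a][b]) for a, b in zip(states, states[1:]))


def membership_posterior(states, x, gammas, matrices):
    """Posterior probability that the sequence belongs to each group."""
    log_post = [math.log(w) + markov_loglik(states, P)
                for w, P in zip(logit_weights(x, gammas), matrices)]
    return _softmax(log_post)


if __name__ == "__main__":
    sticky = [[0.9, 0.1], [0.1, 0.9]]     # hypothetical "stayer" group
    switching = [[0.1, 0.9], [0.9, 0.1]]  # hypothetical "mover" group
    gammas = [[0.0, 0.0], [-0.5, 0.8]]    # group 0 is the baseline (coefficients fixed at 0)
    x = [1.0, 0.2]                        # intercept plus one covariate
    print(membership_posterior([0, 0, 0, 1, 1], x, gammas, [sticky, switching]))
```

Fixing the first group's coefficients at zero is the usual identification constraint for a multinomial logit.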

  19. Verification of Bayesian Clustering in Travel Behaviour Research – First Step to Macroanalysis of Travel Behaviour

    NASA Astrophysics Data System (ADS)

    Satra, P.; Carsky, J.

    2018-04-01

    Our research examines travel behaviour from a macroscopic view, taking one municipality as the basic unit. The travel behaviour of a municipality as a whole becomes a single data point in research on the travel behaviour of a larger area, perhaps a country. In a pre-processing step, the municipalities are clustered into groups that show similar travel behaviour. Such groups can then be studied for the reasons behind their prevailing pattern of travel behaviour without distortion from municipalities with a different pattern. This paper deals with the practical settings of the clustering process, which is based on Bayesian statistics, specifically a mixture model. Optimizing the setting parameters based on the correlation of pointer model parameters and the relative number of data in clusters is a helpful but not fully reliable method. Thus, a method for graphically representing the clusters needs to be developed in order to check their quality. Training the setting parameters in 2D has proven beneficial because it allows visual control of the resulting clusters. The clustering is best applied to separate groups of municipalities in which only identical transport modes compete.

  20. A Hierarchical Bayesian Model for Calibrating Estimates of Species Divergence Times

    PubMed Central

    Heath, Tracy A.

    2012-01-01

    In Bayesian divergence time estimation methods, incorporating calibrating information from the fossil record is commonly done by assigning prior densities to ancestral nodes in the tree. Calibration prior densities are typically parametric distributions offset by minimum age estimates provided by the fossil record. Specification of the parameters of calibration densities requires the user to quantify his or her prior knowledge of the age of the ancestral node relative to the age of its calibrating fossil. The values of these parameters can, potentially, result in biased estimates of node ages if they lead to overly informative prior distributions. Accordingly, determining parameter values that lead to adequate prior densities is not straightforward. In this study, I present a hierarchical Bayesian model for calibrating divergence time analyses with multiple fossil age constraints. This approach applies a Dirichlet process prior as a hyperprior on the parameters of calibration prior densities. Specifically, this model assumes that the rate parameters of exponential prior distributions on calibrated nodes are distributed according to a Dirichlet process, whereby the rate parameters are clustered into distinct parameter categories. Both simulated and biological data are analyzed to evaluate the performance of the Dirichlet process hyperprior. Compared with fixed exponential prior densities, the hierarchical Bayesian approach results in more accurate and precise estimates of internal node ages. When this hyperprior is applied using Markov chain Monte Carlo methods, the ages of calibrated nodes are sampled from mixtures of exponential distributions and uncertainty in the values of calibration density parameters is taken into account. PMID:22334343
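
The calibration densities described above can be written down directly. The following minimal Python sketch assumes an exponential prior on a calibrated node's age, offset by the fossil's minimum age. With a fixed rate this is the standard offset exponential. Averaging over several candidate rates with mixture weights illustrates the mixture of exponentials that arises when rates are clustered by a Dirichlet process hyperprior; it does not reproduce the study's MCMC sampler. The rates, weights and ages below are hypothetical.

```python
import math


def offset_exponential_pdf(age, min_age, rate):
    """Density of an exponential calibration prior offset by a fossil minimum age."""
    if age < min_age:
        return 0.0  # node cannot be younger than its calibrating fossil
    return rate * math.exp(-rate * (age - min_age))


def mixture_calibration_pdf(age, min_age, rates, weights):
    """Mixture of offset exponentials, e.g. over clustered rate categories."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(w * offset_exponential_pdf(age, min_age, r) for r, w in zip(rates, weights))


if __name__ == "__main__":
    # Hypothetical fossil minimum of 50 Myr and two rate categories.
    for age in (49.0, 50.0, 55.0, 70.0):
        print(age, mixture_calibration_pdf(age, 50.0, rates=[0.5, 0.05], weights=[0.7, 0.3]))
```

A large rate concentrates prior mass just above the fossil age (an informative calibration), while a small rate spreads it over older ages. Mixing the two hedges between these extremes.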

  1. A Bayesian cluster analysis method for single-molecule localization microscopy data.

    PubMed

    Griffié, Juliette; Shannon, Michael; Bromley, Claire L; Boelen, Lies; Burn, Garth L; Williamson, David J; Heard, Nicholas A; Cope, Andrew P; Owen, Dylan M; Rubin-Delanchy, Patrick

    2016-12-01

    Cell function is regulated by the spatiotemporal organization of the signaling machinery, and a key facet of this is molecular clustering. Here, we present a protocol for the analysis of clustering in data generated by 2D single-molecule localization microscopy (SMLM), for example, photoactivated localization microscopy (PALM) or stochastic optical reconstruction microscopy (STORM). Three features of such data can cause standard cluster analysis approaches to be ineffective: (i) the data take the form of a list of points rather than a pixel array; (ii) there is a non-negligible unclustered background density of points that must be accounted for; and (iii) each localization has an associated uncertainty in regard to its position. These issues are overcome using a Bayesian, model-based approach. Many possible cluster configurations are proposed and scored against a generative model, which assumes Gaussian clusters overlaid on a completely spatially random (CSR) background, before every point is scrambled by its localization precision. We present the process of generating simulated and experimental data that are suitable for our algorithm, the analysis itself, and the extraction and interpretation of key cluster descriptors such as the number of clusters, cluster radii and the number of localizations per cluster. Variations in these descriptors can be interpreted as arising from changes in the organization of the cellular nanoarchitecture. The protocol requires no specific programming ability, and the processing time for one data set, typically containing 30 regions of interest, is ∼18 h; user input takes ∼1 h.
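
The generative model outlined above (Gaussian clusters over a CSR background, then jitter by localization precision) can be simulated in a few lines. The following minimal Python sketch produces test data only and does not perform the Bayesian cluster scoring. The region size, cluster count, spreads and precision (nominally in nm) are hypothetical. A single fixed precision is used for every localization, whereas real SMLM data carry a per-point uncertainty.

```python
import random


def simulate_smlm(roi=1000.0, n_clusters=5, points_per_cluster=50, cluster_sd=30.0,
                  n_background=200, precision=20.0, seed=None):
    """Simulate 2D localizations: Gaussian clusters plus a CSR background,
    each point then displaced by Gaussian localization error.

    Returns (observed_points, labels); label -1 marks background points.
    """
    rng = random.Random(seed)
    true_points, labels = [], []
    for k in range(n_clusters):
        cx, cy = rng.uniform(0, roi), rng.uniform(0, roi)  # cluster centre placed uniformly
        for _ in range(points_per_cluster):
            true_points.append((rng.gauss(cx, cluster_sd), rng.gauss(cy, cluster_sd)))
            labels.append(k)
    for _ in range(n_background):  # completely spatially random background
        true_points.append((rng.uniform(0, roi), rng.uniform(0, roi)))
        labels.append(-1)
    # Scramble every point by its localization precision (one fixed value here).
    observed = [(x + rng.gauss(0, precision), y + rng.gauss(0, precision))
                for x, y in true_points]
    return observed, labels


if __name__ == "__main__":
    points, labels = simulate_smlm(seed=1)
    print(len(points), "localizations,", labels.count(-1), "from background")
```

Keeping the true labels alongside the jittered points makes it possible to check how well any cluster-analysis method recovers the simulated structure.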

  2. On selecting a prior for the precision parameter of Dirichlet process mixture models

    USGS Publications Warehouse

    Dorazio, R.M.

    2009-01-01

    In hierarchical mixture models the Dirichlet process is used to specify latent patterns of heterogeneity, particularly when the distribution of latent parameters is thought to be clustered (multimodal). The parameters of a Dirichlet process include a precision parameter α and a base probability measure G0. In problems where α is unknown and must be estimated, inferences about the level of clustering can be sensitive to the choice of prior assumed for α. In this paper an approach is developed for computing a prior for the precision parameter α that can be used in the presence or absence of prior information about the level of clustering. This approach is illustrated in an analysis of counts of stream fishes. The results of this fully Bayesian analysis are compared with an empirical Bayes analysis of the same data and with a Bayesian analysis based on an alternative commonly used prior.
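
A standard Dirichlet process result shows why the prior on α matters. For n observations, the prior expected number of distinct clusters is the sum over i = 1..n of α / (α + i − 1), which grows roughly logarithmically in n at a rate set by α. The following minimal Python sketch, which is not the prior-construction method of the paper, shows how strongly the implied level of clustering depends on α.

```python
def expected_clusters(n: int, alpha: float) -> float:
    """Prior expected number of distinct clusters among n draws from a
    Dirichlet process with precision parameter alpha."""
    if n < 1 or alpha <= 0:
        raise ValueError("need n >= 1 and alpha > 0")
    # Observation i (1-based) starts a new cluster with probability alpha / (alpha + i - 1).
    return sum(alpha / (alpha + i - 1) for i in range(1, n + 1))


if __name__ == "__main__":
    for alpha in (0.1, 1.0, 10.0):
        print(alpha, round(expected_clusters(100, alpha), 2))
```

For 100 observations, moving α across two orders of magnitude changes the expected number of clusters from close to one to several dozen. This is why inferences about the level of clustering can hinge on the prior chosen for α.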

  3. Evaluating Spatial Variability in Sediment and Phosphorus Concentration-Discharge Relationships Using Bayesian Inference and Self-Organizing Maps

    NASA Astrophysics Data System (ADS)

    Underwood, Kristen L.; Rizzo, Donna M.; Schroth, Andrew W.; Dewoolkar, Mandar M.

    2017-12-01

    Given the variable biogeochemical, physical, and hydrological processes driving fluvial sediment and nutrient export, the water science and management communities need data-driven methods to identify regions prone to production and transport under variable hydrometeorological conditions. We use Bayesian analysis to segment concentration-discharge linear regression models for total suspended solids (TSS) and particulate and dissolved phosphorus (PP, DP) using 22 years of monitoring data from 18 Lake Champlain watersheds. Bayesian inference was leveraged to estimate segmented regression model parameters and identify threshold position. The identified threshold positions demonstrated a considerable range below and above the median discharge—which has been used previously as the default breakpoint in segmented regression models to discern differences between pre and post-threshold export regimes. We then applied a Self-Organizing Map (SOM), which partitioned the watersheds into clusters of TSS, PP, and DP export regimes using watershed characteristics, as well as Bayesian regression intercepts and slopes. A SOM defined two clusters of high-flux basins, one where PP flux was predominantly episodic and hydrologically driven; and another in which the sediment and nutrient sourcing and mobilization were more bimodal, resulting from both hydrologic processes at post-threshold discharges and reactive processes (e.g., nutrient cycling or lateral/vertical exchanges of fine sediment) at prethreshold discharges. A separate DP SOM defined two high-flux clusters exhibiting a bimodal concentration-discharge response, but driven by differing land use. Our novel framework shows promise as a tool with broad management application that provides insights into landscape drivers of riverine solute and sediment export.

  4. Bayesian performance metrics and small system integration in recent homeland security and defense applications

    NASA Astrophysics Data System (ADS)

    Jannson, Tomasz; Kostrzewski, Andrew; Patton, Edward; Pradhan, Ranjit; Shih, Min-Yi; Walter, Kevin; Savant, Gajendra; Shie, Rick; Forrester, Thomas

    2010-04-01

    In this paper, Bayesian inference is applied to defining performance metrics for an important class of recent Homeland Security and defense systems called binary sensors, covering both (internal) system performance and (external) CONOPS. A medical analogy is used to define the PPV (Positive Predictive Value), the basic Bayesian performance metric of binary sensors. Small System Integration (SSI) is also discussed in the context of recent Homeland Security and defense applications, emphasizing a highly multi-technological approach that spans a broad range of clusters ("nexus") of electronics, optics, X-ray physics, γ-ray physics, and other disciplines.
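
The positive predictive value mentioned above follows from Bayes' theorem. The following minimal Python sketch uses the usual medical-test parameterization: sensitivity, specificity, and prevalence (the prior probability that a target is present). The example numbers are hypothetical. The sketch shows why a binary sensor with excellent sensitivity and specificity can still raise mostly false alarms when targets are rare.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """P(target present | sensor alarm) by Bayes' theorem."""
    true_alarms = sensitivity * prevalence
    false_alarms = (1.0 - specificity) * (1.0 - prevalence)
    return true_alarms / (true_alarms + false_alarms)


if __name__ == "__main__":
    # Hypothetical sensor: 99% sensitive, 99% specific.
    for prevalence in (0.5, 0.01, 0.001):
        print(prevalence, round(positive_predictive_value(0.99, 0.99, prevalence), 3))
```

At a prevalence of 1%, this hypothetical sensor's alarms are right only half the time. At 0.1%, fewer than one alarm in ten is genuine.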

  5. Competing risk models in reliability systems, a Weibull distribution model with Bayesian analysis approach

    NASA Astrophysics Data System (ADS)

    Iskandar, Ismed; Satria Gondokaryono, Yudi

    2016-02-01

    In reliability theory, the most important problem is to determine the reliability of a complex system from the reliability of its components. A weakness of most reliability theories is that systems are described simply as functioning or failed. In many real situations, failures may arise from many causes, depending on the age and environment of the system and its components. Another problem in reliability theory is estimating the parameters of the assumed failure models. The estimation may be based on data collected from censored or uncensored life tests. In many reliability problems, especially in engineering design and maintenance systems, the failure data are simply too few. In such cases, Bayesian analyses are more beneficial than classical ones: Bayesian estimation combines past knowledge or experience, in the form of a prior distribution, with life-test data to make inferences about the parameter of interest. In this paper, we investigate the application of Bayesian estimation to competing risk systems. The cases are limited to models with independent causes of failure, using the Weibull distribution as the model. A simulation is conducted for this distribution with the objectives of verifying the models and estimators and investigating the performance of the estimators for varying sample sizes. The simulated data are analyzed using both Bayesian and maximum likelihood methods. The simulation results show that changing the true value of one parameter relative to another changes the standard deviation in the opposite direction. With perfect information on the prior distribution, the Bayesian estimators outperform the maximum likelihood estimators. The sensitivity analyses show some sensitivity to shifts in the prior location. They also show that the Bayesian analysis is robust within the range between the true value and the maximum likelihood estimate.
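
For independent Weibull causes, the system survival function is the product of the cause-specific survival functions. The observed data are the earliest failure time together with its cause. The following minimal Python sketch, with hypothetical shape and scale values, computes that survival function and simulates competing-risk data of this general kind. It does not implement the paper's Bayesian or maximum likelihood estimators.

```python
import math
import random


def system_survival(t, shapes, scales):
    """S(t) for a system with independent Weibull causes of failure.

    Cause j has survival exp(-(t / scale_j) ** shape_j); independence means
    the system survives past t only if every cause does.
    """
    return math.exp(-sum((t / eta) ** beta for beta, eta in zip(shapes, scales)))


def simulate_competing_risks(n, shapes, scales, seed=None):
    """Draw (failure_time, cause_index) pairs: the first cause to occur wins."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        # random.weibullvariate(alpha, beta) takes scale alpha and shape beta.
        times = [rng.weibullvariate(eta, beta) for beta, eta in zip(shapes, scales)]
        cause = min(range(len(times)), key=times.__getitem__)
        data.append((times[cause], cause))
    return data


if __name__ == "__main__":
    shapes, scales = [1.5, 3.0], [10.0, 15.0]  # hypothetical Weibull parameters
    print(system_survival(5.0, shapes, scales))
    print(simulate_competing_risks(5, shapes, scales, seed=1))
```

With shape 1 for every cause, the model reduces to competing exponentials. In that case, the share of failures due to each cause equals its rate divided by the total rate, which makes a convenient sanity check on the simulator.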

  6. A Single-Cell Roadmap of Lineage Bifurcation in Human ESC Models of Embryonic Brain Development.

    PubMed

    Yao, Zizhen; Mich, John K; Ku, Sherman; Menon, Vilas; Krostag, Anne-Rachel; Martinez, Refugio A; Furchtgott, Leon; Mulholland, Heather; Bort, Susan; Fuqua, Margaret A; Gregor, Ben W; Hodge, Rebecca D; Jayabalu, Anu; May, Ryan C; Melton, Samuel; Nelson, Angelique M; Ngo, N Kiet; Shapovalova, Nadiya V; Shehata, Soraya I; Smith, Michael W; Tait, Leah J; Thompson, Carol L; Thomsen, Elliot R; Ye, Chaoyang; Glass, Ian A; Kaykas, Ajamete; Yao, Shuyuan; Phillips, John W; Grimley, Joshua S; Levi, Boaz P; Wang, Yanling; Ramanathan, Sharad

    2017-01-05

    During human brain development, multiple signaling pathways generate diverse cell types with varied regional identities. Here, we integrate single-cell RNA sequencing and clonal analyses to reveal lineage trees and molecular signals underlying early forebrain and mid/hindbrain cell differentiation from human embryonic stem cells (hESCs). Clustering single-cell transcriptomic data identified 41 distinct populations of progenitor, neuronal, and non-neural cells across our differentiation time course. Comparisons with primary mouse and human gene expression data demonstrated rostral and caudal progenitor and neuronal identities from early brain development. Bayesian analyses inferred a unified cell-type lineage tree that bifurcates between cortical and mid/hindbrain cell types. Two methods of clonal analyses confirmed these findings and further revealed the importance of Wnt/β-catenin signaling in controlling this lineage decision. Together, these findings provide a rich transcriptome-based lineage map for studying human brain development and modeling developmental disorders. Copyright © 2017 Elsevier Inc. All rights reserved.

  7. Range overlap and individual movements during breeding season influence genetic relationships of caribou herds in south-central Alaska

    USGS Publications Warehouse

    Roffler, Gretchen H.; Adams, Layne G.; Talbot, Sandra L.; Sage, George K.; Dale, Bruce W.

    2012-01-01

    North American caribou (Rangifer tarandus) herds commonly exhibit little nuclear genetic differentiation among adjacent herds, although available evidence supports strong demographic separation, even for herds with seasonal range overlap. During 1997–2003, we studied the Mentasta and Nelchina caribou herds in south-central Alaska using radiotelemetry to determine individual movements and range overlap during the breeding season, and nuclear and mitochondrial DNA (mtDNA) markers to assess levels of genetic differentiation. Although the herds were considered discrete because females calved in separate regions, individual movements and breeding-range overlap in some years provided opportunity for male-mediated gene flow, even without demographic interchange. Telemetry results revealed strong female philopatry, and little evidence of female emigration despite overlapping seasonal distributions. Analyses of 13 microsatellites indicated the Mentasta and Nelchina herds were not significantly differentiated using both traditional population-based analyses and individual-based Bayesian clustering analyses. However, we observed mtDNA differentiation between the 2 herds (FSTM = 0.041, P

  8. Bayesian Analysis and Characterization of Multiple Populations in Galactic Globular Clusters

    NASA Astrophysics Data System (ADS)

    Wagner-Kaiser, Rachel A.; Stenning, David; Sarajedini, Ata; von Hippel, Ted; van Dyk, David A.; Robinson, Elliot; Stein, Nathan; Jefferys, William H.; BASE-9, HST UVIS Globular Cluster Treasury Program

    2017-01-01

    Globular clusters have long been important tools to unlock the early history of galaxies. Thus, it is crucial we understand the formation and characteristics of the globular clusters (GCs) themselves. Historically, GCs were thought to be simple and largely homogeneous populations, formed via collapse of a single molecular cloud. However, this classical view has been overwhelmingly invalidated by recent work. It is now clear that the vast majority of globular clusters in our Galaxy host two or more chemically distinct populations of stars, with variations in helium and light elements at discrete abundance levels. No coherent story has arisen that is able to fully explain the formation of multiple populations in globular clusters or the mechanisms that drive stochastic variations from cluster to cluster. We use Cycle 21 Hubble Space Telescope (HST) observations and HST archival ACS Treasury observations of 30 Galactic Globular Clusters to characterize two distinct stellar populations. A sophisticated Bayesian technique is employed to simultaneously sample the joint posterior distribution of age, distance, and extinction for each cluster, as well as unique helium values for two populations within each cluster and the relative proportion of those populations. We find the helium differences among the two populations in the clusters fall in the range of 0.04 to 0.11. Because adequate models varying in CNO are not presently available, we view these spreads as upper limits and present them with statistical rather than observational uncertainties. Evidence supports previous studies suggesting an increase in helium content concurrent with increasing mass of the cluster. We also find that the proportion of the first population of stars increases with mass. Our results are examined in the context of proposed globular cluster formation scenarios.

  9. Bayesian nonparametric clustering in phylogenetics: modeling antigenic evolution in influenza.

    PubMed

    Cybis, Gabriela B; Sinsheimer, Janet S; Bedford, Trevor; Rambaut, Andrew; Lemey, Philippe; Suchard, Marc A

    2018-01-30

    Influenza is responsible for up to 500,000 deaths every year, and antigenic variability represents much of its epidemiological burden. To visualize antigenic differences across many viral strains, antigenic cartography methods use multidimensional scaling on binding assay data to map influenza antigenicity onto a low-dimensional space. Analysis of such assay data ideally leads to natural clustering of influenza strains of similar antigenicity that correlate with sequence evolution. To understand the dynamics of these antigenic groups, we present a framework that jointly models genetic and antigenic evolution by combining multidimensional scaling of binding assay data, Bayesian phylogenetic machinery and nonparametric clustering methods. We propose a phylogenetic Chinese restaurant process that extends the current process to incorporate the phylogenetic dependency structure between strains in the modeling of antigenic clusters. With this method, we are able to use the genetic information to better understand the evolution of antigenicity throughout epidemics, as shown in applications of this model to H1N1 influenza. Copyright © 2017 John Wiley & Sons, Ltd.
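
The phylogenetic Chinese restaurant process above extends the standard Chinese restaurant process. The following minimal Python sketch covers only the standard (non-phylogenetic) process, as background. It ignores the dependency between related strains that the authors add. Each new strain joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to the concentration parameter alpha.

```python
import random


def sample_crp(n, alpha, seed=None):
    """Draw cluster labels for n items from a Chinese restaurant process.

    Item i (0-based) joins existing cluster k with probability counts[k] / (i + alpha)
    and starts a new cluster with probability alpha / (i + alpha).
    """
    rng = random.Random(seed)
    counts, labels = [], []
    for i in range(n):
        r = rng.uniform(0, i + alpha)
        cumulative = 0.0
        for k, size in enumerate(counts):
            cumulative += size
            if r < cumulative:
                counts[k] += 1
                labels.append(k)
                break
        else:  # r fell in the final alpha-sized slice: open a new cluster
            counts.append(1)
            labels.append(len(counts) - 1)
    return labels


if __name__ == "__main__":
    for alpha in (0.5, 5.0):
        labels = sample_crp(100, alpha, seed=2)
        print(alpha, "->", max(labels) + 1, "clusters")
```

The "rich get richer" behaviour makes large antigenic clusters likely. The number of clusters is not fixed in advance but is inferred from the data, which is what makes the approach nonparametric.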

  10. SpatialEpiApp: A Shiny web application for the analysis of spatial and spatio-temporal disease data.

    PubMed

    Moraga, Paula

    2017-11-01

    In recent years, public health surveillance has been facilitated by the existence of several packages implementing statistical methods for the analysis of spatial and spatio-temporal disease data. However, these methods are still inaccessible to many researchers who lack the programming skills needed to use the required software effectively. In this paper we present SpatialEpiApp, a Shiny web application that integrates two of the most common approaches in health surveillance: disease mapping and detection of clusters. SpatialEpiApp is easy to use and does not require any programming knowledge. Given information about the cases, population and, optionally, covariates for each of the areas and dates of study, the application allows users to fit Bayesian models to obtain disease risk estimates and their uncertainty by using R-INLA, and to detect disease clusters by using SaTScan. The application allows user interaction and the creation of interactive data visualizations and reports showing the analyses performed. Copyright © 2017 Elsevier Ltd. All rights reserved.

  11. On the diversity of the laccase gene: a phylogenetic perspective from Botryosphaeria rhodina (Ascomycota: Fungi) and other related taxa.

    PubMed

    Castilho, Flávio J D; Torres, Rodrigo A; Barbosa, Aneli M; Dekker, Robert F H; Garcia, José E

    2009-02-01

    The present study is the first describing the sequencing of a fragment of the copper-oxidase domain of a laccase gene in the family Botryosphaeriaceae. The aim of this work was to assess the degree of genetic and evolutionary relationships of a laccase gene from Botryosphaeria rhodina MAMB-05 with other ascomycete and basidiomycete laccase genes. The 193-amino acid sequences of the copper-oxidase domain from several different fungi, insects, a plant, and a bacterial species were retrieved from GenBank and aligned. Phylogenetic analyses were performed using neighbor-joining, maximum parsimony, and Bayesian inference methods. The organisms studied clustered into five gene clades: ascomycete fungi, basidiomycete fungi, insects, plants, and bacteria. The topologies also showed that fungal laccases of the ascomycetes and basidiomycetes are clearly separated into two distinct clusters. This evidence indicated that B. rhodina MAMB-05 and other closely related ascomycetes are a new biological resource given the biotechnological potential of their laccase genes.

  12. An agglomerative hierarchical clustering approach to visualisation in Bayesian clustering problems

    PubMed Central

    Dawson, Kevin J.; Belkhir, Khalid

    2009-01-01

    Clustering problems (including the clustering of individuals into outcrossing populations, hybrid generations, full-sib families and selfing lines) have recently received much attention in population genetics. In these clustering problems, the parameter of interest is a partition of the set of sampled individuals: the sample partition. In a fully Bayesian approach to clustering problems of this type, our knowledge about the sample partition is represented by a probability distribution on the space of possible sample partitions. Since the number of possible partitions grows very rapidly with the sample size, we cannot visualise this probability distribution in its entirety unless the sample is very small. As a solution to this visualisation problem, we recommend using an agglomerative hierarchical clustering algorithm, which we call the exact linkage algorithm. This algorithm is a special case of the maximin clustering algorithm that we introduced previously. The exact linkage algorithm is now implemented in our software package Partition View. The exact linkage algorithm takes the posterior co-assignment probabilities as input, and yields as output a rooted binary tree or, more generally, a forest of such trees. Each node of this forest defines a set of individuals, and the node height is the posterior co-assignment probability of this set. This provides a useful visual representation of the uncertainty associated with the assignment of individuals to categories. It is also a useful starting point for a more detailed exploration of the posterior distribution in terms of the co-assignment probabilities. PMID:19337306
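    The abstract describes a tree whose node heights are posterior co-assignment probabilities of sets of individuals. The Python below is a rough greedy sketch in that spirit, not the exact linkage algorithm as implemented in Partition View. It assumes the posterior is available as a list of sampled partitions (for example, MCMC draws of cluster labels), estimates a set's co-assignment probability as the fraction of draws in which all its members share a label, and repeatedly merges the two current sets whose union scores highest. The names `coassignment_prob` and `greedy_linkage` are hypothetical.

```python
from itertools import combinations

def coassignment_prob(samples, members):
    """Fraction of sampled partitions in which every index in `members` shares a label.

    `samples` is a list of label vectors (one per posterior draw);
    samples[s][i] is the cluster label of individual i in draw s.
    """
    hits = sum(1 for labels in samples if len({labels[i] for i in members}) == 1)
    return hits / len(samples)

def greedy_linkage(samples):
    """Greedy agglomeration on estimated set co-assignment probabilities.

    Starts from singletons and repeatedly merges the pair of current sets whose
    union has the highest co-assignment probability. Returns the merges as
    (set, height) pairs; stops early, leaving a forest, when every remaining
    union has probability zero.
    """
    n = len(samples[0])
    clusters = [frozenset([i]) for i in range(n)]
    merges = []
    while len(clusters) > 1:
        best = None  # (probability, set_a, set_b)
        for a, b in combinations(clusters, 2):
            p = coassignment_prob(samples, a | b)
            if best is None or p > best[0]:
                best = (p, a, b)
        p, a, b = best
        if p == 0:
            break  # no draw puts any remaining pair together
        clusters = [c for c in clusters if c != a and c != b] + [a | b]
        merges.append((a | b, p))
    return merges

draws = [[0, 0, 1, 1], [0, 0, 0, 1], [0, 0, 1, 1], [1, 1, 0, 0]]
for members, height in greedy_linkage(draws):
    print(sorted(members), height)
# [0, 1] 1.0
# [2, 3] 0.75
```

    Because a set can never be co-assigned more often than any of its subsets, the merge heights never increase, so the output can be drawn as a dendrogram. The all-pairs search over every draw makes this sketch practical only for small samples.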

  13. Dark Energy Survey Year 1 results: cross-correlation redshifts - methods and systematics characterization

    NASA Astrophysics Data System (ADS)

    Gatti, M.; Vielzeuf, P.; Davis, C.; Cawthon, R.; Rau, M. M.; DeRose, J.; De Vicente, J.; Alarcon, A.; Rozo, E.; Gaztanaga, E.; Hoyle, B.; Miquel, R.; Bernstein, G. M.; Bonnett, C.; Carnero Rosell, A.; Castander, F. J.; Chang, C.; da Costa, L. N.; Gruen, D.; Gschwend, J.; Hartley, W. G.; Lin, H.; MacCrann, N.; Maia, M. A. G.; Ogando, R. L. C.; Roodman, A.; Sevilla-Noarbe, I.; Troxel, M. A.; Wechsler, R. H.; Asorey, J.; Davis, T. M.; Glazebrook, K.; Hinton, S. R.; Lewis, G.; Lidman, C.; Macaulay, E.; Möller, A.; O'Neill, C. R.; Sommer, N. E.; Uddin, S. A.; Yuan, F.; Zhang, B.; Abbott, T. M. C.; Allam, S.; Annis, J.; Bechtol, K.; Brooks, D.; Burke, D. L.; Carollo, D.; Carrasco Kind, M.; Carretero, J.; Cunha, C. E.; D'Andrea, C. B.; DePoy, D. L.; Desai, S.; Eifler, T. F.; Evrard, A. E.; Flaugher, B.; Fosalba, P.; Frieman, J.; García-Bellido, J.; Gerdes, D. W.; Goldstein, D. A.; Gruendl, R. A.; Gutierrez, G.; Honscheid, K.; Hoormann, J. K.; Jain, B.; James, D. J.; Jarvis, M.; Jeltema, T.; Johnson, M. W. G.; Johnson, M. D.; Krause, E.; Kuehn, K.; Kuhlmann, S.; Kuropatkin, N.; Li, T. S.; Lima, M.; Marshall, J. L.; Melchior, P.; Menanteau, F.; Nichol, R. C.; Nord, B.; Plazas, A. A.; Reil, K.; Rykoff, E. S.; Sako, M.; Sanchez, E.; Scarpine, V.; Schubnell, M.; Sheldon, E.; Smith, M.; Smith, R. C.; Soares-Santos, M.; Sobreira, F.; Suchyta, E.; Swanson, M. E. C.; Tarle, G.; Thomas, D.; Tucker, B. E.; Tucker, D. L.; Vikram, V.; Walker, A. R.; Weller, J.; Wester, W.; Wolf, R. C.

    2018-06-01

    We use numerical simulations to characterize the performance of a clustering-based method to calibrate photometric redshift biases. In particular, we cross-correlate the weak lensing source galaxies from the Dark Energy Survey Year 1 sample with redMaGiC galaxies (luminous red galaxies with secure photometric redshifts) to estimate the redshift distribution of the former sample. The recovered redshift distributions are used to calibrate the photometric redshift bias of standard photo-z methods applied to the same source galaxy sample. We apply the method to two photo-z codes run on our simulated data: Bayesian Photometric Redshift and Directional Neighbourhood Fitting. We characterize the systematic uncertainties of our calibration procedure, and find that these systematic uncertainties dominate our error budget. The dominant systematics are due to our assumption of unevolving bias and clustering across each redshift bin, and to differences between the shapes of the redshift distributions derived by clustering versus photo-zs. The systematic uncertainty in the mean redshift bias of the source galaxy sample is Δz ≲ 0.02, though the precise value depends on the redshift bin under consideration. We discuss possible ways to mitigate the impact of our dominant systematics in future analyses.

  14. Hybridization and differential introgression associated with environmental shifts in a mistletoe species complex.

    PubMed

    Baena-Díaz, Fernanda; Ramírez-Barahona, Santiago; Ornelas, Juan Francisco

    2018-04-03

    Host specialization after host shifting is traditionally viewed as the pathway to speciation in parasitic plants. However, geographical and environmental changes can also influence parasite speciation through hybridization processes. Here we investigated the impact of past climatic fluctuations, environment, and host shifts on the genetic structure and patterns of hybridization and gene flow between Psittacanthus calyculatus and P. schiedeanus, a Mesoamerican species complex. Using microsatellites (408 individuals), we document moderate genetic diversity but high genetic differentiation between widespread parental clusters, calyculatus in dry pine-oak forests and schiedeanus in cloud forests. Bayesian analyses identified a third cluster, with admixture between parental clusters in areas of xeric and tropical dry forests and high migration rates following secondary contact. Coincidentally, host associations in these areas differ from those in areas of parental species, suggesting that past hybridization played a role in environmental and host shifts. Overall, the observed genetic and geographic patterns suggest that these Psittacanthus populations could have entered a distinct evolutionary pathway. The results highlight the importance of Pleistocene climate changes, habitat differences, and potential host shifts in the evolutionary history of Neotropical mistletoes.

  15. On the blind use of statistical tools in the analysis of globular cluster stars

    NASA Astrophysics Data System (ADS)

    D'Antona, Francesca; Caloi, Vittoria; Tailo, Marco

    2018-04-01

    As with most data analysis methods, the Bayesian method must be handled with care. We show that its application to determine stellar evolution parameters within globular clusters can lead to paradoxical results if used without the necessary precautions. This is a cautionary tale on the use of statistical tools for big data analysis.

  16. Limitations of cytochrome oxidase I for the barcoding of Neritidae (Mollusca: Gastropoda) as revealed by Bayesian analysis.

    PubMed

    Chee, S Y

    2015-05-25

    The mitochondrial DNA (mtDNA) cytochrome oxidase I (COI) gene has been universally and successfully utilized as a barcoding gene, mainly because it can be amplified easily, applied across a wide range of taxa, and results can be obtained cheaply and quickly. However, in rare cases, the gene can fail to distinguish between species, particularly when exposed to highly sensitive methods of data analysis, such as the Bayesian method, or when taxa have undergone introgressive hybridization, over-splitting, or incomplete lineage sorting. Such cases require the use of alternative markers, and nuclear DNA markers are commonly used. In this study, a dendrogram produced by Bayesian analysis of an mtDNA COI dataset was compared with that of a nuclear DNA ATPS-α dataset, in order to evaluate the efficiency of COI in barcoding Malaysian nerites (Neritidae). In the COI dendrogram, most of the species were in individual clusters, except for two species: Nerita chamaeleon and N. histrio. These two species were placed in the same subcluster, whereas in the ATPS-α dendrogram they were in their own subclusters. Analysis of the ATPS-α gene also placed the two genera of nerites (Nerita and Neritina) in separate clusters, whereas COI gene analysis placed both genera in the same cluster. Therefore, in the case of the Neritidae, the ATPS-α gene is a better barcoding gene than the COI gene.

  17. Multi-angle backscatter classification and sub-bottom profiling for improved seafloor characterization

    NASA Astrophysics Data System (ADS)

    Alevizos, Evangelos; Snellen, Mirjam; Simons, Dick; Siemes, Kerstin; Greinert, Jens

    2018-06-01

    This study applies three classification methods exploiting the angular dependence of acoustic seafloor backscatter, along with high-resolution sub-bottom profiling, for seafloor sediment characterization in Eckernförde Bay, Baltic Sea, Germany. This area is well suited for acoustic backscatter studies due to its shallowness, its smooth bathymetry and the presence of a wide range of sediment types. Backscatter data were acquired using a Seabeam1180 (180 kHz) multibeam echosounder, and sub-bottom profiler data were recorded using a SES-2000 parametric sonar transmitting at 6 and 12 kHz. The high density of seafloor soundings allowed backscatter layers to be extracted for five beam angles over a large part of the surveyed area. A Bayesian probability method was employed for sediment classification based on the backscatter variability at a single incidence angle, whereas Maximum Likelihood Classification (MLC) and Principal Components Analysis (PCA) were applied to the multi-angle layers. The Bayesian approach was used for identifying the optimum number of acoustic classes because cluster validation is carried out prior to class assignment and class outputs are ordinal categorical values. The method is based on the principle that backscatter values from a single incidence angle follow a normal distribution for a particular sediment type. The resulting Bayesian classes were well correlated with median grain sizes and the percentage of coarse material. The MLC method uses angular response information from five layers of training areas extracted from the Bayesian classification map. The subsequent PCA is based on the transformation of these five layers into two principal components that capture most of the data variability. These principal components were clustered into five classes after running an external cluster validation test.
    In general, both MLC and PCA separated the various sediment types effectively, showing good agreement (kappa > 0.7) with the Bayesian approach, which also correlates well with ground truth data (r² > 0.7). In addition, sub-bottom data were used in conjunction with the Bayesian classification results to characterize acoustic classes with respect to their geological and stratigraphic interpretation. The joint interpretation of seafloor and sub-seafloor data sets proved to be an efficient approach for better understanding seafloor backscatter patchiness and for discriminating acoustically similar classes in different geological/bathymetric settings.
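    The single-angle Bayesian classification above rests on Bayes' rule with a normal distribution of backscatter per sediment type. A minimal Python sketch of the assignment step only, assuming the class means, standard deviations and prior weights have already been estimated; the class names and values below are hypothetical, not from the Eckernförde Bay survey, and the paper's procedure for choosing the number of classes is not reproduced.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd**2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def classify_backscatter(x, classes):
    """Assign one backscatter value x (dB) to its most probable class.

    `classes` maps a class name to (mean_dB, sd_dB, prior). The posterior
    follows Bayes' rule: p(c | x) is proportional to prior_c * N(x; mean_c, sd_c).
    Returns (best_class, posterior_dict).
    """
    joint = {c: prior * normal_pdf(x, m, s) for c, (m, s, prior) in classes.items()}
    total = sum(joint.values())
    if total == 0.0:
        raise ValueError("x is too far from every class to score")
    posterior = {c: v / total for c, v in joint.items()}
    return max(posterior, key=posterior.get), posterior

# Hypothetical single-angle class parameters: (mean dB, sd dB, prior).
CLASSES = {
    "mud": (-30.0, 2.0, 1 / 3),
    "sand": (-22.0, 2.0, 1 / 3),
    "gravel": (-15.0, 2.0, 1 / 3),
}

label, post = classify_backscatter(-21.0, CLASSES)
print(label, {c: round(p, 3) for c, p in post.items()})
```

    Returning the full posterior, not only the winning label, keeps the classification uncertainty available for pixels that fall between two class means.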

  18. Bayesian Modeling of Temporal Coherence in Videos for Entity Discovery and Summarization.

    PubMed

    Mitra, Adway; Biswas, Soma; Bhattacharyya, Chiranjib

    2017-03-01

    A video is understood by users in terms of entities present in it. Entity Discovery is the task of building an appearance model for each entity (e.g., a person) and finding all its occurrences in the video. We represent a video as a sequence of tracklets, each spanning 10-20 frames and associated with one entity. We pose Entity Discovery as tracklet clustering, and approach it by leveraging Temporal Coherence (TC): the property that temporally neighboring tracklets are likely to be associated with the same entity. Our major contributions are the first Bayesian nonparametric models for TC at the tracklet level. We extend the Chinese Restaurant Process (CRP) to TC-CRP, and further to the Temporally Coherent Chinese Restaurant Franchise (TC-CRF), to jointly model entities and temporal segments using mixture components and sparse distributions. For discovering persons in TV serial videos without meta-data such as scripts, these methods show considerable improvement over state-of-the-art approaches to tracklet clustering in terms of clustering accuracy, cluster purity and entity coverage. Unlike existing approaches, the proposed methods can perform online tracklet clustering on streaming videos, and they can automatically reject false tracklets. Finally, we discuss entity-driven video summarization, where temporal segments of the video are selected based on the discovered entities to create a semantically meaningful summary.

  19. Bayesian methods in reliability

    NASA Astrophysics Data System (ADS)

    Sander, P.; Badoux, R.

    1991-11-01

    The present proceedings from a course on Bayesian methods in reliability encompass Bayesian statistical methods and their computational implementation, models for analyzing censored data from nonrepairable systems, the traits of repairable systems and growth models, the use of expert judgment, and a review of the problem of forecasting software reliability. Specific issues addressed include the use of Bayesian methods to estimate the leak rate of a gas pipeline, approximate analyses under great prior uncertainty, reliability estimation techniques, and a nonhomogeneous Poisson process. Also addressed are the calibration sets and seed variables of expert judgment systems for risk assessment, experimental illustrations of the use of expert judgment for reliability testing, and analyses of the predictive quality of software-reliability growth models such as the Weibull order statistics.

  20. Glaciation Effects on the Phylogeographic Structure of Oligoryzomys longicaudatus (Rodentia: Sigmodontinae) in the Southern Andes

    PubMed Central

    Palma, R. Eduardo; Boric-Bargetto, Dusan; Torres-Pérez, Fernando; Hernández, Cristián E.; Yates, Terry L.

    2012-01-01

    The long-tailed pygmy rice rat Oligoryzomys longicaudatus (Sigmodontinae), the major reservoir of Hantavirus in Chile and Patagonian Argentina, is widely distributed in the Mediterranean, Temperate and Patagonian Forests of Chile, as well as in adjacent areas in southern Argentina. We used molecular data to evaluate the effects of the last glacial event on the phylogeographic structure of this species. We examined whether historical Pleistocene events had affected the genetic variation and spatial distribution of this species along its distributional range. We sampled 223 individuals representing 47 localities along the species range, and sequenced the hypervariable domain I of the mtDNA control region. Aligned sequences were analyzed using haplotype network, Bayesian population structure and demographic analyses. Analysis of population structure and the haplotype network inferred three genetic clusters along the distribution of O. longicaudatus that mostly agreed with the three major ecogeographic regions in Chile: Mediterranean, Temperate Forests and Patagonian Forests. Bayesian Skyline Plots showed constant population sizes through time in all three clusters, followed by an increase during and after the Last Glacial Maximum (LGM; between 26,000 and 13,000 years ago). Neutrality tests and the “g” parameter also suggest that populations of O. longicaudatus experienced demographic expansion across the species' entire range. Past climate shifts have influenced population structure and lineage variation of O. longicaudatus. This species remained in refugia during Pleistocene times in southern Temperate Forests (and adjacent areas in Patagonia). From these refugia, O. longicaudatus expanded demographically into Patagonian Forests and central Mediterranean Chile as the glaciers retreated. PMID:22396751

  1. Evolutionary History of Wild Barley (Hordeum vulgare subsp. spontaneum) Analyzed Using Multilocus Sequence Data and Paleodistribution Modeling

    PubMed Central

    Jakob, Sabine S.; Rödder, Dennis; Engler, Jan O.; Shaaf, Salar; Özkan, Hakan; Blattner, Frank R.; Kilian, Benjamin

    2014-01-01

    Studies of Hordeum vulgare subsp. spontaneum, the wild progenitor of cultivated barley, have mostly relied on materials collected decades ago and maintained since then ex situ in germplasm repositories. We analyzed spatial genetic variation in wild barley populations collected rather recently, exploring sequence variations at seven single-copy nuclear loci, and inferred the relationships among these populations and toward the genepool of the crop. The wild barley collection covers the whole natural distribution area from the Mediterranean to Middle Asia. In contrast to earlier studies, Bayesian assignment analyses revealed three population clusters, in the Levant, Turkey, and east of Turkey, respectively. Genetic diversity was exceptionally high in the Levant, while eastern populations were depleted of private alleles. Species distribution modeling based on climate parameters and extant occurrence points of the taxon inferred suitable habitat conditions during the ice age, particularly in the Levant and Turkey. These conditions, together with the ecologically wide range of habitats, might have contributed to structured but long-term stable populations in this region and to their high genetic diversity. For recently collected individuals, Bayesian assignment to geographic clusters was generally unambiguous, but materials from genebanks often showed accessions that were not placed according to their assumed geographic origin or showed traces of introgression from cultivated barley. We attribute this to gene flow among accessions during ex situ maintenance. Evolutionary studies based on such materials might therefore lead to wrong conclusions regarding the history of the species or the origin and mode of domestication of the crop, depending on the accessions included. PMID:24586028

  2. Glaciation effects on the phylogeographic structure of Oligoryzomys longicaudatus (Rodentia: Sigmodontinae) in the southern Andes.

    PubMed

    Palma, R Eduardo; Boric-Bargetto, Dusan; Torres-Pérez, Fernando; Hernández, Cristián E; Yates, Terry L

    2012-01-01

    The long-tailed pygmy rice rat Oligoryzomys longicaudatus (Sigmodontinae), the major reservoir of Hantavirus in Chile and Patagonian Argentina, is widely distributed in the Mediterranean, Temperate and Patagonian Forests of Chile, as well as in adjacent areas in southern Argentina. We used molecular data to evaluate the effects of the last glacial event on the phylogeographic structure of this species. We examined whether historical Pleistocene events had affected the genetic variation and spatial distribution of this species along its distributional range. We sampled 223 individuals representing 47 localities along the species range, and sequenced the hypervariable domain I of the mtDNA control region. Aligned sequences were analyzed using haplotype network, Bayesian population structure and demographic analyses. Analysis of population structure and the haplotype network inferred three genetic clusters along the distribution of O. longicaudatus that mostly agreed with the three major ecogeographic regions in Chile: Mediterranean, Temperate Forests and Patagonian Forests. Bayesian Skyline Plots showed constant population sizes through time in all three clusters, followed by an increase during and after the Last Glacial Maximum (LGM; between 26,000 and 13,000 years ago). Neutrality tests and the "g" parameter also suggest that populations of O. longicaudatus experienced demographic expansion across the species' entire range. Past climate shifts have influenced population structure and lineage variation of O. longicaudatus. This species remained in refugia during Pleistocene times in southern Temperate Forests (and adjacent areas in Patagonia). From these refugia, O. longicaudatus expanded demographically into Patagonian Forests and central Mediterranean Chile as the glaciers retreated.

  3. A Bayesian, generalized frailty model for comet assays.

    PubMed

    Ghebretinsae, Aklilu Habteab; Faes, Christel; Molenberghs, Geert; De Boeck, Marlies; Geys, Helena

    2013-05-01

    This paper proposes a flexible modeling approach for so-called comet assay data, regularly encountered in preclinical research. While such data consist of non-Gaussian outcomes in a multilevel hierarchical structure, traditional analyses typically ignore this hierarchical nature completely or partly by summarizing measurements within a cluster. Non-Gaussian outcomes are often modeled using exponential family models. This is true not only for binary and count data, but also, for example, for time-to-event outcomes. Two important reasons for extending this family are (1) the possible occurrence of overdispersion, meaning that the variability in the data may not be adequately described by the models, which often exhibit a prescribed mean-variance link, and (2) the accommodation of a hierarchical structure in the data, owing to clustering in the data. The first issue is dealt with through so-called overdispersion models. Clustering is often accommodated through the inclusion of random subject-specific effects. Though not always, one conventionally assumes such random effects to be normally distributed. In the case of time-to-event data, one encounters, for example, the gamma frailty model (Duchateau and Janssen, 2007). While both of these issues may occur simultaneously, models combining both are uncommon. Molenberghs et al. (2010) proposed a broad class of generalized linear models accommodating overdispersion and clustering through two separate sets of random effects. Here, we use this method to model data from a comet assay with a three-level hierarchical structure. Although a conjugate gamma random effect is used for the overdispersion random effect, both gamma and normal random effects are considered for the hierarchical random effect. Apart from model formulation, we place emphasis on Bayesian estimation.
    Our proposed method has an advantage over the traditional analysis in that it (1) uses the appropriate distribution stipulated in the literature; (2) deals with the complete hierarchical nature of the data; and (3) uses all information instead of summary measures. The fit of the model to the comet assay data is compared against more conventional model fits. Results indicate the toxicity of 1,2-dimethylhydrazine dihydrochloride at different dose levels (low, medium, and high).
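    The overdispersion argument can be made concrete with the Poisson-gamma pairing implied by a conjugate gamma random effect. The Python sketch below is an illustration under assumed parameter values, not the authors' three-level comet-assay model: it multiplies a Poisson mean by a gamma effect with mean 1 and variance 1/shape, so the count variance becomes mean + mean**2 / shape instead of the Poisson value, mean. The function names are hypothetical; only the standard library is used, with Knuth's multiplication method standing in for a library Poisson sampler.

```python
import math
import random
import statistics

def poisson_draw(rng, lam):
    """Poisson(lam) draw by Knuth's multiplication method (fine for moderate lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def simulate_poisson_gamma(n, mean, shape, seed=None):
    """Counts y ~ Poisson(mean * u) with u ~ Gamma(shape, scale=1/shape).

    E[u] = 1 and Var(u) = 1/shape, so E[y] = mean and
    Var(y) = mean + mean**2 / shape, i.e. overdispersed relative to Poisson.
    """
    rng = random.Random(seed)
    return [poisson_draw(rng, mean * rng.gammavariate(shape, 1.0 / shape))
            for _ in range(n)]

counts = simulate_poisson_gamma(20000, mean=4.0, shape=2.0, seed=7)
# Theory for these settings: mean 4, variance 4 + 16/2 = 12 (a plain Poisson would give 4).
print(statistics.mean(counts), statistics.variance(counts))
```

    Marginally these counts are negative binomial; a model that assumes the Poisson mean-variance link would understate their spread, which is the first of the two reasons the abstract gives for extending the exponential family.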

  4. Triadic split-merge sampler

    NASA Astrophysics Data System (ADS)

    van Rossum, Anne C.; Lin, Hai Xiang; Dubbeldam, Johan; van der Herik, H. Jaap

    2018-04-01

    In machine vision, typical heuristic methods for extracting parameterized objects from raw data points are the Hough transform and RANSAC. Bayesian models promise to extract such parameterized objects optimally, given a correct definition of the model and of the type of noise at hand. Markov chain Monte Carlo (MCMC) methods are one category of solvers for Bayesian models. Naive implementations of MCMC methods converge slowly in machine vision because of the complexity of the parameter space. To address this, blocked Gibbs and split-merge samplers have been developed that assign multiple data points to clusters at once. In this paper we introduce a new split-merge sampler, the triadic split-merge sampler, which performs steps between two and three randomly chosen clusters. This has two advantages. First, it reduces the asymmetry between the split and merge steps. Second, it is able to propose a new cluster composed of data points from two different clusters. Both advantages speed up convergence, which we demonstrate on a line extraction problem. We show that the triadic split-merge sampler outperforms the conventional split-merge sampler. Although this new MCMC sampler is demonstrated in a machine vision context, its applications extend to the very general domain of statistical inference.

  5. Mitochondrial DNA Reveals Genetic Structuring of Pinna nobilis across the Mediterranean Sea

    PubMed Central

    Sanna, Daria; Cossu, Piero; Dedola, Gian Luca; Scarpa, Fabio; Maltagliati, Ferruccio; Castelli, Alberto; Franzoi, Piero; Lai, Tiziana; Cristo, Benedetto; Curini-Galletti, Marco; Francalacci, Paolo; Casu, Marco

    2013-01-01

    Pinna nobilis is the largest endemic Mediterranean marine bivalve. During past centuries, various human activities have promoted the regression of its populations. As a consequence of stringent standards of protection, demographic expansions are currently reported in many sites. The aim of this study was to provide the first broad-scale insight into the genetic variability of P. nobilis in the area that encompasses the western Mediterranean, Ionian Sea, and Adriatic Sea marine ecoregions. To accomplish this objective, twenty-five populations from this area were surveyed using two mitochondrial DNA markers (COI and 16S). Our dataset was then merged with those obtained in other studies for the Aegean and Tunisian populations (eastern Mediterranean), and statistical analyses (Bayesian model-based clustering, median-joining network, AMOVA, mismatch distribution, Tajima’s and Fu’s neutrality tests and Bayesian skyline plots) were performed. The results revealed genetic divergence among three distinguishable areas: (1) western Mediterranean and Ionian Sea; (2) Adriatic Sea; and (3) Aegean Sea and Tunisian coastal areas. From a conservation point of view, populations from the three genetically divergent groups may be considered as different management units. PMID:23840684

  6. Analysis of the quality of image data required by the LANDSAT-4 Thematic Mapper and Multispectral Scanner. [agricultural and forest cover types in California]

    NASA Technical Reports Server (NTRS)

    Colwell, R. N. (Principal Investigator)

    1984-01-01

    The spatial, geometric, and radiometric qualities of LANDSAT 4 thematic mapper (TM) and multispectral scanner (MSS) data were evaluated by interpreting, through visual and computer means, film and digital products for selected agricultural and forest cover types in California. Multispectral analyses employing Bayesian maximum likelihood, discrete relaxation, and unsupervised clustering algorithms were used to compare the usefulness of TM and MSS data for discriminating individual cover types. Some of the significant results are as follows: (1) for maximizing the interpretability of agricultural and forest resources, TM color composites should contain spectral bands in the visible, near-reflectance infrared, and middle-reflectance infrared regions, namely TM 4 and TM 5, and must contain TM 4 in all cases even at the expense of excluding TM 5; (2) using enlarged TM film products, planimetric accuracy of mapped points was within 91 meters (RMSE east) and 117 meters (RMSE north); (3) using TM digital products, planimetric accuracy of mapped points was within 12.0 meters (RMSE east) and 13.7 meters (RMSE north); and (4) applying a contextual classification algorithm to TM data provided classification accuracies competitive with Bayesian maximum likelihood.

  7. Introduction to Bayesian statistical approaches to compositional analyses of transgenic crops 1. Model validation and setting the stage.

    PubMed

    Harrison, Jay M; Breeze, Matthew L; Harrigan, George G

    2011-08-01

    Statistical comparisons of compositional data generated on genetically modified (GM) crops and their near-isogenic conventional (non-GM) counterparts typically rely on classical significance testing. This manuscript presents an introduction to Bayesian methods for compositional analysis along with recommendations for model validation. The approach is illustrated using protein and fat data from two herbicide-tolerant GM soybeans (MON87708 and MON87708×MON89788) and a conventional comparator grown in the US in 2008 and 2009. Guidelines recommended by the US Food and Drug Administration (FDA) in conducting Bayesian analyses of clinical studies on medical devices were followed. This study is the first Bayesian approach to GM and non-GM compositional comparisons. The evaluation presented here supports a conclusion that a Bayesian approach to analyzing compositional data can provide meaningful and interpretable results. We further describe the importance of method validation and approaches to model checking if Bayesian approaches to compositional data analysis are to be considered viable by scientists involved in GM research and regulation. Copyright © 2011 Elsevier Inc. All rights reserved.

  8. Developing Critical Thinking about Reporting of Bayesian Analyses

    ERIC Educational Resources Information Center

    Pullenayegum, Eleanor M.; Guo, Qing; Hopkins, Robert B.

    2012-01-01

    Graduate students in the health sciences who hope to become independent researchers must be able to write up their results at a standard suitable for submission to peer-reviewed journals. Bayesian analyses are still rare in the medical literature, and students are often unclear on what should be included in a manuscript. Whilst there are published…

  9. Bayesian just-so stories in psychology and neuroscience.

    PubMed

    Bowers, Jeffrey S; Davis, Colin J

    2012-05-01

    According to Bayesian theories in psychology and neuroscience, minds and brains are (near) optimal in solving a wide range of tasks. We challenge this view and argue that more traditional, non-Bayesian approaches are more promising. We make 3 main arguments. First, we show that the empirical evidence for Bayesian theories in psychology is weak. This weakness relates to the many arbitrary ways that priors, likelihoods, and utility functions can be altered in order to account for the data that are obtained, making the models unfalsifiable. It further relates to the fact that Bayesian theories are rarely better at predicting data compared with alternative (and simpler) non-Bayesian theories. Second, we show that the empirical evidence for Bayesian theories in neuroscience is weaker still. There are impressive mathematical analyses showing how populations of neurons could compute in a Bayesian manner but little or no evidence that they do. Third, we challenge the general scientific approach that characterizes Bayesian theorizing in cognitive science. A common premise is that theories in psychology should largely be constrained by a rational analysis of what the mind ought to do. We question this claim and argue that many of the important constraints come from biological, evolutionary, and processing (algorithmic) considerations that have no adaptive relevance to the problem per se. In our view, these factors have contributed to the development of many Bayesian "just so" stories in psychology and neuroscience; that is, mathematical analyses of cognition that can be used to explain almost any behavior as optimal. © 2012 APA, all rights reserved.

  10. Identification of a current hot spot of HIV type 1 transmission in Mongolia by molecular epidemiological analysis.

    PubMed

    Davaalkham, Jagdagsuren; Unenchimeg, Puntsag; Baigalmaa, Chultem; Erdenetuya, Gombo; Nyamkhuu, Dulmaa; Shiino, Teiichiro; Tsuchiya, Kiyoto; Hayashida, Tsunefusa; Gatanaga, Hiroyuki; Oka, Shinichi

    2011-10-01

    We investigated the current molecular epidemiological status of HIV-1 in Mongolia, a country with very low incidence of HIV-1 though with rapid expansion in recent years. HIV-1 pol (1065 nt) and env (447 nt) genes were sequenced to construct phylogenetic trees. The evolutionary rates, molecular clock phylogenies, and other evolutionary parameters were estimated from heterochronous genomic sequences of HIV-1 subtype B by the Bayesian Markov chain Monte Carlo method. We obtained 41 sera from 56 reported HIV-1-positive cases as of May 2009. The main route of infection was men who have sex with men (MSM). Dominant subtypes were subtype B in 32 cases (78%) followed by subtype CRF02_AG (9.8%). The phylogenetic analysis of the pol gene identified two clusters in subtype B sequences. Cluster 1 consisted of 21 cases including MSM and other routes of infection, and cluster 2 consisted of eight MSM cases. The tree analyses demonstrated very short branch lengths in cluster 1, suggesting a surprisingly active expansion of HIV-1 transmission during a short period with the same ancestor virus. Evolutionary analysis indicated that the outbreak started around the early 2000s. This study identified a current hot spot of HIV-1 transmission and potential seed of the epidemic in Mongolia. Comprehensive preventive measures targeting this group are urgently needed.

  11. Deep divergence and structure in the Tropical Oceanic Pacific: a multilocus phylogeography of a widespread gekkonid lizard (Squamata: Gekkonidae: Gehyra oceanica)

    USGS Publications Warehouse

    Tonione, Maria A.; Fisher, Robert N.; Zhu, Catherine; Moritz, Craig

    2016-01-01

    Aim The islands of the Tropical Oceanic Pacific (TOP) host both local radiations and widespread, colonizing species. The few phylogeographical analyses of widespread species often point to recent human-aided expansions through the Pacific, suggesting that the communities are recently assembled. Here we apply multilocus data to infer biogeographical history of the gekkonid lizard, Gehyra oceanica, which is widespread, but for which prior analyses suggested a pre-human history and in situ diversification. Location Tropical Oceanic Pacific. Methods We generated a data set including mtDNA and diagnostic SNPs for 173 individuals of G. oceanica spanning Micronesia, Melanesia, and Polynesia. For a subset of these individuals, we also sequenced nuclear loci. From these data, we performed maximum likelihood and Bayesian inference to reveal major clades. We also performed Bayesian clustering analyses and coalescence–based species delimitation tests to infer the number of species in this area. Results We found evidence for six independent evolutionary lineages (candidate species) within G. oceanica that diverged between the Pliocene and the early Pleistocene, with high diversity through northern Melanesia, and pairing of northern Melanesian endemic taxa with widespread lineages across Micronesia and Polynesia. Main conclusions The islands of northern Melanesia not only have unrecognized diversity, but also were the source of independent expansions of lineages through the more remote northern and eastern Pacific. These results highlight the very different evolutionary histories of island faunas on remote archipelagos versus those across Melanesia and point to the need for more intensive studies of fauna within Melanesia if we are to understand the evolution of diversity across the tropical Pacific.

  12. Predictive distributions for between-study heterogeneity and simple methods for their application in Bayesian meta-analysis

    PubMed Central

    Turner, Rebecca M; Jackson, Dan; Wei, Yinghui; Thompson, Simon G; Higgins, Julian P T

    2015-01-01

    Numerous meta-analyses in healthcare research combine results from only a small number of studies, for which the variance representing between-study heterogeneity is estimated imprecisely. A Bayesian approach to estimation allows external evidence on the expected magnitude of heterogeneity to be incorporated. The aim of this paper is to provide tools that improve the accessibility of Bayesian meta-analysis. We present two methods for implementing Bayesian meta-analysis, using numerical integration and importance sampling techniques. Based on 14 886 binary outcome meta-analyses in the Cochrane Database of Systematic Reviews, we derive a novel set of predictive distributions for the degree of heterogeneity expected in 80 settings depending on the outcomes assessed and comparisons made. These can be used as prior distributions for heterogeneity in future meta-analyses. The two methods are implemented in R, for which code is provided. Both methods produce equivalent results to standard but more complex Markov chain Monte Carlo approaches. The priors are derived as log-normal distributions for the between-study variance, applicable to meta-analyses of binary outcomes on the log odds-ratio scale. The methods are applied to two example meta-analyses, incorporating the relevant predictive distributions as prior distributions for between-study heterogeneity. We have provided resources to facilitate Bayesian meta-analysis, in a form accessible to applied researchers, which allow relevant prior information on the degree of heterogeneity to be incorporated. © 2014 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. PMID:25475839

  13. Rising prevalence of non-B HIV-1 subtypes in North Carolina and evidence for local onward transmission.

    PubMed

    Dennis, Ann M; Hué, Stephane; Learner, Emily; Sebastian, Joseph; Miller, William C; Eron, Joseph J

    2017-01-01

    HIV-1 diversity is increasing in North American and European cohorts which may have public health implications. However, little is known about non-B subtype diversity in the southern United States, despite the region being the epicenter of the nation's epidemic. We characterized HIV-1 diversity and transmission clusters to identify the extent to which non-B strains are transmitted locally. We conducted cross-sectional analyses of HIV-1 partial pol sequences collected from 1997 to 2014 from adults accessing routine clinical care in North Carolina (NC). Subtypes were evaluated using COMET and phylogenetic analysis. Putative transmission clusters were identified using maximum-likelihood trees. Clusters involving non-B strains were confirmed and their dates of origin were estimated using Bayesian phylogenetics. Data were combined with demographic information collected at the time of sample collection and country of origin for a subset of patients. Among 24,972 sequences from 15,246 persons, the non-B subtype prevalence increased from 0% to 3.46% over the study period. Of 325 persons with non-B subtypes, diversity was high with over 15 pure subtypes and recombinants; subtype C (28.9%) and CRF02_AG (24.0%) were most common. While identification of transmission clusters was lower for persons with non-B versus B subtypes, several local transmission clusters (≥3 persons) involving non-B subtypes were identified and all were presumably due to heterosexual transmission. Prevalence of non-B subtype diversity remains low in NC but a statistically significant rise was identified over time which likely reflects multiple importations. However, the combined phylogenetic clustering analysis reveals evidence for local onward transmission. Detection of these non-B clusters suggests heterosexual transmission and may guide diagnostic and prevention interventions.

  14. PyClone: statistical inference of clonal population structure in cancer.

    PubMed

    Roth, Andrew; Khattra, Jaswinder; Yap, Damian; Wan, Adrian; Laks, Emma; Biele, Justina; Ha, Gavin; Aparicio, Samuel; Bouchard-Côté, Alexandre; Shah, Sohrab P

    2014-04-01

    We introduce PyClone, a statistical model for inference of clonal population structures in cancers. PyClone is a Bayesian clustering method for grouping sets of deeply sequenced somatic mutations into putative clonal clusters while estimating their cellular prevalences and accounting for allelic imbalances introduced by segmental copy-number changes and normal-cell contamination. Single-cell sequencing validation demonstrates PyClone's accuracy.

  15. A practical Bayesian stepped wedge design for community-based cluster-randomized clinical trials: The British Columbia Telehealth Trial.

    PubMed

    Cunanan, Kristen M; Carlin, Bradley P; Peterson, Kevin A

    2016-12-01

    Many clinical trial designs are impractical for community-based clinical intervention trials. Stepped wedge trial designs provide practical advantages, but few descriptions exist of their clinical implementational features, statistical design efficiencies, and limitations. The aim was to enhance the efficiency of stepped wedge trial designs by evaluating the impact of design characteristics on statistical power for the British Columbia Telehealth Trial. The British Columbia Telehealth Trial is a community-based, cluster-randomized, controlled clinical trial in rural and urban British Columbia. To determine the effect of an Internet-based telehealth intervention on healthcare utilization, 1000 subjects with an existing diagnosis of congestive heart failure or type 2 diabetes will be enrolled from 50 clinical practices. Hospital utilization is measured using a composite of disease-specific hospital admissions and emergency visits. The intervention comprises online telehealth data collection and counseling provided to support a disease-specific action plan developed by the primary care provider. The planned intervention is sequentially introduced across all participating practices. We adopt a fully Bayesian, Markov chain Monte Carlo-driven statistical approach, wherein we use simulation to determine the effect of cluster size, sample size, and crossover interval choice on type I error and power to evaluate differences in hospital utilization. For our Bayesian stepped wedge trial design, simulations suggest moderate decreases in power when crossover intervals from control to intervention are reduced from every 3 to 2 weeks, and dramatic decreases in power as the numbers of clusters decrease. Power and type I error performance were not notably affected by the addition of nonzero cluster effects or a temporal trend in hospitalization intensity.
Stepped wedge trial designs that intervene in small clusters across longer periods can provide enhanced power to evaluate comparative effectiveness, while offering practical implementation advantages in geographic stratification, temporal change, use of existing data, and resource distribution. Current population estimates were used; however, models may not reflect actual event rates during the trial. In addition, temporal or spatial heterogeneity can bias treatment effect estimates. © The Author(s) 2016.

  16. Bayesian techniques for analyzing group differences in the Iowa Gambling Task: A case study of intuitive and deliberate decision-makers.

    PubMed

    Steingroever, Helen; Pachur, Thorsten; Šmíra, Martin; Lee, Michael D

    2018-06-01

    The Iowa Gambling Task (IGT) is one of the most popular experimental paradigms for comparing complex decision-making across groups. Most commonly, IGT behavior is analyzed using frequentist tests to compare performance across groups, and to compare inferred parameters of cognitive models developed for the IGT. Here, we present a Bayesian alternative based on Bayesian repeated-measures ANOVA for comparing performance, and a suite of three complementary model-based methods for assessing the cognitive processes underlying IGT performance. The three model-based methods involve Bayesian hierarchical parameter estimation, Bayes factor model comparison, and Bayesian latent-mixture modeling. We illustrate these Bayesian methods by applying them to test the extent to which differences in intuitive versus deliberate decision style are associated with differences in IGT performance. The results show that intuitive and deliberate decision-makers behave similarly on the IGT, and the modeling analyses consistently suggest that both groups of decision-makers rely on similar cognitive processes. Our results challenge the notion that individual differences in intuitive and deliberate decision styles have a broad impact on decision-making. They also highlight the advantages of Bayesian methods, especially their ability to quantify evidence in favor of the null hypothesis, and that they allow model-based analyses to incorporate hierarchical and latent-mixture structures.

  17. Latent structure modeling underlying theophylline tablet formulations using a Bayesian network based on a self-organizing map clustering.

    PubMed

    Yasuda, Akihito; Onuki, Yoshinori; Obata, Yasuko; Takayama, Kozo

    2015-01-01

    The "quality by design" concept in pharmaceutical formulation development requires the establishment of a science-based rationale and design space. In this article, we integrate thin-plate spline (TPS) interpolation, Kohonen's self-organizing map (SOM) and a Bayesian network (BN) to visualize the latent structure underlying causal factors and pharmaceutical responses. As a model pharmaceutical product, theophylline tablets were prepared using a standard formulation. We measured the tensile strength and disintegration time as response variables and the compressibility, cohesion and dispersibility of the pretableting blend as latent variables. We predicted these variables quantitatively using nonlinear TPS, generated a large amount of data on pretableting blends and tablets and clustered these data into several clusters using a SOM. Our results show that we are able to predict the experimental values of the latent and response variables with a high degree of accuracy and are able to classify the tablet data into several distinct clusters. In addition, to visualize the latent structure between the causal and latent factors and the response variables, we applied a BN method to the SOM clustering results. We found that despite having inserted latent variables between the causal factors and response variables, their relation is equivalent to the results for the SOM clustering, and thus we are able to explain the underlying latent structure. Consequently, this technique provides a better understanding of the relationships between causal factors and pharmaceutical responses in theophylline tablet formulation.

  18. Quantitative comparison of alternative methods for coarse-graining biological networks

    PubMed Central

    Bowman, Gregory R.; Meng, Luming; Huang, Xuhui

    2013-01-01

    Markov models and master equations are a powerful means of modeling dynamic processes like protein conformational changes. However, these models are often difficult to understand because of the enormous number of components and connections between them. Therefore, a variety of methods have been developed to facilitate understanding by coarse-graining these complex models. Here, we employ Bayesian model comparison to determine which of these coarse-graining methods provides the models that are most faithful to the original set of states. We find that the Bayesian agglomerative clustering engine and the hierarchical Nyström expansion graph (HNEG) typically provide the best performance. Surprisingly, the original Perron cluster cluster analysis (PCCA) method often provides the next best results, outperforming the newer PCCA+ method and the most probable paths algorithm. We also show that the differences between the models are qualitatively significant, rather than being minor shifts in the boundaries between states. The performance of the methods correlates well with the entropy of the resulting coarse-grainings, suggesting that finding states with more similar populations (i.e., avoiding low population states that may just be noise) gives better results. PMID:24089717

  19. Recent Transmission Clustering of HIV-1 C and CRF17_BF Strains Characterized by NNRTI-Related Mutations among Newly Diagnosed Men in Central Italy

    PubMed Central

    Orchi, Nicoletta; Gori, Caterina; Bertoli, Ada; Forbici, Federica; Montella, Francesco; Pennica, Alfredo; De Carli, Gabriella; Giuliani, Massimo; Continenza, Fabio; Pinnetti, Carmela; Nicastri, Emanuele; Ceccherini-Silberstein, Francesca; Mastroianni, Claudio Maria; Girardi, Enrico; Andreoni, Massimo; Antinori, Andrea; Santoro, Maria Mercedes; Perno, Carlo Federico

    2015-01-01

    Background Increased evidence of relevant HIV-1 epidemic transmission in European countries is being reported, with an increased circulation of non-B-subtypes. Here, we present two recent HIV-1 non-B transmission clusters characterized by NNRTI-related amino-acidic mutations among newly diagnosed HIV-1 infected men, living in Rome (Central-Italy). Methods Pol and V3 sequences were available at the time of diagnosis for all individuals. Maximum-Likelihood and Bayesian phylogenetic-trees with bootstrap and Bayesian-probability supports defined transmission-clusters. HIV-1 drug-resistance and V3-tropism were also evaluated. Results Among 534 new HIV-1 non-B cases, diagnosed from 2011 to 2014, in Central-Italy, 35 carried virus gathering in two distinct clusters, including 27 HIV-1 C and 8 CRF17_BF subtypes, respectively. Both clusters were centralized in Rome, and their origin was estimated to have been after 2007. All individuals within both clusters were males and 37.1% of them had been recently-infected. While C-cluster was entirely composed by Italian men-who-have-sex-with-men, with a median-age of 34 years (IQR:30–39), individuals in CRF17_BF-cluster were older, with a median-age of 51 years (IQR:48–59) and almost all reported sexual-contacts with men and women. All carried R5-tropic viruses, with evidence of atypical or resistance amino-acidic mutations related to NNRTI-drugs (K103Q in C-cluster, and K101E+E138K in CRF17_BF-cluster). Conclusions These two epidemiological clusters provided evidence of a strong and recent circulation of C and CRF17_BF strains in central Italy, characterized by NNRTI-related mutations among men engaging in high-risk behaviours. These findings underline the role of molecular epidemiology in identifying groups at increased risk of HIV-1 transmission, and in enhancing additional prevention efforts. PMID:26270824

  20. Understanding the Scalability of Bayesian Network Inference using Clique Tree Growth Curves

    NASA Technical Reports Server (NTRS)

    Mengshoel, Ole Jakob

    2009-01-01

    Bayesian networks (BNs) are used to represent and efficiently compute with multivariate probability distributions in a wide range of disciplines. One of the main approaches to perform computation in BNs is clique tree clustering and propagation. In this approach, BN computation consists of propagation in a clique tree compiled from a Bayesian network. There is a lack of understanding of how clique tree computation time, and BN computation time more generally, depends on variations in BN size and structure. On the one hand, complexity results tell us that many interesting BN queries are NP-hard or worse to answer, and it is not hard to find application BNs where the clique tree approach in practice cannot be used. On the other hand, it is well-known that tree-structured BNs can be used to answer probabilistic queries in polynomial time. In this article, we develop an approach to characterizing clique tree growth as a function of parameters that can be computed in polynomial time from BNs, specifically: (i) the ratio of the number of a BN's non-root nodes to the number of root nodes, or (ii) the expected number of moral edges in their moral graphs. Our approach is based on combining analytical and experimental results. Analytically, we partition the set of cliques in a clique tree into different sets, and introduce a growth curve for each set. For the special case of bipartite BNs, we consequently have two growth curves, a mixed clique growth curve and a root clique growth curve. In experiments, we systematically increase the degree of the root nodes in bipartite Bayesian networks, and find that root clique growth is well-approximated by Gompertz growth curves.
It is believed that this research improves the understanding of the scaling behavior of clique tree clustering, provides a foundation for benchmarking and developing improved BN inference and machine learning algorithms, and presents an aid for analytical trade-off studies of clique tree clustering using growth curves.

  1. Developing a new Bayesian Risk Index for risk evaluation of soil contamination.

    PubMed

    Albuquerque, M T D; Gerassis, S; Sierra, C; Taboada, J; Martín, J E; Antunes, I M H R; Gallego, J R

    2017-12-15

    Industrial and agricultural activities heavily constrain soil quality. Potentially Toxic Elements (PTEs) are a threat to public health and the environment alike. In this regard, the identification of areas that require remediation is crucial. In this research, a geochemical dataset (230 samples) comprising 14 elements (Cu, Pb, Zn, Ag, Ni, Mn, Fe, As, Cd, V, Cr, Ti, Al and S) was gathered throughout eight different zones distinguished by their main activity, namely, recreational, agriculture/livestock and heavy industry in the Avilés Estuary (North of Spain). Then a stratified systematic sampling method was used at short, medium, and long distances from each zone to obtain a representative picture of the total variability of the selected attributes. The information was then combined in four risk classes (Low, Moderate, High, Remediation) following reference values from several sediment quality guidelines (SQGs). A Bayesian analysis, inferred for each zone, allowed the characterization of PTEs correlations, the unsupervised learning network technique proving to be the best fit. Based on the Bayesian network structure obtained, Pb, As and Mn were selected as key contamination parameters. For these 3 elements, the conditional probability obtained was allocated to each observed point, and a simple, direct index (Bayesian Risk Index-BRI) was constructed as a linear rating of the pre-defined risk classes weighted by the previously obtained probability. Finally, the BRI underwent geostatistical modeling. One hundred Sequential Gaussian Simulations (SGS) were computed. The Mean Image and the Standard Deviation maps were obtained, allowing the definition of High/Low risk clusters (Local G clustering) and the computation of spatial uncertainty. High-risk clusters are mainly distributed within the area with the highest altitude (agriculture/livestock) showing an associated low spatial uncertainty, clearly indicating the need for remediation.
Atmospheric emissions, mainly derived from the metallurgical industry, contribute to soil contamination by PTEs. Copyright © 2017 Elsevier B.V. All rights reserved.

  2. Species delimitation in the Stenocereus griseus (Cactaceae) species complex reveals a new species, S. huastecorum

    PubMed Central

    Alvarado-Sizzo, Hernán; Parra, Fabiola; Arreola-Nava, Hilda Julieta; Terrazas, Teresa; Sánchez, Cristian

    2018-01-01

    The Stenocereus griseus species complex (SGSC) has long been considered taxonomically challenging because the number of taxa belonging to the complex and their geographical boundaries remain poorly understood. Bayesian clustering and genetic distance-based methods were used based on nine microsatellite loci in 377 individuals of three main putative species of the complex. The resulting genetic clusters were assessed for ecological niche divergence and areolar morphology, particularly spination patterns. We based our species boundaries on concordance between genetic, ecological, and morphological data, and were able to resolve four species, three of them corresponding to S. pruinosus from central Mexico, S. laevigatus from southern Mexico, and S. griseus from northern South America. A fourth species, previously considered to be S. griseus and commonly misidentified as S. pruinosus in northern Mexico showed significant genetic, ecological, and morphological differentiation suggesting that it should be considered a new species, S. huastecorum, which we describe here. We show that population genetic analyses, ecological niche modeling, and morphological studies are complementary approaches for delimiting species in taxonomically challenging plant groups such as the SGSC. PMID:29342184

  3. Species delimitation in the Stenocereus griseus (Cactaceae) species complex reveals a new species, S. huastecorum.

    PubMed

    Alvarado-Sizzo, Hernán; Casas, Alejandro; Parra, Fabiola; Arreola-Nava, Hilda Julieta; Terrazas, Teresa; Sánchez, Cristian

    2018-01-01

    The Stenocereus griseus species complex (SGSC) has long been considered taxonomically challenging because the number of taxa belonging to the complex and their geographical boundaries remain poorly understood. Bayesian clustering and genetic distance-based methods were used based on nine microsatellite loci in 377 individuals of three main putative species of the complex. The resulting genetic clusters were assessed for ecological niche divergence and areolar morphology, particularly spination patterns. We based our species boundaries on concordance between genetic, ecological, and morphological data, and were able to resolve four species, three of them corresponding to S. pruinosus from central Mexico, S. laevigatus from southern Mexico, and S. griseus from northern South America. A fourth species, previously considered to be S. griseus and commonly misidentified as S. pruinosus in northern Mexico showed significant genetic, ecological, and morphological differentiation suggesting that it should be considered a new species, S. huastecorum, which we describe here. We show that population genetic analyses, ecological niche modeling, and morphological studies are complementary approaches for delimiting species in taxonomically challenging plant groups such as the SGSC.

  4. Patterns of population structure and environmental associations to aridity across the range of loblolly pine (Pinus taeda L., Pinaceae).

    PubMed

    Eckert, Andrew J; van Heerwaarden, Joost; Wegrzyn, Jill L; Nelson, C Dana; Ross-Ibarra, Jeffrey; González-Martínez, Santíago C; Neale, David B

    2010-07-01

    Natural populations of forest trees exhibit striking phenotypic adaptations to diverse environmental gradients, thereby making them appealing subjects for the study of genes underlying ecologically relevant phenotypes. Here, we use a genome-wide data set of single nucleotide polymorphisms genotyped across 3059 functional genes to study patterns of population structure and identify loci associated with aridity across the natural range of loblolly pine (Pinus taeda L.). Overall patterns of population structure, as inferred using principal components and Bayesian cluster analyses, were consistent with three genetic clusters likely resulting from expansions out of Pleistocene refugia located in Mexico and Florida. A novel application of association analysis, which removes the confounding effects of shared ancestry on correlations between genetic and environmental variation, identified five loci correlated with aridity. These loci were primarily involved with abiotic stress response to temperature and drought. A unique set of 24 loci was identified as F(ST) outliers on the basis of the genetic clusters identified previously and after accounting for expansions out of Pleistocene refugia. These loci were involved with a diversity of physiological processes. Identification of nonoverlapping sets of loci highlights the fundamental differences implicit in the use of either method and suggests a pluralistic, yet complementary, approach to the identification of genes underlying ecologically relevant phenotypes.

  5. Efficient Matrix Models for Relational Learning

    DTIC Science & Technology

    2009-10-01

    ...4.5.3 Comparison to pLSI-pHITS ... 5 Hierarchical Bayesian Collective...Behaviour of Newton vs. Stochastic Newton on a three-factor model. 4.5.3 Comparison to pLSI-pHITS. Caveat: Collective Matrix Factorization makes no guarantees...leads to better results; and another where a co-clustering model, pLSI-pHITS, has the advantage. pLSI-pHITS [24] is a relational clustering technique...

  6. Hierarchical imputation of systematically and sporadically missing data: An approximate Bayesian approach using chained equations.

    PubMed

    Jolani, Shahab

    2018-03-01

    In health and medical sciences, multiple imputation (MI) is now becoming popular to obtain valid inferences in the presence of missing data. However, MI of clustered data such as multicenter studies and individual participant data meta-analysis requires advanced imputation routines that preserve the hierarchical structure of data. In clustered data, a specific challenge is the presence of systematically missing data, when a variable is completely missing in some clusters, and sporadically missing data, when it is partly missing in some clusters. Unfortunately, little is known about how to perform MI when both types of missing data occur simultaneously. We develop a new class of hierarchical imputation approaches based on chained equations methodology that simultaneously imputes systematically and sporadically missing data while allowing for arbitrary patterns of missingness among them. Here, we use a random effect imputation model and adopt a simplification over fully Bayesian techniques such as Gibbs sampler to directly obtain draws of parameters within each step of the chained equations. We justify through theoretical arguments and extensive simulation studies that the proposed imputation methodology has good statistical properties in terms of bias and coverage rates of parameter estimates. An illustration is given in a case study with eight individual participant datasets. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  7. Molecular epidemiology of hepatitis B virus in Misiones, Argentina.

    PubMed

    Mojsiejczuk, Laura Noelia; Torres, Carolina; Sevic, Ina; Badano, Inés; Malan, Richard; Flichman, Diego Martin; Liotta, Domingo Javier; Campos, Rodolfo Hector

    2016-10-01

    Hepatitis B virus (HBV) infection is a major public health problem worldwide. The aims of this study were to describe the molecular epidemiology of HBV in the Province of Misiones, Argentina, and to estimate the phylodynamics of the main groups in a Bayesian coalescent framework. To this end, partial or complete genome sequences were obtained from 52 blood donor candidates. The phylogenetic analysis based on partial sequences of the S/P region showed a predominance of genotype D (65.4%), followed by genotype F (30.8%), with genotype A as a minority (3.8%). At the subgenotype level, the circulation of subgenotypes D3 (42.3%), D2 (13.5%), F1b (11.5%) and F4 (9.6%) was mainly identified. The Bayesian coalescent analysis of 29 complete genome sequences for the main groups revealed that subgenotypes D2 and D3 had several introductions to the region, with ancestors dating from 1921 to 1969 and diversification events until the late 1970s. Genotype F in Misiones has a more recent history; subgenotype F4 isolates were intermixed with sequences from Argentina and neighboring countries, and only one significant cluster, dating back to 1994, was observed. Subgenotype F1b isolates exhibited low genetic distance and formed a closely related monophyletic cluster, suggesting a very recent introduction. In conclusion, the phylogenetic and coalescent analyses showed that the European genotype D circulates more widely, has a longer history of diversification and may be responsible for the largest proportion of chronic HBV infections in the Province of Misiones. Genotype F, especially subgenotype F1b, had a more recent introduction, and its diversification in the last 20 years might be related to its involvement in new transmission events. Copyright © 2016 Elsevier B.V. All rights reserved.

  8. Alternative glacial-interglacial refugia demographic hypotheses tested on Cephalocereus columna-trajani (Cactaceae) in the intertropical Mexican drylands

    PubMed Central

    Cornejo-Romero, Amelia; Aguilar-Martínez, Gustavo F.; Medina-Sánchez, Javier; Rendón-Aguilar, Beatriz; Valverde, Pedro Luis; Zavala-Hurtado, Jose Alejandro; Serrato, Alejandra; Rivas-Arancibia, Sombra; Pérez-Hernández, Marco Aurelio; López-Ortega, Gerardo; Jiménez-Sierra, Cecilia

    2017-01-01

    Historical demographic changes in plant species adapted to New World arid environments could be consistent with either the Glacial Refugium Hypothesis (GRH), which posits that populations contracted to refuges during cold-dry glacial periods and expanded in warm-humid interglacial periods, or with the Interglacial Refugium Hypothesis (IRH), which suggests that populations contracted during interglacials and expanded in glacial times. These contrasting hypotheses are tested in the present study for the giant columnar cactus Cephalocereus columna-trajani in the intertropical Mexican drylands, where the effects of Late Quaternary climatic changes on the phylogeography of cacti remain largely unknown. To determine whether the historical demography and phylogeographic structure of the species are consistent with either hypothesis, sequences of the chloroplast regions psbA-trnH and trnT-trnL from 110 individuals from 10 populations comprising the full distribution range of this species were analysed. Standard estimators of genetic diversity and structure were calculated. The historical demography was analysed using a Bayesian approach, and the palaeodistribution was derived from ecological niche modelling to determine whether, in the arid environments of south-central Mexico, glacial-interglacial cycles drove the genetic divergence and diversification of this species. Results reveal low but statistically significant population differentiation (FST = 0.124, P < 0.001), although clear geographic clusters are not formed. Genetic diversity, haplotype network and Approximate Bayesian Computation (ABC) demographic analyses suggest a population expansion estimated to have taken place in the Last Interglacial (123.04 kya, 95% CI 115.3–130.03). The species palaeodistribution is consistent with the ABC analyses and indicates that the potential area of palaeodistribution and climatic suitability were larger during the Last Interglacial and Holocene than in the Last Glacial Maximum. 
Overall, these results suggest that C. columna-trajani experienced an expansion following the warm conditions of interglacials, in accordance with the GRH. PMID:28426818

  9. Alternative glacial-interglacial refugia demographic hypotheses tested on Cephalocereus columna-trajani (Cactaceae) in the intertropical Mexican drylands.

    PubMed

    Cornejo-Romero, Amelia; Vargas-Mendoza, Carlos Fabián; Aguilar-Martínez, Gustavo F; Medina-Sánchez, Javier; Rendón-Aguilar, Beatriz; Valverde, Pedro Luis; Zavala-Hurtado, Jose Alejandro; Serrato, Alejandra; Rivas-Arancibia, Sombra; Pérez-Hernández, Marco Aurelio; López-Ortega, Gerardo; Jiménez-Sierra, Cecilia

    2017-01-01

    Historical demographic changes in plant species adapted to New World arid environments could be consistent with either the Glacial Refugium Hypothesis (GRH), which posits that populations contracted to refuges during cold-dry glacial periods and expanded in warm-humid interglacial periods, or with the Interglacial Refugium Hypothesis (IRH), which suggests that populations contracted during interglacials and expanded in glacial times. These contrasting hypotheses are tested in the present study for the giant columnar cactus Cephalocereus columna-trajani in the intertropical Mexican drylands, where the effects of Late Quaternary climatic changes on the phylogeography of cacti remain largely unknown. To determine whether the historical demography and phylogeographic structure of the species are consistent with either hypothesis, sequences of the chloroplast regions psbA-trnH and trnT-trnL from 110 individuals from 10 populations comprising the full distribution range of this species were analysed. Standard estimators of genetic diversity and structure were calculated. The historical demography was analysed using a Bayesian approach, and the palaeodistribution was derived from ecological niche modelling to determine whether, in the arid environments of south-central Mexico, glacial-interglacial cycles drove the genetic divergence and diversification of this species. Results reveal low but statistically significant population differentiation (FST = 0.124, P < 0.001), although clear geographic clusters are not formed. Genetic diversity, haplotype network and Approximate Bayesian Computation (ABC) demographic analyses suggest a population expansion estimated to have taken place in the Last Interglacial (123.04 kya, 95% CI 115.3–130.03). The species palaeodistribution is consistent with the ABC analyses and indicates that the potential area of palaeodistribution and climatic suitability were larger during the Last Interglacial and Holocene than in the Last Glacial Maximum. 
Overall, these results suggest that C. columna-trajani experienced an expansion following the warm conditions of interglacials, in accordance with the GRH.

  10. Molecular phylogeny of the aquatic beetle family Noteridae (Coleoptera: Adephaga) with an emphasis on data partitioning strategies.

    PubMed

    Baca, Stephen M; Toussaint, Emmanuel F A; Miller, Kelly B; Short, Andrew E Z

    2017-02-01

    The first molecular phylogenetic hypothesis for the aquatic beetle family Noteridae is inferred using DNA sequence data from five gene fragments (mitochondrial and nuclear): COI, H3, 16S, 18S, and 28S. Our analysis is the most comprehensive phylogenetic reconstruction of Noteridae to date, and includes 53 species representing all subfamilies, tribes and 16 of the 17 genera within the family. We examine the impact of data partitioning on phylogenetic inference by comparing two different algorithm-based partitioning strategies: one using predefined subsets of the dataset, and another, recently introduced method, which uses the k-means algorithm to iteratively divide the dataset into clusters of sites evolving at similar rates across sampled loci. We conducted both maximum likelihood and Bayesian inference analyses using these different partitioning schemes. Resulting trees are strongly incongruent with prior classifications of Noteridae. Tree topologies and support values vary among the implemented partitioning schemes. Bayes factors calculated with marginal likelihoods of Bayesian analyses support a priori partitioning over k-means and unpartitioned data strategies. Our study substantiates the importance of data partitioning in phylogenetic inference, and underscores the use of comparative analyses to determine optimal analytical strategies. Our analyses recover Noterini Thomson as paraphyletic with respect to three other tribes. The genera Suphisellus Crotch and Hydrocanthus Say are also recovered as paraphyletic. Following the results of the preferred partitioning scheme, we here propose a revised classification of Noteridae, comprising two subfamilies, three tribes and 18 genera. The following taxonomic changes are made: Notomicrinae sensu n. (= Phreatodytinae syn. n.) is expanded to include the tribe Phreatodytini; Noterini sensu n. (= Neohydrocoptini syn. n., Pronoterini syn. n., Tonerini syn. n.) 
is expanded to include all genera of the Noterinae; the genus Suphisellus Crotch is expanded to include species of Pronoterus Sharp syn. n.; and the former subgenus Sternocanthus Guignot stat. rev. is resurrected from synonymy and elevated to genus rank. Copyright © 2016 Elsevier Inc. All rights reserved.
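    The k-means partitioning idea can be illustrated in its simplest form: given per-site rate estimates (assumed here to have been obtained beforehand, for example from an initial tree), Lloyd's algorithm groups sites into k rate classes. This is a hypothetical one-dimensional sketch; `kmeans_1d` and its deterministic initialisation are illustrative choices, not the implementation used in the study.

```python
def kmeans_1d(values, k, iters=100):
    """Group scalar values (e.g., per-site rate estimates) into k clusters
    with Lloyd's algorithm. Returns (labels, centroids)."""
    if not 1 <= k <= len(values):
        raise ValueError("need 1 <= k <= len(values)")
    # Deterministic initialisation: spread centroids across the sorted values.
    s = sorted(values)
    centroids = [s[(i * (len(s) - 1)) // max(k - 1, 1)] for i in range(k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each value goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        if new_labels == labels:
            break  # assignments stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return labels, centroids
```

    Each resulting cluster of sites would then be treated as one data subset with its own substitution model parameters.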

  11. Bayesian Nonparametric Inference – Why and How

    PubMed Central

    Müller, Peter; Mitra, Riten

    2013-01-01

    We review inference under models with nonparametric Bayesian (BNP) priors. The discussion follows a set of examples for some common inference problems. The examples are chosen to highlight problems that are challenging for standard parametric inference. We discuss inference for density estimation, clustering, regression and for mixed effects models with random effects distributions. While we focus on arguing for the need for the flexibility of BNP models, we also review some of the more commonly used BNP models, thus hopefully answering a bit of both questions, why and how to use BNP. PMID:24368932
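    Clustering is one of the settings where BNP priors help, because the number of clusters need not be fixed in advance. A minimal sketch, assuming a Dirichlet process prior with concentration parameter `alpha`, draws a random partition from the induced Chinese restaurant process; the function name and parameters are hypothetical, and the code is not taken from the review.

```python
import random


def crp_partition(n, alpha, seed=None):
    """Draw cluster labels for n items from a Chinese restaurant process.

    With i items already assigned, the next item joins existing cluster k
    with probability n_k / (i + alpha) and opens a new cluster with
    probability alpha / (i + alpha).
    """
    if n < 0 or alpha <= 0:
        raise ValueError("need n >= 0 and alpha > 0")
    rng = random.Random(seed)
    labels = []
    counts = []  # counts[k] = number of items currently in cluster k
    for _ in range(n):
        # Unnormalised weights: existing clusters, then a new cluster.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts
```

    Larger `alpha` tends to produce more clusters; under this prior the expected number of clusters grows roughly logarithmically in n. In a full DP mixture model these prior draws would be combined with a likelihood and the partition sampled from its posterior.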

  12. Emergence and Continuous Evolution of Genotype 1E Rubella Viruses in China

    PubMed Central

    Zhu, Zhen; Cui, Aili; Wang, Huanhuan; Zhang, Yan; Liu, Chunyu; Wang, Changyin; Zhou, Shujie; Chen, Xia; Zhang, Zhenying; Feng, Daxin; Wang, Yan; Chen, Haiyun; Pan, Zhengfan; Zeng, Xiangjie; Zhou, Jianhui; Wang, Shuang; Chang, Xin; Lei, Yue; Tian, Hong; Liu, Yang; Zhou, Shunde; Zhan, Jun; Chen, Hui; Gu, Suyi; Tian, Xiaoling; Liu, Jianfeng; Chen, Ying; Fu, Hong; Yang, Xiuhui; Zheng, Huanying; Liu, Leng; Zheng, Lei; Gao, Hui; He, Jilan; Sun, Li

    2012-01-01

    In China, rubella vaccination was introduced into the national immunization program in 2008, and a rubella epidemic occurred in the same year. To determine whether the genotypic distribution of rubella viruses has changed in the postvaccination era, we investigated in detail the epidemiological profile of rubella in China and estimated the evolutionary rate, molecular clock phylogeny, and demographic history of the predominant rubella virus genotypes circulating in China using Bayesian Markov chain Monte Carlo phylodynamic analyses. Genotype 1E has been the predominant rubella virus genotype since its initial isolation in China in 2001, and no genotypic shift has occurred since then. The results suggest that the global 1E genotype may have diverged in 1995 and that it has evolved at a mutation rate of 1.65 × 10−3 per site per year. The Chinese 1E rubella virus isolates were grouped into either cluster 1 or cluster 2, which likely originated in 1997 and 2006, respectively. Cluster 1 viruses were found in all provinces examined in this study and had a mutation rate of 1.90 × 10−3 per site per year. The effective number of infections remained constant until 2007. Coinciding with the introduction of rubella vaccine into the national immunization program, some viral lineages disappeared and the epidemic began a decline that led to a decrease in the effective population size, although the circulation of cluster 1 viruses has not been interrupted. Cluster 2 viruses were found only in Hainan Province, likely because of importation. PMID:22162559

  13. Emergence and continuous evolution of genotype 1E rubella viruses in China.

    PubMed

    Zhu, Zhen; Cui, Aili; Wang, Huanhuan; Zhang, Yan; Liu, Chunyu; Wang, Changyin; Zhou, Shujie; Chen, Xia; Zhang, Zhenying; Feng, Daxin; Wang, Yan; Chen, Haiyun; Pan, Zhengfan; Zeng, Xiangjie; Zhou, Jianhui; Wang, Shuang; Chang, Xin; Lei, Yue; Tian, Hong; Liu, Yang; Zhou, Shunde; Zhan, Jun; Chen, Hui; Gu, Suyi; Tian, Xiaoling; Liu, Jianfeng; Chen, Ying; Fu, Hong; Yang, Xiuhui; Zheng, Huanying; Liu, Leng; Zheng, Lei; Gao, Hui; He, Jilan; Sun, Li; Xu, Wenbo

    2012-02-01

    In China, rubella vaccination was introduced into the national immunization program in 2008, and a rubella epidemic occurred in the same year. To determine whether the genotypic distribution of rubella viruses has changed in the postvaccination era, we investigated in detail the epidemiological profile of rubella in China and estimated the evolutionary rate, molecular clock phylogeny, and demographic history of the predominant rubella virus genotypes circulating in China using Bayesian Markov chain Monte Carlo phylodynamic analyses. Genotype 1E has been the predominant rubella virus genotype since its initial isolation in China in 2001, and no genotypic shift has occurred since then. The results suggest that the global 1E genotype may have diverged in 1995 and that it has evolved at a mutation rate of 1.65 × 10−3 per site per year. The Chinese 1E rubella virus isolates were grouped into either cluster 1 or cluster 2, which likely originated in 1997 and 2006, respectively. Cluster 1 viruses were found in all provinces examined in this study and had a mutation rate of 1.90 × 10−3 per site per year. The effective number of infections remained constant until 2007. Coinciding with the introduction of rubella vaccine into the national immunization program, some viral lineages disappeared and the epidemic began a decline that led to a decrease in the effective population size, although the circulation of cluster 1 viruses has not been interrupted. Cluster 2 viruses were found only in Hainan Province, likely because of importation.

  14. Bayesian correction for covariate measurement error: A frequentist evaluation and comparison with regression calibration.

    PubMed

    Bartlett, Jonathan W; Keogh, Ruth H

    2018-06-01

    Bayesian approaches for handling covariate measurement error are well established and yet arguably are still relatively little used by researchers. For some this is likely due to unfamiliarity or disagreement with the Bayesian inferential paradigm. For others a contributory factor is the inability of standard statistical packages to perform such Bayesian analyses. In this paper, we first give an overview of the Bayesian approach to handling covariate measurement error, and contrast it with regression calibration, arguably the most commonly adopted approach. We then argue why the Bayesian approach has a number of statistical advantages compared to regression calibration and demonstrate that implementing the Bayesian approach is usually quite feasible for the analyst. Next, we describe the closely related maximum likelihood and multiple imputation approaches and explain why we believe the Bayesian approach to generally be preferable. We then empirically compare the frequentist properties of regression calibration and the Bayesian approach through simulation studies. The flexibility of the Bayesian approach to handle both measurement error and missing data is then illustrated through an analysis of data from the Third National Health and Nutrition Examination Survey.
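    To make the contrast concrete, a minimal sketch of regression calibration for one error-prone covariate in a linear outcome model, assuming classical additive error (W_j = X + U_j, with U_j independent of X) and two replicate measurements per subject. All names are hypothetical and this is not the paper's code; the Bayesian alternative would instead specify models for X and the error and sample the joint posterior.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def var(xs):
    """Sample variance (n - 1 denominator)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def ols_slope(x, y):
    """Least-squares slope of y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def regression_calibration_slope(w1, w2, y):
    """Return (naive_slope, calibrated_slope) for y regressed on an
    error-prone covariate measured twice: W_j = X + U_j."""
    wbar = [(a + b) / 2 for a, b in zip(w1, w2)]
    # Var(W1 - W2) = 2 Var(U), so replicate differences estimate Var(U).
    var_u = var([a - b for a, b in zip(w1, w2)]) / 2
    # Var(Wbar) = Var(X) + Var(U) / 2; lam is the reliability of Wbar.
    lam = (var(wbar) - var_u / 2) / var(wbar)
    if lam <= 0:
        raise ValueError("estimated error variance exceeds observed variance")
    mu = mean(wbar)
    # Best linear predictor of X given Wbar, used in place of the unobserved X.
    x_hat = [mu + lam * (w - mu) for w in wbar]
    return ols_slope(wbar, y), ols_slope(x_hat, y)


if __name__ == "__main__":
    # Simulated illustration only: true slope 2, Var(X) = 1, Var(U) = 1.
    rng = random.Random(42)
    n = 5000
    x = [rng.gauss(0, 1) for _ in range(n)]
    w1 = [xi + rng.gauss(0, 1) for xi in x]
    w2 = [xi + rng.gauss(0, 1) for xi in x]
    y = [2.0 * xi + rng.gauss(0, 0.5) for xi in x]
    naive, calibrated = regression_calibration_slope(w1, w2, y)
    print(f"naive slope {naive:.3f}, calibrated slope {calibrated:.3f}")
```

    In this simulated setup the reliability of the averaged measurement is 1 / 1.5, so the naive slope should fall near 2 × (1 / 1.5) ≈ 1.33, while the calibrated slope should land near the true value of 2.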

  15. CLUSTERnGO: a user-defined modelling platform for two-stage clustering of time-series data.

    PubMed

    Fidaner, Işık Barış; Cankorur-Cetinkaya, Ayca; Dikicioglu, Duygu; Kirdar, Betul; Cemgil, Ali Taylan; Oliver, Stephen G

    2016-02-01

    Simple bioinformatic tools are frequently used to analyse time-series datasets regardless of their ability to deal with transient phenomena, limiting the meaningful information that may be extracted from them. This situation requires the development and exploitation of tailor-made, easy-to-use and flexible tools designed specifically for the analysis of time-series datasets. We present a novel statistical application called CLUSTERnGO, which uses a model-based clustering algorithm that fulfils this need. This algorithm involves two components of operation: Component 1 constructs a Bayesian non-parametric model (Infinite Mixture of Piecewise Linear Sequences), and Component 2 applies a novel clustering methodology (Two-Stage Clustering). The software can also assign biological meaning to the identified clusters using an appropriate ontology. It applies multiple hypothesis testing to report the significance of these enrichments. The algorithm has a four-phase pipeline. The application can be executed using either command-line tools or a user-friendly Graphical User Interface. The latter has been developed to address the needs of both specialist and non-specialist users. We use three diverse test cases to demonstrate the flexibility of the proposed strategy. In all cases, CLUSTERnGO not only outperformed existing algorithms in assigning unique GO term enrichments to the identified clusters, but also revealed novel insights regarding the biological systems examined, which were not uncovered in the original publications. The C++ and Qt source code, the GUI applications for Windows, OS X and Linux operating systems and the user manual are freely available for download under the GNU GPL v3 license at http://www.cmpe.boun.edu.tr/content/CnG. Contact: sgo24@cam.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  16. Assessment of phylogenetic sensitivity for reconstructing HIV-1 epidemiological relationships.

    PubMed

    Beloukas, Apostolos; Magiorkinis, Emmanouil; Magiorkinis, Gkikas; Zavitsanou, Asimina; Karamitros, Timokratis; Hatzakis, Angelos; Paraskevis, Dimitrios

    2012-06-01

    Phylogenetic analysis has been extensively used as a tool for the reconstruction of epidemiological relations for research or for forensic purposes. Our objective was to assess the sensitivity of different phylogenetic methods and various phylogenetic programs in reconstructing epidemiological links among HIV-1-infected patients, that is, the probability of revealing a true transmission relationship. Ninety datasets were prepared, consisting of HIV-1 sequences in protease (PR) and partial reverse transcriptase (RT) sampled from patients with a documented epidemiological relationship (target population) and from unrelated individuals (control population) belonging to the same HIV-1 subtype as the target population. The datasets varied in the number, the geographic origin and the transmission risk groups of the sequences in the control population. Phylogenetic trees were inferred by neighbor-joining (NJ), maximum likelihood heuristics (hML) and Bayesian methods. All clusters of sequences belonging to the target population were correctly reconstructed by NJ and Bayesian methods, receiving high bootstrap and posterior probability (PP) support, respectively. On the other hand, TreePuzzle failed to reconstruct or provide significant support for several clusters; high puzzling step support was associated with the inclusion of control sequences from the same geographic area as the target population. In contrast, all clusters were correctly reconstructed by hML as implemented in PhyML 3.0, receiving high bootstrap support. We report that under the conditions of our study, hML using PhyML, NJ and Bayesian methods were the most sensitive for the reconstruction of epidemiological links, mostly among sexually infected individuals. Copyright © 2012 Elsevier B.V. All rights reserved.

  17. Combining Computational Methods for Hit to Lead Optimization in Mycobacterium tuberculosis Drug Discovery

    PubMed Central

    Ekins, Sean; Freundlich, Joel S.; Hobrath, Judith V.; White, E. Lucile; Reynolds, Robert C

    2013-01-01

    Purpose: Tuberculosis treatments need to be shorter and to overcome drug resistance. Our previous large-scale phenotypic high-throughput screening against Mycobacterium tuberculosis (Mtb) identified 737 active compounds and thousands that are inactive. We have used these data to build computational models as an approach to minimize the number of compounds tested. Methods: A cheminformatics clustering approach followed by Bayesian machine learning models (based on publicly available Mtb screening data) was used to illustrate that applying these models to screening set selection can enrich the hit rate. Results: To explore chemical diversity around active cluster scaffolds of the dose-response hits obtained from our previous Mtb screens, a set of 1924 commercially available molecules was selected and evaluated for antitubercular activity and cytotoxicity using Vero, THP-1 and HepG2 cell lines, with 4.3%, 4.2% and 2.7% hit rates, respectively. We demonstrate that models incorporating antitubercular and cytotoxicity data in Vero cells can significantly enrich the selection of non-toxic actives compared to random selection. Across all cell lines, the Molecular Libraries Small Molecule Repository (MLSMR) and cytotoxicity models identified ~10% of the hits in the top 1% screened (>10-fold enrichment). We also showed that seven out of nine Mtb active compounds from different academic published studies and eight out of eleven Mtb active compounds from a pharmaceutical screen (GSK) would have been identified by these Bayesian models. Conclusion: Combining clustering and Bayesian models represents a useful strategy for compound prioritization and hit-to-lead optimization of antitubercular agents. PMID:24132686

  18. Bayesian molecular dating: opening up the black box.

    PubMed

    Bromham, Lindell; Duchêne, Sebastián; Hua, Xia; Ritchie, Andrew M; Duchêne, David A; Ho, Simon Y W

    2018-05-01

    Molecular dating analyses allow evolutionary timescales to be estimated from genetic data, offering an unprecedented capacity for investigating the evolutionary past of all species. These methods require us to make assumptions about the relationship between genetic change and evolutionary time, often referred to as a 'molecular clock'. Although initially regarded with scepticism, molecular dating has now been adopted in many areas of biology. This broad uptake has been due partly to the development of Bayesian methods that allow complex aspects of molecular evolution, such as variation in rates of change across lineages, to be taken into account. But in order to do this, Bayesian dating methods rely on a range of assumptions about the evolutionary process, which vary in their degree of biological realism and empirical support. These assumptions can have substantial impacts on the estimates produced by molecular dating analyses. The aim of this review is to open the 'black box' of Bayesian molecular dating and have a look at the machinery inside. We explain the components of these dating methods, the important decisions that researchers must make in their analyses, and the factors that need to be considered when interpreting results. We illustrate the effects that the choices of different models and priors can have on the outcome of the analysis, and suggest ways to explore these impacts. We describe some major research directions that may improve the reliability of Bayesian dating. The goal of our review is to help researchers to make informed choices when using Bayesian phylogenetic methods to estimate evolutionary rates and timescales. © 2017 Cambridge Philosophical Society.

  19. Bayesian models for comparative analysis integrating phylogenetic uncertainty.

    PubMed

    de Villemereuil, Pierre; Wells, Jessie A; Edwards, Robert D; Blomberg, Simon P

    2012-06-28

    Uncertainty in comparative analyses can come from at least two sources: a) phylogenetic uncertainty in the tree topology or branch lengths, and b) uncertainty due to intraspecific variation in trait values, either due to measurement error or natural individual variation. Most phylogenetic comparative methods do not account for such uncertainties. Not accounting for these sources of uncertainty leads to false perceptions of precision (confidence intervals will be too narrow) and inflated significance in hypothesis testing (e.g. p-values will be too small). Although there is some application-specific software for fitting Bayesian models accounting for phylogenetic error, more general and flexible software is desirable. We developed models to directly incorporate phylogenetic uncertainty into a range of analyses that biologists commonly perform, using a Bayesian framework and Markov Chain Monte Carlo analyses. We demonstrate applications in linear regression, quantification of phylogenetic signal, and measurement error models. Phylogenetic uncertainty was incorporated by applying a prior distribution for the phylogeny, where this distribution consisted of the posterior tree sets from Bayesian phylogenetic tree estimation programs. The models were analysed using simulated data sets, and applied to a real data set on plant traits, from rainforest plant species in Northern Australia. Analyses were performed using the free and open source software OpenBUGS and JAGS. Incorporating phylogenetic uncertainty through an empirical prior distribution of trees leads to more precise estimation of regression model parameters than using a single consensus tree and enables a more realistic estimation of confidence intervals. In addition, models incorporating measurement errors and/or individual variation, in one or both variables, are easily formulated in the Bayesian framework. 
We show that BUGS is a useful, flexible general purpose tool for phylogenetic comparative analyses, particularly for modelling in the face of phylogenetic uncertainty and accounting for measurement error or individual variation in explanatory variables. Code for all models is provided in the BUGS model description language.

  20. Bayesian models for comparative analysis integrating phylogenetic uncertainty

    PubMed Central

    2012-01-01

    Background: Uncertainty in comparative analyses can come from at least two sources: a) phylogenetic uncertainty in the tree topology or branch lengths, and b) uncertainty due to intraspecific variation in trait values, either due to measurement error or natural individual variation. Most phylogenetic comparative methods do not account for such uncertainties. Not accounting for these sources of uncertainty leads to false perceptions of precision (confidence intervals will be too narrow) and inflated significance in hypothesis testing (e.g. p-values will be too small). Although there is some application-specific software for fitting Bayesian models accounting for phylogenetic error, more general and flexible software is desirable. Methods: We developed models to directly incorporate phylogenetic uncertainty into a range of analyses that biologists commonly perform, using a Bayesian framework and Markov Chain Monte Carlo analyses. Results: We demonstrate applications in linear regression, quantification of phylogenetic signal, and measurement error models. Phylogenetic uncertainty was incorporated by applying a prior distribution for the phylogeny, where this distribution consisted of the posterior tree sets from Bayesian phylogenetic tree estimation programs. The models were analysed using simulated data sets, and applied to a real data set on plant traits, from rainforest plant species in Northern Australia. Analyses were performed using the free and open source software OpenBUGS and JAGS. Conclusions: Incorporating phylogenetic uncertainty through an empirical prior distribution of trees leads to more precise estimation of regression model parameters than using a single consensus tree and enables a more realistic estimation of confidence intervals. In addition, models incorporating measurement errors and/or individual variation, in one or both variables, are easily formulated in the Bayesian framework. 
We show that BUGS is a useful, flexible general purpose tool for phylogenetic comparative analyses, particularly for modelling in the face of phylogenetic uncertainty and accounting for measurement error or individual variation in explanatory variables. Code for all models is provided in the BUGS model description language. PMID:22741602

  1. Evolutionary Dynamics of West Nile Virus in the United States, 1999–2011: Phylogeny, Selection Pressure and Evolutionary Time-Scale Analysis

    PubMed Central

    Chancey, Caren; Ball, Christopher; Akolkar, Namita; Land, Kevin J.; Winkelman, Valerie; Stramer, Susan L.; Kramer, Laura D.; Rios, Maria

    2013-01-01

    West Nile virus (WNV), an arbovirus maintained in a bird-mosquito enzootic cycle, can infect other vertebrates including humans. WNV was first reported in the US in 1999 where, to date, three genotypes belonging to WNV lineage I have been described (NY99, WN02, SW/WN03). We report here the WNV sequences obtained from two birds, one mosquito, and 29 selected human samples acquired during the US epidemics from 2006–2011 and our examination of the evolutionary dynamics in the open-reading frame of WNV isolates reported from 1999–2011. Maximum-likelihood and Bayesian methods were used to perform the phylogenetic analyses and selection pressure analyses were conducted with the HyPhy package. Phylogenetic analysis identified human WNV isolates within the main WNV genotypes that have circulated in the US. Within genotype SW/WN03, we have identified a cluster with strains derived from blood donors and birds from Idaho and North Dakota collected during 2006–2007, termed here MW/WN06. Using different codon-based and branch-site selection models, we detected a number of codons subjected to positive pressure in WNV genes. The mean nucleotide substitution rate for WNV isolates obtained from humans was calculated to be 5.06×10−4 substitutions/site/year (s/s/y). The Bayesian skyline plot shows that after a period of high genetic variability following the introduction of WNV into the US, the WNV population appears to have reached genetic stability. The establishment of WNV in the US represents a unique opportunity to understand how an arbovirus adapts and evolves in a naïve environment. We describe a novel, well-supported cluster of WNV formed by strains collected from humans and birds from Idaho and North Dakota. Adequate genetic surveillance is essential to public health since new mutants could potentially affect viral pathogenesis, decrease performance of diagnostic assays, and negatively impact the efficacy of vaccines and the development of specific therapies. 
PMID:23738027

  2. Diagnostic accuracy of a bayesian latent group analysis for the detection of malingering-related poor effort.

    PubMed

    Ortega, Alonso; Labrenz, Stephan; Markowitsch, Hans J; Piefke, Martina

    2013-01-01

    In the last decade, different statistical techniques have been introduced to improve assessment of malingering-related poor effort. In this context, we have recently shown preliminary evidence that a Bayesian latent group model may help to optimize classification accuracy using a simulation research design. In the present study, we conducted two analyses. Firstly, we evaluated how accurately this Bayesian approach can distinguish between participants answering in an honest way (honest response group) and participants feigning cognitive impairment (experimental malingering group). Secondly, we tested the accuracy of our model in differentiating between patients who had real cognitive deficits (cognitively impaired group) and participants who belonged to the experimental malingering group. All Bayesian analyses were conducted using the raw scores of a visual recognition forced-choice task (2AFC), the Test of Memory Malingering (TOMM, Trial 2), and the Word Memory Test (WMT, primary effort subtests). The first analysis showed 100% accuracy for the Bayesian model in distinguishing participants of both groups with all effort measures. The second analysis showed outstanding overall accuracy of the Bayesian model when estimates were obtained from the 2AFC and the TOMM raw scores. Diagnostic accuracy of the Bayesian model diminished when using the WMT total raw scores. Despite this, overall diagnostic accuracy can still be considered excellent. The most plausible explanation for this decrement is the low performance in verbal recognition and fluency tasks of some patients in the cognitively impaired group. Additionally, the Bayesian model provides individual estimates, p(z_i | D), of examinees' effort levels. In conclusion, both high classification accuracy levels and Bayesian individual estimates of effort may be very useful for clinicians when assessing effort in medico-legal settings.

  3. Bayesian statistics in medicine: a 25 year review.

    PubMed

    Ashby, Deborah

    2006-11-15

    This review examines the state of Bayesian thinking as Statistics in Medicine was launched in 1982, reflecting particularly on its applicability and uses in medical research. It then looks at each subsequent five-year epoch, with a focus on papers appearing in Statistics in Medicine, putting these in the context of major developments in Bayesian thinking and computation with reference to important books, landmark meetings and seminal papers. It charts the growth of Bayesian statistics as it is applied to medicine and makes predictions for the future. From sparse beginnings, where Bayesian statistics was barely mentioned, Bayesian statistics has now permeated all the major areas of medical statistics, including clinical trials, epidemiology, meta-analyses and evidence synthesis, spatial modelling, longitudinal modelling, survival modelling, molecular genetics and decision-making in respect of new technologies.

  4. An introduction to using Bayesian linear regression with clinical data.

    PubMed

    Baldwin, Scott A; Larson, Michael J

    2017-11-01

    Statistical training in psychology focuses on frequentist methods. Bayesian methods are an alternative to standard frequentist methods. This article provides researchers with an introduction to fundamental ideas in Bayesian modeling. We use data from an electroencephalogram (EEG) and anxiety study to illustrate Bayesian models. Specifically, the models examine the relationship between error-related negativity (ERN), a particular event-related potential, and trait anxiety. Methodological topics covered include: how to set up a regression model in a Bayesian framework, specifying priors, examining convergence of the model, visualizing and interpreting posterior distributions, interval estimates, expected and predicted values, and model comparison tools. We also discuss situations where Bayesian methods can outperform frequentist methods, as well as how to specify more complicated regression models. Finally, we conclude with recommendations about reporting guidelines for those using Bayesian methods in their own research. We provide data and R code for replicating our analyses. Copyright © 2017 Elsevier Ltd. All rights reserved.

  5. A Gibbs sampler for Bayesian analysis of site-occupancy data

    USGS Publications Warehouse

    Dorazio, Robert M.; Rodriguez, Daniel Taylor

    2012-01-01

    1. A Bayesian analysis of site-occupancy data containing covariates of species occurrence and species detection probabilities is usually completed using Markov chain Monte Carlo methods in conjunction with software programs that can implement those methods for any statistical model, not just site-occupancy models. Although these software programs are quite flexible, considerable experience is often required to specify a model and to initialize the Markov chain so that summaries of the posterior distribution can be estimated efficiently and accurately. 2. As an alternative to these programs, we develop a Gibbs sampler for Bayesian analysis of site-occupancy data that include covariates of species occurrence and species detection probabilities. This Gibbs sampler is based on a class of site-occupancy models in which probabilities of species occurrence and detection are specified as probit-regression functions of site- and survey-specific covariate measurements. 3. To illustrate the Gibbs sampler, we analyse site-occupancy data of the blue hawker, Aeshna cyanea (Odonata, Aeshnidae), a common dragonfly species in Switzerland. Our analysis includes a comparison of results based on Bayesian and classical (non-Bayesian) methods of inference. We also provide code (based on the R software program) for conducting Bayesian and classical analyses of site-occupancy data.

  6. Bayesian multimodel inference for dose-response studies

    USGS Publications Warehouse

    Link, W.A.; Albers, P.H.

    2007-01-01

    Statistical inference in dose-response studies is model-based: the analyst posits a mathematical model of the relation between exposure and response, estimates parameters of the model, and reports conclusions conditional on the model. Such analyses rarely include any accounting for the uncertainties associated with model selection. The Bayesian inferential system provides a convenient framework for model selection and multimodel inference. In this paper we briefly describe the Bayesian paradigm and Bayesian multimodel inference. We then present a family of models for multinomial dose-response data and apply Bayesian multimodel inferential methods to the analysis of data on the reproductive success of American kestrels (Falco sparverius) exposed to various sublethal dietary concentrations of methylmercury.

  7. Nonlinear inversion of electrical resistivity imaging using pruning Bayesian neural networks

    NASA Astrophysics Data System (ADS)

    Jiang, Fei-Bo; Dai, Qian-Wei; Dong, Li

    2016-06-01

    Conventional artificial neural networks used to solve the electrical resistivity imaging (ERI) inversion problem suffer from overfitting and local minima. To solve these problems, we propose a pruning Bayesian neural network (PBNN) nonlinear inversion method and a sample design method based on the K-medoids clustering algorithm. In the sample design method, the training samples of the neural network are designed according to the prior information provided by the K-medoids clustering results; thus, the training process of the neural network is well guided. The proposed PBNN, based on Bayesian regularization, is used to select the hidden layer structure by assessing the effect of each hidden neuron on the inversion results. Then, the hyperparameter α_k, which is based on the generalized mean, is chosen to guide the pruning process according to the prior distribution of the training samples under the small-sample condition. The proposed algorithm is more efficient than other common adaptive regularization methods in geophysics. The inversion of synthetic data and field data suggests that the proposed method suppresses the noise in the neural network training stage and enhances generalization. The inversion results with the proposed method are better than those of the BPNN, RBFNN, and RRBFNN inversion methods as well as the conventional least squares inversion.

  8. Social deprivation and population density are not associated with small area risk of amyotrophic lateral sclerosis.

    PubMed

    Rooney, James P K; Tobin, Katy; Crampsie, Arlene; Vajda, Alice; Heverin, Mark; McLaughlin, Russell; Staines, Anthony; Hardiman, Orla

    2015-10-01

    Evidence of an association between areal ALS risk and population density has been previously reported. We aim to examine ALS spatial incidence in Ireland using small areas, to compare this analysis with our previous analysis of larger areas and to examine the associations between population density, social deprivation and ALS incidence. Residential area social deprivation has not been previously investigated as a risk factor for ALS. Using the Irish ALS register, we included all cases of ALS diagnosed in Ireland from 1995 to 2013. Data from the 2006 census were used to calculate age- and sex-standardised expected cases per small area. Social deprivation was assessed using the pobalHP deprivation index. Bayesian smoothing was used to calculate small area relative risk for ALS, whilst cluster analysis was performed using SaTScan. The effects of population density and social deprivation were tested in two ways: (1) as covariates in the Bayesian spatial model; (2) via post-Bayesian regression. A total of 1701 cases were included. Bayesian smoothed maps of relative risk at small area resolution matched closely to our previous analysis at a larger area resolution. Cluster analysis identified two areas of significant low risk. These areas did not correlate with population density or social deprivation indices. Two areas showing low frequency of ALS have been identified in the Republic of Ireland. These areas do not correlate with population density or residential area social deprivation, indicating that other factors, such as genetic admixture, may account for the observed findings. Copyright © 2015 Elsevier Inc. All rights reserved.

  9. Bayesian data analysis in population ecology: motivations, methods, and benefits

    USGS Publications Warehouse

    Dorazio, Robert

    2016-01-01

    During the 20th century ecologists largely relied on the frequentist system of inference for the analysis of their data. However, in the past few decades ecologists have become increasingly interested in the use of Bayesian methods of data analysis. In this article I provide guidance to ecologists who would like to decide whether Bayesian methods can be used to improve their conclusions and predictions. I begin by providing a concise summary of Bayesian methods of analysis, including a comparison of differences between Bayesian and frequentist approaches to inference when using hierarchical models. Next I provide a list of problems where Bayesian methods of analysis may arguably be preferred over frequentist methods. These problems are usually encountered in analyses based on hierarchical models of data. I describe the essentials required for applying modern methods of Bayesian computation, and I use real-world examples to illustrate these methods. I conclude by summarizing what I perceive to be the main strengths and weaknesses of using Bayesian methods to solve ecological inference problems.

  10. Dual Sticky Hierarchical Dirichlet Process Hidden Markov Model and Its Application to Natural Language Description of Motions.

    PubMed

    Hu, Weiming; Tian, Guodong; Kang, Yongxin; Yuan, Chunfeng; Maybank, Stephen

    2017-09-25

    In this paper, a new nonparametric Bayesian model called the dual sticky hierarchical Dirichlet process hidden Markov model (dual sticky HDP-HMM) is proposed for mining activities from a collection of time series data such as trajectories. All the time series data are clustered. Each cluster of time series data, corresponding to a motion pattern, is modeled by an HMM. Our model postulates a set of HMMs that share a common set of states (topics, by analogy with topic models for document processing) but have unique transition distributions. For the application to motion trajectory modeling, topics correspond to motion activities. The learnt topics are clustered into atomic activities, which are assigned predicates. We propose a Bayesian inference method to decompose a given trajectory into a sequence of atomic activities. By combining the learnt sources and sinks, semantic motion regions, and the learnt sequence of atomic activities, the action represented by the trajectory can be described in natural language as automatically as possible. The effectiveness of our dual sticky HDP-HMM is validated on several trajectory datasets. The effectiveness of the natural language descriptions for motions is demonstrated on vehicle trajectories extracted from a traffic scene.

  11. Fast genomic predictions via Bayesian G-BLUP and multilocus models of threshold traits including censored Gaussian data.

    PubMed

    Kärkkäinen, Hanni P; Sillanpää, Mikko J

    2013-09-04

    Because of the increased availability of genome-wide sets of molecular markers along with reduced cost of genotyping large samples of individuals, genomic estimated breeding values have become an essential resource in plant and animal breeding. Bayesian methods for breeding value estimation have proven to be accurate and efficient; however, the ever-increasing data sets are placing heavy demands on the parameter estimation algorithms. Although a commendable number of fast estimation algorithms are available for Bayesian models of continuous Gaussian traits, there is a shortage of corresponding models for discrete or censored phenotypes. In this work, we consider a threshold approach for binary, ordinal, and censored Gaussian observations for Bayesian multilocus association models and Bayesian genomic best linear unbiased prediction and present a high-speed generalized expectation maximization algorithm for parameter estimation under these models. We demonstrate our method with simulated and real data. Our example analyses suggest that the use of the extra information present in an ordered categorical or censored Gaussian data set, instead of dichotomizing the data into case-control observations, increases the accuracy of genomic breeding values predicted by Bayesian multilocus association models or by Bayesian genomic best linear unbiased prediction. Furthermore, the example analyses indicate that the correct threshold model is more accurate than the directly used Gaussian model with censored Gaussian data, while with binary or ordinal data the superiority of the threshold model could not be confirmed.

  12. Fast Genomic Predictions via Bayesian G-BLUP and Multilocus Models of Threshold Traits Including Censored Gaussian Data

    PubMed Central

    Kärkkäinen, Hanni P.; Sillanpää, Mikko J.

    2013-01-01

    Because of the increased availability of genome-wide sets of molecular markers along with reduced cost of genotyping large samples of individuals, genomic estimated breeding values have become an essential resource in plant and animal breeding. Bayesian methods for breeding value estimation have proven to be accurate and efficient; however, the ever-increasing data sets are placing heavy demands on the parameter estimation algorithms. Although a commendable number of fast estimation algorithms are available for Bayesian models of continuous Gaussian traits, there is a shortage of corresponding models for discrete or censored phenotypes. In this work, we consider a threshold approach for binary, ordinal, and censored Gaussian observations for Bayesian multilocus association models and Bayesian genomic best linear unbiased prediction and present a high-speed generalized expectation maximization algorithm for parameter estimation under these models. We demonstrate our method with simulated and real data. Our example analyses suggest that the use of the extra information present in an ordered categorical or censored Gaussian data set, instead of dichotomizing the data into case-control observations, increases the accuracy of genomic breeding values predicted by Bayesian multilocus association models or by Bayesian genomic best linear unbiased prediction. Furthermore, the example analyses indicate that the correct threshold model is more accurate than the directly used Gaussian model with censored Gaussian data, while with binary or ordinal data the superiority of the threshold model could not be confirmed. PMID:23821618

  13. The Bayesian approach to reporting GSR analysis results: some first-hand experiences

    NASA Astrophysics Data System (ADS)

    Charles, Sebastien; Nys, Bart

    2010-06-01

    The use of Bayesian principles in the reporting of forensic findings has been a matter of interest for some years. Recently, the GSR community has also begun to explore the advantages of this method, or rather approach, for writing reports. Since last year, our GSR group has been adapting its reporting procedures to the use of Bayesian principles. The police and magistrates find the reports more directly accessible and useful in their part of the criminal investigation. In the lab, we find that applying Bayesian principles eliminates unnecessary analyses and thus frees instrument time.

  14. Implementing informative priors for heterogeneity in meta-analysis using meta-regression and pseudo data.

    PubMed

    Rhodes, Kirsty M; Turner, Rebecca M; White, Ian R; Jackson, Dan; Spiegelhalter, David J; Higgins, Julian P T

    2016-12-20

    Many meta-analyses combine results from only a small number of studies, a situation in which the between-study variance is imprecisely estimated when standard methods are applied. Bayesian meta-analysis allows incorporation of external evidence on heterogeneity, providing the potential for more robust inference on the effect size of interest. We present a method for performing Bayesian meta-analysis using data augmentation, in which we represent an informative conjugate prior for between-study variance by pseudo data and use meta-regression for estimation. To assist in this, we derive predictive inverse-gamma distributions for the between-study variance expected in future meta-analyses. These may serve as priors for heterogeneity in new meta-analyses. In a simulation study, we compare approximate Bayesian methods using meta-regression and pseudo data against fully Bayesian approaches based on importance sampling techniques and Markov chain Monte Carlo (MCMC). We compare the frequentist properties of these Bayesian methods with those of the commonly used frequentist DerSimonian and Laird procedure. The method is implemented in standard statistical software and provides a less complex alternative to standard MCMC approaches. An importance sampling approach produces almost identical results to standard MCMC approaches, and results obtained through meta-regression and pseudo data are very similar. On average, data augmentation provides results closer to MCMC when implemented using restricted maximum likelihood estimation rather than DerSimonian and Laird or maximum likelihood estimation. The methods are applied to real datasets, and an extension to network meta-analysis is described. The proposed method facilitates Bayesian meta-analysis in a way that is accessible to applied researchers. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

  15. The Gaia-ESO Survey: dynamical models of flattened, rotating globular clusters

    NASA Astrophysics Data System (ADS)

    Jeffreson, S. M. R.; Sanders, J. L.; Evans, N. W.; Williams, A. A.; Gilmore, G. F.; Bayo, A.; Bragaglia, A.; Casey, A. R.; Flaccomio, E.; Franciosini, E.; Hourihane, A.; Jackson, R. J.; Jeffries, R. D.; Jofré, P.; Koposov, S.; Lardo, C.; Lewis, J.; Magrini, L.; Morbidelli, L.; Pancino, E.; Randich, S.; Sacco, G. G.; Worley, C. C.; Zaggia, S.

    2017-08-01

    We present a family of self-consistent axisymmetric rotating globular cluster models which are fitted to spectroscopic data for NGC 362, NGC 1851, NGC 2808, NGC 4372, NGC 5927 and NGC 6752 to provide constraints on their physical and kinematic properties, including their rotation signals. They are constructed by flattening Modified Plummer profiles, which have the same asymptotic behaviour as classical Plummer models, but can provide better fits to young clusters due to a slower turnover in the density profile. The models are in dynamical equilibrium as they depend solely on the action variables. We employ a fully Bayesian scheme to investigate the uncertainty in our model parameters (including mass-to-light ratios and inclination angles) and evaluate the Bayesian evidence ratio for rotating to non-rotating models. We find convincing levels of rotation only in NGC 2808. In the other clusters, there is just a hint of rotation (in particular, NGC 4372 and NGC 5927), as the data quality does not allow us to draw strong conclusions. Where rotation is present, we find that it is confined to the central regions, within radii of R ≤ 2rh. As part of this work, we have developed a novel q-Gaussian basis expansion of the line-of-sight velocity distributions, from which general models can be constructed via interpolation on the basis coefficients.

  16. European wildcat populations are subdivided into five main biogeographic groups: consequences of Pleistocene climate changes or recent anthropogenic fragmentation?

    PubMed

    Mattucci, Federica; Oliveira, Rita; Lyons, Leslie A; Alves, Paulo C; Randi, Ettore

    2016-01-01

    Extant populations of the European wildcat are fragmented across the continent, the likely consequence of recent extirpations due to habitat loss and over-hunting. However, their underlying phylogeographic history has never been reconstructed. To test the hypothesis that the European wildcat survived the Ice Age fragmented in Mediterranean refuges, we assayed the genetic variation at 31 microsatellites in 668 presumptive European wildcats sampled in 15 European countries. Moreover, to evaluate the extent of subspecies/population divergence and identify possible wild × domestic cat hybrids, we genotyped 26 African wildcats from Sardinia and North Africa and 294 random-bred domestic cats. Results of multivariate analyses and Bayesian clustering confirmed that the European wild and the domestic cats (plus the African wildcats) belong to two well-differentiated clusters (average ΦST = 0.159, RST = 0.392, P < 0.001; analysis of molecular variance [AMOVA]). We identified c. 5% to 10% cryptic hybrids in southern and central European populations. In contrast, wild-living cats in Hungary and Scotland showed deep signatures of genetic admixture and introgression with domestic cats. The European wildcats are subdivided into five main genetic clusters (average ΦST = 0.103, RST = 0.143, P < 0.001; AMOVA) corresponding to five biogeographic groups, distributed, respectively, in the Iberian Peninsula, central Europe, central Germany, the Italian Peninsula and the island of Sicily, and north-eastern Italy and the northern Balkan regions (Dinaric Alps). Approximate Bayesian Computation simulations supported late Pleistocene to early Holocene population splits (from c. 60 k to 10 k years ago), contemporary with the last Ice Age climatic changes. These results provide evidence for wildcat Mediterranean refuges in southwestern Europe, but the evolutionary history of eastern wildcat populations remains to be clarified. Historical genetic subdivisions suggest conservation strategies aimed at enhancing gene flow through the restoration of ecological corridors within each biogeographic unit. Concomitantly, the risk of hybridization with free-ranging domestic cats along corridor edges should be carefully monitored.

  17. New seismogenic stress fields for southern Italy from a Bayesian approach

    NASA Astrophysics Data System (ADS)

    Totaro, Cristina; Orecchio, Barbara; Presti, Debora; Scolaro, Silvia; Neri, Giancarlo

    2017-04-01

    A new database of high-quality waveform-inversion focal mechanisms has been compiled for southern Italy by integrating the highest-quality solutions available from the literature and catalogues with 146 newly computed ones. All the selected focal mechanisms either (i) come from the Italian CMT, Regional CMT and TDMT catalogues (Pondrelli et al., PEPI 2006, PEPI 2011; http://www.ingv.it), or (ii) were computed using the Cut And Paste (CAP) method (Zhao & Helmberger, BSSA 1994; Zhu & Helmberger, BSSA 1996). Specific tests were carried out to evaluate the robustness of the obtained solutions (e.g., by varying both the seismic network configuration and the Earth structure parameters) and to estimate uncertainties on the focal mechanism parameters. Only the resulting highest-quality solutions were included in the database, which was then used to compute posterior density distributions of stress tensor components with a Bayesian method (Arnold & Townend, GJI 2007). This algorithm furnishes the posterior density function of the principal components of the stress tensor (maximum σ1, intermediate σ2, and minimum σ3 compressive stress, respectively) and the stress-magnitude ratio (R). Before stress computation, we applied the k-means clustering algorithm to subdivide the focal mechanism catalog on the basis of earthquake locations. This approach identifies the sectors to be investigated without any a priori constraint from the faulting-type distribution. The large amount of data and the application of the Bayesian algorithm allowed us to provide a more accurate local-to-regional scale stress distribution that has shed new light on the kinematics and dynamics of this very complex area, where lithospheric unit configuration and geodynamic engines are still strongly debated. The new high-quality information provided here will be a useful tool and constraint for future geophysical analyses and geodynamic modeling.

  18. Bayesian statistical inference enhances the interpretation of contemporary randomized controlled trials.

    PubMed

    Wijeysundera, Duminda N; Austin, Peter C; Hux, Janet E; Beattie, W Scott; Laupacis, Andreas

    2009-01-01

    Randomized trials generally use "frequentist" statistics based on P-values and 95% confidence intervals. Frequentist methods have limitations that might be overcome, in part, by Bayesian inference. To illustrate these advantages, we re-analyzed randomized trials published in four general medical journals during 2004. We used Medline to identify randomized superiority trials with two parallel arms, individual-level randomization and dichotomous or time-to-event primary outcomes. Studies with P<0.05 in favor of the intervention were deemed "positive"; otherwise, they were "negative." We used several prior distributions and exact conjugate analyses to calculate Bayesian posterior probabilities for clinically relevant effects. Of 88 included studies, 39 were positive using a frequentist analysis. Although the Bayesian posterior probabilities of any benefit (relative risk or hazard ratio<1) were high in positive studies, these probabilities were lower and variable for larger benefits. The positive studies had only moderate probabilities for exceeding the effects that were assumed for calculating the sample size. By comparison, there were moderate probabilities of any benefit in negative studies. Bayesian and frequentist analyses complement each other when interpreting the results of randomized trials. Future reports of randomized trials should include both.

  19. Bayesian analyses of seasonal runoff forecasts

    NASA Astrophysics Data System (ADS)

    Krzysztofowicz, R.; Reese, S.

    1991-12-01

    Forecasts of seasonal snowmelt runoff volume provide indispensable information for rational decision making by water project operators, irrigation district managers, and farmers in the western United States. Bayesian statistical models and communication frames have been researched in order to enhance the forecast information disseminated to the users, and to characterize forecast skill from the decision maker's point of view. Four products are presented: (i) a Bayesian Processor of Forecasts, which provides a statistical filter for calibrating the forecasts, and a procedure for estimating the posterior probability distribution of the seasonal runoff; (ii) the Bayesian Correlation Score, a new measure of forecast skill, which is related monotonically to the ex ante economic value of forecasts for decision making; (iii) a statistical predictor of monthly cumulative runoffs within the snowmelt season, conditional on the total seasonal runoff forecast; and (iv) a framing of the forecast message that conveys the uncertainty associated with the forecast estimates to the users. All analyses are illustrated with numerical examples of forecasts for six gauging stations from the period 1971–1988.

  20. Dark Energy Survey Year 1 Results: Cross-Correlation Redshifts - Methods and Systematics Characterization

    DOE PAGES

    Gatti, M.

    2018-02-22

    We use numerical simulations to characterize the performance of a clustering-based method to calibrate photometric redshift biases. In particular, we cross-correlate the weak lensing (WL) source galaxies from the Dark Energy Survey Year 1 (DES Y1) sample with redMaGiC galaxies (luminous red galaxies with secure photometric redshifts) to estimate the redshift distribution of the former sample. The recovered redshift distributions are used to calibrate the photometric redshift bias of standard photo-z methods applied to the same source galaxy sample. We also apply the method to three photo-z codes run on our simulated data: Bayesian Photometric Redshift (BPZ), Directional Neighborhood Fitting (DNF), and Random Forest-based photo-z (RF). We characterize the systematic uncertainties of our calibration procedure and find that they dominate our error budget. The dominant systematics are due to our assumption of unevolving bias and clustering across each redshift bin, and to differences between the shapes of the redshift distributions derived by clustering and by photo-z methods. The systematic uncertainty in the mean redshift bias of the source galaxy sample is Δz ≲ 0.02, though the precise value depends on the redshift bin under consideration. Here, we discuss possible ways to mitigate the impact of our dominant systematics in future analyses.

  1. Dark Energy Survey Year 1 Results: Cross-Correlation Redshifts - Methods and Systematics Characterization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gatti, M.

    We use numerical simulations to characterize the performance of a clustering-based method to calibrate photometric redshift biases. In particular, we cross-correlate the weak lensing (WL) source galaxies from the Dark Energy Survey Year 1 (DES Y1) sample with redMaGiC galaxies (luminous red galaxies with secure photometric redshifts) to estimate the redshift distribution of the former sample. The recovered redshift distributions are used to calibrate the photometric redshift bias of standard photo-z methods applied to the same source galaxy sample. We also apply the method to three photo-z codes run on our simulated data: Bayesian Photometric Redshift (BPZ), Directional Neighborhood Fitting (DNF), and Random Forest-based photo-z (RF). We characterize the systematic uncertainties of our calibration procedure and find that they dominate our error budget. The dominant systematics are due to our assumption of unevolving bias and clustering across each redshift bin, and to differences between the shapes of the redshift distributions derived by clustering and by photo-z methods. The systematic uncertainty in the mean redshift bias of the source galaxy sample is Δz ≲ 0.02, though the precise value depends on the redshift bin under consideration. Here, we discuss possible ways to mitigate the impact of our dominant systematics in future analyses.

  2. Geographic Variation of Amyotrophic Lateral Sclerosis Incidence in New Jersey, 2009–2011

    PubMed Central

    Henry, Kevin A.; Fagliano, Jerald; Jordan, Heather M.; Rechtman, Lindsay; Kaye, Wendy E.

    2015-01-01

    Few analyses in the United States have examined geographic variation and socioeconomic disparities in amyotrophic lateral sclerosis (ALS) incidence, because of lack of population-based incidence data. In this analysis, we used population-based ALS data to identify whether ALS incidence clusters geographically and to determine whether ALS risk varies by area-based socioeconomic status (SES). This study included 493 incident ALS cases diagnosed (via El Escorial criteria) in New Jersey between 2009 and 2011. Geographic variation and clustering of ALS incidence were assessed using a spatial scan statistic and Bayesian geoadditive models. Poisson regression was used to estimate the associations between ALS risk and SES based on census-tract median income while controlling for age, sex, and race. ALS incidence varied across and within counties, but there were no statistically significant geographic clusters. SES was associated with ALS incidence. After adjustment for age, sex, and race, the relative risk of ALS was significantly higher (relative risk (RR) = 1.37, 95% confidence interval (CI): 1.02, 1.82) in the highest income quartile than in the lowest. The relative risk of ALS was significantly lower among blacks (RR = 0.57, 95% CI: 0.39, 0.83) and Asians (RR = 0.63, 95% CI: 0.41, 0.97) than among whites. Our findings suggest that ALS incidence in New Jersey is associated with SES and race. PMID:26041711

  3. Patterns of Population Structure and Environmental Associations to Aridity Across the Range of Loblolly Pine (Pinus taeda L., Pinaceae)

    PubMed Central

    Eckert, Andrew J.; van Heerwaarden, Joost; Wegrzyn, Jill L.; Nelson, C. Dana; Ross-Ibarra, Jeffrey; González-Martínez, Santíago C.; Neale, David B.

    2010-01-01

    Natural populations of forest trees exhibit striking phenotypic adaptations to diverse environmental gradients, thereby making them appealing subjects for the study of genes underlying ecologically relevant phenotypes. Here, we use a genome-wide data set of single nucleotide polymorphisms genotyped across 3059 functional genes to study patterns of population structure and identify loci associated with aridity across the natural range of loblolly pine (Pinus taeda L.). Overall patterns of population structure, as inferred using principal components and Bayesian cluster analyses, were consistent with three genetic clusters likely resulting from expansions out of Pleistocene refugia located in Mexico and Florida. A novel application of association analysis, which removes the confounding effects of shared ancestry on correlations between genetic and environmental variation, identified five loci correlated with aridity. These loci were primarily involved with abiotic stress response to temperature and drought. A unique set of 24 loci was identified as FST outliers on the basis of the genetic clusters identified previously and after accounting for expansions out of Pleistocene refugia. These loci were involved with a diversity of physiological processes. Identification of nonoverlapping sets of loci highlights the fundamental differences implicit in the use of either method and suggests a pluralistic, yet complementary, approach to the identification of genes underlying ecologically relevant phenotypes. PMID:20439779
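Principal components analysis of a genotype matrix is one of the two structure-inference approaches named above. A minimal sketch, assuming genotypes coded as 0/1/2 allele counts and a common per-SNP standardization (centering by 2p and scaling by sqrt(2p(1-p))); the simulated two-population data are purely illustrative, and the Bayesian cluster analysis is not shown.

```python
import numpy as np

def genotype_pcs(G, n_components=2):
    """PC scores for an individuals x SNPs matrix of 0/1/2 allele counts."""
    G = np.asarray(G, dtype=float)
    p = G.mean(axis=0) / 2.0                      # estimated allele frequency per SNP
    keep = (p > 0) & (p < 1)                      # drop monomorphic SNPs (zero variance)
    Z = (G[:, keep] - 2 * p[keep]) / np.sqrt(2 * p[keep] * (1 - p[keep]))
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

# Toy data: two populations with diverged allele frequencies (illustrative only).
rng = np.random.default_rng(1)
freqs_a = rng.uniform(0.1, 0.9, 300)
freqs_b = np.clip(freqs_a + rng.normal(0, 0.2, 300), 0.05, 0.95)
G = np.vstack([rng.binomial(2, freqs_a, (40, 300)),
               rng.binomial(2, freqs_b, (40, 300))])
pcs = genotype_pcs(G)
```

With this much divergence the two simulated groups separate along the first PC; weaker structure, as in real data, generally needs more loci and more careful interpretation.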

  4. Habitat fragmentation in coastal southern California disrupts genetic connectivity in the cactus wren (Campylorhynchus brunneicapillus)

    USGS Publications Warehouse

    Barr, Kelly R.; Kus, Barbara E.; Preston, Kristine; Howell, Scarlett; Perkins, Emily; Vandergast, Amy

    2015-01-01

    Achieving long-term persistence of species in urbanized landscapes requires characterizing population genetic structure to understand and manage the effects of anthropogenic disturbance on connectivity. Urbanization over the past century in coastal southern California has caused both precipitous loss of coastal sage scrub habitat and declines in populations of the cactus wren (Campylorhynchus brunneicapillus). Using 22 microsatellite loci, we found that remnant cactus wren aggregations in coastal southern California comprised 20 populations based on strict exact tests for population differentiation, and 12 genetic clusters with hierarchical Bayesian clustering analyses. Genetic structure patterns largely mirrored underlying habitat availability, with cluster and population boundaries coinciding with fragmentation caused primarily by urbanization. Using a habitat model we developed, we detected stronger associations between habitat-based distances and genetic distances than between Euclidean geographic distances and genetic distances. Within populations, we detected a positive association between available local habitat and allelic richness and a negative association with relatedness. Isolation-by-distance patterns varied over the study area, which we attribute to temporal differences in anthropogenic landscape development. We also found that genetic bottleneck signals were associated with wildfire frequency. These results indicate that habitat fragmentation and alterations have reduced genetic connectivity and diversity of cactus wren populations in coastal southern California. Management efforts focused on improving connectivity among remaining populations may help to ensure population persistence.

  5. A Bayesian approach to meta-analysis of plant pathology studies.

    PubMed

    Mila, A L; Ngugi, H K

    2011-01-01

    Bayesian statistical methods are used for meta-analysis in many disciplines, including medicine, molecular biology, and engineering, but have not yet been applied to the quantitative synthesis of plant pathology studies. In this paper, we illustrate the key concepts of Bayesian statistics and outline the differences between Bayesian and classical (frequentist) methods in the way parameters describing population attributes are considered. We then describe a Bayesian approach to meta-analysis and present a plant pathological example based on studies evaluating the efficacy of plant protection products that induce systemic acquired resistance for the management of fire blight of apple. In a simple random-effects model assuming a normal distribution of effect sizes and no prior information (i.e., a noninformative prior), the results of the Bayesian meta-analysis are similar to those obtained with classical methods. Implementing the same model with a Student's t distribution and a noninformative prior for the effect sizes, instead of a normal distribution, yields similar results for all but acibenzolar-S-methyl (Actigard), which was evaluated in only seven studies in this example. Whereas both the classical (P = 0.28) and the Bayesian analysis with a noninformative prior (95% credibility interval [CRI] for the log response ratio: -0.63 to 0.08) indicate a nonsignificant effect for Actigard, specifying a t distribution resulted in a significant, albeit variable, effect for this product (CRI: -0.73 to -0.10). These results confirm the sensitivity of the analytical outcome (i.e., the posterior distribution) to the choice of prior in Bayesian meta-analyses involving a limited number of studies. We review some pertinent literature on more advanced topics, including modeling of among-study heterogeneity, publication bias, analyses involving a limited number of studies, and methods for dealing with missing data, and show how these issues can be approached in a Bayesian framework. Bayesian meta-analysis can readily include information not easily incorporated in classical methods, and allows for a full evaluation of competing models. Given the power and flexibility of Bayesian methods, we expect them to become widely adopted for meta-analysis of plant pathology studies.
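For reference, the classical random-effects analysis that the noninformative-prior Bayesian results resembled can be sketched with the DerSimonian-Laird estimator, a standard choice for the between-study variance tau^2. The abstract does not say which classical estimator was used, so this is an assumption, and the example effect sizes and variances are hypothetical.

```python
import numpy as np

def dersimonian_laird(y, v):
    """Classical random-effects meta-analysis (DerSimonian-Laird).

    y: study effect sizes (e.g. log response ratios); v: within-study variances.
    Returns the pooled mean, its standard error, and the between-study variance tau^2.
    """
    y = np.asarray(y, dtype=float)
    v = np.asarray(v, dtype=float)
    w = 1.0 / v
    mu_fixed = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - mu_fixed) ** 2)           # Cochran's Q heterogeneity statistic
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)       # moment estimate, truncated at zero
    w_star = 1.0 / (v + tau2)                     # random-effects weights
    mu = np.sum(w_star * y) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))
    return mu, se, tau2

# Hypothetical log response ratios and variances for four studies.
mu, se, tau2 = dersimonian_laird([-0.4, -0.1, -0.5, 0.2], [0.04, 0.09, 0.05, 0.1])
ci = (mu - 1.96 * se, mu + 1.96 * se)
```

With few studies tau^2 is poorly estimated, which is one reason the choice of likelihood or prior can matter as much as the abstract reports.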

  6. Travelling to the south: Phylogeographic spatial diffusion model in Monttea aphylla (Plantaginaceae), an endemic plant of the Monte Desert

    PubMed Central

    Cosacov, Andrea; Ferreiro, Gabriela; Johnson, Leigh A.; Sérsic, Alicia N.

    2017-01-01

    Effects of Pleistocene climatic oscillations on plant phylogeographic patterns are relatively well studied in forest, savanna and grassland biomes, but such impacts remain less explored in desert regions of the world, especially in South America. Here, we performed a phylogeographical study of Monttea aphylla, an endemic species of the Monte Desert, to understand the evolutionary history of vegetation communities inhabiting the South American Arid Diagonal. We obtained sequences of three chloroplast (trnS–trnfM, trnH–psbA and trnQ–rps16) and one nuclear (ITS) intergenic spacers from 272 individuals of 34 localities throughout the range of the species. Population genetic and Bayesian coalescent analyses were performed to infer genealogical relationships among haplotypes, population genetic structure, and demographic history of the study species. Timing of demographic events was inferred using Bayesian Skyline Plot, and the spatio-temporal patterns of lineage diversification were reconstructed using Bayesian relaxed diffusion models. Palaeo-distribution models (PDM) were run for three different timescales to validate phylogeographical patterns. Twenty-five and 22 haplotypes were identified in the cpDNA and nDNA data, respectively, which clustered into two main genealogical lineages following a latitudinal pattern: the northern and the southern Monte (south of 35° S). The northern Monte showed two lineages of high genetic structure and a relatively more stable demography than the southern Monte, which retrieved three groups with little phylogenetic structure and a strong signal of demographic expansion that would have started during the Last Interglacial period (ca. 120 Ka). The PDM and diffusion model analyses agreed on the southeastern direction of the range expansion. A differential effect of climatic oscillations across the Monte phytogeographic province was observed in Monttea aphylla lineages. In the northern Monte, greater genetic structure and a relatively more stable demography resulted from a more stable climate than in the southern Monte. Pleistocene glaciations drastically decreased the species' area in the southern Monte, which expanded in a southeastern direction to the newly available areas during the interglacial periods. PMID:28582433

  7. Bayesian inference for psychology. Part II: Example applications with JASP.

    PubMed

    Wagenmakers, Eric-Jan; Love, Jonathon; Marsman, Maarten; Jamil, Tahira; Ly, Alexander; Verhagen, Josine; Selker, Ravi; Gronau, Quentin F; Dropmann, Damian; Boutin, Bruno; Meerhoff, Frans; Knight, Patrick; Raj, Akash; van Kesteren, Erik-Jan; van Doorn, Johnny; Šmíra, Martin; Epskamp, Sacha; Etz, Alexander; Matzke, Dora; de Jong, Tim; van den Bergh, Don; Sarafoglou, Alexandra; Steingroever, Helen; Derks, Koen; Rouder, Jeffrey N; Morey, Richard D

    2018-02-01

    Bayesian hypothesis testing presents an attractive alternative to p value hypothesis testing. Part I of this series outlined several advantages of Bayesian hypothesis testing, including the ability to quantify evidence and the ability to monitor and update this evidence as data come in, without the need to know the intention with which the data were collected. Despite these and other practical advantages, Bayesian hypothesis tests are still reported relatively rarely. An important impediment to the widespread adoption of Bayesian tests is arguably the lack of user-friendly software for the run-of-the-mill statistical problems that confront psychologists for the analysis of almost every experiment: the t-test, ANOVA, correlation, regression, and contingency tables. In Part II of this series we introduce JASP ( http://www.jasp-stats.org ), an open-source, cross-platform, user-friendly graphical software package that allows users to carry out Bayesian hypothesis tests for standard statistical problems. JASP is based in part on the Bayesian analyses implemented in Morey and Rouder's BayesFactor package for R. Armed with JASP, the practical advantages of Bayesian hypothesis testing are only a mouse click away.

  8. A default Bayesian hypothesis test for mediation.

    PubMed

    Nuijten, Michèle B; Wetzels, Ruud; Matzke, Dora; Dolan, Conor V; Wagenmakers, Eric-Jan

    2015-03-01

    In order to quantify the relationship between multiple variables, researchers often carry out a mediation analysis. In such an analysis, a mediator (e.g., knowledge of a healthy diet) transmits the effect from an independent variable (e.g., classroom instruction on a healthy diet) to a dependent variable (e.g., consumption of fruits and vegetables). Almost all mediation analyses in psychology use frequentist estimation and hypothesis-testing techniques. A recent exception is Yuan and MacKinnon (Psychological Methods, 14, 301-322, 2009), who outlined a Bayesian parameter estimation procedure for mediation analysis. Here we complete the Bayesian alternative to frequentist mediation analysis by specifying a default Bayesian hypothesis test based on the Jeffreys-Zellner-Siow approach. We further extend this default Bayesian test by allowing a comparison to directional or one-sided alternatives, using Markov chain Monte Carlo techniques implemented in JAGS. All Bayesian tests are implemented in the R package BayesMed (Nuijten, Wetzels, Matzke, Dolan, & Wagenmakers, 2014).
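In the single-mediator model, the a path is the effect of the independent variable X on the mediator M, and the b path is the effect of M on the dependent variable Y controlling for X; the indirect effect is commonly summarized as a·b. The sketch below computes ordinary least-squares point estimates of these paths on simulated data. It is a frequentist illustration of the quantities involved, not the default Bayesian (Jeffreys-Zellner-Siow) test implemented in BayesMed, and the function name and simulated coefficients are hypothetical.

```python
import numpy as np

def mediation_paths(x, m, y):
    """OLS estimates of a (X -> M), b (M -> Y given X), direct effect c', and a*b."""
    x, m, y = (np.asarray(t, dtype=float) for t in (x, m, y))
    ones = np.ones_like(x)
    a = np.linalg.lstsq(np.column_stack([ones, x]), m, rcond=None)[0][1]
    coef = np.linalg.lstsq(np.column_stack([ones, x, m]), y, rcond=None)[0]
    c_prime, b = coef[1], coef[2]
    return a, b, c_prime, a * b

# Simulated data with true a = 0.5, b = 0.8 and direct effect c' = 0.2 (illustrative).
rng = np.random.default_rng(0)
x = rng.normal(size=500)
m = 0.5 * x + rng.normal(size=500)
y = 0.8 * m + 0.2 * x + rng.normal(size=500)
a, b, c_prime, indirect = mediation_paths(x, m, y)
```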

  9. Bayesian Analysis of Silica Exposure and Lung Cancer Using Human and Animal Studies.

    PubMed

    Bartell, Scott M; Hamra, Ghassan Badri; Steenland, Kyle

    2017-03-01

    Bayesian methods can be used to incorporate external information into epidemiologic exposure-response analyses of silica and lung cancer. We used data from a pooled mortality analysis of silica and lung cancer (n = 65,980), using untransformed and log-transformed cumulative exposure. Animal data came from chronic silica inhalation studies using rats. We conducted Bayesian analyses with informative priors based on the animal data and different cross-species extrapolation factors. We also conducted analyses with exposure measurement error corrections in the absence of a gold standard, assuming Berkson-type error that increased with increasing exposure. The pooled animal data exposure-response coefficient was markedly higher (log exposure) or lower (untransformed exposure) than the coefficient for the pooled human data. With 10-fold uncertainty, the animal prior had little effect on results for pooled analyses and only modest effects in some individual studies. One-fold uncertainty produced markedly different results for both pooled and individual studies. Measurement error correction had little effect in pooled analyses using log exposure. Using untransformed exposure, measurement error correction caused a 5% decrease in the exposure-response coefficient for the pooled analysis and marked changes in some individual studies. The animal prior had more impact for smaller human studies and for one-fold versus three- or 10-fold uncertainty. Adjustment for Berkson error using Bayesian methods had little effect on the exposure-response coefficient when exposure was log transformed or when the sample size was large. See video abstract at http://links.lww.com/EDE/B160.
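The finding that the animal prior mattered more for smaller human studies and under tighter prior uncertainty is what precision weighting produces in the simplest conjugate case: a normal prior on an exposure-response coefficient combined with a normal approximation to the human-data likelihood. The sketch below assumes that simplified setup. The numbers, and representing cross-species uncertainty by a single prior standard deviation, are hypothetical and do not reproduce the paper's models or measurement-error corrections.

```python
import math

def combine_normal(prior_mean, prior_sd, est, est_se):
    """Normal prior x normal likelihood: posterior mean, posterior SD, weight on prior."""
    w_prior = 1.0 / prior_sd ** 2                 # prior precision
    w_data = 1.0 / est_se ** 2                    # data precision
    post_var = 1.0 / (w_prior + w_data)
    post_mean = post_var * (w_prior * prior_mean + w_data * est)
    weight_on_prior = w_prior / (w_prior + w_data)
    return post_mean, math.sqrt(post_var), weight_on_prior

# Same hypothetical prior applied to a large (small SE) and a small (large SE) study.
prior_mean, prior_sd = 1.0, 0.5
large = combine_normal(prior_mean, prior_sd, est=0.2, est_se=0.05)
small = combine_normal(prior_mean, prior_sd, est=0.2, est_se=0.5)
```

Shrinking prior_sd, the analogue of a narrower extrapolation factor, raises the prior's weight for every study in the same way.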

  10. Spatial analysis of county-based gonorrhoea incidence in mainland China, from 2004 to 2009.

    PubMed

    Yin, Fei; Feng, Zijian; Li, Xiaosong

    2012-07-01

    Gonorrhoea is one of the most common sexually transmissible infections in mainland China. Effective spatial monitoring of gonorrhoea incidence is important for successful implementation of control and prevention programs. County-level gonorrhoea incidence rates for all of mainland China were monitored by examining spatial patterns. County-level data on gonorrhoea cases between 2004 and 2009 were obtained from the China Information System for Disease Control and Prevention. Bayesian smoothing and exploratory spatial data analysis (ESDA) methods were used to characterise the spatial distribution pattern of gonorrhoea cases. During the 6-year study period, the average annual gonorrhoea incidence was 12.41 cases per 100000 people. Using empirical Bayes smoothed rates, the local Moran test identified one significant single-centre cluster and two significant multi-centre clusters of high gonorrhoea risk (all P-values <0.01). Bayesian smoothing and ESDA methods can assist public health officials in using gonorrhoea surveillance data to identify high risk areas. Allocating more resources to such areas could effectively reduce gonorrhoea incidence.
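Empirical Bayes smoothing shrinks each county's raw rate toward an overall rate, with more shrinkage for counties whose small populations make the raw rate unstable. The abstract does not specify the estimator; the sketch below assumes Marshall's global method-of-moments estimator, one common choice, and the toy counts and function name are hypothetical. The local Moran step is not shown.

```python
import numpy as np

def empirical_bayes_rates(cases, population):
    """Global empirical Bayes smoothing of area rates (Marshall-style moment estimator)."""
    y = np.asarray(cases, dtype=float)
    n = np.asarray(population, dtype=float)
    raw = y / n
    m = y.sum() / n.sum()                         # overall rate, used as the prior mean
    if m == 0:
        return raw                                # no events anywhere: nothing to shrink toward
    # Between-area variance of true rates, estimated by moments and truncated at zero.
    s2 = max(np.sum(n * (raw - m) ** 2) / n.sum() - m / n.mean(), 0.0)
    w = s2 / (s2 + m / n)                         # per-area weight on the raw rate
    return w * raw + (1.0 - w) * m

# Hypothetical counties: case counts and populations.
cases = [1, 50, 3, 0]
population = [1000, 100000, 2000, 500]
smoothed = empirical_bayes_rates(cases, population)
```

Each smoothed rate lies between its raw rate and the overall rate; small-population counties move furthest toward the overall rate.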

  11. Optical characterization limits of nanoparticle aggregates at different wavelengths using approximate Bayesian computation

    NASA Astrophysics Data System (ADS)

    Eriçok, Ozan Burak; Ertürk, Hakan

    2018-07-01

    Optical characterization of nanoparticle aggregates is a complex inverse problem that can be solved by deterministic or statistical methods. Previous studies showed that the lower size limit for reliable characterization depends on the wavelength of the light source used. In this study, these characterization limits are determined for light-source wavelengths ranging from ultraviolet to near infrared (266-1064 nm), using numerical light scattering experiments. Two measurement ensembles are considered: a collection of well-separated aggregates made up of same-sized particles, and a collection of aggregates with a particle size distribution. Filippov's cluster-cluster algorithm is used to generate the aggregates, and the light scattering behavior is calculated by the discrete dipole approximation. A likelihood-free Approximate Bayesian Computation, relying on the Adaptive Population Monte Carlo method, is used for characterization. It is found that, over the 266-1064 nm wavelength range, the limit for successful characterization ranges from 21 to 62 nm effective radius for monodisperse and polydisperse soot aggregates.
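Approximate Bayesian Computation replaces likelihood evaluation with simulation: parameter values are drawn from the prior, data are simulated, and draws whose simulated summaries land close to the observed ones are kept. The Adaptive Population Monte Carlo variant used in the study refines this over a sequence of shrinking tolerances with importance weights; the sketch below shows only the basic rejection step it builds on, with a toy normal model standing in for the light-scattering (discrete dipole approximation) forward model. All names, the summary statistic, and the tolerance are assumptions.

```python
import numpy as np

def rejection_abc(observed, simulate, prior_sample, n_draws=20000, eps=0.1, seed=0):
    """Basic rejection ABC: keep prior draws whose simulated mean is within eps of the data's."""
    rng = np.random.default_rng(seed)
    s_obs = np.mean(observed)                     # summary statistic of the observed data
    theta = prior_sample(rng, n_draws)
    s_sim = np.array([np.mean(simulate(rng, t)) for t in theta])
    return theta[np.abs(s_sim - s_obs) < eps]     # accepted draws approximate the posterior

# Toy problem (hypothetical): infer the mean of a unit-variance normal from 50 points.
rng = np.random.default_rng(42)
observed = rng.normal(3.0, 1.0, 50)
posterior = rejection_abc(
    observed,
    simulate=lambda r, t: r.normal(t, 1.0, 50),
    prior_sample=lambda r, n: r.uniform(0.0, 10.0, n),
)
```

Plain rejection wastes most draws when the prior is wide relative to the posterior, which is the inefficiency that population Monte Carlo schemes are designed to reduce.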

  12. Genetic variation and factors affecting the genetic structure of the lichenicolous fungus Heterocephalacria bachmannii (Filobasidiales, Basidiomycota)

    PubMed Central

    Laakso, Into; Stenroos, Soili

    2017-01-01

    Heterocephalacria bachmannii is a lichenicolous fungus that takes as hosts numerous lichen species of the genus Cladonia. In the present study we analyze whether the geographical distance, the host species or the host secondary metabolites determine the genetic structure of this parasite. To address the question, populations mainly from Southern Europe, Southern Finland and the Azores were sampled. The specimens were collected from 20 different host species representing ten chemotypes. Three loci, ITS rDNA, LSU rDNA and mtSSU, were sequenced. The genetic structure was assessed by AMOVA, redundancy analyses and Bayesian clustering methods. The results indicated that the host species and the host secondary metabolites are the most influential factors over the genetic structure of this lichenicolous fungus. In addition, the genetic structure of H. bachmannii was compared with that of one of its hosts, Cladonia rangiformis. The population structures of parasite and host were discordant. The contents of phenolic compounds and fatty acids of C. rangiformis were quantified in order to test whether they had any influence on the genetic structure of the species, but no correlation was found with the genetic clusters of H. bachmannii. PMID:29253026

  13. Phylogenetic relationships of the dwarf boas and a comparison of Bayesian and bootstrap measures of phylogenetic support.

    PubMed

    Wilcox, Thomas P; Zwickl, Derrick J; Heath, Tracy A; Hillis, David M

    2002-11-01

    Four New World genera of dwarf boas (Exiliboa, Trachyboa, Tropidophis, and Ungaliophis) have been placed by many systematists in a single group (traditionally called Tropidophiidae). However, the monophyly of this group has been questioned in several studies. Moreover, the overall relationships among basal snake lineages, including the placement of the dwarf boas, are poorly understood. We obtained mtDNA sequence data for 12S, 16S, and intervening tRNA-val genes from 23 species of snakes representing most major snake lineages, including all four genera of New World dwarf boas. We then examined the phylogenetic position of these species by estimating the phylogeny of the basal snakes. Our phylogenetic analysis suggests that New World dwarf boas are not monophyletic. Instead, we find Exiliboa and Ungaliophis to be most closely related to sand boas (Erycinae), boas (Boinae), and advanced snakes (Caenophidia), whereas Tropidophis and Trachyboa form an independent clade that separated relatively early in the snake radiation. Our estimate of snake phylogeny differs significantly in other ways from some previous estimates of snake phylogeny. For instance, pythons do not cluster with boas and sand boas, but instead show a strong relationship with Loxocemus and Xenopeltis. Additionally, uropeltids cluster strongly with Cylindrophis, and together are embedded in what has previously been considered the macrostomatan radiation. These relationships are supported by both bootstrapping (parametric and nonparametric approaches) and Bayesian analysis, although Bayesian support values are consistently higher than those obtained from nonparametric bootstrapping. Simulations show that Bayesian support values represent much better estimates of phylogenetic accuracy than do nonparametric bootstrap support values, at least under the conditions of our study. Copyright 2002 Elsevier Science (USA)

  14. Bayesian Unimodal Density Regression for Causal Inference

    ERIC Educational Resources Information Center

    Karabatsos, George; Walker, Stephen G.

    2011-01-01

    Karabatsos and Walker (2011) introduced a new Bayesian nonparametric (BNP) regression model. Through analyses of real and simulated data, they showed that the BNP regression model outperforms other parametric and nonparametric regression models of common use, in terms of predictive accuracy of the outcome (dependent) variable. The other,…

  15. Bayesian and Frequentist Methods for Estimating Joint Uncertainty of Freundlich Adsorption Isotherm Fitting Parameters

    EPA Science Inventory

    In this paper, we present methods for estimating Freundlich isotherm fitting parameters (K and N) and their joint uncertainty, which have been implemented into the freeware software platforms R and WinBUGS. These estimates were determined by both Frequentist and Bayesian analyse...
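The Freundlich isotherm is commonly written q = K·C^(1/N), where q is the sorbed amount and C the equilibrium concentration; some authors write the exponent as N instead, so the parameterization here is an assumption. Taking logarithms makes the model linear, which gives a quick point estimate of K and N. This sketch does not reproduce the joint-uncertainty methods (frequentist and Bayesian, in R and WinBUGS) that the paper describes, and a log-linear fit weights errors differently from a nonlinear fit on the original scale.

```python
import numpy as np

def fit_freundlich(c, q):
    """Point estimates of K and N from a log-log linear fit (assumes q = K * C**(1/N))."""
    log_c = np.log10(np.asarray(c, dtype=float))
    log_q = np.log10(np.asarray(q, dtype=float))
    slope, intercept = np.polyfit(log_c, log_q, 1)   # slope = 1/N, intercept = log10(K)
    return 10.0 ** intercept, 1.0 / slope

# Noise-free synthetic check with K = 2 and N = 1.5 (hypothetical values).
c = np.array([0.5, 1.0, 2.0, 5.0, 10.0])
q = 2.0 * c ** (1 / 1.5)
K_hat, N_hat = fit_freundlich(c, q)
```

Because K and 1/N are estimated from the same regression, their errors are correlated, which is why a joint uncertainty treatment like the one the paper describes is needed rather than separate intervals.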

  16. Editorial: Bayesian benefits for child psychology and psychiatry researchers.

    PubMed

    Oldehinkel, Albertine J

    2016-09-01

    For many scientists, performing statistical tests has become an almost automated routine. However, p-values are frequently used and interpreted incorrectly; and even when used appropriately, p-values tend to provide answers that do not match researchers' questions and hypotheses well. Bayesian statistics present an elegant and often more suitable alternative. The Bayesian approach has rarely been applied in child psychology and psychiatry research so far, but the development of user-friendly software packages and tutorials has placed it well within reach now. Because Bayesian analyses require a more refined definition of hypothesized probabilities of possible outcomes than the classical approach, going Bayesian may offer the additional benefit of sparking the development and refinement of theoretical models in our field. © 2016 Association for Child and Adolescent Mental Health.

  17. Genetic relatedness among indigenous rice varieties in the Eastern Himalayan region based on nucleotide sequences of the Waxy gene.

    PubMed

    Choudhury, Baharul I; Khan, Mohammed L; Dayanandan, Selvadurai

    2014-12-29

    Indigenous rice varieties in the Eastern Himalayan region of Northeast India are traditionally classified into sali, boro and jum ecotypes based on geographical locality and the season of cultivation. In this study, we used DNA sequence data from the Waxy (Wx) gene to infer the genetic relatedness among indigenous rice varieties in Northeast India and to assess the genetic distinctiveness of ecotypes. The results of all three analyses (Bayesian, Maximum Parsimony and Neighbor Joining) were congruent and revealed two genetically distinct clusters of rice varieties in the region. The large group comprised several varieties of sali and boro ecotypes, and all agronomically improved varieties. The small group consisted of only traditionally cultivated indigenous rice varieties, which included one boro, a few sali and all jum varieties. The fixation index analysis revealed a very low level of differentiation between sali and boro (F(ST) = 0.005), moderate differentiation between sali and jum (F(ST) = 0.108) and high differentiation between jum and boro (F(ST) = 0.230) ecotypes. The genetic relatedness analyses revealed that sali, boro and jum ecotypes are genetically heterogeneous, and the current classification based on cultivation type is not congruent with the genetic background of rice varieties. Indigenous rice varieties chosen from genetically distinct clusters could be used in breeding programs to improve genetic gain through heterosis, while maintaining high genetic diversity.
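The abstract does not say which estimator produced its F(ST) values. As one classical form, Nei's G_ST for a single biallelic locus compares the mean within-population expected heterozygosity H_S with the heterozygosity H_T of the pooled population. The sketch below assumes equal population weights and ignores sample-size corrections (estimators such as Weir and Cockerham's theta differ, and sequence-based estimators differ further), so it illustrates the idea rather than reproducing the study's numbers.

```python
import numpy as np

def nei_gst(freqs):
    """Nei's G_ST = (H_T - H_S) / H_T for one biallelic locus.

    freqs: frequency of one allele in each population (equal population weights).
    """
    p = np.asarray(freqs, dtype=float)
    h_s = np.mean(2 * p * (1 - p))            # mean within-population heterozygosity
    p_bar = p.mean()
    h_t = 2 * p_bar * (1 - p_bar)             # heterozygosity of the pooled population
    return 0.0 if h_t == 0 else float((h_t - h_s) / h_t)

# Hypothetical allele frequencies in two ecotypes.
g = nei_gst([0.2, 0.4])
```

Identical frequencies give 0 (no differentiation), and populations fixed for different alleles give 1.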

  18. Quantile regression and Bayesian cluster detection to identify radon prone areas.

    PubMed

    Sarra, Annalina; Fontanella, Lara; Valentini, Pasquale; Palermi, Sergio

    2016-11-01

    Although the dominant source of radon in indoor environments is the geology of the territory, many studies have demonstrated that indoor radon concentrations also depend on dwelling-specific characteristics. Following a stepwise analysis, in this study we propose a combined approach to delineate radon prone areas. We first investigate the impact of various building covariates on indoor radon concentrations. To achieve a more complete picture of this association, we exploit the flexible formulation of a Bayesian spatial quantile regression, which is also equipped with parameters that control the spatial dependence across data. The quantitative knowledge of the influence of each significant building-specific factor on the measured radon levels is employed to predict the radon concentrations that would have been found if the sampled buildings had possessed standard characteristics. Those normalised radon measures should reflect the geogenic radon potential of the underlying ground, which is a quantity directly related to the geological environment. The second stage of the analysis is aimed at identifying radon prone areas, and to this end, we adopt a Bayesian model for spatial cluster detection using as reference unit the building with standard characteristics. The case study is based on a data set of more than 2000 indoor radon measures, available for the Abruzzo region (Central Italy) and collected by the Agency of Environmental Protection of Abruzzo, during several indoor radon monitoring surveys. Copyright © 2016 Elsevier Ltd. All rights reserved.
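Bayesian quantile regression is commonly built on an asymmetric Laplace likelihood, whose log-density is, up to constants, the negative check (pinball) loss. The abstract does not state the authors' exact formulation, so treat the following as background: a minimal sketch showing that minimizing total check loss over a constant recovers the tau-th sample quantile. Function names and the grid search are illustrative choices.

```python
import numpy as np

def check_loss(u, tau):
    """Check (pinball) loss: tau * u for u >= 0 and (tau - 1) * u for u < 0."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0).astype(float))

def constant_quantile(y, tau, grid):
    """Grid value minimising the total check loss of y around a constant."""
    y = np.asarray(y, dtype=float)
    losses = np.array([check_loss(y - q, tau).sum() for q in grid])
    return float(grid[int(np.argmin(losses))])

y = np.arange(1, 101)
grid = np.linspace(0, 101, 1011)
q90 = constant_quantile(y, 0.9, grid)
```

Replacing the constant with a linear predictor in building covariates gives quantile regression, which can describe how covariates shift the upper tail of radon concentrations rather than only the mean.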

  19. Stepwise and stagewise approaches for spatial cluster detection

    PubMed Central

    Xu, Jiale

    2016-01-01

    Spatial cluster detection is an important tool in many areas such as sociology, botany and public health. Previous work has mostly taken either a hypothesis testing framework or a Bayesian framework. In this paper, we propose a few approaches under a frequentist variable selection framework for spatial cluster detection. The forward stepwise methods search for multiple clusters by iteratively adding the currently most likely cluster while adjusting for the effects of previously identified clusters. The stagewise methods also consist of a series of steps, but with a tiny step size in each iteration. We study the features and performances of our proposed methods using simulations on idealized grids or real geographic areas. From the simulations, we compare the performance of the proposed methods in terms of estimation accuracy and power of detection. These methods are applied to the well-known New York leukemia data as well as Indiana poverty data. PMID:27246273

  20. Stepwise and stagewise approaches for spatial cluster detection.

    PubMed

    Xu, Jiale; Gangnon, Ronald E

    2016-05-01

    Spatial cluster detection is an important tool in many areas such as sociology, botany and public health. Previous work has mostly taken either a hypothesis testing framework or a Bayesian framework. In this paper, we propose a few approaches under a frequentist variable selection framework for spatial cluster detection. The forward stepwise methods search for multiple clusters by iteratively adding the currently most likely cluster while adjusting for the effects of previously identified clusters. The stagewise methods also consist of a series of steps, but with a tiny step size in each iteration. We study the features and performances of our proposed methods using simulations on idealized grids or real geographic areas. From the simulations, we compare the performance of the proposed methods in terms of estimation accuracy and power. These methods are applied to the well-known New York leukemia data as well as Indiana poverty data. Copyright © 2016 Elsevier Ltd. All rights reserved.
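The contrast between stepwise and stagewise fitting can be seen in generic incremental forward stagewise least squares: at each iteration the predictor most correlated with the current residual receives only a small increment eps, rather than a full least-squares fit. In the cluster-detection setting the columns would be indicator variables for candidate clusters. That mapping, the standardized columns, the step size, and the number of steps are assumptions for illustration, and this is not the authors' procedure.

```python
import numpy as np

def forward_stagewise(X, y, eps=0.01, n_steps=5000):
    """Incremental forward stagewise least squares.

    Assumes the columns of X are centred and scaled; each step nudges the
    coefficient of the column most correlated with the residual by eps.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    resid = y - y.mean()
    for _ in range(n_steps):
        corr = X.T @ resid
        j = int(np.argmax(np.abs(corr)))
        if abs(corr[j]) < 1e-12:              # residual (numerically) orthogonal to all columns
            break
        step = eps * np.sign(corr[j])
        beta[j] += step
        resid -= step * X[:, j]
    return beta

# Toy data (hypothetical): only the first column carries signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = 3.0 * X[:, 0]
beta = forward_stagewise(X, y)
```

With a small eps the coefficients change gradually and can end up oscillating within about eps of the least-squares value, instead of jumping to it in one step as a forward stepwise fit would.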

  1. Genetic Structure in a Small Pelagic Fish Coincides with a Marine Protected Area: Seascape Genetics in Patagonian Fjords.

    PubMed

    Canales-Aguirre, Cristian B; Ferrada-Fuentes, Sandra; Galleguillos, Ricardo; Hernández, Cristián E

    2016-01-01

    Marine environmental variables can play an important role in promoting population genetic differentiation in marine organisms. Although fjord ecosystems have attracted much attention due to the great oscillation of environmental variables that produce heterogeneous habitats, species inhabiting this kind of ecosystem have received less attention. In this study, we used Sprattus fuegensis, a small pelagic species that populates the inner waters of the continental shelf, channels and fjords of Chilean Patagonia and Argentina, as a model species to test whether environmental variables of fjords relate to population genetic structure. A total of 282 individuals were analyzed from Chilean Patagonia with eight microsatellite loci. Bayesian and non-Bayesian analyses were conducted to describe the genetic variability of S. fuegensis and whether it shows spatial genetic structure. Results showed two well-differentiated genetic clusters along the Chilean Patagonia distribution (i.e. inside the embayment area called TicToc, and the rest of the fjords), but no spatial isolation by distance (IBD) pattern was found with a Mantel test analysis. Temperature and nitrate were correlated with the expected heterozygosities and explained the allelic frequency variation of data in the redundancy analyses. These results suggest that the singular genetic differences found in S. fuegensis from inside TicToc Bay (East of the Corcovado Gulf) are the result of larval retention by a combination of oceanographic mesoscale processes (i.e. the west wind drift current reaches the continental shelf exactly in this zone), and the local geographical configuration (i.e. embayment area, islands, archipelagos). We propose that these features generated an isolated area in the Patagonian fjords that promoted genetic differentiation by drift and a singular biodiversity, adding support to the existence of the largest marine protected area (MPA) of continental Chile, which is the Tic-Toc MPA.
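The isolation-by-distance check mentioned here is a Mantel test, which correlates a genetic distance matrix with a geographic one and assesses significance by permuting site labels. The sketch below is a minimal version; the abstract does not state the distance measures, correlation statistic, or number of permutations used, so those choices (Pearson r on the upper triangle, 999 one-sided permutations) and the toy matrices are assumptions.

```python
import numpy as np

def mantel_test(d1, d2, n_perm=999, seed=0):
    """Simple Mantel test: Pearson r between two distance matrices, one-sided permutation p."""
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    n = d1.shape[0]
    iu = np.triu_indices(n, k=1)                  # each pair of sites counted once
    x = d1[iu]
    r_obs = np.corrcoef(x, d2[iu])[0, 1]
    rng = np.random.default_rng(seed)
    n_ge = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)                 # relabel sites in the second matrix
        r_perm = np.corrcoef(x, d2[np.ix_(perm, perm)][iu])[0, 1]
        n_ge += int(r_perm >= r_obs)
    p_value = (n_ge + 1) / (n_perm + 1)
    return r_obs, p_value

# Toy sites on a line: genetic distance perfectly tracks geographic distance.
coords = np.arange(10.0)
geo = np.abs(coords[:, None] - coords[None, :])
gen = 2.0 * geo
r, p = mantel_test(gen, geo)
```

Rows and columns are permuted together so that each permuted matrix is still a valid distance matrix; a large p-value on real data would correspond to the abstract's finding of no IBD pattern.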

  2. Homogenous Population Genetic Structure of the Non-Native Raccoon Dog (Nyctereutes procyonoides) in Europe as a Result of Rapid Population Expansion

    PubMed Central

    Drygala, Frank; Korablev, Nikolay; Ansorge, Hermann; Fickel, Joerns; Isomursu, Marja; Elmeros, Morten; Kowalczyk, Rafał; Baltrunaite, Laima; Balciauskas, Linas; Saarma, Urmas; Schulze, Christoph; Borkenhagen, Peter; Frantz, Alain C.

    2016-01-01

    The extent of gene flow during the range expansion of non-native species influences the amount of genetic diversity retained in expanding populations. Here, we analyse the population genetic structure of the raccoon dog (Nyctereutes procyonoides) in north-eastern and central Europe. This invasive species is of management concern because it is highly susceptible to fox rabies and an important secondary host of the virus. We hypothesized that the large number of introduced animals and the species’ dispersal capabilities led to high population connectivity and maintenance of genetic diversity throughout the invaded range. We genotyped 332 tissue samples from seven European countries using 16 microsatellite loci. Different algorithms identified three genetic clusters corresponding to Finland, Denmark and a large ‘central’ population that reached from introduction areas in western Russia to northern Germany. Cluster assignments provided evidence of long-distance dispersal. The results of an Approximate Bayesian Computation analysis supported a scenario of equal effective population sizes among different pre-defined populations in the large central cluster. Our results are in line with strong gene flow and secondary admixture between neighbouring demes leading to reduced genetic structuring, probably a result of the species' fairly rapid population expansion after introduction. The results presented here are remarkable in the sense that we identified a homogenous genetic cluster inhabiting an area stretching over more than 1500 km. They are also relevant for disease management, as in the event of a significant rabies outbreak, there is a great risk of a rapid virus spread among raccoon dog populations. PMID:27064784

  3. As-built design specification for proportion estimate software subsystem

    NASA Technical Reports Server (NTRS)

    Obrien, S. (Principal Investigator)

    1980-01-01

    The Proportion Estimate Processor evaluates four estimation techniques in order to get an improved estimate of the proportion of a scene that is planted in a selected crop. The four techniques to be evaluated were provided by the techniques development section and are: (1) random sampling; (2) proportional allocation, relative count estimate; (3) proportional allocation, Bayesian estimate; and (4) sequential Bayesian allocation. The user is given two options for computation of the estimated mean square error. These are referred to as the cluster calculation option and the segment calculation option. The software for the Proportion Estimate Processor is operational on the IBM 3031 computer.

  4. Genetic variation and differentiation of bison (Bison bison) subspecies and cattle (Bos taurus) breeds and subspecies.

    PubMed

    Cronin, Matthew A; MacNeil, Michael D; Vu, Ninh; Leesburg, Vicki; Blackburn, Harvey D; Derr, James N

    2013-01-01

    The genetic relationship of American plains bison (Bison bison bison) and wood bison (Bison bison athabascae) was quantified and compared with that among breeds and subspecies of cattle. Plains bison from 9 herds (N = 136), wood bison from 3 herds (N = 65), taurine cattle (Bos taurus taurus) from 14 breeds (N = 244), and indicine cattle (Bos taurus indicus) from 2 breeds (N = 53) were genotyped for 29 polymorphic microsatellite loci. Bayesian cluster analyses indicate 3 groups, 2 of which are plains bison and 1 of which is wood bison with some admixture, and genetic distances do not show plains bison and wood bison as distinct groups. Differentiation of wood bison and plains bison is also significantly less than that of cattle breeds and subspecies. These and other genetic data and historical interbreeding of bison do not support recognition of extant plains bison and wood bison as phylogenetically distinct subspecies.

  5. Characterization of an Avipoxvirus From a Bald Eagle (Haliaeetus leucocephalus) Using Novel Consensus PCR Protocols for the rpo147 and DNA-Dependent DNA Polymerase Genes.

    PubMed

    Stephen, Alexa A; Leone, Angelique M; Toplon, David E; Archer, Linda L; Wellehan, James F X

    2016-12-01

    A juvenile female bald eagle (Haliaeetus leucocephalus) was presented with emaciation and proliferative periocular lesions. The eagle did not respond to supportive therapy and was euthanatized. Histopathologic examination of the skin lesions revealed plaques of marked epidermal hyperplasia, parakeratosis, marked acanthosis and spongiosis, and eosinophilic intracytoplasmic inclusion bodies. Novel polymerase chain reaction (PCR) assays were used to amplify and sequence the DNA polymerase and rpo147 genes. The 4b gene was also analyzed by a previously developed assay. Bayesian and maximum likelihood phylogenetic analyses of the obtained sequences identified the virus as a poxvirus of the genus Avipoxvirus that clustered with other raptor isolates. Better phylogenetic resolution was found in rpo147 than in the commonly used DNA polymerase gene. The novel consensus rpo147 PCR assay will create more accurate phylogenetic trees and allow better insight into poxvirus history.

  6. Origin of microbial biomineralization and magnetotaxis during the Archean.

    PubMed

    Lin, Wei; Paterson, Greig A; Zhu, Qiyun; Wang, Yinzhao; Kopylova, Evguenia; Li, Ying; Knight, Rob; Bazylinski, Dennis A; Zhu, Rixiang; Kirschvink, Joseph L; Pan, Yongxin

    2017-02-28

    Microbes that synthesize minerals, a process known as microbial biomineralization, contributed substantially to the evolution of current planetary environments through numerous important geochemical processes. Despite its geological significance, the origin and evolution of microbial biomineralization remain poorly understood. Through combined metagenomic and phylogenetic analyses of deep-branching magnetotactic bacteria from the Nitrospirae phylum, and using a Bayesian molecular clock-dating method, we show here that the gene cluster responsible for biomineralization of magnetosomes, and the arrangement of magnetosome chain(s) within cells, both originated before or near the Archean divergence between the Nitrospirae and Proteobacteria. This phylogenetic divergence occurred well before the Great Oxygenation Event. Magnetotaxis likely evolved due to environmental pressures conferring an evolutionary advantage to navigation via the geomagnetic field. Earth's dynamo must therefore have been sufficiently strong to sustain microbial magnetotaxis in the Archean, suggesting that magnetotaxis coevolved with the geodynamo over geological time.

  7. Bayesian approach for counting experiment statistics applied to a neutrino point source analysis

    NASA Astrophysics Data System (ADS)

    Bose, D.; Brayeur, L.; Casier, M.; de Vries, K. D.; Golup, G.; van Eijndhoven, N.

    2013-12-01

    In this paper we present a model-independent analysis method following Bayesian statistics to analyse data from a generic counting experiment and apply it to the search for neutrinos from point sources. We discuss a test statistic defined following a Bayesian framework that will be used in the search for a signal. In case no signal is found, we derive an upper limit without the introduction of approximations. The Bayesian approach allows us to obtain the full probability density function for both the background and the signal rate. As such, we have direct access to any signal upper limit. The upper limit derivation compares directly with a frequentist approach and is robust for low-count observations. Furthermore, it also allows us to account for previous upper limits obtained by other analyses via prior information, without the need for ad hoc application of trial factors. To investigate the validity of the presented Bayesian approach, we have applied this method to the public IceCube 40-string configuration data for 10 nearby blazars and we have obtained a flux upper limit, which is in agreement with the upper limits determined via a frequentist approach. Furthermore, the upper limit obtained compares well with the previously published result of IceCube, using the same data set.
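
    A Bayesian upper limit for a counting experiment can be illustrated in its simplest textbook form. The sketch below assumes Poisson-distributed counts, a known background expectation b (the abstract instead derives full probability densities for both background and signal rate) and a flat prior on the signal s ≥ 0. Under those assumptions the posterior is p(s | n) ∝ (s + b)^n e^-(s + b), and the tail probability P(s > S | n) reduces to a ratio of Poisson cumulative probabilities, Q(n + 1, S + b) / Q(n + 1, b), which the code inverts by bisection. The function names are hypothetical and this is not presented as the authors' test statistic.

```python
import math


def poisson_cdf(n: int, mu: float) -> float:
    """P(N <= n) for N ~ Poisson(mu); equals the regularized upper incomplete gamma Q(n + 1, mu)."""
    term = math.exp(-mu)
    total = term
    for k in range(1, n + 1):
        term *= mu / k
        total += term
    return total


def bayesian_upper_limit(n_obs: int, background: float, cl: float = 0.90) -> float:
    """Credible upper limit S on the signal with P(s <= S | n_obs) = cl.

    Assumes a flat prior on s >= 0 and a known background expectation b, so that
    P(s > S | n) = Q(n + 1, S + b) / Q(n + 1, b).
    """
    target = (1.0 - cl) * poisson_cdf(n_obs, background)
    lo, hi = 0.0, 1.0
    while poisson_cdf(n_obs, background + hi) > target:   # bracket the root
        hi *= 2.0
    for _ in range(200):                                     # bisection on S
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, background + mid) > target:
            lo = mid                                         # tail still too heavy: limit is higher
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

    With zero observed events the 90% limit is ln 10 ≈ 2.30 regardless of the background, and one observed event with no background gives about 3.89.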

  8. Cryptic genetic diversity, population structure, and gene flow in the Mojave rattlesnake (Crotalus scutulatus).

    PubMed

    Schield, Drew R; Adams, Richard H; Card, Daren C; Corbin, Andrew B; Jezkova, Tereza; Hales, Nicole R; Meik, Jesse M; Perry, Blair W; Spencer, Carol L; Smith, Lydia L; García, Gustavo Campillo; Bouzid, Nassima M; Strickland, Jason L; Parkinson, Christopher L; Borja, Miguel; Castañeda-Gaytán, Gamaliel; Bryson, Robert W; Flores-Villela, Oscar A; Mackessy, Stephen P; Castoe, Todd A

    2018-06-15

    The Mojave rattlesnake (Crotalus scutulatus) inhabits deserts and arid grasslands of the western United States and Mexico. Despite considerable interest in its highly toxic venom and the recognition of two subspecies, no molecular studies have characterized range-wide genetic diversity and population structure or tested species limits within C. scutulatus. We used mitochondrial DNA and thousands of nuclear loci from double-digest restriction site associated DNA sequencing to infer population genetic structure throughout the range of C. scutulatus, and to evaluate divergence times and gene flow between populations. We find strong support for several divergent mitochondrial and nuclear clades of C. scutulatus, including splits coincident with two major phylogeographic barriers: the Continental Divide and the elevational increase associated with the Central Mexican Plateau. We apply Bayesian clustering, phylogenetic inference, and coalescent-based species delimitation to our nuclear genetic data to test hypotheses of population structure. We also performed demographic analyses to test hypotheses relating to population divergence and gene flow. Collectively, our results support the existence of four distinct lineages within C. scutulatus, and genetically defined populations do not correspond with currently recognized subspecies ranges. Finally, we use approximate Bayesian computation to test hypotheses of divergence among multiple rattlesnake species groups distributed across the Continental Divide, and find evidence for co-divergence at this boundary during the mid-Pleistocene. Copyright © 2018 Elsevier Inc. All rights reserved.

  9. Bayesian investigation of isochrone consistency using the old open cluster NGC 188

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hills, Shane; Courteau, Stéphane; Von Hippel, Ted

    2015-03-01

    This paper provides a detailed comparison of the differences in parameters derived for a star cluster from its color–magnitude diagrams (CMDs) depending on the filters and models used. We examine the consistency and reliability of fitting three widely used stellar evolution models to 15 combinations of optical and near-IR photometry for the old open cluster NGC 188. The optical filter response curves match those of theoretical systems and are thus not the source of fit inconsistencies. NGC 188 is ideally suited to this study thanks to a wide variety of high-quality photometry and available proper motions and radial velocities that enable us to remove non-cluster members and many binaries. Our Bayesian fitting technique yields inferred values of age, metallicity, distance modulus, and absorption as a function of the photometric band combinations and stellar models. We show that the historically favored three-band combinations of UBV and VRI can be meaningfully inconsistent with each other and with longer baseline data sets such as UBVRIJHK_S. Differences among model sets can also be substantial. For instance, fitting Yi et al. (2001) and Dotter et al. (2008) models to UBVRIJHK_S photometry for NGC 188 yields the following cluster parameters: age = (5.78 ± 0.03, 6.45 ± 0.04) Gyr, [Fe/H] = (+0.125 ± 0.003, −0.077 ± 0.003) dex, (m−M)_V = (11.441 ± 0.007, 11.525 ± 0.005) mag, and A_V = (0.162 ± 0.003, 0.236 ± 0.003) mag, respectively. Within the formal fitting errors, these two fits are substantially and statistically different. Such differences among fits using different filters and models are a cautionary tale regarding our current ability to fit star cluster CMDs. Additional modeling of this kind, with more models and star clusters, and future Gaia parallaxes are critical for isolating and quantifying the most relevant uncertainties in stellar evolutionary models.

  10. Phylodynamic Analysis Reveals CRF01_AE Dissemination between Japan and Neighboring Asian Countries and the Role of Intravenous Drug Use in Transmission

    PubMed Central

    Shiino, Teiichiro; Hattori, Junko; Yokomaku, Yoshiyuki; Iwatani, Yasumasa; Sugiura, Wataru

    2014-01-01

    Background One major circulating HIV-1 subtype in Southeast Asian countries is CRF01_AE, but little is known about its epidemiology in Japan. We conducted a molecular phylodynamic study of patients newly diagnosed with CRF01_AE from 2003 to 2010. Methods Plasma samples from patients registered in the Japanese Drug Resistance HIV-1 Surveillance Network were analyzed for protease-reverse transcriptase sequences; all sequences underwent subtyping and phylogenetic analysis using distance-matrix-based, maximum likelihood and Bayesian coalescent Markov Chain Monte Carlo (MCMC) phylogenetic inferences. Transmission clusters were identified using the interior branch test and depth-first searches for sub-tree partitions. Times to the most recent common ancestor (tMRCAs) of significant clusters were estimated using Bayesian MCMC analysis. Results Among 3618 patients registered in our network, 243 were infected with CRF01_AE. The majority of individuals with CRF01_AE were Japanese, predominantly male, and reported heterosexual contact as their risk factor. We found 5 large clusters with ≥5 members and 25 small clusters consisting of pairs of individuals with highly related CRF01_AE strains. The earliest cluster had a tMRCA of 1996 and consisted of individuals whose known risk factor was heterosexual contact. The other four large clusters showed later tMRCAs between 2000 and 2002, with members including intravenous drug users (IVDU) and non-Japanese individuals, but not men who have sex with men (MSM). In contrast, small clusters included a high frequency of individuals reporting MSM risk factors. Phylogenetic analysis also showed that some individuals were infected with HIV strains that had spread in East and Southeast Asian countries. Conclusions Introduction of CRF01_AE viruses into Japan is estimated to have occurred in the 1990s. CRF01_AE spread via heterosexual behavior, and then among persons connected with non-Japanese individuals, IVDU, and MSM. 
Phylogenetic analysis demonstrated that some viral variants are largely restricted to Japan, while others have a broad geographic distribution. PMID:25025900

  11. Genetic Population Structure Analysis in New Hampshire Reveals Eastern European Ancestry

    PubMed Central

    Sloan, Chantel D.; Andrew, Angeline D.; Duell, Eric J.; Williams, Scott M.; Karagas, Margaret R.; Moore, Jason H.

    2009-01-01

    Genetic structure due to ancestry has been well documented among many divergent human populations. However, the ability to associate ancestry with genetic substructure without using supervised clustering has not been explored in more presumably homogeneous and admixed US populations. The goal of this study was to determine if genetic structure could be detected in a United States population from a single state where the individuals have mixed European ancestry. Using Bayesian clustering with a set of 960 single nucleotide polymorphisms (SNPs) we found evidence of population stratification in 864 individuals from New Hampshire that can be used to differentiate the population into six distinct genetic subgroups. We then correlated self-reported ancestry of the individuals with the Bayesian clustering results. Finnish and Russian/Polish/Lithuanian ancestries were most notably found to be associated with genetic substructure. The ancestral results were further explained and substantiated using New Hampshire census data from 1870 to 1930 when the largest waves of European immigrants came to the area. We also discerned distinct patterns of linkage disequilibrium (LD) between the genetic groups in the growth hormone receptor gene (GHR). To our knowledge, this is the first time such an investigation has uncovered a strong link between genetic structure and ancestry in what would otherwise be considered a homogeneous US population. PMID:19738909

  12. Genetic population structure analysis in New Hampshire reveals Eastern European ancestry.

    PubMed

    Sloan, Chantel D; Andrew, Angeline D; Duell, Eric J; Williams, Scott M; Karagas, Margaret R; Moore, Jason H

    2009-09-07

    Genetic structure due to ancestry has been well documented among many divergent human populations. However, the ability to associate ancestry with genetic substructure without using supervised clustering has not been explored in more presumably homogeneous and admixed US populations. The goal of this study was to determine if genetic structure could be detected in a United States population from a single state where the individuals have mixed European ancestry. Using Bayesian clustering with a set of 960 single nucleotide polymorphisms (SNPs) we found evidence of population stratification in 864 individuals from New Hampshire that can be used to differentiate the population into six distinct genetic subgroups. We then correlated self-reported ancestry of the individuals with the Bayesian clustering results. Finnish and Russian/Polish/Lithuanian ancestries were most notably found to be associated with genetic substructure. The ancestral results were further explained and substantiated using New Hampshire census data from 1870 to 1930 when the largest waves of European immigrants came to the area. We also discerned distinct patterns of linkage disequilibrium (LD) between the genetic groups in the growth hormone receptor gene (GHR). To our knowledge, this is the first time such an investigation has uncovered a strong link between genetic structure and ancestry in what would otherwise be considered a homogeneous US population.

  13. Multiple co-clustering based on nonparametric mixture models with heterogeneous marginal distributions

    PubMed Central

    Yoshimoto, Junichiro; Shimizu, Yu; Okada, Go; Takamura, Masahiro; Okamoto, Yasumasa; Yamawaki, Shigeto; Doya, Kenji

    2017-01-01

    We propose a novel method for multiple clustering, which is useful for analysis of high-dimensional data containing heterogeneous types of features. Our method is based on nonparametric Bayesian mixture models in which features are automatically partitioned (into views) for each clustering solution. This feature partition works as feature selection for a particular clustering solution, which screens out irrelevant features. To make our method applicable to high-dimensional data, a co-clustering structure is newly introduced for each view. Further, the outstanding novelty of our method is that we simultaneously model different distribution families, such as Gaussian, Poisson, and multinomial distributions in each cluster block, which widens areas of application to real data. We apply the proposed method to synthetic and real data, and show that our method outperforms other multiple clustering methods both in recovering true cluster structures and in computation time. Finally, we apply our method to a depression dataset with no true cluster structure available, from which useful inferences are drawn about possible clustering structures of the data. PMID:29049392

  14. A Comparison of Imputation Methods for Bayesian Factor Analysis Models

    ERIC Educational Resources Information Center

    Merkle, Edgar C.

    2011-01-01

    Imputation methods are popular for the handling of missing data in psychology. The methods generally consist of predicting missing data based on observed data, yielding a complete data set that is amenable to standard statistical analyses. In the context of Bayesian factor analysis, this article compares imputation under an unrestricted…

  15. Bayesian Meta-Analysis of Cronbach's Coefficient Alpha to Evaluate Informative Hypotheses

    ERIC Educational Resources Information Center

    Okada, Kensuke

    2015-01-01

    This paper proposes a new method to evaluate informative hypotheses for meta-analysis of Cronbach's coefficient alpha using a Bayesian approach. The coefficient alpha is one of the most widely used reliability indices. In meta-analyses of reliability, researchers typically form specific informative hypotheses beforehand, such as "alpha of…

  16. Evidence of major genes affecting stress response in rainbow trout using Bayesian methods of complex segregation analysis

    USDA-ARS's Scientific Manuscript database

    As a first step towards the genetic mapping of quantitative trait loci (QTL) affecting stress response variation in rainbow trout, we performed complex segregation analyses (CSA) fitting mixed inheritance models of plasma cortisol using Bayesian methods in large full-sib families of rainbow trout. ...

  17. Using Discrete Loss Functions and Weighted Kappa for Classification: An Illustration Based on Bayesian Network Analysis

    ERIC Educational Resources Information Center

    Zwick, Rebecca; Lenaburg, Lubella

    2009-01-01

    In certain data analyses (e.g., multiple discriminant analysis and multinomial log-linear modeling), classification decisions are made based on the estimated posterior probabilities that individuals belong to each of several distinct categories. In the Bayesian network literature, this type of classification is often accomplished by assigning…
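
    The abstract is truncated, but the general idea of classifying from posterior probabilities under a discrete loss function can be sketched: assign the category k whose expected posterior loss, the sum over j of L(k, j) p_j, is smallest (with 0-1 loss this is simply the most probable category). Cohen's weighted kappa, here with linear disagreement weights, κ_w = 1 − Σ w_ij O_ij / Σ w_ij E_ij, can then summarize agreement between assigned and true categories. Which loss matrices and kappa weights Zwick and Lenaburg actually use is not stated in this record; the function names and matrices below are hypothetical.

```python
from typing import Sequence


def bayes_classify(posterior: Sequence[float], loss: Sequence[Sequence[float]]) -> int:
    """Return the category k minimizing sum_j loss[k][j] * posterior[j].

    loss[k][j] is the (hypothetical) cost of assigning category k when the true category is j.
    """
    expected = [sum(l * p for l, p in zip(row, posterior)) for row in loss]
    return min(range(len(expected)), key=expected.__getitem__)


def weighted_kappa(confusion: Sequence[Sequence[float]]) -> float:
    """Cohen's weighted kappa with linear disagreement weights |i - j| / (c - 1).

    confusion[i][j] counts cases assigned to category i whose true category is j (c >= 2).
    """
    c = len(confusion)
    n = sum(sum(row) for row in confusion)
    row_tot = [sum(row) for row in confusion]
    col_tot = [sum(confusion[i][j] for i in range(c)) for j in range(c)]
    observed = expected = 0.0
    for i in range(c):
        for j in range(c):
            w = abs(i - j) / (c - 1)
            observed += w * confusion[i][j]                  # weighted observed disagreement
            expected += w * row_tot[i] * col_tot[j] / n      # weighted chance disagreement
    return 1.0 - observed / expected
```

    With posterior probabilities (0.5, 0.3, 0.2), 0-1 loss picks category 0, but a loss matrix that charges 10 for missing a true category 2 shifts the decision to category 2, showing how the loss function, not just the posterior, drives the classification.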

  18. Impact of censoring on learning Bayesian networks in survival modelling.

    PubMed

    Stajduhar, Ivan; Dalbelo-Basić, Bojana; Bogunović, Nikola

    2009-11-01

    Bayesian networks are commonly used for presenting uncertainty and covariate interactions in an easily interpretable way. Because of their efficient inference and ability to represent causal relationships, they are an excellent choice for medical decision support systems in diagnosis, treatment, and prognosis. Although good procedures for learning Bayesian networks from data have been defined, their performance in learning from censored survival data has not been widely studied. In this paper, we explore how to use these procedures to learn about possible interactions between prognostic factors and their influence on the variate of interest. We study how censoring affects the probability of learning correct Bayesian network structures. Additionally, we analyse the potential usefulness of the learnt models for predicting the time-independent probability of an event of interest. We analysed the influence of censoring with a simulation on synthetic data sampled from randomly generated Bayesian networks. We used two well-known methods for learning Bayesian networks from data: a constraint-based method and a score-based method. We compared the performance of each method under different levels of censoring to those of the naive Bayes classifier and the proportional hazards model. We did additional experiments on several datasets from real-world medical domains. The machine-learning methods treated censored cases in the data as event-free. We report and compare results for several commonly used model evaluation metrics. On average, the proportional hazards method outperformed the other methods in most censoring setups. As part of the simulation study, we also analysed structural similarities of the learnt networks. Compared with no censoring, heavy censoring produces up to 5% surplus arcs and up to 10% missing arcs in total. It also produces up to 50% missing arcs that should originally be connected to the variate of interest. 
The presented methods for learning Bayesian networks from data can be used to learn from censored survival data in the presence of light censoring (up to 20%) by treating censored cases as event-free. Given intermediate or heavy censoring, the learnt models become tuned to the majority class and would thus require a different approach.

  19. Bayesian analyses of time-interval data for environmental radiation monitoring.

    PubMed

    Luo, Peng; Sharp, Julia L; DeVol, Timothy A

    2013-01-01

    Time-interval (time difference between two consecutive pulses) analysis based on the principles of Bayesian inference was investigated for online radiation monitoring. Using experimental and simulated data, Bayesian analysis of time-interval data [Bayesian (ti)] was compared with Bayesian and a conventional frequentist analysis of counts in a fixed count time [Bayesian (cnt) and single interval test (SIT), respectively]. The performances of the three methods were compared in terms of average run length (ARL) and detection probability for several simulated detection scenarios. Experimental data were acquired with a DGF-4C system in list mode. Simulated data were generated using Monte Carlo techniques to randomly sample the Poisson distribution. All statistical algorithms were developed using the R Project for statistical computing. Bayesian analysis of time-interval information provided a detection probability similar to that of Bayesian analysis of count information, but the authors were able to make a decision with fewer pulses at relatively higher radiation levels. In addition, when the source is present only briefly (for less than the count time), time-interval information is more sensitive to a change than count information, because in the count-based analyses the source data are averaged with the background data over the entire count time. The relationships of the source time, change points, and modifications to the Bayesian approach for increasing detection probability are presented.
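
    One way to picture a Bayesian analysis of time-interval data: if pulses arrive as a Poisson process with rate λ, the intervals between them are exponentially distributed, and a conjugate gamma prior on λ yields a gamma posterior after each pulse. The sketch below assumes an integer-shape gamma prior (by default a flat, improper prior, which needs at least one observed interval) and reports the posterior probability that λ exceeds a known background rate. The function name, defaults and any alarm threshold are hypothetical, and this is not necessarily the formulation used by Luo, Sharp and DeVol.

```python
import math
from typing import Sequence


def prob_rate_exceeds(intervals: Sequence[float], background_rate: float,
                      prior_shape: int = 1, prior_rate: float = 0.0) -> float:
    """Posterior P(lambda > background_rate) from inter-pulse time intervals.

    Assumes intervals ~ Exponential(lambda) and a Gamma(prior_shape, prior_rate) prior with an
    integer shape, so the posterior is Gamma(prior_shape + n, prior_rate + sum(intervals)).
    The defaults (shape 1, rate 0) give a flat, improper prior on lambda.
    For integer shape k: P(Gamma(k, theta) > x) = P(Poisson(theta * x) <= k - 1).
    """
    k = prior_shape + len(intervals)
    theta = prior_rate + sum(intervals)
    if theta <= 0:
        raise ValueError("need a positive prior rate or at least one positive interval")
    mu = theta * background_rate
    term = math.exp(-mu)
    total = term
    for j in range(1, k):          # sum Poisson(mu) probabilities for 0..k-1
        term *= mu / j
        total += term
    return total
```

    For example, ten intervals of 0.1 s against a background of 1 count/s give a posterior probability above 0.99 that the rate exceeds background, whereas ten 10 s intervals give a probability near zero; a monitor could raise an alarm once this probability crosses a chosen (hypothetical) threshold such as 0.95.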

  20. Bayesian model selection techniques as decision support for shaping a statistical analysis plan of a clinical trial: An example from a vertigo phase III study with longitudinal count data as primary endpoint

    PubMed Central

    2012-01-01

    Background A statistical analysis plan (SAP) is a critical link between how a clinical trial is conducted and the clinical study report. To secure objective study results, regulatory bodies expect that the SAP will meet requirements in pre-specifying inferential analyses and other important statistical techniques. To write a good SAP for model-based sensitivity and ancillary analyses involves non-trivial decisions on and justification of many aspects of the chosen setting. In particular, trials with longitudinal count data as primary endpoints pose challenges for model choice and model validation. In the random effects setting, frequentist strategies for model assessment and model diagnosis are complex and not easily implemented and have several limitations. Therefore, it is of interest to explore Bayesian alternatives which provide the needed decision support to finalize a SAP. Methods We focus on generalized linear mixed models (GLMMs) for the analysis of longitudinal count data. A series of distributions with over- and under-dispersion is considered. Additionally, the structure of the variance components is modified. We perform a simulation study to investigate the discriminatory power of Bayesian tools for model criticism in different scenarios derived from the model setting. We apply the findings to the data from an open clinical trial on vertigo attacks. These data are seen as pilot data for an ongoing phase III trial. To fit GLMMs we use a novel Bayesian computational approach based on integrated nested Laplace approximations (INLAs). The INLA methodology enables the direct computation of leave-one-out predictive distributions. These distributions are crucial for Bayesian model assessment. We evaluate competing GLMMs for longitudinal count data according to the deviance information criterion (DIC) or probability integral transform (PIT), and by using proper scoring rules (e.g. the logarithmic score). 
Results The instruments under study provide excellent tools for preparing decisions within the SAP in a transparent way when structuring the primary analysis, sensitivity or ancillary analyses, and specific analyses for secondary endpoints. The mean logarithmic score and DIC discriminate well between different model scenarios. It becomes obvious that the naive choice of a conventional random effects Poisson model is often inappropriate for real-life count data. The findings are used to specify an appropriate mixed model employed in the sensitivity analyses of an ongoing phase III trial. Conclusions The proposed Bayesian methods are not only appealing for inference but notably provide a sophisticated insight into different aspects of model performance, such as forecast verification or calibration checks, and can be applied within the model selection process. The mean of the logarithmic score is a robust tool for model ranking and is not sensitive to sample size. Therefore, these Bayesian model selection techniques offer helpful decision support for shaping sensitivity and ancillary analyses in a statistical analysis plan of a clinical trial with longitudinal count data as the primary endpoint. PMID:22962944

  1. Bayesian model selection techniques as decision support for shaping a statistical analysis plan of a clinical trial: an example from a vertigo phase III study with longitudinal count data as primary endpoint.

    PubMed

    Adrion, Christine; Mansmann, Ulrich

    2012-09-10

    A statistical analysis plan (SAP) is a critical link between how a clinical trial is conducted and the clinical study report. To secure objective study results, regulatory bodies expect that the SAP will meet requirements in pre-specifying inferential analyses and other important statistical techniques. To write a good SAP for model-based sensitivity and ancillary analyses involves non-trivial decisions on and justification of many aspects of the chosen setting. In particular, trials with longitudinal count data as primary endpoints pose challenges for model choice and model validation. In the random effects setting, frequentist strategies for model assessment and model diagnosis are complex and not easily implemented and have several limitations. Therefore, it is of interest to explore Bayesian alternatives which provide the needed decision support to finalize a SAP. We focus on generalized linear mixed models (GLMMs) for the analysis of longitudinal count data. A series of distributions with over- and under-dispersion is considered. Additionally, the structure of the variance components is modified. We perform a simulation study to investigate the discriminatory power of Bayesian tools for model criticism in different scenarios derived from the model setting. We apply the findings to the data from an open clinical trial on vertigo attacks. These data are seen as pilot data for an ongoing phase III trial. To fit GLMMs we use a novel Bayesian computational approach based on integrated nested Laplace approximations (INLAs). The INLA methodology enables the direct computation of leave-one-out predictive distributions. These distributions are crucial for Bayesian model assessment. We evaluate competing GLMMs for longitudinal count data according to the deviance information criterion (DIC) or probability integral transform (PIT), and by using proper scoring rules (e.g. the logarithmic score). 
The instruments under study provide excellent tools for preparing decisions within the SAP in a transparent way when structuring the primary analysis, sensitivity or ancillary analyses, and specific analyses for secondary endpoints. The mean logarithmic score and DIC discriminate well between different model scenarios. It becomes obvious that the naive choice of a conventional random effects Poisson model is often inappropriate for real-life count data. The findings are used to specify an appropriate mixed model employed in the sensitivity analyses of an ongoing phase III trial. The proposed Bayesian methods are not only appealing for inference but notably provide a sophisticated insight into different aspects of model performance, such as forecast verification or calibration checks, and can be applied within the model selection process. The mean of the logarithmic score is a robust tool for model ranking and is not sensitive to sample size. Therefore, these Bayesian model selection techniques offer helpful decision support for shaping sensitivity and ancillary analyses in a statistical analysis plan of a clinical trial with longitudinal count data as the primary endpoint.

  2. Spatio-Temporal History of HIV-1 CRF35_AD in Afghanistan and Iran.

    PubMed

    Eybpoosh, Sana; Bahrampour, Abbas; Karamouzian, Mohammad; Azadmanesh, Kayhan; Jahanbakhsh, Fatemeh; Mostafavi, Ehsan; Zolala, Farzaneh; Haghdoost, Ali Akbar

    2016-01-01

    HIV-1 Circulating Recombinant Form 35_AD (CRF35_AD) has an important position in the epidemiological profile of Afghanistan and Iran. Despite the presence of this clade in Afghanistan and Iran for over a decade, our understanding of its origin and dissemination patterns is limited. In this study, we performed a Bayesian phylogeographic analysis to reconstruct the spatio-temporal dispersion pattern of this clade using eligible CRF35_AD gag and pol sequences available in the Los Alamos HIV database (432 sequences available from Iran, 16 sequences available from Afghanistan, and a single CRF35_AD-like pol sequence available from the USA). A Bayesian Markov Chain Monte Carlo algorithm was implemented in BEAST v1.8.1. Between-country dispersion rates were tested with the Bayesian stochastic search variable selection method and were considered significant where Bayes factor values were greater than three. The findings suggested that CRF35_AD sequences were genetically similar to parental sequences from Kenya and Uganda, and to a set of subtype A1 sequences available from Afghan refugees living in Pakistan. Our results also showed that across all phylogenies, Afghan and Iranian CRF35_AD sequences formed a monophyletic cluster (posterior clade credibility > 0.7). The divergence date of this cluster was estimated to be between 1990 and 1992. Within this cluster, a bidirectional dispersion of the virus was observed across Afghanistan and Iran. We could not clearly identify whether Afghanistan or Iran first established or received this epidemic, as the root location of this cluster could not be robustly estimated. Three CRF35_AD sequences from Afghan refugees living in Pakistan nested among Afghan and Iranian CRF35_AD branches. However, the CRF35_AD-like sequence available from the USA diverged independently from Kenyan subtype A1 sequences, suggesting that it is not a true CRF35_AD lineage. 
Potential factors contributing to viral exchange between Afghanistan and Iran could be injection drug networks and mass migration of Afghan refugees and labourers to Iran, which calls for extensive preventive efforts.

  3. Spatio-Temporal History of HIV-1 CRF35_AD in Afghanistan and Iran

    PubMed Central

    Eybpoosh, Sana; Bahrampour, Abbas; Karamouzian, Mohammad; Azadmanesh, Kayhan; Jahanbakhsh, Fatemeh; Mostafavi, Ehsan; Zolala, Farzaneh; Haghdoost, Ali Akbar

    2016-01-01

    HIV-1 Circulating Recombinant Form 35_AD (CRF35_AD) has an important position in the epidemiological profile of Afghanistan and Iran. Despite the presence of this clade in Afghanistan and Iran for over a decade, our understanding of its origin and dissemination patterns is limited. In this study, we performed a Bayesian phylogeographic analysis to reconstruct the spatio-temporal dispersion pattern of this clade using eligible CRF35_AD gag and pol sequences available in the Los Alamos HIV database (432 sequences from Iran, 16 sequences from Afghanistan, and a single CRF35_AD-like pol sequence from the USA). A Bayesian Markov chain Monte Carlo algorithm was implemented in BEAST v1.8.1. Between-country dispersion rates were tested with the Bayesian stochastic search variable selection (BSSVS) method and were considered significant where Bayes factor values were greater than three. The findings suggested that CRF35_AD sequences were genetically similar to parental sequences from Kenya and Uganda, and to a set of subtype A1 sequences from Afghan refugees living in Pakistan. Our results also showed that across all phylogenies, Afghan and Iranian CRF35_AD sequences formed a monophyletic cluster (posterior clade credibility > 0.7). The divergence date of this cluster was estimated to be between 1990 and 1992. Within this cluster, a bidirectional dispersion of the virus was observed between Afghanistan and Iran. We could not clearly identify whether Afghanistan or Iran first established or received this epidemic, as the root location of this cluster could not be robustly estimated. Three CRF35_AD sequences from Afghan refugees living in Pakistan nested among Afghan and Iranian CRF35_AD branches. However, the CRF35_AD-like sequence from the USA diverged independently from Kenyan subtype A1 sequences, suggesting that it is not a true CRF35_AD lineage. Potential factors contributing to viral exchange between Afghanistan and Iran could be injection drug networks and mass migration of Afghan refugees and labourers to Iran, which calls for extensive preventive efforts. PMID:27280293
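
    The "Bayes factor greater than three" criterion applies to the rate indicators sampled under BSSVS: a rate's Bayes factor is its posterior odds of inclusion divided by its prior odds of inclusion. The sketch below is a minimal illustration, assuming the posterior inclusion probability is the fraction of MCMC samples in which the indicator equals 1 and that the prior inclusion probability is supplied by the user (in BEAST it follows from the prior placed on the number of non-zero rates). The function names and the indicator trace are hypothetical and are not BEAST output or the authors' code.

```python
from typing import Sequence


def inclusion_probability(indicator_samples: Sequence[int]) -> float:
    """Fraction of MCMC samples in which a rate indicator is switched on (1)."""
    if not indicator_samples:
        raise ValueError("need at least one MCMC sample")
    return sum(indicator_samples) / len(indicator_samples)


def bssvs_bayes_factor(posterior_inclusion: float, prior_inclusion: float) -> float:
    """Bayes factor for one rate: posterior odds of inclusion / prior odds of inclusion."""
    for p in (posterior_inclusion, prior_inclusion):
        if not 0.0 < p < 1.0:
            raise ValueError("probabilities must lie strictly between 0 and 1")
    posterior_odds = posterior_inclusion / (1.0 - posterior_inclusion)
    prior_odds = prior_inclusion / (1.0 - prior_inclusion)
    return posterior_odds / prior_odds


# Hypothetical indicator trace for one between-country rate (not data from the study).
trace = [1, 1, 1, 0, 1, 1, 1, 1, 1, 0]
bf = bssvs_bayes_factor(inclusion_probability(trace), prior_inclusion=0.5)
print(f"BF = {bf:.1f}, supported: {bf > 3}")
```

    Working on odds rather than raw probabilities means that a rate is only flagged when the data move its inclusion probability well beyond what the prior alone would produce.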

  4. Program SPACECAP: software for estimating animal density using spatially explicit capture-recapture models

    USGS Publications Warehouse

    Gopalaswamy, Arjun M.; Royle, J. Andrew; Hines, James E.; Singh, Pallavi; Jathanna, Devcharan; Kumar, N. Samba; Karanth, K. Ullas

    2012-01-01

    1. The advent of spatially explicit capture-recapture models is changing the way ecologists analyse capture-recapture data. However, the advantages offered by these new models are not fully exploited because they can be difficult to implement. 2. To address this need, we developed a user-friendly software package, created within the R programming environment, called SPACECAP. This package implements Bayesian spatially explicit hierarchical models to analyse spatial capture-recapture data. 3. Given that a large number of field biologists prefer software with graphical user interfaces for analysing their data, SPACECAP is particularly useful as a tool to increase the adoption of Bayesian spatially explicit capture-recapture methods in practice.

  5. Bayesian generalized linear mixed modeling of Tuberculosis using informative priors.

    PubMed

    Ojo, Oluwatobi Blessing; Lougue, Siaka; Woldegerima, Woldegebriel Assefa

    2017-01-01

    TB is rated as one of the world's deadliest diseases, and South Africa ranks 9th of the 22 countries hardest hit by TB. Although much research has been carried out on this subject, this paper goes further by incorporating past knowledge into the model, using a Bayesian approach with an informative prior. The Bayesian approach is becoming popular in data analysis, but most applications of Bayesian inference are limited to non-informative priors, used when there is no solid external information about the distribution of the parameter of interest. The main aim of this study is to profile people living with TB in South Africa. In this paper, identical regression models are fitted under the classical approach and under the Bayesian approach with both non-informative and informative priors, using South Africa General Household Survey (GHS) data for the year 2014. For the Bayesian model with an informative prior, the South Africa GHS datasets for the years 2011 to 2013 are used to set up priors for the 2014 model.
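
    The idea of letting earlier survey years inform the prior for a later year can be shown with a much simpler model than the paper's generalized linear mixed models. The sketch below swaps in a beta-binomial prevalence model with a power prior, in which historical counts enter the prior with a weight alpha between 0 (non-informative) and 1 (full pooling). The power prior, the function names and all counts are illustrative assumptions, not the authors' construction or GHS figures.

```python
from typing import Tuple


def power_prior_posterior(
    x_hist: int, n_hist: int, x_new: int, n_new: int,
    alpha: float = 1.0, a0: float = 1.0, b0: float = 1.0,
) -> Tuple[float, float]:
    """Beta posterior parameters for a prevalence under a binomial power prior.

    Historical data (x_hist cases out of n_hist) enter the prior with weight alpha:
    alpha = 0 ignores them (a non-informative Beta(a0, b0) prior), alpha = 1 pools them fully.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if not (0 <= x_hist <= n_hist and 0 <= x_new <= n_new):
        raise ValueError("counts must satisfy 0 <= x <= n")
    a = a0 + alpha * x_hist + x_new
    b = b0 + alpha * (n_hist - x_hist) + (n_new - x_new)
    return a, b


def beta_mean(a: float, b: float) -> float:
    """Posterior mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical counts, not GHS figures: earlier survey years pooled, then a new survey year.
for alpha in (0.0, 0.5, 1.0):
    a, b = power_prior_posterior(x_hist=30, n_hist=1000, x_new=5, n_new=200, alpha=alpha)
    print(alpha, round(beta_mean(a, b), 4))
```

    Comparing alpha = 0 with alpha > 0 mirrors, in miniature, the paper's comparison between non-informative and informative priors.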

  6. Bayesian Decision Support

    NASA Astrophysics Data System (ADS)

    Berliner, M.

    2017-12-01

    Bayesian statistical decision theory offers a natural framework for decision and policy making in the presence of uncertainty. Key advantages of the approach include efficient incorporation of information and observations. However, in complicated settings it is very difficult, perhaps essentially impossible, to formalize the mathematical inputs needed in the approach. Nevertheless, using the approach as a template is useful for decision support, that is, for organizing and communicating our analyses. Bayesian hierarchical modeling is valuable in quantifying and managing uncertainty in such cases. I review some aspects of the idea, emphasizing statistical model development and use in the context of sea-level rise.
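
    The core rule of Bayesian decision theory is to choose the action with the smallest posterior expected loss. The sketch below estimates that expectation by averaging a loss function over posterior draws. The posterior draws of sea-level rise, the candidate barrier heights and the cost figures are all hypothetical, chosen only to make the rule concrete; they are not taken from the abstract.

```python
from statistics import fmean
from typing import Callable, Dict, Iterable, Sequence, Tuple


def bayes_action(
    posterior_samples: Sequence[float],
    actions: Iterable[float],
    loss: Callable[[float, float], float],
) -> Tuple[float, Dict[float, float]]:
    """Pick the action minimising the Monte Carlo estimate of posterior expected loss."""
    expected = {a: fmean(loss(a, theta) for theta in posterior_samples) for a in actions}
    best = min(expected, key=expected.get)
    return best, expected


# Hypothetical illustration: posterior draws of sea-level rise (m) and candidate barrier heights (m).
draws = [0.3, 0.5, 0.7, 0.9]


def barrier_loss(height: float, rise: float) -> float:
    build_cost = 10.0 * height                       # assumed cost per metre of barrier
    flood_damage = 100.0 if rise > height else 0.0   # assumed damage if the barrier is overtopped
    return build_cost + flood_damage


best, expected = bayes_action(draws, [0.5, 1.0], barrier_loss)
print(best, expected)
```

    In realistic settings the hard part is the one the abstract points to: specifying the loss and the posterior, not the final minimisation.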

  7. A school based cluster randomised health education intervention trial for improving knowledge and attitudes related to Taenia solium cysticercosis and taeniasis in Mbulu district, northern Tanzania.

    PubMed

    Mwidunda, Sylvester A; Carabin, Hélène; Matuja, William B M; Winkler, Andrea S; Ngowi, Helena A

    2015-01-01

    Taenia solium causes significant economic and public health impacts in endemic countries. This study determined the effectiveness of a health education intervention at improving school children's knowledge and attitudes related to T. solium cysticercosis and taeniasis in Tanzania. A cluster randomised controlled health education intervention trial was conducted in 60 schools (30 primary, 30 secondary) in Mbulu district. Baseline data were collected using a structured questionnaire in the 60 schools and group discussions in three other schools. The 60 schools, stratified by baseline knowledge, were randomised to receive the intervention or serve as controls. The health education consisted of an address by a trained teacher, a video show and a leaflet given to each pupil. Two post-intervention re-assessments (immediately and 6 months post-intervention) were conducted in all schools, and a third (12 months post-intervention) was conducted in 28 secondary schools. Data were analysed using Bayesian hierarchical log-binomial models for individual knowledge and attitude questions and Bayesian hierarchical linear regression models for scores. The overall score (percentage of correct answers) improved by about 10% in all schools after 6 months, although it was slightly lower in secondary schools. Monitoring alone was associated with an improvement in scores of about 6%. The intervention was linked to improvements in knowledge regarding taeniasis, porcine cysticercosis, human cysticercosis and epilepsy, and in the attitude of condemning infected meat, but it reduced the inclination to contact a veterinarian if a pig was found to be infected with cysticercosis. Monitoring alone was linked to an improvement regarding how best to raise pigs. This study demonstrates the potential value of school children as targets for health messages to control T. solium cysticercosis and taeniasis in endemic areas. Studies are needed to assess the effectiveness of message transmission from children to parents and the general community, and its impact on changing behaviours that facilitate disease transmission.

  8. A School Based Cluster Randomised Health Education Intervention Trial for Improving Knowledge and Attitudes Related to Taenia solium Cysticercosis and Taeniasis in Mbulu District, Northern Tanzania

    PubMed Central

    Mwidunda, Sylvester A.; Carabin, Hélène; Matuja, William B. M.; Winkler, Andrea S.; Ngowi, Helena A.

    2015-01-01

    Taenia solium causes significant economic and public health impacts in endemic countries. This study determined the effectiveness of a health education intervention at improving school children’s knowledge and attitudes related to T. solium cysticercosis and taeniasis in Tanzania. A cluster randomised controlled health education intervention trial was conducted in 60 schools (30 primary, 30 secondary) in Mbulu district. Baseline data were collected using a structured questionnaire in the 60 schools and group discussions in three other schools. The 60 schools, stratified by baseline knowledge, were randomised to receive the intervention or serve as controls. The health education consisted of an address by a trained teacher, a video show and a leaflet given to each pupil. Two post-intervention re-assessments (immediately and 6 months post-intervention) were conducted in all schools, and a third (12 months post-intervention) was conducted in 28 secondary schools. Data were analysed using Bayesian hierarchical log-binomial models for individual knowledge and attitude questions and Bayesian hierarchical linear regression models for scores. The overall score (percentage of correct answers) improved by about 10% in all schools after 6 months, although it was slightly lower in secondary schools. Monitoring alone was associated with an improvement in scores of about 6%. The intervention was linked to improvements in knowledge regarding taeniasis, porcine cysticercosis, human cysticercosis and epilepsy, and in the attitude of condemning infected meat, but it reduced the inclination to contact a veterinarian if a pig was found to be infected with cysticercosis. Monitoring alone was linked to an improvement regarding how best to raise pigs. This study demonstrates the potential value of school children as targets for health messages to control T. solium cysticercosis and taeniasis in endemic areas. Studies are needed to assess the effectiveness of message transmission from children to parents and the general community, and its impact on changing behaviours that facilitate disease transmission. PMID:25719902

  9. Analyzing the relationship between sequence divergence and nodal support using Bayesian phylogenetic analyses.

    PubMed

    Makowsky, Robert; Cox, Christian L; Roelke, Corey; Chippindale, Paul T

    2010-11-01

    Determining the appropriate gene for phylogeny reconstruction can be a difficult process. Rapidly evolving genes tend to resolve recent relationships but suffer from alignment issues and increased homoplasy among distantly related species. Conversely, slowly evolving genes generally perform best for deeper relationships but lack sufficient variation to resolve recent relationships. We determine the relationship between sequence divergence and Bayesian phylogenetic reconstruction ability using both natural and simulated datasets. The natural data are based on 28 well-supported relationships within the subphylum Vertebrata. Sequences of 12 genes were acquired, and Bayesian analyses were used to determine phylogenetic support for correct relationships. Simulated datasets were designed to determine whether an optimal range of sequence divergence exists across extreme phylogenetic conditions. Across all genes we found that an optimal range of divergence for resolving the correct relationships does exist, although, as expected, this level of divergence depends on the distance metric. Simulated datasets show that an optimal range of sequence divergence exists across diverse topologies and models of evolution. We show that a simple-to-measure property of genetic sequences (genetic distance) is related to phylogenetic reconstruction ability in Bayesian analyses. This information should be useful for selecting the most informative gene to resolve any relationship, especially those that are difficult to resolve, as well as for minimizing both cost and confounding information during project design. Copyright © 2010. Published by Elsevier Inc.

  10. Decentralized cooperative TOA/AOA target tracking for hierarchical wireless sensor networks.

    PubMed

    Chen, Ying-Chih; Wen, Chih-Yu

    2012-11-08

    This paper proposes a distributed method for cooperative target tracking in hierarchical wireless sensor networks. Leader-based information processing is used to achieve object positioning with a cluster-based network topology. Random timers and local information are used to adaptively select a sub-cluster for the localization task. The proposed energy-efficient tracking algorithm allows each sub-cluster member to estimate the target position locally with a Bayesian filtering framework and a neural network model, and then fuses these estimates at the leader node with the covariance intersection algorithm. This paper evaluates the merits and trade-offs of the protocol design with a view to developing more efficient and practical algorithms for object position estimation.
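
    Covariance intersection fuses two estimates whose cross-correlation is unknown by combining their information matrices with a weight w in [0, 1]: P^-1 = w Pa^-1 + (1 - w) Pb^-1 and x = P (w Pa^-1 xa + (1 - w) Pb^-1 xb). The sketch below implements this rule for two 2-D position estimates in plain Python. Choosing w by a grid search that minimises the trace of P is an assumption (minimising the determinant is another common choice), and the helper names and input estimates are hypothetical rather than taken from the paper.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def inv2(m: Matrix) -> Matrix:
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    """2x2 matrix times 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def covariance_intersection(
    xa: Sequence[float], Pa: Matrix, xb: Sequence[float], Pb: Matrix, steps: int = 100
) -> Tuple[List[float], Matrix, float]:
    """Fuse two 2-D estimates, choosing w on a grid over [0, 1] to minimise trace(P)."""
    Ia, Ib = inv2(Pa), inv2(Pb)
    best = None  # (trace, w, P)
    for k in range(steps + 1):
        w = k / steps
        info = [[w * Ia[i][j] + (1 - w) * Ib[i][j] for j in range(2)] for i in range(2)]
        P = inv2(info)
        trace = P[0][0] + P[1][1]
        if best is None or trace < best[0]:
            best = (trace, w, P)
    _, w, P = best
    ya, yb = matvec(Ia, xa), matvec(Ib, xb)
    x = matvec(P, [w * ya[i] + (1 - w) * yb[i] for i in range(2)])
    return x, P, w


# Hypothetical sub-cluster estimates: the first is precise in x, the second precise in y.
x, P, w = covariance_intersection([0.0, 0.0], [[1.0, 0.0], [0.0, 4.0]],
                                  [2.0, 2.0], [[4.0, 0.0], [0.0, 1.0]])
print(x, P, w)
```

    The fused position leans towards whichever estimate is more precise along each axis, while the fused covariance stays consistent even if the two inputs are correlated.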

  11. Genetic homogeneity of the invasive lionfish across the Northwestern Atlantic and the Gulf of Mexico based on Single Nucleotide Polymorphisms.

    PubMed

    Pérez-Portela, R; Bumford, A; Coffman, B; Wedelich, S; Davenport, M; Fogg, A; Swenarton, M K; Coleman, F; Johnston, M A; Crawford, D L; Oleksiak, M F

    2018-03-22

    Despite the devastating impact of the lionfish (Pterois volitans) invasion on NW Atlantic ecosystems, little genetic information about the invasion process is available. We applied genotyping-by-sequencing techniques to identify 1,220 single nucleotide polymorphic sites (SNPs) from 162 lionfish samples collected between 2013 and 2015 from two areas chronologically identified as the first and last invaded areas in US waters: the east coast of Florida and the Gulf of Mexico. We used population genomic analyses, including phylogenetic reconstruction, Bayesian clustering, genetic distances, Discriminant Analyses of Principal Components, and coalescence simulations for detection of outlier SNPs, to understand genetic trends relevant to the lionfish's long-term persistence. We found no significant differences in genetic structure or diversity between the two areas (FST p-values > 0.01, and t-test p-values > 0.05). In fact, our genomic analyses showed genetic homogeneity, with enough gene flow between the east coast of Florida and the Gulf of Mexico to erase previously detected signals of genetic divergence between these areas, secondary spreading, and bottlenecks in the Gulf of Mexico. These findings suggest rapid genetic changes over space and time during the invasion, resulting in one panmictic population with no signs of divergence between areas due to local adaptation.

  12. Molecular evidence of hybrid zones of Cedrela (Meliaceae) in the Yungas of Northwestern Argentina.

    PubMed

    Zelener, Noga; Tosto, Daniela; de Oliveira, Luiz Orlando; Soldati, María Cristina; Inza, María Virginia; Fornes, Luis Fernando

    2016-09-01

    In the Yungas of Northwestern Argentina, three endangered species of Cedrela (C. angustifolia, C. saltensis, and C. balansae) follow altitudinal gradients of distribution with contact zones between them. We sampled 210 individuals from 20 populations that spanned most of Cedrela's geographical range in the Yungas, and used Amplified Fragment Length Polymorphism (AFLP) markers and DNA sequences of the nuclear Internal Transcribed Spacer (ITS) to investigate hybrid zones. Data analyses employed an array of complementary methods, including principal coordinate analyses, Bayesian clustering analyses, maximum likelihood tree-building, and network techniques. Both nuclear molecular systems, AFLP and ITS, provided congruent insights into the evolutionary history of Cedrela in the Yungas. We uncovered strong support for the occurrence of natural hybridization between C. balansae and C. saltensis. Additionally, we identified hybrid zones in areas of sympatry (at both the Calilegua National Park and the San Andrés farm) and in transition zones from 820 to 1100 meters above sea level (localities of Pintascayo and Acambuco). There was no evidence for hybridization of either C. balansae or C. saltensis with C. angustifolia. The role of hybrid populations in the conservation and use of genetic resources in the Yungas was discussed. Copyright © 2016 Elsevier Inc. All rights reserved.

  13. Are Student Evaluations of Teaching Effectiveness Valid for Measuring Student Learning Outcomes in Business Related Classes? A Neural Network and Bayesian Analyses

    ERIC Educational Resources Information Center

    Galbraith, Craig S.; Merrill, Gregory B.; Kline, Doug M.

    2012-01-01

    In this study we investigate the underlying relational structure between student evaluations of teaching effectiveness (SETEs) and achievement of student learning outcomes in 116 business related courses. Utilizing traditional statistical techniques, a neural network analysis and a Bayesian data reduction and classification algorithm, we find…

  14. Whole-genome sequencing and analyses identify high genetic heterogeneity, diversity and endemicity of rotavirus genotype P[6] strains circulating in Africa.

    PubMed

    Nyaga, Martin M; Tan, Yi; Seheri, Mapaseka L; Halpin, Rebecca A; Akopov, Asmik; Stucker, Karla M; Fedorova, Nadia B; Shrivastava, Susmita; Duncan Steele, A; Mwenda, Jason M; Pickett, Brett E; Das, Suman R; Jeffrey Mphahlele, M

    2018-05-18

    Rotavirus A (RVA) exhibits a wide genotype diversity globally. Little is known about the genetic composition of genotype P[6] from Africa. This study investigated possible evolutionary mechanisms leading to genetic diversity of genotype P[6] VP4 sequences. Phylogenetic analyses of 167 P[6] VP4 full-length sequences were conducted, including six porcine-origin sequences. Of the 167 sequences, 57 were newly acquired through whole-genome sequencing as part of this study. The other 110 were all publicly available global P[6] VP4 full-length sequences downloaded from GenBank. The strength of association between the phenotypic features and the phylogeny was also determined. A number of reassortment events and mixed infections of RVA genotype P[6] strains were observed in this study. Phylogenetic analyses demonstrated the extensive genetic diversity that exists among human P[6] strains, porcine-like strains, and their concomitant clades/subclades, and estimated that the P[6] VP4 gene has a higher substitution rate, with a mean of 1.05E-3 substitutions/site/year. Further, the phylogenetic analyses indicated that genotype P[6] strains were endemic in Africa, characterised by extensive genetic diversity and long-term local evolution of the viruses. This was also supported by phylogeographic clustering and G-genotype clustering of the P[6] strains when Bayesian Tip-association Significance testing (BaTS) was applied, clearly supporting local evolution of the viruses in Africa rather than spatial mixing among different regions. Overall, the results demonstrated that multiple mechanisms such as reassortment events, various mutations and possibly interspecies transmission account for the enormous diversity of genotype P[6] strains in Africa. These findings highlight the need for continued global surveillance of rotavirus diversity. Copyright © 2018 Elsevier B.V. All rights reserved.

  15. Habitat fragmentation in coastal southern California disrupts genetic connectivity in the cactus wren (Campylorhynchus brunneicapillus).

    PubMed

    Barr, Kelly R; Kus, Barbara E; Preston, Kristine L; Howell, Scarlett; Perkins, Emily; Vandergast, Amy G

    2015-05-01

    Achieving long-term persistence of species in urbanized landscapes requires characterizing population genetic structure to understand and manage the effects of anthropogenic disturbance on connectivity. Urbanization over the past century in coastal southern California has caused both precipitous loss of coastal sage scrub habitat and declines in populations of the cactus wren (Campylorhynchus brunneicapillus). Using 22 microsatellite loci, we found that remnant cactus wren aggregations in coastal southern California comprised 20 populations based on strict exact tests for population differentiation, and 12 genetic clusters with hierarchical Bayesian clustering analyses. Genetic structure patterns largely mirrored underlying habitat availability, with cluster and population boundaries coinciding with fragmentation caused primarily by urbanization. Using a habitat model we developed, we detected stronger associations between habitat-based distances and genetic distances than between Euclidean geographic distances and genetic distances. Within populations, we detected a positive association between available local habitat and allelic richness and a negative association with relatedness. Isolation-by-distance patterns varied over the study area, which we attribute to temporal differences in anthropogenic landscape development. We also found that genetic bottleneck signals were associated with wildfire frequency. These results indicate that habitat fragmentation and alterations have reduced genetic connectivity and diversity of cactus wren populations in coastal southern California. Management efforts focused on improving connectivity among remaining populations may help to ensure population persistence. Published 2015. This article is a U.S. Government work and is in the public domain in the USA.

  16. A Tutorial in Bayesian Potential Outcomes Mediation Analysis.

    PubMed

    Miočević, Milica; Gonzalez, Oscar; Valente, Matthew J; MacKinnon, David P

    2018-01-01

    Statistical mediation analysis is used to investigate intermediate variables in the relation between independent and dependent variables. Causal interpretation of mediation analyses is challenging because randomization of subjects to levels of the independent variable does not rule out the possibility of unmeasured confounders of the mediator to outcome relation. Furthermore, commonly used frequentist methods for mediation analysis compute the probability of the data given the null hypothesis, which is not the probability of a hypothesis given the data as in Bayesian analysis. Under certain assumptions, applying the potential outcomes framework to mediation analysis allows for the computation of causal effects, and statistical mediation in the Bayesian framework gives indirect effects probabilistic interpretations. This tutorial combines causal inference and Bayesian methods for mediation analysis so the indirect and direct effects have both causal and probabilistic interpretations. Steps in Bayesian causal mediation analysis are shown in the application to an empirical example.
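
    Under the potential outcomes framework with linear models for the mediator and the outcome, no exposure-by-mediator interaction and no unmeasured confounding, the natural indirect effect reduces to the product a*b of the exposure-to-mediator and mediator-to-outcome coefficients. Given paired posterior draws of a and b from any MCMC sampler, the posterior of the indirect effect is obtained draw by draw, which is what gives it a probabilistic interpretation. The sketch below assumes such draws are already available; the function name and the rounded-index percentile rule are illustrative choices, not the tutorial's code.

```python
from statistics import fmean
from typing import Dict, Sequence


def indirect_effect_summary(a_draws: Sequence[float], b_draws: Sequence[float],
                            level: float = 0.95) -> Dict[str, float]:
    """Posterior summary of the indirect effect a*b from paired MCMC draws."""
    if len(a_draws) != len(b_draws) or not a_draws:
        raise ValueError("need equally many, non-zero draws of a and b")
    # One indirect-effect draw per MCMC iteration, sorted for percentile lookup.
    ab = sorted(a * b for a, b in zip(a_draws, b_draws))
    n = len(ab)

    def pct(q: float) -> float:
        # Percentile by rounding q * (n - 1) to the nearest index of the sorted draws.
        return ab[min(n - 1, max(0, round(q * (n - 1))))]

    tail = (1.0 - level) / 2.0
    return {
        "mean": fmean(ab),
        "lower": pct(tail),
        "upper": pct(1.0 - tail),
        "prob_positive": sum(v > 0 for v in ab) / n,
    }
```

    For example, `indirect_effect_summary(a_samples, b_samples)` on draws from a fitted model returns the posterior mean, an equal-tailed credible interval and the posterior probability that the indirect effect is positive.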

  17. When mechanism matters: Bayesian forecasting using models of ecological diffusion

    USGS Publications Warehouse

    Hefley, Trevor J.; Hooten, Mevin B.; Russell, Robin E.; Walsh, Daniel P.; Powell, James A.

    2017-01-01

    Ecological diffusion is a theory that can be used to understand and forecast spatio-temporal processes such as dispersal, invasion, and the spread of disease. Hierarchical Bayesian modelling provides a framework to make statistical inference and probabilistic forecasts, using mechanistic ecological models. To illustrate, we show how hierarchical Bayesian models of ecological diffusion can be implemented for large data sets that are distributed densely across space and time. The hierarchical Bayesian approach is used to understand and forecast the growth and geographic spread in the prevalence of chronic wasting disease in white-tailed deer (Odocoileus virginianus). We compare statistical inference and forecasts from our hierarchical Bayesian model to phenomenological regression-based methods that are commonly used to analyse spatial occurrence data. The mechanistic statistical model based on ecological diffusion led to important ecological insights, obviated a commonly ignored type of collinearity, and was the most accurate method for forecasting.

  18. Applications of Bayesian Procrustes shape analysis to ensemble radar reflectivity nowcast verification

    NASA Astrophysics Data System (ADS)

    Fox, Neil I.; Micheas, Athanasios C.; Peng, Yuqiang

    2016-07-01

    This paper introduces the use of Bayesian full Procrustes shape analysis in object-oriented meteorological applications. In particular, the Procrustes methodology is used to generate mean forecast precipitation fields from a set of ensemble forecasts. This approach has advantages over other ensemble averaging techniques in that it can produce a forecast that retains the morphological features of the precipitation structures and present the range of forecast outcomes represented by the ensemble. The production of the ensemble mean avoids the problems of smoothing that result from simple pixel or cell averaging, while producing credible sets that retain information on ensemble spread. Also in this paper, the full Bayesian Procrustes scheme is used as an object verification tool for precipitation forecasts. This is an extension of a previously presented Procrustes shape analysis based verification approach into a full Bayesian format designed to handle the verification of precipitation forecasts that match objects from an ensemble of forecast fields to a single truth image. The methodology is tested on radar reflectivity nowcasts produced in the Warning Decision Support System - Integrated Information (WDSS-II) by varying parameters in the K-means cluster tracking scheme.

  19. A review and comparison of Bayesian and likelihood-based inferences in beta regression and zero-or-one-inflated beta regression.

    PubMed

    Liu, Fang; Eugenio, Evercita C

    2018-04-01

    Beta regression is an increasingly popular statistical technique in medical research for modeling of outcomes that assume values in (0, 1), such as proportions and patient reported outcomes. When outcomes take values in the intervals [0,1), (0,1], or [0,1], zero-or-one-inflated beta (zoib) regression can be used. We provide a thorough review of beta regression and zoib regression in the modeling, inferential, and computational aspects via the likelihood-based and Bayesian approaches. We demonstrate via simulation studies the statistical and practical importance of correctly modeling the inflation at zero/one rather than replacing such values ad hoc with values close to zero/one; the latter approach can lead to biased estimates and invalid inferences. We show via simulation studies that the likelihood-based approach is computationally faster in general than MCMC algorithms used in the Bayesian inferences, but runs the risk of non-convergence, large biases, and sensitivity to starting values in the optimization algorithm, especially with clustered/correlated data, data with sparse inflation at zero and one, and data that warrant regularization of the likelihood. The disadvantages of the regular likelihood-based approach make the Bayesian approach an attractive alternative in these cases. Software packages and tools for fitting beta and zoib regressions in both the likelihood-based and Bayesian frameworks are also reviewed.

  20. cosmoabc: Likelihood-free inference for cosmology

    NASA Astrophysics Data System (ADS)

    Ishida, Emille E. O.; Vitenti, Sandro D. P.; Penna-Lima, Mariana; Trindade, Arlindo M.; Cisewski, Jessi; M.; de Souza, Rafael; Cameron, Ewan; Busti, Vinicius C.

    2015-05-01

    Approximate Bayesian Computation (ABC) enables parameter inference for complex physical systems in cases where the true likelihood function is unknown, unavailable, or computationally too expensive. It relies on the forward simulation of mock data and comparison between observed and synthetic catalogs. cosmoabc is a Python Approximate Bayesian Computation (ABC) sampler featuring a Population Monte Carlo variation of the original ABC algorithm, which uses an adaptive importance sampling scheme. The code can be coupled to an external simulator to allow incorporation of arbitrary distance and prior functions. When coupled with the numcosmo library, it has been used to estimate posterior probability distributions over cosmological parameters based on measurements of galaxy cluster number counts without computing the likelihood function.
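
    The "original ABC algorithm" that Population Monte Carlo variants refine is rejection sampling: draw parameters from the prior, simulate a summary statistic, and keep the draw only if the simulated summary lies within a tolerance epsilon of the observed one. The sketch below shows that basic step on a toy Gaussian problem. It is not cosmoabc's adaptive PMC scheme or its API; the function names, the uniform prior and the observed summary are hypothetical.

```python
import random
from statistics import fmean
from typing import Callable, List


def abc_rejection(
    observed: float,
    sample_prior: Callable[[random.Random], float],
    simulate: Callable[[float, random.Random], float],
    epsilon: float,
    n_accept: int,
    rng: random.Random,
    max_draws: int = 1_000_000,
) -> List[float]:
    """Keep prior draws whose simulated summary lies within epsilon of the observed one."""
    accepted: List[float] = []
    for _ in range(max_draws):
        theta = sample_prior(rng)
        if abs(simulate(theta, rng) - observed) <= epsilon:  # distance between summaries
            accepted.append(theta)
            if len(accepted) == n_accept:
                return accepted
    raise RuntimeError("too few acceptances; increase epsilon or max_draws")


# Toy problem (hypothetical): infer the mean of a unit-variance Gaussian from a sample mean of 2.0.
def simulate_mean(theta: float, rng: random.Random, n: int = 50) -> float:
    return fmean(rng.gauss(theta, 1.0) for _ in range(n))


rng = random.Random(42)
posterior = abc_rejection(2.0, lambda r: r.uniform(-5.0, 5.0), simulate_mean,
                          epsilon=0.1, n_accept=200, rng=rng)
print(round(fmean(posterior), 2))
```

    Plain rejection wastes most prior draws when the prior is broad; PMC variants address this by shrinking epsilon over iterations and resampling from weighted particles.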

  1. Inferring the Growth of Massive Galaxies Using Bayesian Spectral Synthesis Modeling

    NASA Astrophysics Data System (ADS)

    Stillman, Coley Michael; Poremba, Megan R.; Moustakas, John

    2018-01-01

    The most massive galaxies in the universe are typically found at the centers of massive galaxy clusters. Studying these galaxies can provide valuable insight into the hierarchical growth of massive dark matter halos. One of the key challenges of measuring the stellar mass growth of massive galaxies is converting the measured light profiles into stellar mass. We use Prospector, a state-of-the-art Bayesian spectral synthesis modeling code, to infer the total stellar masses of a pilot sample of massive central galaxies selected from the Sloan Digital Sky Survey. We compare our stellar mass estimates to previous measurements, and present some of the quantitative diagnostics provided by Prospector.

  2. The Lineage-Specific Evolution of Aquaporin Gene Clusters Facilitated Tetrapod Terrestrial Adaptation

    PubMed Central

    Finn, Roderick Nigel; Chauvigné, François; Hlidberg, Jón Baldur; Cutler, Christopher P.; Cerdà, Joan

    2014-01-01

    A major physiological barrier for aquatic organisms adapting to terrestrial life is desiccation in the aerial environment. This barrier was nevertheless overcome by the Devonian ancestors of extant Tetrapoda, but the origin of specific molecular mechanisms that solved this water problem remains largely unknown. Here we show that an ancient aquaporin gene cluster evolved specifically in the sarcopterygian lineage, and subsequently diverged into paralogous forms of AQP2, -5, or -6 to mediate water conservation in extant Tetrapoda. To determine the origin of these apomorphic genomic traits, we combined aquaporin sequencing from jawless and jawed vertebrates with broad taxon assembly of >2,000 transcripts amongst 131 deuterostome genomes and developed a model based upon Bayesian inference that traces their convergent roots to stem subfamilies in basal Metazoa and Prokaryota. This approach uncovered an unexpected diversity of aquaporins in every lineage investigated, and revealed that the vertebrate superfamily consists of 17 classes of aquaporins (Aqp0 - Aqp16). The oldest orthologs associated with water conservation in modern Tetrapoda are traced to a cluster of three aqp2-like genes in Actinistia that likely arose >500 Ma through duplication of an aqp0-like gene present in a jawless ancestor. In sea lamprey, we show that aqp0 first arose in a protocluster composed of a novel aqp14 paralog and a fused aqp01 gene. To corroborate these findings, we conducted phylogenetic analyses of five syntenic nuclear receptor subfamilies, which, together with observations of extensive genome rearrangements, support the coincident loss of ancestral aqp2-like orthologs in Actinopterygii. We thus conclude that the divergence of sarcopterygian-specific aquaporin gene clusters was permissive for the evolution of water conservation mechanisms that facilitated tetrapod terrestrial adaptation. PMID:25426855

  3. Population genetic structure of Patagonian toothfish (Dissostichus eleginoides) in the Southeast Pacific and Southwest Atlantic Ocean

    PubMed Central

    Canales-Aguirre, Cristian B.; Galleguillos, Ricardo; Oyarzun, Fernanda X.; Hernández, Cristián E.

    2018-01-01

    Previous studies of population genetic structure in Dissostichus eleginoides have shown that oceanographic and geographic discontinuities drive population differentiation in this species. Studies have focused on the genetics of D. eleginoides in the Southern Ocean; however, little is known about its genetic variation along the South American continental shelf. In this study, we used a panel of six microsatellites to test whether D. eleginoides shows population genetic structuring in this region. We hypothesized that this species would show zero or very limited genetic structuring due to the habitat continuity along the South American shelf from Peru in the Pacific Ocean to the Falkland Islands in the Atlantic Ocean. We used Bayesian and traditional analyses to evaluate population genetic structure, and we estimated the number of putative migrants and effective population size. Consistent with our predictions, our results showed no significant genetic structuring among populations of the South American continental shelf but supported two significant and well-defined genetic clusters of D. eleginoides between regions (South American continental shelf and South Georgia clusters). Genetic connectivity between these two clusters amounted to 11.3% putative migrants from the South American cluster to South Georgia Island and 0.7% in the opposite direction. Effective population size was higher in locations on the South American continental shelf than at South Georgia Island. Overall, our results support the view that the continuity of the deep-sea habitat along the continental shelf and the biological features of the study species are plausible drivers of intraspecific population genetic structuring across the distribution of D. eleginoides on the South American continental shelf. PMID:29362690

  4. Bayesian Mass Estimates of the Milky Way: Including Measurement Uncertainties with Hierarchical Bayes

    NASA Astrophysics Data System (ADS)

    Eadie, Gwendolyn M.; Springford, Aaron; Harris, William E.

    2017-02-01

    We present a hierarchical Bayesian method for estimating the total mass and mass profile of the Milky Way Galaxy. The new hierarchical Bayesian approach further improves the framework presented by Eadie et al. and Eadie and Harris and builds upon the preliminary reports by Eadie et al. The method uses a distribution function f(ℰ, L) to model the Galaxy and kinematic data from satellite objects, such as globular clusters (GCs), to trace the Galaxy’s gravitational potential. A major advantage of the method is that it not only includes complete and incomplete data simultaneously in the analysis, but also incorporates measurement uncertainties in a coherent and meaningful way. We first test the hierarchical Bayesian framework, which includes measurement uncertainties, using the same data and power-law model assumed in Eadie and Harris and find the results are similar but more strongly constrained. Next, we take advantage of the new statistical framework and incorporate all possible GC data, finding a cumulative mass profile with Bayesian credible regions. This profile implies a mass within 125 kpc of 4.8 × 10¹¹ M⊙, with a 95% Bayesian credible region of (4.0–5.8) × 10¹¹ M⊙. Our results also provide estimates of the true specific energies of all the GCs. By comparing these estimated energies to the measured energies of GCs with complete velocity measurements, we observe that (the few) remote tracers with complete measurements may play a large role in determining a total mass estimate of the Galaxy. Thus, our study stresses the need for more remote tracers with complete velocity measurements.

  5. SLUG - stochastically lighting up galaxies - III. A suite of tools for simulated photometry, spectroscopy, and Bayesian inference with stochastic stellar populations

    NASA Astrophysics Data System (ADS)

    Krumholz, Mark R.; Fumagalli, Michele; da Silva, Robert L.; Rendahl, Theodore; Parra, Jonathan

    2015-09-01

    Stellar population synthesis techniques for predicting the observable light emitted by a stellar population have extensive applications in numerous areas of astronomy. However, accurate predictions for small populations of young stars, such as those found in individual star clusters, star-forming dwarf galaxies, and small segments of spiral galaxies, require that the population be treated stochastically. Conversely, accurate deductions of the properties of such objects also require consideration of stochasticity. Here we describe a comprehensive suite of modular, open-source software tools for tackling these related problems. These include the following: a greatly enhanced version of the SLUG code introduced by da Silva et al., which computes spectra and photometry for stochastically or deterministically sampled stellar populations with nearly arbitrary star formation histories, clustering properties, and initial mass functions; CLOUDY_SLUG, a tool that automatically couples SLUG-computed spectra with the CLOUDY radiative transfer code in order to predict stochastic nebular emission; BAYESPHOT, a general-purpose tool for performing Bayesian inference on the physical properties of stellar systems based on unresolved photometry; and CLUSTER_SLUG and SFR_SLUG, a pair of tools that use BAYESPHOT on a library of SLUG models to compute the mass, age, and extinction of mono-age star clusters, and the star formation rate of galaxies, respectively. The latter two tools make use of an extensive library of pre-computed stellar population models, which are included in the software. The complete package is available at http://www.slugsps.com.

  6. The second molecular epidemiological study of HIV infection in Mongolia between 2010 and 2016.

    PubMed

    Jagdagsuren, Davaalkham; Hayashida, Tsunefusa; Takano, Misao; Gombo, Erdenetuya; Zayasaikhan, Setsen; Kanayama, Naomi; Tsuchiya, Kiyoto; Oka, Shinichi

    2017-01-01

    Our previous 2005-2009 molecular epidemiological study in Mongolia identified a hot spot of HIV-1 transmission in men who have sex with men (MSM). To control the infection, we have collaborated with NGOs to promote safer sex and HIV testing since mid-2010. In this study, we carried out a second molecular epidemiological survey between 2010 and 2016 to determine the status of HIV-1 infection in Mongolia. The study included 143 new cases of HIV-1 infection. Viral RNA was extracted from stocked plasma samples and sequenced for the pol and env regions using the Sanger method. Near-full-length sequencing using MiSeq was performed in 3 patients who were suspected to be infected with recombinant HIV-1. Phylogenetic analysis was performed using the neighbor-joining method and the Bayesian Markov chain Monte Carlo method. MSM was the main transmission route in both the previous and current studies. However, the heterosexual route showed a significant increase in recent years. Phylogenetic analysis documented three taxa: Mongolian B, Korean B, and CRF51_01B; the first two had also been observed in the previous study. CRF51_01B, which originated from Singapore and Malaysia, was confirmed by near-full-length sequencing. Although these strains were mainly detected in MSM, they were also found in increasing numbers of heterosexual males and females. Bayesian phylogenetic analysis estimated that CRF51_01B was introduced into Mongolia around the early 2000s. An extended Bayesian skyline plot showed a rapid increase in the effective population size of the Mongolian B cluster around 2004 and of the CRF51_01B cluster around 2011. HIV-1 infection may expand to the general population in Mongolia. Our study documented a new cluster of HIV-1 transmission, enhancing our understanding of the epidemiological status of HIV-1 in Mongolia.

  7. Hierarchical Bayesian modeling of heterogeneous variances in average daily weight gain of commercial feedlot cattle.

    PubMed

    Cernicchiaro, N; Renter, D G; Xiang, S; White, B J; Bello, N M

    2013-06-01

    Variability in average daily gain (ADG) of feedlot cattle can affect profits, thus making overall returns more unstable. Hence, knowledge of the factors that contribute to heterogeneity of variances in animal performance can help feedlot managers evaluate risks and minimize profit volatility when making managerial and economic decisions in commercial feedlots. The objectives of the present study were to evaluate heteroskedasticity, defined as heterogeneity of variances, in ADG of cohorts of commercial feedlot cattle, and to identify cattle demographic factors at feedlot arrival as potential sources of variance heterogeneity, accounting for cohort- and feedlot-level information in the data structure. An operational dataset compiled from 24,050 cohorts from 25 U.S. commercial feedlots in 2005 and 2006 was used for this study. Inference was based on a hierarchical Bayesian model implemented with Markov chain Monte Carlo, whereby cohorts were modeled at the residual level and feedlot-year clusters were modeled as random effects. Forward model selection based on the deviance information criterion (DIC) was used to screen potentially important explanatory variables for heteroskedasticity at the cohort and feedlot-year levels. The Bayesian modeling framework was preferred because it naturally accommodates the inherently hierarchical structure of feedlot data, whereby cohorts are nested within feedlot-year clusters. Evidence for heterogeneity of variance components of ADG was substantial and primarily concentrated at the cohort level. Feedlot-year specific effects were, by far, the greatest contributors to ADG heteroskedasticity among cohorts, with an estimated ∼12-fold change in dispersion between the most and least extreme feedlot-year clusters. In addition, identifiable demographic factors associated with greater heterogeneity of cohort-level variance included smaller cohort sizes, fewer days on feed, and greater arrival body weight (BW), as well as feedlot arrival during summer months.
These results indicate that heterogeneity of variances in ADG is prevalent in feedlot performance and point to potential sources of heteroskedasticity. Further investigation of factors associated with heteroskedasticity in feedlot performance is warranted to increase consistency and uniformity in commercial beef cattle production and subsequent profitability.
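The abstract above screens models with the deviance information criterion (DIC). As a rough illustration only, and not the authors' model or software, the sketch below computes DIC under its usual definition, the posterior mean deviance D̄ plus the effective number of parameters pD = D̄ − D(θ̄), from posterior draws of a hypothetical one-parameter normal model with known variance and a flat prior. The data, the function names `normal_deviance` and `dic`, and the direct sampling of the posterior (in place of MCMC) are all assumptions made for the example.

```python
import math
import random


def normal_deviance(data, mu, sigma):
    """Deviance D(mu) = -2 * log-likelihood of the data under N(mu, sigma^2)."""
    n = len(data)
    sse = sum((y - mu) ** 2 for y in data)
    log_lik = -0.5 * n * math.log(2 * math.pi * sigma ** 2) - sse / (2 * sigma ** 2)
    return -2.0 * log_lik


def dic(data, draws, sigma):
    """Return (DIC, pD) from posterior draws of mu.

    DIC = Dbar + pD, where Dbar is the posterior mean deviance and
    pD = Dbar - D(posterior mean of mu).
    """
    d_bar = sum(normal_deviance(data, mu, sigma) for mu in draws) / len(draws)
    mu_bar = sum(draws) / len(draws)
    p_d = d_bar - normal_deviance(data, mu_bar, sigma)
    return d_bar + p_d, p_d


# Hypothetical data. With known sigma and a flat prior on mu, the posterior
# of mu is N(ybar, sigma^2 / n), so it is sampled directly here; a real
# hierarchical model would supply MCMC draws instead.
rng = random.Random(1)
sigma = 2.0
data = [rng.gauss(5.0, sigma) for _ in range(50)]
ybar = sum(data) / len(data)
draws = [rng.gauss(ybar, sigma / math.sqrt(len(data))) for _ in range(20000)]

dic_value, p_d = dic(data, draws, sigma)
print(f"DIC = {dic_value:.2f}, pD = {p_d:.2f}")
```

Because this toy model has a single free parameter, pD should come out close to 1; in forward selection, candidate models would be compared by their DIC values, with lower values preferred.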

  8. Pfhrp2-Deleted Plasmodium falciparum Parasites in the Democratic Republic of the Congo: A National Cross-sectional Survey.

    PubMed

    Parr, Jonathan B; Verity, Robert; Doctor, Stephanie M; Janko, Mark; Carey-Ewend, Kelly; Turman, Breanna J; Keeler, Corinna; Slater, Hannah C; Whitesell, Amy N; Mwandagalirwa, Kashamuka; Ghani, Azra C; Likwela, Joris L; Tshefu, Antoinette K; Emch, Michael; Juliano, Jonathan J; Meshnick, Steven R

    2017-07-01

    Rapid diagnostic tests (RDTs) account for more than two-thirds of malaria diagnoses in Africa. Deletions of the Plasmodium falciparum hrp2 (pfhrp2) gene cause false-negative RDT results and have never been investigated on a national level. Spread of pfhrp2-deleted P. falciparum mutants, resistant to detection by HRP2-based RDTs, would represent a serious threat to malaria elimination efforts. Using a nationally representative cross-sectional study of 7,137 children under five years of age from the Democratic Republic of Congo (DRC), we tested 783 subjects with RDT-/PCR+ results using PCR assays to detect and confirm deletions of the pfhrp2 gene. Spatial and population genetic analyses were employed to examine the distribution and evolution of these parasites. We identified 149 pfhrp2-deleted parasites, representing 6.4% of all P. falciparum infections country-wide (95% confidence interval 5.1-8.0%). Bayesian spatial analyses identified statistically significant clustering of pfhrp2 deletions near Kinshasa and Kivu. Population genetic analysis revealed significant genetic differentiation between wild-type and pfhrp2-deleted parasite populations (GST = .046, p ≤ .00001). Pfhrp2-deleted P. falciparum is a common cause of RDT-/PCR+ malaria among asymptomatic children in the DRC and appears to be clustered within select communities. Surveillance for these deletions is needed, and alternatives to HRP2-specific RDTs may be necessary. © The Author 2016. Published by Oxford University Press for the Infectious Diseases Society of America. All rights reserved. For permissions, e-mail journals.permissions@oup.com.
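GST, reported above as .046, is Nei's measure of genetic differentiation, (H_T − H_S) / H_T, where H_S is the mean expected heterozygosity within populations and H_T is the expected heterozygosity of the pooled population. The abstract does not describe the exact estimator, the multilocus averaging, or how the p-value was obtained. The sketch below is therefore only the textbook single-locus calculation, assuming equal population weights and no sample-size correction. The allele frequencies and function names are hypothetical.

```python
def expected_heterozygosity(freqs):
    """Gene diversity H = 1 - sum(p_i^2) for one population at one locus."""
    return 1.0 - sum(p * p for p in freqs)


def nei_gst(pop_freqs):
    """Nei's G_ST = (H_T - H_S) / H_T for a single locus.

    pop_freqs: one allele-frequency list per population, all listing the
    same alleles in the same order. Populations are weighted equally.
    """
    k = len(pop_freqs)
    # H_S: mean within-population expected heterozygosity
    h_s = sum(expected_heterozygosity(f) for f in pop_freqs) / k
    # H_T: expected heterozygosity of the pooled (mean) allele frequencies
    n_alleles = len(pop_freqs[0])
    pooled = [sum(f[i] for f in pop_freqs) / k for i in range(n_alleles)]
    h_t = expected_heterozygosity(pooled)
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies at one locus in two parasite populations
wild_type = [0.6, 0.3, 0.1]
deleted = [0.4, 0.4, 0.2]
print(round(nei_gst([wild_type, deleted]), 4))
```

The statistic is 0 when the populations share identical allele frequencies and 1 when they share no alleles. Values near 0.05, such as the one reported, indicate modest but nonzero differentiation.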

  9. Collaborative learning framework for online stakeholder engagement.

    PubMed

    Khodyakov, Dmitry; Savitsky, Terrance D; Dalal, Siddhartha

    2016-08-01

    Public and stakeholder engagement can improve the quality of both research and policy decision making. However, such engagement poses significant methodological challenges in terms of collecting and analysing input from large, diverse groups. Our objectives were to explain how online approaches can facilitate iterative stakeholder engagement, to describe how input from large and diverse stakeholder groups can be analysed, and to propose a collaborative learning framework (CLF) to interpret stakeholder engagement results. We use 'A National Conversation on Reducing the Burden of Suicide in the United States' as a case study of online stakeholder engagement and employ a Bayesian data modelling approach to develop a CLF. Our data modelling results identified six distinct stakeholder clusters that varied in the degree of individual articulation and group agreement, each exhibiting one of three learning styles: learning towards consensus, learning by contrast and groupthink. Learning by contrast was the most common, or dominant, learning style in this study. Study results were used to develop a CLF, which helps explore a multitude of stakeholder perspectives; identifies clusters of participants with similar shifts in beliefs; offers an empirically derived indicator of engagement quality; and helps determine the dominant learning style. The ability to detect learning by contrast helps illustrate differences in stakeholder perspectives, which may help policymakers, including the Patient-Centered Outcomes Research Institute, make better decisions by soliciting and incorporating input from patients, caregivers, health-care providers and researchers. Study results have important implications for soliciting and incorporating input from stakeholders with different interests and perspectives. © 2015 The Authors. Health Expectations Published by John Wiley & Sons Ltd.

  10. The range of the mange: Spatiotemporal patterns of sarcoptic mange in red foxes (Vulpes vulpes) as revealed by camera trapping

    PubMed Central

    Odden, Morten; Linnell, John D. C.; Odden, John

    2017-01-01

    Sarcoptic mange is a widely distributed disease that affects numerous mammalian species. We used camera traps to investigate the apparent prevalence and spatiotemporal dynamics of sarcoptic mange in a red fox population in southeastern Norway. We monitored red foxes for five years using 305 camera traps distributed across an 18,000 km² area. A total of 6,581 fox events were examined to visually identify mange-compatible lesions. We investigated factors associated with the occurrence of mange by using logistic models within a Bayesian framework, whereas the spatiotemporal dynamics of the disease were analysed with space-time scan statistics. The apparent prevalence of the disease fluctuated over the study period, with a mean of 3.15% and a credible interval of [1.25%, 6.37%], and our best logistic model explaining the presence of red foxes with mange-compatible lesions included time since the beginning of the study and the interaction between distance to settlement and season as explanatory variables. The scan analyses detected several potential clusters of the disease that varied in persistence and size, and the locations in the cluster with the highest probability were closer to human settlements than the other survey locations. Our results indicate that red foxes in an advanced stage of the disease are most likely found closer to human settlements during periods of low wild prey availability (winter). We discuss different potential causes. Furthermore, the disease appears to follow a pattern of small localized outbreaks rather than sporadic isolated events. PMID:28423011

  11. Genetic diversity and structure of Lolium perenne ssp. multiflorum in California vineyards and orchards indicate potential for spread of herbicide resistance via gene flow.

    PubMed

    Karn, Elizabeth; Jasieniuk, Marie

    2017-07-01

    Management of agroecosystems with herbicides imposes strong selection pressures on weedy plants leading to the evolution of resistance against those herbicides. Resistance to glyphosate in populations of Lolium perenne L. ssp. multiflorum is increasingly common in California, USA, causing economic losses and the loss of effective management tools. To gain insights into the recent evolution of glyphosate resistance in L. perenne in perennial cropping systems of northwest California and to inform management, we investigated the frequency of glyphosate resistance and the genetic diversity and structure of 14 populations. The sampled populations contained frequencies of resistant plants ranging from 10% to 89%. Analyses of neutral genetic variation using microsatellite markers indicated very high genetic diversity within all populations regardless of resistance frequency. Genetic variation was distributed predominantly among individuals within populations rather than among populations or sampled counties, as would be expected for a wide-ranging outcrossing weed species. Bayesian clustering analysis provided evidence of population structuring with extensive admixture between two genetic clusters or gene pools. High genetic diversity and admixture, and low differentiation between populations, strongly suggest the potential for spread of resistance through gene flow and the need for management that limits seed and pollen dispersal in L. perenne.

  12. Nuclear and plastid markers reveal the persistence of genetic identity: a new perspective on the evolutionary history of Petunia exserta.

    PubMed

    Segatto, Ana Lúcia Anversa; Cazé, Ana Luíza Ramos; Turchetto, Caroline; Klahre, Ulrich; Kuhlemeier, Cris; Bonatto, Sandro Luis; Freitas, Loreta Brandão

    2014-01-01

    Recently divergent species that can hybridize are ideal models for investigating the genetic exchanges that can occur while preserving the species boundaries. Petunia exserta is an endemic species from a very limited and specific area that grows exclusively in rocky shelters. These shaded spots are an inhospitable habitat for all other Petunia species, including the closely related and widely distributed species P. axillaris. Individuals with intermediate morphologic characteristics have been found near the rocky shelters and were believed to be putative hybrids between P. exserta and P. axillaris, suggesting a situation where Petunia exserta is losing its genetic identity. In the current study, we analyzed the plastid intergenic spacers trnS/trnG and trnH/psbA and six nuclear CAPS markers in a large sampling design of both species to understand the evolutionary process occurring in this biological system. Bayesian clustering methods, cpDNA haplotype networks, genetic diversity statistics, and coalescence-based analyses support a scenario where hybridization occurs while two genetic clusters corresponding to two species are maintained. Our results reinforce the importance of coupling differentially inherited markers with an extensive geographic sample to assess the evolutionary dynamics of recently diverged species that can hybridize. Copyright © 2013 Elsevier Inc. All rights reserved.

  13. HIV Type 1 Transmission Networks Among Men Having Sex with Men and Heterosexuals in Kenya

    PubMed Central

    Faria, Nuno Rodrigues; Hassan, Amin; Hamers, Raph L.; Mutua, Gaudensia; Anzala, Omu; Mandaliya, Kishor; Cane, Patricia; Berkley, James A.; Rinke de Wit, Tobias F.; Wallis, Carole; Graham, Susan M.; Price, Matthew A.; Coutinho, Roel A.; Sanders, Eduard J.

    2014-01-01

    We performed a molecular phylogenetic study on HIV-1 polymerase sequences of men who have sex with men (MSM) and heterosexual patient samples in Kenya to characterize any observed HIV-1 transmission networks. HIV-1 polymerase sequences were obtained from samples in Nairobi and coastal Kenya from 84 MSM, 226 other men, and 364 women from 2005 to 2010. Using Bayesian phylogenetics, we tested whether sequences clustered by sexual orientation and geographic location. In addition, we used trait diffusion analyses to identify significant epidemiological links and to quantify the number of transmissions between risk groups. Finally, we compared 84 MSM sequences with all HIV-1 sequences available online at GenBank. Significant clustering of sequences from MSM at both coastal Kenya and Nairobi was found, with evidence of HIV-1 transmission between both locations. Although a transmission pair between a coastal MSM and woman was confirmed, no significant HIV-1 transmission was evident between MSM and the comparison population for the predominant subtype A (60%). However, a weak but significant link was evident when studying all subtypes together. GenBank comparison did not reveal other important transmission links. Our data suggest infrequent intermingling of MSM and heterosexual HIV-1 epidemics in Kenya. PMID:23947948

  14. Phylogenetic positions of four hypotrichous ciliates (Protista, Ciliophora) based on SSU rRNA gene, with notes on their morphological characters.

    PubMed

    Yang, Caiting; Liu, An; Xu, Yusen; Xu, Yuan; Fan, Xinpeng; Al-Farraj, Saleh A; Ni, Bing; Gu, Fukang

    2015-08-18

    The morphology and infraciliature of four hypotrichous ciliates, Rigidohymena inquieta (Stokes, 1887) Berger, 2011, Pattersoniella vitiphila Foissner, 1987, Notohymena australis Foissner & O'Donoghue, 1990, and Cyrtohymena (Cyrtohymenides) australis (Foissner, 1995) Foissner, 2004, collected from east China, were investigated using live observation and the protargol impregnation method. An improved diagnosis for R. inquieta is supplied based on descriptions of the present and previous populations. New morphological and morphogenetic information based on Chinese populations of the other three hypotrichids is also provided. The small-subunit rRNA (SSU rRNA) gene sequences of the four species were characterized, and their phylogenetic positions were inferred by means of Bayesian inference and maximum-likelihood analysis. The analyses show that R. inquieta clusters with other members of the subfamily Stylonychinae, which confirms the monophyly of the subfamily and supports R. inquieta as a species separate from R. candens, although it differs from the latter mainly in body size. The basal position of C. (C.) australis in the clade that contains cyrtohymenids and some other groups argues against separating Cyrtohymena into two subgenera. Notohymena australis and the Chinese population of Pattersoniella vitiphila each cluster with their congeners, which corresponds well with the systematics revealed by morphological similarities.

  15. Detecting the influence of ornamental Berberis thunbergii var. atropurpurea in invasive populations of Berberis thunbergii (Berberidaceae) using AFLP.

    PubMed

    Lubell, Jessica D; Brand, Mark H; Lehrer, Jonathan M; Holsinger, Kent E

    2008-06-01

    Japanese barberry (Berberis thunbergii DC.) is a widespread invasive plant that remains an important landscape shrub represented by ornamental, purple-leaved forms of the botanical variety atropurpurea. These forms differ greatly in appearance from feral plants, bringing into question whether they contribute to invasive populations or whether the invasions represent self-sustaining populations derived from the initial introduction of the species in the late 19th century. In this study we used amplified fragment length polymorphism (AFLP) markers to determine whether genetic contributions from B. t. var. atropurpurea are found within naturalized Japanese barberry populations in southern New England. Bayesian clustering of AFLP genotypes and principal coordinate analysis distinguished B. t. var. atropurpurea genotypes from 85 plants representing five invasive populations. While a single feral plant resembled B. t. var. atropurpurea phenotypically and fell within the same genetic cluster, all other naturalized plants sampled were genetically distinct from the purple-leaved genotypes. Seven plants from two different sites possessed morphology consistent with Berberis vulgaris (common barberry) or B. ×ottawensis (B. thunbergii × B. vulgaris). Genetic analysis placed these plants in two clusters separate from B. thunbergii. Although the Bayesian analysis indicated some introgression of B. t. var. atropurpurea and B. vulgaris, these genotypes have had limited influence on extant feral populations of B. thunbergii.

  16. Signatures of selection in five Italian cattle breeds detected by a 54K SNP panel.

    PubMed

    Mancini, Giordano; Gargani, Maria; Chillemi, Giovanni; Nicolazzi, Ezequiel Luis; Marsan, Paolo Ajmone; Valentini, Alessio; Pariset, Lorraine

    2014-02-01

    In this study we used a medium-density panel of SNP markers to perform population genetic analysis in five Italian cattle breeds. The BovineSNP50 BeadChip was used to genotype a total of 2,935 bulls of the Piedmontese, Marchigiana, Italian Holstein, Italian Brown and Italian Pezzata Rossa breeds. To determine a genome-wide pattern of positive selection, we mapped the Fst values against genome location. The highest Fst peaks were obtained on BTA6 and BTA13, where some candidate genes are located. We identified selection signatures peculiar to each breed, which suggest selection for genes involved in milk or meat traits. The genetic structure was investigated by using a multidimensional scaling of the genetic distance matrix and a Bayesian approach implemented in the STRUCTURE software. The genotyping data showed a clear partitioning of the cattle genetic diversity into distinct breeds when the number of clusters was set equal to the number of populations. Assuming a lower number of clusters, the beef breeds grouped together. Both methods showed all five breeds separated into well-defined clusters, and the Bayesian approach assigned individuals to their breed of origin. The work is of interest not only because it enriches knowledge of the process of evolution but also because the results generated could have implications for selective breeding programs.

  17. Π4U: A high performance computing framework for Bayesian uncertainty quantification of complex models

    NASA Astrophysics Data System (ADS)

    Hadjidoukas, P. E.; Angelikopoulos, P.; Papadimitriou, C.; Koumoutsakos, P.

    2015-03-01

    We present Π4U, an extensible framework for non-intrusive Bayesian Uncertainty Quantification and Propagation (UQ+P) of complex and computationally demanding physical models that can exploit massively parallel computer architectures. The framework incorporates Laplace asymptotic approximations as well as stochastic algorithms, along with distributed numerical differentiation and task-based parallelism for heterogeneous clusters. Sampling is based on the Transitional Markov Chain Monte Carlo (TMCMC) algorithm and its variants. The optimization tasks associated with the asymptotic approximations are treated via the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). A modified subset simulation method is used for posterior reliability measurements of rare events. The framework accommodates scheduling of multiple physical model evaluations based on an adaptive load balancing library and shows excellent scalability. In addition to the software framework, we also provide guidelines as to the applicability and efficiency of Bayesian tools when applied to computationally demanding physical models. Theoretical and computational developments are demonstrated with applications drawn from molecular dynamics, structural dynamics and granular flow.

  18. WebMOTIFS: automated discovery, filtering and scoring of DNA sequence motifs using multiple programs and Bayesian approaches

    PubMed Central

    Romer, Katherine A.; Kayombya, Guy-Richard; Fraenkel, Ernest

    2007-01-01

    WebMOTIFS provides a web interface that facilitates the discovery and analysis of DNA-sequence motifs. Several studies have shown that the accuracy of motif discovery can be significantly improved by using multiple de novo motif discovery programs and using randomized control calculations to identify the most significant motifs or by using Bayesian approaches. WebMOTIFS makes it easy to apply these strategies. Using a single submission form, users can run several motif discovery programs and score, cluster and visualize the results. In addition, the Bayesian motif discovery program THEME can be used to determine the class of transcription factors that is most likely to regulate a set of sequences. Input can be provided as a list of gene or probe identifiers. Used with the default settings, WebMOTIFS accurately identifies biologically relevant motifs from diverse data in several species. WebMOTIFS is freely available at http://fraenkel.mit.edu/webmotifs. PMID:17584794

  19. Bayesian generalized linear mixed modeling of Tuberculosis using informative priors

    PubMed Central

    Woldegerima, Woldegebriel Assefa

    2017-01-01

    TB is rated as one of the world’s deadliest diseases, and South Africa ranks 9th among the 22 countries hardest hit by TB. Although much research has been carried out on this subject, this paper goes a step further by incorporating past knowledge into the model, using a Bayesian approach with an informative prior. Bayesian approaches are becoming popular in data analysis, but most applications of Bayesian inference are limited to non-informative priors, used when there is no solid external information about the distribution of the parameter of interest. The main aim of this study is to profile people living with TB in South Africa. In this paper, identical regression models are fitted with the classical approach and with the Bayesian approach using both non-informative and informative priors, based on South Africa General Household Survey (GHS) data for the year 2014. For the Bayesian model with an informative prior, GHS data for the years 2011 to 2013 are used to set up the priors for the 2014 model. PMID:28257437
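The difference between a non-informative and an informative prior can be seen in the simplest conjugate case, a beta-binomial model for a single proportion. This is a deliberately simplified stand-in for the paper's generalized linear mixed model, not its method. The sketch assumes that earlier survey years are summarized as Beta pseudo-counts, optionally down-weighted by a factor `weight`. All counts, and the helper names `update` and `prior_from_history`, are hypothetical and are not GHS figures.

```python
from dataclasses import dataclass


@dataclass
class BetaPrior:
    alpha: float
    beta: float

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def update(prior: BetaPrior, cases: int, n: int) -> BetaPrior:
    """Conjugate update: Beta(a, b) prior + Binomial(n, p) data -> Beta posterior."""
    return BetaPrior(prior.alpha + cases, prior.beta + (n - cases))


def prior_from_history(cases: int, n: int, weight: float = 1.0) -> BetaPrior:
    """Hypothetical informative prior built from earlier survey counts.

    Starts from a uniform Beta(1, 1); weight < 1 discounts the historical
    data relative to the current survey year.
    """
    return BetaPrior(1.0 + weight * cases, 1.0 + weight * (n - cases))


# Hypothetical counts, for illustration only
past_cases, past_n = 90, 3000   # pooled earlier survey years
cur_cases, cur_n = 12, 500      # current survey year

flat = update(BetaPrior(1.0, 1.0), cur_cases, cur_n)
informed = update(prior_from_history(past_cases, past_n, weight=0.5), cur_cases, cur_n)

print(f"non-informative posterior mean: {flat.mean():.4f}")
print(f"informative posterior mean:     {informed.mean():.4f}")
```

With the flat prior, the posterior is driven almost entirely by the current year. The informative prior pulls the estimate toward the historical rate, and `weight` controls how strongly.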

  20. Genetic structure of lake whitefish (Coregonus clupeaformis) in Lake Michigan

    USGS Publications Warehouse

    VanDeHey, J.A.; Sloss, Brian L.; Peeters, Paul J.; Sutton, T.M.

    2009-01-01

    Genetic relationships among lake whitefish (Coregonus clupeaformis) spawning aggregates in Lake Michigan were assessed and used to predict a stock or management unit (MU) model for the resource. We hypothesized that distinct spawning aggregates represented potential MUs and that differences at molecular markers underlie population differentiation. Genetic stock identification using 11 microsatellite loci indicated the presence of six genetic MUs. Resolved MUs corresponded to geographically proximate spawning aggregates clustering into genetic groups. Within MUs, analyses suggested that all but one of the delineated MUs were stable groupings (i.e., no between-population differences), the exception being the Hog Island–Traverse Bay grouping. Elk Rapids was the most genetically divergent population within Lake Michigan. However, low Fst values suggested that moderate to high levels of gene flow occur or have occurred in the past between MUs. Significant tests of isolation by distance and low pairwise Fst values may have led to conflicting results between traditional analyses and a Bayesian approach. This data set could provide baseline data from which a comprehensive mixed-stock analysis could be performed, allowing for more efficient and effective management of this economically and socially important resource.

  1. The phylogenetic position of the Critically Endangered Saint Croix ground lizard Ameiva polops: revisiting molecular systematics of West Indian Ameiva.

    PubMed

    Hurtado, Luis A; Santamaria, Carlos A; Fitzgerald, Lee A

    2014-05-06

    The phylogenetic position of the critically endangered Saint Croix ground lizard Ameiva polops is presently unknown, and several hypotheses have been proposed. We investigated the phylogenetic position of this species using molecular phylogenetic methods. We obtained sequences of DNA fragments of the mitochondrial ribosomal genes 12S rDNA and 16S rDNA for this species. We aligned these sequences with published sequences of other Ameiva species, including most of the Ameiva species from the West Indies and three Ameiva species from Central America and South America, and with a sequence of the teiid lizard Tupinambis teguixin, which was used as the outgroup. We conducted Maximum Likelihood and Bayesian phylogenetic analyses. The phylogenetic reconstructions obtained with the different methods were very similar, supporting the monophyly of West Indian Ameiva and showing, within this lineage, a basal polytomy of four geographically separated clades. Ameiva polops grouped in a cluster that included the other two Ameiva species found in the Puerto Rican Bank: A. wetmorei and A. exsul. A sister relationship between A. polops and A. wetmorei is suggested by our analyses. We compare our results with a previous study on molecular systematics of West Indian Ameiva.

  2. Phylogeography of the Rock Shell Thais clavigera (Mollusca): Evidence for Long-Distance Dispersal in the Northwestern Pacific

    PubMed Central

    Jung, Daewui; Li, Qi; Kong, Ling-Feng; Ni, Gang; Nakano, Tomoyuki; Matsukuma, Akihiko; Kim, Sanghee; Park, Chungoo; Lee, Hyuk Je; Park, Joong-Ki

    2015-01-01

    The present-day genetic structure of a species reflects both historical demography and patterns of contemporary gene flow among populations. To understand precisely how these factors shape the current population structure of the northwestern (NW) Pacific marine gastropod Thais clavigera, we determined the partial nucleotide sequences of the mitochondrial COI gene for 602 individuals sampled from 29 localities spanning almost the whole distribution of T. clavigera in the NW Pacific Ocean (~3,700 km). Results from population genetic and demographic analyses (AMOVA, ΦST-statistics, haplotype networks, Tajima’s D, Fu’s FS, mismatch distribution, and Bayesian skyline plots) revealed a lack of genealogical branches or geographical clusters, and a high level of genetic (haplotype) diversity within each studied population. Nevertheless, low but significant genetic structuring was detected among some geographical populations separated by the Changjiang River, suggesting the presence of geographical barriers to larval dispersal around this region. Several lines of evidence, including significantly negative Tajima’s D and Fu’s FS values, the unimodal mismatch distribution, and Bayesian skyline plots, suggest a population expansion at marine isotope stage 11 (MIS 11; 400 ka), the longest and warmest interglacial interval during the Pleistocene epoch. The lack of genetic structure among the great majority of the NW Pacific T. clavigera populations may be attributable to high gene flow through current-driven long-distance dispersal during the prolonged planktonic larval phase of this species. PMID:26171966
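    Tajima's D, one of the statistics cited above as evidence of expansion, compares the mean number of pairwise differences (pi) with the number of segregating sites (S) scaled by a1; an excess of rare variants after a population expansion pushes D below zero. The following is a minimal sketch of the standard Tajima (1989) formula, assuming aligned, equal-length sequences without gaps or missing data; the function name and toy sequences are illustrative only and do not reproduce the study's COI data or its significance testing.

```python
import math
from itertools import combinations


def tajimas_d(sequences):
    """Tajima's D from aligned, equal-length sequences (no gaps or missing data)."""
    n = len(sequences)
    length = len(sequences[0])
    # S: number of segregating (polymorphic) sites.
    s = sum(1 for i in range(length) if len({seq[i] for seq in sequences}) > 1)
    if s == 0:
        return float("nan")  # D is undefined without polymorphism
    # pi: mean number of pairwise differences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Excess of singletons (rare variants), as after an expansion: D < 0.
print(tajimas_d(["AAAA", "AAAT", "AATA", "ATAA"]))
# Two intermediate-frequency haplotypes: D > 0.
print(tajimas_d(["AA", "AA", "TT", "TT"]))
```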

  3. Dreaming of Atmospheres

    NASA Astrophysics Data System (ADS)

    Waldmann, Ingo

    2016-10-01

    Radiative transfer retrievals have become the standard in modelling of exoplanetary transmission and emission spectra. Analysing currently available observations of exoplanetary atmospheres often involves large and correlated parameter spaces that can be difficult to map or constrain. To address these issues, we have developed the Tau-REx (tau-retrieval of exoplanets) retrieval and the RobERt spectral recognition algorithms. Tau-REx is a Bayesian atmospheric retrieval framework using Nested Sampling and cluster computing to fully map these large correlated parameter spaces. Nonetheless, data volumes can become prohibitively large, and we must often select a subset of potential molecular/atomic absorbers in an atmosphere. In the era of open-source, automated and self-sufficient retrieval algorithms, such manual input should be avoided. User-dependent input could, in worst-case scenarios, lead to incomplete models and biases in the retrieval. The RobERt algorithm is built to address these issues. RobERt is a deep belief network (DBN) trained to accurately recognise molecular signatures for a wide range of planets, atmospheric thermal profiles and compositions. Using these deep neural networks, we work towards retrieval algorithms that themselves understand the nature of the observed spectra, are able to learn from current and past data, and make sensible qualitative preselections of atmospheric opacities to be used in the quantitative stage of the retrieval process. In this talk I will discuss how neural networks and Bayesian Nested Sampling can be used to solve highly degenerate spectral retrieval problems, and what 'dreaming' neural networks can tell us about atmospheric characteristics.

  4. Bayesian spatial prediction of the site index in the study of the Missouri Ozark Forest Ecosystem Project

    Treesearch

    Xiaoqian Sun; Zhuoqiong He; John Kabrick

    2008-01-01

    This paper presents a Bayesian spatial method for analysing the site index data from the Missouri Ozark Forest Ecosystem Project (MOFEP). Based on ecological background and availability, we select three variables, the aspect class, the soil depth and the land type association as covariates for analysis. To allow great flexibility of the smoothness of the random field,...

  5. Univariate and bivariate likelihood-based meta-analysis methods performed comparably when marginal sensitivity and specificity were the targets of inference.

    PubMed

    Dahabreh, Issa J; Trikalinos, Thomas A; Lau, Joseph; Schmid, Christopher H

    2017-03-01

    To compare statistical methods for meta-analysis of sensitivity and specificity of medical tests (e.g., diagnostic or screening tests). We constructed a database of PubMed-indexed meta-analyses of test performance from which 2 × 2 tables for each included study could be extracted. We reanalyzed the data using univariate and bivariate random effects models fit with inverse variance and maximum likelihood methods. Analyses were performed using both normal and binomial likelihoods to describe within-study variability. The bivariate model using the binomial likelihood was also fit using a fully Bayesian approach. We use two worked examples (thoracic computerized tomography to detect aortic injury, and rapid prescreening of Papanicolaou smears to detect cytological abnormalities) to highlight that different meta-analysis approaches can produce different results. We also present results from reanalysis of 308 meta-analyses of sensitivity and specificity. Models using the normal approximation produced sensitivity and specificity estimates closer to 50% and smaller standard errors compared to models using the binomial likelihood; absolute differences of 5% or greater were observed in 12% and 5% of meta-analyses for sensitivity and specificity, respectively. Results from univariate and bivariate random effects models were similar, regardless of estimation method. Maximum likelihood and Bayesian methods produced almost identical summary estimates under the bivariate model; however, Bayesian analyses indicated greater uncertainty around those estimates. Bivariate models produced imprecise estimates of the between-study correlation of sensitivity and specificity. Differences between methods were larger with increasing proportion of studies that were small or required a continuity correction. The binomial likelihood should be used to model within-study variability.
Univariate and bivariate models give similar estimates of the marginal distributions for sensitivity and specificity. Bayesian methods fully quantify uncertainty and their ability to incorporate external evidence may be useful for imprecisely estimated parameters. Copyright © 2017 Elsevier Inc. All rights reserved.
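    For contrast with the binomial-likelihood models the authors recommend, the sketch below shows the kind of univariate, inverse-variance random-effects model they compared against: logit sensitivity per study, a normal approximation with variance 1/TP + 1/FN, a 0.5 continuity correction for zero cells, and the DerSimonian-Laird moment estimator of between-study variance. This is an illustrative assumption of how such a model is typically set up, not the authors' code; the function name and counts are hypothetical.

```python
import math


def pooled_sensitivity_dl(studies, continuity=0.5):
    """Univariate random-effects pooling of logit sensitivity (DerSimonian-Laird).

    studies: list of (true_positives, false_negatives) per study.
    Uses the normal approximation to the within-study likelihood."""
    logits, variances = [], []
    for tp, fn in studies:
        if tp == 0 or fn == 0:  # continuity correction for zero cells
            tp, fn = tp + continuity, fn + continuity
        logits.append(math.log(tp / fn))
        variances.append(1.0 / tp + 1.0 / fn)
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, logits)) / sw
    # Cochran's Q and the method-of-moments between-study variance tau^2.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, logits))
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (len(logits) - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled_logit = sum(wi * yi for wi, yi in zip(w_re, logits)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    sensitivity = 1.0 / (1.0 + math.exp(-pooled_logit))  # back-transform to a proportion
    return sensitivity, tau2, se


# Hypothetical 2 x 2 counts; the third study needs a continuity correction.
print(pooled_sensitivity_dl([(45, 5), (30, 10), (12, 0)]))
```

    The continuity correction and the normal approximation are exactly the ingredients the abstract associates with estimates pulled toward 50%, which is why a binomial (e.g., logistic mixed) model is preferred.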

  6. The utility of Bayesian predictive probabilities for interim monitoring of clinical trials

    PubMed Central

    Connor, Jason T.; Ayers, Gregory D; Alvarez, JoAnn

    2014-01-01

    Background: Bayesian predictive probabilities can be used for interim monitoring of clinical trials to estimate the probability of observing a statistically significant treatment effect if the trial were to continue to its predefined maximum sample size. Purpose: We explore settings in which Bayesian predictive probabilities are advantageous for interim monitoring compared to Bayesian posterior probabilities, p-values, conditional power, or group sequential methods. Results: For interim analyses that address prediction hypotheses, such as futility monitoring and efficacy monitoring with lagged outcomes, only predictive probabilities properly account for the amount of data remaining to be observed in a clinical trial and have the flexibility to incorporate additional information via auxiliary variables. Limitations: Computational burdens limit the feasibility of predictive probabilities in many clinical trial settings. The specification of prior distributions brings additional challenges for regulatory approval. Conclusions: The use of Bayesian predictive probabilities enables the choice of logical interim stopping rules that closely align with the clinical decision making process. PMID:24872363
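    The key idea above, that a predictive probability accounts for the data still to be observed, can be sketched for a single-arm trial with a binary outcome and a Beta prior: the number of future responses follows a beta-binomial distribution, and the predictive probability sums that distribution over outcomes that would end in success. As a simplifying assumption, "success" here means reaching a fixed number of responses r_min at the maximum sample size; many real designs instead require a posterior probability above a threshold. The function names and trial numbers are hypothetical.

```python
import math


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def predictive_probability(successes, n_observed, n_max, r_min, a=1.0, b=1.0):
    """Probability that a single-arm trial ends with at least r_min responses
    among n_max patients, given current data and a Beta(a, b) prior."""
    remaining = n_max - n_observed
    a_post = a + successes
    b_post = b + (n_observed - successes)
    prob = 0.0
    for y in range(remaining + 1):  # y = future responses
        if successes + y >= r_min:
            # Beta-binomial predictive probability of exactly y future responses.
            log_pmf = (math.lgamma(remaining + 1) - math.lgamma(y + 1)
                       - math.lgamma(remaining - y + 1)
                       + log_beta(a_post + y, b_post + remaining - y)
                       - log_beta(a_post, b_post))
            prob += math.exp(log_pmf)
    return prob


# Hypothetical interim look: 12 responses in 30 patients, maximum 60, success needs 27.
print(predictive_probability(12, 30, 60, 27))
```

    A low value at an interim look would support stopping for futility, since even the remaining patients are unlikely to carry the trial to success.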

  7. Bayesian inference of a historical bottleneck in a heavily exploited marine mammal.

    PubMed

    Hoffman, J I; Grant, S M; Forcada, J; Phillips, C D

    2011-10-01

    Emerging Bayesian analytical approaches offer increasingly sophisticated means of reconstructing historical population dynamics from genetic data, but have been little applied to scenarios involving demographic bottlenecks. Consequently, we analysed a large mitochondrial and microsatellite dataset from the Antarctic fur seal Arctocephalus gazella, a species subjected to one of the most extreme examples of uncontrolled exploitation in history when it was reduced to the brink of extinction by the sealing industry during the late eighteenth and nineteenth centuries. Classical bottleneck tests, which exploit the fact that rare alleles are rapidly lost during demographic reduction, yielded ambiguous results. In contrast, a strong signal of recent demographic decline was detected using both Bayesian skyline plots and Approximate Bayesian Computation, the latter also allowing derivation of posterior parameter estimates that were remarkably consistent with historical observations. This was achieved using only contemporary samples, further emphasizing the potential of Bayesian approaches to address important problems in conservation and evolutionary biology. © 2011 Blackwell Publishing Ltd.
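    Approximate Bayesian Computation, mentioned above, replaces an intractable likelihood with simulation: draw parameters from the prior, simulate data, and keep draws whose summary statistics fall close to the observed ones. The sketch below shows plain rejection ABC on a deliberately simple toy model (a normal mean with known standard deviation, summarised by the sample mean). It is only a hypothetical illustration of the accept/reject logic; the study's demographic bottleneck scenarios would require a coalescent simulator and genetic summary statistics, which are not shown.

```python
import random
import statistics


def abc_rejection(observed_stat, simulate_stat, sample_prior, n_draws, epsilon, rng):
    """Rejection ABC: keep prior draws whose simulated summary statistic lies
    within epsilon of the observed summary statistic."""
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior(rng)
        if abs(simulate_stat(theta, rng) - observed_stat) <= epsilon:
            accepted.append(theta)
    return accepted


# Toy model (hypothetical): 50 normal observations with unknown mean, sd = 1.
rng = random.Random(1)
observed = [rng.gauss(3.0, 1.0) for _ in range(50)]
observed_mean = statistics.fmean(observed)


def simulate_mean(theta, rng):
    return statistics.fmean(rng.gauss(theta, 1.0) for _ in range(50))


posterior = abc_rejection(observed_mean, simulate_mean,
                          lambda r: r.uniform(0.0, 10.0), 20000, 0.1, rng)
print(len(posterior), statistics.fmean(posterior))
```

    The accepted draws approximate the posterior; shrinking epsilon improves the approximation at the cost of fewer acceptances.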

  8. [Alcohol consumption and positive alcohol expectancies in young adults: a typological approach using TwoStep cluster].

    PubMed

    Vautier, S; Jmel, S; Fourio, C; Moncany, D

    2007-09-01

    The present study investigates the heterogeneity of the population of young adult drinkers with respect to alcohol consumption and Positive Alcohol Expectancies (PAEs). Based on the positive relationship between both kinds of variables, PAE is commonly viewed as a potential motivational factor in alcohol addiction. However, empirical analyses based on the regression of alcohol consumption on PAEs assume that the observations are statistically homogeneous with respect to the level of alcohol consumption. We explored the existence of moderate drinkers with a high PAE profile and abusive drinkers with a low PAE profile. A total of 1,017 young adult drinkers (mean age = 23 +/- 2.84; various educational levels; 506 males and 511 females) were recruited as voluntary participants in a survey by undergraduate psychology students from the University of Toulouse Le Mirail. They completed a French version of the Alcohol Use Disorders Identification Test (AUDIT) and a French adaptation of the Alcohol Expectancy Questionnaire (AEQ). Three levels of alcohol consumption were defined using the AUDIT score, and six composite scores were obtained by averaging the relevant item scores from the AEQ. The AEQ scores were interpreted as measurements of six kinds of PAEs, namely Global positive change, Sexual enhancement, Social and physical pleasure, Social assertiveness, Relaxation, and Arousal/Power. The TwoStep cluster methodology was used to explore the data. This methodology is convenient for handling a mix of quantitative and qualitative variables, and it provides a classification model that is optimized using an information criterion such as Schwarz's Bayesian Information Criterion (BIC). The automatic clustering suggested five clusters, whose stability was confirmed down to 75% of the sample size. Low drinkers (n=527) were split into one cluster of low PAEs (I1) and, interestingly, one cluster of high PAEs (I3, 46%).
High drinkers (n=344) were split into one cluster of intermediate PAEs (II4) and one cluster of high PAEs (II5, 52%). Interestingly again, abusive drinkers (n=146) remained a single group (III2), exhibiting high PAEs. Clusters I3 and III3 comprised a significant proportion of males. Constraining the algorithm to find 6 clusters did not affect class III2, but split low drinkers into three clusters. Although the present results should be considered cautiously because of the novelty of TwoStep cluster methodology, they suggest a group of moderate drinkers with high PAEs. Also, abusive drinkers express high PAEs (except for 2 cases). Statistical homogeneity of moderate drinkers with respect to PAE variables appears as a dubious assumption.

  9. Clustering and Bayesian hierarchical modeling for the definition of informative prior distributions in hydrogeology

    NASA Astrophysics Data System (ADS)

    Cucchi, K.; Kawa, N.; Hesse, F.; Rubin, Y.

    2017-12-01

    In order to reduce uncertainty in the prediction of subsurface flow and transport processes, practitioners should use all available data. However, classic inverse modeling frameworks typically only make use of information contained in in-situ field measurements to provide estimates of hydrogeological parameters. Such hydrogeological information about an aquifer is difficult and costly to acquire. In this data-scarce context, the transfer of ex-situ information coming from previously investigated sites can be critical for improving predictions by better constraining the estimation procedure. Bayesian inverse modeling provides a coherent framework to represent such ex-situ information by virtue of the prior distribution and to combine it with in-situ information from the target site. In this study, we present an innovative data-driven approach for defining such informative priors for hydrogeological parameters at the target site. Our approach consists of two steps, both relying on statistical and machine learning methods. The first step is data selection; it consists of selecting sites similar to the target site. We use clustering methods for selecting similar sites based on observable hydrogeological features. The second step is data assimilation; it consists of assimilating data from the selected similar sites into the informative prior. We use a Bayesian hierarchical model to account for inter-site variability and to allow for the assimilation of multiple types of site-specific data. We present the application and validation of the presented methods on an established database of hydrogeological parameters. Data and methods are implemented as an open-source R package, making them easy for other practitioners to use.

  10. RSQRT: AN HEURISTIC FOR ESTIMATING THE NUMBER OF CLUSTERS TO REPORT.

    PubMed

    Carlis, John; Bruso, Kelsey

    2012-03-01

    Clustering can be a valuable tool for analyzing large datasets, such as in e-commerce applications. Anyone who clusters must choose how many item clusters, K, to report. Unfortunately, one must guess at K or some related parameter. Elsewhere we introduced a strongly-supported heuristic, RSQRT, which predicts K as a function of the attribute or item count, depending on attribute scales. We conducted a second analysis where we sought confirmation of the heuristic, analyzing data sets from the UCI machine learning benchmark repository. For the 25 studies where sufficient detail was available, we again found strong support. Also, in a side-by-side comparison of 28 studies, the K predicted by RSQRT and the K predicted by the Bayesian information criterion (BIC) were the same. RSQRT has a lower cost of O(log log n) versus O(n^2) for BIC, and is more widely applicable. Using RSQRT prospectively could be much better than merely guessing.

  11. RSQRT: AN HEURISTIC FOR ESTIMATING THE NUMBER OF CLUSTERS TO REPORT

    PubMed Central

    Bruso, Kelsey

    2012-01-01

    Clustering can be a valuable tool for analyzing large datasets, such as in e-commerce applications. Anyone who clusters must choose how many item clusters, K, to report. Unfortunately, one must guess at K or some related parameter. Elsewhere we introduced a strongly-supported heuristic, RSQRT, which predicts K as a function of the attribute or item count, depending on attribute scales. We conducted a second analysis where we sought confirmation of the heuristic, analyzing data sets from the UCI machine learning benchmark repository. For the 25 studies where sufficient detail was available, we again found strong support. Also, in a side-by-side comparison of 28 studies, the K predicted by RSQRT and the K predicted by the Bayesian information criterion (BIC) were the same. RSQRT has a lower cost of O(log log n) versus O(n^2) for BIC, and is more widely applicable. Using RSQRT prospectively could be much better than merely guessing. PMID:22773923

  12. Simultaneous Force Regression and Movement Classification of Fingers via Surface EMG within a Unified Bayesian Framework.

    PubMed

    Baldacchino, Tara; Jacobs, William R; Anderson, Sean R; Worden, Keith; Rowson, Jennifer

    2018-01-01

    This contribution presents a novel methodology for myoelectric-based control using surface electromyographic (sEMG) signals recorded during finger movements. A multivariate Bayesian mixture of experts (MoE) model is introduced which provides a powerful method for modeling force regression at the fingertips, while also performing finger movement classification as a by-product of the modeling algorithm. Bayesian inference of the model allows uncertainties to be naturally incorporated into the model structure. This method is tested using data from the publicly released NinaPro database, which consists of sEMG recordings for 6 degree-of-freedom force activations for 40 intact subjects. The results demonstrate that the MoE model achieves performance similar to the benchmark set by the authors of NinaPro for finger force regression. Additionally, inherent to the Bayesian framework is the inclusion of uncertainty in the model parameters, naturally providing confidence bounds on the force regression predictions. Furthermore, the integrated clustering step allows a detailed investigation into classification of the finger movements, without incurring any extra computational effort. Subsequently, a systematic approach to assessing the importance of the number of electrodes needed for accurate control is performed via sensitivity analysis techniques. A slight degradation in regression performance is observed for a reduced number of electrodes, while classification performance is unaffected.

  13. Phylogeny of sipunculan worms: A combined analysis of four gene regions and morphology.

    PubMed

    Schulze, Anja; Cutler, Edward B; Giribet, Gonzalo

    2007-01-01

    The intra-phyletic relationships of sipunculan worms were analyzed based on DNA sequence data from four gene regions and 58 morphological characters. Initially we analyzed the data under direct optimization using parsimony as the optimality criterion. An implied alignment resulting from the direct optimization analysis was subsequently utilized to perform a Bayesian analysis with mixed models for the different data partitions. For this we applied a doublet model for the stem regions of the 18S rRNA. Both analyses support monophyly of Sipuncula and most of the same clades within the phylum. The analyses differ with respect to the relationships among the major groups, and whereas the deep nodes in the direct optimization analysis generally show low jackknife support, they are supported by 100% posterior probability in the Bayesian analysis. Direct optimization has been useful for handling sequences of unequal length and generating conservative phylogenetic hypotheses, whereas the Bayesian analysis under mixed models provided high resolution in the basal nodes of the tree.

  14. An evaluation of Bayesian techniques for controlling model complexity and selecting inputs in a neural network for short-term load forecasting.

    PubMed

    Hippert, Henrique S; Taylor, James W

    2010-04-01

    Artificial neural networks have frequently been proposed for electricity load forecasting because of their capabilities for the nonlinear modelling of large multivariate data sets. Modelling with neural networks is not an easy task though; two of the main challenges are defining the appropriate level of model complexity, and choosing the input variables. This paper evaluates techniques for automatic neural network modelling within a Bayesian framework, as applied to six samples containing daily load and weather data for four different countries. We analyse input selection as carried out by the Bayesian 'automatic relevance determination', and the usefulness of the Bayesian 'evidence' for the selection of the best structure (in terms of number of neurones), as compared to methods based on cross-validation. Copyright 2009 Elsevier Ltd. All rights reserved.

  15. Spatial clustering of average risks and risk trends in Bayesian disease mapping.

    PubMed

    Anderson, Craig; Lee, Duncan; Dean, Nema

    2017-01-01

    Spatiotemporal disease mapping focuses on estimating the spatial pattern in disease risk across a set of nonoverlapping areal units over a fixed period of time. The key aim of such research is to identify areas that have a high average level of disease risk or where disease risk is increasing over time, thus allowing public health interventions to be focused on these areas. Such aims are well suited to the statistical approach of clustering, and while much research has been done in this area in a purely spatial setting, only a handful of approaches have focused on spatiotemporal clustering of disease risk. Therefore, this paper outlines a new modeling approach for clustering spatiotemporal disease risk data, by clustering areas based on both their mean risk levels and the behavior of their temporal trends. The efficacy of the methodology is established by a simulation study, and is illustrated by a study of respiratory disease risk in Glasgow, Scotland. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  16. Effect of Clustering Algorithm on Establishing Markov State Model for Molecular Dynamics Simulations.

    PubMed

    Li, Yan; Dong, Zigang

    2016-06-27

    Recently, the Markov state model has been applied to kinetic analysis of molecular dynamics simulations. However, discretization of the conformational space remains a primary challenge in model building, and it is not clear how the space decomposition by distinct clustering strategies influences the model output. In this work, different clustering algorithms are employed to partition the conformational space sampled in opening and closing of fatty acid binding protein 4 as well as inactivation and activation of the epidermal growth factor receptor. Various classifications are achieved, and Markov models are set up accordingly. On the basis of the models, the total net flux and transition rate are calculated between two distinct states. Our results indicate that geometric and kinetic clustering perform equally well. The construction and outcome of Markov models are heavily dependent on the data traits. Compared to other methods, a combination of Bayesian and hierarchical clustering is feasible for identifying metastable states.
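    Once a clustering algorithm has assigned each simulation frame to a discrete state, a Markov state model is typically estimated by counting transitions at a chosen lag time and normalising each row. The sketch below shows that counting step with the simple row-normalised (maximum-likelihood, non-reversible) estimator; it is an assumption-level illustration, since many MSM packages additionally enforce detailed balance, and the function name and labels are hypothetical.

```python
def transition_matrix(labels, n_states, lag=1):
    """Row-normalised transition count matrix from a discretised trajectory."""
    counts = [[0] * n_states for _ in range(n_states)]
    for t in range(len(labels) - lag):
        counts[labels[t]][labels[t + lag]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # States never visited as a starting point get an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Hypothetical cluster labels for consecutive simulation frames.
print(transition_matrix([0, 0, 1, 1, 0, 2, 2, 2], n_states=3))
```

    Because the matrix is built directly from the cluster labels, a different clustering of the same trajectory yields a different model, which is the sensitivity the study examines.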

  17. Evolution of the Selfing Syndrome in Arabis alpina (Brassicaceae).

    PubMed

    Tedder, Andrew; Carleial, Samuel; Gołębiewska, Martyna; Kappel, Christian; Shimizu, Kentaro K; Stift, Marc

    2015-01-01

    The transition from cross-fertilisation (outcrossing) to self-fertilisation (selfing) frequently coincides with changes towards a floral morphology that optimises self-pollination, the selfing syndrome. Population genetic studies have reported the existence of both outcrossing and selfing populations in Arabis alpina (Brassicaceae), which is an emerging model species for studying the molecular basis of perenniality and local adaptation. It is unknown whether its selfing populations have evolved a selfing syndrome. Using macro-photography, microscopy and automated cell counting, we compared floral syndromes (size, herkogamy, pollen and ovule numbers) between three outcrossing populations from the Apuan Alps and three selfing populations from the Western and Central Alps (Maritime Alps and Dolomites). In addition, we genotyped the plants for 12 microsatellite loci to confirm previous measures of diversity and inbreeding coefficients based on allozymes, and performed Bayesian clustering. Plants from the three selfing populations had markedly smaller flowers, less herkogamy and lower pollen production than plants from the three outcrossing populations, whereas pistil length and ovule number have remained constant. Compared to allozymes, microsatellite variation was higher, but revealed similar patterns of low diversity and high Fis in selfing populations. Bayesian clustering revealed two clusters. The first cluster contained the three outcrossing populations from the Apuan Alps, the second contained the three selfing populations from the Maritime Alps and Dolomites. We conclude that in comparison to three outcrossing populations, three populations with high selfing rates are characterised by a flower morphology that is closer to the selfing syndrome. The presence of outcrossing and selfing floral syndromes within a single species will facilitate unravelling the genetic basis of the selfing syndrome, and addressing which selective forces drive its evolution.
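    The "high Fis in selfing populations" reported above reflects a deficit of heterozygotes relative to Hardy-Weinberg expectations. A minimal sketch of the single-locus estimate F_IS = 1 - H_o / H_e follows, assuming diploid genotypes given as allele pairs and the uncorrected expected heterozygosity 1 - sum(p_i^2); published analyses often use bias-corrected estimators averaged over loci, and the function name and genotypes are hypothetical.

```python
from collections import Counter


def inbreeding_coefficient(genotypes):
    """F_IS = 1 - H_o / H_e for one locus in one population.

    genotypes: list of (allele1, allele2) tuples. H_e is the uncorrected
    expected heterozygosity 1 - sum(p_i**2)."""
    n = len(genotypes)
    h_obs = sum(1 for a1, a2 in genotypes if a1 != a2) / n
    allele_counts = Counter(allele for g in genotypes for allele in g)
    h_exp = 1.0 - sum((c / (2 * n)) ** 2 for c in allele_counts.values())
    if h_exp == 0:
        return float("nan")  # monomorphic locus: F_IS undefined
    return 1.0 - h_obs / h_exp


# Hypothetical microsatellite genotypes (allele sizes in bp).
selfing_like = [(150, 150), (150, 150), (154, 154), (154, 154)]
outcrossing_like = [(150, 154), (150, 154), (150, 150), (154, 154)]
print(inbreeding_coefficient(selfing_like), inbreeding_coefficient(outcrossing_like))
```

    The all-homozygous sample gives F_IS = 1, the value expected under complete selfing, while the Hardy-Weinberg-like sample gives 0.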

  18. Phylogenetic inferences of Nepenthes species in Peninsular Malaysia revealed by chloroplast (trnL intron) and nuclear (ITS) DNA sequences.

    PubMed

    Bunawan, Hamidun; Yen, Choong Chee; Yaakop, Salmah; Noor, Normah Mohd

    2017-01-26

    The chloroplastic trnL intron and the nuclear internal transcribed spacer (ITS) region were sequenced for 11 Nepenthes species recorded in Peninsular Malaysia to examine their phylogenetic relationships and to evaluate the usefulness of trnL intron and ITS sequences for phylogenetic reconstruction of this genus. Phylogeny reconstruction was carried out using neighbor-joining, maximum parsimony and Bayesian analyses. All the trees revealed two major clusters: a lowland group consisting of N. ampullaria, N. mirabilis, N. gracilis and N. rafflesiana, and another containing both intermediately distributed species (N. albomarginata and N. benstonei) and four highland species (N. sanguinea, N. macfarlanei, N. ramispina and N. alba). The trnL intron and ITS sequences proved to provide phylogenetically informative characters for deriving a phylogeny of Nepenthes species in Peninsular Malaysia. To our knowledge, this is the first molecular phylogenetic study of Nepenthes species occurring along an altitudinal gradient in Peninsular Malaysia.

  19. Geographic and Genetic Population Differentiation of the Amazonian Chocolate Tree (Theobroma cacao L)

    PubMed Central

    Motamayor, Juan C.; Lachenaud, Philippe; da Silva e Mota, Jay Wallace; Loor, Rey; Kuhn, David N.; Brown, J. Steven; Schnell, Raymond J.

    2008-01-01

    Numerous collecting expeditions of Theobroma cacao L. germplasm have been undertaken in Latin America. However, most of this germplasm has not contributed to cacao improvement because its relationship to cultivated selections was poorly understood. Germplasm labeling errors have impeded breeding and confounded the interpretation of diversity analyses. To improve the understanding of the origin, classification, and population differentiation within the species, 1241 accessions covering a large geographic sampling were genotyped with 106 microsatellite markers. After discarding mislabeled samples, 10 genetic clusters, as opposed to the two genetic groups traditionally recognized within T. cacao, were found by applying Bayesian statistics. This leads us to propose a new classification of the cacao germplasm that will enhance its management. The results also provide new insights into the diversification of Amazon species in general, with the pattern of differentiation of the populations studied supporting the palaeoarches hypothesis of species diversification. The origin of the traditional cacao cultivars is also clarified in this study. PMID:18827930

  20. Geographic and genetic population differentiation of the Amazonian chocolate tree (Theobroma cacao L).

    PubMed

    Motamayor, Juan C; Lachenaud, Philippe; da Silva E Mota, Jay Wallace; Loor, Rey; Kuhn, David N; Brown, J Steven; Schnell, Raymond J

    2008-10-01

    Numerous collecting expeditions of Theobroma cacao L. germplasm have been undertaken in Latin America. However, most of this germplasm has not contributed to cacao improvement because its relationship to cultivated selections was poorly understood. Germplasm labeling errors have impeded breeding and confounded the interpretation of diversity analyses. To improve the understanding of the origin, classification, and population differentiation within the species, 1241 accessions covering a large geographic sampling were genotyped with 106 microsatellite markers. After discarding mislabeled samples, 10 genetic clusters, as opposed to the two genetic groups traditionally recognized within T. cacao, were found by applying Bayesian statistics. This leads us to propose a new classification of the cacao germplasm that will enhance its management. The results also provide new insights into the diversification of Amazon species in general, with the pattern of differentiation of the populations studied supporting the palaeoarches hypothesis of species diversification. The origin of the traditional cacao cultivars is also clarified in this study.

  1. Estimating the Effective Sample Size of Tree Topologies from Bayesian Phylogenetic Analyses

    PubMed Central

    Lanfear, Robert; Hua, Xia; Warren, Dan L.

    2016-01-01

    Bayesian phylogenetic analyses estimate posterior distributions of phylogenetic tree topologies and other parameters using Markov chain Monte Carlo (MCMC) methods. Before making inferences from these distributions, it is important to assess their adequacy. To this end, the effective sample size (ESS) estimates how many truly independent samples of a given parameter the output of the MCMC represents. The ESS of a parameter is frequently much lower than the number of samples taken from the MCMC because sequential samples from the chain can be non-independent due to autocorrelation. Typically, phylogeneticists use a rule of thumb that the ESS of all parameters should be greater than 200. However, we have no method to calculate an ESS of tree topology samples, despite the fact that the tree topology is often the parameter of primary interest and is almost always central to the estimation of other parameters. That is, we lack a method to determine whether we have adequately sampled one of the most important parameters in our analyses. In this study, we address this problem by developing methods to estimate the ESS for tree topologies. We combine these methods with two new diagnostic plots for assessing posterior samples of tree topologies, and compare their performance on simulated and empirical data sets. Combined, the methods we present provide new ways to assess the mixing and convergence of phylogenetic tree topologies in Bayesian MCMC analyses. PMID:27435794
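    For a scalar parameter, the ESS described above is commonly estimated as N / (1 + 2 * sum of lag-k autocorrelations). The sketch below implements that formula with a simplified truncation rule (stop at the first non-positive autocorrelation); dedicated MCMC diagnostics software may use different estimators, and this sketch does not implement the paper's contribution, which is extending ESS to tree topologies (a non-scalar parameter). The function name and traces are hypothetical.

```python
def effective_sample_size(samples):
    """ESS of a scalar MCMC trace: N / (1 + 2 * sum of autocorrelations),
    summing lags until the first non-positive autocorrelation."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    var = sum(c * c for c in centered) / n
    if var == 0:
        return float("nan")  # constant trace: ESS undefined
    rho_sum = 0.0
    for lag in range(1, n):
        acov = sum(centered[t] * centered[t + lag] for t in range(n - lag)) / n
        rho = acov / var
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


# A well-mixed (anticorrelated) trace versus a "sticky" one with long runs.
print(effective_sample_size([1.0, -1.0] * 50))
print(effective_sample_size([0.0] * 50 + [1.0] * 50))
```

    The sticky trace has 100 samples but an ESS of only about 3, far below the rule-of-thumb threshold of 200 mentioned in the abstract.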

  2. The evolutionary relationships and age of Homo naledi: An assessment using dated Bayesian phylogenetic methods.

    PubMed

    Dembo, Mana; Radovčić, Davorka; Garvin, Heather M; Laird, Myra F; Schroeder, Lauren; Scott, Jill E; Brophy, Juliet; Ackermann, Rebecca R; Musiba, Chares M; de Ruiter, Darryl J; Mooers, Arne Ø; Collard, Mark

    2016-08-01

    Homo naledi is a recently discovered species of fossil hominin from South Africa. A considerable amount is already known about H. naledi but some important questions remain unanswered. Here we report a study that addressed two of them: "Where does H. naledi fit in the hominin evolutionary tree?" and "How old is it?" We used a large supermatrix of craniodental characters for both early and late hominin species and Bayesian phylogenetic techniques to carry out three analyses. First, we performed a dated Bayesian analysis to generate estimates of the evolutionary relationships of fossil hominins including H. naledi. Then we employed Bayes factor tests to compare the strength of support for hypotheses about the relationships of H. naledi suggested by the best-estimate trees. Lastly, we carried out a resampling analysis to assess the accuracy of the age estimate for H. naledi yielded by the dated Bayesian analysis. The analyses strongly supported the hypothesis that H. naledi forms a clade with the other Homo species and Australopithecus sediba. The analyses were more ambiguous regarding the position of H. naledi within the (Homo, Au. sediba) clade. A number of hypotheses were rejected, but several others were not. Based on the available craniodental data, Homo antecessor, Asian Homo erectus, Homo habilis, Homo floresiensis, Homo sapiens, and Au. sediba could all be the sister taxon of H. naledi. According to the dated Bayesian analysis, the most likely age for H. naledi is 912 ka. This age estimate was supported by the resampling analysis. Our findings have a number of implications. Most notably, they support the assignment of the new specimens to Homo, cast doubt on the claim that H. naledi is simply a variant of H. erectus, and suggest H. naledi is younger than has been previously proposed. Copyright © 2016 Elsevier Ltd. All rights reserved.

  3. Bayesian Regularization for Normal Mixture Estimation and Model-Based Clustering

    DTIC Science & Technology

    2005-08-04

    ...describe a four-band magnetic resonance image (MRI), consisting of 23,712 pixels, of a brain with a tumor. Because of the size of the dataset, it is not...

  4. A bayesian approach to classification criteria for spectacled eiders

    USGS Publications Warehouse

    Taylor, B.L.; Wade, P.R.; Stehn, R.A.; Cochrane, J.F.

    1996-01-01

    To facilitate decisions to classify species according to risk of extinction, we used Bayesian methods to analyze trend data for the Spectacled Eider, an arctic sea duck. Trend data from three independent surveys of the Yukon-Kuskokwim Delta were analyzed individually and in combination to yield posterior distributions for population growth rates. We used classification criteria developed by the recovery team for Spectacled Eiders that seek to equalize errors of under- or overprotecting the species. We conducted both a Bayesian decision analysis and a frequentist (classical statistical inference) decision analysis. Bayesian decision analyses are computationally easier, yield essentially the same results as the frequentist analyses, and produce results that are easier to explain to nonscientists. With the exception of the aerial survey analysis of the 10 most recent years, both Bayesian and frequentist methods indicated that an endangered classification is warranted. The discrepancy between surveys warrants further research. Although the trend data are abundance indices, we used a preliminary estimate of absolute abundance to demonstrate how to calculate extinction distributions using the joint probability distributions for population growth rate and variance in growth rate generated by the Bayesian analysis. Recent apparent increases in abundance highlight the need for models that apply to declining and then recovering species.

  5. Bayesian statistics as a new tool for spectral analysis - I. Application for the determination of basic parameters of massive stars

    NASA Astrophysics Data System (ADS)

    Mugnes, J.-M.; Robert, C.

    2015-11-01

    Spectral analysis is a powerful tool to investigate stellar properties and it has been widely used for decades now. However, the methods considered to perform this kind of analysis are mostly based on iteration among a few diagnostic lines to determine the stellar parameters. While these methods are often simple and fast, they can lead to errors and large uncertainties due to the required assumptions. Here, we present a method based on Bayesian statistics to find simultaneously the best combination of effective temperature, surface gravity, projected rotational velocity, and microturbulence velocity, using all the available spectral lines. Different tests are discussed to demonstrate the strength of our method, which we apply to 54 mid-resolution spectra of field and cluster B stars obtained at the Observatoire du Mont-Mégantic. We compare our results with those found in the literature. Differences are seen which are well explained by the different methods used. We conclude that the B-star microturbulence velocities are often underestimated. We also confirm the trend that B stars in clusters are on average faster rotators than field B stars.

  6. Bayesian Hierarchical Grouping: perceptual grouping as mixture estimation

    PubMed Central

    Froyen, Vicky; Feldman, Jacob; Singh, Manish

    2015-01-01

    We propose a novel framework for perceptual grouping based on the idea of mixture models, called Bayesian Hierarchical Grouping (BHG). In BHG we assume that the configuration of image elements is generated by a mixture of distinct objects, each of which generates image elements according to some generative assumptions. Grouping, in this framework, means estimating the number and the parameters of the mixture components that generated the image, including estimating which image elements are “owned” by which objects. We present a tractable implementation of the framework, based on the hierarchical clustering approach of Heller and Ghahramani (2005). We illustrate it with examples drawn from a number of classical perceptual grouping problems, including dot clustering, contour integration, and part decomposition. Our approach yields an intuitive hierarchical representation of image elements, giving an explicit decomposition of the image into mixture components, along with estimates of the probability of various candidate decompositions. We show that BHG accounts well for a diverse range of empirical data drawn from the literature. Because BHG provides a principled quantification of the plausibility of grouping interpretations over a wide range of grouping problems, we argue that it provides an appealing unifying account of the elusive Gestalt notion of Prägnanz. PMID:26322548
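
    The notion of image elements being "owned" by mixture components can be illustrated with the posterior membership probabilities of a fixed mixture. This Python sketch is not BHG itself, which searches over candidate decompositions using Heller and Ghahramani's Bayesian hierarchical clustering; it only computes p(component | element) for two hypothetical one-dimensional Gaussian components whose weights, means, and standard deviations are assumed known.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution N(mean, sd**2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def ownership_posterior(x, weights, means, sds):
    """Posterior probability that each mixture component generated element x:
    p(k | x) is proportional to weight_k * N(x; mean_k, sd_k)."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]


# Two hypothetical 1-D "objects" centred at 0 and 5 with equal prior weight.
weights, means, sds = [0.5, 0.5], [0.0, 5.0], [1.0, 1.0]
for x in (0.2, 2.5, 4.6):
    print(x, [round(p, 3) for p in ownership_posterior(x, weights, means, sds)])
```

    An element midway between the two objects is split evenly between them, while elements near either centre are assigned almost entirely to that object. BHG extends this kind of probabilistic ownership from single elements to whole decompositions of the image.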

  7. Bayesian analysis of volcanic eruptions

    NASA Astrophysics Data System (ADS)

    Ho, Chih-Hsiang

    1990-10-01

    The simple Poisson model generally gives a good fit to many volcanoes for volcanic eruption forecasting. Nonetheless, empirical evidence suggests that volcanic activity in successive equal time periods tends to be more variable than a simple Poisson model with a constant eruptive rate predicts. An alternative model is therefore examined in which the eruptive rate (λ) for a given volcano or cluster(s) of volcanoes is described by a gamma distribution (prior) rather than treated as a constant value, as assumed in the simple Poisson model. Bayesian analysis is performed to link the two distributions and describe the aggregate behavior of the volcanic activity. When the Poisson process is expanded to accommodate a gamma mixing distribution on λ, a consequence of this mixed (or compound) Poisson model is that the frequency distribution of eruptions in any given time period of equal length follows the negative binomial distribution (NBD). Applications of the proposed model and comparisons between the generalized model and the simple Poisson model are discussed based on the historical eruptive count data of the volcanoes Mauna Loa (Hawaii) and Etna (Italy). Several relevant facts lead to the conclusion that the generalized model is preferable for practical use in both space and time.
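
    The compound model described above can be sketched in a few lines of Python, assuming a Gamma(shape α, rate β) prior on λ and counts from periods of equal length. Observing counts y_1, ..., y_n updates the prior to Gamma(α + Σy_i, β + n), and integrating the Poisson likelihood over that gamma gives a negative binomial predictive distribution for the count in one future period. The function names, prior values, and counts are hypothetical, not the Mauna Loa or Etna records.

```python
import math


def gamma_poisson_update(alpha, beta, counts):
    """Conjugate update of a Gamma(shape=alpha, rate=beta) prior on the eruption
    rate lambda, given eruption counts from equal-length time periods."""
    return alpha + sum(counts), beta + len(counts)


def negative_binomial_pmf(k, alpha, beta):
    """Probability of k eruptions in one future period when lambda ~ Gamma(alpha, beta):
    the Poisson-gamma mixture, which is a negative binomial distribution."""
    log_p = (
        math.lgamma(alpha + k)
        - math.lgamma(alpha)
        - math.lgamma(k + 1)
        + alpha * math.log(beta / (beta + 1))
        - k * math.log(beta + 1)
    )
    return math.exp(log_p)


# Hypothetical prior Gamma(2, 1) and four hypothetical period counts.
alpha, beta = gamma_poisson_update(2.0, 1.0, [0, 3, 1, 2])
pmf = [negative_binomial_pmf(k, alpha, beta) for k in range(200)]
mean = sum(k * p for k, p in enumerate(pmf))
variance = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
print(alpha, beta, round(mean, 3), round(variance, 3))
```

    With these hypothetical inputs the posterior is Gamma(8, 5), the predictive mean is α/β = 1.6, and the predictive variance α(β + 1)/β² = 1.92 exceeds the mean. That excess over the Poisson variance (which would equal the mean) is the extra variability the generalized model is meant to capture.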

  8. Nonparametric Bayesian clustering to detect bipolar methylated genomic loci.

    PubMed

    Wu, Xiaowei; Sun, Ming-An; Zhu, Hongxiao; Xie, Hehuang

    2015-01-16

    With recent developments in sequencing technology, a large number of genome-wide DNA methylation studies have generated massive amounts of bisulfite sequencing data. The analysis of DNA methylation patterns helps researchers understand epigenetic regulatory mechanisms. Highly variable methylation patterns reflect stochastic fluctuations in DNA methylation, whereas well-structured methylation patterns imply deterministic methylation events. Among these methylation patterns, bipolar patterns are important as they may originate from allele-specific methylation (ASM) or cell-specific methylation (CSM). Utilizing nonparametric Bayesian clustering followed by hypothesis testing, we have developed a novel statistical approach to identify bipolar methylated genomic regions in bisulfite sequencing data. Simulation studies demonstrate that the proposed method achieves good performance in terms of specificity and sensitivity. We used the method to analyze data from mouse brain and human blood methylomes. The bipolar methylated segments detected were highly consistent with the differentially methylated regions identified using purified cell subsets. Bipolar DNA methylation often indicates epigenetic heterogeneity caused by ASM or CSM. With allele-specific events filtered out or appropriately taken into account, our proposed approach sheds light on the identification of cell-specific genes/pathways under strong epigenetic control in a heterogeneous cell population.

  9. A Revised Velocity for the Globular Cluster GC-98 in the Ultra Diffuse Galaxy NGC 1052-DF2

    NASA Astrophysics Data System (ADS)

    van Dokkum, Pieter; Cohen, Yotam; Danieli, Shany; Romanowsky, Aaron; Abraham, Roberto; Brodie, Jean; Conroy, Charlie; Kruijssen, J. M. Diederik; Lokhorst, Deborah; Merritt, Allison; Mowla, Lamiya; Zhang, Jielai

    2018-06-01

    We recently published velocity measurements of luminous globular clusters in the galaxy NGC 1052-DF2, concluding that it lies far off the canonical stellar mass-halo mass relation. Here we present a revised velocity for one of the globular clusters, GC-98, and a revised velocity dispersion measurement for the galaxy. We find that the intrinsic dispersion $\sigma = 5.6^{+5.2}_{-3.8}$ km/s using Approximate Bayesian Computation, or $\sigma = 7.8^{+5.2}_{-2.2}$ km/s using the likelihood. The expected dispersion from the stars alone is ~7 km/s. Responding to a request from the Editors of ApJ Letters and RNAAS, we also briefly comment on the recent analysis of our measurements by Martin et al. (2018).

  10. Predation-related costs and benefits of conspecific attraction in songbirds--an agent-based approach.

    PubMed

    Szymkowiak, Jakub; Kuczyński, Lechosław

    2015-01-01

    Songbirds that follow a conspecific attraction strategy in the habitat selection process prefer to settle in habitat patches already occupied by other individuals. This largely affects the patterns of their spatio-temporal distribution and leads to clustered breeding. Although making informed settlement decisions is expected to be beneficial for individuals, such territory clusters may potentially provide additional fitness benefits (e.g., through the dilution effect) or costs (e.g., possibly facilitating nest localization if predators respond functionally to prey distribution). Thus, we hypothesized that the fitness consequences of following a conspecific attraction strategy may largely depend on the composition of the predator community. We developed an agent-based model in which we simulated the settling behavior of birds that use a conspecific attraction strategy and breed in a multi-predator landscape with predators that exhibited different foraging strategies. Moreover, we investigated whether Bayesian updating of prior settlement decisions according to the perceived predation risk may improve the fitness of birds that rely on conspecific cues. Our results provide evidence that the fitness consequences of conspecific attraction are predation-related. We found that in landscapes dominated by predators able to respond functionally to prey distribution, clustered breeding led to fitness costs. However, this cost could be reduced if birds performed Bayesian updating of prior settlement decisions and perceived nesting with too many neighbors as a threat. Our results did not support the hypothesis that in landscapes dominated by incidental predators, clustered breeding as a byproduct of conspecific attraction provides fitness benefits through the dilution effect. We suggest that this may be due to the spatial scale of songbirds' aggregative behavior. In general, we provide evidence that when considering the fitness consequences of conspecific attraction for songbirds, one should expect a trade-off between the benefits of making informed decisions and the costs of clustering.

  11. Predation-Related Costs and Benefits of Conspecific Attraction in Songbirds—An Agent-Based Approach

    PubMed Central

    Szymkowiak, Jakub; Kuczyński, Lechosław

    2015-01-01

    Songbirds that follow a conspecific attraction strategy in the habitat selection process prefer to settle in habitat patches already occupied by other individuals. This largely affects the patterns of their spatio-temporal distribution and leads to clustered breeding. Although making informed settlement decisions is expected to be beneficial for individuals, such territory clusters may potentially provide additional fitness benefits (e.g., through the dilution effect) or costs (e.g., possibly facilitating nest localization if predators respond functionally to prey distribution). Thus, we hypothesized that the fitness consequences of following a conspecific attraction strategy may largely depend on the composition of the predator community. We developed an agent-based model in which we simulated the settling behavior of birds that use a conspecific attraction strategy and breed in a multi-predator landscape with predators that exhibited different foraging strategies. Moreover, we investigated whether Bayesian updating of prior settlement decisions according to the perceived predation risk may improve the fitness of birds that rely on conspecific cues. Our results provide evidence that the fitness consequences of conspecific attraction are predation-related. We found that in landscapes dominated by predators able to respond functionally to prey distribution, clustered breeding led to fitness costs. However, this cost could be reduced if birds performed Bayesian updating of prior settlement decisions and perceived nesting with too many neighbors as a threat. Our results did not support the hypothesis that in landscapes dominated by incidental predators, clustered breeding as a byproduct of conspecific attraction provides fitness benefits through the dilution effect. We suggest that this may be due to the spatial scale of songbirds’ aggregative behavior. In general, we provide evidence that when considering the fitness consequences of conspecific attraction for songbirds, one should expect a trade-off between the benefits of making informed decisions and the costs of clustering. PMID:25790479

  12. Analysis of Spectral-type A/B Stars in Five Open Clusters

    NASA Astrophysics Data System (ADS)

    Wilhelm, Ronald J.; Rafuil Islam, M.

    2014-01-01

    We have obtained low-resolution (R = 1000) spectroscopy of 68 spectral-type A/B stars in five nearby open star clusters using the McDonald Observatory 2.1 m telescope. The sample of blue stars in various clusters was selected to test our new technique for determining interstellar reddening and distances in areas where interstellar reddening is high. We use a Bayesian approach to find the posterior distribution for Teff, log g, and [Fe/H] from a combination of reddened photometric colors and spectroscopic line strengths. We will present calibration results for this technique using open cluster star data with known reddening and distances. Preliminary results suggest our technique can produce both reddening and distance determinations to within 10% of cluster values. Our technique opens the possibility of determining distances for blue stars at low Galactic latitudes, where extinction can be large and differential. We will also compare our stellar parameter determinations to previously reported MK spectral classifications and discuss the probability that some of our stars are not members of their reported clusters.

  13. Analysis of the population structure of Macrolophus pygmaeus (Rambur) (Hemiptera: Miridae) in the Palaearctic region using microsatellite markers

    PubMed Central

    Sanchez, Juan Antonio; Spina, Michelangelo La; Perera, Omaththage P

    2012-01-01

    Macrolophus pygmaeus (Rambur) (Hemiptera: Miridae) is widely distributed throughout the Palaearctic region. The aim was to explain the current geographic distribution of the species by investigating its genetic population structure. Samples of M. pygmaeus were collected in 15 localities across its range. A sample from a commercial producer was also analyzed. A total of 367 M. pygmaeus were genotyped for nine microsatellite loci. Isolation by distance was tested by Mantel's test. The molecular structure of M. pygmaeus populations was inferred by UPGMA, AMOVA, principal component, and Bayesian analyses. The average number of alleles per locus per population was 5.5 (range: 3.1–7.8). Istanbul (Turkey) and Nimes (France) had the lowest (0.291) and the highest (0.626) expected heterozygosity (He), respectively. There was an increase in He from the Canary Islands to Nimes, and a progressive decrease thereafter. A significant negative correlation was found between allelic richness and He, and the distance of each population to the westernmost locality (Canary Islands). Significant linkage disequilibrium was observed in the populations from Turkey. FST (0.004–0.334) indicated high population differentiation, with isolation by distance supported by a high correlation. Bayesian analyses, PCA, and UPGMA pointed to three main clusters: (1) Greece and Turkey, (2) Italy and France, and (3) southern Iberia and the Canary Islands. The recent evolutionary history of M. pygmaeus is inferred from the data as follows: (1) the reduction in the geographic distribution of the species to the Iberian, Italian, and Balkan peninsulas, and possibly southern France, during glaciations, and re-colonization of northern Europe from its southern refuges; (2) the maintenance of high diversity in Iberia and Italy (and possibly southern France) during contraction periods, and bottlenecks in the Balkans; (3) introgression of the Italian–French lineage in northern Spain, naturally or through trade. PMID:23301179
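
    Expected heterozygosity, the diversity measure compared across populations above, is commonly computed per locus as He = 1 - Σ p_i², where p_i are the allele frequencies, and then averaged over loci. A minimal Python sketch follows. It uses the simple form without Nei's small-sample correction (the abstract does not state which estimator was used), and the function names and allele data are hypothetical.

```python
from collections import Counter


def expected_heterozygosity(alleles):
    """He = 1 - sum(p_i ** 2) at one locus, from the alleles observed in a sample."""
    n = len(alleles)
    counts = Counter(alleles)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def mean_expected_heterozygosity(loci):
    """Average He over loci; `loci` is a list of allele lists, one per locus."""
    return sum(expected_heterozygosity(a) for a in loci) / len(loci)


# Hypothetical microsatellite alleles (repeat lengths) for one population.
population = [
    [152, 152, 156, 156],  # two alleles at equal frequency: He = 0.5
    [201, 201, 201, 201],  # monomorphic locus: He = 0.0
]
print(mean_expected_heterozygosity(population))  # 0.25
```

    Values such as the 0.291 to 0.626 range reported above come from averaging per-locus values like these over all nine loci in each population.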

  14. Analysis of the population structure of Macrolophus pygmaeus (Rambur) (Hemiptera: Miridae) in the Palaearctic region using microsatellite markers.

    PubMed

    Sanchez, Juan Antonio; Spina, Michelangelo La; Perera, Omaththage P

    2012-12-01

    Macrolophus pygmaeus (Rambur) (Hemiptera: Miridae) is widely distributed throughout the Palaearctic region. The aim was to explain the current geographic distribution of the species by investigating its genetic population structure. Samples of M. pygmaeus were collected in 15 localities across its range. A sample from a commercial producer was also analyzed. A total of 367 M. pygmaeus were genotyped for nine microsatellite loci. Isolation by distance was tested by Mantel's test. The molecular structure of M. pygmaeus populations was inferred by UPGMA, AMOVA, principal component, and Bayesian analyses. The average number of alleles per locus per population was 5.5 (range: 3.1-7.8). Istanbul (Turkey) and Nimes (France) had the lowest (0.291) and the highest (0.626) expected heterozygosity (He), respectively. There was an increase in He from the Canary Islands to Nimes, and a progressive decrease thereafter. A significant negative correlation was found between allelic richness and He, and the distance of each population to the westernmost locality (Canary Islands). Significant linkage disequilibrium was observed in the populations from Turkey. FST (0.004-0.334) indicated high population differentiation, with isolation by distance supported by a high correlation. Bayesian analyses, PCA, and UPGMA pointed to three main clusters: (1) Greece and Turkey, (2) Italy and France, and (3) southern Iberia and the Canary Islands. The recent evolutionary history of M. pygmaeus is inferred from the data as follows: (1) the reduction in the geographic distribution of the species to the Iberian, Italian, and Balkan peninsulas, and possibly southern France, during glaciations, and re-colonization of northern Europe from its southern refuges; (2) the maintenance of high diversity in Iberia and Italy (and possibly southern France) during contraction periods, and bottlenecks in the Balkans; (3) introgression of the Italian-French lineage in northern Spain, naturally or through trade.

  15. Genetic Structure in a Small Pelagic Fish Coincides with a Marine Protected Area: Seascape Genetics in Patagonian Fjords

    PubMed Central

    Ferrada-Fuentes, Sandra; Galleguillos, Ricardo; Hernández, Cristián E.

    2016-01-01

    Marine environmental variables can play an important role in promoting population genetic differentiation in marine organisms. Although fjord ecosystems have attracted much attention due to the great oscillation of environmental variables that produce heterogeneous habitats, species inhabiting this kind of ecosystem have received less attention. In this study, we used Sprattus fuegensis, a small pelagic species that populates the inner waters of the continental shelf, channels and fjords of Chilean Patagonia and Argentina, as a model species to test whether environmental variables of fjords relate to population genetic structure. A total of 282 individuals from Chilean Patagonia were analyzed with eight microsatellite loci. Bayesian and non-Bayesian analyses were conducted to describe the genetic variability of S. fuegensis and whether it shows spatial genetic structure. Results showed two well-differentiated genetic clusters along the Chilean Patagonia distribution (i.e. inside the embayment area called Tic-Toc, and the rest of the fjords), but no spatial isolation by distance (IBD) pattern was found with a Mantel test analysis. Temperature and nitrate were correlated with the expected heterozygosities and explained the allelic frequency variation of data in the redundancy analyses. These results suggest that the singular genetic differences found in S. fuegensis from inside Tic-Toc Bay (east of the Corcovado Gulf) are the result of larval retention by a combination of oceanographic mesoscale processes (i.e. the west wind drift current reaches the continental shelf exactly in this zone) and the local geographical configuration (i.e. embayment area, islands, archipelagos). We propose that these features generated an isolated area in the Patagonian fjords that promoted genetic differentiation by drift and a singular biodiversity, adding support to the existence of the largest marine protected area (MPA) of continental Chile, which is the Tic-Toc MPA. PMID:27505009
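
    The Mantel test used above to look for isolation by distance asks whether pairwise genetic distances correlate with pairwise geographic distances, building a null distribution by permuting sample labels. The Python sketch below assumes Pearson correlation on the upper triangle of each distance matrix and a one-sided test for positive correlation (the isolation-by-distance expectation). The function names, site positions, and genetic distances are hypothetical, not the S. fuegensis data.

```python
import math
import random


def upper_triangle(matrix):
    """Off-diagonal upper-triangle entries of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def mantel_test(geo, gen, permutations=999, seed=0):
    """One-sided Mantel test of positive correlation between two distance
    matrices; rows and columns of `gen` are permuted jointly."""
    rng = random.Random(seed)
    n = len(geo)
    x = upper_triangle(geo)
    observed = pearson(x, upper_triangle(gen))
    at_least_as_large = 0
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        permuted = [[gen[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(x, upper_triangle(permuted)) >= observed:
            at_least_as_large += 1
    # Count the observed arrangement itself as one of the permutations.
    p_value = (at_least_as_large + 1) / (permutations + 1)
    return observed, p_value


# Hypothetical sites along a coastline (km) and a genetic distance proportional to them.
positions = [0, 1, 3, 7, 12, 20]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[0.01 * d for d in row] for row in geo]
r, p = mantel_test(geo, gen)
print(round(r, 3), p)
```

    Because the hypothetical genetic distances are exactly proportional to geographic distance, r is 1 and almost no permutation matches it, so the p-value is small. A non-significant result, as reported for S. fuegensis, means the permuted correlations frequently reach the observed one.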

  16. The Gaia-ESO Survey: open clusters in Gaia-DR1. A way forward to stellar age calibration

    NASA Astrophysics Data System (ADS)

    Randich, S.; Tognelli, E.; Jackson, R.; Jeffries, R. D.; Degl'Innocenti, S.; Pancino, E.; Re Fiorentin, P.; Spagna, A.; Sacco, G.; Bragaglia, A.; Magrini, L.; Prada Moroni, P. G.; Alfaro, E.; Franciosini, E.; Morbidelli, L.; Roccatagliata, V.; Bouy, H.; Bravi, L.; Jiménez-Esteban, F. M.; Jordi, C.; Zari, E.; Tautvaišiene, G.; Drazdauskas, A.; Mikolaitis, S.; Gilmore, G.; Feltzing, S.; Vallenari, A.; Bensby, T.; Koposov, S.; Korn, A.; Lanzafame, A.; Smiljanic, R.; Bayo, A.; Carraro, G.; Costado, M. T.; Heiter, U.; Hourihane, A.; Jofré, P.; Lewis, J.; Monaco, L.; Prisinzano, L.; Sbordone, L.; Sousa, S. G.; Worley, C. C.; Zaggia, S.

    2018-05-01

    Context: Determination and calibration of the ages of stars, which heavily rely on stellar evolutionary models, are very challenging, while representing a crucial aspect in many astrophysical areas. Aims: We describe the methodologies that, taking advantage of Gaia-DR1 and the Gaia-ESO Survey data, enable the comparison of observed open star cluster sequences with stellar evolutionary models. The final, long-term goal is the exploitation of open clusters as age calibrators. Methods: We perform a homogeneous analysis of eight open clusters using the Gaia-DR1 TGAS catalogue for bright members and information from the Gaia-ESO Survey for fainter stars. Cluster membership probabilities for the Gaia-ESO Survey targets are derived based on several spectroscopic tracers. The Gaia-ESO Survey also provides the cluster chemical composition. We obtain cluster parallaxes using two methods. The first one relies on the astrometric selection of a sample of bona fide members, while the other one fits the parallax distribution of a larger sample of TGAS sources. Ages and reddening values are recovered through a Bayesian analysis using the 2MASS magnitudes and three sets of standard models. Lithium depletion boundary (LDB) ages are also determined using literature observations and the same models employed for the Bayesian analysis. Results: For all but one cluster, parallaxes derived by us agree with those presented in Gaia Collaboration (2017, A&A, 601, A19), while a discrepancy is found for NGC 2516; we provide evidence supporting our own determination. Inferred cluster ages are robust against models and are generally consistent with literature values. Conclusions: The systematic parallax errors inherent in the Gaia DR1 data presently limit the precision of our results. Nevertheless, we have been able to place these eight clusters onto the same age scale for the first time, with good agreement between isochronal and LDB ages where there is overlap. Our approach appears promising and demonstrates the potential of combining Gaia and ground-based spectroscopic datasets. Based on observations collected with the FLAMES instrument at VLT/UT2 telescope (Paranal Observatory, ESO, Chile), for the Gaia-ESO Large Public Spectroscopic Survey (188.B-3002, 193.B-0936). Additional tables are only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/612/A99

  17. Population structure, genetic diversity and downy mildew resistance among Ocimum species germplasm.

    PubMed

    Pyne, Robert M; Honig, Josh A; Vaiciunas, Jennifer; Wyenandt, Christian A; Simon, James E

    2018-04-23

    The basil (Ocimum spp.) genus maintains a rich diversity of phenotypes and aromatic volatiles through natural and artificial outcrossing. Characterization of population structure and genetic diversity among a representative sample of this genus is severely lacking. Absence of such information has slowed breeding efforts and the development of sweet basil (Ocimum basilicum L.) with resistance to the worldwide downy mildew (DM) epidemic, caused by the obligate oomycete Peronospora belbahrii. In an effort to improve classification of relationships, 20 EST-SSR markers with species-level transferability were developed and used to resolve relationships among a diverse panel of 180 Ocimum spp. accessions with varying response to DM. Results obtained from nested Bayesian model-based clustering, analysis of molecular variance, and unweighted pair group method using arithmetic average (UPGMA) analyses were combined to provide an updated phylogeny of the Ocimum genus. Three (major) and seven (sub) population (cluster) models were identified and well supported (P < 0.001) by PhiPT (ΦPT) values of 0.433 and 0.344, respectively. Allelic frequency among clusters supported previously developed hypotheses of allopolyploid genome structure. Evidence of cryptic population structure was demonstrated for the k1 O. basilicum cluster, suggesting prevalence of gene flow. UPGMA analysis provided the best resolution for the 36-accession, DM-resistant k3 cluster, with consistently strong bootstrap support. Although the k3 cluster is a rich source of DM resistance, introgression of resistance into the commercially important k1 accessions is impeded by reproductive barriers, as demonstrated by multiple sterile F1 hybrids. The k2 cluster, located between k1 and k3, represents a source of transferable tolerance, as evidenced by fertile backcross progeny. The 90-accession k1 cluster was largely susceptible to downy mildew, with accession 'MRI' representing the only source of DM resistance. High levels of genetic diversity support the observed phenotypic diversity among Ocimum spp. accessions. EST-SSRs provided a robust evaluation of molecular diversity and can be used in additional studies to increase resolution of genetic relationships in the Ocimum genus. Elucidation of population structure and genetic relationships among Ocimum spp. germplasm provides the foundation for improved DM resistance breeding strategies and more rapid response to future disease outbreaks.

  18. Phylogeography of speciation: allopatric divergence and secondary contact between outcrossing and selfing Clarkia.

    PubMed

    Pettengill, James B; Moeller, David A

    2012-09-01

    The origins of hybrid zones between parapatric taxa have been of particular interest for understanding the evolution of reproductive isolation and the geographic context of species divergence. One challenge has been to distinguish between allopatric divergence (followed by secondary contact) versus primary intergradation (parapatric speciation) as alternative divergence histories. Here, we use complementary phylogeographic and population genetic analyses to investigate the recent divergence of two subspecies of Clarkia xantiana and the formation of a hybrid zone within the narrow region of sympatry. We tested alternative phylogeographic models of divergence using approximate Bayesian computation (ABC) and found strong support for a secondary contact model and little support for a model allowing for gene flow throughout the divergence process (i.e. primary intergradation). Two independent methods for inferring the ancestral geography of each subspecies, one based on probabilistic character state reconstructions and the other on palaeo-distribution modelling, also support a model of divergence in allopatry and range expansion leading to secondary contact. The membership of individuals to genetic clusters suggests geographic substructure within each taxon where allopatric and sympatric samples are primarily found in separate clusters. We also observed coincidence and concordance of genetic clines across three types of molecular markers, which suggests that there is a strong barrier to gene flow. Taken together, our results provide evidence for allopatric divergence followed by range expansion leading to secondary contact. The location of refugial populations and the directionality of range expansion are consistent with expectations based on climate change since the last glacial maximum. Our approach also illustrates the utility of combining phylogeographic hypothesis testing with species distribution modelling and fine-scale population genetic analyses for inferring the geography of the divergence process. © 2012 Blackwell Publishing Ltd.

  19. Historical isolation and contemporary gene flow drive population diversity of the brown alga Sargassum thunbergii along the coast of China.

    PubMed

    Li, Jing-Jing; Hu, Zi-Min; Sun, Zhong-Min; Yao, Jian-Ting; Liu, Fu-Li; Fresia, Pablo; Duan, De-Lin

    2017-12-07

    Long-term survival in isolated marginal seas of the China coast during the late Pleistocene ice ages is widely believed to be an important historical factor contributing to population genetic structure in coastal marine species. Whether or not contemporary factors (e.g. long-distance dispersal via coastal currents) continue to shape diversity gradients in marine organisms with high dispersal capability remains poorly understood. Our aim was to explore how historical and contemporary factors influenced the genetic diversity and distribution of the brown alga Sargassum thunbergii, which can drift on surface water, leading to long-distance dispersal. We used 11 microsatellites and the plastid RuBisCo spacer to evaluate the genetic diversity of 22 Sargassum thunbergii populations sampled along the China coast. Population structure and differentiation were inferred based on genotype clustering and pairwise FST and allele-frequency analyses. Integrated genetic analyses revealed two genetic clusters in S. thunbergii that dominated in the Yellow-Bohai Sea (YBS) and the East China Sea (ECS), respectively. Higher levels of genetic diversity and variation were detected among populations in the YBS than in the ECS. Bayesian coalescent theory was used to estimate contemporary and historical gene flow. High levels of contemporary gene flow were detected from the YBS (north) to the ECS (south), whereas low levels of historical gene flow occurred between the two regions. Our results suggest that the deep genetic divergence in S. thunbergii along the China coast may result from long-term geographic isolation during glacial periods. The dispersal of S. thunbergii driven by coastal currents may facilitate admixture between southern and northern regimes. Our findings exemplify how both historical and contemporary forces are needed to understand phylogeographical patterns in coastal marine species with long-distance dispersal.

  20. Phylogenetic Analyses: A Toolbox Expanding towards Bayesian Methods

    PubMed Central

    Aris-Brosou, Stéphane; Xia, Xuhua

    2008-01-01

    The reconstruction of phylogenies is becoming an increasingly simple activity. This is mainly due to two reasons: the democratization of computing power and the increased availability of sophisticated yet user-friendly software. This review describes some of the latest additions to the phylogenetic toolbox, along with some of their theoretical and practical limitations. It is shown that Bayesian methods are under heavy development, as they offer the possibility to solve a number of long-standing issues and to integrate several steps of the phylogenetic analyses into a single framework. Specific topics include not only phylogenetic reconstruction, but also the comparison of phylogenies, the detection of adaptive evolution, and the estimation of divergence times between species. PMID:18483574

  1. Spatiotemporal dynamics of HIV-1 transmission in France (1999-2014) and impact of targeted prevention strategies.

    PubMed

    Chaillon, Antoine; Essat, Asma; Frange, Pierre; Smith, Davey M; Delaugerre, Constance; Barin, Francis; Ghosn, Jade; Pialoux, Gilles; Robineau, Olivier; Rouzioux, Christine; Goujard, Cécile; Meyer, Laurence; Chaix, Marie-Laure

    2017-02-21

    Characterizing HIV-1 transmission networks can be important in understanding the evolutionary patterns and geospatial spread of the epidemic. We reconstructed the broad molecular epidemiology of HIV from individuals with primary HIV-1 infection (PHI) enrolled in France in the ANRS PRIMO C06 cohort over 15 years. Sociodemographic, geographic, clinical, biological and pol sequence data from 1356 patients were collected between 1999 and 2014. Network analysis was performed to infer genetic relationships, i.e. clusters of transmission, between HIV-1 sequences. Bayesian coalescent-based methods were used to examine the temporal and spatial dynamics of identified clusters from different regions in France. We also evaluated the use of network information to target prevention efforts. Participants were mostly Caucasian (85.9%) and men (86.7%) who reported sex with men (MSM, 71.4%). Overall, 387 individuals (28.5%) were involved in clusters: 156 patients (11.5%) in 78 dyads and 231 participants (17%) in 42 larger clusters (median size: 4, range 3-41). Compared to individuals with single PHI (n = 969), those in clusters were more frequently men (95.9 vs 83%, p < 0.01), MSM (85.8 vs 65.6%, p < 0.01) and infected with CRF02_AG (20.4 vs 13.4%, p < 0.01). Reconstruction of viral migrations across time suggests that the Paris area was the major hub of dissemination of both the subtype B and CRF02_AG epidemics. By targeting clustered individuals belonging to the identified active transmission network before 2010, 60 of the 143 onward transmissions could have been prevented. These analyses support the hypothesis of a recent and rapid rise of CRF02_AG within the French HIV-1 epidemic among MSM. Combined with a short turnaround time for sample processing, targeting prevention efforts based on phylogenetic monitoring may be an efficient way to deliver prevention interventions, but it would require near-real-time targeted interventions on the identified index cases and their partners.

  2. Bayesian sensitivity analysis methods to evaluate bias due to misclassification and missing data using informative priors and external validation data.

    PubMed

    Luta, George; Ford, Melissa B; Bondy, Melissa; Shields, Peter G; Stamey, James D

    2013-04-01

    Recent research suggests that the Bayesian paradigm may be useful for modeling biases in epidemiological studies, such as those due to misclassification and missing data. We used Bayesian methods to perform sensitivity analyses for assessing the robustness of study findings to the potential effect of these two important sources of bias. We used data from a study of the joint associations of radiotherapy and smoking with primary lung cancer among breast cancer survivors. We used Bayesian methods to provide an operational way to combine both validation data and expert opinion to account for misclassification of the two risk factors and missing data. For comparative purposes we considered a "full model" that allowed for both misclassification and missing data, along with alternative models that considered only misclassification or missing data, and the naïve model that ignored both sources of bias. We identified noticeable differences between the four models with respect to the posterior distributions of the odds ratios that described the joint associations of radiotherapy and smoking with primary lung cancer. Despite those differences we found that the general conclusions regarding the pattern of associations were the same regardless of the model used. Overall our results indicate a nonsignificantly decreased lung cancer risk due to radiotherapy among nonsmokers, and a mildly increased risk among smokers. We described easy to implement Bayesian methods to perform sensitivity analyses for assessing the robustness of study findings to misclassification and missing data. Copyright © 2012 Elsevier Ltd. All rights reserved.

  3. A Bayesian hierarchical diffusion model decomposition of performance in Approach–Avoidance Tasks

    PubMed Central

    Krypotos, Angelos-Miltiadis; Beckers, Tom; Kindt, Merel; Wagenmakers, Eric-Jan

    2015-01-01

    Common methods for analysing response time (RT) tasks, frequently used across different disciplines of psychology, suffer from a number of limitations, such as the failure to directly measure the underlying latent processes of interest and the inability to take into account the uncertainty associated with each individual's point estimate of performance. Here, we discuss a Bayesian hierarchical diffusion model and apply it to RT data. This model allows researchers to decompose performance into meaningful psychological processes and to account optimally for individual differences and commonalities, even with relatively sparse data. We highlight the advantages of the Bayesian hierarchical diffusion model decomposition by applying it to performance on Approach–Avoidance Tasks, widely used in the emotion and psychopathology literature. Model fits for two experimental data-sets demonstrate that the model performs well. The Bayesian hierarchical diffusion model overcomes important limitations of current analysis procedures and provides deeper insight into latent psychological processes of interest. PMID:25491372

  4. Calibrating the Planck cluster mass scale with CLASH

    NASA Astrophysics Data System (ADS)

    Penna-Lima, M.; Bartlett, J. G.; Rozo, E.; Melin, J.-B.; Merten, J.; Evrard, A. E.; Postman, M.; Rykoff, E.

    2017-08-01

    We determine the mass scale of Planck galaxy clusters using gravitational lensing mass measurements from the Cluster Lensing And Supernova survey with Hubble (CLASH). We have compared the lensing masses to the Planck Sunyaev-Zeldovich (SZ) mass proxy for 21 clusters in common, employing a Bayesian analysis to simultaneously fit an idealized CLASH selection function and the distribution between the measured observables and true cluster mass. We used a tiered analysis strategy to explicitly demonstrate the importance of priors on weak lensing mass accuracy. In the case of an assumed constant bias, bSZ, between true cluster mass, M500, and the Planck mass proxy, MPL, our analysis constrains 1-bSZ = 0.73 ± 0.10 when moderate priors on weak lensing accuracy are used, including a zero-mean Gaussian with standard deviation of 8% to account for possible bias in lensing mass estimations. Our analysis explicitly accounts for possible selection bias effects in this calibration sourced by the CLASH selection function. Our constraint on the cluster mass scale is consistent with recent results from the Weighing the Giants program and the Canadian Cluster Comparison Project. It is also consistent, at 1.34σ, with the value needed to reconcile the Planck SZ cluster counts with Planck's base ΛCDM model fit to the primary cosmic microwave background anisotropies.

  5. Self-optimized construction of transition rate matrices from accelerated atomistic simulations with Bayesian uncertainty quantification

    NASA Astrophysics Data System (ADS)

    Swinburne, Thomas D.; Perez, Danny

    2018-05-01

    A massively parallel method to build large transition rate matrices from temperature-accelerated molecular dynamics trajectories is presented. Bayesian Markov model analysis is used to estimate the expected residence time in the known state space, providing crucial uncertainty quantification for higher-scale simulation schemes such as kinetic Monte Carlo or cluster dynamics. The estimators are additionally used to optimize where exploration is performed and the degree of temperature acceleration on the fly, giving an autonomous, optimal procedure to explore the state space of complex systems. The method is tested against exactly solvable models and used to explore the dynamics of C15 interstitial defects in iron. Our uncertainty quantification scheme allows for accurate modeling of the evolution of these defects over timescales of several seconds.

  6. Data set for phylogenetic tree and RAMPAGE Ramachandran plot analysis of SODs in Gossypium raimondii and G. arboreum.

    PubMed

    Wang, Wei; Xia, Minxuan; Chen, Jie; Deng, Fenni; Yuan, Rui; Zhang, Xiaopei; Shen, Fafu

    2016-12-01

    The data presented in this paper support the research article "Genome-Wide Analysis of Superoxide Dismutase Gene Family in Gossypium raimondii and G. arboreum" [1]. In this data article, we present a phylogenetic tree showing a dichotomy with two different clusters of SODs inferred by the Bayesian method of MrBayes (version 3.2.4), "Bayesian phylogenetic inference under mixed models" [2]; Ramachandran plots of G. raimondii and G. arboreum SODs; the protein sequence used to generate the 3D structure of the proteins and the template accession via the SWISS-MODEL server, "SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information" [3]; and motif sequences of SODs identified by InterProScan (version 4.8) with the Pfam database, "Pfam: the protein families database" [4].

  7. Inference on cancer screening exam accuracy using population-level administrative data.

    PubMed

    Jiang, H; Brown, P E; Walter, S D

    2016-01-15

    This paper develops a model for cancer screening and cancer incidence data, accommodating the partially unobserved disease status, clustered data structures, general covariate effects, and dependence between exams. The true unobserved cancer and detection status of screening participants are treated as latent variables, and a Markov Chain Monte Carlo algorithm is used to estimate the Bayesian posterior distributions of the diagnostic error rates and disease prevalence. We show how the Bayesian approach can be used to draw inferences about screening exam properties and disease prevalence while allowing for the possibility of conditional dependence between two exams. The techniques are applied to the estimation of the diagnostic accuracy of mammography and clinical breast examination using data from the Ontario Breast Screening Program in Canada. Copyright © 2015 John Wiley & Sons, Ltd.

  8. Bayesian B-spline mapping for dynamic quantitative traits.

    PubMed

    Xing, Jun; Li, Jiahan; Yang, Runqing; Zhou, Xiaojing; Xu, Shizhong

    2012-04-01

    Owing to their ability and flexibility to describe individual gene expression at different time points, random regression (RR) analyses have become a popular procedure for the genetic analysis of dynamic traits whose phenotypes are collected over time. Specifically, when modelling the dynamic patterns of gene expression in the RR framework, B-splines have proved successful as an alternative to orthogonal polynomials. In so-called Bayesian B-spline quantitative trait locus (QTL) mapping, B-splines are used to characterize the patterns of QTL effects and individual-specific time-dependent environmental errors over time, and the Bayesian shrinkage estimation method is employed to estimate model parameters. Extensive simulations demonstrate that (1) in terms of statistical power, Bayesian B-spline mapping outperforms interval mapping based on maximum likelihood; (2) for the simulated dataset with a complicated growth curve simulated by B-splines, Legendre polynomial-based Bayesian mapping is not capable of identifying the designed QTLs accurately, even when higher-order Legendre polynomials are considered; and (3) for the simulated dataset using Legendre polynomials, Bayesian B-spline mapping can find the same QTLs as those identified by Legendre polynomial analysis. All simulation results support the necessity and flexibility of B-splines in Bayesian mapping of dynamic traits. The proposed method is also applied to a real dataset, where QTLs controlling the growth trajectory of stem diameters in Populus are located.

  9. Analysis of 41 plant genomes supports a wave of successful genome duplications in association with the Cretaceous–Paleogene boundary

    PubMed Central

    Vanneste, Kevin; Baele, Guy; Maere, Steven; Van de Peer, Yves

    2014-01-01

    Ancient whole-genome duplications (WGDs), also referred to as paleopolyploidizations, have been reported in most evolutionary lineages. Their attributed role remains a major topic of discussion, ranging from an evolutionary dead end to a road toward evolutionary success, with evidence supporting both fates. Previously, based on dating WGDs in a limited number of plant species, we found a clustering of angiosperm paleopolyploidizations around the Cretaceous–Paleogene (K–Pg) extinction event about 66 million years ago. Here we revisit this finding, which has proven controversial, by combining genome sequence information for many more plant lineages and using more sophisticated analyses. We include 38 full genome sequences and three transcriptome assemblies in a Bayesian evolutionary analysis framework that incorporates uncorrelated relaxed clock methods and fossil uncertainty. In accordance with earlier findings, we demonstrate a strongly nonrandom pattern of genome duplications over time with many WGDs clustering around the K–Pg boundary. We interpret these results in the context of recent studies on invasive polyploid plant species, and suggest that polyploid establishment is promoted during times of environmental stress. We argue that considering the evolutionary potential of polyploids in light of the environmental and ecological conditions present around the time of polyploidization could mitigate the stark contrast in the proposed evolutionary fates of polyploids. PMID:24835588

  10. Analysis of the genetic diversity and structure across a wide range of germplasm reveals prominent gene flow in apple at the European level.

    PubMed

    Urrestarazu, Jorge; Denancé, Caroline; Ravon, Elisa; Guyader, Arnaud; Guisnel, Rémi; Feugey, Laurence; Poncet, Charles; Lateur, Marc; Houben, Patrick; Ordidge, Matthew; Fernandez-Fernandez, Felicidad; Evans, Kate M; Paprstein, Frantisek; Sedlak, Jiri; Nybom, Hilde; Garkava-Gustavsson, Larisa; Miranda, Carlos; Gassmann, Jennifer; Kellerhals, Markus; Suprun, Ivan; Pikunova, Anna V; Krasova, Nina G; Torutaeva, Elnura; Dondini, Luca; Tartarini, Stefano; Laurens, François; Durel, Charles-Eric

    2016-06-08

    The amount and structure of genetic diversity in dessert apple germplasm conserved at a European level is mostly unknown, since all diversity studies conducted in Europe until now have been performed on regional or national collections. Here, we applied a common set of 16 SSR markers to genotype more than 2,400 accessions across 14 collections representing three broad European geographic regions (North + East, West and South) with the aim to analyze the extent, distribution and structure of variation in the apple genetic resources in Europe. A Bayesian model-based clustering approach showed that diversity was organized in three groups, although these were only moderately differentiated (FST = 0.031). A nested Bayesian clustering approach allowed identification of subgroups which revealed internal patterns of substructure within the groups, allowing a finer delineation of the variation into eight subgroups (FST = 0.044). The first level of stratification revealed an asymmetric division of the germplasm among the three groups, and a clear association was found with the geographical regions of origin of the cultivars. The substructure revealed clear partitioning of genetic groups among countries, but also interesting associations between subgroups and breeding purposes of recent cultivars or particular usage such as cider production. Additional parentage analyses allowed us to identify both putative parents of more than 40 old and/or local cultivars, giving interesting insights into the pedigree of some emblematic cultivars. The variation found at group and subgroup levels may reflect a combination of historical processes of migration/selection and adaptive factors to diverse agricultural environments that, together with genetic drift, have resulted in extensive genetic variation but limited population structure. The European dessert apple germplasm represents an important source of genetic diversity with a strong historical and patrimonial value. The present work thus constitutes a decisive step in the field of conservation genetics. Moreover, the obtained data can be used for defining a European apple core collection useful for further identification of genomic regions associated with commercially important horticultural traits in apple through genome-wide association studies.

  11. BEASTling: A software tool for linguistic phylogenetics using BEAST 2

    PubMed Central

    Forkel, Robert; Kaiping, Gereon A.; Atkinson, Quentin D.

    2017-01-01

    We present a new open source software tool called BEASTling, designed to simplify the preparation of Bayesian phylogenetic analyses of linguistic data using the BEAST 2 platform. BEASTling transforms comparatively short and human-readable configuration files into the XML files used by BEAST to specify analyses. By taking advantage of Creative Commons-licensed data from the Glottolog language catalog, BEASTling allows the user to conveniently filter datasets using names for recognised language families, to impose monophyly constraints so that inferred language trees are backward compatible with Glottolog classifications, or to assign geographic location data to languages for phylogeographic analyses. Support for the emerging cross-linguistic linked data format (CLDF) permits easy incorporation of data published in cross-linguistic linked databases into analyses. BEASTling is intended to make the power of Bayesian analysis more accessible to historical linguists without strong programming backgrounds, in the hopes of encouraging communication and collaboration between those developing computational models of language evolution (who are typically not linguists) and relevant domain experts. PMID:28796784

  12. BEASTling: A software tool for linguistic phylogenetics using BEAST 2.

    PubMed

    Maurits, Luke; Forkel, Robert; Kaiping, Gereon A; Atkinson, Quentin D

    2017-01-01

    We present a new open source software tool called BEASTling, designed to simplify the preparation of Bayesian phylogenetic analyses of linguistic data using the BEAST 2 platform. BEASTling transforms comparatively short and human-readable configuration files into the XML files used by BEAST to specify analyses. By taking advantage of Creative Commons-licensed data from the Glottolog language catalog, BEASTling allows the user to conveniently filter datasets using names for recognised language families, to impose monophyly constraints so that inferred language trees are backward compatible with Glottolog classifications, or to assign geographic location data to languages for phylogeographic analyses. Support for the emerging cross-linguistic linked data format (CLDF) permits easy incorporation of data published in cross-linguistic linked databases into analyses. BEASTling is intended to make the power of Bayesian analysis more accessible to historical linguists without strong programming backgrounds, in the hopes of encouraging communication and collaboration between those developing computational models of language evolution (who are typically not linguists) and relevant domain experts.

  13. Phylogenetic relationships of South American lizards of the genus Stenocercus (Squamata: Iguania): A new approach using a general mixture model for gene sequence data.

    PubMed

    Torres-Carvajal, Omar; Schulte, James A; Cadle, John E

    2006-04-01

    The South American iguanian lizard genus Stenocercus includes 54 species occurring mostly in the Andes and adjacent lowland areas from northern Venezuela and Colombia to central Argentina at elevations of 0-4000 m. Small taxon or character sampling has characterized all phylogenetic analyses of Stenocercus, which has long been recognized as sister taxon to the Tropidurus Group. In this study, we use mtDNA sequence data to perform phylogenetic analyses that include 32 species of Stenocercus and 12 outgroup taxa. Monophyly of this genus is strongly supported by maximum parsimony and Bayesian analyses. Evolutionary relationships within Stenocercus are further analyzed with a Bayesian implementation of a general mixture model, which accommodates variability in the pattern of evolution across sites. These analyses indicate a basal split of Stenocercus into two clades, one of which receives very strong statistical support. In addition, we test previous hypotheses using non-parametric and parametric statistical methods, and provide a phylogenetic classification for Stenocercus.

  14. Unravelling the genetic differentiation among varieties of the Neotropical savanna tree Hancornia speciosa Gomes.

    PubMed

    Collevatti, Rosane G; Rodrigues, Eduardo E; Vitorino, Luciana C; Lima-Ribeiro, Matheus S; Chaves, Lázaro J; Telles, Mariana P C

    2018-04-20

    Spatial distribution of species genetic diversity is often driven by geographical distance (isolation by distance) or environmental conditions (isolation by environment), especially under climate change scenarios such as Quaternary glaciations. Here, we used coalescent analyses coupled with ecological niche modelling (ENM), spatially explicit quantile regression analyses and the multiple matrix regression with randomization (MMRR) approach to unravel the patterns of genetic differentiation in the widely distributed Neotropical savanna tree Hancornia speciosa (Apocynaceae). Due to its high morphological differentiation, the species was originally classified into six botanical varieties by Monachino, and has recently been recognized as only two varieties by Flora do Brasil 2020. Thus, H. speciosa is a good biological model for learning about the evolution of phenotypic plasticity under genetic and ecological effects, and for predicting its responses to changing environmental conditions. We sampled 28 populations (777 individuals) of Monachino's four varieties of H. speciosa and used seven microsatellite loci to genotype them. Bayesian clustering showed five distinct genetic groups (K = 5) with high admixture among Monachino's varieties, mainly among populations in the central area of the species' geographical range. Genetic differentiation among Monachino's varieties was lower than the genetic differentiation among populations within varieties, with higher within-population inbreeding. A high historical connectivity among populations of the central Cerrado shown by coalescent analyses may explain the high admixture among varieties. In addition, areas of higher climatic suitability also presented higher genetic diversity, such that the wide historical refugium across central Brazil might have promoted the long-term connectivity among populations. Yet, FST was significantly related to geographic distances but not to environmental distances, and coalescent analyses and ENM predicted a demographic scenario of quasi-stability through time. Our findings show that demographic history and isolation by distance, but not isolation by environment, drove genetic differentiation of populations. Finally, the genetic clusters do not support the two recently recognized botanical varieties of H. speciosa, but partially support Monachino's classification at least for the four sampled varieties, similar to morphological variation.

  15. Deep phylogeographic divergence and cytonuclear discordance in the grasshopper Oedaleus decorus.

    PubMed

    Kindler, Eveline; Arlettaz, Raphaël; Heckel, Gerald

    2012-11-01

    The grasshopper Oedaleus decorus is a thermophilic insect with a large, mostly south-Palaearctic distribution range, stretching from the Mediterranean regions in Europe to Central Asia and China. In this study, we analyzed the extent of phylogenetic divergence and the recent evolutionary history of the species based on 274 specimens from 26 localities across the distribution range in Europe. Phylogenetic relationships were determined using sequences of two mitochondrial loci (ctr, ND2) with neighbour-joining and Bayesian methods. Additionally, genetic differentiation was analyzed based on mitochondrial DNA and 11 microsatellite markers using F-statistics, model-free multivariate and model-based Bayesian clustering approaches. Phylogenetic analyses consistently detected two highly divergent, allopatrically distributed lineages within O. decorus. The divergence between these Western and Eastern lineages, which meet in the region of the Alps, was similar to the divergence of each lineage from the sister species O. asiaticus. Genetic differentiation for ctr was extremely high between Western and Eastern grasshopper populations (FCT = 0.95). Microsatellite markers detected much lower but nevertheless very significant genetic structure among population samples. The nuclear data also demonstrated a case of cytonuclear discordance because the affiliation with mitochondrial lineages was incongruent in Northern Italy. Taken together, these results provide evidence of an ancient separation within Oedaleus and either historical introgression of mtDNA among lineages and/or ongoing sex-specific gene flow in this grasshopper. Our study stresses the importance of multilocus approaches for unravelling the history and status of taxa of uncertain evolutionary divergence. Copyright © 2012 Elsevier Inc. All rights reserved.

  16. Tackling an intractable problem: Can greater taxon sampling help resolve relationships within the Stenopelmatoidea (Orthoptera: Ensifera)?

    USGS Publications Warehouse

    Vandergast, Amy; Weissman, David B; Wood, Dustin; Rentz, David C F; Bazelet, Corinna S; Ueshima, Norihiro

    2017-01-01

    The relationships among and within the families that comprise the orthopteran superfamily Stenopelmatoidea (suborder Ensifera) remain poorly understood. We developed a phylogenetic hypothesis based on Bayesian analysis of two nuclear ribosomal genes and one mitochondrial gene for 118 individuals (84 de novo and 34 from GenBank). These included Gryllacrididae from North, Central, and South America, South Africa and Madagascar, Australia and Papua New Guinea; Stenopelmatidae from North and Central America and South Africa; Anostostomatidae from North and Central America, Papua New Guinea, New Zealand, Australia, and South Africa; members of the Australian endemic Cooloola (three species); and a representative of Lezina from the Middle East. We also included representatives of all other major ensiferan families (Prophalangopsidae, Rhaphidophoridae, Schizodactylidae, Tettigoniidae, Gryllidae, Gryllotalpidae and Myrmecophilidae) and representatives of the suborder Caelifera as outgroups. Bayesian analyses of concatenated sequence data supported a clade of Stenopelmatoidea inclusive of all analyzed members of Gryllacrididae, Stenopelmatidae, Anostostomatidae, Lezina and Cooloola. We found Gryllacrididae worldwide to be monophyletic, while we recovered neither a monophyletic Stenopelmatidae nor a monophyletic Anostostomatidae. Australian Cooloola clustered in a clade composed of Australian, New Zealand, and some (but not all) North American Anostostomatidae. Lezina was included in a clade of New World Anostostomatidae. Finally, we compiled and compared karyotypes and sound production characteristics for each supported group. Chromosome number, centromere position, drumming, and stridulation differed among some groups but also varied within groups. This preliminary trait information may contribute toward future studies of trait evolution. Despite greater taxon sampling within Stenopelmatoidea than in previous efforts, some relationships among the families examined remain elusive.

  17. Spatial variation in anthropogenic mortality induces a source-sink system in a hunted mesopredator.

    PubMed

    Minnie, Liaan; Zalewski, Andrzej; Zalewska, Hanna; Kerley, Graham I H

    2018-04-01

    Lethal carnivore management is a prevailing strategy to reduce livestock predation. Intensity of lethal management varies according to land-use, where carnivores are more intensively hunted on farms relative to reserves. Variations in hunting intensity may result in the formation of a source-sink system where carnivores disperse from high-density to low-density areas. Few studies quantify dispersal between supposed sources and sinks, a fundamental requirement for source-sink systems. We used the black-backed jackal (Canis mesomelas) as a model to determine whether heterogeneous anthropogenic mortality induces a source-sink system. We analysed 12 microsatellite loci from 554 individuals from lightly hunted and previously unhunted reserves, as well as heavily hunted livestock and game farms. Bayesian genotype assignment showed that jackal populations displayed a hierarchical population structure. We identified two genetically distinct populations at the regional level and nine distinct subpopulations at the local level, with each cluster corresponding to distinct land-use types separated by various dispersal barriers. Migration between reserves and farms, estimated using Bayesian multilocus genotyping, was asymmetric, and heterogeneous anthropogenic mortality induced source-sink dynamics via compensatory immigration. Additionally, some heavily hunted populations also acted as source populations, exporting individuals to other heavily hunted populations. This indicates that heterogeneous anthropogenic mortality results in the formation of a complex series of interconnected sources and sinks. Thus, lethal management of mesopredators may not be an effective long-term strategy for reducing livestock predation, as dispersal and, more importantly, compensatory immigration may continue to affect population reduction efforts as long as dispersal from other areas persists.

  18. A Systematic Bayesian Integration of Epidemiological and Genetic Data

    PubMed Central

    Lau, Max S. Y.; Marion, Glenn; Streftaris, George; Gibson, Gavin

    2015-01-01

    Genetic sequence data on pathogens have great potential to inform inference of their transmission dynamics, ultimately leading to better disease control. Where genetic change and disease transmission occur on comparable timescales, additional information can be inferred via the joint analysis of such genetic sequence data and epidemiological observations based on clinical symptoms and diagnostic tests. Although recently introduced approaches represent substantial progress, for computational reasons they only approximate genuine joint inference of disease dynamics and genetic change in the pathogen population, capturing the joint epidemiological-evolutionary dynamics only partially. Improved methods are needed to fully integrate such genetic data with epidemiological observations, so as to achieve more robust inference of the transmission tree and other key epidemiological parameters such as latent periods. Here, building on the current literature, a novel Bayesian framework is proposed that simultaneously and explicitly infers the transmission tree and the unobserved transmitted pathogen sequences. Our framework facilitates the use of realistic likelihood functions and enables systematic and genuine joint inference of the epidemiological-evolutionary process from partially observed outbreaks. Using simulated data, it is shown that this approach can accurately infer joint epidemiological-evolutionary dynamics, even when pathogen sequences and epidemiological data are incomplete and when sequences are available for only a fraction of exposures. These results also characterise and quantify the value of incomplete and partial sequence data, which has important implications for sampling design, and demonstrate the ability of the introduced method to identify multiple clusters within an outbreak. The framework is used to analyse an outbreak of foot-and-mouth disease in the UK, enhancing current understanding of its transmission dynamics and evolutionary process. PMID:26599399

  19. A Bayesian network approach to the database search problem in criminal proceedings

    PubMed Central

    2012-01-01

    Background: The ‘database search problem’, that is, the strengthening of a case (in terms of probative value) against an individual who is found as a result of a database search, has been approached during the last two decades with substantial mathematical analyses, accompanied by lively debate and sharply opposing conclusions. This represents a challenging obstacle in teaching and also hinders a balanced and coherent discussion of the topic within the wider scientific and legal community. This paper revisits and tracks the associated mathematical analyses in terms of Bayesian networks. Their derivation and discussion for capturing the probabilistic arguments that explain the database search problem are outlined in detail. The resulting Bayesian networks offer a distinct view on the main debated issues, along with further clarity. Methods: As a general framework for representing and analyzing formal arguments in probabilistic reasoning about uncertain target propositions (that is, whether or not a given individual is the source of a crime stain), this paper relies on graphical probability models, in particular Bayesian networks. This graphical probability modeling approach is used to capture, within a single model, a series of key variables, such as the number of individuals in a database, the size of the population of potential crime stain sources, and the rarity of the corresponding analytical characteristics in a relevant population. Results: This paper demonstrates the feasibility of deriving Bayesian network structures for analyzing, representing, and tracking the database search problem. The output of the proposed models can be shown to agree with existing but exclusively formulaic approaches. Conclusions: The proposed Bayesian networks allow one to capture and analyze the currently most well-supported, but reputedly counter-intuitive and difficult, solution to the database search problem in a way that goes beyond the traditional, purely formulaic expressions. The method’s graphical environment, along with its computational and probabilistic architectures, represents a rich package that offers analysts and discussants additional modes of interaction, concise representation, and coherent communication. PMID:22849390

  20. Evaluation of a Partial Genome Screening of Two Asthma Susceptibility Regions Using Bayesian Network Based Bayesian Multilevel Analysis of Relevance

    PubMed Central

    Antal, Péter; Kiszel, Petra Sz.; Gézsi, András; Hadadi, Éva; Virág, Viktor; Hajós, Gergely; Millinghoffer, András; Nagy, Adrienne; Kiss, András; Semsei, Ágnes F.; Temesi, Gergely; Melegh, Béla; Kisfali, Péter; Széll, Márta; Bikov, András; Gálffy, Gabriella; Tamási, Lilla; Falus, András; Szalai, Csaba

    2012-01-01

    Genetic studies indicate a high number of potential factors related to asthma. Based on earlier linkage analyses, we selected the 11q13 and 14q22 asthma susceptibility regions, for which we designed a partial genome screening study using 145 SNPs in 1201 individuals (436 asthmatic children and 765 controls). The results were evaluated both with traditional frequentist methods and with a new statistical method called Bayesian network based Bayesian multilevel analysis of relevance (BN-BMLA). This method uses a Bayesian network representation to provide a detailed characterization of the relevance of factors, such as joint significance, the type of dependency, and multi-target aspects. We estimated posteriors for these relations within the Bayesian statistical framework, in order to estimate the posterior probability that a variable is directly relevant or that its association is only mediated. With frequentist methods, one SNP (rs3751464 in the FRMD6 gene) provided evidence for an association with asthma (OR = 1.43 (1.2–1.8); p = 3×10⁻⁴). The possible role of the FRMD6 gene in asthma was also confirmed in an animal model and in human asthmatics. In the BN-BMLA analysis, altogether 5 SNPs in 4 genes were found relevant in connection with the asthma phenotype: PRPF19 on chromosome 11, and FRMD6, PTGER2 and PTGDR on chromosome 14. In a subsequent step, a partial dataset containing rhinitis and further clinical parameters was used, which allowed the analysis of the relevance of SNPs for asthma and multiple targets. These analyses suggested that SNPs in the AHNAK and MS4A2 genes were indirectly associated with asthma. This paper indicates that BN-BMLA explores the relevant factors more comprehensively than traditional statistical methods and extends the scope of strong-relevance-based methods to include partial relevance, global characterization of relevance and multi-target relevance. PMID:22432035

  1. BAYESIAN LARGE-SCALE MULTIPLE REGRESSION WITH SUMMARY STATISTICS FROM GENOME-WIDE ASSOCIATION STUDIES

    PubMed Central

    Zhu, Xiang; Stephens, Matthew

    2017-01-01

    Bayesian methods for large-scale multiple regression provide attractive approaches to the analysis of genome-wide association studies (GWAS). For example, they can estimate the heritability of complex traits, allowing for both polygenic and sparse models; and by incorporating external genomic data into the priors, they can increase power and yield new biological insights. However, these methods require access to individual genotypes and phenotypes, which are often not easily available. Here we provide a framework for performing these analyses without individual-level data. Specifically, we introduce a “Regression with Summary Statistics” (RSS) likelihood, which relates the multiple regression coefficients to univariate regression results that are often easily available. The RSS likelihood requires estimates of correlations among covariates (SNPs), which can also be obtained from public databases. We perform Bayesian multiple regression analysis by combining the RSS likelihood with previously proposed prior distributions, sampling posteriors by Markov chain Monte Carlo. In a wide range of simulations, RSS performs similarly to analyses using the individual data, both for estimating heritability and for detecting associations. We apply RSS to a GWAS of human height that contains 253,288 individuals typed at 1.06 million SNPs, for which analyses of individual-level data are practically impossible. Estimates of heritability (52%) are consistent with, but more precise than, previous results using subsets of these data. We also identify many previously unreported loci that show evidence for association with height in our analyses. Software is available at https://github.com/stephenslab/rss. PMID:29399241

  2. Deep Learning Neural Networks and Bayesian Neural Networks in Data Analysis

    NASA Astrophysics Data System (ADS)

    Chernoded, Andrey; Dudko, Lev; Myagkov, Igor; Volkov, Petr

    2017-10-01

    Most modern analyses in high-energy physics use machine-learning techniques, and neural networks in particular, for signal-versus-background classification. Deep learning neural networks are the most promising modern technique for separating signal from background and nowadays can be widely and successfully implemented as part of a physics analysis. In this article we compare the application of deep learning and Bayesian neural networks as classifiers in a top quark analysis.

  3. A Bayesian Approach to the Overlap Analysis of Epidemiologically Linked Traits.

    PubMed

    Asimit, Jennifer L; Panoutsopoulou, Kalliope; Wheeler, Eleanor; Berndt, Sonja I; Cordell, Heather J; Morris, Andrew P; Zeggini, Eleftheria; Barroso, Inês

    2015-12-01

    Diseases often co-occur in individuals more often than expected by chance, which may be explained by a shared underlying genetic etiology. A common approach to genetic overlap analyses is to use summary genome-wide association study data to identify single-nucleotide polymorphisms (SNPs) that are associated with multiple traits at a selected P-value threshold. However, P-values do not account for differences in power, whereas Bayes factors (BFs) do, and BFs may be approximated using summary statistics. We use simulation studies to compare the power of frequentist and Bayesian approaches in overlap analyses, and to decide on appropriate thresholds for comparison between the two methods. It is empirically illustrated that, for single-disease associations, BFs have the advantage over P-values of a type I error rate that decreases as study size increases. Consequently, the overlap analysis of traits from different-sized studies encounters issues in fair P-value threshold selection, whereas BFs are adjusted automatically. Extensive simulations show that Bayesian overlap analyses tend to have higher power than those that assess association strength with P-values, particularly in low-power scenarios. Calibration tables between BFs and P-values are provided for a range of sample sizes, as well as an approximation approach for sample sizes that are not in the calibration table. Although P-values are sometimes thought more intuitive, these tables help remove the opaqueness of Bayesian thresholds and may also be used to select a BF threshold that meets a given type I error rate. Our methods are applied to identify variants associated with both obesity and osteoarthritis. © 2015 The Authors. Genetic Epidemiology published by Wiley Periodicals, Inc.
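    The abstract states that Bayes factors can be approximated from summary statistics and that, unlike a fixed P-value threshold, they adjust automatically to study size. The abstract does not give the exact approximation used; the sketch below assumes Wakefield's approximate Bayes factor, which needs only an effect estimate, its standard error and an assumed prior standard deviation for the true effect (prior_sd = 0.2 on the log-odds scale is an illustrative assumption, not a value from the paper). Holding the z-score, and hence the P-value, fixed while the standard error shrinks shows the BF eventually falling below 1.

```python
import math


def approx_bayes_factor(beta_hat, se, prior_sd=0.2):
    """Approximate Bayes factor for association (H1) versus no association (H0).

    Assumes a Wakefield-style normal approximation: beta_hat ~ N(beta, se^2)
    and, under H1, beta ~ N(0, prior_sd^2). prior_sd is an illustrative
    assumption, not a value taken from the paper.
    """
    v = se ** 2            # sampling variance of the estimate
    w = prior_sd ** 2      # prior variance of the true effect under H1
    r = w / (v + w)        # shrinkage factor
    z = beta_hat / se      # Wald z-score
    return math.sqrt(1.0 - r) * math.exp(0.5 * z * z * r)


# Same z-score (z = 3, two-sided P of about 0.0027) at two hypothetical precisions.
se_moderate = 0.2 / math.sqrt(8.0)   # moderately sized study
se_huge = 0.002                      # very large study
bf_moderate = approx_bayes_factor(3 * se_moderate, se_moderate)
bf_huge = approx_bayes_factor(3 * se_huge, se_huge)
```

    With these assumed values the moderate study gives a BF of roughly 18 in favour of association, whereas the very large study gives a BF slightly below 1 for the same P-value, illustrating why a single P-value threshold treats studies of different sizes unevenly.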

  4. Effects of additional data on Bayesian clustering.

    PubMed

    Yamazaki, Keisuke

    2017-10-01

    Hierarchical probabilistic models, such as mixture models, are used for cluster analysis. These models have two types of variables: observable and latent. In cluster analysis, the latent variable is estimated, and additional information is expected to improve the accuracy of that estimate. Many proposed learning methods, including semi-supervised learning and transfer learning, are able to use additional data. However, from a statistical point of view, a complex probabilistic model that encompasses both the initial and the additional data might be less accurate because it has a higher-dimensional parameter. This paper presents a theoretical analysis of the accuracy of such a model and clarifies which factor has the greatest effect on its accuracy, the advantages of obtaining additional data, and the disadvantages of increasing the complexity. Copyright © 2017 Elsevier Ltd. All rights reserved.

  5. Dark Energy Survey Year 1 Results: redshift distributions of the weak-lensing source galaxies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hoyle, B.; Gruen, D.; Bernstein, G. M.

    We describe the derivation and validation of redshift distribution estimates and their uncertainties for the galaxies used as weak lensing sources in the Dark Energy Survey (DES) Year 1 cosmological analyses. The Bayesian Photometric Redshift (BPZ) code is used to assign galaxies to four redshift bins between z=0.2 and 1.3, and to produce initial estimates of the lensing-weighted redshift distributions $n^i_{\rm PZ}(z)$ for bin $i$. Accurate determination of cosmological parameters depends critically on knowledge of $n^i$ but is insensitive to bin assignments or redshift errors for individual galaxies. The cosmological analyses allow for shifts $n^i(z) = n^i_{\rm PZ}(z - \Delta z^i)$ to correct the mean redshift of $n^i(z)$ for biases in $n^i_{\rm PZ}$. The $\Delta z^i$ are constrained by comparison of independently estimated 30-band photometric redshifts of galaxies in the COSMOS field to BPZ estimates made from the DES griz fluxes, for a sample matched in fluxes, pre-seeing size, and lensing weight to the DES weak-lensing sources. In companion papers, the $\Delta z^i$ are further constrained by the angular clustering of the source galaxies around red galaxies with secure photometric redshifts at 0.15 …
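    The shift $n^i(z) = n^i_{\rm PZ}(z - \Delta z^i)$ translates a whole redshift distribution along the $z$ axis, so its mean moves by exactly $\Delta z^i$ while its shape is unchanged. A minimal sketch on a uniform grid; the Gaussian $n_{\rm PZ}(z)$ and the value $\Delta z = 0.02$ are hypothetical, not DES numbers, and `shift_nz` is an illustrative helper, not part of BPZ or any DES pipeline.

```python
import numpy as np


def shift_nz(z, n_pz, delta_z):
    """Hypothetical helper: n(z) = n_PZ(z - delta_z) on the same grid.

    Uses linear interpolation; the shifted curve is set to zero where
    z - delta_z falls outside the original grid.
    """
    return np.interp(z - delta_z, z, n_pz, left=0.0, right=0.0)


def mean_redshift(z, n):
    """Mean redshift of a distribution sampled on a uniform grid."""
    return (z * n).sum() / n.sum()


z = np.linspace(0.0, 2.0, 2001)                  # grid with step 0.001
n_pz = np.exp(-0.5 * ((z - 0.6) / 0.1) ** 2)     # hypothetical n_PZ(z) for one bin
n_shifted = shift_nz(z, n_pz, delta_z=0.02)
mean_shift = mean_redshift(z, n_shifted) - mean_redshift(z, n_pz)
```

    Because the shift leaves the shape untouched, a single scalar $\Delta z^i$ per bin is enough to absorb a bias in the mean redshift, which is what makes it a convenient nuisance parameter in the cosmological fit.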

  6. Dark Energy Survey Year 1 Results: redshift distributions of the weak-lensing source galaxies

    DOE PAGES

    Hoyle, B.; Gruen, D.; Bernstein, G. M.; ...

    2018-04-18

    We describe the derivation and validation of redshift distribution estimates and their uncertainties for the galaxies used as weak lensing sources in the Dark Energy Survey (DES) Year 1 cosmological analyses. The Bayesian Photometric Redshift (BPZ) code is used to assign galaxies to four redshift bins between z=0.2 and 1.3, and to produce initial estimates of the lensing-weighted redshift distributions $n^i_{\rm PZ}(z)$ for bin $i$. Accurate determination of cosmological parameters depends critically on knowledge of $n^i$ but is insensitive to bin assignments or redshift errors for individual galaxies. The cosmological analyses allow for shifts $n^i(z) = n^i_{\rm PZ}(z - \Delta z^i)$ to correct the mean redshift of $n^i(z)$ for biases in $n^i_{\rm PZ}$. The $\Delta z^i$ are constrained by comparison of independently estimated 30-band photometric redshifts of galaxies in the COSMOS field to BPZ estimates made from the DES griz fluxes, for a sample matched in fluxes, pre-seeing size, and lensing weight to the DES weak-lensing sources. In companion papers, the $\Delta z^i$ are further constrained by the angular clustering of the source galaxies around red galaxies with secure photometric redshifts at 0.15 …

  7. Dark Energy Survey Year 1 Results: Redshift distributions of the weak lensing source galaxies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hoyle, B.; et al.

    2017-08-04

    We describe the derivation and validation of redshift distribution estimates and their uncertainties for the galaxies used as weak lensing sources in the Dark Energy Survey (DES) Year 1 cosmological analyses. The Bayesian Photometric Redshift (BPZ) code is used to assign galaxies to four redshift bins between z=0.2 and 1.3, and to produce initial estimates of the lensing-weighted redshift distributions $n^i_{\rm PZ}(z)$ for bin $i$. Accurate determination of cosmological parameters depends critically on knowledge of $n^i$ but is insensitive to bin assignments or redshift errors for individual galaxies. The cosmological analyses allow for shifts $n^i(z) = n^i_{\rm PZ}(z - \Delta z^i)$ to correct the mean redshift of $n^i(z)$ for biases in $n^i_{\rm PZ}$. The $\Delta z^i$ are constrained by comparison of independently estimated 30-band photometric redshifts of galaxies in the COSMOS field to BPZ estimates made from the DES griz fluxes, for a sample matched in fluxes, pre-seeing size, and lensing weight to the DES weak-lensing sources. In companion papers, the $\Delta z^i$ are further constrained by the angular clustering of the source galaxies around red galaxies with secure photometric redshifts at 0.15 …

  8. Comparison of sperm motility subpopulation structure among wild anadromous and farmed male Atlantic salmon (Salmo salar) parr using a CASA system.

    PubMed

    Caldeira, Carina; García-Molina, Almudena; Valverde, Anthony; Bompart, Daznia; Hassane, Megan; Martin, Patrick; Soler, Carles

    2018-04-13

    Atlantic salmon (Salmo salar) is an endangered freshwater species that needs help to recover its wild stocks. However, the priority in aquaculture is to obtain successful fertilisation and genetic variability to secure the revival of the species. The aims of the present work were to study sperm subpopulation structure and motility patterns in wild anadromous males and farmed male Atlantic salmon parr. Sperm samples were collected from sexually mature wild anadromous salmon (WS) males and from two generations of farmed parr males, and sperm motility was analysed at two times after activation (5 s and 35 s). Differences among the three groups were analysed using statistical techniques based on cluster analysis and the Bayesian method. Atlantic salmon were found to have three sperm subpopulations, and the spermatozoa in ejaculates of mature farmed parr males had a higher velocity and larger size than those of WS males. This could be an adaptation to high sperm competition, to which salmonid species are naturally adapted. Motility analysis enables us to identify sperm subpopulations, and it may be useful to correlate these subpopulations with fertilisation ability to test whether faster-swimming spermatozoa have a higher probability of success.

  9. A comment on priors for Bayesian occupancy models.

    PubMed

    Northrup, Joseph M; Gerber, Brian D

    2018-01-01

    Understanding patterns of species occurrence and the processes underlying these patterns is fundamental to the study of ecology. One of the more commonly used approaches to investigate species occurrence patterns is occupancy modeling, which can account for imperfect detection of a species during surveys. In recent years, there has been a proliferation of Bayesian modeling in ecology, which includes fitting Bayesian occupancy models. The Bayesian framework is appealing to ecologists for many reasons, including the ability to incorporate prior information through the specification of prior distributions on parameters. While ecologists almost exclusively intend to choose priors so that they are "uninformative" or "vague", such priors can easily be unintentionally highly informative. Here we report on how the specification of a "vague" normally distributed (i.e., Gaussian) prior on coefficients in Bayesian occupancy models can unintentionally influence parameter estimation. Using both simulated data and empirical examples, we illustrate how this issue likely compromises inference about species-habitat relationships. While the extent to which these informative priors influence inference depends on the data set, researchers fitting Bayesian occupancy models should conduct sensitivity analyses to ensure intended inference, or employ less commonly used priors that are less informative (e.g., logistic or t prior distributions). We provide suggestions for addressing this issue in occupancy studies, and an online tool for exploring this issue under different contexts.
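    The mechanism behind the "vague but informative" problem can be checked directly. If a logit-scale coefficient $\beta$ gets a wide Normal prior, the implied probability $\mathrm{logit}^{-1}(\beta)$ piles up near 0 and 1, whereas a standard logistic prior on $\beta$ implies a uniform prior on the probability. The sketch below assumes the simplest case of a single term on the linear predictor (an intercept, or a coefficient multiplying a covariate equal to 1); the Normal(0, 10) prior and the 0.01 cut-off are illustrative choices, not values from the paper.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def normal_cdf(x, sd):
    """CDF of a Normal(0, sd^2) distribution."""
    return 0.5 * (1.0 + math.erf(x / (sd * math.sqrt(2.0))))


def logistic_cdf(x):
    """CDF of a standard logistic distribution (location 0, scale 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def extreme_prior_mass(cdf, eps=0.01):
    """Prior probability that inv_logit(beta) < eps or > 1 - eps.

    Assumes beta has a prior symmetric about 0 with the given CDF and is
    the only term on the logit scale.
    """
    cut = logit(1.0 - eps)                  # logit(0.99) = log(99)
    return cdf(-cut) + (1.0 - cdf(cut))


mass_normal = extreme_prior_mass(lambda b: normal_cdf(b, 10.0))  # ≈ 0.65
mass_logistic = extreme_prior_mass(logistic_cdf)                  # 0.02
```

    Under the "vague" Normal(0, 10) prior, about 65% of the prior mass for the implied probability sits below 0.01 or above 0.99, against the 2% a uniform prior would put there, which is the kind of unintended informativeness a prior sensitivity analysis is meant to expose.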

  10. Impact of socioeconomic inequalities on geographic disparities in cancer incidence: comparison of methods for spatial disease mapping.

    PubMed

    Goungounga, Juste Aristide; Gaudart, Jean; Colonna, Marc; Giorgi, Roch

    2016-10-12

    The reliability of spatial statistics is often questioned because real spatial variations may not be found, especially in heterogeneous areas. Our objective was to empirically compare different cluster detection methods. We assessed their ability to find spatial clusters of cancer cases and evaluated the impact of socioeconomic status (e.g., the Townsend index) on cancer incidence. Moran's I, the empirical Bayes index (EBI), and the Potthoff-Whittinghill test were used to investigate general clustering. The local cluster detection methods were: i) the spatial oblique decision tree (SpODT); ii) the spatial scan statistic of Kulldorff (SaTScan); and iii) hierarchical Bayesian spatial modeling (HBSM) in univariate and multivariate settings. These methods were used with and without introducing the Townsend index of socioeconomic deprivation, which is known to be related to the distribution of cancer incidence. Incidence data came from the Cancer Registry of Isère and were limited to prostate, lung, colon-rectum, and bladder cancers diagnosed between 1999 and 2007 in men only. The study found spatial heterogeneity (p < 0.01) and autocorrelation for prostate (EBI = 0.02; p = 0.001), lung (EBI = 0.01; p = 0.019) and bladder (EBI = 0.007; p = 0.05) cancers. After introduction of the Townsend index, SaTScan failed to find cancer clusters, and the introduction also changed the results obtained with the other methods. SpODT identified five spatial classes (p < 0.05): four in the western and one in the northern part of the study area (standardized incidence ratios: 1.68, 1.39, 1.14, 1.12, and 1.16, respectively). In the univariate setting, the Bayesian smoothing method found the same clusters as the two other methods (RR > 1.2). The multivariate HBSM found a spatial correlation between lung and bladder cancers (r = 0.6). In spatial analysis of cancer incidence, SpODT and HBSM may be used not only for cluster detection but also for searching for confounding or etiological factors in small areas. Moreover, the multivariate HBSM offers flexible and meaningful modeling of spatial variations; it revealed plausible, previously unknown associations between various cancers.
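    Global Moran's I, one of the general-clustering statistics listed above, compares each area's deviation from the mean with its neighbours' deviations: $I = \frac{n}{\sum_{ij} w_{ij}} \cdot \frac{\sum_{ij} w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2}$. A minimal NumPy sketch of this standard formula; the four-area chain and its binary adjacency weights are hypothetical toy inputs, not the Isère data, and the sketch omits the significance test used in the study.

```python
import numpy as np


def morans_i(x, w):
    """Global Moran's I for values x (length n) and an n x n spatial
    weights matrix w with a zero diagonal."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    d = x - x.mean()              # deviations from the mean
    num = d @ w @ d               # sum_ij w_ij * d_i * d_j
    den = (d ** 2).sum()          # sum_i d_i^2
    return (x.size / w.sum()) * num / den


# Hypothetical chain of four areas: 1-2, 2-3 and 3-4 are neighbours.
w_chain = np.array([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]])
i_trend = morans_i([1, 2, 3, 4], w_chain)        # smooth gradient
i_alternating = morans_i([1, -1, 1, -1], w_chain)  # neighbours always differ
```

    A smooth gradient gives a positive value (1/3 here, as similar values sit next to each other), while a perfectly alternating pattern gives −1, the signature of negative spatial autocorrelation.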

  11. Spatial clustering of metal and metalloid mixtures in unregulated water sources on the Navajo Nation - Arizona, New Mexico, and Utah, USA.

    PubMed

    Hoover, Joseph H; Coker, Eric; Barney, Yolanda; Shuey, Chris; Lewis, Johnnye

    2018-08-15

    Contaminant mixtures are identified regularly in public and private drinking water supplies throughout the United States; however, the complex and often correlated nature of mixtures makes identification of relevant combinations challenging. This study employed a Bayesian clustering method to identify subgroups of water sources with similar metal and metalloid profiles. Additionally, a spatial scan statistic assessed spatial clustering of these subgroups, and a human health metric was applied to investigate the potential for human toxicity. These methods were applied to a dataset comprising metal and metalloid measurements from unregulated water sources located on the Navajo Nation, in the southwest United States. Results indicated distinct subgroups of water sources with similar contaminant profiles, some of which were spatially clustered. Several profiles had concentrations of metals and metalloids, including arsenic, uranium, lead, manganese, and selenium, that may have potential for human toxicity. This approach may be useful for identifying mixtures in water sources, spatially evaluating the clusters, and helping to inform toxicological research investigating mixtures. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.

  12. Making Sense of a Negative Clinical Trial Result: A Bayesian Analysis of a Clinical Trial of Lorazepam and Diazepam for Pediatric Status Epilepticus.

    PubMed

    Chamberlain, Daniel B; Chamberlain, James M

    2017-01-01

    We demonstrate the application of a Bayesian approach to a recent negative clinical trial result. A Bayesian analysis of such a trial can provide a more useful interpretation of results and can incorporate previous evidence. This was a secondary analysis of the efficacy and safety results of the Pediatric Seizure Study, a randomized clinical trial of lorazepam versus diazepam for pediatric status epilepticus. We included the published results from the only prospective pediatric study of status epilepticus in a Bayesian hierarchical model, and we performed sensitivity analyses on the amount of pooling between studies. We evaluated 3 summary analyses for the results: superiority, noninferiority (margin <-10%), and practical equivalence (within ±10%). Consistent with the original study's classical analysis, we did not demonstrate superiority of lorazepam over diazepam. There is a 95% probability that the true efficacy of lorazepam is in the range of 66% to 80%. For both the efficacy and safety outcomes, there was a greater than 95% probability that lorazepam is noninferior to diazepam, and a greater than 90% probability that the 2 medications are practically equivalent. The results were largely driven by the current study because of the relative sample sizes of our study (n=273) and the previous pediatric study (n=61). Because Bayesian analysis estimates the probability of one or more hypotheses, such an approach can provide more useful information about the meaning of a negative trial outcome. In the case of pediatric status epilepticus, it is highly likely that lorazepam is noninferior and practically equivalent to diazepam. Copyright © 2016 American College of Emergency Physicians. Published by Elsevier Inc. All rights reserved.

  13. Bayesian analysis of heterogeneous treatment effects for patient-centered outcomes research.

    PubMed

    Henderson, Nicholas C; Louis, Thomas A; Wang, Chenguang; Varadhan, Ravi

    2016-01-01

    Evaluation of heterogeneity of treatment effect (HTE) is an essential aspect of personalized medicine and patient-centered outcomes research. Our goal in this article is to promote the use of Bayesian methods for subgroup analysis and to lower the barriers to their implementation by describing the ways in which the companion software beanz can facilitate these types of analyses. To advance this goal, we describe several key Bayesian models for investigating HTE and outline the ways in which they are well-suited to address many of the commonly cited challenges in the study of HTE. Topics highlighted include shrinkage estimation, model choice, sensitivity analysis, and posterior predictive checking. A case study is presented in which we demonstrate the use of the methods discussed.

  14. Bayesian analysis of non-homogeneous Markov chains: application to mental health data.

    PubMed

    Sung, Minje; Soyer, Refik; Nhan, Nguyen

    2007-07-10

    In this paper we present a formal treatment of non-homogeneous Markov chains by introducing a hierarchical Bayesian framework. Our work is motivated by the analysis of correlated categorical data that arise in the assessment of psychiatric treatment programs. In our development, we introduce a Markovian structure to describe the non-homogeneity of transition patterns. In doing so, we introduce a logistic regression set-up for Markov chains and incorporate covariates in our model. We present a Bayesian model using Markov chain Monte Carlo methods and develop inference procedures to address issues encountered in the analyses of data from psychiatric treatment programs. Our model and inference procedures are applied to real data from a psychiatric treatment study. Copyright 2006 John Wiley & Sons, Ltd.

  15. Automated flow cytometric analysis across large numbers of samples and cell types.

    PubMed

    Chen, Xiaoyi; Hasan, Milena; Libri, Valentina; Urrutia, Alejandra; Beitz, Benoît; Rouilly, Vincent; Duffy, Darragh; Patin, Étienne; Chalmond, Bernard; Rogge, Lars; Quintana-Murci, Lluis; Albert, Matthew L; Schwikowski, Benno

    2015-04-01

    Multi-parametric flow cytometry is a key technology for characterization of immune cell phenotypes. However, robust high-dimensional post-analytic strategies for automated data analysis in large numbers of donors are still lacking. Here, we report a computational pipeline, called FlowGM, which minimizes operator input, is insensitive to compensation settings, and can be adapted to different analytic panels. A Gaussian Mixture Model (GMM)-based approach was utilized for initial clustering, with the number of clusters determined using Bayesian Information Criterion. Meta-clustering in a reference donor permitted automated identification of 24 cell types across four panels. Cluster labels were integrated into FCS files, thus permitting comparisons to manual gating. Cell numbers and coefficient of variation (CV) were similar between FlowGM and conventional gating for lymphocyte populations, but notably FlowGM provided improved discrimination of "hard-to-gate" monocyte and dendritic cell (DC) subsets. FlowGM thus provides rapid high-dimensional analysis of cell phenotypes and is amenable to cohort studies. Copyright © 2015. Published by Elsevier Inc.

  16. Intrabreed Stratification Related to Divergent Selection Regimes in Purebred Dogs May Affect the Interpretation of Genetic Association Studies

    PubMed Central

    Chang, Melanie L.; Yokoyama, Jennifer S.; Branson, Nick; Dyer, Donna J.; Hitte, Christophe; Overall, Karen L.

    2009-01-01

    Until recently, canine genetic research has not focused on population structure within breeds, which may confound the results of case–control studies by introducing spurious correlations between phenotype and genotype that reflect population history. Intrabreed structure may exist when geographical origin or divergent selection regimes influence the choices of potential mates for breeding dogs. We present evidence for intrabreed stratification from a genome-wide marker survey in a sample of unrelated dogs. We genotyped 76 Border Collies, 49 Australian Shepherds, 17 German Shepherd Dogs, and 17 Portuguese Water Dogs for our primary analyses using Affymetrix Canine v2.0 single-nucleotide polymorphism (SNP) arrays. Subsets of autosomal markers were examined using clustering algorithms to facilitate assignment of individuals to populations and estimation of the number of populations represented in the sample. SNPs passing stringent quality control filters were employed for explicitly phylogenetic analyses reconstructing relationships between individuals using maximum parsimony and Bayesian methods. We used simulation studies to explore the possible effects of intrabreed stratification on genome-wide association studies. These analyses demonstrate significant stratification in at least one of our primary breeds of interest, the Border Collie. Demographic and pedigree data suggest that this population substructure may result from geographic isolation or divergent selection regimes practiced by breeders with different breeding program goals. Simulation studies indicate that such stratification could result in false discovery rates significant enough to confound genome-wide association analyses. Intrabreed stratification should be accounted for when designing and interpreting the results of case–control association studies using purebred dogs.

  17. Phylogeny of Neoparamoeba strains isolated from marine fish and invertebrates as inferred from SSU rDNA sequences.

    PubMed

    Dyková, Iva; Nowak, Barbara; Pecková, Hana; Fiala, Ivan; Crosbie, Philip; Dvoráková, Helena

    2007-02-08

    We characterised 9 strains selected from primary isolates referable to Paramoeba/Neoparamoeba spp. Based on ultrastructural study, 5 strains isolated from fish (amoebic gill disease [AGD]-affected Atlantic salmon and dead southern bluefin tuna), 1 strain from netting of a floating sea cage and 3 strains isolated from invertebrates (sea urchins and crab) were assigned to the genus Neoparamoeba Page, 1987. Phylogenetic analyses based on SSU rDNA sequences revealed affiliations of newly introduced and previously analysed Neoparamoeba strains. Three strains from the invertebrates and 2 out of 3 strains from gills of southern bluefin tunas were members of the N. branchiphila clade, while the remaining, fish-isolated strains, as well as the fish cage strain, clustered within the clade of N. pemaquidensis. These findings and previous reports point to the possibility that N. pemaquidensis and N. branchiphila can affect both fish and invertebrates. A new potential fish host, southern bluefin tuna, was included in the list of farmed fish endangered by N. branchiphila. The sequence of P. eilhardi (Culture Collection of Algae and Protozoa [CCAP] strain 1560/2) appeared in all analyses among sequences of strain representatives of Neoparamoeba species, in a position well supported by bootstrap value, Bremer index and Bayesian posterior probability. Our research shows that isolation of additional strains from invertebrates and further analyses of relations between molecular data and morphological characters of the genera Paramoeba and Neoparamoeba are required. This complexity needs to be considered when attempting to define molecular markers for identification of Paramoeba/Neoparamoeba species in tissues of fish and invertebrates.

  18. Detecting introgressive hybridization between free-ranging domestic dogs and wild wolves (Canis lupus) by admixture linkage disequilibrium analysis.

    PubMed

    Verardi, A; Lucchini, V; Randi, E

    2006-09-01

    Occasional crossbreeding between free-ranging domestic dogs and wild wolves (Canis lupus) has been detected in some European countries by mitochondrial DNA sequencing and genotyping unlinked microsatellite loci. Maternal and unlinked genomic markers, however, might underestimate the extent of introgressive hybridization, and their impacts on the preservation of wild wolf gene pools. In this study, we genotyped 220 presumed Italian wolves, 85 dogs and 7 known hybrids at 16 microsatellites belonging to four different linkage groups (plus four unlinked microsatellites). Population clustering and individual assignments were performed using a Bayesian procedure implemented in structure 2.1, which models the gametic disequilibrium arising between linked loci during admixtures, aiming to trace hybridization events further back in time and infer the population of origin of chromosomal blocks. Results indicate that (i) linkage disequilibrium was higher in wolves than in dogs; (ii) 11 out of 220 wolves (5.0%) were likely admixed, a proportion that is significantly higher than one admixed genotype in 107 wolves found previously in a study using unlinked markers; (iii) posterior maximum-likelihood estimates of the recombination parameter r revealed that introgression in Italian wolves is not recent, but could have continued for the last 70 (± 20) generations, corresponding to approximately 140-210 years. Bayesian clustering showed that, despite some admixture, wolf and dog gene pools remain sharply distinct (the average proportions of membership to wolf and dog clusters were Q(w) = 0.95 and Q(d) = 0.98, respectively), suggesting that hybridization was not frequent, and that introgression in nature is counteracted by behavioural or selective constraints.

  19. Multiple Imputation in Two-Stage Cluster Samples Using The Weighted Finite Population Bayesian Bootstrap.

    PubMed

    Zhou, Hanzhi; Elliott, Michael R; Raghunathan, Trivellore E

    2016-06-01

    Multistage sampling is often employed in survey samples for cost and convenience. However, accounting for clustering features when generating datasets for multiple imputation is a nontrivial task, particularly when, as is often the case, cluster sampling is accompanied by unequal probabilities of selection, necessitating case weights. Thus, multiple imputation often ignores complex sample designs and assumes simple random sampling when generating imputations, even though failing to account for complex sample design features is known to yield biased estimates and confidence intervals that have incorrect nominal coverage. In this article, we extend a recently developed, weighted, finite-population Bayesian bootstrap procedure to generate synthetic populations conditional on complex sample design data that can be treated as simple random samples at the imputation stage, obviating the need to directly model design features for imputation. We develop two forms of this method: one where the probabilities of selection are known at the first and second stages of the design, and the other, more common in public use files, where only the final weight based on the product of the two probabilities is known. We show that this method has advantages in terms of bias, mean square error, and coverage properties over methods where sample designs are ignored, with little loss in efficiency, even when compared with correct fully parametric models. An application is made using the National Automotive Sampling System Crashworthiness Data System, a multistage, unequal probability sample of U.S. passenger vehicle crashes, which suffers from a substantial amount of missing data in "Delta-V," a key crash severity measure.
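
    The weighted finite-population procedure described above belongs to the Bayesian bootstrap family. As a minimal sketch of only the simplest member of that family (unweighted, single-stage, assuming simple random sampling; it does not reproduce the authors' weighted two-stage method), posterior draws of a mean can be generated by reweighting the observed values with flat Dirichlet weights. The function name `bayesian_bootstrap_mean` and its defaults are illustrative assumptions.

```python
import numpy as np

def bayesian_bootstrap_mean(y, n_draws=4000, seed=0):
    """Draw from the Bayesian-bootstrap posterior of a mean (basic, unweighted form).

    Each draw reweights the n observed values with weights ~ Dirichlet(1, ..., 1),
    so every draw is a convex combination of the data. Illustrative sketch only:
    no case weights, no clustering, no finite-population correction.
    """
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    weights = rng.dirichlet(np.ones(y.size), size=n_draws)  # shape (n_draws, n)
    return weights @ y  # one posterior draw of the mean per row

draws = bayesian_bootstrap_mean([2.0, 4.0, 6.0, 8.0, 10.0])
print(draws.mean())                        # close to the sample mean, 6.0
print(np.percentile(draws, [2.5, 97.5]))   # 95% posterior interval for the mean
```

    Handling unequal selection probabilities, as the article does, would require replacing the flat Dirichlet weights with a design-aware scheme; that step is deliberately left out here.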

  20. Evaluating for a geospatial relationship between radon levels and thyroid cancer in Pennsylvania.

    PubMed

    Goyal, Neerav; Camacho, Fabian; Mangano, Joseph; Goldenberg, David

    2015-01-01

    To determine whether there is an association between radon levels and the rise in incidence of thyroid cancer in Pennsylvania. Epidemiological study of the state of Pennsylvania. We used information from the Pennsylvania Cancer Registry and the Pennsylvania Department of Energy. From the registry, information regarding thyroid cancer incidence by county and zip code was recorded. Information regarding radon levels per county was recorded from the state. Poisson regression models were fit predicting county-level thyroid cancer incidence and its change as a function of radon and lagged radon levels. To account for measurement error in the radon levels, a Bayesian model extending the Poisson models was fit. Geospatial clustering analysis was also performed. No association was noted between cumulative radon levels and thyroid cancer incidence. In the Poisson modeling, no significant association was noted between county radon level and thyroid cancer incidence (P = .23). When a lag between radon exposure and its effect was considered, no significant effect was seen for lags of 0 to 6 years (P = .063 to P = .59). The Bayesian models also failed to show a statistically significant association. A cluster of high thyroid cancer incidence was found in western Pennsylvania. Across a variety of models, no association was found between annual radon levels recorded in Pennsylvania and the rising incidence of thyroid cancer. However, a cluster of high thyroid cancer incidence was found in western Pennsylvania. Further studies may help identify other exposures or associations. © 2014 The American Laryngological, Rhinological and Otological Society, Inc.

  1. Multiple Imputation in Two-Stage Cluster Samples Using The Weighted Finite Population Bayesian Bootstrap

    PubMed Central

    Zhou, Hanzhi; Elliott, Michael R.; Raghunathan, Trivellore E.

    2017-01-01

    Multistage sampling is often employed in survey samples for cost and convenience. However, accounting for clustering features when generating datasets for multiple imputation is a nontrivial task, particularly when, as is often the case, cluster sampling is accompanied by unequal probabilities of selection, necessitating case weights. Thus, multiple imputation often ignores complex sample designs and assumes simple random sampling when generating imputations, even though failing to account for complex sample design features is known to yield biased estimates and confidence intervals that have incorrect nominal coverage. In this article, we extend a recently developed, weighted, finite-population Bayesian bootstrap procedure to generate synthetic populations conditional on complex sample design data that can be treated as simple random samples at the imputation stage, obviating the need to directly model design features for imputation. We develop two forms of this method: one where the probabilities of selection are known at the first and second stages of the design, and the other, more common in public use files, where only the final weight based on the product of the two probabilities is known. We show that this method has advantages in terms of bias, mean square error, and coverage properties over methods where sample designs are ignored, with little loss in efficiency, even when compared with correct fully parametric models. An application is made using the National Automotive Sampling System Crashworthiness Data System, a multistage, unequal probability sample of U.S. passenger vehicle crashes, which suffers from a substantial amount of missing data in “Delta-V,” a key crash severity measure. PMID:29226161

  2. Short-Term Dynamic and Local Epidemiological Trends in the South American HIV-1B Epidemic.

    PubMed

    Junqueira, Dennis Maletich; de Medeiros, Rubia Marília; Gräf, Tiago; Almeida, Sabrina Esteves de Matos

    2016-01-01

    Human displacement and sexual behavior are the main factors that have shaped the current profile of the HIV-1 pandemic. The intrinsic structure of HIV transmission among individuals is valuable for understanding the epidemic and for the public health response. The aim of this study was to characterize the HIV-1 subtype B (HIV-1B) epidemic in South America by identifying transmission links and inferring trends in geographical patterns and in the median time of transmission between individuals. Sequences of the protease and reverse transcriptase coding regions from 4,810 individuals were selected from GenBank. Maximum likelihood phylogenies were inferred and submitted to ClusterPicker to identify transmission links. Bayesian analyses were applied only to clusters including ≥5 dated samples in order to estimate the median maximum inter-transmission interval. This study analyzed sequences sampled from 12 South American countries, from individuals of different exposure categories, under different antiretroviral profiles, and over a wide period of time (1989-2013). At the continental level, Brazil, Argentina and Venezuela were revealed to be important sites for the spread of HIV-1B among countries within South America. Of note, across all the clusters identified, about 70% of HIV-1B infections occurred primarily among individuals living in the same geographic region. In addition, these transmissions seem to occur early after an individual is infected, taking on average 2.39 years (95% CI 1.48-3.30). Homosexual/bisexual individuals transmitted the virus in almost half the time estimated for the general population sampled here. Public health services could benefit broadly from this kind of information, both to focus specific programs responding to the epidemic and to guide prevention campaigns toward specific risk groups.

  3. Toward a DNA Taxonomy of Alpine Rhithrogena (Ephemeroptera: Heptageniidae) Using a Mixed Yule-Coalescent Analysis of Mitochondrial and Nuclear DNA

    PubMed Central

    Vuataz, Laurent; Sartori, Michel; Wagner, André; Monaghan, Michael T.

    2011-01-01

    Aquatic larvae of many Rhithrogena mayflies (Ephemeroptera) inhabit sensitive Alpine environments. A number of species are on the IUCN Red List and many recognized species have restricted distributions and are of conservation interest. Despite their ecological and conservation importance, ambiguous morphological differences among closely related species suggest that the current taxonomy may not accurately reflect the evolutionary diversity of the group. Here we examined the species status of nearly 50% of European Rhithrogena diversity using a widespread sampling scheme of Alpine species that included 22 type localities, general mixed Yule-coalescent (GMYC) model analysis of one standard mtDNA marker and one newly developed nDNA marker, and morphological identification where possible. Using sequences from 533 individuals from 144 sampling localities, we observed significant clustering of the mitochondrial (cox1) marker into 31 GMYC species. Twenty-one of these could be identified based on the presence of topotypes (expertly identified specimens from the species' type locality) or unambiguous morphology. These results strongly suggest the presence of both cryptic diversity and taxonomic oversplitting in Rhithrogena. Significant clustering was not detected with protein-coding nuclear PEPCK, although nine GMYC species were congruent with well supported terminal clusters of nDNA. The lack of greater congruence between the two data sets may be the result of incomplete sorting of ancestral polymorphism. Bayesian phylogenetic analyses of both gene regions recovered four of the six recognized Rhithrogena species groups in our samples as monophyletic. Future development of more nuclear markers would facilitate multi-locus analysis of unresolved, closely related species pairs. The DNA taxonomy developed here lays the groundwork for a future revision of the important but cryptic Rhithrogena genus in Europe. PMID:21611178

  4. Bayes in biological anthropology.

    PubMed

    Konigsberg, Lyle W; Frankenberg, Susan R

    2013-12-01

    In this article, we both contend and illustrate that biological anthropologists, particularly in the Americas, often think like Bayesians but act like frequentists when it comes to analyzing a wide variety of data. In other words, while our research goals and perspectives are rooted in probabilistic thinking and rest on prior knowledge, we often proceed to use statistical hypothesis tests and confidence interval methods unrelated (or tenuously related) to the research questions of interest. We advocate for applying Bayesian analyses to a number of different bioanthropological questions, especially since many of the programming and computational challenges to doing so have been overcome in the past two decades. To facilitate such applications, this article explains Bayesian principles and concepts, and provides concrete examples of Bayesian computer simulations and statistics that address questions relevant to biological anthropology, focusing particularly on bioarchaeology and forensic anthropology. It also reviews the use of Bayesian methods and inference within the discipline to date. This article is intended to act as a primer on Bayesian methods and inference in biological anthropology, explaining the relationships of various methods to likelihoods or probabilities and to classical statistical models. Our contention is not that traditional frequentist statistics should be rejected outright, but that there are many situations where biological anthropology is better served by taking a Bayesian approach. To this end it is hoped that the examples provided in this article will assist researchers in choosing from among the broad array of statistical methods currently available. Copyright © 2013 Wiley Periodicals, Inc.

  5. Bayesian Retrieval of Complete Posterior PDFs of Oceanic Rain Rate From Microwave Observations

    NASA Technical Reports Server (NTRS)

    Chiu, J. Christine; Petty, Grant W.

    2005-01-01

    This paper presents a new Bayesian algorithm for retrieving surface rain rate from the Tropical Rainfall Measuring Mission (TRMM) Microwave Imager (TMI) over the ocean, along with validations against estimates from the TRMM Precipitation Radar (PR). The Bayesian approach offers a rigorous basis for optimally combining multichannel observations with prior knowledge. While other rain rate algorithms have been published that are based at least partly on Bayesian reasoning, this is believed to be the first self-contained algorithm that fully exploits Bayes' theorem to yield not just a single rain rate, but rather a continuous posterior probability distribution of rain rate. To advance our understanding of the theoretical benefits of the Bayesian approach, we have conducted sensitivity analyses based on two synthetic datasets for which the true conditional and prior distributions are known. Results demonstrate that even when the prior and conditional likelihoods are specified perfectly, biased retrievals may occur at high rain rates. This bias is not the result of a defect of the Bayesian formalism but rather represents the expected outcome when the physical constraint imposed by the radiometric observations is weak, due to saturation effects. It is also suggested that the choice of the estimators and the prior information are both crucial to the retrieval. In addition, the performance of our Bayesian algorithm is found to be comparable to that of other benchmark algorithms in real-world applications, while having the additional advantage of providing a complete continuous posterior probability distribution of surface rain rate.

  6. Tmax Determined Using a Bayesian Estimation Deconvolution Algorithm Applied to Bolus Tracking Perfusion Imaging: A Digital Phantom Validation Study.

    PubMed

    Uwano, Ikuko; Sasaki, Makoto; Kudo, Kohsuke; Boutelier, Timothé; Kameda, Hiroyuki; Mori, Futoshi; Yamashita, Fumio

    2017-01-10

    The Bayesian estimation algorithm improves the precision of bolus tracking perfusion imaging. However, this algorithm cannot directly calculate Tmax, the time scale widely used to identify ischemic penumbra, because Tmax is a non-physiological, artificial index that reflects the tracer arrival delay (TD) and other parameters. We calculated Tmax from the TD and mean transit time (MTT) obtained by the Bayesian algorithm and determined its accuracy in comparison with Tmax obtained by singular value decomposition (SVD) algorithms. The TD and MTT maps were generated by the Bayesian algorithm applied to digital phantoms with time-concentration curves that reflected a range of values for various perfusion metrics using a global arterial input function. Tmax was calculated from the TD and MTT using constants obtained by a linear least-squares fit to Tmax obtained from the two SVD algorithms that showed the best benchmarks in a previous study. Correlations between the Tmax values obtained by the Bayesian and SVD methods were examined. The Bayesian algorithm yielded accurate TD and MTT values relative to the true values of the digital phantom. Tmax calculated from the TD and MTT values with the least-squares fit constants showed excellent correlation (Pearson's correlation coefficient = 0.99) and agreement (intraclass correlation coefficient = 0.99) with Tmax obtained from SVD algorithms. Quantitative analyses of Tmax values calculated from Bayesian-estimation algorithm-derived TD and MTT from a digital phantom correlated and agreed well with Tmax values determined using SVD algorithms.
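
    The abstract says only that Tmax was calculated from TD and MTT using constants from a linear least-squares fit to SVD-derived Tmax; it does not state the model form. A minimal sketch, assuming an affine model Tmax ≈ a·TD + b·MTT + c (this form, the function names and the synthetic numbers below are assumptions, not the authors' published equation or data), shows how such constants could be fitted and then reused:

```python
import numpy as np

def fit_tmax_constants(td, mtt, tmax_ref):
    """Least-squares fit of tmax_ref ≈ a*TD + b*MTT + c (assumed affine form); returns (a, b, c)."""
    td, mtt, tmax_ref = (np.asarray(v, dtype=float) for v in (td, mtt, tmax_ref))
    design = np.column_stack([td, mtt, np.ones_like(td)])  # one column per constant
    coef, *_ = np.linalg.lstsq(design, tmax_ref, rcond=None)
    return tuple(coef)

def tmax_from_td_mtt(td, mtt, a, b, c):
    """Apply the fitted constants to new TD and MTT values (hypothetical helper)."""
    return a * np.asarray(td, dtype=float) + b * np.asarray(mtt, dtype=float) + c

# Made-up voxel values standing in for phantom TD/MTT and an SVD-based reference Tmax.
td = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 0.0, 2.0, 4.0])
mtt = np.array([2.0, 4.0, 6.0, 8.0, 2.0, 6.0, 4.0, 8.0])
tmax_ref = 1.0 * td + 0.5 * mtt + 0.2

a, b, c = fit_tmax_constants(td, mtt, tmax_ref)
tmax_est = tmax_from_td_mtt(td, mtt, a, b, c)
r = np.corrcoef(tmax_est, tmax_ref)[0, 1]  # Pearson correlation, as used for agreement checks
print(a, b, c, r)
```

    Because the synthetic reference values are exactly affine in TD and MTT, the fit recovers the generating constants; with real SVD-derived Tmax the residuals would not vanish.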

  7. Bayesian network meta-analysis for cluster randomized trials with binary outcomes.

    PubMed

    Uhlmann, Lorenz; Jensen, Katrin; Kieser, Meinhard

    2017-06-01

    Network meta-analysis is becoming a common approach to combine direct and indirect comparisons of several treatment arms. In recent research, there have been various developments and extensions of the standard methodology. Simultaneously, cluster randomized trials are becoming increasingly popular, especially in the field of health services research, where, for example, medical practices are the units of randomization but the outcome is measured at the patient level. Combining the results of cluster randomized trials is challenging. In this tutorial, we examine and compare different approaches for the incorporation of cluster randomized trials in a (network) meta-analysis. Furthermore, we provide practical insight on the implementation of the models. In simulation studies, it is shown that some of the examined approaches lead to unsatisfactory results. However, there are alternatives which are suitable to combine cluster randomized trials in a network meta-analysis as they are unbiased and achieve accurate coverage rates. In conclusion, the methodology can be extended in such a way that an adequate inclusion of the results obtained in cluster randomized trials becomes feasible. Copyright © 2016 John Wiley & Sons, Ltd.

  8. Finding Groups Using Model-based Cluster Analysis: Heterogeneous Emotional Self-regulatory Processes and Heavy Alcohol Use Risk

    PubMed Central

    Mun, Eun-Young; von Eye, Alexander; Bates, Marsha E.; Vaschillo, Evgeny G.

    2010-01-01

    Model-based cluster analysis is a new clustering procedure to investigate population heterogeneity utilizing finite mixture multivariate normal densities. It is an inferentially based, statistically principled procedure that allows non-nested models to be compared using the Bayesian Information Criterion (BIC) in order to identify the optimal number of clusters. The current study clustered 36 young men and women based on their baseline heart rate (HR) and HR variability (HRV), chronic alcohol use, and reasons for drinking. Two cluster groups were identified and labeled High Alcohol Risk and Normative groups. Compared to the Normative group, individuals in the High Alcohol Risk group had higher levels of alcohol use and more strongly endorsed disinhibition and suppression reasons for use. The High Alcohol Risk group showed significant HRV changes in response to positive and negative emotional and appetitive picture cues, compared to neutral cues. In contrast, the Normative group showed a significant HRV change only to negative cues. Findings suggest that individuals with autonomic self-regulatory difficulties may be more susceptible to heavy alcohol use and use alcohol for emotional regulation. PMID:18331138
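
    Choosing the number of clusters by BIC means fitting a mixture for each candidate number of components and comparing penalized log-likelihoods. A minimal sketch, assuming one-dimensional data and a hand-written EM fit (the study used multivariate normal mixtures; the function names, the quantile initialization and the synthetic data are illustrative assumptions), uses the convention BIC = p·ln(n) − 2·ln L, where lower is better:

```python
import numpy as np

def gmm_1d_loglik(x, k, n_iter=300):
    """Fit a 1-D k-component Gaussian mixture by EM; return the maximized log-likelihood."""
    x = np.asarray(x, dtype=float)
    n = x.size
    means = np.quantile(x, (np.arange(k) + 0.5) / k)  # deterministic start at spread quantiles
    variances = np.full(k, x.var())
    weights = np.full(k, 1.0 / k)

    def log_joint():
        # log(weight_j * Normal(x_i | mean_j, var_j)), shape (n, k)
        return (np.log(weights)
                - 0.5 * np.log(2 * np.pi * variances)
                - 0.5 * (x[:, None] - means) ** 2 / variances)

    for _ in range(n_iter):
        lj = log_joint()
        resp = np.exp(lj - np.logaddexp.reduce(lj, axis=1)[:, None])  # E-step
        nk = np.maximum(resp.sum(axis=0), 1e-12)                       # M-step
        weights = nk / n
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = np.maximum((resp * (x[:, None] - means) ** 2).sum(axis=0) / nk, 1e-6)
    return np.logaddexp.reduce(log_joint(), axis=1).sum()

def bic(x, k):
    n_params = 3 * k - 1  # k means, k variances, k - 1 free mixing weights
    return n_params * np.log(len(x)) - 2 * gmm_1d_loglik(x, k)

# Synthetic data with two well-separated groups (not study data).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 150), rng.normal(6, 1, 150)])
bics = {k: bic(x, k) for k in (1, 2, 3)}
best_k = min(bics, key=bics.get)
print(bics, best_k)
```

    Some software, including the mclust package, reports BIC with the opposite sign (2·ln L − p·ln n) and maximizes it; the ranking of models is the same.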

  9. A multimembership catalogue for 1876 open clusters using UCAC4 data

    NASA Astrophysics Data System (ADS)

    Sampedro, L.; Dias, W. S.; Alfaro, E. J.; Monteiro, H.; Molino, A.

    2017-10-01

    The main objective of this work is to determine the cluster members of 1876 open clusters, using positions and proper motions of the astrometric fourth United States Naval Observatory (USNO) CCD Astrograph Catalog (UCAC4). For this purpose, we apply three different methods, all based on a Bayesian approach, but with different formulations: a purely parametric method, another completely non-parametric algorithm and a third, recently developed by Sampedro & Alfaro, using both formulations at different steps of the whole process. The first and second statistical moments of the members' phase-space subspace, obtained after applying the three methods, are compared for every cluster. Although, on average, the three methods yield similar results, there are also specific differences between them, as well as for some particular clusters. The comparison with other published catalogues shows good agreement. We have also estimated, for the first time, the mean proper motion for a sample of 18 clusters. The results are organized in a single catalogue formed by two main files, one with the most relevant information for each cluster, partially including that in UCAC4, and the other showing the individual membership probabilities for each star in the cluster area. The final catalogue, with an interface design that enables an easy interaction with the user, is available in electronic format at the Stellar Systems Group (SSG-IAA) web site (http://ssg.iaa.es/en/content/sampedro-cluster-catalog).

  10. Estimating multilevel logistic regression models when the number of clusters is low: a comparison of different statistical software procedures.

    PubMed

    Austin, Peter C

    2010-04-22

    Multilevel logistic regression models are increasingly being used to analyze clustered data in medical, public health, epidemiological, and educational research. Procedures for estimating the parameters of such models are available in many statistical software packages. There is currently little evidence on the minimum number of clusters necessary to reliably fit multilevel regression models. We conducted a Monte Carlo study to compare the performance of different statistical software procedures for estimating multilevel logistic regression models when the number of clusters was low. We examined procedures available in BUGS, HLM, R, SAS, and Stata. We found that there were qualitative differences in the performance of different software procedures for estimating multilevel logistic models when the number of clusters was low. Among the likelihood-based procedures, estimation methods based on adaptive Gauss-Hermite approximations to the likelihood (glmer in R and xtlogit in Stata) or adaptive Gaussian quadrature (Proc NLMIXED in SAS) tended to have superior performance for estimating variance components when the number of clusters was small, compared to software procedures based on penalized quasi-likelihood. However, only Bayesian estimation with BUGS allowed for accurate estimation of variance components when there were fewer than 10 clusters. For all statistical software procedures, estimation of variance components tended to be poor when there were only five subjects per cluster, regardless of the number of clusters.

  11. Spatial heterogeneity and risk factors for stunting among children under age five in Ethiopia: A Bayesian geo-statistical model.

    PubMed

    Hagos, Seifu; Hailemariam, Damen; WoldeHanna, Tasew; Lindtjørn, Bernt

    2017-01-01

    Understanding the spatial distribution of stunting and the underlying factors operating at the meso-scale is of paramount importance for designing and implementing interventions. Yet little is known about the spatial distribution of stunting, and some discrepancies have been documented in the relative importance of reported risk factors. Therefore, the present study aims to explore the spatial distribution of stunting at the meso- (district) scale, and to evaluate the effect of spatial dependency on the identification of risk factors and their relative contribution to the occurrence of stunting and severe stunting in a rural area of Ethiopia. A community-based cross-sectional study was conducted to measure the occurrence of stunting and severe stunting among children aged 0-59 months. Additionally, we collected relevant information on anthropometric measures, dietary habits, and parent- and child-related demographic and socio-economic status. The latitude and longitude of surveyed households were also recorded. Anselin's local Moran's I was calculated to investigate the spatial variation of stunting prevalence and to identify potential local pockets (hotspots) of high prevalence. Finally, we employed a Bayesian geo-statistical model, which accounted for the spatial dependency structure in the data, to identify potential risk factors for stunting in the study area. Overall, the prevalence of stunting and severe stunting in the district was 43.7% [95% CI: 40.9, 46.4] and 21.3% [95% CI: 19.5, 23.3], respectively. We identified statistically significant clusters of high stunting prevalence (hotspots) in the eastern part of the district and clusters of low prevalence (cold spots) in the western part. Including the spatial structure of the data in the Bayesian model improved the fit of the stunting model. The Bayesian geo-statistical model indicated that the risk of stunting increased with the child's age (OR 4.74; 95% Bayesian credible interval [BCI]: 3.35-6.58) and was higher among boys (OR 1.28; 95% BCI: 1.12-1.45). However, maternal education and household food security were found to be protective against stunting and severe stunting. Stunting prevalence may vary across space at different scales. It is therefore important that nutrition studies and, more importantly, control interventions take this spatial heterogeneity in the distribution of nutritional deficits and their underlying associated factors into account. The findings of this study also indicate that integrating household food insecurity into nutrition programs in the district might help to avert the burden of stunting.
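
    Anselin's local Moran's I scores each location by how its deviation from the overall mean co-varies with the deviations of its neighbours. A minimal sketch, assuming a binary adjacency matrix that is row-standardized and that every location has at least one neighbour (the function name and toy data are illustrative; the permutation test normally used to declare hotspots significant is omitted):

```python
import numpy as np

def local_morans_i(values, weights):
    """Local Moran's I for each location.

    I_i = (z_i / m2) * sum_j w_ij z_j, with z = values - mean, m2 = sum(z**2) / n,
    and w the row-standardized spatial weights (zero diagonal assumed).
    Positive I_i: the location resembles its neighbours (high-high or low-low);
    negative I_i: a spatial outlier.
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum(axis=1, keepdims=True)  # row-standardize; each row needs a neighbour
    z = x - x.mean()
    m2 = (z ** 2).sum() / x.size
    return z * (w @ z) / m2

# Four areas in a line (0-1-2-3), each adjacent to its immediate neighbours.
adjacency = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]])
local_i = local_morans_i([1, 1, 5, 5], adjacency)
print(local_i)  # areas 0 and 3 get I = 1; areas 1 and 2, at the boundary, get I = 0
```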

  12. Identifying food deserts and swamps based on relative healthy food access: a spatio-temporal Bayesian approach.

    PubMed

    Luan, Hui; Law, Jane; Quick, Matthew

    2015-12-30

    Obesity and other adverse health outcomes are influenced by individual- and neighbourhood-scale risk factors, including the food environment. At the small-area scale, past research has analysed spatial patterns of food environments for one time period, overlooking how food environments change over time. Further, past research has infrequently analysed relative healthy food access (RHFA), a measure that is more representative of food purchasing and consumption behaviours than absolute outlet density. This research applies a Bayesian hierarchical model to analyse the spatio-temporal patterns of RHFA in the Region of Waterloo, Canada, from 2011 to 2014 at the small-area level. RHFA is calculated as the proportion of healthy food outlets (healthy outlets/(healthy + unhealthy outlets)) within 4 km of each small area. This model measures the spatial autocorrelation of RHFA, the temporal trend of RHFA for the study region, and the spatio-temporal trends of RHFA for small areas. For the study region, a significant decreasing trend in RHFA is observed (-0.024), suggesting that food swamps have become more prevalent during the study period. Significant decreasing temporal trends in RHFA were observed for all small areas. Specific small areas located in south Waterloo, north Kitchener, and southeast Cambridge exhibited the steepest decreasing spatio-temporal trends and are classified as spatio-temporal food swamps. This research demonstrates a Bayesian spatio-temporal modelling approach to analyse RHFA at the small-area scale. Results suggest that food swamps are more prevalent than food deserts in the Region of Waterloo. Analysing spatio-temporal trends of RHFA improves understanding of the local food environment, highlighting specific small areas where policies should be targeted to increase RHFA and reduce risk factors of adverse health outcomes such as obesity.
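
    The RHFA measure is a simple ratio once outlets have been counted around each small area. A minimal sketch, assuming each area is represented by one reference point and that distance is straight-line distance in projected metres (the abstract does not say how the 4 km buffer was measured; the function name and coordinates are illustrative):

```python
import numpy as np

def rhfa(area_xy, healthy_xy, unhealthy_xy, radius_m=4000.0):
    """Relative healthy food access: healthy / (healthy + unhealthy) outlets within radius_m.

    Areas with no outlets in range return NaN rather than a misleading 0.
    """
    area_xy = np.asarray(area_xy, dtype=float)

    def count_within(outlets_xy):
        outlets_xy = np.asarray(outlets_xy, dtype=float).reshape(-1, 2)
        # Pairwise Euclidean distances, shape (n_areas, n_outlets)
        d = np.linalg.norm(area_xy[:, None, :] - outlets_xy[None, :, :], axis=2)
        return (d <= radius_m).sum(axis=1)

    healthy = count_within(healthy_xy)
    total = healthy + count_within(unhealthy_xy)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(total > 0, healthy / total, np.nan)

# Hypothetical projected coordinates in metres (not Region of Waterloo data).
areas = [(0, 0), (10000, 0), (50000, 0)]
healthy_outlets = [(1000, 0), (0, 3000)]
unhealthy_outlets = [(2000, 0), (10500, 0)]
result = rhfa(areas, healthy_outlets, unhealthy_outlets)
print(result)  # first area 2/3, second area 0.0, third area NaN (no outlets within 4 km)
```

    Returning NaN for areas without any outlet in range keeps "no access at all" distinct from "only unhealthy outlets", which matters when separating food deserts from food swamps.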

  13. Molecular Systematics of the Cape Parrot (Poicephalus robustus): Implications for Taxonomy and Conservation

    PubMed Central

    Coetzer, Willem G.; Downs, Colleen T.; Perrin, Mike R.; Willows-Munro, Sandi

    2015-01-01

    The taxonomic position of the Cape Parrot (Poicephalus robustus robustus) has been the focus of much debate. A number of authors suggest that the Cape Parrot should be viewed as a distinct species separate from the other two P. robustus subspecies (P. r. fuscicollis and P. r. suahelicus). These recommendations were based on morphological, ecological, and behavioural assessments. In this study we investigated the validity of these recommendations using multilocus DNA analyses. We genotyped 138 specimens from five Poicephalus species (P. cryptoxanthus, P. gulielmi, P. meyeri, P. robustus, and P. rueppellii) using 11 microsatellite loci. Additionally, two mitochondrial (cytochrome oxidase I gene and 16S ribosomal RNA) and one nuclear intron (intron 7 of the β-fibrinogen gene) markers were amplified and sequenced. Bayesian clustering analysis and pairwise FST analysis of microsatellite data identified P. r. robustus as genetically distinct from the other P. robustus subspecies. Phylogenetic and molecular clock analyses on sequence data also supported the microsatellite analyses, placing P. r. robustus in a distinct clade separate from the other P. robustus subspecies. Molecular clock analysis places the most recent common ancestor of P. r. robustus and P. r. fuscicollis / P. r. suahelicus at 2.13 to 2.67 million years ago. Our results all support previous recommendations to elevate the Cape Parrot to species level. This will facilitate better planning and implementation of international and local conservation management strategies for the Cape Parrot. PMID:26267261

  14. Genetic patterns of habitat fragmentation and past climate-change effects in the Mediterranean high-mountain plant Armeria caespitosa (Plumbaginaceae).

    PubMed

    García-Fernández, Alfredo; Iriondo, Jose M; Escudero, Adrián; Aguilar, Javier Fuertes; Feliner, Gonzalo Nieto

    2013-08-01

    Mountain plants are among the species most vulnerable to global warming, because of their isolation, narrow geographic distribution, and limited geographic range shifts. Stochastic and selective processes can act on the genome, modulating genetic structure and diversity. Fragmentation and historical processes also have a great influence on current genetic patterns, but the spatial and temporal contexts of these processes are poorly known. We aimed to evaluate the microevolutionary processes that may have taken place in Mediterranean high-mountain plants in response to changing historical environmental conditions. Genetic structure, diversity, and loci under selection were analyzed using AFLP markers in 17 populations distributed over the whole geographic range of Armeria caespitosa, an endemic plant that inhabits isolated mountains (Sierra de Guadarrama, Spain). Differences in altitude, geographic location, and climate conditions were considered in the analyses, because they may play an important role in selective and stochastic processes. Bayesian clustering approaches identified nine genetic groups, although some discrepancies in assignment were found between alternative analyses. Spatially explicit analyses showed a weak relationship between genetic parameters and spatial or environmental distances. However, a large proportion of outlier loci were detected, and some outliers were related to environmental variables. A. caespitosa populations exhibit spatial patterns of genetic structure that cannot be explained by the isolation-by-distance model. Shifts along the altitude gradient in response to Pleistocene climatic oscillations and environmentally mediated selective forces might explain the resulting structure and genetic diversity values found.

  15. Plastome sequences and exploration of tree-space help to resolve the phylogeny of riceflowers (Thymelaeaceae: Pimelea).

    PubMed

    Foster, Charles S P; Henwood, Murray J; Ho, Simon Y W

    2018-05-25

    Data sets comprising small numbers of genetic markers are not always able to resolve phylogenetic relationships. This has frequently been the case in molecular systematic studies of plants, with many analyses being based on sequence data from only two or three chloroplast genes. An example of this comes from the riceflowers Pimelea Banks & Sol. ex Gaertn. (Thymelaeaceae), a large genus of flowering plants predominantly distributed in Australia. Despite the considerable morphological variation in the genus, low sequence divergence in chloroplast markers has led to the phylogeny of Pimelea remaining largely uncertain. In this study, we resolve the backbone of the phylogeny of Pimelea in comprehensive Bayesian and maximum-likelihood analyses of plastome sequences from 41 taxa. However, some relationships received only moderate to poor support, and the Pimelea clade contained extremely short internal branches. By using topology-clustering analyses, we demonstrate that conflicting phylogenetic signals can be found across the trees estimated from individual chloroplast protein-coding genes. A relaxed-clock dating analysis reveals that Pimelea arose in the mid-Miocene, with most divergences within the genus occurring during a subsequent rapid diversification. Our new phylogenetic estimate offers better resolution and is more strongly supported than previous estimates, providing a platform for future taxonomic revisions of both Pimelea and the broader subfamily. Our study has demonstrated the substantial improvements in phylogenetic resolution that can be achieved using plastome-scale data sets in plant molecular systematics. Copyright © 2018 Elsevier Inc. All rights reserved.

  16. A Bayesian approach to estimating variance components within a multivariate generalizability theory framework.

    PubMed

    Jiang, Zhehan; Skorupski, William

    2017-12-12

    In many behavioral research areas, multivariate generalizability theory (mG theory) has typically been used to investigate the reliability of certain multidimensional assessments. However, traditional mG-theory estimation, namely frequentist approaches, has limitations that prevent researchers from taking full advantage of the information that mG theory can offer regarding the reliability of measurements. Alternatively, Bayesian methods provide more information than frequentist approaches can offer. This article presents instructional guidelines on how to implement mG-theory analyses in a Bayesian framework; in particular, BUGS code is presented to fit commonly seen designs from mG theory, including single-facet designs, two-facet crossed designs, and two-facet nested designs. In addition to concrete examples that are closely related to the selected designs and the corresponding BUGS code, a simulated dataset is provided to demonstrate the utility and advantages of the Bayesian approach. This article is intended to serve as a tutorial reference for applied researchers and methodologists conducting mG-theory studies.

  17. Bayesian generalized least squares regression with application to log Pearson type 3 regional skew estimation

    NASA Astrophysics Data System (ADS)

    Reis, D. S.; Stedinger, J. R.; Martins, E. S.

    2005-10-01

    This paper develops a Bayesian approach to analysis of a generalized least squares (GLS) regression model for regional analyses of hydrologic data. The new approach allows computation of the posterior distributions of the parameters and the model error variance using a quasi-analytic approach. Two regional skew estimation studies illustrate the value of the Bayesian GLS approach for regional statistical analysis of a shape parameter and demonstrate that regional skew models can be relatively precise with effective record lengths in excess of 60 years. With Bayesian GLS the marginal posterior distribution of the model error variance and the corresponding mean and variance of the parameters can be computed directly, thereby providing a simple but important extension of the regional GLS regression procedures popularized by Tasker and Stedinger (1989), which is sensitive to the likely values of the model error variance when it is small relative to the sampling error in the at-site estimator.

  18. A Bayesian Approach to More Stable Estimates of Group-Level Effects in Contextual Studies.

    PubMed

    Zitzmann, Steffen; Lüdtke, Oliver; Robitzsch, Alexander

    2015-01-01

    Multilevel analyses are often used to estimate the effects of group-level constructs. However, when using aggregated individual data (e.g., student ratings) to assess a group-level construct (e.g., classroom climate), the observed group mean might not provide a reliable measure of the unobserved latent group mean. In the present article, we propose a Bayesian approach that can be used to estimate a multilevel latent covariate model, which corrects for the unreliable assessment of the latent group mean when estimating the group-level effect. A simulation study was conducted to evaluate the choice of different priors for the group-level variance of the predictor variable and to compare the Bayesian approach with the maximum likelihood approach implemented in the software Mplus. Results showed that, under problematic conditions (i.e., small number of groups, predictor variable with a small ICC), the Bayesian approach produced more accurate estimates of the group-level effect than the maximum likelihood approach did.
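
    Why an observed group mean can be an unreliable measure of the latent group mean can be made concrete with the standard Spearman-Brown-type formula for the reliability of a mean of n ratings, n·ICC / (1 + (n - 1)·ICC). The sketch below only illustrates the problem the article addresses; it is not the Bayesian latent covariate estimator the authors propose, and the function name is hypothetical.

```python
def group_mean_reliability(icc: float, n: int) -> float:
    """Reliability of an observed group mean based on n individual ratings,
    given the intraclass correlation (ICC) of the rated variable.

    Uses the Spearman-Brown form n * ICC / (1 + (n - 1) * ICC).
    """
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be a positive group size")
    return n * icc / (1.0 + (n - 1) * icc)


# A predictor with a small ICC, averaged over few ratings, gives a noisy group mean.
for icc in (0.05, 0.20):
    for n in (5, 30):
        print(f"ICC={icc:.2f}, n={n:2d}: reliability={group_mean_reliability(icc, n):.2f}")
```

    With ICC = 0.05 and five ratings per group, the reliability is only about 0.21. Note that this formula concerns group size, whereas the problematic conditions in the article refer to a small ICC and a small number of groups.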

  19. Model-Free Multi-Probe Lensing Reconstruction of Cluster Mass Profiles

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Umetsu, Keiichi

    2013-05-20

    Lens magnification by galaxy clusters induces characteristic spatial variations in the number counts of background sources, amplifying their observed fluxes and expanding the area of sky, the net effect of which, known as magnification bias, depends on the intrinsic faint-end slope of the source luminosity function. The bias is strongly negative for red galaxies, dominated by the geometric area distortion, whereas it is mildly positive for blue galaxies, enhancing the blue counts toward the cluster center. We generalize the Bayesian approach of Umetsu et al. for reconstructing projected cluster mass profiles, by incorporating multiple populations of background sources for magnification-bias measurements and combining them with complementary lens-distortion measurements, effectively breaking the mass-sheet degeneracy and improving the statistical precision of cluster mass measurements. The approach can be further extended to include strong-lensing projected mass estimates, thus allowing for non-parametric absolute mass determinations in both the weak and strong regimes. We apply this method to our recent CLASH lensing measurements of MACS J1206.2-0847, and demonstrate how combining multi-probe lensing constraints can improve the reconstruction of cluster mass profiles. This method will also be useful for a stacked lensing analysis, combining all lensing-related effects in the cluster regime, for a definitive determination of the averaged mass profile.

  20. Hierarchical Bayesian Spatio-Temporal Analysis of Climatic and Socio-Economic Determinants of Rocky Mountain Spotted Fever

    PubMed Central

    Raghavan, Ram K.; Goodin, Douglas G.; Neises, Daniel; Anderson, Gary A.; Ganta, Roman R.

    2016-01-01

    This study aims to examine the spatio-temporal dynamics of Rocky Mountain spotted fever (RMSF) prevalence in four contiguous states of the Midwestern United States, and to determine the impact of environmental and socio-economic factors associated with this disease. Bayesian hierarchical models were used to quantify space-only and time-only trends and the spatio-temporal interaction effect in the case reports submitted to the state health departments in the region. Various socio-economic, environmental and climatic covariates, screened a priori in a bivariate procedure, were added to a main-effects Bayesian model in progressive steps to evaluate important drivers of RMSF space-time patterns in the region. Our results show a steady increase in RMSF incidence over the study period and its spread to newer geographic areas, and the posterior probabilities of county-specific trends indicate clustering of high-risk counties in the central and southern parts of the study region. At the spatial scale of a county, RMSF prevalence is influenced by poverty status, average relative humidity, and average land surface temperature (>35°C) in the region, and the relevance of these factors in the context of climate-change impacts on tick-borne diseases is discussed. PMID:26942604

  1. Hierarchical Bayesian Spatio-Temporal Analysis of Climatic and Socio-Economic Determinants of Rocky Mountain Spotted Fever.

    PubMed

    Raghavan, Ram K; Goodin, Douglas G; Neises, Daniel; Anderson, Gary A; Ganta, Roman R

    2016-01-01

    This study aims to examine the spatio-temporal dynamics of Rocky Mountain spotted fever (RMSF) prevalence in four contiguous states of the Midwestern United States, and to determine the impact of environmental and socio-economic factors associated with this disease. Bayesian hierarchical models were used to quantify space-only and time-only trends and the spatio-temporal interaction effect in the case reports submitted to the state health departments in the region. Various socio-economic, environmental and climatic covariates, screened a priori in a bivariate procedure, were added to a main-effects Bayesian model in progressive steps to evaluate important drivers of RMSF space-time patterns in the region. Our results show a steady increase in RMSF incidence over the study period and its spread to newer geographic areas, and the posterior probabilities of county-specific trends indicate clustering of high-risk counties in the central and southern parts of the study region. At the spatial scale of a county, RMSF prevalence is influenced by poverty status, average relative humidity, and average land surface temperature (>35°C) in the region, and the relevance of these factors in the context of climate-change impacts on tick-borne diseases is discussed.

  2. Framework for network modularization and Bayesian network analysis to investigate the perturbed metabolic network

    PubMed Central

    2011-01-01

    Background: Genome-scale metabolic network models have contributed to elucidating biological phenomena and predicting gene targets to engineer for biotechnological applications. With their increasing importance, precise network characterization has also become crucial for a better understanding of cellular physiology. Results: We herein introduce a framework for network modularization and Bayesian network analysis (FMB) to investigate an organism’s metabolism under perturbation. FMB reveals the direction of influences among metabolic modules, in which reactions with similar or positively correlated flux variation patterns are clustered, in response to a specific perturbation, using metabolic flux data. With metabolic flux data calculated by constraint-based flux analysis under both control and perturbation conditions, FMB, in essence, reveals the effects of specific perturbations on the biological system through network modularization and Bayesian network analysis at the metabolic module level. As a demonstration, this framework was applied to the metabolism of a genetically perturbed Escherichia coli, an lpdA gene knockout mutant, using its genome-scale metabolic network model. Conclusions: Overall, FMB provides alternative scenarios of metabolic flux distributions in response to the perturbation, which are complementary to the data obtained from conventionally available genome-wide high-throughput techniques or metabolic flux analysis. PMID:22784571

  3. Framework for network modularization and Bayesian network analysis to investigate the perturbed metabolic network.

    PubMed

    Kim, Hyun Uk; Kim, Tae Yong; Lee, Sang Yup

    2011-01-01

    Genome-scale metabolic network models have contributed to elucidating biological phenomena and predicting gene targets to engineer for biotechnological applications. With their increasing importance, precise network characterization has also become crucial for a better understanding of cellular physiology. We herein introduce a framework for network modularization and Bayesian network analysis (FMB) to investigate an organism's metabolism under perturbation. FMB reveals the direction of influences among metabolic modules, in which reactions with similar or positively correlated flux variation patterns are clustered, in response to a specific perturbation, using metabolic flux data. With metabolic flux data calculated by constraint-based flux analysis under both control and perturbation conditions, FMB, in essence, reveals the effects of specific perturbations on the biological system through network modularization and Bayesian network analysis at the metabolic module level. As a demonstration, this framework was applied to the metabolism of a genetically perturbed Escherichia coli, an lpdA gene knockout mutant, using its genome-scale metabolic network model. Overall, FMB provides alternative scenarios of metabolic flux distributions in response to the perturbation, which are complementary to the data obtained from conventionally available genome-wide high-throughput techniques or metabolic flux analysis.

  4. Evolution of Dengue Virus Type 3 Genotype III in Venezuela: Diversification, Rates and Population Dynamics

    PubMed Central

    2010-01-01

    Background: Dengue virus (DENV) is a member of the genus Flavivirus of the family Flaviviridae. DENV comprises four distinct serotypes (DENV-1 through DENV-4), and each serotype can be divided into different genotypes. Currently, DENV-3 genotype III is emerging dramatically in Latin America. Nevertheless, we still have an incomplete understanding of the evolutionary forces underlying the evolution of this genotype in this region of the world. To gain insight into the degree of genetic variability and the rates and patterns of evolution of this genotype in Venezuela and the South American region, phylogenetic analyses were performed on a large number (n = 119) of envelope gene sequences from DENV-3 genotype III strains isolated in Venezuela from 2001 to 2008. Results: Phylogenetic analysis revealed in situ evolution of DENV-3 genotype III following its introduction into the Latin American region, where three different genetic clusters (A to C) can be observed among the DENV-3 genotype III strains circulating in this region. Bayesian coalescent inference analyses revealed an evolutionary rate of 8.48 × 10⁻⁴ substitutions/site/year (s/s/y) for strains of Cluster A, composed entirely of strains isolated in Venezuela. An amino acid substitution at position 329 of domain III of the E protein (A→V) was found in almost all E proteins from Cluster A strains. Conclusions: A significant evolutionary change was observed between DENV-3 genotype III strains that circulated in the initial years after the genotype's introduction to the continent and strains isolated in the Latin American region in recent years. The presence in Venezuela of DENV-3 genotype III strains belonging to different clusters reveals several introduction events into this country. The evolutionary rate found for Cluster A strains circulating in Venezuela is similar to those previously established for this genotype in other regions of the world. This suggests a lack of correlation between the DENV-3 genotype III substitution rate and the ecological pattern of virus spread. PMID:21087501

  5. A comment on priors for Bayesian occupancy models

    PubMed Central

    Gerber, Brian D.

    2018-01-01

    Understanding patterns of species occurrence and the processes underlying these patterns is fundamental to the study of ecology. One of the more commonly used approaches to investigate species occurrence patterns is occupancy modeling, which can account for imperfect detection of a species during surveys. In recent years, there has been a proliferation of Bayesian modeling in ecology, which includes fitting Bayesian occupancy models. The Bayesian framework is appealing to ecologists for many reasons, including the ability to incorporate prior information through the specification of prior distributions on parameters. While ecologists almost exclusively intend to choose priors so that they are “uninformative” or “vague”, such priors can easily be unintentionally highly informative. Here we report on how the specification of a “vague” normally distributed (i.e., Gaussian) prior on coefficients in Bayesian occupancy models can unintentionally influence parameter estimation. Using both simulated data and empirical examples, we illustrate how this issue likely compromises inference about species-habitat relationships. While the extent to which these informative priors influence inference depends on the data set, researchers fitting Bayesian occupancy models should conduct sensitivity analyses to ensure intended inference, or employ less commonly used priors that are less informative (e.g., logistic or t prior distributions). We provide suggestions for addressing this issue in occupancy studies, and an online tool for exploring this issue under different contexts. PMID:29481554
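
    To see how a “vague” Gaussian prior on the logit scale can be informative about occupancy probability, consider the simplest case of an intercept-only model, psi = logistic(beta). The sketch below computes, in closed form, how much prior mass lands on psi < 0.05 or psi > 0.95 under a Normal(0, sd²) prior and under a Logistic(0, 1) prior, one of the less informative alternatives the abstract mentions. The function names and the 0.05 cut-off are illustrative choices, not taken from the article or its online tool.

```python
import math
from statistics import NormalDist


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def extreme_mass_normal(sd: float, eps: float = 0.05) -> float:
    """Prior probability that psi = logistic(beta) falls outside [eps, 1 - eps]
    when beta ~ Normal(0, sd^2). By symmetry, double one tail."""
    cut = logit(1.0 - eps)
    return 2.0 * (1.0 - NormalDist(0.0, sd).cdf(cut))


def extreme_mass_logistic(scale: float = 1.0, eps: float = 0.05) -> float:
    """Same quantity when beta ~ Logistic(0, scale). With scale = 1,
    psi is Uniform(0, 1), so the answer is exactly 2 * eps."""
    cut = logit(1.0 - eps)
    return 2.0 * (1.0 - 1.0 / (1.0 + math.exp(-cut / scale)))


for sd in (1.0, 2.5, 10.0):
    print(f"Normal(0, {sd}^2): {extreme_mass_normal(sd):.3f} of prior mass on psi < 0.05 or psi > 0.95")
print(f"Logistic(0, 1): {extreme_mass_logistic():.3f}")
```

    Under Normal(0, 10²), roughly three quarters of the prior mass (about 0.77) sits near 0 or 1, so the supposedly vague prior favours extreme occupancy probabilities. With covariates, the effect also depends on how the covariates are scaled.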

  6. Propagation of population pharmacokinetic information using a Bayesian approach: comparison with meta-analysis.

    PubMed

    Dokoumetzidis, Aristides; Aarons, Leon

    2005-08-01

    We investigated the propagation of population pharmacokinetic information across clinical studies by applying Bayesian techniques. The aim was to summarize the population pharmacokinetic estimates of a study in appropriate statistical distributions in order to use them as Bayesian priors in subsequent population pharmacokinetic analyses. Various simulated and real clinical data sets were fitted with WinBUGS, with and without informative priors. The posterior estimates from fits with non-informative priors were used to build parametric informative priors, and the whole procedure was carried out consecutively. The posterior distributions of the fits with informative priors were compared with those of the meta-analysis fits of the respective combinations of data sets. Good agreement was found for the simulated and experimental data sets when the populations were exchangeable, with the posterior distributions from the fits with the prior nearly identical to those estimated with meta-analysis. However, when populations were not exchangeable, an alternative parametric form for the prior, the natural conjugate prior, had to be used to obtain consistent results. In conclusion, the results of a population pharmacokinetic analysis may be summarized in Bayesian prior distributions that can be used consecutively with other analyses. The procedure is an alternative to meta-analysis and gives comparable results. It has the advantages that it is faster than meta-analysis, which must handle large pooled data sets, and that it can be performed when the data included in the prior are not actually available.
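
    The core idea, that a posterior summarized from one study can serve as the prior for the next and reproduce a pooled analysis when populations are exchangeable, can be checked exactly in the simplest conjugate setting. The sketch below assumes a single normal mean with known observation variance, which is far simpler than a population pharmacokinetic model fitted in WinBUGS; the data values and the function name are hypothetical.

```python
from typing import Sequence, Tuple


def normal_update(prior_mean: float, prior_var: float,
                  data: Sequence[float], noise_var: float) -> Tuple[float, float]:
    """Conjugate update for a normal mean with known observation variance.
    Returns the posterior (mean, variance)."""
    n = len(data)
    post_precision = 1.0 / prior_var + n / noise_var
    post_var = 1.0 / post_precision
    post_mean = post_var * (prior_mean / prior_var + sum(data) / noise_var)
    return post_mean, post_var


study_1 = [1.0, 2.0, 3.0]  # hypothetical observations from a first study
study_2 = [2.5, 3.5]       # hypothetical observations from a second study

# Sequential: the posterior from study 1 becomes the prior for study 2.
m1, v1 = normal_update(0.0, 100.0, study_1, noise_var=1.0)
sequential = normal_update(m1, v1, study_2, noise_var=1.0)

# Pooled: one analysis of both studies with the original vague prior.
pooled = normal_update(0.0, 100.0, study_1 + study_2, noise_var=1.0)

print(sequential, pooled)
```

    The two results agree up to floating-point rounding because, with exchangeable data, the normal likelihoods simply multiply. When studies are not exchangeable this equivalence no longer holds automatically, which loosely parallels the need reported above for a different prior form in that case.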

  7. Cross-validation to select Bayesian hierarchical models in phylogenetics.

    PubMed

    Duchêne, Sebastián; Duchêne, David A; Di Giallonardo, Francesca; Eden, John-Sebastian; Geoghegan, Jemma L; Holt, Kathryn E; Ho, Simon Y W; Holmes, Edward C

    2016-05-26

    Recent developments in Bayesian phylogenetic models have increased the range of inferences that can be drawn from molecular sequence data. Accordingly, model selection has become an important component of phylogenetic analysis. Methods of model selection generally consider the likelihood of the data under the model in question. In the context of Bayesian phylogenetics, the most common approach involves estimating the marginal likelihood, which is typically done by integrating the likelihood across model parameters, weighted by the prior. Although this method is accurate, it is sensitive to the presence of improper priors. We explored an alternative approach based on cross-validation that is widely used in evolutionary analysis. This involves comparing models according to their predictive performance. We analysed simulated data and a range of viral and bacterial data sets using a cross-validation approach to compare a variety of molecular clock and demographic models. Our results show that cross-validation can be effective in distinguishing between strict- and relaxed-clock models and in identifying demographic models that allow growth in population size over time. In most of our empirical data analyses, the model selected using cross-validation was able to match that selected using marginal-likelihood estimation. The accuracy of cross-validation appears to improve with longer sequence data, particularly when distinguishing between relaxed-clock models. Cross-validation is a useful method for Bayesian phylogenetic model selection. This method can be readily implemented even when considering complex models where selecting an appropriate prior for all parameters may be difficult.

  8. Deep Galex Observations of the Coma Cluster: Source Catalog and Galaxy Counts

    NASA Technical Reports Server (NTRS)

    Hammer, D.; Hornschemeier, A. E.; Mobasher, B.; Miller, N.; Smith, R.; Arnouts, S.; Milliard, B.; Jenkins, L.

    2010-01-01

    We present a source catalog from deep 26 ks GALEX observations of the Coma cluster in the far-UV (FUV; 1530 Å) and near-UV (NUV; 2310 Å) wavebands. The observed field is centered 0.9° (1.6 Mpc) south-west of the Coma core, and has full optical photometric coverage by SDSS and spectroscopic coverage to r ~ 21. The catalog consists of 9700 galaxies with GALEX and SDSS photometry, including 242 spectroscopically confirmed Coma member galaxies that range from giant spirals and elliptical galaxies to dwarf irregular and early-type galaxies. The full multi-wavelength catalog (cluster plus background galaxies) is 80% complete to NUV = 23 and FUV = 23.5, and has a limiting depth at NUV = 24.5 and FUV = 25.0, which corresponds to a star formation rate of 10⁻³ M☉ yr⁻¹ at the distance of Coma. The GALEX images presented here are very deep and include detections of many resolved cluster members superposed on a dense field of unresolved background galaxies. This required a two-fold approach to generating a source catalog: we used a Bayesian deblending algorithm to measure faint and compact sources (using SDSS coordinates as a position prior), and used the GALEX pipeline catalog for bright and/or extended objects. We performed simulations to assess the importance of systematic effects (e.g., object blends, source confusion, Eddington bias) that influence source detection and photometry when using both methods. The Bayesian deblending method roughly doubles the number of source detections and provides reliable photometry to a few magnitudes deeper than the GALEX pipeline catalog. This method is also free from source confusion over the UV magnitude range studied here; conversely, we estimate that the GALEX pipeline catalogs are confusion limited at NUV ≈ 23 and FUV ≈ 24. We have measured the total UV galaxy counts using our catalog and report a 50% excess of counts across FUV = 22-23.5 and NUV = 21.5-23 relative to previous GALEX measurements, which is not attributed to cluster member galaxies. Our galaxy counts are a better match to deeper UV counts measured with HST.

  9. Small area clustering of under-five children's mortality and associated factors using geo-additive Bayesian discrete-time survival model in Kersa HDSS, Ethiopia.

    PubMed

    Dedefo, Melkamu; Oljira, Lemessa; Assefa, Nega

    2016-02-01

    Child mortality reflects a country's level of socio-economic development and quality of life. In Ethiopia, few studies have been conducted on under-five mortality, and almost none of them tried to identify the spatial effect on mortality. Thus, this study explored the small-area clustering of under-five mortality and associated factors in Kersa HDSS, Eastern Ethiopia. The study population included all children under the age of five years who were registered in the Kersa Health and Demographic Surveillance System (Kersa HDSS) from September 2008 to August 31, 2012. A flexible Bayesian geo-additive discrete-time survival mixed model was used. Some of the factors significantly associated with under-five mortality, with posterior odds ratios and 95% credible intervals, are maternal educational status 1.31 (1.13, 1.49), place of delivery 1.016 (1.013, 1.12), number of live births at a delivery 0.35 (0.23, 1.83), low household wealth index 1.26 (1.10, 1.43), middle household wealth index 0.95 (0.84, 1.07), pre-term duration of pregnancy 1.95 (1.27, 2.91), post-term duration of pregnancy 0.74 (0.60, 0.93), and antenatal visit 1.19 (1.06, 1.35). Variation was noted in the risk of under-five mortality across the selected small administrative regions (kebeles). This study reveals geographic patterns in rates of under-five mortality in those selected small administrative regions and shows some important determinants of under-five mortality. More importantly, we observed clustering of under-five mortality, which indicates the importance of spatial effects; presenting this clustering through maps facilitates visualization and highlights differentials across geographical areas that would otherwise be overlooked in traditional data-analytic methods. Copyright © 2015 Elsevier Ltd. All rights reserved.

  10. Phylogenetic analyses of RPB1 and RPB2 support a middle Cretaceous origin for a clade comprising all agriculturally and medically important fusaria

    USDA-ARS's Scientific Manuscript database

    Fusarium (Hypocreales, Nectriaceae) is one of the most economically important and systematically challenging groups of mycotoxigenic phytopathogens and emergent human pathogens. We conducted maximum likelihood (ML), maximum parsimony (MP) and Bayesian (B) analyses on partial RNA polymerase largest (...

  11. Reuse, Recycle, Reweigh: Combating Influenza through Efficient Sequential Bayesian Computation for Massive Data.

    PubMed

    Tom, Jennifer A; Sinsheimer, Janet S; Suchard, Marc A

    Massive datasets in the gigabyte and terabyte range combined with the availability of increasingly sophisticated statistical tools yield analyses at the boundary of what is computationally feasible. Compromising in the face of this computational burden by partitioning the dataset into more tractable sizes results in stratified analyses, removed from the context that justified the initial data collection. In a Bayesian framework, these stratified analyses generate intermediate realizations, often compared using point estimates that fail to account for the variability within and correlation between the distributions these realizations approximate. However, although the initial concession to stratify generally precludes the more sensible analysis using a single joint hierarchical model, we can circumvent this outcome and capitalize on the intermediate realizations by extending the dynamic iterative reweighting MCMC algorithm. In doing so, we reuse the available realizations by reweighting them with importance weights, recycling them into a now tractable joint hierarchical model. We apply this technique to intermediate realizations generated from stratified analyses of 687 influenza A genomes spanning 13 years allowing us to revisit hypotheses regarding the evolutionary history of influenza within a hierarchical statistical framework.
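
    The reweighting step can be illustrated with plain self-normalized importance sampling. Draws generated under one distribution (here a stand-in for an intermediate, stratum-specific posterior) are reused for a different target by weighting each draw by the ratio of target to proposal densities. This is a minimal sketch of that general idea only, not the dynamic iterative reweighting MCMC algorithm the authors extend; the normal densities, sample size, and function name are assumptions chosen for illustration.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def importance_weights(draws: Sequence[float],
                       log_target: Callable[[float], float],
                       log_proposal: Callable[[float], float]) -> Tuple[List[float], float]:
    """Self-normalized importance weights and the effective sample size (ESS).
    Log densities only need to be known up to additive constants."""
    log_w = [log_target(x) - log_proposal(x) for x in draws]
    top = max(log_w)  # subtract the maximum for numerical stability
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    w = [wi / total for wi in w]
    ess = 1.0 / sum(wi * wi for wi in w)
    return w, ess


rng = random.Random(1)
# Existing realizations drawn from N(0, 1), standing in for an earlier analysis.
draws = [rng.gauss(0.0, 1.0) for _ in range(50_000)]

# Reuse them for a new target N(1, 1) without drawing fresh samples.
weights, ess = importance_weights(
    draws,
    log_target=lambda x: -0.5 * (x - 1.0) ** 2,
    log_proposal=lambda x: -0.5 * x ** 2,
)
target_mean = sum(w * x for w, x in zip(weights, draws))
print(f"reweighted mean = {target_mean:.3f}, ESS = {ess:.0f} of {len(draws)}")
```

    The effective sample size falls well below the number of draws when target and proposal differ, which is one reason reweighting works best when the existing realizations already overlap the new target.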

  12. Reuse, Recycle, Reweigh: Combating Influenza through Efficient Sequential Bayesian Computation for Massive Data

    PubMed Central

    Tom, Jennifer A.; Sinsheimer, Janet S.; Suchard, Marc A.

    2015-01-01

    Massive datasets in the gigabyte and terabyte range combined with the availability of increasingly sophisticated statistical tools yield analyses at the boundary of what is computationally feasible. Compromising in the face of this computational burden by partitioning the dataset into more tractable sizes results in stratified analyses, removed from the context that justified the initial data collection. In a Bayesian framework, these stratified analyses generate intermediate realizations, often compared using point estimates that fail to account for the variability within and correlation between the distributions these realizations approximate. However, although the initial concession to stratify generally precludes the more sensible analysis using a single joint hierarchical model, we can circumvent this outcome and capitalize on the intermediate realizations by extending the dynamic iterative reweighting MCMC algorithm. In doing so, we reuse the available realizations by reweighting them with importance weights, recycling them into a now tractable joint hierarchical model. We apply this technique to intermediate realizations generated from stratified analyses of 687 influenza A genomes spanning 13 years allowing us to revisit hypotheses regarding the evolutionary history of influenza within a hierarchical statistical framework. PMID:26681992

  13. U.S. consumer demand for restaurant calorie information: targeting demographic and behavioral segments in labeling initiatives.

    PubMed

    Kolodinsky, Jane; Reynolds, Travis William; Cannella, Mark; Timmons, David; Bromberg, Daniel

    2009-01-01

    This study aimed to identify different segments of U.S. consumers based on food choices, exercise patterns, and desire for restaurant calorie labeling. Using a stratified (by region) random sample of the U.S. population, trained interviewers collected data for this cross-sectional study through telephone surveys. Data came from the Center for Rural Studies U.S. national health survey. The final sample included 580 responses (22% response rate); data were weighted to be representative of the age and gender characteristics of the U.S. population. Measures were self-reported behaviors related to food choices, exercise patterns, desire for calorie information in restaurants, and sample demographics. Clusters were identified using the Schwarz Bayesian criterion. Impacts of demographic characteristics on cluster membership were analyzed using bivariate tests of association and multinomial logit regression. Cluster analysis revealed three clusters based on respondents' food choices, activity levels, and desire for restaurant labeling. Two clusters, comprising three quarters of the sample, desired calorie labeling in restaurants. The remaining cluster opposed restaurant labeling. Demographic variables significantly predicting cluster membership included region of residence (p < .10), income (p < .05), gender (p < .01), and age (p < .10). Though limited by a low response rate and potential self-reporting bias in the phone survey, this study suggests that several groups are likely to benefit from restaurant calorie labeling. Specific demographic clusters could be targeted through labeling initiatives.

  14. Recovering the Genetic Identity of an Extinct-in-the-Wild Species: The Puzzling Case of the Alagoas Curassow

    PubMed Central

    Costa, Mariellen C.; Oliveira, Paulo R. R.; Davanço, Paulo V.; de Camargo, Crisley; Laganaro, Natasha M.; Azeredo, Roberto A.; Simpson, James; Silveira, Luis F.

    2017-01-01

    The conservation of many endangered taxa relies on hybrid identification, and when hybrids become morphologically indistinguishable from the parental species, the use of molecular markers can assign individual admixture levels. Here, we present the puzzling case of the extinct in the wild Alagoas Curassow (Pauxi mitu), whose captive population descends from only three individuals. Hybridization with the Razor-billed Curassow (P. tuberosa) began more than eight generations ago, and admixture uncertainty affects the whole population. We applied an analysis framework that combined morphological diagnostic traits, Bayesian clustering analyses using 14 microsatellite loci, and mtDNA haplotypes to assess the ancestry of all individuals that were alive from 2008 to 2012. Simulated data revealed that our microsatellites could accurately assign an individual a hybrid origin until the second backcross generation, which permitted us to identify a pure group among the older, but still reproductive animals. No wild species has ever survived such a severe bottleneck, followed by hybridization, and studying the recovery capability of the selected pure Alagoas Curassow group might provide valuable insights into biological conservation theory. PMID:28056082

  15. Recovering the Genetic Identity of an Extinct-in-the-Wild Species: The Puzzling Case of the Alagoas Curassow.

    PubMed

    Costa, Mariellen C; Oliveira, Paulo R R; Davanço, Paulo V; Camargo, Crisley de; Laganaro, Natasha M; Azeredo, Roberto A; Simpson, James; Silveira, Luis F; Francisco, Mercival R

    2017-01-01

    The conservation of many endangered taxa relies on hybrid identification, and when hybrids become morphologically indistinguishable from the parental species, the use of molecular markers can assign individual admixture levels. Here, we present the puzzling case of the extinct in the wild Alagoas Curassow (Pauxi mitu), whose captive population descends from only three individuals. Hybridization with the Razor-billed Curassow (P. tuberosa) began more than eight generations ago, and admixture uncertainty affects the whole population. We applied an analysis framework that combined morphological diagnostic traits, Bayesian clustering analyses using 14 microsatellite loci, and mtDNA haplotypes to assess the ancestry of all individuals that were alive from 2008 to 2012. Simulated data revealed that our microsatellites could accurately assign an individual a hybrid origin until the second backcross generation, which permitted us to identify a pure group among the older, but still reproductive animals. No wild species has ever survived such a severe bottleneck, followed by hybridization, and studying the recovery capability of the selected pure Alagoas Curassow group might provide valuable insights into biological conservation theory.

  16. Genetic and morphological characterisation of the Ankole Longhorn cattle in the African Great Lakes region.

    PubMed

    Ndumu, Deo B; Baumung, Roswitha; Hanotte, Olivier; Wurzinger, Maria; Okeyo, Mwai A; Jianlin, Han; Kibogo, Harrison; Sölkner, Johann

    2008-01-01

The study investigated the population structure, diversity and differentiation of almost all of the ecotypes representing the African Ankole Longhorn cattle breed on the basis of morphometric (shape and size), genotypic and spatial distance data. Twenty-one morphometric measurements were used to describe the morphology of 439 individuals from 11 sub-populations located in five countries around the Great Lakes region of central and eastern Africa. Additionally, 472 individuals were genotyped using 15 DNA microsatellites. Femoral length, horn length, horn circumference, rump height, body length and fore-limb circumference showed the largest differences between regions. An overall FST index indicated that 2.7% of the total genetic variation was present among sub-populations. The least differentiation was observed between the two sub-populations of Mbarara south and Luwero in Uganda, while the highest level of differentiation was observed between the Mugamba in Burundi and Malagarasi in Tanzania. A model-based Bayesian approach inferred four clusters. Both the distance-based and the model-based analyses consistently isolated the Mugamba sub-population in Burundi from the others.
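An FST of 2.7% means that about 2.7% of total gene diversity lies among sub-populations. One common way to express this is Nei's G_ST = (H_T − H_S) / H_T, where H_S is the mean within-population expected heterozygosity and H_T is the heterozygosity of the pooled population. The sketch below computes it for a single locus with equal population weights. The study may have used a different estimator (for example Weir and Cockerham's θ), and the allele frequencies are hypothetical.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p_i^2) at one locus."""
    return 1.0 - sum(p * p for p in freqs)


def gst(subpop_freqs):
    """Nei's G_ST = (H_T - H_S) / H_T for one locus, weighting subpopulations equally.

    `subpop_freqs` is a list of allele-frequency lists, one per subpopulation,
    with alleles in the same order in every list.
    """
    n_pops = len(subpop_freqs)
    n_alleles = len(subpop_freqs[0])
    h_s = sum(expected_heterozygosity(f) for f in subpop_freqs) / n_pops
    pooled = [sum(f[a] for f in subpop_freqs) / n_pops for a in range(n_alleles)]
    h_t = expected_heterozygosity(pooled)
    return (h_t - h_s) / h_t


# Hypothetical biallelic locus in two subpopulations.
print(round(gst([[0.5, 0.5], [0.7, 0.3]]), 4))
```

In this toy case H_S = 0.46 and H_T = 0.48, so only about 4% of the diversity is between the two groups.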

  17. Population genetic structure and conservation genetics of threatened Okaloosa darters (Etheostoma okaloosae).

    USGS Publications Warehouse

    Austin, James D.; Jelks, Howard L.; Tate, Bill; Johnson, Aria R.; Jordan, Frank

    2011-01-01

    Imperiled Okaloosa darters (Etheostoma okaloosae) are small, benthic fish limited to six streams that flow into three bayous of Choctawhatchee Bay in northwest Florida, USA. We analyzed the complete mitochondrial cytochrome b gene and 10 nuclear microsatellite loci for 255 and 273 Okaloosa darters, respectively. Bayesian clustering analyses and AMOVA reflect congruent population genetic structure in both mitochondrial and microsatellite DNA. This structure reveals historical isolation of Okaloosa darter streams nested within bayous. Most of the six streams appear to have exchanged migrants though they remain genetically distinct. The U.S. Fish and Wildlife Service recently reclassified Okaloosa darters from endangered to threatened status. Our genetic data support the reclassification of Okaloosa darter Evolutionary Significant Units (ESUs) in the larger Tom's, Turkey, and Rocky creeks from endangered to threatened status. However, the three smaller drainages (Mill, Swift, and Turkey Bolton creeks) remain at risk due to their small population sizes and anthropogenic pressures on remaining habitat. Natural resource managers now have the evolutionary information to guide recovery actions within and among drainages throughout the range of the Okaloosa darter.

  18. Back to BaySICS: a user-friendly program for Bayesian Statistical Inference from Coalescent Simulations.

    PubMed

    Sandoval-Castellanos, Edson; Palkopoulou, Eleftheria; Dalén, Love

    2014-01-01

Inference of population demographic history has vastly improved in recent years due to a number of technological and theoretical advances, including the use of ancient DNA. Approximate Bayesian computation (ABC) stands among the most promising methods due to its simple theoretical foundation and exceptional flexibility. However, the limited availability of user-friendly programs that perform ABC analysis makes it difficult to implement, so programming skills are frequently required. In addition, few programs are able to deal with heterochronous data. Here we present the software BaySICS: Bayesian Statistical Inference of Coalescent Simulations. BaySICS provides an integrated and user-friendly platform that performs ABC analyses by means of coalescent simulations from DNA sequence data. It estimates historical demographic population parameters and performs hypothesis testing by means of Bayes factors obtained from model comparisons. Although it provides specific features that improve inference from datasets with heterochronous data, BaySICS also has several capabilities that make it a suitable tool for analysing contemporary genetic datasets. Those capabilities include joint analysis of independent tables, a graphical interface and the implementation of Markov-chain Monte Carlo without likelihoods.
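ABC replaces likelihood evaluation with simulation. Parameters are drawn from the prior, data are simulated, and a draw is kept only when its summary statistic lands close to the observed one. The rejection-ABC loop below is a minimal sketch of that idea. A toy binomial simulator stands in for BaySICS's coalescent simulations, and the uniform prior, tolerance and summary statistic are all assumptions.

```python
import random


def simulate(theta, n, rng):
    """Toy simulator standing in for a coalescent model: successes in n Bernoulli(theta) trials."""
    return sum(rng.random() < theta for _ in range(n))


def rejection_abc(observed, n, n_draws, tolerance, seed=0):
    """Keep prior draws whose simulated summary lies within `tolerance` of the observed one."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.random()  # draw from a Uniform(0, 1) prior
        if abs(simulate(theta, n, rng) - observed) <= tolerance:
            accepted.append(theta)
    return accepted


posterior = rejection_abc(observed=30, n=100, n_draws=10000, tolerance=2)
print(len(posterior), round(sum(posterior) / len(posterior), 3))
```

The accepted draws approximate the posterior of theta (here centred near 0.3). Shrinking the tolerance improves the approximation but lowers the acceptance rate.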

  19. BM-Map: Bayesian Mapping of Multireads for Next-Generation Sequencing Data

    PubMed Central

    Ji, Yuan; Xu, Yanxun; Zhang, Qiong; Tsui, Kam-Wah; Yuan, Yuan; Norris, Clift; Liang, Shoudan; Liang, Han

    2011-01-01

Summary: Next-generation sequencing (NGS) technology generates millions of short reads, which provide valuable information for various aspects of cellular activities and biological functions. A key step in NGS applications (e.g., RNA-Seq) is to map short reads to correct genomic locations within the source genome. While most reads are mapped to a unique location, a significant proportion of reads align to multiple genomic locations with equal or similar numbers of mismatches; these are called multireads. The ambiguity in mapping the multireads may lead to bias in downstream analyses. Currently, most practitioners discard the multireads in their analysis, resulting in a loss of valuable information, especially for the genes with similar sequences. To refine the read mapping, we develop a Bayesian model that computes the posterior probability of mapping a multiread to each competing location. The probabilities are used for downstream analyses, such as the quantification of gene expression. We show through simulation studies and RNA-Seq analysis of real-life data that the Bayesian method yields better mapping than the current leading methods. We provide a C++ program for download that is being packaged into user-friendly software. PMID:21517792
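The core idea of assigning a multiread probabilistically can be sketched with Bayes' rule. Each candidate location gets a prior proportional to its expected abundance, multiplied by the likelihood of the observed mismatches, and the results are normalized. This is a deliberately simplified sketch, not the BM-Map model. The abundance prior, the independent per-base error rate of 0.01, and the gene names are hypothetical assumptions.

```python
def multiread_posterior(candidates, read_length, error_rate=0.01):
    """Posterior probability that a multiread originated at each candidate location.

    Each candidate is (location, expected_abundance, n_mismatches). The prior is
    proportional to abundance; the likelihood treats mismatches as independent
    per-base sequencing errors with rate `error_rate` (an assumption).
    """
    scores = {}
    for location, abundance, mismatches in candidates:
        likelihood = (error_rate ** mismatches) * ((1 - error_rate) ** (read_length - mismatches))
        scores[location] = abundance * likelihood
    total = sum(scores.values())
    return {loc: s / total for loc, s in scores.items()}


# Hypothetical 36-bp read aligning to two paralogs with one and two mismatches.
post = multiread_posterior([("geneA", 50.0, 1), ("geneB", 10.0, 2)], read_length=36)
print({k: round(v, 4) for k, v in post.items()})
```

Instead of discarding the read, downstream expression estimates can credit each location with its posterior share.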

  20. Rapid genetic diversification within dog breeds as evidenced by a case study on Schnauzers.

    PubMed

    Streitberger, K; Schweizer, M; Kropatsch, R; Dekomien, G; Distl, O; Fischer, M S; Epplen, J T; Hertwig, S T

    2012-10-01

    As a result of strong artificial selection, the domesticated dog has arguably become one of the most morphologically diverse vertebrate species, which is mirrored in the classification of around 400 different breeds. To test the influence of breeding history on the genetic structure and variability of today's dog breeds, we investigated 12 dog breeds using a set of 19 microsatellite markers from a total of 597 individuals with about 50 individuals analysed per breed. High genetic diversity was noted over all breeds, with the ancient Asian breeds (Akita, Chow Chow, Shar Pei) exhibiting the highest variability, as was indicated chiefly by an extraordinarily high number of rare and private alleles. Using a Bayesian clustering method, we detected significant genetic stratification within the closely related Schnauzer breeds. The individuals of these three recently differentiated breeds (Miniature, Standard and Giant Schnauzer) could not be assigned to a single cluster each. This hidden genetic structure was probably caused by assortative mating owing to breeders' preferences regarding coat colour types and the underlying practice of breeding in separate lineages. Such processes of strong artificial disruptive selection for different morphological traits in isolated and relatively small lineages can result in the rapid creation of new dog types and potentially new breeds and represent a unique opportunity to study the evolution of genetic and morphological differences in recently diverged populations. © 2011 The Authors, Animal Genetics © 2011 Stichting International Foundation for Animal Genetics.

  1. Genetic diversity of Pinus nigra Arn. populations in Southern Spain and Northern Morocco revealed by inter-simple sequence repeat profiles.

    PubMed

    Rubio-Moraga, Angela; Candel-Perez, David; Lucas-Borja, Manuel E; Tiscar, Pedro A; Viñegla, Benjamin; Linares, Juan C; Gómez-Gómez, Lourdes; Ahrazem, Oussama

    2012-01-01

Eight Pinus nigra Arn. populations from Southern Spain and Northern Morocco were examined using inter-simple sequence repeat markers to characterize the genetic variability amongst populations. Pair-wise population genetic distance ranged from 0.031 to 0.283, with a mean of 0.150 between populations. The highest inter-population average distance was between PaCU from Cuenca and YeCA from Cazorla, while the lowest distance was between TaMO from Morocco and MA from Sierra Mágina. Analysis of molecular variance (AMOVA) and Nei's genetic diversity analyses revealed higher genetic variation within the same population than among different populations. Genetic differentiation (Gst) was 0.233. Cuenca showed the highest Nei's genetic diversity, followed by the Moroccan region, Sierra Mágina, and Cazorla region. However, clustering of populations was not in accordance with their geographical locations. Principal component analysis showed the presence of two major groups (Group 1 contained all populations from Cuenca, while Group 2 contained populations from Cazorla, Sierra Mágina and Morocco), while Bayesian analysis revealed the presence of three clusters. The low genetic diversity observed in PaCU and YeCA is probably a consequence of inappropriate management, since no estimation of genetic variability was performed before the silvicultural treatments. The data indicate that the inter-simple sequence repeat (ISSR) method is sufficiently informative and powerful to assess genetic variability among populations of P. nigra.

  2. Genetic Diversity of Pinus nigra Arn. Populations in Southern Spain and Northern Morocco Revealed By Inter-Simple Sequence Repeat Profiles

    PubMed Central

    Rubio-Moraga, Angela; Candel-Perez, David; Lucas-Borja, Manuel E.; Tiscar, Pedro A.; Viñegla, Benjamin; Linares, Juan C.; Gómez-Gómez, Lourdes; Ahrazem, Oussama

    2012-01-01

Eight Pinus nigra Arn. populations from Southern Spain and Northern Morocco were examined using inter-simple sequence repeat markers to characterize the genetic variability amongst populations. Pair-wise population genetic distance ranged from 0.031 to 0.283, with a mean of 0.150 between populations. The highest inter-population average distance was between PaCU from Cuenca and YeCA from Cazorla, while the lowest distance was between TaMO from Morocco and MA from Sierra Mágina. Analysis of molecular variance (AMOVA) and Nei’s genetic diversity analyses revealed higher genetic variation within the same population than among different populations. Genetic differentiation (Gst) was 0.233. Cuenca showed the highest Nei’s genetic diversity, followed by the Moroccan region, Sierra Mágina, and Cazorla region. However, clustering of populations was not in accordance with their geographical locations. Principal component analysis showed the presence of two major groups (Group 1 contained all populations from Cuenca, while Group 2 contained populations from Cazorla, Sierra Mágina and Morocco), while Bayesian analysis revealed the presence of three clusters. The low genetic diversity observed in PaCU and YeCA is probably a consequence of inappropriate management, since no estimation of genetic variability was performed before the silvicultural treatments. The data indicate that the inter-simple sequence repeat (ISSR) method is sufficiently informative and powerful to assess genetic variability among populations of P. nigra. PMID:22754321

  3. A Daily Diary Study of Posttraumatic Stress Symptoms and Romantic Partner Accommodation

    PubMed Central

    Campbell, Sarah B.; Renshaw, Keith D.; Kashdan, Todd B.; Curby, Timothy W.; Carter, Sarah P.

    2017-01-01

Little is known about the role of romantic partner symptom accommodation in PTSD symptom maintenance. To explore the bidirectional associations of posttraumatic stress disorder (PTSD) symptoms and romantic partner symptom accommodation over time, military servicemen (n = 64) with symptoms of PTSD and their co-habiting heterosexual civilian romantic partners (n = 64) completed a 2-week daily diary study. Cross-lagged, autoregressive models assessed the stability of men’s PTSD symptoms and partners’ accommodation, as well as the prospective associations of earlier PTSD symptoms with later accommodation and vice versa. Analyses used Bayesian estimation to provide point estimates (b) and credible intervals (CIs). In all models, PTSD symptoms (total and individual clusters) were highly stable (b = 0.91; CI: 0.88–0.95), and accommodation was moderately stable (b = 0.48; CI: 0.40–0.54). In all models, earlier PTSD symptoms (total and clusters) were significantly, positively associated with later accommodation (b = 0.04; CI: 0.02–0.07). In contrast, earlier accommodation was significantly associated only with later situational avoidance (b = 0.02; CI: 0.00–0.07). Thus, PTSD symptoms may lead to subsequent accommodating behaviors in romantic partners, but partner accommodation seems to contribute only to survivors’ future situational avoidance symptoms. The findings reinforce the notion that PTSD symptoms have an impact on relationship behaviors, and that accommodation from partners may sustain avoidant behaviors in particular. Clinicians should attend to romantic partners’ accommodating behaviors when working with survivors. PMID:28270332
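In Bayesian estimation, a reported point estimate and credible interval are summaries of posterior draws, such as the output of an MCMC sampler. The sketch below computes a posterior mean and an equal-tailed 95% interval using simple order-statistic quantiles. The draws are synthetic and hypothetical; the study's own software may summarize its posteriors differently (for example with a median or a highest-density interval).

```python
import random


def summarize_posterior(draws, level=0.95):
    """Posterior mean and equal-tailed credible interval from posterior draws."""
    s = sorted(draws)
    n = len(s)
    tail = (1.0 - level) / 2.0
    lower = s[int(tail * (n - 1))]  # simple order-statistic quantiles
    upper = s[int((1.0 - tail) * (n - 1))]
    return sum(s) / n, (lower, upper)


# Hypothetical posterior draws for a stability coefficient; not the study's output.
rng = random.Random(42)
draws = [rng.gauss(0.91, 0.02) for _ in range(4000)]
b, (ci_low, ci_high) = summarize_posterior(draws)
print(f"b = {b:.2f}; CI: {ci_low:.2f}-{ci_high:.2f}")
```

Unlike a frequentist confidence interval, this interval is read directly as the range that holds the parameter with 95% posterior probability under the model and priors.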

  4. Bayesian Ensemble Trees (BET) for Clustering and Prediction in Heterogeneous Data

    PubMed Central

    Duan, Leo L.; Clancy, John P.; Szczesniak, Rhonda D.

    2016-01-01

We propose a novel “tree-averaging” model that utilizes the ensemble of classification and regression trees (CART). Each constituent tree is estimated with a subset of similar data. We treat this grouping of subsets as Bayesian Ensemble Trees (BET) and model them as a Dirichlet process. We show that BET determines the optimal number of trees by adapting to the data heterogeneity. Compared with the other ensemble methods, BET requires far fewer trees and shows equivalent prediction accuracy using weighted averaging. Moreover, each tree in BET provides variable selection criteria and interpretation for each subset. We developed an efficient estimating procedure with improved estimation strategies in both CART and mixture models. We demonstrate these advantages of BET with simulations and illustrate the approach with a real-world data example involving regression of lung function measurements obtained from patients with cystic fibrosis. Supplemental materials are available online. PMID:27524872
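A Dirichlet process lets the data decide how many mixture components (here, trees) are effectively used. One standard way to draw DP mixture weights is the truncated stick-breaking construction: break off a Beta(1, α) fraction of the remaining stick at each step. This is a sketch of that construction only, not BET's estimation procedure. The concentration α = 2 and the truncation level of 20 are assumptions.

```python
import random


def stick_breaking_weights(alpha, truncation, seed=0):
    """Truncated stick-breaking weights for a Dirichlet process with concentration alpha."""
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick not yet allocated
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)  # fraction of the remaining stick to break off
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last component takes the rest so the weights sum to 1
    return weights


w = stick_breaking_weights(alpha=2.0, truncation=20)
print([round(x, 3) for x in w[:5]])
```

Smaller values of α put most of the mass on a few components, so only a handful of trees receive appreciable weight. Larger values spread the mass across more components.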

  5. Bayesian correlated clustering to integrate multiple datasets

    PubMed Central

    Kirk, Paul; Griffin, Jim E.; Savage, Richard S.; Ghahramani, Zoubin; Wild, David L.

    2012-01-01

    Motivation: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct—but often complementary—information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured through parameters that describe the agreement among the datasets. Results: Using a set of six artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real Saccharomyces cerevisiae datasets. In the two-dataset case, we show that MDI’s performance is comparable with the present state-of-the-art. We then move beyond the capabilities of current approaches and integrate gene expression, chromatin immunoprecipitation–chip and protein–protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques—as well as to non-integrative approaches—demonstrate that MDI is competitive, while also providing information that would be difficult or impossible to extract using other methods. Availability: A Matlab implementation of MDI is available from http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software/. Contact: D.L.Wild@warwick.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23047558

  6. Genetic structuring and recent demographic history of red pandas (Ailurus fulgens) inferred from microsatellite and mitochondrial DNA.

    PubMed

    Hu, Yibo; Guo, Yu; Qi, Dunwu; Zhan, Xiangjiang; Wu, Hua; Bruford, Michael W; Wei, Fuwen

    2011-07-01

Clarification of the genetic structure and population history of a species can shed light on the impacts of landscapes, historical climate change and contemporary human activities, and thus enables evidence-based conservation decisions for endangered organisms. The red panda (Ailurus fulgens) is an endangered species distributed along the edge of the Qinghai-Tibetan Plateau and is currently subject to habitat loss, fragmentation and population decline, thus representing a good model to test the influences of the above-mentioned factors on a plateau edge species. We combined nine microsatellite loci and 551 bp of mitochondrial control region (mtDNA CR) to explore the genetic structure and demographic history of this species. A total of 123 individuals were sampled from 23 locations across five populations. High levels of genetic variation were identified for both mtDNA and microsatellites. Phylogeographic analyses indicated little geographic structure, suggesting historically wide gene flow. However, microsatellite-based Bayesian clustering clearly identified three groups (Qionglai-Liangshan, Xiaoxiangling and Gaoligong-Tibet). A significant isolation-by-distance pattern was detected only after removing Xiaoxiangling. For mtDNA data, there was no statistical support for a historical population expansion or contraction for the whole sample or any population except Xiaoxiangling, where a signal of contraction was detected. However, Bayesian simulations of population history using microsatellite data did pinpoint population declines for Qionglai, Xiaoxiangling and Gaoligong, demonstrating significant influences of human activity on demography. The unique history of the Xiaoxiangling population plays a critical role in shaping the genetic structure of this species, and large-scale habitat loss and fragmentation are hampering gene flow among populations. The implications of our findings for the biogeography of the Qinghai-Tibetan Plateau, subspecies classification and conservation of red pandas are discussed. © 2011 Blackwell Publishing Ltd.

  7. Exploring the IMF of star clusters: a joint SLUG and LEGUS effort

    NASA Astrophysics Data System (ADS)

    Ashworth, G.; Fumagalli, M.; Krumholz, M. R.; Adamo, A.; Calzetti, D.; Chandar, R.; Cignoni, M.; Dale, D.; Elmegreen, B. G.; Gallagher, J. S., III; Gouliermis, D. A.; Grasha, K.; Grebel, E. K.; Johnson, K. E.; Lee, J.; Tosi, M.; Wofford, A.

    2017-08-01

    We present the implementation of a Bayesian formalism within the Stochastically Lighting Up Galaxies (slug) stellar population synthesis code, which is designed to investigate variations in the initial mass function (IMF) of star clusters. By comparing observed cluster photometry to large libraries of clusters simulated with a continuously varying IMF, our formalism yields the posterior probability distribution function (PDF) of the cluster mass, age and extinction, jointly with the parameters describing the IMF. We apply this formalism to a sample of star clusters from the nearby galaxy NGC 628, for which broad-band photometry in five filters is available as part of the Legacy ExtraGalactic UV Survey (LEGUS). After allowing the upper-end slope of the IMF (α3) to vary, we recover PDFs for the mass, age and extinction that are broadly consistent with what is found when assuming an invariant Kroupa IMF. However, the posterior PDF for α3 is very broad due to a strong degeneracy with the cluster mass, and it is found to be sensitive to the choice of priors, particularly on the cluster mass. We find only a modest improvement in the constraining power of α3 when adding Hα photometry from the companion Hα-LEGUS survey. Conversely, Hα photometry significantly improves the age determination, reducing the frequency of multi-modal PDFs. With the aid of mock clusters, we quantify the degeneracy between physical parameters, showing how constraints on the cluster mass that are independent of photometry can be used to pin down the IMF properties of star clusters.

  8. Formation history of open clusters constrained by detailed asteroseismology of red giant stars observed by Kepler

    NASA Astrophysics Data System (ADS)

    Corsaro, Enrico; Lee, Yueh-Ning; García, Rafael A.; Hennebelle, Patrick; Mathur, Savita; Beck, Paul G.; Mathis, Stephane; Stello, Dennis; Bouvier, Jérôme

    2017-10-01

Stars originate by the gravitational collapse of a turbulent molecular cloud of a diffuse medium, and are often observed to form clusters. Stellar clusters therefore play an important role in our understanding of star formation and of the dynamical processes at play. However, investigating cluster formation is difficult because the density of the molecular cloud changes by many orders of magnitude. Hierarchical-step approaches that decompose the problem into different stages are therefore required, as well as reliable assumptions on the initial conditions in the clouds. We report for the first time the use of the full potential of NASA Kepler asteroseismic observations coupled with 3D numerical simulations to put strong constraints on the early formation stages of open clusters. Thanks to a Bayesian peak bagging analysis of about 50 red giant members of NGC 6791 and NGC 6819, the two most populated open clusters observed in the nominal Kepler mission, we derive a complete set of detailed oscillation mode properties for each star, with thousands of oscillation modes characterized. We then show how these asteroseismic properties lead us to a discovery about the rotation history of stellar clusters. Finally, our observational findings will be compared with hydrodynamical simulations for stellar cluster formation to constrain the physical processes of turbulence, rotation, and magnetic fields that are in action during the collapse of the progenitor cloud into a proto-cluster.

  9. The Impact of Environment on the Stellar Mass–Halo Mass Relation

    NASA Astrophysics Data System (ADS)

    Golden-Marx, Jesse B.; Miller, Christopher J.

    2018-06-01

    A large variance exists in the amplitude of the stellar mass–halo mass (SMHM) relation for group- and cluster-size halos. Using a sample of 254 clusters, we show that the magnitude gap between the brightest central galaxy (BCG) and its second or fourth brightest neighbor accounts for a significant portion of this variance. We find that at fixed halo mass, galaxy clusters with a larger magnitude gap have a higher BCG stellar mass. This relationship is also observed in semi-analytic representations of low-redshift galaxy clusters in simulations. This SMHM–magnitude gap stratification likely results from BCG growth via hierarchical mergers and may link the assembly of the halo with the growth of the BCG. Using a Bayesian model, we quantify the importance of the magnitude gap in the SMHM relation using a multiplicative stretch factor, which we find to be significantly non-zero. The inclusion of the magnitude gap in the SMHM relation results in a large reduction in the inferred intrinsic scatter in the BCG stellar mass at fixed halo mass. We discuss the ramifications of this result in the context of galaxy formation models of centrals in group- and cluster-size halos.

  10. Advances in Significance Testing for Cluster Detection

    NASA Astrophysics Data System (ADS)

    Coleman, Deidra Andrea

Over the past two decades, much attention has been given to data-driven project goals such as the Human Genome Project and the development of syndromic surveillance systems. A major component of these types of projects is analyzing the abundance of data. Detecting clusters within the data can be beneficial, as it can lead to the identification of specified sequences of DNA nucleotides that are related to important biological functions, or of the locations of epidemics such as disease outbreaks or bioterrorism attacks. Cluster detection techniques require efficient and accurate hypothesis testing procedures. In this dissertation, we improve upon the hypothesis testing procedures for cluster detection by enhancing distributional theory and providing an alternative method for spatial cluster detection using syndromic surveillance data. In Chapter 2, we provide an efficient method to compute the exact distribution of the number and coverage of h-clumps of a collection of words. This method involves defining a Markov chain using a minimal deterministic automaton to reduce the number of states needed for computation. We allow words of the collection to contain other words of the collection, making the method more general. We use our method to compute the distributions of the number and coverage of h-clumps in the Chi motif of H. influenzae. In Chapter 3, we provide an efficient algorithm to compute the exact distribution of multiple window discrete scan statistics for higher-order, multi-state Markovian sequences. This algorithm involves defining a Markov chain to efficiently keep track of probabilities needed to compute p-values of the statistic. We use our algorithm to identify cases where the available approximation does not perform well. We also use our algorithm to detect unusual clusters of made free throw shots by National Basketball Association players during the 2009-2010 regular season.
In Chapter 4, we give a procedure to detect outbreaks using syndromic surveillance data while controlling the Bayesian False Discovery Rate (BFDR). The procedure entails choosing an appropriate Bayesian model that captures the spatial dependency inherent in epidemiological data and considers all days of interest, selecting a test statistic based on a chosen measure that provides the magnitude of the maximal spatial cluster for each day, and identifying a cutoff value that controls the BFDR for rejecting the collective null hypothesis of no outbreak over a collection of days for a specified region. We use our procedure to analyze botulism-like syndrome data collected by the North Carolina Disease Event Tracking and Epidemiologic Collection Tool (NC DETECT).
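Bayesian FDR control works on posterior probabilities of the null. The expected proportion of false discoveries in a rejected set equals the average posterior null probability of its members. A generic way to control it is to reject hypotheses in increasing order of posterior null probability while that running average stays at or below the target α. This is a minimal sketch of that generic rule, not the dissertation's procedure, which instead selects a cutoff on a spatial-cluster test statistic. The daily probabilities below are hypothetical.

```python
def bfdr_reject(null_probs, alpha=0.05):
    """Indices to reject so the mean posterior null probability of the rejected set
    (the Bayesian FDR) does not exceed alpha."""
    order = sorted(range(len(null_probs)), key=lambda i: null_probs[i])
    rejected, running_sum = [], 0.0
    for i in order:
        # Because probabilities are visited in increasing order, the running mean
        # only grows, so the first violation ends the search.
        if (running_sum + null_probs[i]) / (len(rejected) + 1) > alpha:
            break
        running_sum += null_probs[i]
        rejected.append(i)
    return sorted(rejected)


# Hypothetical posterior probabilities of "no outbreak" for six days.
flagged = bfdr_reject([0.01, 0.90, 0.02, 0.08, 0.15, 0.50], alpha=0.05)
print(flagged)
```

Day 3 (null probability 0.08) is flagged even though it exceeds α on its own. The rule bounds the average error over the rejected set, not each individual decision.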

  11. Alternative models in genetic analyses of carcass traits measured by ultrasonography in Guzerá cattle: A Bayesian approach

    USDA-ARS's Scientific Manuscript database

    The objective was to study alternative models for genetic analyses of carcass traits assessed by ultrasonography in Guzerá cattle. Data from 947 measurements (655 animals) of Rib-eye area (REA), rump fat thickness (RFT) and backfat thickness (BFT) were used. Finite polygenic models (FPM), infinitesi...

  12. Innovative Bayesian and Parsimony Phylogeny of Dung Beetles (Coleoptera, Scarabaeidae, Scarabaeinae) Enhanced by Ontology-Based Partitioning of Morphological Characters

    PubMed Central

    Tarasov, Sergei; Génier, François

    2015-01-01

Scarabaeine dung beetles are the dominant dung-feeding group of insects and are widely used as model organisms in conservation, ecology and developmental biology. Due to the conflicts among 13 recently published phylogenies dealing with the higher-level relationships of dung beetles, the phylogeny of this lineage remains largely unresolved. In this study, we conduct rigorous phylogenetic analyses of dung beetles, based on an unprecedented taxon sample (110 taxa) and detailed investigation of morphology (205 characters). We describe the morphology and thoroughly illustrate the characters used. Along with parsimony, traditionally used in the analysis of morphological data, we also apply the Bayesian method with a novel approach that uses anatomy ontology for matrix partitioning. This approach allows for heterogeneity in evolutionary rates among characters from different anatomical regions. Anatomy ontology generates a number of parameter-partition schemes, which we compare using Bayes factors. We also test the effect of including autapomorphies in the morphological analysis, which has not previously been examined. Generally, schemes with more parameters were favored in the Bayesian comparison, suggesting that characters located on different body regions evolve at different rates and that partitioning of the data matrix using anatomy ontology is reasonable; however, trees from the parsimony and all the Bayesian analyses were quite consistent. The hypothesized phylogeny reveals many novel clades and provides additional support for some clades recovered in previous analyses. Our results provide a solid basis for a new classification of dung beetles, in which the taxonomic limits of the tribes Dichotomiini, Deltochilini and Coprini are restricted and many new tribes must be described. 
Based on the consistency of the phylogeny with biogeography, we speculate that dung beetles may have originated in the Mesozoic contrary to the traditional view pointing to a Cenozoic origin. PMID:25781019
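    Comparing partition schemes by Bayes factor comes down to a difference of log marginal likelihoods. The sketch below is a minimal illustration, assuming the log marginal likelihoods have already been estimated (for example by stepping-stone sampling); the scheme names and values are hypothetical, and the verbal labels follow the commonly cited Kass and Raftery (1995) scale for 2 ln BF.

```python
def two_ln_bayes_factor(log_ml_a: float, log_ml_b: float) -> float:
    """2 ln BF_ab = 2 * (ln p(D | M_a) - ln p(D | M_b))."""
    return 2.0 * (log_ml_a - log_ml_b)


def kass_raftery_label(two_ln_bf: float) -> str:
    """Verbal strength of evidence for M_a over M_b on the 2 ln BF scale."""
    if two_ln_bf < 0:
        return "favours M_b"
    if two_ln_bf < 2:
        return "not worth more than a bare mention"
    if two_ln_bf < 6:
        return "positive"
    if two_ln_bf < 10:
        return "strong"
    return "very strong"


# HYPOTHETICAL log marginal likelihoods for two partition schemes.
log_ml = {"one_partition": -10250.4, "partitioned_by_body_region": -10231.9}
stat = two_ln_bayes_factor(log_ml["partitioned_by_body_region"], log_ml["one_partition"])
print(f"2 ln BF = {stat:.1f}: {kass_raftery_label(stat)}")
```

    Because each marginal likelihood averages the fit over the prior, extra partitions are not rewarded automatically; a richer scheme wins only if its gain in fit outweighs the cost of spreading prior mass over more parameters.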

  13. A Bayesian sequential design using alpha spending function to control type I error.

    PubMed

    Zhu, Han; Yu, Qingzhao

    2017-10-01

    We propose a Bayesian sequential design that uses alpha spending functions to control the overall type I error in phase III clinical trials. We provide algorithms to calculate critical values, power, and sample sizes for the proposed design. A sensitivity analysis is used to check the effects of different prior distributions, and conservative priors are recommended. Through simulations, we compare the power and actual sample sizes of the proposed Bayesian sequential design under different alpha spending functions. We also compare the power of the proposed method with that of a frequentist sequential design using the same alpha spending function. Simulations show that, at the same sample size, the proposed method provides larger power than the corresponding frequentist sequential design. It also has larger power than a traditional Bayesian sequential design, which sets equal critical values for all interim analyses. Compared with other alpha spending functions, the O'Brien-Fleming alpha spending function has the largest power and is the most conservative, in the sense that, at the same sample size, the null hypothesis is least likely to be rejected at the early stages of a clinical trial. Finally, we show that adding a futility stopping step to the Bayesian sequential design can reduce both the overall type I error and the actual sample sizes.
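    An alpha spending function decides how much of the overall type I error is used at each interim look. As a hedged sketch, the code below evaluates the widely used Lan-DeMets O'Brien-Fleming-type and Pocock-type spending functions for a two-sided alpha; whether the authors use exactly these parameterisations is an assumption, the function names are hypothetical, and the Bayesian critical values of the proposed design are not reproduced.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def obrien_fleming_spending(t: float, alpha: float = 0.05) -> float:
    """Cumulative alpha spent by information fraction t: 2 - 2 * Phi(z_{1-alpha/2} / sqrt(t))."""
    if not 0.0 < t <= 1.0:
        raise ValueError("information fraction t must lie in (0, 1]")
    z = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 - 2.0 * _STD_NORMAL.cdf(z / math.sqrt(t))


def pocock_spending(t: float, alpha: float = 0.05) -> float:
    """Cumulative alpha spent by information fraction t: alpha * ln(1 + (e - 1) * t)."""
    if not 0.0 < t <= 1.0:
        raise ValueError("information fraction t must lie in (0, 1]")
    return alpha * math.log(1.0 + (math.e - 1.0) * t)


def incremental_alpha(spending, fractions, alpha=0.05):
    """Alpha newly spent at each look (differences of the cumulative spending)."""
    spent, previous = [], 0.0
    for t in fractions:
        cumulative = spending(t, alpha)
        spent.append(cumulative - previous)
        previous = cumulative
    return spent


looks = [0.25, 0.5, 0.75, 1.0]  # four equally spaced analyses
for name, fn in [("O'Brien-Fleming", obrien_fleming_spending), ("Pocock", pocock_spending)]:
    print(name, [round(a, 5) for a in incremental_alpha(fn, looks)])
```

    The O'Brien-Fleming-type function spends roughly 0.0001 of the 0.05 at a quarter of the information, which is the early conservatism the abstract describes; both functions reach the full alpha at t = 1.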

  14. Genetic Diversity and Domestication Footprints of Chinese Cherry [Cerasus pseudocerasus (Lindl.) G.Don] as Revealed by Nuclear Microsatellites

    PubMed Central

    Zhang, Jing; Chen, Tao; Wang, Yan; Chen, Qing; Sun, Bo; Luo, Ya; Zhang, Yong; Tang, Haoru; Wang, Xiaorong

    2018-01-01

    Chinese cherry [Cerasus pseudocerasus (Lindl.) G.Don] is a commercially important fruit crop in China, but its structure patterns and domestication history remain unclear. To address these questions, we estimated the genetic structure and domestication history of Chinese cherry using 19 nuclear microsatellite markers and 650 representative accessions (including 118 Cerasus relatives) sampled throughout its natural eco-geographical distribution. Our structure analyses detected no genetic contribution from Cerasus relatives to the evolutionary history of Chinese cherry. A distinct genetic structure was detected in wild Chinese cherries, and rough geographical structure was observed in cultivated Chinese cherries. One wild (wild Chinese cherry, WC) and two cultivated (cultivated Chinese cherry, CC1 and CC2) genetic clusters were defined. Our approximate Bayesian computation analyses supported an independent domestication history, with two domestication events for CC1 and CC2 occurring about 3900 and 2200 years ago, respectively. A moderate loss of genetic diversity, domestication bottlenecks lasting over 1000 years, and divergent domestication in fruit traits were also detected in cultivated Chinese cherries; these are highly correlated with long-term clonal propagation and with different domestication trends and preferences. Our study is the first to comprehensively and systematically investigate the structure patterns and domestication history of Chinese cherry, providing important references for revealing the evolutionary and domestication history of perennial woody fruit trees. PMID:29535750
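    The rejection form of approximate Bayesian computation (ABC) keeps prior draws whose simulated data resemble the observed data. The toy sketch below infers the mean of a normal distribution from a single summary statistic; it only illustrates the accept/reject logic and is an assumption-laden stand-in for the coalescent-based domestication scenarios compared in the study, which would need a population-genetic simulator and many summary statistics. All names and settings are hypothetical.

```python
import random
import statistics


def simulate_summary(theta: float, n: int, rng: random.Random) -> float:
    """Toy simulator: the sample mean of n draws from Normal(theta, 1)."""
    return statistics.fmean(rng.gauss(theta, 1.0) for _ in range(n))


def abc_rejection(observed_summary, n, prior_low, prior_high, epsilon, n_draws, seed=0):
    """Return prior draws whose simulated summary lies within epsilon of the observed one."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(prior_low, prior_high)   # 1. draw a parameter from the prior
        s_sim = simulate_summary(theta, n, rng)       # 2. simulate data under that parameter
        if abs(s_sim - observed_summary) <= epsilon:  # 3. keep it if the summaries match
            accepted.append(theta)
    return accepted


posterior = abc_rejection(observed_summary=2.0, n=50, prior_low=-5.0, prior_high=5.0,
                          epsilon=0.05, n_draws=20000, seed=1)
print(len(posterior), round(statistics.fmean(posterior), 2))
```

    Shrinking `epsilon` brings the accepted sample closer to the true posterior at the cost of a lower acceptance rate.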

  15. Present-Day Genetic Structure of Atlantic Salmon (Salmo salar) in Icelandic Rivers and Ice-Cap Retreat Models

    PubMed Central

    Olafsson, Kristinn; Pampoulie, Christophe; Hjorleifsdottir, Sigridur; Gudjonsson, Sigurdur; Hreggvidsson, Gudmundur O.

    2014-01-01

    An improved understanding of past climatological conditions has now made it possible to study the potential concordance between former climatological models and present-day genetic structure. Genetic variability was assessed in 26 samples of Atlantic salmon from different Icelandic rivers (2,352 individuals in total), using 15 microsatellite loci. F-statistics revealed significant differences between the majority of the sampled populations. Bayesian cluster analyses, run both with and without prior information on sampling location, revealed two distinguishable genetic pools, corresponding to the Northern (Group 1) and Southern (Group 2) regions of Iceland. Furthermore, random permutation of allele sizes among allelic states revealed a significant mutational component to the genetic differentiation at four microsatellite loci (SsaD144, Ssa171, SSsp2201 and SsaF3), supporting a historical origin for the observed variation. The time of divergence, estimated with two different ABC methods, suggested that the observed genetic pattern originated between the Last Glacial Maximum and the Younger Dryas, which is additional evidence of the relative immaturity of Icelandic fish populations, owing to the re-colonisation of this young environment after the Last Glacial Maximum. Additional analyses suggested the presence of several genetic entities that likely originated from the groups originally detected. PMID:24498283

  16. Phylogenetic Analysis and Epidemic History of Hepatitis C Virus Genotype 2 in Tunisia, North Africa

    PubMed Central

    Rajhi, Mouna; Ghedira, Kais; Chouikha, Anissa; Djebbi, Ahlem; Cheikh, Imed; Ben Yahia, Ahlem; Sadraoui, Amel; Hammami, Walid; Azouz, Msaddek; Ben Mami, Nabil; Triki, Henda

    2016-01-01

    HCV genotype 2 (HCV-2) has a worldwide distribution, with prevalence rates that vary from country to country. High genetic diversity and long-term endemicity have been suggested in West African countries. A global dispersal of HCV-2 is thought to have occurred during the 20th century, especially in European countries. In Tunisia, genotype 2 was the second most prevalent genotype after genotype 1, and most isolates belong to subtypes 2c and 2k. In this study, phylogenetic analyses based on the NS5B genomic sequences of 113 Tunisian HCV isolates from subtypes 2c and 2k were carried out. A Bayesian coalescent-based framework was used to estimate the origin and spread of these subtypes in Tunisia. Phylogenetic analyses of HCV-2c sequences suggest the absence of country-specific or time-specific variants. In contrast, the phylogenetic grouping of HCV-2k sequences shows two major genetic clusters that may represent two distinct circulating variants. Coalescent analysis dated the most recent common ancestor (tMRCA) of Tunisian HCV-2c to around 1886 (1869–1902), before the introduction of HCV-2k in 1901 (1867–1931). Our findings suggest that the introduction of HCV-2c into Tunisia possibly resulted from population movements between Tunisia and European populations following the French colonization. PMID:27100294

  17. Phylogenetic Analysis and Epidemic History of Hepatitis C Virus Genotype 2 in Tunisia, North Africa.

    PubMed

    Rajhi, Mouna; Ghedira, Kais; Chouikha, Anissa; Djebbi, Ahlem; Cheikh, Imed; Ben Yahia, Ahlem; Sadraoui, Amel; Hammami, Walid; Azouz, Msaddek; Ben Mami, Nabil; Triki, Henda

    2016-01-01

    HCV genotype 2 (HCV-2) has a worldwide distribution, with prevalence rates that vary from country to country. High genetic diversity and long-term endemicity have been suggested in West African countries. A global dispersal of HCV-2 is thought to have occurred during the 20th century, especially in European countries. In Tunisia, genotype 2 was the second most prevalent genotype after genotype 1, and most isolates belong to subtypes 2c and 2k. In this study, phylogenetic analyses based on the NS5B genomic sequences of 113 Tunisian HCV isolates from subtypes 2c and 2k were carried out. A Bayesian coalescent-based framework was used to estimate the origin and spread of these subtypes in Tunisia. Phylogenetic analyses of HCV-2c sequences suggest the absence of country-specific or time-specific variants. In contrast, the phylogenetic grouping of HCV-2k sequences shows two major genetic clusters that may represent two distinct circulating variants. Coalescent analysis dated the most recent common ancestor (tMRCA) of Tunisian HCV-2c to around 1886 (1869-1902), before the introduction of HCV-2k in 1901 (1867-1931). Our findings suggest that the introduction of HCV-2c into Tunisia possibly resulted from population movements between Tunisia and European populations following the French colonization.

  18. Historical and current introgression in a Mesoamerican hummingbird species complex: a biogeographic perspective

    PubMed Central

    Jiménez, Rosa Alicia

    2016-01-01

    The influence of geological events and Pleistocene glacial cycles may have produced complex morphological and genetic scenarios in the biota of the Mesoamerican region. We tested whether berylline, blue-tailed and steely-blue hummingbirds, Amazilia beryllina, Amazilia cyanura and Amazilia saucerottei, show evidence of historical or current introgression, as their plumage colour variation might suggest. We also analysed the role of past and present climatic events in promoting genetic introgression and species diversification. We collected mitochondrial DNA (mtDNA) sequence data and microsatellite loci scores for populations throughout the range of the three Amazilia species, as well as morphological and ecological data. Haplotype network, Bayesian phylogenetic and divergence time inference, historical demography, palaeodistribution modelling, and niche divergence tests were used to reconstruct the evolutionary history of this Amazilia species complex. An isolation-with-migration coalescent model and Bayesian assignment analysis were used to assess historical introgression and current genetic admixture. mtDNA haplotypes were geographically unstructured, with haplotypes from disparate areas interspersed on a shallow tree and an unresolved haplotype network. Assignment analysis of the nuclear genome (nuDNA) supported three genetic groups with signs of genetic admixture, corresponding to: (1) A. beryllina populations located west of the Isthmus of Tehuantepec; (2) A. cyanura populations between the Isthmus of Tehuantepec and the Nicaraguan Depression (Nuclear Central America); and (3) A. saucerottei populations southeast of the Nicaraguan Depression. Gene flow and divergence time estimates, together with demographic and palaeodistribution patterns, suggest an evolutionary history of introgression mediated by Quaternary climatic fluctuations. High levels of gene flow were indicated by mtDNA and asymmetrical isolation-with-migration, whereas the microsatellite analyses found evidence for three genetic clusters, with distributions corresponding to isolation by the Isthmus of Tehuantepec and the Nicaraguan Depression, and signs of admixture. Historical levels of migration between genetically distinct groups estimated using microsatellites were higher than contemporary levels of migration. These results support a scenario of secondary contact and range contact during the glacial periods of the Pleistocene, and strongly imply that the high levels of structure currently observed are a consequence of the limited dispersal of these hummingbirds across the isthmus and depression barriers. PMID:26788433
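    Bayesian assignment analysis, at its simplest, asks how probable an individual's multilocus genotype is under each candidate group's allele frequencies. A minimal sketch, assuming known allele frequencies, Hardy-Weinberg equilibrium, independent loci, a uniform prior over groups and no admixture; programs such as STRUCTURE estimate frequencies and admixture jointly, so this shows only the core likelihood step. All group names and frequencies below are hypothetical.

```python
import math


def genotype_log_prob(genotype, freqs):
    """log P(diploid genotype | allele frequencies), assuming Hardy-Weinberg equilibrium."""
    a, b = genotype
    if a == b:
        return 2.0 * math.log(freqs[a])
    return math.log(2.0 * freqs[a] * freqs[b])


def assignment_posterior(individual, populations):
    """Posterior membership probabilities under a uniform prior over populations.

    individual:  {locus: (allele1, allele2)}
    populations: {name: {locus: {allele: frequency}}}
    Loci are treated as independent (no linkage disequilibrium).
    """
    log_lik = {
        name: sum(genotype_log_prob(g, freqs[locus]) for locus, g in individual.items())
        for name, freqs in populations.items()
    }
    top = max(log_lik.values())  # subtract the maximum before exponentiating (log-sum-exp)
    weights = {name: math.exp(v - top) for name, v in log_lik.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# HYPOTHETICAL allele frequencies for three groups at two loci.
populations = {
    "group_1": {"L1": {"A": 0.8, "B": 0.2}, "L2": {"C": 0.7, "D": 0.3}},
    "group_2": {"L1": {"A": 0.5, "B": 0.5}, "L2": {"C": 0.5, "D": 0.5}},
    "group_3": {"L1": {"A": 0.1, "B": 0.9}, "L2": {"C": 0.2, "D": 0.8}},
}
individual = {"L1": ("A", "A"), "L2": ("C", "D")}
print({k: round(v, 3) for k, v in assignment_posterior(individual, populations).items()})
```

    An allele with zero frequency in some group would make `math.log` fail, so real data usually need a small pseudo-count added to every allele frequency.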

  19. Cross-Cultural Invariance of the Mental Toughness Inventory Among Australian, Chinese, and Malaysian Athletes: A Bayesian Estimation Approach.

    PubMed

    Gucciardi, Daniel F; Zhang, Chun-Qing; Ponnusamy, Vellapandian; Si, Gangyan; Stenling, Andreas

    2016-04-01

    The aims of this study were to assess the cross-cultural invariance of athletes' self-reports of mental toughness and to introduce and illustrate the application of approximate measurement invariance using Bayesian estimation for sport and exercise psychology scholars. Athletes from Australia (n = 353, M_age = 19.13, SD = 3.27, men = 161), China (n = 254, M_age = 17.82, SD = 2.28, men = 138), and Malaysia (n = 341, M_age = 19.13, SD = 3.27, men = 200) provided a cross-sectional snapshot of their mental toughness. The cross-cultural invariance of the mental toughness inventory in terms of (a) the factor structure (configural invariance), (b) factor loadings (metric invariance), and (c) item intercepts (scalar invariance) was tested using an approximate measurement framework with Bayesian estimation. Results indicated that approximate metric and scalar invariance were established. From a methodological standpoint, this study demonstrated the usefulness and flexibility of Bayesian estimation for single-sample and multigroup analyses of measurement instruments. Substantively, the current findings suggest that the measurement of mental toughness requires cultural adjustments to better capture the contextually salient (emic) aspects of this concept.

  20. Posterior Predictive Bayesian Phylogenetic Model Selection

    PubMed Central

    Lewis, Paul O.; Xie, Wangang; Chen, Ming-Hui; Fan, Yu; Kuo, Lynn

    2014-01-01

    We present two distinctly different posterior predictive approaches to Bayesian phylogenetic model selection and illustrate these methods using examples from green algal protein-coding cpDNA sequences and flowering plant rDNA sequences. The Gelfand–Ghosh (GG) approach allows dissection of an overall measure of model fit into components due to posterior predictive variance (GGp) and goodness-of-fit (GGg), which distinguishes this method from the posterior predictive P-value approach. The conditional predictive ordinate (CPO) method provides a site-specific measure of model fit useful for exploratory analyses and can be combined over sites yielding the log pseudomarginal likelihood (LPML) which is useful as an overall measure of model fit. CPO provides a useful cross-validation approach that is computationally efficient, requiring only a sample from the posterior distribution (no additional simulation is required). Both GG and CPO add new perspectives to Bayesian phylogenetic model selection based on the predictive abilities of models and complement the perspective provided by the marginal likelihood (including Bayes Factor comparisons) based solely on the fit of competing models to observed data. [Bayesian; conditional predictive ordinate; CPO; L-measure; LPML; model selection; phylogenetics; posterior predictive.] PMID:24193892
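    The CPO and LPML described above can be computed from per-site log-likelihoods evaluated at posterior draws, which is why no additional simulation is needed. A minimal sketch, assuming a matrix `loglik[i][s] = log p(y_i | theta_s)` for site i and posterior draw s has already been extracted from the MCMC output; the numbers are hypothetical and the function names are illustrative only.

```python
import math
from typing import Sequence


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def log_cpo(site_loglik_draws: Sequence[float]) -> float:
    """log CPO_i from log p(y_i | theta_s) over S posterior draws.

    CPO_i = [ (1/S) * sum_s 1 / p(y_i | theta_s) ]^(-1)   (harmonic-mean estimator)
    """
    s = len(site_loglik_draws)
    return math.log(s) - log_sum_exp([-ll for ll in site_loglik_draws])


def lpml(loglik_matrix: Sequence[Sequence[float]]) -> float:
    """Log pseudomarginal likelihood: the sum of log CPO_i over sites (rows)."""
    return sum(log_cpo(row) for row in loglik_matrix)


# HYPOTHETICAL per-site log-likelihoods: 3 sites x 4 posterior draws.
loglik = [
    [-2.1, -2.3, -2.0, -2.2],
    [-5.0, -4.1, -6.3, -4.8],
    [-1.2, -1.1, -1.3, -1.2],
]
print([round(log_cpo(row), 3) for row in loglik], round(lpml(loglik), 3))
```

    The harmonic-mean form can be dominated by a few draws that fit a site poorly, so it is worth checking that the log CPO values are stable across independent runs.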

  1. A nonparametric method to generate synthetic populations to adjust for complex sampling design features.

    PubMed

    Dong, Qi; Elliott, Michael R; Raghunathan, Trivellore E

    2014-06-01

    Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Applying these methods to data from complex sample surveys without making allowance for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to developing statistical methods that analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in the finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion that inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, adjusting the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered, unequal-probability-of-selection sample design, and use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS) and the Medical Expenditure Panel Survey (MEPS), both of which have stratified, clustered, unequal-probability-of-selection sample designs.

  2. A nonparametric method to generate synthetic populations to adjust for complex sampling design features

    PubMed Central

    Dong, Qi; Elliott, Michael R.; Raghunathan, Trivellore E.

    2017-01-01

    Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Applying these methods to data from complex sample surveys without making allowance for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to developing statistical methods that analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in the finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion that inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, adjusting the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered, unequal-probability-of-selection sample design, and use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS) and the Medical Expenditure Panel Survey (MEPS), both of which have stratified, clustered, unequal-probability-of-selection sample designs. PMID:29200608
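    The finite population Bayesian bootstrap that this work extends can be illustrated with a weighted Pólya urn: sampled units are copied into a synthetic population with probabilities driven by their case weights, so that the result can be treated like a simple random sample. The sketch below covers only a single stratum with unequal weights and no clustering; the selection probability follows the weighted Pólya scheme as we understand it from this literature and should be checked against the paper before use, and the function name and data are hypothetical.

```python
import random
from typing import List, Sequence


def weighted_polya_population(weights: Sequence[float], seed: int = 0) -> List[int]:
    """One synthetic population from a weighted Polya urn (single stratum, no clustering).

    weights[i] is the case weight of sampled unit i, and N = round(sum(weights)).
    At each of the N - n draws, unit i is chosen with probability proportional to
        w_i - 1 + l_i * (N - n) / n,
    where l_i counts how often unit i has already been drawn. random.choices
    normalises these urn masses, so the denominator need not be computed.
    Returns how many copies of each sampled unit the synthetic population contains.
    """
    rng = random.Random(seed)
    n = len(weights)
    big_n = round(sum(weights))
    if big_n <= n or any(w < 1 for w in weights):
        raise ValueError("weights must be >= 1 and sum to more than the sample size")
    scale = (big_n - n) / n
    drawn = [0] * n
    for _ in range(big_n - n):
        urn = [w - 1 + l * scale for w, l in zip(weights, drawn)]
        i = rng.choices(range(n), weights=urn, k=1)[0]
        drawn[i] += 1
    # Every sampled unit appears once itself, plus the copies drawn from the urn.
    return [1 + d for d in drawn]


# HYPOTHETICAL sample of five units with unequal weights (population size N = 20).
counts = weighted_polya_population([1.0, 2.0, 3.0, 4.0, 10.0], seed=42)
print(counts, sum(counts))
```

    Repeating the draw with different seeds gives multiple synthetic populations; averaged over repetitions, a unit with weight w_i appears about w_i times.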

  3. The effect of using genealogy-based haplotypes for genomic prediction

    PubMed Central

    2013-01-01

    Background Genomic prediction uses two sources of information: linkage disequilibrium between markers and quantitative trait loci, and additive genetic relationships between individuals. One way to increase the accuracy of genomic prediction is to capture more linkage disequilibrium by regression on haplotypes instead of regression on individual markers. The aim of this study was to investigate the accuracy of genomic prediction using haplotypes based on local genealogy information. Methods A total of 4429 Danish Holstein bulls were genotyped with the 50K SNP chip. Haplotypes were constructed using local genealogical trees. Effects of haplotype covariates were estimated with two types of prediction models: (1) assuming that effects had the same distribution for all haplotype covariates, i.e. the GBLUP method, and (2) assuming that a large proportion (π) of the haplotype covariates had zero effect, i.e. a Bayesian mixture method. Results About 7.5 times more covariate effects were estimated when fitting haplotypes based on local genealogical trees compared to fitting individual markers. Genealogy-based haplotype clustering slightly increased the accuracy of genomic prediction and, in some cases, decreased the bias of prediction. With the Bayesian method, accuracy of prediction was less sensitive to parameter π when fitting haplotypes compared to fitting markers. Conclusions Use of haplotypes based on genealogy can slightly increase the accuracy of genomic prediction. Improved methods to cluster the haplotypes constructed from local genealogy could lead to additional gains in accuracy. PMID:23496971

  4. The effect of using genealogy-based haplotypes for genomic prediction.

    PubMed

    Edriss, Vahid; Fernando, Rohan L; Su, Guosheng; Lund, Mogens S; Guldbrandtsen, Bernt

    2013-03-06

    Genomic prediction uses two sources of information: linkage disequilibrium between markers and quantitative trait loci, and additive genetic relationships between individuals. One way to increase the accuracy of genomic prediction is to capture more linkage disequilibrium by regression on haplotypes instead of regression on individual markers. The aim of this study was to investigate the accuracy of genomic prediction using haplotypes based on local genealogy information. A total of 4429 Danish Holstein bulls were genotyped with the 50K SNP chip. Haplotypes were constructed using local genealogical trees. Effects of haplotype covariates were estimated with two types of prediction models: (1) assuming that effects had the same distribution for all haplotype covariates, i.e. the GBLUP method, and (2) assuming that a large proportion (π) of the haplotype covariates had zero effect, i.e. a Bayesian mixture method. About 7.5 times more covariate effects were estimated when fitting haplotypes based on local genealogical trees compared to fitting individual markers. Genealogy-based haplotype clustering slightly increased the accuracy of genomic prediction and, in some cases, decreased the bias of prediction. With the Bayesian method, accuracy of prediction was less sensitive to parameter π when fitting haplotypes compared to fitting markers. Use of haplotypes based on genealogy can slightly increase the accuracy of genomic prediction. Improved methods to cluster the haplotypes constructed from local genealogy could lead to additional gains in accuracy.

  5. Efficient Implementation of MrBayes on Multi-GPU

    PubMed Central

    Zhou, Jianfu; Liu, Xiaoguang; Wang, Gang

    2013-01-01

    MrBayes, which uses Metropolis-coupled Markov chain Monte Carlo (MCMCMC or (MC)³), is a popular program for Bayesian inference. Although (MC)³ is a leading method for inferring phylogeny from DNA data, the Bayesian algorithm and its improved and parallel versions are no longer fast enough for biologists to analyze massive real-world DNA data sets. Recently, the graphics processing unit (GPU) has shown its power as a coprocessor (or rather, an accelerator) in many fields. This article describes a(MC)³ (aMCMCMC), an efficient implementation of MrBayes (MC)³ on the Compute Unified Device Architecture (CUDA). By dynamically adjusting the task granularity to the input data size and hardware configuration, it makes full use of GPU cores across different data sets. An adaptive method is also developed to split and combine DNA sequences so as to make full use of a large number of GPU cards. Furthermore, a new “node-by-node” task scheduling strategy is developed to improve concurrency, and several optimization methods are used to reduce extra overhead. Experimental results show that a(MC)³ achieves up to 63× speedup over serial MrBayes on a single machine with one GPU card, up to 170× speedup with four GPU cards, and up to 478× speedup with a 32-node GPU cluster. a(MC)³ is dramatically faster than all previous (MC)³ algorithms and scales well to large GPU clusters. PMID:23493260

  6. Efficient implementation of MrBayes on multi-GPU.

    PubMed

    Bao, Jie; Xia, Hongju; Zhou, Jianfu; Liu, Xiaoguang; Wang, Gang

    2013-06-01

    MrBayes, which uses Metropolis-coupled Markov chain Monte Carlo (MCMCMC or (MC)³), is a popular program for Bayesian inference. Although (MC)³ is a leading method for inferring phylogeny from DNA data, the Bayesian algorithm and its improved and parallel versions are no longer fast enough for biologists to analyze massive real-world DNA data sets. Recently, the graphics processing unit (GPU) has shown its power as a coprocessor (or rather, an accelerator) in many fields. This article describes a(MC)³ (aMCMCMC), an efficient implementation of MrBayes (MC)³ on the Compute Unified Device Architecture (CUDA). By dynamically adjusting the task granularity to the input data size and hardware configuration, it makes full use of GPU cores across different data sets. An adaptive method is also developed to split and combine DNA sequences so as to make full use of a large number of GPU cards. Furthermore, a new "node-by-node" task scheduling strategy is developed to improve concurrency, and several optimization methods are used to reduce extra overhead. Experimental results show that a(MC)³ achieves up to 63× speedup over serial MrBayes on a single machine with one GPU card, up to 170× speedup with four GPU cards, and up to 478× speedup with a 32-node GPU cluster. a(MC)³ is dramatically faster than all previous (MC)³ algorithms and scales well to large GPU clusters.
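    Metropolis coupling runs several chains on progressively flattened ("heated") versions of the target and occasionally swaps their states, so that the cold chain can cross valleys it would rarely cross alone. The toy sketch below applies this to a one-dimensional bimodal density rather than a phylogenetic posterior. It assumes incremental heating of the form beta_j = 1 / (1 + heat * j), the scheme commonly described for MrBayes; the value heat = 1.0 is chosen for this example and is not a MrBayes default, and the sketch runs serially, whereas a(MC)³ distributes the likelihood work across GPUs.

```python
import math
import random


def log_target(x: float) -> float:
    """Unnormalised log density of an equal mixture of N(-4, 1) and N(4, 1)."""
    a, b = -0.5 * (x + 4.0) ** 2, -0.5 * (x - 4.0) ** 2
    top = max(a, b)
    return top + math.log(math.exp(a - top) + math.exp(b - top))


def mc3(n_iter=30000, n_chains=4, heat=1.0, step=1.0, seed=0):
    """Metropolis-coupled MCMC; chain j targets pi(x)^beta_j with beta_j = 1 / (1 + heat * j)."""
    rng = random.Random(seed)
    betas = [1.0 / (1.0 + heat * j) for j in range(n_chains)]  # betas[0] = 1 is the cold chain
    states = [0.0] * n_chains
    cold = []
    for _ in range(n_iter):
        # Random-walk Metropolis update inside every chain, each on its heated target.
        for j, beta in enumerate(betas):
            proposal = states[j] + rng.gauss(0.0, step)
            log_ratio = beta * (log_target(proposal) - log_target(states[j]))
            if math.log(1.0 - rng.random()) < log_ratio:
                states[j] = proposal
        # Propose exchanging the states of two randomly chosen chains.
        i, k = rng.sample(range(n_chains), 2)
        log_swap = (betas[i] - betas[k]) * (log_target(states[k]) - log_target(states[i]))
        if math.log(1.0 - rng.random()) < log_swap:
            states[i], states[k] = states[k], states[i]
        cold.append(states[0])  # only the cold chain is kept as the posterior sample
    return cold


samples = mc3(seed=1)
print(sum(x > 0 for x in samples) / len(samples))
```

    Only states from the cold chain are retained; the heated chains exist solely to feed it good swap proposals.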

  7. Weighted community detection and data clustering using message passing

    NASA Astrophysics Data System (ADS)

    Shi, Cheng; Liu, Yanchen; Zhang, Pan

    2018-03-01

    Grouping objects into clusters based on the similarities or weights between them is one of the most important problems in science and engineering. In this work, by extending message-passing algorithms and spectral algorithms proposed for the unweighted community detection problem, we develop a non-parametric method based on statistical physics: we map the problem to the Potts model at the critical temperature of the spin-glass transition and apply belief propagation to solve for the marginals of the corresponding Boltzmann distribution. Our algorithm is robust to over-fitting and gives a principled way to determine whether there are significant clusters in the data and how many there are. We apply our method to different clustering tasks. In the community detection problem in weighted and directed networks, we show that our algorithm significantly outperforms existing algorithms. In the clustering problem, where the data were generated by mixture models in the sparse regime, we show that our method works all the way down to the theoretical limit of detectability and gives accuracy very close to that of optimal Bayesian inference. In the semi-supervised clustering problem, our method needs only a few labels to work perfectly on classic datasets. Finally, we further develop Thouless-Anderson-Palmer equations, which greatly reduce the computational complexity in dense networks while giving almost the same performance as belief propagation.

  8. CCD UBVRI photometry of NGC 6811

    NASA Astrophysics Data System (ADS)

    Yontan, T.; Bilir, S.; Bostancı, Z. F.; Ak, T.; Karaali, S.; Güver, T.; Ak, S.; Duran, Ş.; Paunzen, E.

    2015-02-01

    We present the results of CCD UBVRI observations of the open cluster NGC 6811, obtained on 18 July 2012 with the 1 m telescope at the TÜBİTAK National Observatory (TUG). Using these photometric results, we determine the structural and astrophysical parameters of the cluster. For stars brighter than V = 18 mag, the mean photometric uncertainties are better than 0.02 mag in the V magnitude and the B-V, V-R and V-I colour indices, and about 0.03 mag in U-B. Cluster member stars were separated from the field stars using the Galaxia model of Sharma et al. (2011) together with other techniques. The core radius of the cluster is found to be r_c = 3.60 arcmin. The astrophysical parameters were determined simultaneously via Bayesian statistics using the V versus B-V, V versus V-I, V versus V-R and V versus R-I colour-magnitude diagrams of the cluster. The resulting most likely parameters were further confirmed using independent methods, removing any possible degeneracies. The colour excess, distance modulus, metallicity and age of the cluster are determined simultaneously as E(B-V) = 0.05±0.01 mag, μ = 10.06±0.08 mag, [M/H] = -0.10±0.01 dex and t = 1.00±0.05 Gyr, respectively. Distances of five red clump stars found to be members of the cluster further confirm our distance estimate.
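    A distance modulus converts to a distance through d = 10^((mu_0 + 5) / 5) pc, where mu_0 is the extinction-free modulus. A minimal sketch, assuming A_V = R_V * E(B-V) with R_V = 3.1 (a common Galactic value, not taken from the abstract); the abstract does not say whether μ already includes extinction, so both readings are computed and no result is claimed here. The function name is hypothetical.

```python
def distance_pc(distance_modulus: float, ebv: float = 0.0, r_v: float = 3.1) -> float:
    """Distance in parsecs from a distance modulus.

    If `distance_modulus` is an apparent V-band modulus (V - M_V), pass the colour
    excess E(B-V) so that the extinction A_V = r_v * E(B-V) is removed first; for a
    true (extinction-free) modulus leave ebv = 0.
    """
    true_modulus = distance_modulus - r_v * ebv
    return 10.0 ** ((true_modulus + 5.0) / 5.0)


# Values quoted in the abstract; which reading of mu applies is an open assumption.
print(round(distance_pc(10.06)))            # if mu is already extinction-corrected
print(round(distance_pc(10.06, ebv=0.05)))  # if mu is the apparent V-band modulus
```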

  9. Mechanisms of motivational interviewing in health promotion: a Bayesian mediation analysis

    PubMed Central

    2012-01-01

    Background Counselor behaviors that mediate the efficacy of motivational interviewing (MI) are not well understood, especially when MI is applied to health behavior promotion. We hypothesized that client change talk mediates the relationship between counselor variables and subsequent client behavior change. Methods Purposeful sampling identified individuals from a prospective randomized worksite trial that used an MI intervention to promote firefighters’ healthy diet and regular exercise; the selected firefighters either increased their dietary intake of fruits and vegetables (n = 21) or did not (n = 22). MI interactions were coded using the Motivational Interviewing Skill Code (MISC 2.1) to categorize counselor and firefighter verbal utterances. Both Bayesian and frequentist mediation analyses were used to investigate whether client change talk mediated the relationship between counselor skills and behavior change. Results Counselors’ global spirit, empathy, and direction and MI-consistent behavioral counts (e.g., reflections, open questions, affirmations, emphasize control) significantly correlated with firefighters’ total client change talk utterances (rs = 0.42, 0.40, 0.30, and 0.61, respectively), which correlated significantly with their increase in fruit and vegetable intake (r = 0.33). Both Bayesian and frequentist mediation analyses demonstrated that the findings were consistent with the hypotheses: total client change talk mediated the relationship between counselors’ skills (MI-consistent behaviors [Bayesian mediated effect: αβ = .06 (.03), 95% CI = .02, .12] and MI spirit [Bayesian mediated effect: αβ = .06 (.03), 95% CI = .01, .13]) and increased fruit and vegetable consumption. Conclusion Motivational interviewing is a resource- and time-intensive intervention and is currently being applied in many arenas. Previous research has identified the importance of counselor behaviors and client change talk in the treatment of substance use disorders. Our results indicate that similar mechanisms may underlie the effects of MI for dietary change. These results inform MI training and application by identifying the processes critical for MI success in health promotion domains. PMID:22681874
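    The mediated effect αβ is a product of two path coefficients: in the usual single-mediator model, α is the effect of the counselor variable on change talk and β is the effect of change talk on dietary change, adjusted for the counselor variable. Because the product of two uncertain quantities is skewed, its interval is usually obtained by simulation. A hedged sketch, assuming independent normal approximations to the posteriors of α and β; the inputs are hypothetical, the function name is illustrative, and this is not the authors' estimation procedure.

```python
import random
import statistics


def mediated_effect_summary(a_hat, a_sd, b_hat, b_sd, n_draws=100_000, level=0.95, seed=0):
    """Monte Carlo summary of the mediated effect alpha * beta.

    alpha (X -> M path) and beta (M -> Y path, adjusted for X) are drawn as independent
    normals, a stand-in for their posterior distributions under diffuse priors.
    Returns (mean, sd, lower, upper) of the product.
    """
    rng = random.Random(seed)
    products = sorted(rng.gauss(a_hat, a_sd) * rng.gauss(b_hat, b_sd) for _ in range(n_draws))
    tail = (1.0 - level) / 2.0
    lower = products[int(tail * n_draws)]
    upper = products[int((1.0 - tail) * n_draws) - 1]
    return statistics.fmean(products), statistics.stdev(products), lower, upper


# HYPOTHETICAL path estimates, not the values reported in the study.
mean, sd, lower, upper = mediated_effect_summary(0.5, 0.15, 0.12, 0.05)
print(f"alpha*beta = {mean:.3f} ({sd:.3f}), 95% interval {lower:.3f} to {upper:.3f}")
```

    The percentile interval is generally not symmetric around the point estimate, which is the reason for simulating rather than using a normal approximation to the product.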

  10. Estimating the extent and distribution of new-onset adult asthma in British Columbia using frequentist and Bayesian approaches.

    PubMed

    Beach, Jeremy; Burstyn, Igor; Cherry, Nicola

    2012-07-01

    We previously described a method to identify the incidence of new-onset adult asthma (NOAA) in Alberta by industry and occupation, using Workers' Compensation Board (WCB) and physician billing data. The aim of this study was to extend this method to data from British Columbia (BC), so as to compare the two provinces, and to incorporate Bayesian methodology into estimates of risk. WCB claims made for any reason in 1995-2004 were linked to physician billing data. NOAA was defined as a billing for asthma (ICD-9 493) in the 12 months before a WCB claim without asthma in the previous 3 years. Incidence was calculated by occupation and industry. In a matched case-referent analysis, associations with exposures were examined using an asthma-specific job exposure matrix (JEM). Posterior distributions from the Alberta analysis and estimated misclassification parameters were used as priors in the Bayesian analysis of the BC data. Among 1 118 239 eligible WCB claims, the incidence of NOAA was 1.4%. Sixteen occupations and 44 industries had a significantly increased risk; six industries had a decreased risk. The JEM identified wood dust [odds ratio (OR) 1.55, 95% confidence interval (CI) 1.08-2.24] and animal antigens (OR 1.66, 95% CI 1.17-2.36) as related to an increased risk of NOAA. Exposure to isocyanates was associated with decreased risk (OR 0.57, 95% CI 0.39-0.85). Bayesian analyses taking account of exposure misclassification and informative priors resulted in posterior distributions of ORs with the lower boundary of the 95% credible interval >1.00 for almost all exposures. The distribution of NOAA in BC appeared somewhat similar to that in Alberta, except for isocyanates. Bayesian analyses allowed prior evidence to be incorporated into risk estimates, permitting reconsideration of the apparently protective effect of isocyanate exposure.
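    Using earlier posterior distributions as priors can be illustrated with a conjugate normal update on the log odds ratio scale. A simplified sketch: the data term is the wood-dust estimate quoted in the abstract (OR 1.55, 95% CI 1.08-2.24, converted to a log OR and standard error assuming a Wald interval), the prior is hypothetical rather than the actual Alberta posterior, and the exposure-misclassification adjustment used in the study is omitted. Function names are illustrative.

```python
import math


def log_or_and_se(or_point, ci_low, ci_high, z=1.959964):
    """Recover log(OR) and its standard error from a Wald 95% confidence interval."""
    return math.log(or_point), (math.log(ci_high) - math.log(ci_low)) / (2.0 * z)


def normal_update(prior_mean, prior_sd, data_mean, data_sd):
    """Conjugate normal-normal update: a precision-weighted average of prior and data."""
    w_prior, w_data = 1.0 / prior_sd**2, 1.0 / data_sd**2
    post_var = 1.0 / (w_prior + w_data)
    post_mean = post_var * (w_prior * prior_mean + w_data * data_mean)
    return post_mean, math.sqrt(post_var)


def summarise(mean, sd, z=1.959964):
    """Posterior median OR and 95% credible interval, back-transformed to the OR scale."""
    return math.exp(mean), math.exp(mean - z * sd), math.exp(mean + z * sd)


# Data: the wood-dust estimate quoted in the abstract.
data_mean, data_sd = log_or_and_se(1.55, 1.08, 2.24)
# Prior: a HYPOTHETICAL earlier estimate standing in for the Alberta posterior.
prior_mean, prior_sd = math.log(1.3), 0.25
post_mean, post_sd = normal_update(prior_mean, prior_sd, data_mean, data_sd)
print([round(x, 2) for x in summarise(post_mean, post_sd)])
```

    The posterior standard deviation is always smaller than both the prior and the data standard deviations, which is how an informative prior can narrow an interval enough to move its lower bound above 1.00.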

  11. Testing for Divergent Transmission Histories among Cultural Characters: A Study Using Bayesian Phylogenetic Methods and Iranian Tribal Textile Data

    PubMed Central

    Matthews, Luke J.; Tehrani, Jamie J.; Jordan, Fiona M.; Collard, Mark; Nunn, Charles L.

    2011-01-01

    Background Archaeologists and anthropologists have long recognized that different cultural complexes may have distinct descent histories, but they have lacked analytical techniques capable of easily identifying such incongruence. Here, we show how Bayesian phylogenetic analysis can be used to identify incongruent cultural histories. We employ the approach to investigate Iranian tribal textile traditions. Methods We used Bayes factor comparisons in a phylogenetic framework to test two models of cultural evolution: the hierarchically integrated system hypothesis and the multiple coherent units hypothesis. In the hierarchically integrated system hypothesis, a core tradition of characters evolves through descent with modification and characters peripheral to the core are exchanged among contemporaneous populations. In the multiple coherent units hypothesis, a core tradition does not exist. Rather, there are several cultural units consisting of sets of characters that have different histories of descent. Results For the Iranian textiles, the Bayesian phylogenetic analyses supported the multiple coherent units hypothesis over the hierarchically integrated system hypothesis. Our analyses suggest that pile-weave designs represent a distinct cultural unit that has a different phylogenetic history compared to other textile characters. Conclusions The results from the Iranian textiles are consistent with the available ethnographic evidence, which suggests that the commercial rug market has influenced pile-rug designs but not the techniques or designs incorporated in the other textiles produced by the tribes. We anticipate that Bayesian phylogenetic tests for inferring cultural units will be of great value for researchers interested in studying the evolution of cultural traits including language, behavior, and material culture. PMID:21559083
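
    A Bayes factor comparison between two models reduces to a difference of log marginal likelihoods, which phylogenetic software estimates separately (for example with harmonic-mean or stepping-stone estimators). The minimal sketch below assumes those estimates are already available; the numbers are hypothetical, and the evidence labels follow the Kass and Raftery (1995) scale for 2 ln BF.

```python
def two_log_bayes_factor(log_ml_1, log_ml_0):
    """2 ln BF_10 from natural-log marginal likelihoods of models 1 and 0."""
    return 2.0 * (log_ml_1 - log_ml_0)


def kass_raftery_label(two_ln_bf):
    """Strength of evidence for model 1 over model 0 (Kass and Raftery, 1995)."""
    if two_ln_bf < 0:
        return "favours model 0"
    if two_ln_bf < 2:
        return "not worth more than a bare mention"
    if two_ln_bf < 6:
        return "positive"
    if two_ln_bf < 10:
        return "strong"
    return "very strong"


# Hypothetical log marginal likelihoods for the two cultural-evolution models
log_ml_multiple_units = -1523.4
log_ml_integrated_system = -1531.9

stat = two_log_bayes_factor(log_ml_multiple_units, log_ml_integrated_system)
print(f"2 ln BF = {stat:.1f}: {kass_raftery_label(stat)}")
```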

  12. A Bayesian sensitivity analysis to evaluate the impact of unmeasured confounding with external data: a real world comparative effectiveness study in osteoporosis.

    PubMed

    Zhang, Xiang; Faries, Douglas E; Boytsov, Natalie; Stamey, James D; Seaman, John W

    2016-09-01

    Observational studies are frequently used to assess the effectiveness of medical interventions in routine clinical practice. However, the use of observational data for comparative effectiveness is challenged by selection bias and the potential for unmeasured confounding. This is especially problematic for analyses using a health care administrative database, in which key clinical measures are often not available. This paper provides an approach to conducting a sensitivity analysis to investigate the impact of unmeasured confounding in observational studies. In a real-world osteoporosis comparative effectiveness study, the bone mineral density (BMD) score, an important predictor of fracture risk and a factor in the selection of osteoporosis treatments, is unavailable in the database, and lack of baseline BMD could potentially lead to significant selection bias. We implemented Bayesian twin-regression models, which simultaneously model both the observed outcome and the unobserved unmeasured confounder, using information from external sources. A sensitivity analysis was also conducted to assess the robustness of our conclusions to changes in such external data. The use of Bayesian modeling in this study suggests that the lack of baseline BMD did have a strong impact on the analysis, reversing the direction of the estimated effect (odds ratio of fracture incidence at 24 months: 0.40 vs. 1.36, with/without adjusting for unmeasured baseline BMD). The Bayesian twin-regression models provide a flexible sensitivity analysis tool to quantitatively assess the impact of unmeasured confounding in observational studies. Copyright © 2016 John Wiley & Sons, Ltd.
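
    The twin-regression models are not reproduced here. As a simpler, non-Bayesian point of comparison, the classical external-adjustment formula for a single binary unmeasured confounder shows how prevalence information from outside the database can move an effect estimate. It assumes a risk-ratio scale (a fair approximation to the odds ratio only when the outcome is rare) and no effect modification; all confounder inputs below are hypothetical.

```python
def confounding_bias_factor(p_exposed, p_unexposed, rr_confounder_outcome):
    """Bias factor for one binary unmeasured confounder U.

    p_exposed / p_unexposed: prevalence of U among treated / untreated patients.
    rr_confounder_outcome: risk ratio linking U to the outcome.
    """
    num = p_exposed * (rr_confounder_outcome - 1.0) + 1.0
    den = p_unexposed * (rr_confounder_outcome - 1.0) + 1.0
    return num / den


def externally_adjusted_rr(rr_observed, p_exposed, p_unexposed, rr_confounder_outcome):
    """Divide the observed risk ratio by the bias factor."""
    bias = confounding_bias_factor(p_exposed, p_unexposed, rr_confounder_outcome)
    return rr_observed / bias


# Hypothetical scenarios: an unmeasured risk factor (e.g. low baseline BMD) that
# triples fracture risk and is more common among treated patients.
for p1, p0 in [(0.3, 0.3), (0.6, 0.3), (0.8, 0.2)]:
    print(p1, p0, round(externally_adjusted_rr(1.36, p1, p0, 3.0), 2))
```

    When the confounder is equally common in both groups the estimate is unchanged; the more it is concentrated among treated patients, the further the adjusted estimate falls, which can reverse an apparent harm.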

  13. The impact of using informative priors in a Bayesian cost-effectiveness analysis: an application of endovascular versus open surgical repair for abdominal aortic aneurysms in high-risk patients.

    PubMed

    McCarron, C Elizabeth; Pullenayegum, Eleanor M; Thabane, Lehana; Goeree, Ron; Tarride, Jean-Eric

    2013-04-01

    Bayesian methods have been proposed as a way of synthesizing all available evidence to inform decision making. However, few practical applications of the use of Bayesian methods for combining patient-level data (i.e., trial) with additional evidence (e.g., literature) exist in the cost-effectiveness literature. The objective of this study was to compare a Bayesian cost-effectiveness analysis using informative priors to a standard non-Bayesian nonparametric method to assess the impact of incorporating additional information into a cost-effectiveness analysis. Patient-level data from a previously published nonrandomized study were analyzed using traditional nonparametric bootstrap techniques and bivariate normal Bayesian models with vague and informative priors. Two different types of informative priors were considered to reflect different valuations of the additional evidence relative to the patient-level data (i.e., "face value" and "skeptical"). The impact of using different distributions and valuations was assessed in a sensitivity analysis. Models were compared in terms of incremental net monetary benefit (INMB) and cost-effectiveness acceptability frontiers (CEAFs). The bootstrapping and Bayesian analyses using vague priors provided similar results. The most pronounced impact of incorporating the informative priors was the increase in estimated life years in the control arm relative to what was observed in the patient-level data alone. Consequently, the incremental difference in life years originally observed in the patient-level data was reduced, and the INMB and CEAF changed accordingly. The results of this study demonstrate the potential impact and importance of incorporating additional information into an analysis of patient-level data, suggesting this could alter decisions as to whether a treatment should be adopted and whether more information should be acquired.
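
    The incremental net monetary benefit at a willingness-to-pay threshold λ is λ·ΔE − ΔC, and an acceptability curve reports the share of posterior draws with positive INMB at each λ. The sketch below uses hypothetical draws of incremental life years and cost. Note that the frontier (CEAF) used in the study additionally restricts each λ to the option with the highest expected net benefit, which this two-arm acceptability-curve sketch does not do.

```python
import random


def inmb(wtp, delta_effect, delta_cost):
    """Incremental net monetary benefit: WTP * delta effect - delta cost."""
    return wtp * delta_effect - delta_cost


def prob_cost_effective(draws, wtp):
    """Share of (delta_effect, delta_cost) draws with positive INMB at this WTP."""
    return sum(inmb(wtp, de, dc) > 0 for de, dc in draws) / len(draws)


# Hypothetical posterior draws: incremental life years and incremental cost
rng = random.Random(1)
draws = [(rng.gauss(0.05, 0.04), rng.gauss(4000.0, 1500.0)) for _ in range(10_000)]

for wtp in (0, 50_000, 100_000, 200_000):
    print(wtp, round(prob_cost_effective(draws, wtp), 3))
```

    Shrinking the incremental life-year difference (as the informative priors did here) lowers INMB at every λ and shifts the whole curve downward.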

  14. Population structure of an endemic vulnerable species, the Jamaican boa (Epicrates subflavus).

    PubMed

    Tzika, Athanasia C; Koenig, Susan; Miller, Ricardo; Garcia, Gerardo; Remy, Christophe; Milinkovitch, Michel C

    2008-01-01

    The Jamaican boa (Epicrates subflavus; also called Yellow boa) is an endemic species whose natural populations have declined greatly and steadily since the late 19th century, mainly because of predation by introduced species, human persecution, and habitat destruction. In-situ conservation of the Jamaican boa is seriously hindered by the lack of information on demographic and ecological parameters as well as by a poor understanding of the population structure and species distribution in the wild. Here, using nine nuclear microsatellite loci and a fragment of the mitochondrial cytochrome b gene from 87 wild-born individuals, we present the first molecular genetic analyses focusing on the diversity and structure of the natural populations of the Jamaican boa. A model-based clustering analysis of multilocus microsatellite genotypes identifies three groups that are also significantly differentiated on the basis of F-statistics. Similarly, haplotypic network reconstruction methods applied to the cytochrome b haplotypes isolated here identify two well-differentiated haplogroups separated by four to six fixed mutations. Bayesian and metaGA analyses of the mitochondrial data set combined with sequences from other Boidae species indicate that rooting of the haplotypic network occurs most likely between the two defined haplogroups. Both analyses (based on nuclear and mitochondrial markers) underline an Eastern vs. (Western + Central) pattern of differentiation in agreement with geological data and patterns of differentiation uncovered in other vertebrate and invertebrate Jamaican species. Our results provide important insights for improving management of ex-situ captive populations and for guiding the development of proper in-situ species survival and habitat management plans for this spectacular, yet poorly known and vulnerable, snake.

  15. Local Population Structure and Patterns of Western Hemisphere Dispersal for Coccidioides spp., the Fungal Cause of Valley Fever

    PubMed Central

    Roe, Chandler C.; Hepp, Crystal M.; Teixeira, Marcus; Driebe, Elizabeth M.; Schupp, James M.; Gade, Lalitha; Waddell, Victor; Komatsu, Kenneth; Arathoon, Eduardo; Logemann, Heidi; Thompson, George R.; Chiller, Tom; Keim, Paul; Litvintseva, Anastasia P.

    2016-01-01

    ABSTRACT Coccidioidomycosis (or valley fever) is a fungal disease with high morbidity and mortality that affects tens of thousands of people each year. This infection is caused by two sibling species, Coccidioides immitis and C. posadasii, which are endemic to specific arid locales throughout the Western Hemisphere, particularly the desert southwest of the United States. Recent epidemiological and population genetic data suggest that the geographic range of coccidioidomycosis is expanding, as new endemic clusters have been identified in the state of Washington, well outside the established endemic range. The genetic mechanisms and epidemiological consequences of this expansion are unknown and require better understanding of the population structure and evolutionary history of these pathogens. Here we performed multiple phylogenetic inference and population genomics analyses of 68 new and 18 previously published genomes. The results provide evidence of substantial population structure in C. posadasii and demonstrate the presence of distinct geographic clades in central and southern Arizona as well as dispersed populations in Texas, Mexico, South America, and Central America. Although a smaller number of C. immitis strains were included in the analyses, some evidence of phylogeographic structure was also detected in this species, which has been historically limited to California and Baja, Mexico. Bayesian analyses indicated that C. posadasii is the more ancient of the two species and that Arizona contains the most diverse subpopulations. We propose a southern Arizona-northern Mexico origin for C. posadasii and describe a pathway for dispersal and distribution out of this region. PMID:27118594

  16. Phylogenetic analysis of Pasteuria penetrans by use of multiple genetic loci.

    PubMed

    Charles, Lauren; Carbone, Ignazio; Davies, Keith G; Bird, David; Burke, Mark; Kerry, Brian R; Opperman, Charles H

    2005-08-01

    Pasteuria penetrans is a gram-positive, endospore-forming eubacterium that apparently is a member of the Bacillus-Clostridium clade. It is an obligate parasite of root knot nematodes (Meloidogyne spp.) and preferentially grows on the developing ovaries, inhibiting reproduction. Root knot nematodes are devastating root pests of economically important crop plants and are difficult to control. Consequently, P. penetrans has long been recognized as a potential biocontrol agent for root knot nematodes, but the fastidious life cycle and the obligate nature of parasitism have inhibited progress on mass culture and deployment. We are currently sequencing the genome of the Pasteuria bacterium and have performed amino acid level analyses of 33 bacterial species (including P. penetrans) using concatenation of 40 housekeeping genes, with and without insertions/deletions (indels) removed, and using each gene individually. By application of maximum-likelihood, maximum-parsimony, and Bayesian methods to the resulting data sets, P. penetrans was found to cluster tightly, with a high level of confidence, in the Bacillus class of the gram-positive, low-G+C-content eubacteria. Strikingly, our analyses identified P. penetrans as ancestral to Bacillus spp. Additionally, all analyses revealed that P. penetrans is surprisingly more closely related to the saprophytic extremophile Bacillus halodurans and Bacillus subtilis than to the pathogenic species Bacillus anthracis and Bacillus cereus. Collectively, these findings strongly imply that P. penetrans is an ancient member of the Bacillus group. We suggest that P. penetrans may have evolved from an ancient symbiotic bacterial associate of nematodes, possibly as the root knot nematode evolved to be a highly specialized parasite of plants.

  17. Dark Energy Survey Year 1 Results: redshift distributions of the weak-lensing source galaxies

    NASA Astrophysics Data System (ADS)

    Hoyle, B.; Gruen, D.; Bernstein, G. M.; Rau, M. M.; De Vicente, J.; Hartley, W. G.; Gaztanaga, E.; DeRose, J.; Troxel, M. A.; Davis, C.; Alarcon, A.; MacCrann, N.; Prat, J.; Sánchez, C.; Sheldon, E.; Wechsler, R. H.; Asorey, J.; Becker, M. R.; Bonnett, C.; Carnero Rosell, A.; Carollo, D.; Carrasco Kind, M.; Castander, F. J.; Cawthon, R.; Chang, C.; Childress, M.; Davis, T. M.; Drlica-Wagner, A.; Gatti, M.; Glazebrook, K.; Gschwend, J.; Hinton, S. R.; Hoormann, J. K.; Kim, A. G.; King, A.; Kuehn, K.; Lewis, G.; Lidman, C.; Lin, H.; Macaulay, E.; Maia, M. A. G.; Martini, P.; Mudd, D.; Möller, A.; Nichol, R. C.; Ogando, R. L. C.; Rollins, R. P.; Roodman, A.; Ross, A. J.; Rozo, E.; Rykoff, E. S.; Samuroff, S.; Sevilla-Noarbe, I.; Sharp, R.; Sommer, N. E.; Tucker, B. E.; Uddin, S. A.; Varga, T. N.; Vielzeuf, P.; Yuan, F.; Zhang, B.; Abbott, T. M. C.; Abdalla, F. B.; Allam, S.; Annis, J.; Bechtol, K.; Benoit-Lévy, A.; Bertin, E.; Brooks, D.; Buckley-Geer, E.; Burke, D. L.; Busha, M. T.; Capozzi, D.; Carretero, J.; Crocce, M.; D'Andrea, C. B.; da Costa, L. N.; DePoy, D. L.; Desai, S.; Diehl, H. T.; Doel, P.; Eifler, T. F.; Estrada, J.; Evrard, A. E.; Fernandez, E.; Flaugher, B.; Fosalba, P.; Frieman, J.; García-Bellido, J.; Gerdes, D. W.; Giannantonio, T.; Goldstein, D. A.; Gruendl, R. A.; Gutierrez, G.; Honscheid, K.; James, D. J.; Jarvis, M.; Jeltema, T.; Johnson, M. W. G.; Johnson, M. D.; Kirk, D.; Krause, E.; Kuhlmann, S.; Kuropatkin, N.; Lahav, O.; Li, T. S.; Lima, M.; March, M.; Marshall, J. L.; Melchior, P.; Menanteau, F.; Miquel, R.; Nord, B.; O'Neill, C. R.; Plazas, A. A.; Romer, A. K.; Sako, M.; Sanchez, E.; Santiago, B.; Scarpine, V.; Schindler, R.; Schubnell, M.; Smith, M.; Smith, R. C.; Soares-Santos, M.; Sobreira, F.; Suchyta, E.; Swanson, M. E. C.; Tarle, G.; Thomas, D.; Tucker, D. L.; Vikram, V.; Walker, A. R.; Weller, J.; Wester, W.; Wolf, R. C.; Yanny, B.; Zuntz, J.

    2018-07-01

    We describe the derivation and validation of redshift distribution estimates and their uncertainties for the populations of galaxies used as weak-lensing sources in the Dark Energy Survey (DES) Year 1 cosmological analyses. The Bayesian Photometric Redshift (BPZ) code is used to assign galaxies to four redshift bins between $z \approx 0.2$ and $z \approx 1.3$, and to produce initial estimates of the lensing-weighted redshift distributions $n^i_{\mathrm{PZ}}(z) \propto dn^i/dz$ for members of bin $i$. Accurate determination of cosmological parameters depends critically on knowledge of $n^i$, but is insensitive to bin assignments or redshift errors for individual galaxies. The cosmological analyses allow for shifts $n^i(z) = n^i_{\mathrm{PZ}}(z - \Delta z^i)$ to correct the mean redshift of $n^i(z)$ for biases in $n^i_{\mathrm{PZ}}$. The $\Delta z^i$ are constrained by comparison of independently estimated 30-band photometric redshifts of galaxies in the Cosmic Evolution Survey (COSMOS) field to BPZ estimates made from the DES griz fluxes, for a sample matched in fluxes, pre-seeing size, and lensing weight to the DES weak-lensing sources. In companion papers, the $\Delta z^i$ of the three lowest redshift bins are further constrained by the angular clustering of the source galaxies around red galaxies with secure photometric redshifts at $0.15 < z < 0.9$. This paper details the BPZ and COSMOS procedures, and demonstrates that the cosmological inference is insensitive to details of the $n^i(z)$ beyond the choice of $\Delta z^i$. The clustering and COSMOS validation methods produce consistent estimates of $\Delta z^i$ in the bins where both can be applied, with combined uncertainties of $\sigma_{\Delta z^i} = 0.015$, 0.013, 0.011, and 0.022 in the four bins. Repeating the photo-z procedure instead using the Directional Neighbourhood Fitting algorithm, or using the $n^i(z)$ estimated from the matched sample in COSMOS, yields no discernible difference in cosmological inferences.
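
    The shift model $n^i(z) = n^i_{\mathrm{PZ}}(z - \Delta z^i)$ translates the whole distribution by $\Delta z^i$, so its mean redshift moves by exactly $\Delta z^i$ while its shape is untouched. A minimal numerical sketch, assuming a hypothetical Gaussian $n_{\mathrm{PZ}}(z)$ on a uniform grid (this is not the DES pipeline):

```python
import numpy as np

# Hypothetical photo-z distribution for one tomographic bin on a uniform grid
z = np.linspace(0.0, 2.0, 2001)  # grid spacing 0.001
n_pz = np.exp(-0.5 * ((z - 0.6) / 0.1) ** 2)


def shift_nz(z, n_pz, delta_z):
    """Return n(z) = n_PZ(z - delta_z) by linear interpolation, zero off the grid."""
    return np.interp(z - delta_z, z, n_pz, left=0.0, right=0.0)


def mean_redshift(z, n):
    """Mean redshift of n(z) on a uniform grid (the grid step cancels)."""
    return float(np.sum(z * n) / np.sum(n))


delta_z = 0.015  # example shift, equal to the quoted uncertainty of the first bin
n_shifted = shift_nz(z, n_pz, delta_z)
print(mean_redshift(z, n_pz), mean_redshift(z, n_shifted))
```

    Because only the mean is adjusted, the quoted $\sigma_{\Delta z^i}$ values translate directly into uncertainty on each bin's mean redshift.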

  18. Transmissivity interpolation using Fluid Flow Log data at different depth level in Liwa Aquifer, UAE.

    NASA Astrophysics Data System (ADS)

    Gülşen, Esra; Kurtulus, Bedri; Necati Yaylim, Tolga; Avsar, Ozgur

    2017-04-01

    In groundwater studies, detecting and quantifying fluid flow in boreholes is an important part of assessing aquifer characteristics at different depths. Monitoring wells disturb the natural flow field, and this disturbance creates different flow paths in the aquifer. Vertical fluid flow analysis is one of the main techniques for detecting and quantifying these vertical flows in boreholes and monitoring wells. The Liwa region is located about 146 km southwest of Abu Dhabi city and about 36 km southwest of Madinat Zayed. The SWSR (Strategic Water Storage & Recovery Project) comprises three schemes (A, B and C); each scheme contains an infiltration basin in the center, 105 recovery wells, and 10 clusters, and each cluster comprises 3 monitoring wells of different depths: shallow (50 m), intermediate (75 m) and deep (100 m). The scope of this study is to calculate transmissivity values at different depths and to evaluate the Fluid Flow Log (FFL) data for Scheme A (105 recovery wells) in order to understand the aquifer characteristics at different depths. Transmissivity values at different depth levels are calculated using the Razack and Huntley (1991) equation for vertical flow rates of 30, 60, 90 and 120 m³/h, and Empirical Bayesian Kriging is then used for interpolation in Scheme A with ArcGIS 10.2 software. FFLs are drawn with GeODin software, derivative analysis of the fluid flow data is done in Microsoft Excel, and all statistical analyses are performed in IBM SPSS. The interpolation results show that transmissivity values are higher at the top of the aquifer; in other words, the upper part of the Liwa aquifer is more productive. We are very grateful to ZETAS Dubai Inc. for financial support and for providing the data.

  19. Dark Energy Survey Year 1 Results: Redshift distributions of the weak lensing source galaxies

    NASA Astrophysics Data System (ADS)

    Hoyle, B.; Gruen, D.; Bernstein, G. M.; Rau, M. M.; De Vicente, J.; Hartley, W. G.; Gaztanaga, E.; DeRose, J.; Troxel, M. A.; Davis, C.; Alarcon, A.; MacCrann, N.; Prat, J.; Sánchez, C.; Sheldon, E.; Wechsler, R. H.; Asorey, J.; Becker, M. R.; Bonnett, C.; Carnero Rosell, A.; Carollo, D.; Carrasco Kind, M.; Castander, F. J.; Cawthon, R.; Chang, C.; Childress, M.; Davis, T. M.; Drlica-Wagner, A.; Gatti, M.; Glazebrook, K.; Gschwend, J.; Hinton, S. R.; Hoormann, J. K.; Kim, A. G.; King, A.; Kuehn, K.; Lewis, G.; Lidman, C.; Lin, H.; Macaulay, E.; Maia, M. A. G.; Martini, P.; Mudd, D.; Möller, A.; Nichol, R. C.; Ogando, R. L. C.; Rollins, R. P.; Roodman, A.; Ross, A. J.; Rozo, E.; Rykoff, E. S.; Samuroff, S.; Sevilla-Noarbe, I.; Sharp, R.; Sommer, N. E.; Tucker, B. E.; Uddin, S. A.; Varga, T. N.; Vielzeuf, P.; Yuan, F.; Zhang, B.; Abbott, T. M. C.; Abdalla, F. B.; Allam, S.; Annis, J.; Bechtol, K.; Benoit-Lévy, A.; Bertin, E.; Brooks, D.; Buckley-Geer, E.; Burke, D. L.; Busha, M. T.; Capozzi, D.; Carretero, J.; Crocce, M.; D'Andrea, C. B.; da Costa, L. N.; DePoy, D. L.; Desai, S.; Diehl, H. T.; Doel, P.; Eifler, T. F.; Estrada, J.; Evrard, A. E.; Fernandez, E.; Flaugher, B.; Fosalba, P.; Frieman, J.; García-Bellido, J.; Gerdes, D. W.; Giannantonio, T.; Goldstein, D. A.; Gruendl, R. A.; Gutierrez, G.; Honscheid, K.; James, D. J.; Jarvis, M.; Jeltema, T.; Johnson, M. W. G.; Johnson, M. D.; Kirk, D.; Krause, E.; Kuhlmann, S.; Kuropatkin, N.; Lahav, O.; Li, T. S.; Lima, M.; March, M.; Marshall, J. L.; Melchior, P.; Menanteau, F.; Miquel, R.; Nord, B.; O'Neill, C. R.; Plazas, A. A.; Romer, A. K.; Sako, M.; Sanchez, E.; Santiago, B.; Scarpine, V.; Schindler, R.; Schubnell, M.; Smith, M.; Smith, R. C.; Soares-Santos, M.; Sobreira, F.; Suchyta, E.; Swanson, M. E. C.; Tarle, G.; Thomas, D.; Tucker, D. L.; Vikram, V.; Walker, A. R.; Weller, J.; Wester, W.; Wolf, R. C.; Yanny, B.; Zuntz, J.; DES Collaboration

    2018-04-01

    We describe the derivation and validation of redshift distribution estimates and their uncertainties for the populations of galaxies used as weak lensing sources in the Dark Energy Survey (DES) Year 1 cosmological analyses. The Bayesian Photometric Redshift (BPZ) code is used to assign galaxies to four redshift bins between $z \approx 0.2$ and $z \approx 1.3$, and to produce initial estimates of the lensing-weighted redshift distributions $n^i_{\mathrm{PZ}}(z) \propto dn^i/dz$ for members of bin $i$. Accurate determination of cosmological parameters depends critically on knowledge of $n^i$ but is insensitive to bin assignments or redshift errors for individual galaxies. The cosmological analyses allow for shifts $n^i(z) = n^i_{\mathrm{PZ}}(z - \Delta z^i)$ to correct the mean redshift of $n^i(z)$ for biases in $n^i_{\mathrm{PZ}}$. The $\Delta z^i$ are constrained by comparison of independently estimated 30-band photometric redshifts of galaxies in the COSMOS field to BPZ estimates made from the DES griz fluxes, for a sample matched in fluxes, pre-seeing size, and lensing weight to the DES weak-lensing sources. In companion papers, the $\Delta z^i$ of the three lowest redshift bins are further constrained by the angular clustering of the source galaxies around red galaxies with secure photometric redshifts at $0.15 < z < 0.9$. This paper details the BPZ and COSMOS procedures, and demonstrates that the cosmological inference is insensitive to details of the $n^i(z)$ beyond the choice of $\Delta z^i$. The clustering and COSMOS validation methods produce consistent estimates of $\Delta z^i$ in the bins where both can be applied, with combined uncertainties of $\sigma_{\Delta z^i} = 0.015$, 0.013, 0.011, and 0.022 in the four bins. Repeating the photo-z procedure instead using the Directional Neighborhood Fitting (DNF) algorithm, or using the $n^i(z)$ estimated from the matched sample in COSMOS, yields no discernible difference in cosmological inferences.

  20. Insights into the phylogeny of Northern Hemisphere Armillaria: Neighbor-net and Bayesian analyses of translation elongation factor 1-α gene sequences

    Treesearch

    Ned B. Klopfenstein; Jane E. Stewart; Yuko Ota; John W. Hanna; Bryce A. Richardson; Amy L. Ross-Davis; Ruben D. Elias-Roman; Kari Korhonen; Nenad Keca; Eugenia Iturritxa; Dionicio Alvarado-Rosales; Halvor Solheim; Nicholas J. Brazee; Piotr Lakomy; Michelle R. Cleary; Eri Hasegawa; Taisei Kikuchi; Fortunato Garza-Ocanas; Panaghiotis Tsopelas; Daniel Rigling; Simone Prospero; Tetyana Tsykun; Jean A. Berube; Franck O. P. Stefani; Saeideh Jafarpour; Vladimir Antonin; Michal Tomsovsky; Geral I. McDonald; Stephen Woodward; Mee-Sook Kim

    2017-01-01

    Armillaria possesses several intriguing characteristics that have inspired wide interest in understanding phylogenetic relationships within and among species of this genus. Nuclear ribosomal DNA sequence–based analyses of Armillaria provide only limited information for phylogenetic studies among widely divergent taxa. More recent studies have shown that translation...

  1. Improving phylogenetic analyses by incorporating additional information from genetic sequence databases.

    PubMed

    Liang, Li-Jung; Weiss, Robert E; Redelings, Benjamin; Suchard, Marc A

    2009-10-01

    Statistical analyses of phylogenetic data culminate in uncertain estimates of underlying model parameters. Lack of additional data hinders the ability to reduce this uncertainty, as the original phylogenetic dataset is often complete, containing the entire gene or genome information available for the given set of taxa. Informative priors in a Bayesian analysis can reduce posterior uncertainty; however, publicly available phylogenetic software specifies vague priors for model parameters by default. We build objective and informative priors using hierarchical random effect models that combine additional datasets whose parameters are not of direct interest but are similar to the analysis of interest. We propose principled statistical methods that permit more precise parameter estimates in phylogenetic analyses by creating informative priors for parameters of interest. Using additional sequence datasets from our lab or public databases, we construct a fully Bayesian semiparametric hierarchical model to combine datasets. A dynamic iteratively reweighted Markov chain Monte Carlo algorithm conveniently recycles posterior samples from the individual analyses. We demonstrate the value of our approach by examining the insertion-deletion (indel) process in the enolase gene across the Tree of Life using the phylogenetic software BAli-Phy; we incorporate prior information about indels from 82 curated alignments downloaded from the BAliBASE database.

  2. Do Bayesian adaptive trials offer advantages for comparative effectiveness research? Protocol for the RE-ADAPT study

    PubMed Central

    Luce, Bryan R; Broglio, Kristine R; Ishak, K Jack; Mullins, C Daniel; Vanness, David J; Fleurence, Rachael; Saunders, Elijah; Davis, Barry R

    2013-01-01

    Background Randomized clinical trials, particularly for comparative effectiveness research (CER), are frequently criticized for being overly restrictive or untimely for health-care decision making. Purpose Our prospectively designed REsearch in ADAptive methods for Pragmatic Trials (RE-ADAPT) study is a ‘proof of concept’ to stimulate investment in Bayesian adaptive designs for future CER trials. Methods We will assess whether Bayesian adaptive designs offer potential efficiencies in CER by simulating a re-execution of the Antihypertensive and Lipid Lowering Treatment to Prevent Heart Attack Trial (ALLHAT) study using actual data from ALLHAT. Results We prospectively define seven alternate designs consisting of various combinations of arm dropping, adaptive randomization, and early stopping and describe how these designs will be compared to the original ALLHAT design. We identify the one particular design that would have been executed, which incorporates early stopping and information-based adaptive randomization. Limitations While the simulation realistically emulates patient enrollment, interim analyses, and adaptive changes to design, it cannot incorporate key features like the involvement of a data monitoring committee in making decisions about adaptive changes. Conclusion This article describes our analytic approach for RE-ADAPT. The next stage of the project is to conduct the re-execution analyses using the seven prespecified designs and the original ALLHAT data. PMID:23983160
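
    Bayesian adaptive randomization can be sketched by estimating, at each interim look, the posterior probability that each arm is best and skewing allocation toward likely winners. The sketch below swaps in a common, simple scheme (Beta(1, 1) priors on a binary event rate, allocation proportional to P(best) with a floor) rather than RE-ADAPT's information-based rule; the arm counts, event data, and floor are all hypothetical.

```python
import random


def prob_each_arm_best(events, patients, n_sims=20_000, seed=0):
    """Monte Carlo P(arm has the lowest event rate) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = [0] * len(events)
    for _ in range(n_sims):
        # One posterior draw of the event rate for every arm
        rates = [rng.betavariate(1 + e, 1 + n - e) for e, n in zip(events, patients)]
        wins[rates.index(min(rates))] += 1
    return [w / n_sims for w in wins]


def allocation_probs(p_best, floor=0.05):
    """Randomization weights proportional to P(best), floored before renormalizing."""
    raw = [max(p, floor) for p in p_best]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical interim data for four arms: events / patients (lower is better)
events = [120, 135, 150, 128]
patients = [1000, 1000, 1000, 1000]

p_best = prob_each_arm_best(events, patients)
print(p_best, allocation_probs(p_best))
```

    Arm dropping and early stopping would be layered on top, for example by removing an arm whose P(best) stays below a prespecified cut-off across looks.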

  3. Use of Principal Components Analysis and Kriging to Predict Groundwater-Sourced Rural Drinking Water Quality in Saskatchewan

    PubMed Central

    McLeod, Lianne; Bharadwaj, Lalita; Epp, Tasha; Waldner, Cheryl L.

    2017-01-01

    Groundwater drinking water supply surveillance data were accessed to summarize water quality delivered as public and private water supplies in southern Saskatchewan as part of an exposure assessment for epidemiologic analyses of associations between water quality and type 2 diabetes or cardiovascular disease. Arsenic in drinking water has been linked to a variety of chronic diseases and previous studies have identified multiple wells with arsenic above the drinking water standard of 0.01 mg/L; therefore, arsenic concentrations were of specific interest. Principal components analysis was applied to obtain principal component (PC) scores to summarize mixtures of correlated parameters identified as health standards and those identified as aesthetic objectives in the Saskatchewan Drinking Water Quality Standards and Objective. Ordinary, universal, and empirical Bayesian kriging were used to interpolate arsenic concentrations and PC scores in southern Saskatchewan, and the results were compared. Empirical Bayesian kriging performed best across all analyses, based on having the greatest number of variables for which the root mean square error was lowest. While all of the kriging methods appeared to underestimate high values of arsenic and PC scores, empirical Bayesian kriging was chosen to summarize large scale geographic trends in groundwater-sourced drinking water quality and assess exposure to mixtures of trace metals and ions. PMID:28914824
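
    Comparing kriging methods by root mean square error is typically done with cross-validation. The sketch below shows ordinary kriging with a fixed exponential variogram and leave-one-out RMSE, assuming hypothetical well locations and scores. It does not implement empirical Bayesian kriging, which instead estimates many variograms through repeated simulation; the variogram parameters here are assumptions.

```python
import numpy as np


def exp_variogram(h, sill=1.0, range_=10.0, nugget=0.0):
    """Exponential semivariogram; gamma(0) = 0 by definition."""
    h = np.asarray(h, dtype=float)
    gamma = nugget + sill * (1.0 - np.exp(-h / range_))
    return np.where(h > 0, gamma, 0.0)


def ordinary_kriging(xy, values, target, **vario):
    """Predict at `target` from sampled points `xy` (n x 2) with a known variogram."""
    n = len(values)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    a = np.ones((n + 1, n + 1))  # last row/column enforce sum(weights) == 1
    a[:n, :n] = exp_variogram(d, **vario)
    a[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = exp_variogram(np.linalg.norm(xy - target, axis=1), **vario)
    weights = np.linalg.solve(a, b)[:n]  # last entry is the Lagrange multiplier
    return float(weights @ values)


def loo_rmse(xy, values, **vario):
    """Leave-one-out cross-validation RMSE for ordinary kriging."""
    errors = []
    for i in range(len(values)):
        keep = np.arange(len(values)) != i
        pred = ordinary_kriging(xy[keep], values[keep], xy[i], **vario)
        errors.append(pred - values[i])
    return float(np.sqrt(np.mean(np.square(errors))))


rng = np.random.default_rng(0)
xy = rng.uniform(0, 50, size=(40, 2))                        # hypothetical well locations (km)
values = np.sin(xy[:, 0] / 10) + 0.1 * rng.normal(size=40)   # hypothetical PC scores
print(loo_rmse(xy, values, range_=5.0), loo_rmse(xy, values, range_=20.0))
```

    With no nugget, ordinary kriging reproduces each observed value exactly, so the comparison has to be made on held-out points, as the leave-one-out loop does.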

  4. Analysis of phase II methodologies for single-arm clinical trials with multiple endpoints in rare cancers: An example in Ewing's sarcoma.

    PubMed

    Dutton, P; Love, S B; Billingham, L; Hassan, A B

    2018-05-01

    Trials run in either rare diseases, such as rare cancers, or rare sub-populations of common diseases are challenging in terms of identifying, recruiting and treating sufficient patients in a sensible period. Treatments for rare diseases are often designed for other disease areas and then later proposed as possible treatments for the rare disease after initial phase I testing is complete. To ensure the trial is in the best interests of the patient participants, frequent interim analyses are needed to force the trial to stop promptly if the treatment is futile or toxic. These non-definitive phase II trials should also be stopped for efficacy to accelerate research progress if the treatment proves to be particularly promising. In this paper, we review frequentist and Bayesian methods that have been adapted to incorporate two binary endpoints and frequent interim analyses. The Eurosarc Trial of Linsitinib in advanced Ewing Sarcoma (LINES) is used as a motivating example and provides a suitable platform to compare these approaches. The Bayesian approach provides greater design flexibility, but does not provide additional value over the frequentist approaches in a single trial setting when the prior is non-informative. However, Bayesian designs are able to borrow from any previous experience, using prior information to improve efficiency.
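
    A simple Bayesian interim rule for two binary endpoints can be sketched with independent Beta-binomial models: stop for toxicity if the posterior probability that the toxicity rate exceeds a limit is high, and stop for futility if the probability that the response rate exceeds a target is low. This is not the LINES design; the thresholds, the Beta(1, 1) priors, and the independence of the endpoints are hypothetical simplifications.

```python
from math import comb


def prob_beta_exceeds(x, a, b):
    """P(p > x) for p ~ Beta(a, b) with integer a, b >= 1.

    Uses the identity P(Beta(a, b) > x) = P(Binomial(a + b - 1, x) <= a - 1).
    """
    m = a + b - 1
    return sum(comb(m, k) * x**k * (1 - x) ** (m - k) for k in range(a))


def interim_decision(responses, toxicities, n, p_eff=0.2, p_tox=0.3,
                     futility_cut=0.05, toxicity_cut=0.95):
    """Hypothetical stopping rule for one arm with Beta(1, 1) priors on both endpoints."""
    pr_effective = prob_beta_exceeds(p_eff, 1 + responses, 1 + n - responses)
    pr_toxic = prob_beta_exceeds(p_tox, 1 + toxicities, 1 + n - toxicities)
    if pr_toxic > toxicity_cut:
        return "stop: toxicity", pr_effective, pr_toxic
    if pr_effective < futility_cut:
        return "stop: futility", pr_effective, pr_toxic
    return "continue", pr_effective, pr_toxic


print(interim_decision(responses=1, toxicities=2, n=15))
print(interim_decision(responses=5, toxicities=9, n=15))
```

    Replacing the Beta(1, 1) priors with informative ones is where such a design can borrow from previous experience.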

  5. Use of Principal Components Analysis and Kriging to Predict Groundwater-Sourced Rural Drinking Water Quality in Saskatchewan.

    PubMed

    McLeod, Lianne; Bharadwaj, Lalita; Epp, Tasha; Waldner, Cheryl L

    2017-09-15

    Groundwater drinking water supply surveillance data were accessed to summarize water quality delivered as public and private water supplies in southern Saskatchewan as part of an exposure assessment for epidemiologic analyses of associations between water quality and type 2 diabetes or cardiovascular disease. Arsenic in drinking water has been linked to a variety of chronic diseases and previous studies have identified multiple wells with arsenic above the drinking water standard of 0.01 mg/L; therefore, arsenic concentrations were of specific interest. Principal components analysis was applied to obtain principal component (PC) scores to summarize mixtures of correlated parameters identified as health standards and those identified as aesthetic objectives in the Saskatchewan Drinking Water Quality Standards and Objective. Ordinary, universal, and empirical Bayesian kriging were used to interpolate arsenic concentrations and PC scores in southern Saskatchewan, and the results were compared. Empirical Bayesian kriging performed best across all analyses, based on having the greatest number of variables for which the root mean square error was lowest. While all of the kriging methods appeared to underestimate high values of arsenic and PC scores, empirical Bayesian kriging was chosen to summarize large scale geographic trends in groundwater-sourced drinking water quality and assess exposure to mixtures of trace metals and ions.

  6. Bayesian meta-analysis of Cronbach's coefficient alpha to evaluate informative hypotheses.

    PubMed

    Okada, Kensuke

    2015-12-01

    This paper proposes a new method to evaluate informative hypotheses for meta-analysis of Cronbach's coefficient alpha using a Bayesian approach. The coefficient alpha is one of the most widely used reliability indices. In meta-analyses of reliability, researchers typically form specific informative hypotheses beforehand, such as 'alpha of this test is greater than 0.8' or 'alpha of one form of a test is greater than the others.' The proposed method enables direct evaluation of these informative hypotheses. To this end, a Bayes factor is calculated to evaluate the informative hypothesis against its complement. It allows researchers to summarize the evidence provided by previous studies in favor of their informative hypothesis. The proposed approach can be seen as a natural extension of the Bayesian meta-analysis of coefficient alpha recently proposed in this journal (Brannick and Zhang, 2013). The proposed method is illustrated through two meta-analyses of real data that evaluate different kinds of informative hypotheses on superpopulation: one is that alpha of a particular test is above the criterion value, and the other is that alphas among different test versions have ordered relationships. Informative hypotheses are supported by the data in both cases, suggesting that the proposed approach is promising for application. Copyright © 2015 John Wiley & Sons, Ltd.
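    When an informative hypothesis H and its complement are both constrained versions of one encompassing model, a common way to obtain the Bayes factor of H against its complement is to divide the posterior odds of H by its prior odds. The sketch below assumes this encompassing-prior formulation, a Uniform(0, 1) prior on alpha and an illustrative normal approximation to its posterior; it is not necessarily the paper's exact procedure, and the draws are a handful of made-up values standing in for MCMC output.

```python
from statistics import NormalDist


def bf_vs_complement(post_prob: float, prior_prob: float) -> float:
    """Bayes factor of an informative hypothesis H against its complement.

    Assumes H and not-H are both constrained versions of one encompassing
    model, so BF = posterior odds of H / prior odds of H.
    """
    if not (0.0 < post_prob < 1.0 and 0.0 < prior_prob < 1.0):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    return (post_prob / (1.0 - post_prob)) / (prior_prob / (1.0 - prior_prob))


def prob_from_draws(draws, hypothesis) -> float:
    """Share of (prior or posterior) draws that satisfy the hypothesis."""
    return sum(1 for d in draws if hypothesis(d)) / len(draws)


# H: alpha > 0.8. A Uniform(0, 1) prior gives P(H) = 0.2; the posterior is
# an illustrative normal approximation to the pooled alpha.
posterior = NormalDist(mu=0.85, sigma=0.02)
post_h = 1.0 - posterior.cdf(0.8)
print(round(bf_vs_complement(post_h, 0.2), 1))

# Ordered hypothesis alpha_1 > alpha_2 > alpha_3: with exchangeable
# continuous priors its prior probability is 1/6.
draws = [(0.91, 0.88, 0.80), (0.90, 0.86, 0.87), (0.92, 0.85, 0.83)]
post_order = prob_from_draws(draws, lambda a: a[0] > a[1] > a[2])
print(bf_vs_complement(post_order, 1 / 6))
```

    A Bayes factor above 1 means the data moved belief toward H relative to the prior; dividing by the prior odds prevents a narrow hypothesis from being penalized simply for being specific.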

  7. A Comparison of Japan and U.K. SF-6D Health-State Valuations Using a Non-Parametric Bayesian Method.

    PubMed

    Kharroubi, Samer A

    2015-08-01

    There is interest in the extent to which valuations of health may differ between different countries and cultures, but few studies have compared preference values of health states obtained in different countries. We sought to estimate and compare two directly elicited valuations for SF-6D health states between the Japan and U.K. general adult populations using Bayesian methods. We analysed data from two SF-6D valuation studies where, using similar standard gamble protocols, values for 241 and 249 states were elicited from representative samples of the Japan and U.K. general adult populations, respectively. We estimate a function applicable across both countries that explicitly accounts for the differences between them, and is estimated using data from both countries. The results suggest that differences in SF-6D health-state valuations between the Japan and U.K. general populations are potentially important. The magnitude of these country-specific differences in health-state valuation depended, however, in a complex way on the levels of individual dimensions. The new Bayesian non-parametric method is a powerful approach for analysing data from multiple nationalities or ethnic groups, to understand the differences between them and potentially to estimate the underlying utility functions more efficiently.

  8. Hypothesis testing on the fractal structure of behavioral sequences: the Bayesian assessment of scaling methodology.

    PubMed

    Moscoso del Prado Martín, Fermín

    2013-12-01

    I introduce the Bayesian assessment of scaling (BAS), a simple but powerful Bayesian hypothesis contrast methodology that can be used to test hypotheses on the scaling regime exhibited by a sequence of behavioral data. Rather than comparing parametric models, as typically done in previous approaches, the BAS offers a direct, nonparametric way to test whether a time series exhibits fractal scaling. The BAS provides a simpler and faster test than do previous methods, and the code for making the required computations is provided. The method also enables testing of finely specified hypotheses on the scaling indices, something that was not possible with the previously available methods. I then present 4 simulation studies showing that the BAS methodology outperforms the other methods used in the psychological literature. I conclude with a discussion of methodological issues on fractal analyses in experimental psychology. PsycINFO Database Record (c) 2014 APA, all rights reserved.

  9. Poisson Mixture Regression Models for Heart Disease Prediction.

    PubMed

    Mufudza, Chipo; Erol, Hamza

    2016-01-01

    Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model-based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models are addressed here under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model, on the basis of its lower Bayesian Information Criterion value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart disease prediction over all models, as it both clusters individuals into high- or low-risk categories and predicts the rate of heart disease componentwise given the available clusters. It is concluded that heart disease prediction can be effectively done by identifying the major risks componentwise using a Poisson mixture regression model.
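    Model comparison by the Bayesian Information Criterion can be sketched with a plain Poisson mixture fitted by EM. This is a simplified illustration, not the paper's models: it has no covariates, no concomitant variables and no zero inflation, the toy counts are made up, and the initialization scheme is an assumption. BIC = -2 log L + p log n, with p = 2k - 1 free parameters for k components; lower is better.

```python
import math


def poisson_logpmf(y: int, lam: float) -> float:
    return y * math.log(lam) - lam - math.lgamma(y + 1)


def _log_mix_terms(yi, pis, lams):
    return [math.log(p) + poisson_logpmf(yi, lam) for p, lam in zip(pis, lams)]


def fit_poisson_mixture(y, k, iters=200):
    """EM for a k-component Poisson mixture without covariates."""
    n = len(y)
    ys = sorted(y)
    # Initialise rates at spread-out order statistics, with equal weights
    lams = [max(float(ys[int((j + 0.5) * n / k)]), 0.1) for j in range(k)]
    pis = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior component probabilities (responsibilities)
        resp = []
        for yi in y:
            logs = _log_mix_terms(yi, pis, lams)
            m = max(logs)
            w = [math.exp(v - m) for v in logs]
            s = sum(w)
            resp.append([wi / s for wi in w])
        # M-step: update mixing weights and Poisson rates
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            pis[j] = nj / n
            lams[j] = max(sum(r[j] * yi for r, yi in zip(resp, y)) / nj, 1e-8)
    loglik = 0.0
    for yi in y:
        logs = _log_mix_terms(yi, pis, lams)
        m = max(logs)
        loglik += m + math.log(sum(math.exp(v - m) for v in logs))
    return pis, lams, loglik


def bic(loglik: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion; lower is better."""
    return -2.0 * loglik + n_params * math.log(n_obs)


counts = [0] * 30 + [1] * 20 + [8] * 15 + [9] * 20 + [10] * 15  # toy data
for k in (1, 2):
    pis, lams, ll = fit_poisson_mixture(counts, k)
    print(k, [round(x, 2) for x in lams], round(bic(ll, 2 * k - 1, len(counts)), 1))
```

    Each component's responsibilities also give a soft assignment of individuals to low- and high-rate groups, which is the clustering aspect the abstract refers to.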

  10. Poisson Mixture Regression Models for Heart Disease Prediction

    PubMed Central

    Erol, Hamza

    2016-01-01

    Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model-based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models are addressed here under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model, on the basis of its lower Bayesian Information Criterion value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart disease prediction over all models, as it both clusters individuals into high- or low-risk categories and predicts the rate of heart disease componentwise given the available clusters. It is concluded that heart disease prediction can be effectively done by identifying the major risks componentwise using a Poisson mixture regression model. PMID:27999611

  11. Hot spots, cluster detection and spatial outlier analysis of teen birth rates in the U.S., 2003-2012.

    PubMed

    Khan, Diba; Rossen, Lauren M; Hamilton, Brady E; He, Yulei; Wei, Rong; Dienes, Erin

    2017-06-01

    Teen birth rates have evidenced a significant decline in the United States over the past few decades. Most of the states in the US have mirrored this national decline, though some reports have illustrated substantial variation in the magnitude of these decreases across the U.S. Importantly, geographic variation at the county level has largely not been explored. We used National Vital Statistics Births data and Hierarchical Bayesian space-time interaction models to produce smoothed estimates of teen birth rates at the county level from 2003-2012. Results indicate that teen birth rates show evidence of clustering, where hot and cold spots occur, and identify spatial outliers. Findings from this analysis may help inform prevention efforts by illustrating how geographic patterns of teen birth rates have changed over the past decade and where clusters of high or low teen birth rates are evident. Published by Elsevier Ltd.

  12. Hot spots, cluster detection and spatial outlier analysis of teen birth rates in the U.S., 2003–2012

    PubMed Central

    Khan, Diba; Rossen, Lauren M.; Hamilton, Brady E.; He, Yulei; Wei, Rong; Dienes, Erin

    2017-01-01

    Teen birth rates have evidenced a significant decline in the United States over the past few decades. Most of the states in the US have mirrored this national decline, though some reports have illustrated substantial variation in the magnitude of these decreases across the U.S. Importantly, geographic variation at the county level has largely not been explored. We used National Vital Statistics Births data and Hierarchical Bayesian space-time interaction models to produce smoothed estimates of teen birth rates at the county level from 2003–2012. Results indicate that teen birth rates show evidence of clustering, where hot and cold spots occur, and identify spatial outliers. Findings from this analysis may help inform prevention efforts by illustrating how geographic patterns of teen birth rates have changed over the past decade and where clusters of high or low teen birth rates are evident. PMID:28552189

  13. Joint model-based clustering of nonlinear longitudinal trajectories and associated time-to-event data analysis, linked by latent class membership: with application to AIDS clinical studies.

    PubMed

    Huang, Yangxin; Lu, Xiaosun; Chen, Jiaqing; Liang, Juan; Zangmeister, Miriam

    2017-10-27

    Longitudinal and time-to-event data are often observed together. Finite mixture models are currently used to analyze nonlinear heterogeneous longitudinal data, which, by relaxing the homogeneity restriction of nonlinear mixed-effects (NLME) models, can cluster individuals into one of the pre-specified classes with class membership probabilities. This clustering may have clinical significance, and be associated with clinically important time-to-event data. This article develops a joint modeling approach to a finite mixture of NLME models for longitudinal data and a Cox proportional hazards model for time-to-event data, linked by individual latent class indicators, under a Bayesian framework. The proposed joint models and method are applied to a real AIDS clinical trial data set, followed by simulation studies to assess the performance of the proposed joint model and a naive two-step model, in which the finite mixture model and the Cox model are fitted separately.

  14. Cluster mass profile reconstruction with size and flux magnification on the HST STAGES survey.

    PubMed

    Duncan, Christopher A J; Heymans, Catherine; Heavens, Alan F; Joachimi, Benjamin

    2016-03-21

    We present the first measurement of individual cluster mass estimates using weak lensing size and flux magnification. Using data from the HST STAGES (Space Telescope A901/902 Galaxy Evolution Survey) survey of the A901/902 supercluster we detect the four known groups in the supercluster at high significance using magnification alone. We discuss the application of a fully Bayesian inference analysis, and investigate a broad range of potential systematics in the application of the method. We compare our results to a previous weak lensing shear analysis of the same field, finding the recovered signal-to-noise of our magnification-only analysis to range from 45 to 110 per cent of that in the shear-only analysis. On a case-by-case basis we find consistent magnification and shear constraints on cluster virial radius, and find that for the full sample, magnification constraints are a factor of 0.77 ± 0.18 lower than the shear measurements.

  15. Spatial quantile regression using INLA with applications to childhood overweight in Malawi.

    PubMed

    Mtambo, Owen P L; Masangwi, Salule J; Kazembe, Lawrence N M

    2015-04-01

    Analyses of childhood overweight have mainly used mean regression. However, using quantile regression is more appropriate as it provides flexibility to analyse the determinants of overweight corresponding to quantiles of interest. The main objective of this study was to fit a Bayesian additive quantile regression model with structured spatial effects for childhood overweight in Malawi using the 2010 Malawi DHS data. Inference was fully Bayesian using the R-INLA package. The significant determinants of childhood overweight ranged from socio-demographic factors such as type of residence to child and maternal factors such as child age and maternal BMI. We observed significant positive structured spatial effects on childhood overweight in some districts of Malawi. We recommend that childhood malnutrition policymakers consider timely interventions based on the risk factors identified in this paper, including spatial targeting of interventions. Copyright © 2015 Elsevier Ltd. All rights reserved.
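    Bayesian quantile regression is often formulated with an asymmetric Laplace likelihood, whose log-density is, up to a constant, the negative of the quantile "check" loss; whether this particular study uses that formulation inside R-INLA is not stated here, so treat the sketch as background only. It shows that minimizing the summed check loss over a constant recovers the tau-th sample quantile. The toy values are made up.

```python
import math


def check_loss(u: float, tau: float) -> float:
    """Quantile (check) loss rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def ald_logpdf(y: float, mu: float, sigma: float, tau: float) -> float:
    """Log-density of the asymmetric Laplace distribution with location mu."""
    return math.log(tau * (1.0 - tau) / sigma) - check_loss((y - mu) / sigma, tau)


def constant_quantile_fit(y, tau):
    """Intercept-only quantile regression: minimise the summed check loss.

    The objective is piecewise linear with kinks at the data points, so a
    minimiser can always be found among the observed values.
    """
    return min(sorted(y), key=lambda c: sum(check_loss(yi - c, tau) for yi in y))


toy_scores = [3, 1, 4, 1, 5, 9, 2]  # made-up outcome values
print(constant_quantile_fit(toy_scores, 0.5), constant_quantile_fit(toy_scores, 0.75))
```

    Maximizing the asymmetric Laplace likelihood in mu is therefore equivalent to minimizing check loss, which is why upper quantiles such as tau = 0.75 or 0.9 can target children at the high end of the overweight distribution.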

  16. Logistic Stick-Breaking Process

    PubMed Central

    Ren, Lu; Du, Lan; Carin, Lawrence; Dunson, David B.

    2013-01-01

    A logistic stick-breaking process (LSBP) is proposed for non-parametric clustering of general spatially- or temporally-dependent data, imposing the belief that proximate data are more likely to be clustered together. The sticks in the LSBP are realized via multiple logistic regression functions, with shrinkage priors employed to favor contiguous and spatially localized segments. The LSBP is also extended for the simultaneous processing of multiple data sets, yielding a hierarchical logistic stick-breaking process (H-LSBP). The model parameters (atoms) within the H-LSBP are shared across the multiple learning tasks. Efficient variational Bayesian inference is derived, and comparisons are made to related techniques in the literature. Experimental analysis is performed for audio waveforms and images, and it is demonstrated that for segmentation applications the LSBP yields generally homogeneous segments with sharp boundaries. PMID:25258593
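    The stick-breaking construction described above can be sketched as follows: at each location x, stick k takes a fraction sigmoid(g_k(x)) of whatever stick remains, so nearby locations get similar mixture weights. The paper's g_k are kernel-based regressions with shrinkage priors fitted by variational Bayes; this sketch instead assumes linear g_k with fixed, hypothetical coefficients and performs no inference.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lsbp_weights(x, sticks):
    """Mixture weights at covariate/location x.

    sticks: list of (w, b) pairs, one per breakable stick; the last
    component receives whatever length of the stick remains, i.e.
    pi_k(x) = sigmoid(g_k(x)) * prod_{j<k} (1 - sigmoid(g_j(x))).
    """
    weights, remaining = [], 1.0
    for w, b in sticks:
        v = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights


# One steep stick along a 1-D position: a sharp segment boundary at x = 5
sticks = [((10.0,), -50.0)]
for pos in (0.0, 4.0, 5.0, 6.0, 10.0):
    print(pos, [round(p, 3) for p in lsbp_weights((pos,), sticks)])
```

    Large slopes in g_k produce the sharp, contiguous segment boundaries that the abstract reports; small slopes give gradual transitions between clusters.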

  17. Phylogenetic relationships of Malaysia’s long-tailed macaques, Macaca fascicularis, based on cytochrome b sequences

    PubMed Central

    Abdul-Latiff, Muhammad Abu Bakar; Ruslin, Farhani; Fui, Vun Vui; Abu, Mohd-Hashim; Rovie-Ryan, Jeffrine Japning; Abdul-Patah, Pazil; Lakim, Maklarin; Roos, Christian; Yaakop, Salmah; Md-Zain, Badrul Munir

    2014-01-01

    Phylogenetic relationships among Malaysia’s long-tailed macaques have yet to be established, despite abundant genetic studies of the species worldwide. The aims of this study are to examine the phylogenetic relationships of Macaca fascicularis in Malaysia and to test its classification as a morphological subspecies. A total of 25 genetic samples of M. fascicularis yielding 383 bp of Cytochrome b (Cyt b) sequences were used in phylogenetic analysis along with one sample each of M. nemestrina and M. arctoides used as outgroups. Sequence character analysis reveals that the Cyt b locus is a highly conserved region with only 23% parsimony-informative characters detected among ingroups. Further analysis indicates a clear separation between populations originating from different regions; the Malay Peninsula versus Borneo Insular, the East Coast versus West Coast of the Malay Peninsula, and the island versus mainland Malay Peninsula populations. Phylogenetic trees (NJ, MP and Bayesian) portray a consistent clustering paradigm as Borneo’s population was distinguished from Peninsula’s population (99% and 100% bootstrap value in NJ and MP respectively and 1.00 posterior probability in Bayesian trees). The East coast population was separated from other Peninsula populations (64% in NJ, 66% in MP and 0.53 posterior probability in Bayesian). West coast populations were divided into 2 clades: the North-South (47%/54% in NJ, 26/26% in MP and 1.00/0.80 posterior probability in Bayesian) and Island-Mainland (93% in NJ, 90% in MP and 1.00 posterior probability in Bayesian). The results confirm the previous morphological assignment of 2 subspecies, M. f. fascicularis and M. f. argentimembris, in the Malay Peninsula. These populations should be treated as separate genetic entities in order to conserve the genetic diversity of Malaysia’s M. fascicularis. These findings are crucial in aiding the conservation management and translocation process of M. fascicularis populations in Malaysia. PMID:24899832

  18. Phylogenetic relationships of Malaysia's long-tailed macaques, Macaca fascicularis, based on cytochrome b sequences.

    PubMed

    Abdul-Latiff, Muhammad Abu Bakar; Ruslin, Farhani; Fui, Vun Vui; Abu, Mohd-Hashim; Rovie-Ryan, Jeffrine Japning; Abdul-Patah, Pazil; Lakim, Maklarin; Roos, Christian; Yaakop, Salmah; Md-Zain, Badrul Munir

    2014-01-01

    Phylogenetic relationships among Malaysia's long-tailed macaques have yet to be established, despite abundant genetic studies of the species worldwide. The aims of this study are to examine the phylogenetic relationships of Macaca fascicularis in Malaysia and to test its classification as a morphological subspecies. A total of 25 genetic samples of M. fascicularis yielding 383 bp of Cytochrome b (Cyt b) sequences were used in phylogenetic analysis along with one sample each of M. nemestrina and M. arctoides used as outgroups. Sequence character analysis reveals that the Cyt b locus is a highly conserved region with only 23% parsimony-informative characters detected among ingroups. Further analysis indicates a clear separation between populations originating from different regions; the Malay Peninsula versus Borneo Insular, the East Coast versus West Coast of the Malay Peninsula, and the island versus mainland Malay Peninsula populations. Phylogenetic trees (NJ, MP and Bayesian) portray a consistent clustering paradigm as Borneo's population was distinguished from Peninsula's population (99% and 100% bootstrap value in NJ and MP respectively and 1.00 posterior probability in Bayesian trees). The East coast population was separated from other Peninsula populations (64% in NJ, 66% in MP and 0.53 posterior probability in Bayesian). West coast populations were divided into 2 clades: the North-South (47%/54% in NJ, 26/26% in MP and 1.00/0.80 posterior probability in Bayesian) and Island-Mainland (93% in NJ, 90% in MP and 1.00 posterior probability in Bayesian). The results confirm the previous morphological assignment of 2 subspecies, M. f. fascicularis and M. f. argentimembris, in the Malay Peninsula. These populations should be treated as separate genetic entities in order to conserve the genetic diversity of Malaysia's M. fascicularis. These findings are crucial in aiding the conservation management and translocation process of M. fascicularis populations in Malaysia.

  19. Empirical Bayes estimation of proportions with application to cowbird parasitism rates

    USGS Publications Warehouse

    Link, W.A.; Hahn, D.C.

    1996-01-01

    Bayesian models provide a structure for studying collections of parameters such as are considered in the investigation of communities, ecosystems, and landscapes. This structure allows for improved estimation of individual parameters, by considering them in the context of a group of related parameters. Individual estimates are differentially adjusted toward an overall mean, with the magnitude of their adjustment based on their precision. Consequently, Bayesian estimation allows for a more credible identification of extreme values in a collection of estimates. Bayesian models regard individual parameters as values sampled from a specified probability distribution, called a prior. The requirement that the prior be known is often regarded as an unattractive feature of Bayesian analysis and may be the reason why Bayesian analyses are not frequently applied in ecological studies. Empirical Bayes methods provide an alternative approach that incorporates the structural advantages of Bayesian models while requiring a less stringent specification of prior knowledge. Rather than requiring that the prior distribution be known, empirical Bayes methods require only that it be in a certain family of distributions, indexed by hyperparameters that can be estimated from the available data. This structure is of interest per se, in addition to its value in allowing for improved estimation of individual parameters; for example, hypotheses regarding the existence of distinct subgroups in a collection of parameters can be considered under the empirical Bayes framework by allowing the hyperparameters to vary among subgroups. Though empirical Bayes methods have been applied in a variety of contexts, they have received little attention in the ecological literature. We describe the empirical Bayes approach in application to estimation of proportions, using data obtained in a community-wide study of cowbird parasitism rates for illustration. Since observed proportions based on small sample sizes are heavily adjusted toward the mean, extreme values among empirical Bayes estimates identify those species for which there is the greatest evidence of extreme parasitism rates. Applying a subgroup analysis to our data on cowbird parasitism rates, we conclude that parasitism rates for Neotropical Migrants as a group are no greater than those of Resident/Short-distance Migrant species in this forest community. Our data and analyses demonstrate that the parasitism rates for certain Neotropical Migrant species are remarkably low (Wood Thrush and Rose-breasted Grosbeak) while those for others are remarkably high (Ovenbird and Red-eyed Vireo).
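    The shrinkage described above can be sketched with a Beta prior whose hyperparameters are estimated from the data; each species' estimate is then the posterior mean (a + x) / (a + b + n), a weighted average of the pooled rate and the observed proportion with weights a + b and n. The crude method-of-moments estimator used here is an assumption and not necessarily the one the authors used, and the nest counts are hypothetical.

```python
from statistics import mean, pvariance


def fit_beta_mom(successes, trials):
    """Method-of-moments Beta(a, b) hyperparameters for a set of proportions."""
    props = [x / n for x, n in zip(successes, trials)]
    mu = sum(successes) / sum(trials)
    if not 0.0 < mu < 1.0:
        raise ValueError("pooled proportion must lie strictly between 0 and 1")
    # Between-group variance = spread of observed proportions minus the
    # average binomial sampling variance; clamped to a valid range.
    tau2 = pvariance(props) - mean(mu * (1 - mu) / n for n in trials)
    tau2 = min(max(tau2, 1e-6), 0.999 * mu * (1 - mu))
    a_plus_b = mu * (1 - mu) / tau2 - 1.0
    return mu * a_plus_b, (1 - mu) * a_plus_b


def eb_estimates(successes, trials):
    """Posterior means (a + x) / (a + b + n): shrinkage toward the pooled mean."""
    a, b = fit_beta_mom(successes, trials)
    return [(a + x) / (a + b + n) for x, n in zip(successes, trials)]


parasitized = [0, 5, 10, 2, 30]  # hypothetical nests parasitized per species
nests = [2, 20, 20, 10, 60]      # hypothetical nests checked per species
for x, n, est in zip(parasitized, nests, eb_estimates(parasitized, nests)):
    print(f"{x}/{n} observed {x / n:.2f} -> EB {est:.3f}")
```

    Because the weight on the observed proportion grows with n, the 0/2 species is pulled much further toward the pooled rate than the 30/60 species, matching the behaviour the abstract describes for small samples.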

  20. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kwon, Deukwoo; Little, Mark P.; Miller, Donald L.

    Purpose: To determine more accurate regression formulas for estimating peak skin dose (PSD) from reference air kerma (RAK) or kerma-area product (KAP). Methods: After grouping of the data from 21 procedures into 13 clinically similar groups, assessments were made of optimal clustering using the Bayesian information criterion to obtain the optimal linear regressions of (log-transformed) PSD vs RAK, PSD vs KAP, and PSD vs RAK and KAP. Results: Three clusters of clinical groups were optimal in regression of PSD vs RAK, seven clusters of clinical groups were optimal in regression of PSD vs KAP, and six clusters of clinical groups were optimal in regression of PSD vs RAK and KAP. Prediction of PSD using both RAK and KAP is significantly better than prediction of PSD with either RAK or KAP alone. The regression of PSD vs RAK provided better predictions of PSD than the regression of PSD vs KAP. The partial-pooling (clustered) method yields smaller mean squared errors compared with the complete-pooling method. Conclusion: PSD distributions for interventional radiology procedures are log-normal. Estimates of PSD derived from RAK and KAP jointly are most accurate, followed closely by estimates derived from RAK alone. Estimates of PSD derived from KAP alone are the least accurate. Using a stochastic search approach, it is possible to cluster together certain dissimilar types of procedures to minimize the total error sum of squares.

  1. Population structure and migration pattern of a conifer pathogen, Grosmannia clavigera, as influenced by its symbiont, the mountain pine beetle.

    PubMed

    Tsui, Clement K M; Roe, Amanda D; El-Kassaby, Yousry A; Rice, Adrianne V; Alamouti, Sepideh M; Sperling, Felix A H; Cooke, Janice E K; Bohlmann, Jörg; Hamelin, Richard C

    2012-01-01

    We investigated the population structure of Grosmannia clavigera (Gc), a fungal symbiont of the mountain pine beetle (MPB) that plays a crucial role in the establishment and reproductive success of this pathogen. This insect-fungal complex has destroyed over 16 million ha of lodgepole pine forests in Canada, the largest MPB epidemic in recorded history. During this current epidemic, MPB has expanded its range beyond historically recorded boundaries, both northward and eastward, and has now reached the jack pine of Alberta, potentially threatening the Canadian boreal forest. To better understand the dynamics between the beetle and its fungal symbiont, we sampled 19 populations in western North America and genotyped individuals from these populations with eight microsatellite markers. The fungus displayed high haplotype diversity, with over 250 unique haplotypes observed in 335 single spore isolates. Linkage equilibria in 13 of the 19 populations suggested that the fungus reproduces sexually. Bayesian clustering and distance analyses identified four genetic clusters that corresponded to four major geographical regions, which suggested that the epidemic arose from multiple geographical sources. A genetic cluster north of the Rocky Mountains, where the MPB has recently become established, experienced a population bottleneck, probably as a result of the recent range expansion. The two genetic clusters located north and west of the Rocky Mountains contained many fungal isolates admixed from all populations, possibly due to the massive movement of MPB during the epidemic. The general agreement in north-south differentiation of MPB and G. clavigera populations points to the fungal pathogen's dependence on the movement of its insect vector. In addition, the patterns of diversity and the individual assignment tests of the fungal associate suggest that migration across the Rocky Mountains occurred via a northeastern corridor, in accordance with meteorological patterns and observation of MPB movement data. Our results highlight the potential of this pathogen for both expansion and sexual reproduction, and also identify some possible barriers to gene flow. Understanding the ecological and evolutionary dynamics of this fungus-beetle association is important for the modelling and prediction of MPB epidemics. © 2011 Crown in the right of Canada.

  2. Genetic diversity and divergence at the Arbutus unedo L. (Ericaceae) westernmost distribution limit.

    PubMed

    Ribeiro, Maria Margarida; Piotti, Andrea; Ricardo, Alexandra; Gaspar, Daniel; Costa, Rita; Parducci, Laura; Vendramin, Giovanni Giuseppe

    2017-01-01

    Mediterranean forests are fragile ecosystems vulnerable to recent global warming and reduction of precipitation, and a long-term negative effect is expected on vegetation with increasing drought and in areas burnt by fires. We investigated the spatial distribution of genetic variation of Arbutus unedo in the western Iberian Peninsula using plastid markers, with the aim of designing conservation and provenance regions. This species is currently undergoing an intense domestication process in the region, and, like other species, is increasingly under threat from climate change, habitat fragmentation and wildfires. We sampled 451 trees from 15 natural populations from different ecological conditions spanning the whole species' distribution range in the region. We applied Bayesian analysis and identified four clusters (north, centre, south, and a single-population cluster). Hierarchical AMOVA showed higher differentiation among clusters than among populations within clusters. The relatively low within-cluster differentiation can be explained by a common postglacial history of nearby populations. The genetic structure found, supported by the few available palaeobotanical records, cannot exclude the hypothesis of two independent A. unedo refugia in the western Iberian Peninsula during the Last Glacial Maximum. Based on the results, we recommend a conservation strategy that selects populations for conservation on the basis of their allelic richness and diversity, together with careful seed transfer consistent with the species' current genetic structure.

  3. HICOSMO - X-ray analysis of a complete sample of galaxy clusters

    NASA Astrophysics Data System (ADS)

    Schellenberger, G.; Reiprich, T.

    2017-10-01

    Galaxy clusters are known to be the largest virialized objects in the Universe. Based on the theory of structure formation one can use them as cosmological probes, since they originate from collapsed overdensities in the early Universe and witness its history. The X-ray regime provides the unique possibility to measure in detail the most massive visible component, the intra cluster medium. Using Chandra observations of a local sample of 64 bright clusters (HIFLUGCS) we provide total (hydrostatic) and gas mass estimates of each cluster individually. Making use of the completeness of the sample we quantify two interesting cosmological parameters by a Bayesian cosmological likelihood analysis. We find Ω_{M}=0.3±0.01 and σ_{8}=0.79±0.03 (statistical uncertainties) using our default analysis strategy, which combines a mass function analysis with the gas mass fraction results. The main sources of biases that we discuss and correct here are (1) the influence of galaxy groups (higher incompleteness in parent samples and a differing behavior of the L_{x} - M relation), (2) the hydrostatic mass bias (as determined by recent hydrodynamical simulations), (3) the extrapolation of the total mass (comparing various methods), (4) the theoretical halo mass function and (5) other cosmological (non-negligible neutrino mass), and instrumental (calibration) effects.

  4. ALMA-SZ Detection of a Galaxy Cluster Merger Shock at Half the Age of the Universe

    NASA Astrophysics Data System (ADS)

    Basu, K.; Sommer, M.; Erler, J.; Eckert, D.; Vazza, F.; Magnelli, B.; Bertoldi, F.; Tozzi, P.

    2016-10-01

    We present ALMA measurements of a merger shock using the thermal Sunyaev-Zel’dovich (SZ) effect signal, at the location of a radio relic in the famous El Gordo galaxy cluster at z ≈ 0.9. Multi-wavelength analysis in combination with the archival Chandra data and a high-resolution radio image provides a consistent picture of the thermal and non-thermal signal variation across the shock front and helps to put robust constraints on the shock Mach number as well as the relic magnetic field. We employ a Bayesian analysis technique for modeling the SZ and X-ray data self-consistently, illustrating respective parameter degeneracies. Combined results indicate a shock with Mach number M = 2.4^{+1.3}_{-0.6}, which in turn suggests a high value of the magnetic field (of the order of 4-10 μG) to account for the observed relic width at 2 GHz. At roughly half the current age of the universe, this is the highest-redshift direct detection of a cluster shock to date, and one of the first instances of an ALMA-SZ observation in a galaxy cluster. It shows the tremendous potential for future ALMA-SZ observations to detect merger shocks and other cluster substructures out to the highest redshifts.

  5. Genetic diversity and divergence at the Arbutus unedo L. (Ericaceae) westernmost distribution limit

    PubMed Central

    Ribeiro, Maria Margarida; Piotti, Andrea; Ricardo, Alexandra; Gaspar, Daniel; Costa, Rita; Parducci, Laura; Vendramin, Giovanni Giuseppe

    2017-01-01

    Mediterranean forests are fragile ecosystems vulnerable to recent global warming and reduction of precipitation, and a long-term negative effect is expected on vegetation with increasing drought and in areas burnt by fires. We investigated the spatial distribution of genetic variation of Arbutus unedo in the western Iberian Peninsula using plastid markers, with the aim of designing conservation and provenance regions. This species is currently undergoing an intense domestication process in the region, and, like other species, is increasingly under threat from climate change, habitat fragmentation and wildfires. We sampled 451 trees from 15 natural populations from different ecological conditions spanning the whole species’ distribution range in the region. We applied Bayesian analysis and identified four clusters (north, centre, south, and a single-population cluster). Hierarchical AMOVA showed higher differentiation among clusters than among populations within clusters. The relatively low within-cluster differentiation can be explained by a common postglacial history of nearby populations. The genetic structure found, supported by the few available palaeobotanical records, cannot exclude the hypothesis of two independent A. unedo refugia in the western Iberian Peninsula during the Last Glacial Maximum. Based on the results, we recommend a conservation strategy that selects populations for conservation on the basis of their allelic richness and diversity, together with careful seed transfer consistent with the species’ current genetic structure. PMID:28384294

  6. Bayesian clustering of DNA sequences using Markov chains and a stochastic partition model.

    PubMed

    Jääskinen, Väinö; Parkkinen, Ville; Cheng, Lu; Corander, Jukka

    2014-02-01

    In many biological applications it is necessary to cluster DNA sequences into groups that represent underlying organismal units, such as named species or genera. In metagenomics this grouping typically has to be achieved on the basis of relatively short sequences that contain different types of errors, which makes a statistical modeling approach desirable. Here we introduce a novel method for this purpose by developing a stochastic partition model that clusters Markov chains of a given order. The model is based on a Dirichlet process prior, and we use conjugate priors for the Markov chain parameters, which yields an analytical expression for comparing the marginal likelihoods of any two partitions. To find a good candidate for the posterior mode in the partition space, we use a hybrid computational approach that combines the EM algorithm with a greedy search. For the metagenomics application, this approach is shown to be faster, and to yield highly accurate results, compared with previously suggested clustering methods. Our model is fairly generic and could also be used to cluster other types of sequence data for which Markov chains provide a reasonable way to compress information, as illustrated by experiments on shotgun-sequence-type data from an Escherichia coli strain.
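    The analytical comparison of partitions described above can be illustrated with a small scoring function. This is a minimal sketch, not the authors' implementation. It assumes first-order chains over the alphabet ACGT, a symmetric Dirichlet prior with pseudocount 1 on each transition-matrix row, and the Chinese-restaurant-process form of the Dirichlet process prior with concentration 1; all of these are hypothetical choices. It scores a given partition but omits the EM/greedy search used to locate the posterior mode.

```python
"""Score partitions of DNA sequences under per-cluster first-order Markov chains.

Sketch only: Dirichlet(pseudocount) priors on each transition-matrix row and a
Chinese-restaurant-process (Dirichlet process) prior on partitions.
"""
from math import lgamma, log

ALPHABET = "ACGT"  # assumption: sequences contain only these symbols


def transition_counts(seqs):
    """Count first-order transitions s -> t pooled over a list of sequences."""
    counts = {s: {t: 0 for t in ALPHABET} for s in ALPHABET}
    for seq in seqs:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def log_marginal_likelihood(seqs, pseudocount=1.0):
    """log p(seqs | one shared chain), with each transition row integrated out.

    Conditions on the first symbol of each sequence (its initial state is not modelled).
    """
    counts = transition_counts(seqs)
    k = len(ALPHABET)
    total = 0.0
    for s in ALPHABET:
        row = counts[s]
        n_s = sum(row.values())
        # Dirichlet-multinomial marginal likelihood for the row of state s
        total += lgamma(k * pseudocount) - lgamma(k * pseudocount + n_s)
        total += sum(lgamma(pseudocount + n) - lgamma(pseudocount) for n in row.values())
    return total


def log_crp_prior(cluster_sizes, concentration=1.0):
    """log probability of a partition with these block sizes under a CRP."""
    n = sum(cluster_sizes)
    value = len(cluster_sizes) * log(concentration)
    value += lgamma(concentration) - lgamma(concentration + n)
    value += sum(lgamma(size) for size in cluster_sizes)
    return value


def log_posterior_score(partition, pseudocount=1.0, concentration=1.0):
    """Unnormalised log posterior of a partition (a list of lists of sequences)."""
    score = log_crp_prior([len(block) for block in partition], concentration)
    score += sum(log_marginal_likelihood(block, pseudocount) for block in partition)
    return score


if __name__ == "__main__":
    cycle_1 = ["ACGTACGTACGT"] * 2  # A->C->G->T->A
    cycle_2 = ["ATGCATGCATGC"] * 2  # A->T->G->C->A
    split = [cycle_1, cycle_2]
    merged = [cycle_1 + cycle_2]
    # A positive difference means the two-cluster partition is preferred
    print(log_posterior_score(split) - log_posterior_score(merged))
```

    Because the score is available in closed form, comparing two candidate partitions is just a subtraction of two such scores, which is what makes a greedy search over partitions cheap.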

  7. Compulsive buying disorder clustering based on sex, age, onset and personality traits.

    PubMed

    Granero, Roser; Fernández-Aranda, Fernando; Baño, Marta; Steward, Trevor; Mestre-Bach, Gemma; Del Pino-Gutiérrez, Amparo; Moragas, Laura; Mallorquí-Bagué, Núria; Aymamí, Neus; Goméz-Peña, Mónica; Tárrega, Salomé; Menchón, José M; Jiménez-Murcia, Susana

    2016-07-01

    In spite of the revived interest in compulsive buying disorder (CBD), its classification within contemporary nosologic systems continues to be debated, and few studies have addressed heterogeneity in the clinical phenotype through methodologies based on a person-centered approach. This study aimed to identify empirical clusters of CBD using personality traits, as well as patients' sex, age and age of CBD onset, as indicators. An agglomerative hierarchical clustering method based on a combination of the Schwarz Bayesian Information Criterion and the log-likelihood was used. Three clusters were identified in a sample of n=110 patients attending a specialized CBD unit: a) "male compulsive buyers", who reported the highest prevalence of comorbid gambling disorder and the lowest levels of reward dependence; b) "female low-dysfunctional", which mainly included employed women with the highest level of education, the oldest age of onset, the lowest scores in harm avoidance and the highest levels of persistence, self-directedness and cooperativeness; and c) "female highly-dysfunctional", with the youngest age of onset, the highest levels of comorbid psychopathology and harm avoidance, and the lowest score in self-directedness. Sociodemographic characteristics and personality traits can be used to determine CBD clusters that represent different clinical subtypes. These subtypes should be considered when developing assessment instruments, preventive programs and treatment interventions. Copyright © 2016 Elsevier Inc. All rights reserved.

  8. Conditional clustering of temporal expression profiles

    PubMed Central

    Wang, Ling; Montano, Monty; Rarick, Matt; Sebastiani, Paola

    2008-01-01

    Background: Many microarray experiments produce temporal profiles in different biological conditions, but common clustering techniques cannot analyze the data conditional on those conditions. Results: This article presents a novel technique to cluster data from time-course microarray experiments performed across several experimental conditions. Our algorithm uses polynomial models to describe gene expression patterns over time, a fully Bayesian approach with proper conjugate priors to make the algorithm invariant to linear transformations, and an iterative procedure to identify genes that have a common temporal expression profile across two or more experimental conditions and genes that have a unique temporal profile in a specific condition. Conclusion: We use simulated data to evaluate how effectively the new algorithm finds the correct number of clusters and identifies genes with common and unique profiles. We also use the algorithm to characterize the response of human T cells to stimulation of antigen-receptor signaling, using gene expression temporal profiles measured in six different biological conditions, and we identify common and unique genes. These studies suggest that the proposed methodology is useful for identifying uniquely stimulated genes and distinguishing them from commonly stimulated genes in response to variable stimuli. Software for using this clustering method is available from the project home page. PMID:18334028

  9. Specialist and generalist symbionts show counterintuitive levels of genetic diversity and discordant demographic histories along the Florida Reef Tract

    NASA Astrophysics Data System (ADS)

    Titus, Benjamin M.; Daly, Marymegan

    2017-03-01

    Specialist and generalist life histories are expected to result in contrasting levels of genetic diversity at the population level, and symbioses are expected to lead to patterns that reflect a shared biogeographic history and co-diversification. We test these assumptions using mtDNA sequencing and a comparative phylogeographic approach for six co-occurring crustacean species that are symbiotic with sea anemones on western Atlantic coral reefs, yet vary in their host specificities: four are host specialists and two are host generalists. We first conducted species discovery analyses to delimit cryptic lineages, followed by classic population genetic diversity analyses for each delimited taxon, and then reconstructed the demographic history of each taxon using traditional summary statistics, Bayesian skyline plots, and approximate Bayesian computation to test for signatures of recent and concerted population expansion. The genetic diversity values recovered here contravene the expectations of the specialist-generalist variation hypothesis and classic population genetics theory: all specialist lineages had greater genetic diversity than generalists. Demographic analyses suggest recent population expansions in all taxa, although Bayesian skyline plots and approximate Bayesian computation suggest that the timing and magnitude of these events were idiosyncratic. These results do not meet the a priori expectation of concordance among symbiotic taxa and suggest that intrinsic aspects of species biology may contribute more to phylogeographic history than extrinsic forces that shape whole communities. The recovery of two cryptic specialist lineages adds a further layer of biodiversity to this symbiosis and contributes to an emerging pattern of cryptic speciation in the specialist taxa. Our results underscore how the evolutionary processes acting on marine systems differ from the terrestrial processes that often drive theory. Finally, we continue to highlight the Florida Reef Tract as an important biodiversity hotspot.

  10. Inferring Population Genetic Structure in Widely and Continuously Distributed Carnivores: The Stone Marten (Martes foina) as a Case Study

    PubMed Central

    Vergara, María; Basto, Mafalda P.; Madeira, María José; Gómez-Moliner, Benjamín J.; Santos-Reis, Margarida; Fernandes, Carlos; Ruiz-González, Aritz

    2015-01-01

    The stone marten is a widely distributed mustelid in the Palaearctic region that exhibits variable habitat preferences in different parts of its range. The species is a Holocene immigrant from southwest Asia which, according to fossil remains, followed the expansion of the Neolithic farming cultures into Europe and possibly colonized the Iberian Peninsula during the Early Neolithic (ca. 7,000 years BP). However, the population genetic structure and historical biogeography of this generalist carnivore remain essentially unknown. In this study we combined mitochondrial DNA (mtDNA) sequencing (621 bp) and microsatellite genotyping (23 polymorphic markers) to infer the population genetic structure of the stone marten within the Iberian Peninsula. The mtDNA data revealed low haplotype and nucleotide diversities and a lack of phylogeographic structure, most likely due to a recent colonization of the Iberian Peninsula by a few mtDNA lineages during the Early Neolithic. The microsatellite data set was analysed with a) spatial and non-spatial Bayesian individual-based clustering (IBC) approaches (STRUCTURE, TESS, BAPS and GENELAND), and b) multivariate methods [discriminant analysis of principal components (DAPC) and spatial principal component analysis (sPCA)]. Additionally, because isolation by distance (IBD) is a common spatial genetic pattern in mobile and continuously distributed species and may challenge the performance of the above methods, the microsatellite data set was tested for IBD. Overall, the genetic structure of the stone marten in the Iberian Peninsula was characterized by a NE-SW spatial pattern of IBD, which may explain the observed disagreement between the clustering solutions obtained by the different IBC methods. However, there was significant evidence of contemporary genetic structuring, albeit weak, into at least three different subpopulations. The detected subdivision could be attributed to the influence of the rivers Ebro, Tagus and Guadiana, suggesting that the main watercourses in the Iberian Peninsula may act as semi-permeable barriers to gene flow in stone martens. To our knowledge, this is the first phylogeographic and population genetic study of the species at a broad regional scale. We also make the case for the importance and benefits of using and comparing multiple clustering and multivariate methods in spatial genetic analyses of mobile and continuously distributed species. PMID:26222680
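    The abstract does not say which test was used to detect isolation by distance; a Mantel test correlating pairwise genetic and geographic distances is one common option. The function below is a stdlib-only sketch of that test. The caller supplies both distance matrices (which genetic distance to use is left as an assumption), and the p-value is one-sided, from simultaneous row/column permutations of the first matrix.

```python
import random


def mantel_test(dist_a, dist_b, permutations=999, seed=0):
    """Return (r, p): Pearson correlation between the upper triangles of two
    square distance matrices and a one-sided permutation p-value."""
    n = len(dist_a)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]

    def upper(mat, order):
        # Upper triangle of mat after permuting its rows and columns together
        return [mat[order[i]][order[j]] for i, j in pairs]

    def pearson(x, y):
        mx, my = sum(x) / len(x), sum(y) / len(y)
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        return sxy / (sxx * syy) ** 0.5

    identity = list(range(n))
    y = upper(dist_b, identity)
    r_obs = pearson(upper(dist_a, identity), y)
    rng = random.Random(seed)
    at_least_as_large = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)
        if pearson(upper(dist_a, order), y) >= r_obs:
            at_least_as_large += 1
    # Count the observed arrangement itself, so p is never exactly zero
    return r_obs, (at_least_as_large + 1) / (permutations + 1)


if __name__ == "__main__":
    # Hypothetical individuals sampled along a transect at positions 0..7
    geographic = [[abs(i - j) for j in range(8)] for i in range(8)]
    genetic = [[0.1 * abs(i - j) + 0.05 * ((i + j) % 2) * (i != j) for j in range(8)]
               for i in range(8)]
    print(mantel_test(genetic, geographic))
```

    A significant positive r indicates that genetic distance increases with geographic distance, which is the IBD signal that can mislead individual-based clustering methods.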

  11. Inferring Population Genetic Structure in Widely and Continuously Distributed Carnivores: The Stone Marten (Martes foina) as a Case Study.

    PubMed

    Vergara, María; Basto, Mafalda P; Madeira, María José; Gómez-Moliner, Benjamín J; Santos-Reis, Margarida; Fernandes, Carlos; Ruiz-González, Aritz

    2015-01-01

    The stone marten is a widely distributed mustelid in the Palaearctic region that exhibits variable habitat preferences in different parts of its range. The species is a Holocene immigrant from southwest Asia which, according to fossil remains, followed the expansion of the Neolithic farming cultures into Europe and possibly colonized the Iberian Peninsula during the Early Neolithic (ca. 7,000 years BP). However, the population genetic structure and historical biogeography of this generalist carnivore remain essentially unknown. In this study we combined mitochondrial DNA (mtDNA) sequencing (621 bp) and microsatellite genotyping (23 polymorphic markers) to infer the population genetic structure of the stone marten within the Iberian Peninsula. The mtDNA data revealed low haplotype and nucleotide diversities and a lack of phylogeographic structure, most likely due to a recent colonization of the Iberian Peninsula by a few mtDNA lineages during the Early Neolithic. The microsatellite data set was analysed with a) spatial and non-spatial Bayesian individual-based clustering (IBC) approaches (STRUCTURE, TESS, BAPS and GENELAND), and b) multivariate methods [discriminant analysis of principal components (DAPC) and spatial principal component analysis (sPCA)]. Additionally, because isolation by distance (IBD) is a common spatial genetic pattern in mobile and continuously distributed species and may challenge the performance of the above methods, the microsatellite data set was tested for IBD. Overall, the genetic structure of the stone marten in the Iberian Peninsula was characterized by a NE-SW spatial pattern of IBD, which may explain the observed disagreement between the clustering solutions obtained by the different IBC methods. However, there was significant evidence of contemporary genetic structuring, albeit weak, into at least three different subpopulations. The detected subdivision could be attributed to the influence of the rivers Ebro, Tagus and Guadiana, suggesting that the main watercourses in the Iberian Peninsula may act as semi-permeable barriers to gene flow in stone martens. To our knowledge, this is the first phylogeographic and population genetic study of the species at a broad regional scale. We also make the case for the importance and benefits of using and comparing multiple clustering and multivariate methods in spatial genetic analyses of mobile and continuously distributed species.

  12. Star Cluster Properties in Two LEGUS Galaxies Computed with Stochastic Stellar Population Synthesis Models

    NASA Astrophysics Data System (ADS)

    Krumholz, Mark R.; Adamo, Angela; Fumagalli, Michele; Wofford, Aida; Calzetti, Daniela; Lee, Janice C.; Whitmore, Bradley C.; Bright, Stacey N.; Grasha, Kathryn; Gouliermis, Dimitrios A.; Kim, Hwihyun; Nair, Preethi; Ryon, Jenna E.; Smith, Linda J.; Thilker, David; Ubeda, Leonardo; Zackrisson, Erik

    2015-10-01

    We investigate a novel Bayesian analysis method, based on the Stochastically Lighting Up Galaxies (slug) code, to derive the masses, ages, and extinctions of star clusters from integrated light photometry. Unlike many analysis methods, slug correctly accounts for incomplete initial mass function (IMF) sampling, and returns full posterior probability distributions rather than simply probability maxima. We apply our technique to 621 visually confirmed clusters in two nearby galaxies, NGC 628 and NGC 7793, that are part of the Legacy Extragalactic UV Survey (LEGUS). LEGUS provides Hubble Space Telescope photometry in the NUV, U, B, V, and I bands. We analyze the sensitivity of the derived cluster properties to choices of prior probability distribution, evolutionary tracks, IMF, metallicity, treatment of nebular emission, and extinction curve. We find that slug's results for individual clusters are insensitive to most of these choices, but that the posterior probability distributions we derive are often quite broad, and sometimes multi-peaked and quite sensitive to the choice of priors. In contrast, the properties of the cluster population as a whole are relatively robust against all of these choices. We also compare our results from slug to those derived with a conventional non-stochastic fitting code, Yggdrasil. We show that slug's stochastic models are generally a better fit to the observations than the deterministic ones used by Yggdrasil. However, the overall properties of the cluster populations recovered by both codes are qualitatively similar.

  13. Phylogeny and variability of Colletotrichum truncatum associated with soybean anthracnose in Brazil.

    PubMed

    Rogério, F; Ciampi-Guillardi, M; Barbieri, M C G; Bragança, C A D; Seixas, C D S; Almeida, A M R; Massola, N S

    2017-02-01

    Fungal diseases are among the main factors limiting high soybean yields. Colletotrichum isolates from soybean plants with anthracnose symptoms, collected from different regions and time periods in Brazil, were studied using molecular, morphological and pathogenicity analyses. Bayesian phylogenetic inference of GAPDH, HIS3 and ITS-5.8S rDNA sequences, assessment of colony and conidial morphology, and inoculation tests on seeds and seedlings were performed. All isolates clustered only with Colletotrichum truncatum, in three well-separated clusters. Intraspecific genetic diversity analysis revealed 27 distinct haplotypes among 51 fungal isolates; some were identical to C. truncatum sequences from other regions around the world, while others were related to alternative hosts. Conidia were falcate, hyaline, unicellular and aseptate, formed in acervuli, and of variable dimensions. Although the isolates were pathogenic to seedlings under both inoculation methods, their aggressiveness varied, and this variation was not correlated with genetic variation. C. truncatum was identified as the only causal agent of soybean anthracnose among the isolates sampled in Brazil up to 2007, with relevant genetic, morphological and pathogenic variability as well as a broad geographical origin. The wide distribution of the predominant C. truncatum haplotype indicated the existence of a highly efficient mechanism of pathogen dispersal over long distances, reinforcing the role of seeds as the primary source of disease inoculum. The characterization and distribution of Colletotrichum species in soybean-producing regions in Brazil is fundamental for understanding the disease epidemiology and for ensuring effective control strategies against anthracnose. © 2016 The Society for Applied Microbiology.

  14. Geography of Adolescent Obesity in the U.S., 2007-2011.

    PubMed

    Kramer, Michael R; Raskind, Ilana G; Van Dyke, Miriam E; Matthews, Stephen A; Cook-Smith, Jessica N

    2016-12-01

    Obesity remains a significant threat to the current and long-term health of U.S. adolescents. The authors developed county-level estimates of adolescent obesity for the contiguous U.S., and then explored the association between 23 conceptually derived area-based correlates of adolescent obesity and ecologic obesity prevalence. Multilevel small area regression methods applied to the 2007 and 2011-2012 National Survey of Children's Health produced county-level obesity prevalence estimates for children aged 10-17 years. Exploratory multivariable Bayesian regression estimated the cross-sectional association between nutrition, activity, and macrosocial characteristics of counties and states, and county-level obesity prevalence. All analyses were conducted in 2015. Adolescent obesity varies geographically, with clusters of high prevalence in the Deep South and Southern Appalachian regions. Geographic disparities and clustering in observed data are largely explained by hypothesized area-based variables. In adjusted models, activity environment variables, but not nutrition environment variables, were associated with county-level obesity prevalence. County violent crime was associated with higher obesity, whereas recreational facility density was associated with lower obesity. Measures of the macrosocial and relational domain, including community SES, community health, and social marginalization, were the strongest correlates of county-level obesity. County-level estimates of adolescent obesity demonstrate notable geographic disparities, which are largely explained by conceptually derived area-based contextual measures. This ecologic exploratory study highlights the importance of taking a multidimensional approach to understanding the social and community context in which adolescents make obesity-relevant behavioral choices. Copyright © 2016 American Journal of Preventive Medicine. Published by Elsevier Inc. All rights reserved.

  15. Classification of California streams using combined deductive and inductive approaches: Setting the foundation for analysis of hydrologic alteration

    USGS Publications Warehouse

    Pyne, Matthew I.; Carlisle, Daren M.; Konrad, Christopher P.; Stein, Eric D.

    2017-01-01

    Regional classification of streams is an early step in the Ecological Limits of Hydrologic Alteration framework. Many stream classifications are based on an inductive approach using hydrologic data from minimally disturbed basins, but this approach may underrepresent streams from heavily disturbed basins or sparsely gaged arid regions. An alternative is a deductive approach, using watershed climate, land use, and geomorphology to classify streams, but this approach may miss important hydrological characteristics of streams. We classified all stream reaches in California using both approaches. First, we used Bayesian and hierarchical clustering to classify reaches according to watershed characteristics. Streams were clustered into seven classes according to elevation, sedimentary rock, and winter precipitation. Permutation-based analysis of variance and random forest analyses were used to determine which hydrologic variables best separate streams into their respective classes. Stream typology (i.e., the class to which a stream reach is assigned) is shaped mainly by patterns of high and mean flow behavior within the stream's landscape context. Additionally, random forest was used to determine which hydrologic variables best separate minimally disturbed reference streams from non-reference streams in each of the seven classes. In contrast to stream typology, deviation from reference conditions is more difficult to detect and is largely defined by changes in low-flow variables, average daily flow, and duration of flow. Our combined deductive/inductive approach allows us to estimate flow under minimally disturbed conditions based on the deductive analysis and to compare it with measured flow based on the inductive analysis in order to estimate hydrologic change.

  16. DNA methylation alterations in response to pesticide exposure in vitro

    PubMed Central

    Zhang, Xiao; Wallace, Andrew D.; Du, Pan; Kibbe, Warren A.; Jafari, Nadereh; Xie, Hehuang; Lin, Simon; Baccarelli, Andrea; Soares, Marcelo Bento; Hou, Lifang

    2013-01-01

    Although pesticides are subject to extensive carcinogenicity testing before regulatory approval, pesticide exposure has repeatedly been associated with various cancers. This suggests that pesticides may cause cancer via non-mutagenic mechanisms. The present study provides evidence to support the hypothesis that pesticide-induced cancer may be mediated in part by epigenetic mechanisms. We examined whether exposure to 7 commonly used organophosphate pesticides (OPs) (i.e., fonofos, parathion, terbufos, chlorpyrifos, diazinon, malathion, and phorate) induces DNA methylation alterations in vitro. We conducted genome-wide DNA methylation analyses on DNA samples obtained from the human hematopoietic K562 cell line exposed to ethanol (control) and to several OPs, using the Illumina Infinium HumanMethylation27 BeadChip. Bayesian-adjusted t-tests were used to identify differentially methylated gene promoter CpG sites. In this report, we present our results on three pesticides (fonofos, parathion, and terbufos) that clustered together based on principal component analysis and hierarchical clustering. These three pesticides induced similar methylation changes in the promoter regions of 712 genes, while also exhibiting their own OP-specific methylation alterations. Functional analysis of methylation changes specific to each OP, or common to all three OPs, revealed that differential methylation was associated with numerous genes involved in carcinogenesis-related processes. Our results provide experimental evidence that pesticides may modify gene promoter DNA methylation levels, suggesting that epigenetic mechanisms may contribute to pesticide-induced carcinogenesis. Further studies in other cell types and human samples are required, as is a determination of the impact of these methylation changes on gene expression. PMID:22847954
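    The abstract does not name the implementation behind its "Bayesian-adjusted t-tests". One widely used form is the empirical-Bayes moderated t-statistic popularized by the limma package, in which each site's residual variance is shrunk toward a common prior value. The sketch below assumes that form. It takes the prior degrees of freedom (prior_df, d0) and prior variance (prior_var, s0²) as given, whereas limma estimates them from all CpG sites; the methylation values in the usage line are hypothetical.

```python
from statistics import mean, variance


def moderated_t(group1, group2, prior_df, prior_var):
    """Two-sample t-statistic using an empirical-Bayes shrunken variance.

    Returns (t, degrees_of_freedom). prior_df and prior_var are treated as
    known hyperparameters in this sketch.
    """
    n1, n2 = len(group1), len(group2)
    resid_df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / resid_df
    # Posterior variance: a df-weighted average of the prior and the site's own variance
    shrunk_var = (prior_df * prior_var + resid_df * pooled_var) / (prior_df + resid_df)
    se = (shrunk_var * (1.0 / n1 + 1.0 / n2)) ** 0.5
    t_stat = (mean(group1) - mean(group2)) / se
    # The moderated statistic is referred to a t distribution with augmented df
    return t_stat, prior_df + resid_df


if __name__ == "__main__":
    exposed = [0.62, 0.58, 0.65]   # hypothetical beta values at one CpG site
    control = [0.41, 0.44, 0.39]
    print(moderated_t(exposed, control, prior_df=4.0, prior_var=0.002))
```

    With prior_df = 0 the statistic reduces to the ordinary pooled two-sample t-test; larger prior_df values pull noisy per-site variances toward prior_var, which stabilizes inference when each condition has only a few replicates.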

  17. Herbarium specimens reveal a historical shift in phylogeographic structure of common ragweed during native range disturbance.

    PubMed

    Martin, Michael D; Zimmer, Elizabeth A; Olsen, Morten T; Foote, Andrew D; Gilbert, M Thomas P; Brush, Grace S

    2014-04-01

    Invasive plants provide ample opportunity to study evolutionary shifts that occur after introduction to novel environments. However, although genetic characters pre-dating introduction can be important determinants of later success, large-scale investigations of historical genetic structure have not been feasible. Common ragweed (Ambrosia artemisiifolia L.) is an invasive weed native to North America that is known for its allergenic pollen. Palynological records from sediment cores indicate that this species was uncommon before European colonization of North America, and ragweed populations expanded rapidly as settlers deforested the landscape on a massive scale, later becoming an aggressive invasive with populations established globally. Towards a direct comparison of genetic structure now and during intense anthropogenic disturbance of the late 19th century, we sampled 45 natural populations of common ragweed across its native range as well as historical herbarium specimens collected up to 140 years ago. Bayesian clustering analyses of 453 modern and 473 historical samples genotyped at three chloroplast spacer regions and six nuclear microsatellite loci reveal that historical ragweed's spatial genetic structure mirrors both the palaeo-record of Ambrosia pollen deposition and the historical pattern of agricultural density across the landscape. Furthermore, for unknown reasons, this spatial genetic pattern has changed substantially in the intervening years. Following on previous work relating morphology and genetic expression between plants collected from eastern North America and Western Europe, we speculate that the cluster associated with humans' rapid transformation of the landscape is a likely source of these aggressive invasive populations. © 2014 John Wiley & Sons Ltd.

  18. An Empirical Typology of Perfectionism in Academically Talented Children.

    ERIC Educational Resources Information Center

    Parker, Wayne D.

    1997-01-01

    A national sample of 820 academically talented children completed the Multidimensional Perfectionism Scale. Cluster analyses of the scores yielded a three-cluster solution. Further analyses indicated that these clusters were nonperfectionistic (32.8%), healthy perfectionistic (41.7%), and dysfunctional perfectionistic (25.5%). The construct of perfectionism…

  19. Built environment and Property Crime in Seattle, 1998-2000: A Bayesian Analysis.

    PubMed

    Matthews, Stephen A; Yang, Tse-Chuan; Hayslett-McCall, Karen L; Ruback, R Barry

    2010-06-01

    The past decade has seen a rapid growth in the use of a spatial perspective in studies of crime. In part this growth has been driven by the availability of georeferenced data and of the tools to analyze and visualize them: geographic information systems (GIS), spatial analysis, and spatial statistics. In this paper we use exploratory spatial data analysis (ESDA) tools and Bayesian models to better understand the spatial patterning and predictors of property crime in Seattle, Washington for 1998-2000, with a focus on built environment variables. We present results for aggregate property crime data as well as models for specific property crime types: residential burglary, nonresidential burglary, theft, auto theft, and arson. ESDA confirms the presence of spatial clustering of property crime, and we seek to explain these patterns using spatial Poisson models implemented in WinBUGS. Our results indicate that built environment variables were significant predictors of property crime, especially the presence of a highway in the models of auto theft and burglary.
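    The abstract does not state which ESDA statistic was used to confirm spatial clustering. Global Moran's I is a standard choice. The sketch below computes it from a caller-supplied spatial weight matrix; the adjacency definition and the values are assumptions for illustration, and the code has nothing to do with the WinBUGS Poisson models themselves.

```python
def morans_i(values, weights):
    """Global Moran's I for n areal units.

    values  -- n observations (e.g. property-crime rates per hypothetical tract)
    weights -- n x n spatial weight matrix with a zero diagonal
    """
    n = len(values)
    mean_v = sum(values) / n
    dev = [v - mean_v for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations for every pair of units
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    denom = sum(d * d for d in dev)
    return (n / w_sum) * cross / denom


if __name__ == "__main__":
    # Four hypothetical tracts in a row; rook adjacency (neighbours share an edge)
    w = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
    print(morans_i([10, 10, 1, 1], w))  # high values next to high values
    print(morans_i([10, 1, 10, 1], w))  # checkerboard-like alternation
```

    Values well above the null expectation of -1/(n - 1) indicate that similar values cluster in space, and strongly negative values indicate alternation; significance is usually assessed by permuting the values across units.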

  20. COSMOABC: Likelihood-free inference via Population Monte Carlo Approximate Bayesian Computation

    NASA Astrophysics Data System (ADS)

    Ishida, E. E. O.; Vitenti, S. D. P.; Penna-Lima, M.; Cisewski, J.; de Souza, R. S.; Trindade, A. M. M.; Cameron, E.; Busti, V. C.; COIN Collaboration

    2015-11-01

    Approximate Bayesian Computation (ABC) enables parameter inference for complex physical systems in cases where the true likelihood function is unknown, unavailable, or too computationally expensive. It relies on the forward simulation of mock data and comparison between observed and synthetic catalogues. Here we present COSMOABC, a Python ABC sampler featuring a Population Monte Carlo variation of the original ABC algorithm, which uses an adaptive importance sampling scheme. The code is very flexible and can easily be coupled to an external simulator, while allowing the user to incorporate arbitrary distance and prior functions. As an example of a practical application, we coupled COSMOABC with the NUMCOSMO library and demonstrate how it can be used to estimate posterior probability distributions over cosmological parameters from measurements of galaxy cluster number counts without computing the likelihood function. COSMOABC is published under the GPLv3 license on PyPI and GitHub, and documentation is available at http://goo.gl/SmB8EX.
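    The Population Monte Carlo variant of ABC described above can be sketched in a few dozen lines. This is not COSMOABC's API but a stdlib-only illustration for a single scalar parameter. It assumes a Gaussian perturbation kernel whose variance is twice the weighted particle variance, and a tolerance that shrinks each round to the median of the previous round's distances. The toy model (inferring a Gaussian mean from a simulated sample mean under a uniform prior) is hypothetical.

```python
"""Minimal Population Monte Carlo ABC (PMC-ABC) sketch for one scalar parameter."""
import math
import random


def pmc_abc(observed, simulate, distance, prior_sample, prior_pdf,
            n_particles=200, n_rounds=5, quantile=0.5, seed=1):
    """Return (particles, normalised weights) approximating the ABC posterior."""
    rng = random.Random(seed)
    # Round 0: draws from the prior, i.e. rejection ABC with infinite tolerance
    particles = [prior_sample(rng) for _ in range(n_particles)]
    weights = [1.0 / n_particles] * n_particles
    dists = [distance(simulate(p, rng), observed) for p in particles]
    for _ in range(n_rounds):
        # Adaptive tolerance: a quantile of the previous round's distances
        eps = sorted(dists)[int(quantile * n_particles)]
        mu = sum(w * p for w, p in zip(weights, particles))
        var = sum(w * (p - mu) ** 2 for w, p in zip(weights, particles))
        sigma = math.sqrt(2.0 * var)  # kernel variance = twice the weighted variance
        new_particles, new_dists, new_weights = [], [], []
        while len(new_particles) < n_particles:
            # Resample a previous particle by weight, then perturb it
            theta = rng.choices(particles, weights)[0] + rng.gauss(0.0, sigma)
            if prior_pdf(theta) == 0.0:
                continue
            d = distance(simulate(theta, rng), observed)
            if d <= eps:
                # Importance weight: prior / mixture of Gaussian kernels; the
                # kernel's normalising constant cancels after normalisation
                mixture = sum(w * math.exp(-0.5 * ((theta - p) / sigma) ** 2)
                              for w, p in zip(weights, particles))
                new_particles.append(theta)
                new_dists.append(d)
                new_weights.append(prior_pdf(theta) / mixture)
        total = sum(new_weights)
        particles, dists = new_particles, new_dists
        weights = [w / total for w in new_weights]
    return particles, weights


# Hypothetical toy problem: infer the mean of a unit-variance Gaussian from the
# mean of 50 simulated observations, with a uniform prior on [-10, 10].
OBSERVED_MEAN = 3.0


def simulate_mean(theta, rng, n=50):
    return sum(rng.gauss(theta, 1.0) for _ in range(n)) / n


def abs_distance(a, b):
    return abs(a - b)


def uniform_sample(rng):
    return rng.uniform(-10.0, 10.0)


def uniform_pdf(theta):
    return 0.05 if -10.0 <= theta <= 10.0 else 0.0


if __name__ == "__main__":
    parts, wts = pmc_abc(OBSERVED_MEAN, simulate_mean, abs_distance,
                         uniform_sample, uniform_pdf)
    print("posterior mean:", sum(p * w for p, w in zip(parts, wts)))
```

    The importance weights correct for proposing from the perturbed previous population rather than from the prior, which is what lets the tolerance shrink without restarting from scratch each round.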

  1. Missing-value estimation using linear and non-linear regression with Bayesian gene selection.

    PubMed

    Zhou, Xiaobo; Wang, Xiaodong; Dougherty, Edward R

    2003-11-22

    Data from microarray experiments are usually in the form of large matrices of expression levels of genes under different experimental conditions. For various reasons, values are frequently missing. Estimating these missing values is important because they affect downstream analyses such as clustering, classification and network design. Several methods of missing-value estimation are in use. The problem has two parts: (1) selection of genes for estimation and (2) design of an estimation rule. We propose Bayesian variable selection to obtain the genes used for estimation, and employ both linear and nonlinear regression for the estimation rule itself. Fast implementation issues for these methods are discussed, including the use of QR decomposition for parameter estimation. The proposed methods are tested on data sets arising from hereditary breast cancer and small round blue-cell tumors. The results compare very favorably with currently used methods in terms of the normalized root-mean-square error. The appendix is available from http://gspsnap.tamu.edu/gspweb/zxb/missing_zxb/ (user: gspweb; passwd: gsplab).

  2. Built environment and Property Crime in Seattle, 1998–2000: A Bayesian Analysis

    PubMed Central

    Matthews, Stephen A.; Yang, Tse-chuan; Hayslett-McCall, Karen L.; Ruback, R. Barry

    2014-01-01

    The past decade has seen a rapid growth in the use of a spatial perspective in studies of crime. In part this growth has been driven by the availability of georeferenced data and of the tools to analyze and visualize them: geographic information systems (GIS), spatial analysis, and spatial statistics. In this paper we use exploratory spatial data analysis (ESDA) tools and Bayesian models to better understand the spatial patterning and predictors of property crime in Seattle, Washington for 1998–2000, with a focus on built environment variables. We present results for aggregate property crime data as well as models for specific property crime types: residential burglary, nonresidential burglary, theft, auto theft, and arson. ESDA confirms the presence of spatial clustering of property crime, and we seek to explain these patterns using spatial Poisson models implemented in WinBUGS. Our results indicate that built environment variables were significant predictors of property crime, especially the presence of a highway in the models of auto theft and burglary. PMID:24737924

  3. Probabilistic Geobiological Classification Using Elemental Abundance Distributions and Lossless Image Compression in Recent and Modern Organisms

    NASA Technical Reports Server (NTRS)

    Storrie-Lombardi, Michael C.; Hoover, Richard B.

    2005-01-01

    Last year we presented techniques for the detection of fossils during robotic missions to Mars using both structural and chemical signatures [Storrie-Lombardi and Hoover, 2004]. Analyses included lossless compression of photographic images to estimate the relative complexity of a putative fossil compared to the rock matrix [Corsetti and Storrie-Lombardi, 2003] and elemental abundance distributions to provide mineralogical classification of the rock matrix [Storrie-Lombardi and Fisk, 2004]. We presented a classification strategy employing two exploratory classification algorithms (Principal Component Analysis and Hierarchical Cluster Analysis) and a non-linear stochastic neural network to produce a Bayesian estimate of classification accuracy. We now present an extension of our previous experiments exploring putative fossil forms morphologically resembling cyanobacteria discovered in the Orgueil meteorite. Elemental abundances (C6, N7, O8, Na11, Mg12, Al13, Si14, P15, S16, Cl17, K19, Ca20, Fe26) obtained for both extant cyanobacteria and fossil trilobites produce signatures readily distinguishing them from meteorite targets. When compared to elemental abundance signatures for extant cyanobacteria, Orgueil structures exhibit decreased abundances of C6, N7, Na11, Al13, P15, Cl17, K19 and Ca20 and increased abundances of Mg12, S16 and Fe26. Diatoms and silicified portions of cyanobacterial sheaths, which exhibit high levels of silicon and correspondingly low levels of carbon, cluster more closely with terrestrial fossils than with extant cyanobacteria. Compression indices verify that variations in random and redundant textural patterns between perceived forms and the background matrix contribute significantly to morphological visual identification. The results provide a quantitative probabilistic methodology for discriminating putative fossils from the surrounding rock matrix and from extant organisms using both structural and chemical information.
The techniques described appear applicable to the geobiological analysis of meteoritic samples or in situ exploration of the Mars regolith. Keywords: cyanobacteria, microfossils, Mars, elemental abundances, complexity analysis, multifactor analysis, principal component analysis, hierarchical cluster analysis, artificial neural networks, paleo-biosignatures
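
The complexity estimate above rests on lossless compression. The sketch below is a minimal illustration only, assuming the compression index is the ratio of compressed to uncompressed size of an 8-bit grayscale image patch (the authors' exact index and compressor are not stated in the abstract). It uses Python's standard zlib module, and the two patches are synthetic stand-ins for a uniform rock matrix and a highly textured region; the function name is hypothetical.

```python
import random
import zlib


def compression_index(pixels: bytes, level: int = 9) -> float:
    """Compressed size divided by raw size (hypothetical index, not the authors' definition).

    Values near 0 indicate highly redundant texture; values near or above 1
    indicate texture that lossless compression cannot shorten.
    """
    if not pixels:
        raise ValueError("empty input")
    return len(zlib.compress(pixels, level)) / len(pixels)


# Synthetic 64x64 8-bit patches: a uniform "matrix" and a random "textured" region.
uniform_patch = bytes([128]) * (64 * 64)
rng = random.Random(0)
noisy_patch = bytes(rng.randrange(256) for _ in range(64 * 64))

print(f"uniform: {compression_index(uniform_patch):.3f}")
print(f"noisy:   {compression_index(noisy_patch):.3f}")
```

Comparing the index of a putative form with that of the surrounding matrix gives the kind of structural contrast the abstract describes; real image patches would fall between these two extremes.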

  4. The seven sisters DANCe. III. Projected spatial distribution

    NASA Astrophysics Data System (ADS)

    Olivares, J.; Moraux, E.; Sarro, L. M.; Bouy, H.; Berihuete, A.; Barrado, D.; Huelamo, N.; Bertin, E.; Bouvier, J.

    2018-04-01

    Context. Membership analyses of the DANCe and Tycho + DANCe data sets provide the largest and least contaminated sample of Pleiades candidate members to date. Aims: We aim to reassess the different proposals for the number surface density of the Pleiades in the light of the new and most complete list of candidate members, and to infer the parameters of the most adequate model. Methods: We compute the Bayesian evidence and Bayes factors for variations of the classical radial models. These include elliptical symmetry and luminosity segregation. As a by-product of the model comparison, we obtain posterior distributions for each set of model parameters. Results: We find that the model comparison results depend on the spatial extent of the region used for the analysis. For a circle of 11.5 parsecs around the cluster centre (the most homogeneous and complete region), we find no compelling reason to abandon King's model, although the Generalised King model introduced here has slightly better fitting properties. Furthermore, we find strong evidence against radially symmetric models when compared with the elliptical extensions. Finally, we find that including mass segregation in the form of luminosity segregation in the J band is strongly supported in all our models. Conclusions: We have put the question of the projected spatial distribution of the Pleiades cluster on a solid probabilistic framework, and inferred its properties using the most exhaustive and least contaminated list of Pleiades candidate members available to date. However, our results suggest that this sample may still lack about 20% of the expected number of cluster members. Therefore, this study should be revised when the completeness and homogeneity of the data can be extended beyond the 11.5 parsec limit. Such a study will allow a more precise determination of the Pleiades spatial distribution, its tidal radius, ellipticity, number of objects and total mass.
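
The classical radial model referred to above is King's (1962) profile, and the models are compared through Bayes factors, i.e. ratios of Bayesian evidences. A minimal sketch of both pieces follows, assuming natural-log evidences are already available from some sampler. The elliptical, luminosity-segregated and Generalised King variants used in the study are not reproduced, and the parameter values in the usage lines are hypothetical.

```python
import math


def king_surface_density(r, k, r_core, r_tidal):
    """Classical King (1962) projected number surface density at projected radius r.

    Sigma(r) = k * [(1 + (r/rc)^2)^(-1/2) - (1 + (rt/rc)^2)^(-1/2)]^2 for r < rt, else 0.
    """
    if r >= r_tidal:
        return 0.0
    inner = 1.0 / math.sqrt(1.0 + (r / r_core) ** 2)
    outer = 1.0 / math.sqrt(1.0 + (r_tidal / r_core) ** 2)
    return k * (inner - outer) ** 2


def log10_bayes_factor(log_evidence_1, log_evidence_2):
    """log10 of the Bayes factor B_12, given natural-log evidences ln Z_1 and ln Z_2."""
    return (log_evidence_1 - log_evidence_2) / math.log(10)


# Hypothetical parameters (core radius 2 pc, tidal radius 20 pc), for illustration only.
for r in (0.0, 2.0, 5.0, 10.0, 20.0):
    print(r, round(king_surface_density(r, k=1.0, r_core=2.0, r_tidal=20.0), 4))
```

A positive log10 B_12 favours model 1; the elliptical and segregated variants would replace the radial argument r with an elliptical radius and add parameters, which changes the evidences but not this final comparison step.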

  5. Clustering of dietary intake and sedentary behavior in 2-year-old children.

    PubMed

    Gubbels, Jessica S; Kremers, Stef P J; Stafleu, Annette; Dagnelie, Pieter C; de Vries, Sanne I; de Vries, Nanne K; Thijs, Carel

    2009-08-01

    To examine clustering of energy balance-related behaviors (EBRBs) in young children. This is crucial because lifestyle habits are formed at an early age and track in later life. This study is the first to examine EBRB clustering in children as young as 2 years. Cross-sectional data originated from the Child, Parent and Health: Lifestyle and Genetic Constitution (KOALA) Birth Cohort Study. Parents of 2578 2-year-old children completed a questionnaire. Correlation analyses, principal component analyses, and linear regression analyses were performed to examine clustering of EBRBs. We found modest but consistent correlations in EBRBs. Two clusters emerged: a "sedentary-snacking cluster" and a "fiber cluster." Television viewing clustered with computer use and unhealthy dietary behaviors. Children who frequently consumed vegetables also consumed fruit and brown bread more often and white bread less often. Lower maternal education and maternal obesity were associated with high scores on the sedentary-snacking cluster, whereas higher educational level was associated with high fiber cluster scores. Obesity-prone behavioral clusters are already visible in 2-year-old children and are related to maternal characteristics. The findings suggest that obesity prevention should apply an integrated approach to physical activity and dietary intake in early childhood.
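
The behaviour clusters above were identified with principal component analysis. The sketch below shows, under stated assumptions, how components are extracted from the correlation matrix of behaviour scores. The data are synthetic (two latent tendencies each driving three hypothetical behaviour scores), no rotation is applied, and the study's actual variables, scoring and rotation choices are not reproduced.

```python
import numpy as np


def correlation_pca(X):
    """Eigen-decomposition of the correlation matrix of X (rows = children, columns = behaviours).

    Returns eigenvalues in descending order and loadings, i.e. correlations
    between each variable and each component.
    """
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)  # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return eigvals, loadings


# Synthetic data: two latent tendencies, each driving three hypothetical behaviour scores.
rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 2))
W = np.array([[1, 1, 1, 0, 0, 0],
              [0, 0, 0, 1, 1, 1]], dtype=float)
X = latent @ W + 0.5 * rng.normal(size=(500, 6))
eigvals, loadings = correlation_pca(X)
print(np.round(eigvals, 2))
```

Two eigenvalues well above 1 signal two behaviour clusters; the loadings show which behaviours belong to each, and component scores (not computed here) would then be regressed on maternal characteristics.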

  6. Assessing an ensemble docking-based virtual screening strategy for kinase targets by considering protein flexibility.

    PubMed

    Tian, Sheng; Sun, Huiyong; Pan, Peichen; Li, Dan; Zhen, Xuechu; Li, Youyong; Hou, Tingjun

    2014-10-27

    In this study, to accommodate receptor flexibility, a novel ensemble docking protocol based on multiple receptor conformations was developed using the naïve Bayesian classification technique, and it was evaluated in terms of the prediction accuracy of docking-based virtual screening (VS) for three important targets in the kinase family: ALK, CDK2, and VEGFR2. First, for each target, representative crystal structures were selected by structural clustering, and the ability of molecular docking based on each representative structure to discriminate inhibitors from non-inhibitors was examined. Then, for each target, 50 ns molecular dynamics (MD) simulations were carried out to generate an ensemble of conformations, and multiple representative structures/snapshots were extracted from each MD trajectory by structural clustering. On average, the representative crystal structures outperform the representative structures extracted from MD simulations in their ability to separate inhibitors from non-inhibitors. Finally, using the naïve Bayesian classification technique, an integrated VS strategy was developed to combine the prediction results of molecular docking based on different representative conformations chosen from crystal structures and MD trajectories. Encouragingly, the integrated VS strategy yields better performance than docking-based VS based on any single rigid conformation. This novel protocol may improve on existing strategies for finding more diverse and promising active compounds for a target of interest.
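
The integration step above combines docking results from several conformations with a naïve Bayesian classifier. A minimal sketch follows, assuming each conformation's docking score is one feature, modelled by a per-class Gaussian, with features treated as conditionally independent given the class. The study's actual feature encoding is not described in the abstract, and the class name and synthetic scores are hypothetical.

```python
import numpy as np


class GaussianNaiveBayes:
    """Naive Bayes over per-conformation docking scores, one Gaussian per class and feature."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # A tiny variance floor keeps the log-density finite for constant features.
        self.var_ = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes_])
        self.log_prior_ = np.log(np.array([np.mean(y == c) for c in self.classes_]))
        return self

    def joint_log_likelihood(self, X):
        # Conditional independence given the class: per-feature log-densities are summed.
        diff2 = (X[:, None, :] - self.mean_[None, :, :]) ** 2
        log_dens = -0.5 * (np.log(2 * np.pi * self.var_)[None, :, :] + diff2 / self.var_[None, :, :])
        return log_dens.sum(axis=2) + self.log_prior_[None, :]

    def predict(self, X):
        return self.classes_[np.argmax(self.joint_log_likelihood(X), axis=1)]


# Synthetic docking scores for 3 hypothetical conformations; label 1 = inhibitor.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-9.0, 1.0, (100, 3)), rng.normal(-6.0, 1.0, (100, 3))])
y = np.array([1] * 100 + [0] * 100)
model = GaussianNaiveBayes().fit(X, y)
print("training accuracy:", np.mean(model.predict(X) == y))
```

Pooling evidence across conformations in this way is what lets the combined classifier outperform any single rigid conformation when each one is only weakly discriminative.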

  7. A new species of Tambja (Mollusca, Gastropoda, Nudibranchia) from the Mediterranean Sea: description of the first species of the genus from the Balearic Islands and Malta

    NASA Astrophysics Data System (ADS)

    Domínguez, M.; Pola, M.; Ramón, M.

    2015-06-01

    A new species of polycerid nudibranchs of the genus Tambja is described from Mallorca Island (Spain) and Malta. So far, only two species of Tambja had been recorded in the Mediterranean Sea with a distribution limited to southern Spain. With Tambja mediterranea sp. nov., the distribution of the genus in the Mediterranean Sea is extended, and the new species represents the first occurrence of Tambja at the Balearic Islands and Malta. Externally, the new species is mainly characterized by having ground orange-red colour, dorsum covered with rounded whitish tubercles, rhinophores red with whitish tips and three gill branches with orange-reddish rachis and whitish branches. In the present paper, external and internal features of T. mediterranea are described and compared with other species of the genus, especially with its most similar species, T. limaciformis. Additionally, phylogenetic analyses (Bayesian and maximum likelihood) based on mitochondrial sequences (COI) show that T. mediterranea sp. nov. is sister to T. divae and that both species cluster together with T. limaciformis and T. amakusana with the maximum support.

  8. Multi-Phase US Spread and Habitat Switching of a Post-Columbian Invasive, Sorghum halepense

    PubMed Central

    Barney, Jacob N.; Atwater, Daniel Z.; Pederson, Gary A.; Pederson, Jeffrey F.; Chandler, J. Mike; Cox, T. Stan; Cox, Sheila; Dotray, Peter; Kopec, David; Smith, Steven E.; Schroeder, Jill; Wright, Steven D.; Jiao, Yuannian; Kong, Wenqian; Goff, Valorie; Auckland, Susan; Rainville, Lisa K.; Pierce, Gary J.; Lemke, Cornelia; Compton, Rosana; Phillips, Christine; Kerr, Alexandra; Mettler, Matthew; Paterson, Andrew H.

    2016-01-01

    Johnsongrass (Sorghum halepense) is a striking example of a post-Columbian founder event. This natural experiment within ecological time-scales provides a unique opportunity for understanding patterns of continent-wide genetic diversity following range expansion. Microsatellite markers were used for population genetic analyses including a leaf-optimized Neighbor-Joining tree, pairwise FST, mismatch analysis, principal coordinate analysis, Tajima’s D, Fu’s F and Bayesian clustering of population structure. Evidence indicates two geographically distant introductions of divergent genotypes, which spread across much of the US in <200 years. Based on geophylogeny, gene flow patterns can be inferred to have involved five phases. Centers of genetic diversity have shifted from two introduction sites separated by ~2000 miles toward the middle of the range, consistent with admixture between genotypes from the respective introductions. Genotyping provides evidence for a ‘habitat switch’ from agricultural to non-agricultural systems, which may contribute to both Johnsongrass ubiquity and aggressiveness. Despite lower and more structured diversity at the invasion front, Johnsongrass continues to advance northward into cooler and drier habitats. Association genetic approaches may permit identification of alleles contributing to the habitat switch or other traits important to weed/invasive management and/or crop improvement. PMID:27755565
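
Mismatch analysis, listed among the methods above, tabulates how many pairs of sequences differ at each number of sites. A minimal sketch follows, assuming aligned sequences of equal length; how the authors adapted the analysis to their microsatellite data is not stated, and the haplotypes in the usage lines are invented.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(sequences):
    """Map number of differing sites -> number of sequence pairs, for aligned sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    counts = Counter(
        sum(a != b for a, b in zip(s1, s2))  # pairwise differences for one pair
        for s1, s2 in combinations(sequences, 2)
    )
    return dict(sorted(counts.items()))


haplotypes = ["ACGTAC", "ACGTAA", "ACTTAA", "ACGTAC"]
print(mismatch_distribution(haplotypes))  # {0: 1, 1: 3, 2: 2}
```

The shape of this distribution (smooth and unimodal versus ragged and multimodal) is what is usually compared against expectations under population expansion.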

  9. Sea snakes rarely venture far from home

    PubMed Central

    Lukoschek, Vimoksalehi; Shine, Richard

    2012-01-01

    The extent to which populations are connected by dispersal influences all aspects of their biology and informs the spatial scale of optimal conservation strategies. Obtaining direct estimates of dispersal is challenging, particularly in marine systems, with studies typically relying on indirect approaches to evaluate connectivity. To overcome this challenge, we combine information from an eight-year mark-recapture study with high-resolution genetic data to demonstrate extremely low dispersal and restricted gene flow at small spatial scales for a large, potentially mobile marine vertebrate, the turtleheaded sea snake (Emydocephalus annulatus). Our mark-recapture study indicated that adjacent bays in New Caledonia (<1.15 km apart) contain virtually separate sea snake populations. Sea snakes could easily swim between bays but rarely do so. Of 817 recaptures of marked snakes, only two snakes had moved between bays. We genotyped 136 snakes for 11 polymorphic microsatellite loci and found statistically significant genetic divergence between the two bays (FST = 0.008, P < 0.01). Bayesian clustering analyses detected low mixed ancestry within bays and genetic relatedness coefficients were higher, on average, within than between bays. Our results indicate that turtleheaded sea snakes rarely venture far from home, which has strong implications for their ecology, evolution, and conservation. PMID:22833788
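
Differentiation between the two bays is summarised above as FST. The abstract does not state which estimator was used, so the sketch below swaps in a simpler, related quantity, Nei's GST = (HT - HS)/HT for a single locus, with populations weighted equally and no sample-size correction. The function name and allele frequencies are hypothetical.

```python
def nei_gst(freqs_by_pop):
    """Nei's G_ST for one locus.

    freqs_by_pop: list of dicts, one per population, mapping allele -> frequency.
    Populations are weighted equally; no sample-size correction is applied.
    """
    k = len(freqs_by_pop)
    alleles = set().union(*freqs_by_pop)
    # Mean within-population expected heterozygosity, H_S
    h_s = sum(1.0 - sum(f ** 2 for f in pop.values()) for pop in freqs_by_pop) / k
    # Expected heterozygosity of the pooled population, H_T
    pooled = [sum(pop.get(a, 0.0) for pop in freqs_by_pop) / k for a in alleles]
    h_t = 1.0 - sum(f ** 2 for f in pooled)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


bay_a = {"a": 0.5, "b": 0.5}
bay_b = {"a": 0.6, "b": 0.4}
print(round(nei_gst([bay_a, bay_b]), 4))
```

Even a modest frequency shift at one locus gives a value near 0.01, which puts the reported bay-level FST of 0.008 in perspective; multilocus estimates are usually formed from ratios of summed numerators and denominators across loci.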

  10. Genetic and morphological characterisation of the Ankole Longhorn cattle in the African Great Lakes region

    PubMed Central

    Ndumu, Deo B; Baumung, Roswitha; Hanotte, Olivier; Wurzinger, Maria; Okeyo, Mwai A; Jianlin, Han; Kibogo, Harrison; Sölkner, Johann

    2008-01-01

    The study investigated the population structure, diversity and differentiation of almost all of the ecotypes representing the African Ankole Longhorn cattle breed on the basis of morphometric (shape and size), genotypic and spatial distance data. Twenty-one morphometric measurements were used to describe the morphology of 439 individuals from 11 sub-populations located in five countries around the Great Lakes region of central and eastern Africa. Additionally, 472 individuals were genotyped using 15 DNA microsatellites. Femoral length, horn length, horn circumference, rump height, body length and fore-limb circumference showed the largest differences between regions. An overall FST index indicated that 2.7% of the total genetic variation was present among sub-populations. The least differentiation was observed between the two sub-populations of Mbarara south and Luwero in Uganda, while the highest level of differentiation was observed between the Mugamba in Burundi and Malagarasi in Tanzania. A model-based Bayesian approach estimated the number of inferred clusters to be four. Both the distance-based and the model-based analyses consistently isolated the Mugamba sub-population in Burundi from the others. PMID:18694545
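
The abstract reports four inferred clusters but not how that number was chosen. One widely used criterion for model-based Bayesian clustering runs is Evanno et al.'s (2005) Delta K, sketched below as an assumption rather than as the authors' procedure. The replicate log-likelihoods in the usage lines are invented numbers that only illustrate the arithmetic.

```python
import statistics


def evanno_delta_k(log_likelihoods):
    """Delta K of Evanno et al. (2005) from replicate ln P(X|K) values.

    log_likelihoods maps consecutive integer K values to equally long lists of
    ln P(X|K), one entry per replicate clustering run.
    """
    ks = sorted(log_likelihoods)
    if ks != list(range(ks[0], ks[-1] + 1)):
        raise ValueError("K values must be consecutive integers")
    delta = {}
    for k in ks[1:-1]:  # Delta K is undefined at the smallest and largest K tested
        prev, cur, nxt = log_likelihoods[k - 1], log_likelihoods[k], log_likelihoods[k + 1]
        # |L''(K)| per replicate: |L(K+1) - 2 L(K) + L(K-1)|
        second_diff = [abs(n - 2 * c + p) for p, c, n in zip(prev, cur, nxt)]
        delta[k] = statistics.mean(second_diff) / statistics.stdev(cur)
    return delta


# Invented replicate log-likelihoods (three runs per K), for illustration only.
runs = {
    1: [-5000, -5002, -5001],
    2: [-4700, -4705, -4702],
    3: [-4500, -4503, -4501],
    4: [-4400, -4402, -4401],
    5: [-4395, -4399, -4397],
}
delta = evanno_delta_k(runs)
print({k: round(v, 1) for k, v in delta.items()})
```

The K with the largest Delta K marks the sharpest change in the likelihood curve; it is a heuristic and is best read alongside the mean log-likelihoods and the biological plausibility of the clusters.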

  11. Argentine Population Genetic Structure: Large Variance in Amerindian Contribution

    PubMed Central

    Seldin, Michael F.; Tian, Chao; Shigeta, Russell; Scherbarth, Hugo R.; Silva, Gabriel; Belmont, John W.; Kittles, Rick; Gamron, Susana; Allevi, Alberto; Palatnik, Simon A.; Alvarellos, Alejandro; Paira, Sergio; Caprarulo, Cesar; Guillerón, Carolina; Catoggio, Luis J.; Prigione, Cristina; Berbotto, Guillermo A.; García, Mercedes A.; Perandones, Carlos E.; Pons-Estel, Bernardo A.; Alarcon-Riquelme, Marta E.

    2011-01-01

    Argentine population genetic structure was examined using a set of 78 ancestry informative markers (AIMs) to assess the contributions of European, Amerindian, and African ancestry in 94 individual members of this population. Using the Bayesian clustering algorithm STRUCTURE, the mean European contribution was 78%, the Amerindian contribution was 19.4%, and the African contribution was 2.5%. Similar results were found using the weighted least mean square method: European, 80.2%; Amerindian, 18.1%; and African, 1.7%. Consistent with previous studies, the current results showed very few individuals (four of 94) with greater than 10% African admixture. Notably, when individual admixture was examined, the Amerindian and European admixture showed a very large variance, and individual Amerindian contribution ranged from 1.5 to 84.5% in the 94 individual Argentine subjects. These results indicate that admixture must be considered when clinical epidemiology or case-control genetic analyses are conducted in this population. Moreover, the current study provides a set of informative SNPs that can be used to ascertain or control for this potentially hidden stratification. In addition, the large variance in admixture proportions in individual Argentine subjects shown by this study suggests that this population is appropriate for future admixture mapping studies. PMID:17177183
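
The least-squares estimate mentioned above treats admixed allele frequencies as a mixture of source-population frequencies. The sketch below is a simplified, unweighted version of that idea: the marker weights of the method in the abstract are not reproduced, the sum-to-one constraint is imposed softly through one heavily weighted extra row, and the bounds 0 <= m <= 1 are not enforced. The function name and synthetic frequencies are hypothetical.

```python
import numpy as np


def admixture_proportions(parental, admixed, sum_weight=1e3):
    """Least-squares ancestry proportions m with parental @ m close to admixed and sum(m) close to 1.

    parental: (n_markers, n_sources) allele frequencies in the source populations.
    admixed: (n_markers,) allele frequencies in the admixed sample or individual.
    """
    n_sources = parental.shape[1]
    # Append a heavily weighted row that asks the proportions to sum to one.
    A = np.vstack([parental, sum_weight * np.ones((1, n_sources))])
    b = np.concatenate([admixed, [sum_weight]])
    m, *_ = np.linalg.lstsq(A, b, rcond=None)
    return m


# Synthetic check: 78 markers, three sources, known proportions.
rng = np.random.default_rng(2)
parental = rng.uniform(0.05, 0.95, size=(78, 3))
true_m = np.array([0.6, 0.3, 0.1])
admixed = parental @ true_m
print(np.round(admixture_proportions(parental, admixed), 3))
```

With noiseless synthetic frequencies the known proportions are recovered exactly; with real sample frequencies, weighting each marker by its sampling variance is the natural refinement.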

  12. Spatial pattern and temporal trend of mortality due to tuberculosis

    PubMed Central

    de Queiroz, Ana Angélica Rêgo; Berra, Thaís Zamboni; Garcia, Maria Concebida da Cunha; Popolin, Marcela Paschoal; Belchior, Aylana de Souza; Yamamura, Mellina; dos Santos, Danielle Talita; Arroyo, Luiz Henrique; Arcêncio, Ricardo Alexandre

    2018-01-01

    ABSTRACT Objectives: To describe the epidemiological profile of mortality due to tuberculosis (TB), to analyze the spatial pattern of these deaths and to investigate the temporal trend in mortality due to tuberculosis in Northeast Brazil. Methods: An ecological study based on secondary mortality data. Deaths due to TB were included in the study. Descriptive statistics were calculated and gross mortality rates were estimated and smoothed by the Local Empirical Bayesian Method. Prais-Winsten’s regression was used to analyze the temporal trend in the TB mortality coefficients. The Kernel density technique was used to analyze the spatial distribution of TB mortality. Results: Tuberculosis was implicated in 236 deaths. The burden of tuberculosis deaths was higher amongst males, single people and people of mixed ethnicity, and the mean age at death was 51 years. TB deaths were clustered in the East, West and North health districts, and the tuberculosis mortality coefficient remained stable throughout the study period. Conclusions: Analyses of the spatial pattern and temporal trend in mortality revealed that certain areas have higher TB mortality rates, and should therefore be prioritized in public health interventions targeting the disease. PMID:29742272
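
The local empirical Bayesian smoothing above shrinks each area's crude rate toward the rate of its neighbourhood, more strongly for areas with small populations. The sketch below assumes a common formulation, Marshall's (1991) method-of-moments estimator, with each neighbourhood made up of an area and its adjacent areas. The study's neighbourhood definition is not given in the abstract, and the counts in the usage lines are invented.

```python
def local_empirical_bayes(deaths, population, neighbours):
    """Marshall-style local empirical Bayes smoothing of crude rates.

    neighbours[i] lists the areas adjacent to area i; i itself is always included.
    """
    rates = [d / n for d, n in zip(deaths, population)]
    smoothed = []
    for i, rate in enumerate(rates):
        window = set(neighbours[i]) | {i}
        pop = sum(population[j] for j in window)
        m = sum(deaths[j] for j in window) / pop  # local prior mean (pooled rate)
        mean_pop = pop / len(window)
        # Method-of-moments local prior variance, truncated at zero
        s2 = sum(population[j] * (rates[j] - m) ** 2 for j in window) / pop - m / mean_pop
        s2 = max(s2, 0.0)
        weight = s2 / (s2 + m / population[i]) if s2 > 0 else 0.0
        smoothed.append(m + weight * (rate - m))  # shrink the crude rate toward m
    return smoothed


# Invented counts for four districts arranged in a chain (0-1-2-3).
deaths = [3, 10, 1, 7]
population = [1500, 4000, 800, 2500]
neighbours = [[1], [0, 2], [1, 3], [2]]
print([round(r * 1e5, 1) for r in local_empirical_bayes(deaths, population, neighbours)])
```

Because every smoothed value lies between the crude rate and the local pooled rate, the procedure damps the extreme rates that small populations produce by chance, which is why it is applied before mapping.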

  13. Further consideration of the phylogeny of some "traditional" heterotrichs (Protista, Ciliophora) of uncertain affinities, based on new sequences of the small subunit rRNA gene.

    PubMed

    Miao, Miao; Song, Weibo; Clamp, John C; Al-Rasheid, Khaled A S; Al-Khedhairy, Abdulaziz A; Al-Arifi, Saud

    2009-01-01

    The systematic relationships and taxonomic positions of the traditional heterotrich genera Condylostentor, Climacostomum, Fabrea, Folliculina, Peritromus, and Condylostoma, as well as the licnophorid genus Licnophora, were re-examined using new data from sequences of the gene coding for small subunit ribosomal RNA. Trees constructed using distance-matrix, Bayesian inference, and maximum-parsimony methods all showed the following relationships: (1) the "traditional" heterotrichs consist of several paraphyletic groups, including the current classes Heterotrichea, Armophorea and part of the Spirotrichea; (2) the class Heterotrichea was confirmed as a monophyletic assemblage based on our analyses of 31 taxa, and the genus Peritromus was demonstrated to be a peripheral group; (3) the genus Licnophora occupied an isolated branch on one side of the deepest divergence in the subphylum Intramacronucleata and was closely affiliated with spirotrichs, armophoreans, and clevelandellids; (4) Condylostentor, a recently defined genus with several truly unique morphological features, is more closely related to Condylostoma than to Stentor; (5) Folliculina, Eufolliculina, and Maristentor always clustered together with high bootstrap support; and (6) Climacostomum occupied a paraphyletic position distant from Fabrea, showing a close relationship with Condylostomatidae and Chattonidiidae despite modest support.

  14. Sea snakes rarely venture far from home.

    PubMed

    Lukoschek, Vimoksalehi; Shine, Richard

    2012-06-01

    The extent to which populations are connected by dispersal influences all aspects of their biology and informs the spatial scale of optimal conservation strategies. Obtaining direct estimates of dispersal is challenging, particularly in marine systems, with studies typically relying on indirect approaches to evaluate connectivity. To overcome this challenge, we combine information from an eight-year mark-recapture study with high-resolution genetic data to demonstrate extremely low dispersal and restricted gene flow at small spatial scales for a large, potentially mobile marine vertebrate, the turtleheaded sea snake (Emydocephalus annulatus). Our mark-recapture study indicated that adjacent bays in New Caledonia (<1.15 km apart) contain virtually separate sea snake populations. Sea snakes could easily swim between bays but rarely do so. Of 817 recaptures of marked snakes, only two snakes had moved between bays. We genotyped 136 snakes for 11 polymorphic microsatellite loci and found statistically significant genetic divergence between the two bays (FST = 0.008, P < 0.01). Bayesian clustering analyses detected low mixed ancestry within bays and genetic relatedness coefficients were higher, on average, within than between bays. Our results indicate that turtleheaded sea snakes rarely venture far from home, which has strong implications for their ecology, evolution, and conservation.

  15. Description and Phylogeny of Urostyla grandis wiackowskii subsp. nov. (Ciliophora, Hypotricha) from an Estuarine Mangrove in Brazil.

    PubMed

    Paiva, Thiago da Silva; Shao, Chen; Fernandes, Noemi Mendes; Borges, Bárbara do Nascimento; da Silva-Neto, Inácio Domingos

    2016-01-01

    Interphase specimens, aspects of physiological reorganization and divisional morphogenesis were investigated in a strain of a hypotrichous ciliate highly similar to Urostyla grandis Ehrenberg, (type species of Urostyla), collected from a mangrove area in the estuary of the Paraíba do Sul river (Rio de Janeiro, Brazil). The results revealed that, although interphase specimens match the known morphological variability of U. grandis, their morphogenetic processes differ conspicuously. The parental adoral zone is entirely renewed during morphogenesis, and the marginal cirri exhibit a unique combination of developmental modes, in which the left marginal rows originate from multiple anlagen arising from the innermost left marginal cirral row, whereas the right marginal ciliature originates from individual within-row anlagen. Based on these characteristics, a new subspecies, U. grandis wiackowskii subsp. nov., is proposed, and consequently U. grandis grandis Ehrenberg, stat. nov. is established. Bayesian and maximum-likelihood analyses of the 18S rDNA unambiguously placed U. grandis wiackowskii as the adelphotaxon of a cluster formed by other U. grandis sequences. The implications of these findings for the systematics of Urostyla are discussed. © 2015 The Author(s) Journal of Eukaryotic Microbiology © 2015 International Society of Protistologists.

  16. Genetic divergence in nuclear genomes between populations of Fagus crenata along the Japan Sea and Pacific sides of Japan.

    PubMed

    Hiraoka, Koichi; Tomaru, Nobuhiro

    2009-05-01

    Genetic diversity and structure in Fagus crenata were studied by analyzing 14 nuclear microsatellite loci in 23 populations distributed throughout the species' range. Although population differentiation was very low (FST = 0.027; RST = 0.041), both neighbor-joining tree and Bayesian clustering analyses provided clear evidence of genetic divergence between populations along the Japan Sea (Japan Sea lineage) and Pacific (Pacific lineage) sides of Japan, indicating that physical barriers to migration and gene flow, notably the mountain ranges separating the populations along the Japan Sea and Pacific sides, have promoted genetic divergence between these populations. The two lineages of the nuclear genome are generally consistent with those of the chloroplast genome detected in a previous study, with several discrepancies between the two genomes. Within-population genetic diversity was generally very high (average HE = 0.839), but decreased in a clinal fashion from southwest to northeast, largely among populations of the Japan Sea lineage. This geographical gradient may have resulted from the late-glacial and postglacial recolonization to the northeast, which led to a loss of within-population genetic diversity due to cumulative founder effects.
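
Within-population diversity above is reported as expected heterozygosity (HE). A minimal single-locus sketch follows, assuming diploid genotypes and Nei's (1978) unbiased form HE = n/(n - 1) * (1 - sum of squared allele frequencies), where n is the number of sampled gene copies. The abstract does not state which form was used, and the genotypes in the usage lines are invented microsatellite allele sizes.

```python
from collections import Counter


def expected_heterozygosity(genotypes, unbiased=True):
    """Expected heterozygosity at one locus from diploid genotypes [(allele1, allele2), ...]."""
    alleles = [a for g in genotypes for a in g]
    n = len(alleles)  # number of sampled gene copies (2 x individuals)
    counts = Counter(alleles)
    he = 1.0 - sum((c / n) ** 2 for c in counts.values())
    return he * n / (n - 1) if unbiased else he


genotypes = [(150, 152), (150, 150), (152, 154), (154, 154)]
print(expected_heterozygosity(genotypes))          # 0.75
print(expected_heterozygosity(genotypes, False))   # 0.65625
```

Averaging this value over loci and comparing it across populations is how a clinal decline like the one described above would be detected.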

  17. A program for the Bayesian Neural Network in the ROOT framework

    NASA Astrophysics Data System (ADS)

    Zhong, Jiahang; Huang, Run-Sheng; Lee, Shih-Chang

    2011-12-01

    We present a Bayesian Neural Network algorithm implemented in the TMVA package (Hoecker et al., 2007 [1]), within the ROOT framework (Brun and Rademakers, 1997 [2]). Compared with the conventional use of a neural network as a discriminator, this implementation offers additional advantages as a non-parametric regression tool, particularly for fitting probabilities. It provides features including cost function selection, complexity control and uncertainty estimation. An example application in High Energy Physics is shown. The algorithm is available in ROOT releases later than 5.29.
    Program summary
    Program title: TMVA-BNN
    Catalogue identifier: AEJX_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEJX_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: BSD license
    No. of lines in distributed program, including test data, etc.: 5094
    No. of bytes in distributed program, including test data, etc.: 1,320,987
    Distribution format: tar.gz
    Programming language: C++
    Computer: Any computer system or cluster with a C++ compiler and a UNIX-like operating system
    Operating system: Most UNIX/Linux systems. The application programs were thoroughly tested under Fedora and Scientific Linux CERN.
    Classification: 11.9
    External routines: ROOT package version 5.29 or higher (http://root.cern.ch)
    Nature of problem: Non-parametric fitting of multivariate distributions
    Solution method: An implementation of a neural network following the Bayesian statistical interpretation. Uses the Laplace approximation for the Bayesian marginalizations. Provides automatic complexity control and uncertainty estimation.
    Running time: Training time depends substantially on the size of the input sample, the NN topology, the number of training iterations, etc. For the example in this manuscript, training took about 7 min on a Linux PC with 2.0 GHz processors.

  18. Parallelized Bayesian inversion for three-dimensional dental X-ray imaging.

    PubMed

    Kolehmainen, Ville; Vanne, Antti; Siltanen, Samuli; Järvenpää, Seppo; Kaipio, Jari P; Lassas, Matti; Kalke, Martti

    2006-02-01

    Diagnostic and operational tasks based on dental radiology often require three-dimensional (3-D) information that is not available in a single X-ray projection image. Comprehensive 3-D information about tissues can be obtained by computerized tomography (CT) imaging. However, in dental imaging a conventional CT scan may not be available or practical because of the high radiation dose, low resolution, or the cost of the CT scanner equipment. In this paper, we consider a novel type of 3-D imaging modality for dental radiology. We consider situations in which projection images of the teeth are taken from a few sparsely distributed projection directions using the dentist's regular (digital) X-ray equipment and the 3-D X-ray attenuation function is reconstructed. A complication in these experiments is that the reconstruction of the 3-D structure based on a few projection images becomes an ill-posed inverse problem. Bayesian inversion is a well-suited framework for reconstruction from such incomplete data. In Bayesian inversion, the ill-posed reconstruction problem is formulated in a well-posed probabilistic form in which a priori information is used to compensate for the incomplete information of the projection data. In this paper we propose a Bayesian method for 3-D reconstruction in dental radiology. The method is partially based on Kolehmainen et al. 2003. The prior model for dental structures consists of a weighted l1 and total variation (TV) prior together with the positivity prior. The inverse problem is stated as finding the maximum a posteriori (MAP) estimate. To make the 3-D reconstruction computationally feasible, a parallelized version of an optimization algorithm is implemented for a Beowulf cluster computer. The method is tested with projection data from dental specimens and patient data. Tomosynthetic reconstructions are given as a reference for the proposed method.
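
The MAP formulation above combines a data-misfit term with l1, total variation and positivity priors. The sketch below is a one-dimensional toy version, not the authors' algorithm: a random matrix stands in for the projection geometry, the TV term is smoothed with a small eps so it is differentiable, the l1 weights are assumed uniform, and positivity is enforced by projected gradient descent with a step of 1/L. Function names and data are hypothetical, and no parallelization is attempted.

```python
import numpy as np


def objective(x, A, b, alpha=0.1, beta=0.01, eps=1e-4):
    """Negative log-posterior up to constants: misfit + smoothed TV + uniform l1."""
    tv = np.sum(np.sqrt(np.diff(x) ** 2 + eps))
    return 0.5 * np.sum((A @ x - b) ** 2) + alpha * tv + beta * np.sum(np.abs(x))


def map_tv_positive(A, b, alpha=0.1, beta=0.01, eps=1e-4, n_iter=2000):
    """Projected gradient descent for the MAP estimate under the constraint x >= 0."""
    n = A.shape[1]
    # Step 1/L, where L bounds the curvature of the misfit and the smoothed TV term.
    step = 1.0 / (np.linalg.norm(A, 2) ** 2 + 4.0 * alpha / np.sqrt(eps))
    x = np.zeros(n)
    for _ in range(n_iter):
        d = np.diff(x)
        g = d / np.sqrt(d ** 2 + eps)
        grad_tv = np.zeros(n)
        grad_tv[:-1] -= g
        grad_tv[1:] += g
        # On the feasible set x >= 0 the l1 term equals beta * sum(x), with gradient beta.
        grad = A.T @ (A @ x - b) + alpha * grad_tv + beta
        x = np.maximum(x - step * grad, 0.0)  # project onto the positivity constraint
    return x


# Toy problem: 20 noisy random "projections" of a 50-sample piecewise-constant signal.
rng = np.random.default_rng(0)
n = 50
x_true = np.zeros(n)
x_true[15:30] = 1.0
A = rng.normal(size=(20, n)) / np.sqrt(20)
b = A @ x_true + 0.01 * rng.normal(size=20)
x_hat = map_tv_positive(A, b)
print(round(objective(np.zeros(n), A, b), 3), round(objective(x_hat, A, b), 3))
```

With fewer measurements than unknowns the misfit alone has infinitely many minimizers; the TV and positivity priors are what select a piecewise-constant, non-negative solution, which is the role the priors play in the dental reconstruction.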

  19. Suggestions for presenting the results of data analyses

    USGS Publications Warehouse

    Anderson, David R.; Link, William A.; Johnson, Douglas H.; Burnham, Kenneth P.

    2001-01-01

    We give suggestions for the presentation of research results from frequentist, information-theoretic, and Bayesian analysis paradigms, followed by several general suggestions. The information-theoretic and Bayesian methods offer alternative approaches to data analysis and inference compared with traditionally used methods. Guidance is lacking on the presentation of results under these alternative procedures and on nontesting aspects of classical frequentist methods of statistical analysis. Null hypothesis testing has come under intense criticism. We recommend less reporting of the results of statistical tests of null hypotheses in cases where the null is surely false anyway, or where the null hypothesis is of little interest to science or management.
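
Under the information-theoretic paradigm mentioned above, results are often presented as a table of models with AICc differences and Akaike weights rather than test statistics. A minimal sketch of that arithmetic follows; the model names and criterion values in the usage lines are hypothetical.

```python
import math


def aicc(log_likelihood, k, n):
    """Second-order AIC for a model with k estimated parameters fitted to n observations."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return -2.0 * log_likelihood + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)


def akaike_weights(criteria):
    """Map model name -> (delta, weight) from a dict of model name -> AIC or AICc."""
    best = min(criteria.values())
    deltas = {m: c - best for m, c in criteria.items()}
    rel = {m: math.exp(-d / 2.0) for m, d in deltas.items()}  # relative likelihoods
    total = sum(rel.values())
    return {m: (deltas[m], rel[m] / total) for m in criteria}


table = akaike_weights({"model_A": 100.0, "model_B": 102.0, "model_C": 110.0})
for name, (delta, weight) in sorted(table.items(), key=lambda kv: kv[1][0]):
    print(f"{name}: delta = {delta:.1f}, weight = {weight:.3f}")
```

Reporting deltas and weights conveys the relative support for each candidate model directly, which is the kind of nontesting summary the authors recommend over P-values for null hypotheses of little interest.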

  20. Modelling Inter-relationships among water, governance, human development variables in developing countries with Bayesian networks.

    NASA Astrophysics Data System (ADS)

    Dondeynaz, C.; Lopez-Puga, J.; Carmona-Moreno, C.

    2012-04-01

    Improving Water and Sanitation Services (WSS), a complex and interdisciplinary issue, requires collaboration and coordination among different sectors (environment, health, economic activities, governance, and international cooperation). This interdependency has been recognised with the adoption of the "Integrated Water Resources Management" principles, which push for the integration of the various dimensions involved in WSS delivery to ensure efficient and sustainable management. Understanding these interrelations is crucial for decision makers in the water sector, particularly in developing countries, where WSS remain an important lever for improving livelihoods. In this framework, the Joint Research Centre of the European Commission has developed a coherent database (WatSan4Dev database) containing 29 indicators from environmental, socio-economic, governance and financial aid flows data focusing on developing countries (Celine et al., 2011, under publication). The aim of this work is to model the WatSan4Dev dataset using probabilistic models to identify the key variables influencing, or influenced by, water supply and sanitation access levels. Bayesian Network Models are suitable for mapping the conditional dependencies between variables and also allow variables to be ordered by their level of influence on the dependent variable. Separate models have been built for water supply and for sanitation because of their different behaviour. The models are validated if they comply both with statistical criteria and with scientific knowledge and the literature. A two-step approach has been adopted to build the structure of the model: a Bayesian network is first built for each thematic cluster of variables (e.g. governance, agricultural pressure, or human development), keeping a detailed level for later interpretation. A global model is then built from the significant indicators of each previously modelled cluster. 
The structure of the relationships between variables is set a priori according to the literature and/or experience in the field (expert knowledge). Statistical validity is assessed from the classification error rate and the significance of the variables. A sensitivity analysis has also been performed to characterise the relative influence of every single variable in the model. Once validated, the models allow the impact of each variable on the behaviour of water supply or sanitation to be estimated, providing a useful means to test scenarios and predict variable behaviour. The choices made, the methods, and descriptions of the models for each thematic cluster, as well as of the global models for water supply and sanitation, will be presented. Key results and interpretation of the relationships depicted by the models will be detailed during the conference.
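
A Bayesian network of the kind described above encodes a joint distribution as a product of conditional probability tables, so any conditional query follows by summing the joint. The sketch below is a deliberately tiny, hypothetical three-node network (governance and aid both influencing water access) with invented probabilities. It is not the WatSan4Dev model and only illustrates how such a model answers scenario questions by exact enumeration.

```python
from itertools import product

# Hypothetical network: governance -> access <- aid. All probabilities are invented.
P_GOV = {"good": 0.4, "poor": 0.6}
P_AID = {"high": 0.3, "low": 0.7}
P_ACCESS_HIGH = {  # P(access = "high" | governance, aid)
    ("good", "high"): 0.9, ("good", "low"): 0.7,
    ("poor", "high"): 0.5, ("poor", "low"): 0.2,
}


def joint(gov, aid, access):
    """Joint probability as the product of the network's conditional tables."""
    p_high = P_ACCESS_HIGH[(gov, aid)]
    return P_GOV[gov] * P_AID[aid] * (p_high if access == "high" else 1.0 - p_high)


def query(target, evidence):
    """P(target | evidence) by full enumeration over all variable assignments."""
    num = den = 0.0
    for gov, aid, access in product(P_GOV, P_AID, ("high", "low")):
        world = {"gov": gov, "aid": aid, "access": access}
        if all(world[k] == v for k, v in evidence.items()):
            p = joint(gov, aid, access)
            den += p
            if all(world[k] == v for k, v in target.items()):
                num += p
    return num / den


print(round(query({"access": "high"}, {}), 3))                 # marginal: 0.478
print(round(query({"access": "high"}, {"gov": "good"}), 3))    # scenario: good governance
print(round(query({"gov": "good"}, {"access": "high"}), 3))    # diagnostic direction
```

Comparing the access probability under different evidence settings is a simple form of the scenario testing and sensitivity analysis mentioned above; real networks with 29 indicators would use a dedicated inference library rather than enumeration.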

  1. Bayesian estimation and use of high-throughput remote sensing indices for quantitative genetic analyses of leaf growth.

    PubMed

    Baker, Robert L; Leong, Wen Fung; An, Nan; Brock, Marcus T; Rubin, Matthew J; Welch, Stephen; Weinig, Cynthia

    2018-02-01

    We develop Bayesian function-valued trait models that mathematically isolate genetic mechanisms underlying leaf growth trajectories by factoring out genotype-specific differences in photosynthesis. Remote sensing data can be used instead of leaf-level physiological measurements. Characterizing the genetic basis of traits that vary during ontogeny and affect plant performance is a major goal in evolutionary biology and agronomy. Describing genetic programs that specifically regulate morphological traits can be complicated by genotypic differences in physiological traits. We describe the growth trajectories of leaves using novel Bayesian function-valued trait (FVT) modeling approaches in Brassica rapa recombinant inbred lines raised in heterogeneous field settings. While frequentist approaches estimate parameter values by treating each experimental replicate discretely, Bayesian models can utilize information in the global dataset, potentially leading to more robust trait estimation. We illustrate this principle by estimating growth asymptotes in the face of missing data and comparing heritabilities of growth trajectory parameters estimated by Bayesian and frequentist approaches. Using pseudo-Bayes factors, we compare the performance of an initial Bayesian logistic growth model and a model that incorporates carbon assimilation (Amax) as a cofactor, thus statistically accounting for genotypic differences in carbon resources. We further evaluate two remotely sensed spectroradiometric indices, photochemical reflectance (pri2) and MERIS Terrestrial Chlorophyll Index (mtci), as covariates in lieu of Amax, because these two indices were genetically correlated with Amax across years and treatments yet allow much higher throughput compared to direct leaf-level gas-exchange measurements. For leaf lengths in uncrowded settings, including Amax improves model fit over the initial model. The mtci and pri2 indices also outperform direct Amax measurements. 
Of particular importance for evolutionary biologists and plant breeders, hierarchical Bayesian models estimating FVT parameters improve heritabilities compared to frequentist approaches.

  2. Analysis of capture-recapture models with individual covariates using data augmentation

    USGS Publications Warehouse

    Royle, J. Andrew

    2009-01-01

    I consider the analysis of capture-recapture models with individual covariates that influence detection probability. Bayesian analysis of the joint likelihood is carried out using a flexible data augmentation scheme that facilitates analysis by Markov chain Monte Carlo methods, and a simple and straightforward implementation in freely available software. This approach is applied to a study of meadow voles (Microtus pennsylvanicus) in which auxiliary data on a continuous covariate (body mass) are recorded, and it is thought that detection probability is related to body mass. In a second example, the model is applied to an aerial waterfowl survey in which a double-observer protocol is used. The fundamental unit of observation is the cluster of individual birds, and the size of the cluster (a discrete covariate) is used as a covariate on detection probability.
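
The data augmentation idea in the abstract above can be sketched with a small Gibbs sampler. This is a simplified illustration, not Royle's model: it drops the individual covariates (body mass, cluster size) and fits only the basic closed-population model M0, in which every individual shares one detection probability p over J occasions. The observed detection counts are padded with all-zero rows up to a hypothetical list size M; each row carries a latent indicator z (real individual or not) with inclusion probability psi, and the population size is N = sum(z). The function name `da_gibbs_m0` and the example data are hypothetical.

```python
import random


def da_gibbs_m0(y, J, M, n_iter=2000, seed=1):
    """Gibbs sampler for model M0 with data augmentation (illustrative sketch).

    y: detection counts (out of J occasions) for the n observed individuals.
    M: augmented list size (M > n); the M - n extra rows are all-zero histories.
    Returns the sampled values of N = sum(z); discard early draws as burn-in.
    """
    rng = random.Random(seed)
    n = len(y)
    counts = list(y) + [0] * (M - n)   # augmented detection counts
    z = [1] * n + [0] * (M - n)        # detected individuals are certainly real
    n_draws = []
    for _ in range(n_iter):
        zs = sum(z)
        # psi | z ~ Beta(1 + sum z, 1 + M - sum z) under a Uniform(0, 1) prior
        psi = rng.betavariate(1 + zs, 1 + M - zs)
        # p | y, z ~ Beta(1 + detections, 1 + misses) among rows with z = 1
        det = sum(c for c, zi in zip(counts, z) if zi)
        p = rng.betavariate(1 + det, 1 + J * zs - det)
        # never-detected rows: real but missed on all J occasions, or not real
        q = psi * (1.0 - p) ** J
        pr_real = q / (q + (1.0 - psi))
        for i in range(n, M):
            z[i] = 1 if rng.random() < pr_real else 0
        n_draws.append(sum(z))
    return n_draws


# Hypothetical data: 8 individuals detected on 5 occasions, list padded to 50.
draws = da_gibbs_m0([1, 2, 1, 3, 1, 1, 2, 1], J=5, M=50, n_iter=500)
print(sum(draws[100:]) / len(draws[100:]))  # posterior mean of N after burn-in
```

M only needs to be large enough that the posterior for N does not pile up against it; adding covariates would replace the single p with an individual-level model and require imputing covariate values for the augmented rows.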

  3. Glaucomatous patterns in Frequency Doubling Technology (FDT) perimetry data identified by unsupervised machine learning classifiers.

    PubMed

    Bowd, Christopher; Weinreb, Robert N; Balasubramanian, Madhusudhanan; Lee, Intae; Jang, Giljin; Yousefi, Siamak; Zangwill, Linda M; Medeiros, Felipe A; Girkin, Christopher A; Liebmann, Jeffrey M; Goldbaum, Michael H

    2014-01-01

    The variational Bayesian independent component analysis-mixture model (VIM), an unsupervised machine-learning classifier, was used to automatically separate Matrix Frequency Doubling Technology (FDT) perimetry data into clusters of healthy and glaucomatous eyes, and to identify axes representing statistically independent patterns of defect in the glaucoma clusters. FDT measurements were obtained from 1,190 eyes with normal FDT results and 786 eyes with abnormal FDT results from the UCSD-based Diagnostic Innovations in Glaucoma Study (DIGS) and African Descent and Glaucoma Evaluation Study (ADAGES). For all eyes, VIM input was 52 threshold test points from the 24-2 test pattern, plus age. FDT mean deviation was -1.00 dB (S.D. = 2.80 dB) and -5.57 dB (S.D. = 5.09 dB) in FDT-normal eyes and FDT-abnormal eyes, respectively (p<0.001). VIM identified meaningful clusters of FDT data and positioned a set of statistically independent axes through the mean of each cluster. The optimal VIM model separated the FDT fields into 3 clusters. Cluster N contained primarily normal fields (1109/1190, specificity 93.1%) and clusters G1 and G2 combined contained primarily abnormal fields (651/786, sensitivity 82.8%). For clusters G1 and G2, the optimal numbers of axes were 2 and 5, respectively. Patterns automatically generated along axes within the glaucoma clusters were similar to those known to be indicative of glaucoma. Fields located farther from the normal mean on each glaucoma axis showed increasing field defect severity. VIM successfully separated FDT fields from healthy and glaucoma eyes without a priori information about class membership, and identified familiar glaucomatous patterns of loss.

  4. Revisiting the phylogeography, demography and taxonomy of the frog genus Ptychadena in the Ethiopian highlands with the use of genome-wide SNP data.

    PubMed

    Reyes-Velasco, Jacobo; Manthey, Joseph D; Bourgeois, Yann; Freilich, Xenia; Boissinot, Stéphane

    2018-01-01

    Understanding the diversification of biological lineages is central to evolutionary studies. To properly study the process of speciation, it is necessary to link micro-evolutionary studies with macro-evolutionary mechanisms. Micro-evolutionary studies require proper sampling across a taxon's range to adequately infer genetic diversity. Here we use the grass frogs of the genus Ptychadena from the Ethiopian highlands as a model to study the process of lineage diversification in this unique biodiversity hotspot. We used thousands of genome-wide SNPs obtained from double-digest restriction-site-associated DNA sequencing (ddRAD-seq) in populations of the Ptychadena neumanni species complex from the Ethiopian highlands in order to infer their phylogenetic relationships and genetic structure, as well as to study their demographic history. Our genome-wide phylogenetic study supports the existence of approximately 13 lineages clustered into 3 species groups. Our phylogenetic and phylogeographic reconstructions suggest that those endemic lineages diversified in allopatry, and subsequently specialized to different habitats and elevations. Demographic analyses point to a continuous decrease in the population size across the majority of lineages and populations during the Pleistocene, which is consistent with a continuous period of aridification that East Africa experienced since the Pliocene. We discuss the taxonomic implications of our analyses and, in particular, we warn against the recent practice of relying solely on Bayesian species delimitation methods when proposing taxonomic changes.

  5. Mitochondrial DNA Detects a Complex Evolutionary History with Pleistocene Epoch Divergence for the Neotropical Malaria Vector Anopheles nuneztovari Sensu Lato

    PubMed Central

    Scarpassa, Vera Margarete; Conn, Jan E.

    2011-01-01

    Cryptic species and lineages characterize Anopheles nuneztovari s.l. Gabaldón, an important malaria vector in South America. We investigated the phylogeographic structure across the range of this species with cytochrome oxidase subunit I (COI) mitochondrial DNA sequences to estimate the number of clades and levels of divergence. Bayesian and maximum-likelihood phylogenetic analyses detected four groups distributed in two major monophyletic clades (I and II). Samples from the Amazon Basin were clustered in clade I, as were subclades II-A and II-B, whereas those from Bolivia/Colombia/Venezuela were restricted to one basal subclade (II-C). These data, together with a statistical parsimony network, confirm results of previous studies that An. nuneztovari is a species complex consisting of at least two cryptic taxa, one occurring in Colombia and Venezuela and the other occurring in the Amazon Basin. These data also suggest that additional incipient species may exist in the Amazon Basin. Divergence time and expansion tests suggested that these groups separated and expanded in the Pleistocene Epoch. In addition, the COI sequences clearly separated An. nuneztovari s.l. from the closely related species An. dunhami Causey, and three new records are reported for An. dunhami in Amazonian Brazil. These findings are relevant for vector control programs in areas where both species occur. Our analyses support dynamic geologic and landscape changes in northern South America, and point to particularly active divergence during the Pleistocene Epoch for New World anophelines. PMID:22049039

  6. Molecular phylogenetic and dating analysis of pierid butterfly species using complete mitochondrial genomes.

    PubMed

    Cao, Y; Hao, J S; Sun, X Y; Zheng, B; Yang, Q

    2016-12-02

    Pieridae is a butterfly family whose evolutionary history is poorly understood. Due to the difficulties in identifying morphological synapomorphies within the group and the scarcity of the fossil record, only a few studies on the higher phylogeny of Pieridae have been reported to date. In this study, we describe the complete mitochondrial genomes of four pierid butterfly species (Aporia martineti, Aporia hippia, Aporia bieti, and Mesapia peloria), in order to better characterize the pierid butterfly mitogenomes and to perform phylogenetic analyses using all available mitogenomic sequence data (13 PCGs, rRNAs, and tRNAs) from 18 pierid butterfly species representing the three main subfamilies (Dismorphiinae, Coliadinae and Pierinae). Our analysis shows that the four new mitogenomes share similar features with other known pierid mitogenomes in gene order and organization. Phylogenetic analyses by maximum likelihood and Bayesian inference show that the pierid higher-level relationship is: Dismorphiinae + (Coliadinae + Pierinae), which corroborates the results of some previous molecular and morphological studies. However, we found that Hebomoia and Anthocharis form a sister group, supporting the traditional tribe Anthocharidini; in addition, Mesapia peloria clustered within the Aporia group, suggesting that the genus Mesapia should be reduced to the taxonomic status of subgenus. Our molecular dating analysis indicates that the family Pieridae began to diverge during the Late Cretaceous about 92 million years ago (mya), while the subfamily Pierinae diverged from the Coliadinae at about 86 mya (Late Cretaceous).

  7. Revisiting the phylogeography, demography and taxonomy of the frog genus Ptychadena in the Ethiopian highlands with the use of genome-wide SNP data

    PubMed Central

    Manthey, Joseph D.; Bourgeois, Yann; Freilich, Xenia; Boissinot, Stéphane

    2018-01-01

    Understanding the diversification of biological lineages is central to evolutionary studies. To properly study the process of speciation, it is necessary to link micro-evolutionary studies with macro-evolutionary mechanisms. Micro-evolutionary studies require proper sampling across a taxon’s range to adequately infer genetic diversity. Here we use the grass frogs of the genus Ptychadena from the Ethiopian highlands as a model to study the process of lineage diversification in this unique biodiversity hotspot. We used thousands of genome-wide SNPs obtained from double-digest restriction-site-associated DNA sequencing (ddRAD-seq) in populations of the Ptychadena neumanni species complex from the Ethiopian highlands in order to infer their phylogenetic relationships and genetic structure, as well as to study their demographic history. Our genome-wide phylogenetic study supports the existence of approximately 13 lineages clustered into 3 species groups. Our phylogenetic and phylogeographic reconstructions suggest that those endemic lineages diversified in allopatry, and subsequently specialized to different habitats and elevations. Demographic analyses point to a continuous decrease in the population size across the majority of lineages and populations during the Pleistocene, which is consistent with a continuous period of aridification that East Africa experienced since the Pliocene. We discuss the taxonomic implications of our analyses and, in particular, we warn against the recent practice of relying solely on Bayesian species delimitation methods when proposing taxonomic changes. PMID:29389966

  8. Computational Psychometrics for the Measurement of Collaborative Problem Solving Skills

    PubMed Central

    Polyak, Stephen T.; von Davier, Alina A.; Peterschmidt, Kurt

    2017-01-01

    This paper describes a psychometrically based approach to the measurement of collaborative problem solving skills, by mining and classifying behavioral data both in real-time and in post-game analyses. The data were collected from a sample of middle school children who interacted with a game-like, online simulation of collaborative problem solving tasks. In this simulation, a user is required to collaborate with a virtual agent to solve a series of tasks within a first-person maze environment. The tasks were developed following the psychometric principles of Evidence Centered Design (ECD) and are aligned with the Holistic Framework developed by ACT. The analyses presented in this paper are an application of an emerging discipline called computational psychometrics which is growing out of traditional psychometrics and incorporates techniques from educational data mining, machine learning and other computer/cognitive science fields. In the real-time analysis, our aim was to start with limited knowledge of skill mastery, and then demonstrate a form of continuous Bayesian evidence tracing that updates sub-skill level probabilities as new conversation flow event evidence is presented. This is performed using Bayes' rule and conversation item conditional probability tables. The items are polytomous and each response option has been tagged with a skill at a performance level. In our post-game analysis, our goal was to discover unique gameplay profiles by performing a cluster analysis of users' sub-skill performance scores based on their patterns of selected dialog responses. PMID:29238314
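
The evidence-tracing step described above (Bayes' rule applied with an item's conditional probability table) can be sketched in a few lines. The skill levels, response options, and probabilities below are hypothetical placeholders, not the ACT tables used in the study; the sketch only shows how one polytomous response shifts the distribution over sub-skill levels, and how repeated updates trace evidence over a conversation.

```python
def bayes_update(prior, cpt, response):
    """One Bayes'-rule update of a distribution over skill levels.

    prior:    dict level -> P(level)
    cpt:      dict level -> dict response -> P(response | level)
    response: the observed response option
    """
    unnorm = {lvl: prior[lvl] * cpt[lvl][response] for lvl in prior}
    total = sum(unnorm.values())
    return {lvl: v / total for lvl, v in unnorm.items()}


def trace(prior, cpt, responses):
    """Apply the update after each response; returns the full history."""
    history = [prior]
    for r in responses:
        history.append(bayes_update(history[-1], cpt, r))
    return history


# Hypothetical conditional probability table for one three-option item.
cpt = {
    "low":    {"A": 0.6, "B": 0.3, "C": 0.1},
    "medium": {"A": 0.3, "B": 0.4, "C": 0.3},
    "high":   {"A": 0.1, "B": 0.3, "C": 0.6},
}
uniform = {"low": 1 / 3, "medium": 1 / 3, "high": 1 / 3}
for step in trace(uniform, cpt, ["C", "C", "B"]):
    print({k: round(v, 3) for k, v in step.items()})
```

Starting from a uniform prior expresses "limited knowledge of skill mastery"; in a real system each conversation item would carry its own table, so `cpt` would be looked up per item rather than shared.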

  9. Computational Psychometrics for the Measurement of Collaborative Problem Solving Skills.

    PubMed

    Polyak, Stephen T; von Davier, Alina A; Peterschmidt, Kurt

    2017-01-01

    This paper describes a psychometrically based approach to the measurement of collaborative problem solving skills, by mining and classifying behavioral data both in real-time and in post-game analyses. The data were collected from a sample of middle school children who interacted with a game-like, online simulation of collaborative problem solving tasks. In this simulation, a user is required to collaborate with a virtual agent to solve a series of tasks within a first-person maze environment. The tasks were developed following the psychometric principles of Evidence Centered Design (ECD) and are aligned with the Holistic Framework developed by ACT. The analyses presented in this paper are an application of an emerging discipline called computational psychometrics which is growing out of traditional psychometrics and incorporates techniques from educational data mining, machine learning and other computer/cognitive science fields. In the real-time analysis, our aim was to start with limited knowledge of skill mastery, and then demonstrate a form of continuous Bayesian evidence tracing that updates sub-skill level probabilities as new conversation flow event evidence is presented. This is performed using Bayes' rule and conversation item conditional probability tables. The items are polytomous and each response option has been tagged with a skill at a performance level. In our post-game analysis, our goal was to discover unique gameplay profiles by performing a cluster analysis of users' sub-skill performance scores based on their patterns of selected dialog responses.

  10. Bayesian informative dropout model for longitudinal binary data with random effects using conditional and joint modeling approaches.

    PubMed

    Chan, Jennifer S K

    2016-05-01

    Dropouts are common in longitudinal studies. If the dropout probability depends on the missing observations at or after dropout, this type of dropout is called informative (or nonignorable) dropout (ID). Failure to accommodate such a dropout mechanism in the model will bias the parameter estimates. We propose a conditional autoregressive model for longitudinal binary data with an ID model such that the probabilities of positive outcomes, as well as the dropout indicator on each occasion, are logit-linear in some covariates and outcomes. This model, which adopts a marginal model for outcomes and a conditional model for dropouts, is called a selection model. To allow for heterogeneity and clustering effects, the outcome model is extended to incorporate mixture and random effects. Lastly, the model is further extended to a novel model that models the outcome and dropout jointly, such that their dependency is formulated through an odds ratio function. Parameters are estimated by a Bayesian approach implemented using the user-friendly Bayesian software WinBUGS. A methadone clinic dataset is analyzed to illustrate the proposed models. Results show that the treatment time effect is still significant but weaker after allowing for an ID process in the data. Finally, the effect of dropout on parameter estimates is evaluated through simulation studies. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  11. Socio-ecological factors and hand, foot and mouth disease in dry climate regions: a Bayesian spatial approach in Gansu, China

    NASA Astrophysics Data System (ADS)

    Gou, Faxiang; Liu, Xinfeng; Ren, Xiaowei; Liu, Dongpeng; Liu, Haixia; Wei, Kongfu; Yang, Xiaoting; Cheng, Yao; Zheng, Yunhe; Jiang, Xiaojuan; Li, Juansheng; Meng, Lei; Hu, Wenbiao

    2017-01-01

    The influence of socio-ecological factors on hand, foot and mouth disease (HFMD) was explored in this study using Bayesian spatial modeling, and spatial patterns were identified in dry regions of Gansu, China. Notified HFMD cases and socio-ecological data were obtained from the China Information System for Disease Control and Prevention, Gansu Yearbook and Gansu Meteorological Bureau. A Bayesian spatial conditional autoregressive model was used to quantify the effects of socio-ecological factors on HFMD and to explore spatial patterns after accounting for those socio-ecological effects. Our non-spatial model suggests that temperature (relative risk (RR) 1.15, 95% CI 1.01-1.31), GDP per capita (RR 1.19, 95% CI 1.01-1.39) and population density (RR 1.98, 95% CI 1.19-3.17) have a significant effect on HFMD transmission. However, after controlling for spatial random effects, only temperature (RR 1.25, 95% CI 1.04-1.53) showed a significant association with HFMD. The spatial model demonstrates that temperature plays a major role in the transmission of HFMD in dry regions. Estimated residual variation after taking into account the socio-ecological variables indicated that high incidences of HFMD were mainly clustered in the northwest of Gansu. In addition, the spatial structure showed a unique distribution after taking account of socio-ecological effects.

  12. Individual participant data meta-analyses should not ignore clustering

    PubMed Central

    Abo-Zaid, Ghada; Guo, Boliang; Deeks, Jonathan J.; Debray, Thomas P.A.; Steyerberg, Ewout W.; Moons, Karel G.M.; Riley, Richard David

    2013-01-01

    Objectives: Individual participant data (IPD) meta-analyses often analyze their IPD as if coming from a single study. We compare this approach with analyses that instead account for clustering of patients within studies. Study Design and Setting: Comparison of effect estimates from logistic regression models in real and simulated examples. Results: The estimated prognostic effect of age in patients with traumatic brain injury is similar, regardless of whether clustering is accounted for. However, a family history of thrombophilia is found to be a diagnostic marker of deep vein thrombosis [odds ratio, 1.30; 95% confidence interval (CI): 1.00, 1.70; P = 0.05] when clustering is accounted for but not when it is ignored (odds ratio, 1.06; 95% CI: 0.83, 1.37; P = 0.64). Similarly, the treatment effect of nicotine gum on smoking cessation is severely attenuated when clustering is ignored (odds ratio, 1.40; 95% CI: 1.02, 1.92) rather than accounted for (odds ratio, 1.80; 95% CI: 1.29, 2.52). Simulations show that models accounting for clustering perform consistently well, but downwardly biased effect estimates and low coverage can occur when clustering is ignored. Conclusion: Researchers must routinely account for clustering in IPD meta-analyses; otherwise, misleading effect estimates and conclusions may arise. PMID:23651765

  13. Analysis of phase II methodologies for single-arm clinical trials with multiple endpoints in rare cancers: An example in Ewing’s sarcoma

    PubMed Central

    Dutton, P; Love, SB; Billingham, L; Hassan, AB

    2016-01-01

    Trials run in either rare diseases, such as rare cancers, or rare sub-populations of common diseases are challenging in terms of identifying, recruiting and treating sufficient patients within a reasonable period. Treatments for rare diseases are often designed for other disease areas and then later proposed as possible treatments for the rare disease after initial phase I testing is complete. To ensure the trial is in the best interests of the patient participants, frequent interim analyses are needed so that the trial stops promptly if the treatment is futile or toxic. These non-definitive phase II trials should also be stopped for efficacy to accelerate research progress if the treatment proves to be particularly promising. In this paper, we review frequentist and Bayesian methods that have been adapted to incorporate two binary endpoints and frequent interim analyses. The Eurosarc Trial of Linsitinib in advanced Ewing Sarcoma (LINES) is used as a motivating example and provides a suitable platform to compare these approaches. The Bayesian approach provides greater design flexibility, but does not provide additional value over the frequentist approaches in a single trial setting when the prior is non-informative. However, Bayesian designs are able to borrow from any previous experience, using prior information to improve efficiency. PMID:27587590
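
A Bayesian interim rule on two binary endpoints (response and toxicity) can be sketched with Beta-binomial posteriors. This is a generic illustration, not the LINES design: the reference rates `p_resp0` and `p_tox_max` and the stopping thresholds are hypothetical, and the default Beta(1, 1) prior is the non-informative case the abstract mentions. For integer Beta parameters the posterior tail probability has an exact form through the binomial distribution, so only the standard library is needed.

```python
from math import comb


def prob_rate_exceeds(p0, events, n, a=1, b=1):
    """Posterior P(p > p0) for a binomial rate under a Beta(a, b) prior.

    Requires integer a and b. Uses the identity
    P(Beta(a', b') > p0) = P(Binomial(a' + b' - 1, p0) <= a' - 1).
    """
    a_post = a + events
    b_post = b + n - events
    m = a_post + b_post - 1
    return sum(comb(m, k) * p0**k * (1 - p0) ** (m - k) for k in range(a_post))


def interim_decision(responses, toxicities, n, p_resp0=0.2, p_tox_max=0.3,
                     futility=0.05, toxic=0.9, efficacy=0.95):
    """Hypothetical interim stopping rule; all thresholds are illustrative."""
    p_active = prob_rate_exceeds(p_resp0, responses, n)   # P(response rate > p_resp0)
    p_toxic = prob_rate_exceeds(p_tox_max, toxicities, n)  # P(toxicity rate > p_tox_max)
    if p_toxic > toxic:
        return "stop: toxicity"
    if p_active < futility:
        return "stop: futility"
    if p_active > efficacy:
        return "stop: efficacy"
    return "continue"


print(interim_decision(responses=3, toxicities=2, n=12))
```

Borrowing from previous experience, as the abstract suggests, would amount to replacing `a=1, b=1` with a prior fitted to earlier data; the check order (toxicity first) is one design choice among several.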

  14. Bayesian assessment of overtriage and undertriage at a level I trauma centre.

    PubMed

    DiDomenico, Paul B; Pietzsch, Jan B; Paté-Cornell, M Elisabeth

    2008-07-13

    We analysed the trauma triage system at a specific level I trauma centre to assess rates of over- and undertriage and to support recommendations for system improvements. The triage process is designed to estimate the severity of patient injury and allocate resources accordingly, with potential errors of overestimation (overtriage) consuming excess resources and underestimation (undertriage) potentially leading to medical errors. We first modelled the overall trauma system using risk analysis methods to understand interdependencies among the actions of the participants. We interviewed six experienced trauma surgeons to obtain their expert opinion of the over- and undertriage rates occurring in the trauma centre. We then assessed actual over- and undertriage rates in a random sample of 86 trauma cases collected over a six-week period at the same centre. We employed Bayesian analysis to quantitatively combine the data with the prior probabilities derived from expert opinion in order to obtain posterior distributions. The results were estimates of overtriage and undertriage in 16.1 and 4.9% of patients, respectively. This Bayesian approach, which provides a quantitative assessment of the error rates using both case data and expert opinion, provides a rational means of obtaining a best estimate of the system's performance. The overall approach that we describe in this paper can be employed more widely to analyse complex health care delivery systems, with the objective of reduced errors, patient risk and excess costs.
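
Combining an expert-derived prior with case data, as described above, is commonly done with a conjugate Beta-binomial update; the abstract does not say which prior family the authors used, so treat this as an assumed sketch. The elicitation rule (expert mean plus a "strength" in pseudo-observations) and the count of 12 overtriaged cases out of 86 are hypothetical; only the sample size of 86 comes from the abstract, and the result is not the paper's 16.1% estimate.

```python
def beta_prior_from_expert(mean, strength):
    """Hypothetical elicitation: Beta(a, b) with the expert's mean rate and a
    'strength' equal to the number of pseudo-observations the opinion is worth."""
    return mean * strength, (1.0 - mean) * strength


def update(a, b, events, n):
    """Conjugate Beta-binomial update after observing `events` out of `n` cases."""
    return a + events, b + (n - events)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Illustrative numbers: experts believe ~20% overtriage, worth 20 cases of evidence;
# suppose 12 of the 86 sampled cases were overtriaged (hypothetical count).
prior = beta_prior_from_expert(0.20, 20)
post = update(*prior, events=12, n=86)
print(post, round(beta_mean(*post), 3))
```

The posterior mean sits between the expert mean and the observed proportion, weighted by the prior strength and the sample size; a larger `strength` pulls the estimate further toward expert opinion.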

  15. Defining objective clusters for rabies virus sequences using affinity propagation clustering

    PubMed Central

    Fischer, Susanne; Freuling, Conrad M.; Pfaff, Florian; Bodenhofer, Ulrich; Höper, Dirk; Fischer, Mareike; Marston, Denise A.; Fooks, Anthony R.; Mettenleiter, Thomas C.; Conraths, Franz J.; Homeier-Bachmann, Timo

    2018-01-01

    Rabies is caused by lyssaviruses, and is one of the oldest known zoonoses. In recent years, more than 21,000 nucleotide sequences of rabies viruses (RABV), from the prototype species rabies lyssavirus, have been deposited in public databases. Subsequent phylogenetic analyses in combination with metadata suggest geographic distributions of RABV. However, these analyses face technical difficulties in defining verifiable criteria for allocating clusters in phylogenetic trees, which calls for a more rational approach. Therefore, we applied a relatively new mathematical clustering algorithm named ‘affinity propagation clustering’ (AP) to propose a standardized sub-species classification utilizing full-genome RABV sequences. Because AP is computationally fast and works for any meaningful measure of similarity between data samples, it has previously been applied successfully in bioinformatics for the analysis of microarray and gene expression data; however, cluster analysis of sequences is still in its infancy. Existing (516) and original (46) full-genome RABV sequences were used to demonstrate the application of AP for RABV clustering. On a global scale, AP proposed four clusters, i.e. New World cluster, Arctic/Arctic-like, Cosmopolitan, and Asian, as previously assigned by phylogenetic studies. By combining AP with established phylogenetic analyses, it is possible to resolve phylogenetic relationships between verifiably determined clusters and sequences. This workflow will be useful in confirming cluster distributions in a uniform transparent manner, not only for RABV, but also for other comparative sequence analyses. PMID:29357361
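
Affinity propagation works on any similarity matrix, which is why it can be applied to sequences. The sketch below is a minimal NumPy version of the standard responsibility/availability message passing (Frey and Dueck), not the pipeline used in the study. For illustration it clusters 1-D toy points with negative squared distance as similarity; for sequences one would substitute a sequence-similarity score (an assumption, not the study's measure). The diagonal of `S` holds the "preference", set here to the median off-diagonal similarity, a common default. scikit-learn also provides `sklearn.cluster.AffinityPropagation` for production use.

```python
import numpy as np


def affinity_propagation(S, damping=0.5, max_iter=200, conv_iter=15):
    """Minimal affinity propagation on an n x n similarity matrix S.

    S[k, k] is the preference of point k to serve as an exemplar.
    Returns, for each point, the index of its exemplar (-1 if none found).
    """
    n = S.shape[0]
    rows = np.arange(n)
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    last, stable = None, 0
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        idx = np.argmax(AS, axis=1)
        first = AS[rows, idx]
        AS[rows, idx] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, idx] = S[rows, idx] - second
        R = damping * R + (1 - damping) * R_new
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, np.diag(R))
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag_a = np.diag(A_new).copy()
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, diag_a)
        A = damping * A + (1 - damping) * A_new
        # stop once the exemplar set has been unchanged for conv_iter iterations
        E = (np.diag(A) + np.diag(R)) > 0
        if last is not None and np.array_equal(E, last):
            stable += 1
            if stable >= conv_iter:
                break
        else:
            stable = 0
        last = E
    exemplars = np.flatnonzero(E)
    if exemplars.size == 0:
        return np.full(n, -1)
    labels = exemplars[np.argmax(S[:, exemplars], axis=1)]
    labels[exemplars] = exemplars
    return labels


x = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])   # toy 1-D data
S = -(x[:, None] - x[None, :]) ** 2                # negative squared distance
np.fill_diagonal(S, np.median(S[~np.eye(len(x), dtype=bool)]))
print(affinity_propagation(S))
```

Unlike k-means, the number of clusters is not fixed in advance; it emerges from the preference values, with lower preferences yielding fewer exemplars.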

  16. Bayesian change-point analyses in ecology

    Treesearch

    Brian Beckage; Lawrence Joseph; Patrick Belisle; David B. Wolfson; William J. Platt

    2007-01-01

    Ecological and biological processes can change from one state to another once a threshold has been crossed in space or time. Threshold responses to incremental changes in underlying variables can characterize diverse processes from climate change to the desertification of arid lands from overgrazing.
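
The threshold idea above can be made concrete with a posterior distribution over the location of a single change point. This is a deliberately simplified sketch, not the model of Beckage et al.: it assumes Gaussian observations with known means before and after the change, a known standard deviation, and a uniform prior over the change-point index. The function name and toy data are hypothetical.

```python
import math


def changepoint_posterior(x, mu1, mu2, sigma):
    """Posterior over a single change point tau (uniform prior on 1..len(x)-1).

    Model: x[:tau] ~ N(mu1, sigma^2) and x[tau:] ~ N(mu2, sigma^2), with mu1,
    mu2 and sigma treated as known (a simplifying assumption).
    """
    n = len(x)

    def loglik(v, mu):
        # Gaussian log density without its constant term, which is the same
        # for every tau and cancels on normalisation
        return -0.5 * ((v - mu) / sigma) ** 2

    taus = list(range(1, n))
    logpost = [sum(loglik(v, mu1) for v in x[:t]) + sum(loglik(v, mu2) for v in x[t:])
               for t in taus]
    m = max(logpost)                       # log-sum-exp for numerical stability
    w = [math.exp(lp - m) for lp in logpost]
    z = sum(w)
    return {t: wi / z for t, wi in zip(taus, w)}


x = [0.1, -0.2, 0.0, 0.3, -0.1, 2.1, 1.9, 2.2, 1.8, 2.0]   # toy series
post = changepoint_posterior(x, mu1=0.0, mu2=2.0, sigma=0.5)
print(max(post, key=post.get))
```

A fuller analysis would place priors on the segment means and variance and integrate them out, so that the posterior also reflects uncertainty about the size of the shift.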

  17. Heuristics as Bayesian inference under extreme priors.

    PubMed

    Parpart, Paula; Jones, Matt; Love, Bradley C

    2018-05-01

    Simple heuristics are often regarded as tractable decision strategies because they ignore a great deal of information in the input data. One puzzle is why heuristics can outperform full-information models, such as linear regression, which make full use of the available information. These "less-is-more" effects, in which a relatively simpler model outperforms a more complex model, are prevalent throughout cognitive science, and are frequently argued to demonstrate an inherent advantage of simplifying computation or ignoring information. In contrast, we show at the computational level (where algorithmic restrictions are set aside) that it is never optimal to discard information. Through a formal Bayesian analysis, we prove that popular heuristics, such as tallying and take-the-best, are formally equivalent to Bayesian inference under the limit of infinitely strong priors. Varying the strength of the prior yields a continuum of Bayesian models with the heuristics at one end and ordinary regression at the other. Critically, intermediate models perform better across all our simulations, suggesting that down-weighting information with the appropriate prior is preferable to entirely ignoring it. Our analyses suggest that heuristics perform well not because of their simplicity but because they implement strong priors that approximate the actual structure of the environment. We end by considering how new heuristics could be derived by infinitely strengthening the priors of other Bayesian models. These formal results have implications for work in psychology, machine learning and economics. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
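
The continuum between regression and a heuristic can be illustrated with a Gaussian prior on regression weights whose strength is controlled by a precision parameter `lam`. This is an analogy under stated assumptions, not the priors constructed by Parpart et al.: the prior here is simply centred on equal unit weights (a tallying-like target), so `lam = 0` recovers ordinary least squares and very large `lam` returns the fixed equal weights. The data are simulated for illustration only.

```python
import numpy as np


def posterior_mean_weights(X, y, prior_mean, lam):
    """Posterior mean of linear-regression weights under a Gaussian prior.

    Prior w ~ N(prior_mean, (sigma^2 / lam) I) with noise variance sigma^2 gives
    w_post = (X^T X + lam I)^{-1} (X^T y + lam * prior_mean).
    lam = 0 is ordinary least squares; lam -> infinity returns prior_mean.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * prior_mean)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                                  # simulated cues
y = X @ np.array([1.5, 0.5, 1.0]) + rng.normal(scale=0.5, size=50)
tally = np.ones(3)                                            # equal-weight target
for lam in [0.0, 10.0, 1e6]:
    print(lam, np.round(posterior_mean_weights(X, y, tally, lam), 3))
```

Intermediate values of `lam` shrink the fitted weights toward the equal-weight vector without discarding the data, which mirrors the abstract's point that down-weighting information can beat either extreme.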

  18. 2b-RAD genotyping for population genomic studies of Chagas disease vectors: Rhodnius ecuadoriensis in Ecuador.

    PubMed

    Hernandez-Castro, Luis E; Paterno, Marta; Villacís, Anita G; Andersson, Björn; Costales, Jaime A; De Noia, Michele; Ocaña-Mayorga, Sofía; Yumiseva, Cesar A; Grijalva, Mario J; Llewellyn, Martin S

    2017-07-01

    Rhodnius ecuadoriensis is the main triatomine vector of Chagas disease, American trypanosomiasis, in Southern Ecuador and Northern Peru. Genomic approaches and next generation sequencing technologies have become powerful tools for investigating population diversity and structure which is a key consideration for vector control. Here we assess the effectiveness of three different 2b restriction site-associated DNA (2b-RAD) genotyping strategies in R. ecuadoriensis to provide sufficient genomic resolution to tease apart microevolutionary processes and undertake some pilot population genomic analyses. The 2b-RAD protocol was carried out in-house at a non-specialized laboratory using 20 R. ecuadoriensis adults collected from the central coast and southern Andean region of Ecuador, from June 2006 to July 2013. 2b-RAD sequencing was performed on an Illumina MiSeq instrument, and the data were analyzed with the STACKS de novo pipeline for loci assembly and Single Nucleotide Polymorphism (SNP) discovery. Preliminary population genomic analyses (global AMOVA and Bayesian clustering) were implemented. Our results showed that the 2b-RAD genotyping protocol is effective for R. ecuadoriensis and likely for other triatomine species. However, only BcgI and CspCI restriction enzymes provided a number of markers suitable for population genomic analysis at the read depth we generated. Our preliminary genomic analyses detected a signal of genetic structuring across the study area. Our findings suggest that 2b-RAD genotyping is both a cost effective and methodologically simple approach for generating high resolution genomic data for Chagas disease vectors with the power to distinguish between different vector populations at epidemiologically relevant scales. As such, 2b-RAD represents a powerful tool in the hands of medical entomologists with limited access to specialized molecular biological equipment.

  19. Genetic Diversity in Introduced Golden Mussel Populations Corresponds to Vector Activity

    PubMed Central

    Ghabooli, Sara; Zhan, Aibin; Sardiña, Paula; Paolucci, Esteban; Sylvester, Francisco; Perepelizin, Pablo V.; Briski, Elizabeta; Cristescu, Melania E.; MacIsaac, Hugh J.

    2013-01-01

    We explored possible links between vector activity and genetic diversity in introduced populations of Limnoperna fortunei by characterizing the genetic structure in native and introduced ranges in Asia and South America. We surveyed 24 populations: ten in Asia and 14 in South America using the mitochondrial cytochrome c oxidase subunit I (COI) gene, as well as eight polymorphic microsatellite markers. We performed population genetics and phylogenetic analyses to investigate population genetic structure across native and introduced regions. Introduced populations in Asia exhibit higher genetic diversity (HE = 0.667–0.746) than those in South America (HE = 0.519–0.575), suggesting higher introduction effort for the former populations. We observed pronounced geographical structuring in introduced regions, as indicated by both mitochondrial and nuclear markers based on multiple genetic analyses including pairwise ΦST, FST, Bayesian clustering method, and three-dimensional factorial correspondence analyses. Pairwise FST values within both Asia (FST = 0.017–0.126, P = 0.000–0.009) and South America (FST = 0.004–0.107, P = 0.000–0.721) were lower than those between continents (FST = 0.180–0.319, P = 0.000). Fine-scale genetic structuring was also apparent among introduced populations in both Asia and South America, suggesting either multiple introductions of distinct propagules or strong post-introduction selection and demographic stochasticity. Higher genetic diversity in Asia as compared to South America is likely due to more frequent propagule transfers associated with higher shipping activities between source and donor regions within Asia. This study suggests that the intensity of human-mediated introduction vectors influences patterns of genetic diversity in non-indigenous species. PMID:23533614
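
The diversity and differentiation statistics quoted above can be illustrated for a single locus. Expected heterozygosity is HE = 1 - Σ p_i², and a simple differentiation index is Nei's GST = (HT - HS) / HT, where HS is the mean within-population HE and HT is HE of the pooled allele frequencies. This sketch is not necessarily the estimator used in the study (it applies no sample-size correction and weights populations equally), and the allele frequencies are hypothetical.

```python
def expected_het(freqs):
    """Expected heterozygosity H_E = 1 - sum(p_i^2) for one locus."""
    return 1.0 - sum(p * p for p in freqs)


def nei_gst(pop_freqs):
    """Nei's G_ST = (H_T - H_S) / H_T for one locus, equal population weights.

    pop_freqs: list of allele-frequency lists, one per population, with the
    alleles listed in the same order in every population.
    """
    k = len(pop_freqs)
    h_s = sum(expected_het(f) for f in pop_freqs) / k             # mean within-pop H_E
    pooled = [sum(f[a] for f in pop_freqs) / k for a in range(len(pop_freqs[0]))]
    h_t = expected_het(pooled)                                    # H_E of pooled frequencies
    return (h_t - h_s) / h_t


# Hypothetical three-allele frequencies in two populations.
pops = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
print([round(expected_het(f), 3) for f in pops], round(nei_gst(pops), 3))
```

Values near 0 indicate populations sharing the same allele frequencies, while values approaching 1 indicate populations fixed for different alleles; multilocus estimates average the numerator and denominator across loci.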

  20. Fuzzy boundaries: color and gene flow patterns among parapatric lineages of the western shovel-nosed snake and taxonomic implication

    USGS Publications Warehouse

    Wood, Dustin A.; Fisher, Robert N.; Vandergast, Amy G.

    2014-01-01

    Accurate delineation of lineage diversity is increasingly important, as species distributions are becoming more reduced and threatened. During the last century, the subspecies category was often used to denote phenotypic variation within a species range and to provide a framework for understanding lineage differentiation, often considered incipient speciation. While this category has largely fallen into disuse, previously recognized subspecies often serve as important units for conservation policy and management when other information is lacking. In this study, we evaluated phenotypic subspecies hypotheses within shovel-nosed snakes on the basis of genetic data and considered how evolutionary processes such as gene flow influenced possible incongruence between phenotypic and genetic patterns. We used both traditional phylogenetic and Bayesian clustering analyses to infer range-wide genetic structure and spatially explicit analyses to detect possible boundary locations of lineage contact. Multilocus analyses supported three historically isolated groups with low to moderate levels of contemporary gene exchange. Genetic data did not support phenotypic subspecies as exclusive groups, and we detected patterns of discordance in areas where three subspecies are presumed to be in contact. Based on genetic and phenotypic evidence, we suggested that species-level diversity is underestimated in this group and we proposed that two species be recognized, Chionactis occipitalis and C. annulata. In addition, we recommend retention of two subspecific designations within C. annulata (C. a. annulata and C. a. klauberi) that reflect regional shifts in both genetic and phenotypic variation within the species. Our results highlight the difficulty in validating taxonomic boundaries within lineages that are evolving under a time-dependent, continuous process.

  1. 2b-RAD genotyping for population genomic studies of Chagas disease vectors: Rhodnius ecuadoriensis in Ecuador

    PubMed Central

    Villacís, Anita G.; Andersson, Björn; Costales, Jaime A.; De Noia, Michele; Ocaña-Mayorga, Sofía; Yumiseva, Cesar A.; Grijalva, Mario J.; Llewellyn, Martin S.

    2017-01-01

    Background: Rhodnius ecuadoriensis is the main triatomine vector of Chagas disease (American trypanosomiasis) in southern Ecuador and northern Peru. Genomic approaches and next-generation sequencing technologies have become powerful tools for investigating population diversity and structure, which are key considerations for vector control. Here we assess the effectiveness of three different 2b restriction site-associated DNA (2b-RAD) genotyping strategies in R. ecuadoriensis for providing sufficient genomic resolution to tease apart microevolutionary processes, and we undertake some pilot population genomic analyses. Methodology/Principal findings: The 2b-RAD protocol was carried out in-house at a non-specialized laboratory using 20 R. ecuadoriensis adults collected from the central coast and southern Andean region of Ecuador between June 2006 and July 2013. 2b-RAD sequencing was performed on an Illumina MiSeq instrument, and the data were analyzed with the STACKS de novo pipeline for loci assembly and single nucleotide polymorphism (SNP) discovery. Preliminary population genomic analyses (global AMOVA and Bayesian clustering) were implemented. Our results showed that the 2b-RAD genotyping protocol is effective for R. ecuadoriensis and likely for other triatomine species. However, only the BcgI and CspCI restriction enzymes provided a number of markers suitable for population genomic analysis at the read depth we generated. Our preliminary genomic analyses detected a signal of genetic structuring across the study area. Conclusions/Significance: Our findings suggest that 2b-RAD genotyping is both a cost-effective and methodologically simple approach for generating high-resolution genomic data for Chagas disease vectors, with the power to distinguish between different vector populations at epidemiologically relevant scales. As such, 2b-RAD represents a powerful tool in the hands of medical entomologists with limited access to specialized molecular biological equipment. PMID:28723901

  2. Fuzzy boundaries: color and gene flow patterns among parapatric lineages of the western shovel-nosed snake and taxonomic implication.

    PubMed

    Wood, Dustin A; Fisher, Robert N; Vandergast, Amy G

    2014-01-01

    Accurate delineation of lineage diversity is increasingly important, as species distributions are becoming more reduced and threatened. During the last century, the subspecies category was often used to denote phenotypic variation within a species range and to provide a framework for understanding lineage differentiation, often considered incipient speciation. While this category has largely fallen into disuse, previously recognized subspecies often serve as important units for conservation policy and management when other information is lacking. In this study, we evaluated phenotypic subspecies hypotheses within shovel-nosed snakes on the basis of genetic data and considered how evolutionary processes such as gene flow influenced possible incongruence between phenotypic and genetic patterns. We used both traditional phylogenetic and Bayesian clustering analyses to infer range-wide genetic structure and spatially explicit analyses to detect possible boundary locations of lineage contact. Multilocus analyses supported three historically isolated groups with low to moderate levels of contemporary gene exchange. Genetic data did not support phenotypic subspecies as exclusive groups, and we detected patterns of discordance in areas where three subspecies are presumed to be in contact. Based on genetic and phenotypic evidence, we suggested that species-level diversity is underestimated in this group and we proposed that two species be recognized, Chionactis occipitalis and C. annulata. In addition, we recommend retention of two subspecific designations within C. annulata (C. a. annulata and C. a. klauberi) that reflect regional shifts in both genetic and phenotypic variation within the species. Our results highlight the difficulty in validating taxonomic boundaries within lineages that are evolving under a time-dependent, continuous process.

  3. Fuzzy Boundaries: Color and Gene Flow Patterns among Parapatric Lineages of the Western Shovel-Nosed Snake and Taxonomic Implication

    PubMed Central

    Wood, Dustin A.; Fisher, Robert N.; Vandergast, Amy G.

    2014-01-01

    Accurate delineation of lineage diversity is increasingly important, as species distributions are becoming more reduced and threatened. During the last century, the subspecies category was often used to denote phenotypic variation within a species range and to provide a framework for understanding lineage differentiation, often considered incipient speciation. While this category has largely fallen into disuse, previously recognized subspecies often serve as important units for conservation policy and management when other information is lacking. In this study, we evaluated phenotypic subspecies hypotheses within shovel-nosed snakes on the basis of genetic data and considered how evolutionary processes such as gene flow influenced possible incongruence between phenotypic and genetic patterns. We used both traditional phylogenetic and Bayesian clustering analyses to infer range-wide genetic structure and spatially explicit analyses to detect possible boundary locations of lineage contact. Multilocus analyses supported three historically isolated groups with low to moderate levels of contemporary gene exchange. Genetic data did not support phenotypic subspecies as exclusive groups, and we detected patterns of discordance in areas where three subspecies are presumed to be in contact. Based on genetic and phenotypic evidence, we suggested that species-level diversity is underestimated in this group and we proposed that two species be recognized, Chionactis occipitalis and C. annulata. In addition, we recommend retention of two subspecific designations within C. annulata (C. a. annulata and C. a. klauberi) that reflect regional shifts in both genetic and phenotypic variation within the species. Our results highlight the difficulty in validating taxonomic boundaries within lineages that are evolving under a time-dependent, continuous process. PMID:24848638

  4. Adaptation of Chain Event Graphs for use with Case-Control Studies in Epidemiology.

    PubMed

    Keeble, Claire; Thwaites, Peter Adam; Barber, Stuart; Law, Graham Richard; Baxter, Paul David

    2017-09-26

    Case-control studies are used in epidemiology to try to uncover the causes of diseases, but they are a retrospective study design known to suffer from non-participation and recall bias, which may explain their decreased popularity in recent years. Traditional analyses usually report only the odds ratio for given exposures and the binary disease status. Chain event graphs, a graphical representation of a statistical model derived from event trees, have been developed in artificial intelligence and statistics and only recently introduced to the epidemiology literature. They are a modern Bayesian technique that enables prior knowledge to be incorporated into the data analysis, with the agglomerative hierarchical clustering algorithm used to form a suitable chain event graph. Additionally, they can account for missing data and be used to explore missingness mechanisms. Here we adapt the chain event graph framework to suit scenarios often encountered in case-control studies, to strengthen this time- and cost-efficient study design. We demonstrate eight adaptations to the graphs: two suitable for full case-control study analysis, four that can be used in interim analyses to explore biases, and two that aim to improve the ease and accuracy of analyses. The adaptations are illustrated with complete, reproducible, fully interpreted examples, including the event tree and chain event graph. Chain event graphs are used here for the first time to summarise non-participation, data collection techniques, data reliability, and disease severity in case-control studies. We demonstrate how these features of a case-control study can be incorporated into the analysis to provide further insight, which can help to identify potential biases and lead to more accurate study results.
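    As a sketch of the merging criterion, assume the common setup in which situations of the event tree with the same number of outgoing edges are candidate stages, each scored by the Dirichlet-multinomial log marginal likelihood of its edge counts. Agglomerative hierarchical clustering then greedily merges the pair of stages with the largest positive log Bayes factor. The code computes that log Bayes factor for one candidate pair, adding both counts and Dirichlet hyperparameters on merging; the function names, counts, and uniform Dirichlet(1, 1) priors are hypothetical choices, not values from the study.

```python
from math import lgamma

def log_marginal(counts, alpha):
    """Log marginal likelihood of categorical data under a Dirichlet(alpha) prior."""
    a, n = sum(alpha), sum(counts)
    score = lgamma(a) - lgamma(a + n)
    for n_k, a_k in zip(counts, alpha):
        score += lgamma(a_k + n_k) - lgamma(a_k)
    return score

def merge_gain(counts1, alpha1, counts2, alpha2):
    """Log Bayes factor for merging two stages versus keeping them separate."""
    merged_counts = [x + y for x, y in zip(counts1, counts2)]
    merged_alpha = [x + y for x, y in zip(alpha1, alpha2)]
    return (log_marginal(merged_counts, merged_alpha)
            - log_marginal(counts1, alpha1)
            - log_marginal(counts2, alpha2))

# Hypothetical edge counts for candidate stages with two outgoing edges each
similar = merge_gain([30, 10], [1, 1], [29, 11], [1, 1])
different = merge_gain([30, 10], [1, 1], [5, 35], [1, 1])
print(similar > 0, different < 0)  # merge the first pair, keep the second apart
```

A positive gain means the data favour a shared set of edge probabilities for the two situations, which is what placing them in the same stage asserts.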

  5. Specimen-level phylogenetics in paleontology using the Fossilized Birth-Death model with sampled ancestors.

    PubMed

    Cau, Andrea

    2017-01-01

    Bayesian phylogenetic methods that simultaneously integrate morphological and stratigraphic information have been increasingly applied by paleontologists. Most of these studies have used Bayesian methods as an alternative to the widely used parsimony analysis to infer macroevolutionary patterns and relationships among species-level or higher taxa. Among recently introduced Bayesian methodologies, the Fossilized Birth-Death (FBD) model allows hypotheses of ancestor-descendant relationships to be incorporated into phylogenetic analyses that include fossil taxa. Here, the FBD model is used to infer the relationships within an ingroup formed exclusively by fossil individuals, i.e., dipnoan tooth plates from four localities in the Ain el Guettar Formation of Tunisia. Previous analyses of this sample compared the results of phylogenetic analysis using parsimony with stratigraphic methods, inferred a high diversity (five or more genera) in the Ain el Guettar Formation, and interpreted it as an artifact inflated by depositional factors. In the analysis performed here, the uncertainty in the chronostratigraphic relationships among the specimens was included among the prior settings. The results of the analysis confirm the referral of most of the specimens to the taxa Asiatoceratodus, Equinoxiodus, Lavocatodus and Neoceratodus, but reject those to Ceratodus and Ferganoceratodus. The resulting phylogeny constrained the evolution of the Tunisian sample exclusively to the Early Cretaceous, contrasting with the previous scenario inferred from the stratigraphically calibrated topology resulting from parsimony analysis. The phylogenetic framework also suggests that (1) the sampled localities are laterally equivalent, but (2) three localities are restricted to the youngest part of the section; both results agree with previous stratigraphic analyses of these localities. The FBD model of specimen-level units provides a novel tool not only for phylogenetic inference among fossils but also for independent tests of stratigraphic scenarios.

  6. Data Mining Methods for Recommender Systems

    NASA Astrophysics Data System (ADS)

    Amatriain, Xavier; Jaimes, Alejandro; Oliver, Nuria; Pujol, Josep M.

    In this chapter, we give an overview of the main Data Mining techniques used in the context of Recommender Systems. We first describe common preprocessing methods such as sampling or dimensionality reduction. Next, we review the most important classification techniques, including Bayesian Networks and Support Vector Machines. We describe the k-means clustering algorithm and discuss several alternatives. We also present association rules and related algorithms for an efficient training process. In addition to introducing these techniques, we survey their uses in Recommender Systems and present cases where they have been successfully applied.
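    The k-means algorithm mentioned above is usually fitted with Lloyd's iterations: assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until the centroids stop moving. A minimal pure-Python sketch follows; the function name is illustrative, initialising from k randomly chosen data points is one simple choice among several (k-means++ is a common alternative), and the example points are hypothetical.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: the assignments can no longer change
        centroids = new_centroids
    return centroids, labels

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]  # hypothetical data
centroids, labels = kmeans(pts, 2)
print(labels)
```

Lloyd's iterations only reach a local optimum, so in practice the algorithm is usually restarted from several initialisations and the solution with the lowest within-cluster sum of squares is kept.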

  7. Multivariate Bayesian analysis of Gaussian, right censored Gaussian, ordered categorical and binary traits using Gibbs sampling

    PubMed Central

    Korsgaard, Inge Riis; Lund, Mogens Sandø; Sorensen, Daniel; Gianola, Daniel; Madsen, Per; Jensen, Just

    2003-01-01

    A fully Bayesian analysis using Gibbs sampling and data augmentation in a multivariate model of Gaussian, right censored, and grouped Gaussian traits is described. The grouped Gaussian traits are either ordered categorical traits (with more than two categories) or binary traits, where the grouping is determined via thresholds on the underlying Gaussian scale, the liability scale. Allowances are made for unequal models, unknown covariance matrices and missing data. Having outlined the theory, strategies for implementation are reviewed. These include joint sampling of location parameters; efficient sampling from the fully conditional posterior distribution of augmented data, a multivariate truncated normal distribution; and sampling from the conditional inverse Wishart distribution, the fully conditional posterior distribution of the residual covariance matrix. Finally, a simulated dataset was analysed to illustrate the methodology. This paper concentrates on a model where residuals associated with liabilities of the binary traits are assumed to be independent. A Bayesian analysis using Gibbs sampling is outlined for the model where this assumption is relaxed. PMID:12633531
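    The data-augmentation idea for binary traits can be illustrated in its simplest univariate form: treat each record's liability as an unobserved N(mu, 1) variable that is positive when the trait is present, sample the liabilities from their truncated normal full conditionals, then sample mu given the liabilities. The sketch below assumes an intercept-only model with a flat prior on mu and the residual variance fixed at 1 for identifiability; it does not cover the multivariate model, thresholds for ordered categories, censoring, or covariance sampling described in the abstract. Function names and data are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def truncated_normal(mean, lower, upper, rng):
    """Draw from N(mean, 1) truncated to (lower, upper) by inverse-CDF sampling."""
    lo = STD_NORMAL.cdf(lower - mean)
    hi = STD_NORMAL.cdf(upper - mean)
    u = rng.uniform(lo, hi)
    u = min(max(u, 1e-12), 1 - 1e-12)  # keep inv_cdf strictly inside (0, 1)
    return mean + STD_NORMAL.inv_cdf(u)

def probit_gibbs(y, n_iter=2000, burn_in=200, seed=1):
    """Gibbs sampler for an intercept-only probit model with data augmentation.

    Liabilities l_i ~ N(mu, 1); y_i = 1 if l_i > 0 else 0; flat prior on mu.
    """
    rng = random.Random(seed)
    n = len(y)
    mu = 0.0
    draws = []
    for it in range(n_iter):
        # 1. Augmentation: sample each liability given mu and its observed trait
        liab = [truncated_normal(mu, 0.0, float("inf"), rng) if yi == 1
                else truncated_normal(mu, float("-inf"), 0.0, rng) for yi in y]
        # 2. Update: mu | liabilities ~ N(mean(liab), 1/n) under a flat prior
        mu = rng.gauss(sum(liab) / n, (1.0 / n) ** 0.5)
        if it >= burn_in:
            draws.append(mu)
    return draws

y = [1] * 35 + [0] * 15  # hypothetical binary records, observed rate 0.7
draws = probit_gibbs(y)
mu_hat = sum(draws) / len(draws)
print(round(mu_hat, 2), round(STD_NORMAL.cdf(mu_hat), 2))  # compare with 0.7
```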

  8. A Bayesian account of quantum histories

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Marlow, Thomas

    2006-05-15

    We investigate whether quantum history theories can be consistent with Bayesian reasoning and whether such an analysis helps clarify the interpretation of such theories. First, we summarise and extend recent work categorising two different approaches to formalising multi-time measurements in quantum theory. The standard approach consists of describing an ordered series of measurements in terms of history propositions with non-additive 'probabilities.' The non-standard approach consists of defining multi-time measurements to consist of sets of exclusive and exhaustive history propositions and recovering the single-time exclusivity of results when discussing single-time history propositions. We analyse whether such history propositions can be consistent with Bayes' rule. We show that a certain class of histories is given a natural Bayesian interpretation, namely, the linearly positive histories originally introduced by Goldstein and Page. Thus, we argue that this gives a certain amount of interpretational clarity to the non-standard approach. We also attempt a justification of our analysis using Cox's axioms of probability theory.

  9. Real-time prediction of acute cardiovascular events using hardware-implemented Bayesian networks.

    PubMed

    Tylman, Wojciech; Waszyrowski, Tomasz; Napieralski, Andrzej; Kamiński, Marek; Trafidło, Tamara; Kulesza, Zbigniew; Kotas, Rafał; Marciniak, Paweł; Tomala, Radosław; Wenerski, Maciej

    2016-02-01

    This paper presents a decision support system that aims to estimate a patient's general condition and detect situations that pose an immediate danger to the patient's health or life. Such a system might be especially important in places such as accident and emergency departments or admission wards, where a small medical team has to care for many patients in varying general condition. Particular stress is laid on cardiovascular and pulmonary conditions, including those leading to sudden cardiac arrest. The proposed system is a stand-alone microprocessor-based device that works in conjunction with a standard vital signs monitor, which provides input signals such as temperature, blood pressure, pulse oximetry, ECG, and ICG. The signals are preprocessed and analysed by a set of artificial intelligence algorithms, the core of which is based on Bayesian networks. The paper focuses on the construction and evaluation of the Bayesian network, covering both its structure and its numerical specification. Copyright © 2015 Elsevier Ltd. All rights reserved.

  10. Differences in Mortality among Heroin, Cocaine, and Methamphetamine Users: A Hierarchical Bayesian Approach

    PubMed Central

    Liang, Li-Jung; Huang, David; Brecht, Mary-Lynn; Hser, Yih-ing

    2010-01-01

    Studies examining differences in mortality among long-term drug users have been limited. In this paper, we introduce a Bayesian framework that jointly models survival data, using a Weibull proportional hazards model with frailty, and substance and alcohol use data, using mixed-effects models, to examine differences in mortality among heroin, cocaine, and methamphetamine users from five long-term follow-up studies. The traditional approach to analyzing combined survival data from numerous studies assumes that the studies are homogeneous, so the estimates may be biased by unobserved heterogeneity among studies. Our approach allows us to structurally combine the data from different studies while accounting for correlation among subjects within each study. Markov chain Monte Carlo facilitates the implementation of the Bayesian analyses. Despite the complexity of the model, our approach is relatively straightforward to implement using WinBUGS. We apply our joint modeling approach to the combined data and discuss the results from both approaches. PMID:21052518
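    To make the survival component concrete, the sketch below uses one common Weibull proportional hazards parameterisation, h(t) = w * lam * rho * t^(rho - 1) * exp(x'beta), where w is a frailty shared by subjects in the same study, giving S(t) = exp(-w * lam * t^rho * exp(x'beta)). The abstract does not give the paper's exact parameterisation or frailty distribution; the gamma frailty with mean 1 and variance theta used for the marginal curve is an assumption, chosen because it integrates out in closed form. Function names and parameter values are hypothetical.

```python
import math

def weibull_ph_survival(t, lam, rho, linpred, frailty=1.0):
    """Conditional survival S(t | w) = exp(-w * lam * t**rho * exp(linpred)),
    where linpred = x'beta and w is the multiplicative frailty."""
    return math.exp(-frailty * lam * t ** rho * math.exp(linpred))

def gamma_frailty_marginal_survival(t, lam, rho, linpred, theta):
    """Survival averaged over a gamma frailty with mean 1 and variance theta:
    E[exp(-W * H)] = (1 + theta * H) ** (-1 / theta), H the cumulative hazard."""
    cum_hazard = lam * t ** rho * math.exp(linpred)
    return (1.0 + theta * cum_hazard) ** (-1.0 / theta)

# Hypothetical parameters: scale 0.1, shape 1.5, linear predictor 0.3, t = 2
cond = weibull_ph_survival(2.0, 0.1, 1.5, 0.3)
marg = gamma_frailty_marginal_survival(2.0, 0.1, 1.5, 0.3, theta=0.5)
print(round(cond, 3), round(marg, 3))  # marginal curve lies above the w = 1 curve
```

As theta approaches 0 the frailty variance vanishes and the marginal survival reduces to the conditional curve with w = 1.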

  11. Convergence among cave catfishes: long-branch attraction and a Bayesian relative rates test.

    PubMed

    Wilcox, T P; García de León, F J; Hendrickson, D A; Hillis, D M

    2004-06-01

    Convergence has long been of interest to evolutionary biologists. Cave organisms appear to be ideal candidates for studying convergence in morphological, physiological, and developmental traits. Here we report apparent convergence in two cave catfishes that were described on morphological grounds as congeners: Prietella phreatophila and Prietella lundbergi. We collected mitochondrial DNA sequence data from 10 species of catfishes, representing five of the seven genera in Ictaluridae, as well as seven species from a broad range of siluriform outgroups. Analysis of the sequence data under parsimony supports a monophyletic Prietella. However, both maximum-likelihood and Bayesian analyses support polyphyly of the genus, with P. lundbergi sister to Ictalurus and P. phreatophila sister to Ameiurus. The topological difference between parsimony and the other methods appears to result from long-branch attraction between the Prietella species. Similarly, the sequence data do not support several other relationships within Ictaluridae supported by morphology. We develop a new Bayesian method for examining variation in molecular rates of evolution across a phylogeny.

  12. Bayesian decoding using unsorted spikes in the rat hippocampus

    PubMed Central

    Layton, Stuart P.; Chen, Zhe; Wilson, Matthew A.

    2013-01-01

    A fundamental task in neuroscience is to understand how neural ensembles represent information. Population decoding is a useful tool for extracting information from neuronal populations based on ensemble spiking activity. We propose a novel Bayesian decoding paradigm to decode unsorted spikes in the rat hippocampus. Our approach uses a direct mapping between spike waveform features and covariates of interest and avoids the accumulation of spike-sorting errors. Our decoding paradigm is nonparametric and encoding-model-free for representing stimuli, and it extracts information from all available spikes and their waveform features. We apply the proposed Bayesian decoding algorithm to a position reconstruction task for freely behaving rats, based on tetrode recordings of rat hippocampal neuronal activity. Our detailed decoding analyses demonstrate that our approach is efficient and makes better use of the available information in the nonsortable hash than the standard sorting-based decoding algorithm does. Our approach can be adapted to an online encoding/decoding framework for applications that require real-time decoding, such as brain-machine interfaces. PMID:24089403
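    For comparison, a common sorting-based baseline is the Poisson Bayesian decoder: given the spike counts of sorted units in a short window, the posterior over position is proportional to the prior times the product of Poisson likelihoods built from each unit's place field. The sketch below implements that baseline, not the authors' clusterless method, which maps waveform features directly to position without sorting. The function name, place fields, window length, and spike counts are hypothetical.

```python
import math

def decode_position(spike_counts, tuning_curves, dt, prior=None):
    """Posterior over position bins given spike counts from sorted units.

    spike_counts:  count n_i for each unit in a window of length dt (seconds)
    tuning_curves: tuning_curves[i][x] = firing rate (Hz, > 0) of unit i in bin x
    Assumes independent Poisson spiking given position.
    """
    n_bins = len(tuning_curves[0])
    if prior is None:
        prior = [1.0 / n_bins] * n_bins  # uniform prior over bins
    log_post = []
    for x in range(n_bins):
        lp = math.log(prior[x])
        for n_i, curve in zip(spike_counts, tuning_curves):
            rate = curve[x] * dt
            lp += n_i * math.log(rate) - rate  # Poisson log-likelihood, constant dropped
        log_post.append(lp)
    top = max(log_post)
    weights = [math.exp(v - top) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical Gaussian place fields (centres 3, 10, 16) on a 20-bin track
fields = [[0.5 + 20 * math.exp(-((x - c) ** 2) / 8) for x in range(20)]
          for c in (3, 10, 16)]
post = decode_position([0, 5, 0], fields, dt=0.25)
map_bin = max(range(len(post)), key=post.__getitem__)
print(map_bin)  # maximum a posteriori position bin
```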

  13. A Cluster Analytic Approach to Identifying Predictors and Moderators of Psychosocial Treatment for Bipolar Depression: Results from STEP-BD

    PubMed Central

    Deckersbach, Thilo; Peters, Amy T.; Sylvia, Louisa G.; Gold, Alexandra K.; da Silva Magalhaes, Pedro Vieira; Henry, David B.; Frank, Ellen; Otto, Michael W.; Berk, Michael; Dougherty, Darin D.; Nierenberg, Andrew A.; Miklowitz, David J.

    2016-01-01

    Background: We sought to address how predictors and moderators of psychotherapy for bipolar depression – identified individually in prior analyses – can inform the development of a metric for prospectively classifying treatment outcome in intensive psychotherapy (IP) versus collaborative care (CC) adjunctive to pharmacotherapy in the Systematic Treatment Enhancement Program (STEP-BD) study. Methods: We conducted post-hoc analyses on 135 STEP-BD participants using cluster analysis to identify subsets of participants with similar clinical profiles and investigated this combined metric as a moderator and predictor of response to IP. We used agglomerative hierarchical cluster analyses and k-means clustering to determine the content of the clinical profiles. Logistic regression and Cox proportional hazards models were used to evaluate whether the resulting clusters predicted or moderated likelihood of recovery or time until recovery. Results: The cluster analysis yielded a two-cluster solution: 1) “less-recurrent/severe” and 2) “chronic/recurrent.” Rates of recovery in IP were similar for less-recurrent/severe and chronic/recurrent participants. Less-recurrent/severe patients were more likely than chronic/recurrent patients to achieve recovery in CC (p = .040, OR = 4.56). IP yielded a faster recovery for chronic/recurrent participants, whereas CC led to recovery sooner in the less-recurrent/severe cluster (p = .034, OR = 2.62). Limitations: Cluster analyses require list-wise deletion of cases with missing data, so we were unable to conduct analyses on all STEP-BD participants. Conclusions: A well-powered, parametric approach can distinguish patients based on illness history and provide clinicians with symptom profiles of patients that confer differential prognosis in CC vs. IP. PMID:27289316
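    The agglomerative step can be sketched as follows: start with every participant as a separate cluster and repeatedly merge the two closest clusters until the desired number remains. The abstract does not state the linkage criterion or distance measure; average linkage with Euclidean distance is an assumption here, and the naive O(n^3) search suits only small samples. The function name and feature vectors are hypothetical.

```python
import math

def average_linkage(points, n_clusters):
    """Naive agglomerative clustering with average linkage; returns index lists."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Mean pairwise Euclidean distance between members of clusters a and b
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (i < j) and merge j into i
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters

pts = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]  # hypothetical profiles
groups = average_linkage(pts, 3)
print(groups)
```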

  14. Phylogenetically marking the limits of the genus Fusarium for post-Article 59 usage

    USDA-ARS's Scientific Manuscript database

    Fusarium (Hypocreales, Nectriaceae) is one of the most important and systematically challenging groups of mycotoxigenic, plant pathogenic, and human pathogenic fungi. We conducted maximum likelihood (ML), maximum parsimony (MP) and Bayesian (B) analyses on partial nucleotide sequences of genes encod...

  15. A Bayesian network model for predicting type 2 diabetes risk based on electronic health records

    NASA Astrophysics Data System (ADS)

    Xie, Jiang; Liu, Yan; Zeng, Xu; Zhang, Wu; Mei, Zhen

    2017-07-01

    An extensive, in-depth study of diabetes risk factors (DBRF) is of crucial importance for preventing (or reducing) the chance of developing type 2 diabetes (T2D). The accumulation of electronic health records (EHRs) makes it possible to build nonlinear relationships between risk factors and diabetes. However, current DBRF research mainly focuses on qualitative analyses, and inconsistency in physical examination items makes risk factors likely to be lost, which motivated us to study a novel machine learning approach for risk model development. In this paper, we use Bayesian networks (BNs) to analyze the relationship between physical examination information and T2D, and to quantify the link between risk factors and T2D. Furthermore, building on the quantitative analyses of DBRF, we use EHRs and propose a machine learning approach based on BNs to predict the risk of T2D. The experiments demonstrate that our approach can achieve better predictive performance than the classical risk model.
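    A Bayesian network factorises the joint distribution of risk factors and disease status into conditional probability tables, so a risk prediction is a conditional probability obtained by summing the joint over unobserved variables. The sketch below uses a hypothetical three-node network (obesity and family history as parents of T2D) with made-up probabilities; it illustrates exact inference by enumeration and is not the structure or parameters learned in the study. All names are illustrative.

```python
from itertools import product

# Hypothetical conditional probability tables (illustrative numbers only)
P_OBESE = {True: 0.3, False: 0.7}
P_FAMILY = {True: 0.2, False: 0.8}
P_T2D = {  # P(T2D = True | obese, family_history)
    (True, True): 0.40, (True, False): 0.20,
    (False, True): 0.15, (False, False): 0.05,
}

def joint(obese, family, t2d):
    """Joint probability from the network factorisation P(O) P(F) P(T | O, F)."""
    p_t2d = P_T2D[(obese, family)]
    return P_OBESE[obese] * P_FAMILY[family] * (p_t2d if t2d else 1 - p_t2d)

def query_t2d(evidence):
    """P(T2D = True | evidence), where evidence may fix 'obese' and/or 'family'."""
    num = den = 0.0
    for obese, family, t2d in product([True, False], repeat=3):
        state = {"obese": obese, "family": family}
        if any(state[k] != v for k, v in evidence.items()):
            continue  # skip states inconsistent with the evidence
        p = joint(obese, family, t2d)
        den += p
        if t2d:
            num += p
    return num / den

print(round(query_t2d({"obese": True}), 3), round(query_t2d({}), 3))
```

Enumeration is exponential in the number of unobserved variables; larger networks typically rely on variable elimination, junction-tree methods, or sampling.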

  16. A taxonomic monograph of Nearctic Scolytus Geoffroy (Coleoptera, Curculionidae, Scolytinae).

    PubMed

    Smith, Sarah M; Cognato, Anthony I

    2014-01-01

    The Nearctic bark beetle genus Scolytus Geoffroy was revised based in part on a molecular and morphological phylogeny. Monophyly of the native species was tested using mitochondrial (COI) and nuclear (28S, CAD, ArgK) genes and 43 morphological characters in parsimony and Bayesian phylogenetic analyses. Parsimony analyses of molecular and combined datasets provided mixed results while Bayesian analysis recovered most nodes with posterior probabilities >90%. Native hardwood- and conifer-feeding Scolytus species were recovered as paraphyletic. Native Nearctic species were recovered as paraphyletic with hardwood-feeding species sister to Palearctic hardwood-feeding species rather than to native conifer-feeding species. The Nearctic conifer-feeding species were monophyletic. Twenty-five species were recognized. Four new synonyms were discovered: Scolytus praeceps LeConte, 1868 (= Scolytus abietis Blackman, 1934; = Scolytus opacus Blackman, 1934), Scolytus reflexus Blackman, 1934 (= Scolytus virgatus Bright, 1972; = Scolytus wickhami Blackman, 1934). Two species were reinstated: Scolytus fiskei Blackman, 1934 and Scolytus silvaticus Bright, 1972. A diagnosis, description, distribution, host records and images were provided for each species and a key is presented to all species.

  17. A Bayesian model averaging approach with non-informative priors for cost-effectiveness analyses.

    PubMed

    Conigliani, Caterina

    2010-07-20

    We consider the problem of assessing new and existing technologies for their cost-effectiveness when data on both costs and effects are available from a clinical trial, and we address it by means of the cost-effectiveness acceptability curve. The main difficulty in these analyses is that cost data usually exhibit highly skewed and heavy-tailed distributions, so it can be extremely difficult to produce realistic probabilistic models for the underlying population distribution. Here, in order to integrate the uncertainty about the model into the analysis of cost data and into cost-effectiveness analyses, we consider an approach based on Bayesian model averaging (BMA) in the particular case of weak prior information about the unknown parameters of the different models involved in the procedure. The main consequence of this assumption is that the marginal densities required by BMA are undetermined. However, in accordance with the theory of partial Bayes factors, and in particular of fractional Bayes factors, we suggest replacing each marginal density with a ratio of integrals that can be efficiently computed via path sampling. Copyright (c) 2010 John Wiley & Sons, Ltd.
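    Path sampling rests on the identity log Z = ∫_0^1 E_t[log L(theta)] dt, where the expectation is taken under the power posterior p_t(theta) ∝ L(theta)^t p(theta). The sketch below checks this on a conjugate model (y_i ~ N(theta, 1), theta ~ N(0, tau^2)) whose power posteriors are normal and whose exact log marginal likelihood is known. It illustrates only the path-sampling estimator, not the paper's fractional-Bayes-factor ratio or its BMA procedure; the temperature schedule t_k = (k/K)^5, the function names, and the data are assumptions.

```python
import math
import random

def log_lik(theta, y):
    """Log-likelihood of y_i ~ N(theta, 1)."""
    return -0.5 * len(y) * math.log(2 * math.pi) - 0.5 * sum((v - theta) ** 2 for v in y)

def path_sampling_log_marginal(y, tau=1.0, n_temps=30, n_draws=3000, seed=0):
    """Estimate log p(y) as the integral over t in [0, 1] of E_t[log L]."""
    rng = random.Random(seed)
    n, s = len(y), sum(y)
    temps = [(k / n_temps) ** 5 for k in range(n_temps + 1)]  # denser near t = 0
    expected = []
    for t in temps:
        # Power posterior L^t * prior is N(t*s/prec, 1/prec) for this conjugate model
        prec = t * n + 1.0 / tau ** 2
        mean = t * s / prec
        total = sum(log_lik(rng.gauss(mean, prec ** -0.5), y) for _ in range(n_draws))
        expected.append(total / n_draws)
    # Trapezoidal rule over the non-uniform temperature grid
    return sum((temps[k + 1] - temps[k]) * (expected[k] + expected[k + 1]) / 2
               for k in range(n_temps))

def exact_log_marginal(y, tau=1.0):
    """Closed form: marginally y ~ N(0, I + tau^2 * 11')."""
    n, s = len(y), sum(y)
    ss = sum(v * v for v in y)
    quad = ss - tau ** 2 * s ** 2 / (1 + n * tau ** 2)
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * math.log(1 + n * tau ** 2) - 0.5 * quad

y = [0.3, -0.1, 0.8, 1.2, 0.5, -0.4, 0.9, 0.1, 0.6, 0.7]  # hypothetical data
print(path_sampling_log_marginal(y), exact_log_marginal(y))
```

Clustering temperatures near t = 0 matters because E_t[log L] changes fastest there, where the power posterior moves away from the prior.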

  18. A revised phylogeny of Antilopini (Bovidae, Artiodactyla) using combined mitochondrial and nuclear genes.

    PubMed

    Bärmann, Eva Verena; Rössner, Gertrud Elisabeth; Wörheide, Gert

    2013-05-01

    Antilopini (gazelles and their allies) are one of the most diverse but phylogenetically controversial groups of bovids. Here we provide a molecular phylogeny of this poorly understood taxon using combined analyses of mitochondrial (CYTB, COIII, 12S, 16S) and nuclear (KCAS, SPTBN1, PRKCI, MC1R, THYR) genes. We explore the influence of data partitioning and different analytical methods, including Bayesian inference, maximum likelihood and maximum parsimony, on the inferred relationships within Antilopini. We achieve increased resolution and support compared to previous analyses especially in the two most problematic parts of their tree. First, taxa commonly referred to as "gazelles" are recovered as paraphyletic, as the genus Gazella appears more closely related to the Indian blackbuck (Antilope cervicapra) than to the other two gazelle genera (Nanger and Eudorcas). Second, we recovered a strongly supported sister relationship between one of the dwarf antelopes (Ourebia) and the Antilopini subgroup Antilopina (Saiga, Gerenuk, Springbok, Blackbuck and gazelles). The assessment of the influence of taxon sampling, outgroup rooting, and data partitioning in Bayesian analyses helps explain the contradictory results of previous studies. Copyright © 2013 Elsevier Inc. All rights reserved.

  19. THE DYNAMICS OF MERGING CLUSTERS: A MONTE CARLO SOLUTION APPLIED TO THE BULLET AND MUSKET BALL CLUSTERS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dawson, William A., E-mail: wadawson@ucdavis.edu

    2013-08-01

    Merging galaxy clusters have become one of the most important probes of dark matter, providing evidence for dark matter over modified gravity and even constraints on the dark matter self-interaction cross-section. To properly constrain the dark matter cross-section, it is necessary to understand the dynamics of the merger, as the inferred cross-section is a function of both the velocity of the collision and the observed time since collision. While the best understanding of merging system dynamics comes from N-body simulations, these are computationally intensive and often explore only a limited volume of the merger phase space allowed by observed parameter uncertainty. Simple analytic models exist, but the assumptions of these methods invalidate their results near the collision time, and error propagation of the highly correlated merger parameters is infeasible. To address these weaknesses, I develop a Monte Carlo method to discern the properties of dissociative mergers and propagate the uncertainty of the measured cluster parameters in an accurate and Bayesian manner. I introduce this method, verify it against an existing hydrodynamic N-body simulation, and apply it to two known dissociative mergers: 1ES 0657-558 (Bullet Cluster) and DLSCL J0916.2+2951 (Musket Ball Cluster). I find that this method surpasses existing analytic models, providing accurate (10% level) dynamic parameter and uncertainty estimates throughout the merger history. This, coupled with minimal required a priori information (subcluster mass, redshift, and projected separation) and relatively fast computation (~6 CPU hours), makes this method ideal for large samples of dissociative merging clusters.

  20. The globular cluster systems of 54 Coma ultra-diffuse galaxies: statistical constraints from HST data

    NASA Astrophysics Data System (ADS)

    Amorisco, N. C.; Monachesi, A.; Agnello, A.; White, S. D. M.

    2018-04-01

    We use data from the HST Coma Cluster Treasury program to assess the richness of the globular cluster systems (GCSs) of 54 Coma ultra-diffuse galaxies (UDGs), 18 of which have a half-light radius exceeding 1.5 kpc. We use a hierarchical Bayesian method tested on a large number of mock data sets to account consistently for the high and spatially varying background counts in Coma. These include both background galaxies and intra-cluster globular clusters (ICGCs), which are disentangled from the population of member globular clusters (GCs) in a probabilistic fashion. We find no candidate for a GCS as rich as that of the Milky Way; our sample has GCSs typical of dwarf galaxies. For the standard relation between GCS richness and halo mass, 33 galaxies have a virial mass Mvir ≤ 10^11 M⊙ at 90 per cent probability. Only three have Mvir > 10^11 M⊙ with the same confidence. The mean colour and spread in colour of the UDG GCs are indistinguishable from those of the abundant population of ICGCs. The majority of UDGs in our sample are consistent with the relation between stellar mass and GC richness of 'normal' dwarf galaxies. Nine systems, however, display GCSs that are richer by a factor of 3 or more (at 90 per cent probability). Six of these have sizes ≲1.4 kpc. Our results imply that the physical mechanisms responsible for the extended size of the UDGs and for the enhanced GC richness of some cluster dwarfs are at most weakly correlated.
