Sample records for inferred large population

  1. Efficient inference of population size histories and locus-specific mutation rates from large-sample genomic variation data.

    PubMed

    Bhaskar, Anand; Wang, Y X Rachel; Song, Yun S

    2015-02-01

    With the recent increase in study sample sizes in human genetics, there has been growing interest in inferring historical population demography from genomic variation data. Here, we present an efficient inference method that can scale up to very large samples, with tens or hundreds of thousands of individuals. Specifically, by utilizing analytic results on the expected frequency spectrum under the coalescent and by leveraging the technique of automatic differentiation, which allows us to compute gradients exactly, we develop a very efficient algorithm to infer piecewise-exponential models of the historical effective population size from the distribution of sample allele frequencies. Our method is orders of magnitude faster than previous demographic inference methods based on the frequency spectrum. In addition to inferring demography, our method can also accurately estimate locus-specific mutation rates. We perform extensive validation of our method on simulated data and show that it can accurately infer multiple recent epochs of rapid exponential growth, a signal that is difficult to pick up with small sample sizes. Lastly, we use our method to analyze data from recent sequencing studies, including a large-sample exome-sequencing data set of tens of thousands of individuals assayed at a few hundred genic regions. © 2015 Bhaskar et al.; Published by Cold Spring Harbor Laboratory Press.
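
    To make the "expected frequency spectrum under the coalescent" concrete, the sketch below covers only the simplest textbook case, a constant-size neutral population, where the expected number of sites with derived-allele count i in a sample of n chromosomes is theta / i. This is not the piecewise-exponential model of the paper; the function names and the scaled mutation rate `theta` are illustrative assumptions.

```python
def expected_sfs_constant_size(n, theta=1.0):
    """Expected unfolded site frequency spectrum for n sampled chromosomes.

    Assumes the standard neutral coalescent with constant population size,
    for which E[xi_i] = theta / i, i = 1..n-1 (illustrative special case).
    """
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    return [theta / i for i in range(1, n)]


def normalized(sfs):
    """Rescale a spectrum so its entries sum to one."""
    total = sum(sfs)
    return [x / total for x in sfs]


if __name__ == "__main__":
    # Singletons are expected to be the most common class under this model.
    print(normalized(expected_sfs_constant_size(10)))
```

    Under population growth the observed spectrum tends to carry more rare variants than this baseline, which is the kind of departure a frequency-spectrum method can exploit.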

  2. Multi-Agent Inference in Social Networks: A Finite Population Learning Approach.

    PubMed

    Fan, Jianqing; Tong, Xin; Zeng, Yao

    When people in a society want to make inference about some parameter, each person may want to use data collected by other people. Information (data) exchange in social networks is usually costly, so to make reliable statistical decisions, people need to trade off the benefits and costs of information acquisition. Conflicts of interests and coordination problems will arise in the process. Classical statistics does not consider people's incentives and interactions in the data collection process. To address this imperfection, this work explores multi-agent Bayesian inference problems with a game theoretic social network model. Motivated by our interest in aggregate inference at the societal level, we propose a new concept, finite population learning, to address whether with high probability, a large fraction of people in a given finite population network can make "good" inference. Serving as a foundation, this concept enables us to study the long run trend of aggregate inference quality as population grows.

  3. Multi-Agent Inference in Social Networks: A Finite Population Learning Approach

    PubMed Central

    Tong, Xin; Zeng, Yao

    2016-01-01

    When people in a society want to make inference about some parameter, each person may want to use data collected by other people. Information (data) exchange in social networks is usually costly, so to make reliable statistical decisions, people need to trade off the benefits and costs of information acquisition. Conflicts of interests and coordination problems will arise in the process. Classical statistics does not consider people’s incentives and interactions in the data collection process. To address this imperfection, this work explores multi-agent Bayesian inference problems with a game theoretic social network model. Motivated by our interest in aggregate inference at the societal level, we propose a new concept, finite population learning, to address whether with high probability, a large fraction of people in a given finite population network can make “good” inference. Serving as a foundation, this concept enables us to study the long run trend of aggregate inference quality as population grows. PMID:27076691

  4. Minimal-assumption inference from population-genomic data

    NASA Astrophysics Data System (ADS)

    Weissman, Daniel; Hallatschek, Oskar

    Samples of multiple complete genome sequences contain vast amounts of information about the evolutionary history of populations, much of it in the associations among polymorphisms at different loci. Current methods that take advantage of this linkage information rely on models of recombination and coalescence, limiting the sample sizes and populations that they can analyze. We introduce a method, Minimal-Assumption Genomic Inference of Coalescence (MAGIC), that reconstructs key features of the evolutionary history, including the distribution of coalescence times, by integrating information across genomic length scales without using an explicit model of recombination, demography or selection. Using simulated data, we show that MAGIC's performance is comparable to PSMC' on single diploid samples generated with standard coalescent and recombination models. More importantly, MAGIC can also analyze arbitrarily large samples and is robust to changes in the coalescent and recombination processes. Using MAGIC, we show that the inferred coalescence time histories of samples of multiple human genomes exhibit inconsistencies with a description in terms of an effective population size based on single-genome data.

  5. Plug-and-play inference for disease dynamics: measles in large and small populations as a case study

    PubMed Central

    He, Daihai; Ionides, Edward L.; King, Aaron A.

    2010-01-01

    Statistical inference for mechanistic models of partially observed dynamic systems is an active area of research. Most existing inference methods place substantial restrictions upon the form of models that can be fitted and hence upon the nature of the scientific hypotheses that can be entertained and the data that can be used to evaluate them. In contrast, the so-called plug-and-play methods require only simulations from a model and are thus free of such restrictions. We show the utility of the plug-and-play approach in the context of an investigation of measles transmission dynamics. Our novel methodology enables us to ask and answer questions that previous analyses have been unable to address. Specifically, we demonstrate that plug-and-play methods permit the development of a modelling and inference framework applicable to data from both large and small populations. We thereby obtain novel insights into the nature of heterogeneity in mixing and comment on the importance of including extra-demographic stochasticity as a means of dealing with environmental stochasticity and model misspecification. Our approach is readily applicable to many other epidemiological and ecological systems. PMID:19535416

  6. Deep Learning for Population Genetic Inference.

    PubMed

    Sheehan, Sara; Song, Yun S

    2016-03-01

    Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme.

  7. Deep Learning for Population Genetic Inference

    PubMed Central

    Sheehan, Sara; Song, Yun S.

    2016-01-01

    Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme. PMID:27018908

  8. Introgression Makes Waves in Inferred Histories of Effective Population Size.

    PubMed

    Hawks, John

    2017-01-01

    Human populations have a complex history of introgression and of changing population size. Human genetic variation has been affected by both these processes, so inference of past population size depends upon the pattern of gene flow and introgression among past populations. One remarkable aspect of human population history as inferred from genetics is a consistent "wave" of larger effective population sizes, found in both African and non-African populations, that appears to reflect events prior to the last 100,000 years. I carried out a series of simulations to investigate how introgression and gene flow from genetically divergent ancestral populations affect the inference of ancestral effective population size. Both introgression and gene flow from an extinct, genetically divergent population consistently produce a wave in the history of inferred effective population size. The time and amplitude of the wave reflect the time of origin of the genetically divergent ancestral populations and the strength of introgression or gene flow. These results demonstrate that even small fractions of introgression or gene flow from ancient populations may have visible effects on the inference of effective population size.

  9. Testing AGN unification via inference from large catalogs

    NASA Astrophysics Data System (ADS)

    Nikutta, Robert; Ivezic, Zeljko; Elitzur, Moshe; Nenkova, Maia

    2018-01-01

    Source orientation and clumpiness of the central dust are the main factors in AGN classification. Type-1 QSOs are easy to observe and large samples are available (e.g. in SDSS), but obscured type-2 AGN are dimmer and redder as our line of sight is more obscured, making it difficult to obtain a complete sample. WISE has found up to a million QSOs. With only 4 bands and a relatively small aperture the analysis of individual sources is challenging, but the large sample allows inference of bulk properties at a very significant level. CLUMPY (www.clumpy.org) is arguably the most popular database of AGN torus SEDs. We model the ensemble properties of the entire WISE AGN content using regularized linear regression, with orientation-dependent CLUMPY color-color-magnitude (CCM) tracks as basis functions. We can reproduce the observed number counts per CCM bin with percent-level accuracy, and simultaneously infer the probability distributions of all torus parameters, redshifts, additional SED components, and identify type-1/2 AGN populations through their IR properties alone. We increase the statistical power of our AGN unification tests even further, by adding other datasets as axes in the regression problem. To this end, we make use of the NOAO Data Lab (datalab.noao.edu), which hosts several high-level large datasets and provides very powerful tools for handling large data, e.g. cross-matched catalogs, fast remote queries, etc.

  10. Intercoalescence time distribution of incomplete gene genealogies in temporally varying populations, and applications in population genetic inference.

    PubMed

    Chen, Hua

    2013-03-01

    Tracing back to a specific time T in the past, the genealogy of a sample of haplotypes may not have reached their common ancestor and may leave m lineages extant. For such an incomplete genealogy truncated at a specific time T in the past, the distribution and expectation of the intercoalescence times conditional on T are derived in an exact form in this paper for populations of deterministically time-varying sizes, specifically, for populations growing exponentially. The derived intercoalescence time distribution can be integrated to the coalescent-based joint allele frequency spectrum (JAFS) theory, and is useful for population genetic inference from large-scale genomic data, without relying on computationally intensive approaches, such as importance sampling and Markov Chain Monte Carlo (MCMC) methods. The inference of several important parameters relying on this derived conditional distribution is demonstrated: quantifying population growth rate and onset time, and estimating the number of ancestral lineages at a specific ancient time. Simulation studies confirm validity of the derivation and statistical efficiency of the methods using the derived intercoalescence time distribution. Two examples of real data are given to show the inference of the population growth rate of a European sample from the NIEHS Environmental Genome Project, and the number of ancient lineages of 31 mitochondrial genomes from Tibetan populations. © 2013 Blackwell Publishing Ltd/University College London.

  11. The aggregate site frequency spectrum for comparative population genomic inference.

    PubMed

    Xue, Alexander T; Hickerson, Michael J

    2015-12-01

    Understanding how assemblages of species responded to past climate change is a central goal of comparative phylogeography and comparative population genomics, an endeavour that has increasing potential to integrate with community ecology. New sequencing technology now provides the potential to perform complex demographic inference at unprecedented resolution across assemblages of nonmodel species. To this end, we introduce the aggregate site frequency spectrum (aSFS), an expansion of the site frequency spectrum to use single nucleotide polymorphism (SNP) data sets collected from multiple, co-distributed species for assemblage-level demographic inference. We describe how the aSFS is constructed over an arbitrary number of independent population samples and then demonstrate how the aSFS can differentiate various multispecies demographic histories under a wide range of sampling configurations while allowing effective population sizes and expansion magnitudes to vary independently. We subsequently couple the aSFS with a hierarchical approximate Bayesian computation (hABC) framework to estimate degree of temporal synchronicity in expansion times across taxa, including an empirical demonstration with a data set consisting of five populations of the threespine stickleback (Gasterosteus aculeatus). Corroborating what is generally understood about the recent postglacial origins of these populations, the joint aSFS/hABC analysis strongly suggests that the stickleback data are most consistent with synchronous expansion after the Last Glacial Maximum (posterior probability = 0.99). The aSFS will have general application for multilevel statistical frameworks to test models involving assemblages and/or communities, and as large-scale SNP data from nonmodel species become routine, the aSFS expands the potential for powerful next-generation comparative population genomic inference. © 2015 The Authors. Molecular Ecology Published by John Wiley & Sons Ltd.

  12. Inferring Population Size History from Large Samples of Genome-Wide Molecular Data - An Approximate Bayesian Computation Approach

    PubMed Central

    Boitard, Simon; Rodríguez, Willy; Jay, Flora; Mona, Stefano; Austerlitz, Frédéric

    2016-01-01

    Inferring the ancestral dynamics of effective population size is a long-standing question in population genetics, which can now be tackled much more accurately thanks to the massive genomic data available in many species. Several promising methods that take advantage of whole-genome sequences have been recently developed in this context. However, they can only be applied to rather small samples, which limits their ability to estimate recent population size history. Besides, they can be very sensitive to sequencing or phasing errors. Here we introduce a new approximate Bayesian computation approach named PopSizeABC that allows estimating the evolution of the effective population size through time, using a large sample of complete genomes. This sample is summarized using the folded allele frequency spectrum and the average zygotic linkage disequilibrium at different bins of physical distance, two classes of statistics that are widely used in population genetics and can be easily computed from unphased and unpolarized SNP data. Our approach provides accurate estimations of past population sizes, from the very first generations before present back to the expected time to the most recent common ancestor of the sample, as shown by simulations under a wide range of demographic scenarios. When applied to samples of 15 or 25 complete genomes in four cattle breeds (Angus, Fleckvieh, Holstein and Jersey), PopSizeABC revealed a series of population declines, related to historical events such as domestication or modern breed creation. We further highlight that our approach is robust to sequencing errors, provided summary statistics are computed from SNPs with common alleles. PMID:26943927
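
    The folded allele frequency spectrum mentioned above can be computed from unphased, unpolarized genotypes because it only needs the minor-allele count at each SNP. The sketch below is a minimal illustration under assumed conventions (diploid genotypes coded 0/1/2, no missing data); `folded_sfs` is a hypothetical helper, not part of PopSizeABC.

```python
def folded_sfs(genotypes):
    """Folded allele frequency spectrum from unphased diploid genotypes.

    genotypes: list of SNPs, each a list with one entry per individual,
    coded as the number of copies (0, 1 or 2) of an arbitrary allele.
    Returns a list where entry k is the number of SNPs whose minor allele
    appears k times in the sample (k = 0 means monomorphic).
    """
    if not genotypes:
        return []
    n_chrom = 2 * len(genotypes[0])
    sfs = [0] * (n_chrom // 2 + 1)
    for snp in genotypes:
        if 2 * len(snp) != n_chrom:
            raise ValueError("all SNPs must cover the same individuals")
        count = sum(snp)
        # Folding: which allele is counted does not matter.
        minor = min(count, n_chrom - count)
        sfs[minor] += 1
    return sfs


if __name__ == "__main__":
    example = [[0, 1, 0], [2, 2, 1], [1, 1, 1], [0, 0, 0]]
    print(folded_sfs(example))
```

    Taking the minimum of the two allele counts is what removes the need to know the ancestral state.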

  13. Inferring the temperature dependence of population parameters: the effects of experimental design and inference algorithm

    PubMed Central

    Palamara, Gian Marco; Childs, Dylan Z; Clements, Christopher F; Petchey, Owen L; Plebani, Marco; Smith, Matthew J

    2014-01-01

    Understanding and quantifying the temperature dependence of population parameters, such as intrinsic growth rate and carrying capacity, is critical for predicting the ecological responses to environmental change. Many studies provide empirical estimates of such temperature dependencies, but a thorough investigation of the methods used to infer them has not been performed yet. We created artificial population time series using a stochastic logistic model parameterized with the Arrhenius equation, so that activation energy drives the temperature dependence of population parameters. We simulated different experimental designs and used different inference methods, varying the likelihood functions and other aspects of the parameter estimation methods. Finally, we applied the best performing inference methods to real data for the species Paramecium caudatum. The relative error of the estimates of activation energy varied between 5% and 30%. The fraction of habitat sampled played the most important role in determining the relative error; sampling at least 1% of the habitat kept it below 50%. We found that methods that simultaneously use all time series data (direct methods) and methods that estimate population parameters separately for each temperature (indirect methods) are complementary. Indirect methods provide a clearer insight into the shape of the functional form describing the temperature dependence of population parameters; direct methods enable a more accurate estimation of the parameters of such functional forms. Using both methods, we found that growth rate and carrying capacity of Paramecium caudatum scale with temperature according to different activation energies. Our study shows how careful choice of experimental design and inference methods can increase the accuracy of the inferred relationships between temperature and population parameters. The comparison of estimation methods provided here can increase the accuracy of model predictions, with important

  14. A fast least-squares algorithm for population inference

    PubMed Central

    2013-01-01

    Background: Population inference is an important problem in genetics used to remove population stratification in genome-wide association studies and to detect migration patterns or shared ancestry. An individual’s genotype can be modeled as a probabilistic function of ancestral population memberships, Q, and the allele frequencies in those populations, P. The parameters, P and Q, of this binomial likelihood model can be inferred using slow sampling methods such as Markov Chain Monte Carlo methods or faster gradient based approaches such as sequential quadratic programming. This paper proposes a least-squares simplification of the binomial likelihood model motivated by a Euclidean interpretation of the genotype feature space. This results in a faster algorithm that easily incorporates the degree of admixture within the sample of individuals and improves estimates without requiring trial-and-error tuning. Results: We show that the expected value of the least-squares solution across all possible genotype datasets is equal to the true solution when part of the problem has been solved, and that the variance of the solution approaches zero as its size increases. The Least-squares algorithm performs nearly as well as Admixture for these theoretical scenarios. We compare least-squares, Admixture, and FRAPPE for a variety of problem sizes and difficulties. For particularly hard problems with a large number of populations, small number of samples, or greater degree of admixture, least-squares performs better than the other methods. On simulated mixtures of real population allele frequencies from the HapMap project, Admixture estimates sparsely mixed individuals better than Least-squares. The least-squares approach, however, performs within 1.5% of the Admixture error. On individual genotypes from the HapMap project, Admixture and least-squares perform qualitatively similarly and within 1.2% of each other. Significantly, the least-squares approach nearly always converges 1

  15. A fast least-squares algorithm for population inference.

    PubMed

    Parry, R Mitchell; Wang, May D

    2013-01-23

    Population inference is an important problem in genetics used to remove population stratification in genome-wide association studies and to detect migration patterns or shared ancestry. An individual's genotype can be modeled as a probabilistic function of ancestral population memberships, Q, and the allele frequencies in those populations, P. The parameters, P and Q, of this binomial likelihood model can be inferred using slow sampling methods such as Markov Chain Monte Carlo methods or faster gradient based approaches such as sequential quadratic programming. This paper proposes a least-squares simplification of the binomial likelihood model motivated by a Euclidean interpretation of the genotype feature space. This results in a faster algorithm that easily incorporates the degree of admixture within the sample of individuals and improves estimates without requiring trial-and-error tuning. We show that the expected value of the least-squares solution across all possible genotype datasets is equal to the true solution when part of the problem has been solved, and that the variance of the solution approaches zero as its size increases. The Least-squares algorithm performs nearly as well as Admixture for these theoretical scenarios. We compare least-squares, Admixture, and FRAPPE for a variety of problem sizes and difficulties. For particularly hard problems with a large number of populations, small number of samples, or greater degree of admixture, least-squares performs better than the other methods. On simulated mixtures of real population allele frequencies from the HapMap project, Admixture estimates sparsely mixed individuals better than Least-squares. The least-squares approach, however, performs within 1.5% of the Admixture error. On individual genotypes from the HapMap project, Admixture and least-squares perform qualitatively similarly and within 1.2% of each other. Significantly, the least-squares approach nearly always converges 1.5- to 6-times faster

  16. Rare variation facilitates inferences of fine-scale population structure in humans.

    PubMed

    O'Connor, Timothy D; Fu, Wenqing; Mychaleckyj, Josyf C; Logsdon, Benjamin; Auer, Paul; Carlson, Christopher S; Leal, Suzanne M; Smith, Joshua D; Rieder, Mark J; Bamshad, Michael J; Nickerson, Deborah A; Akey, Joshua M

    2015-03-01

    Understanding the genetic structure of human populations has important implications for the design and interpretation of disease mapping studies and reconstructing human evolutionary history. To date, inferences of human population structure have primarily been made with common variants. However, recent large-scale resequencing studies have shown an abundance of rare variation in humans, which may be particularly useful for making inferences of fine-scale population structure. To this end, we used an information theory framework and extensive coalescent simulations to rigorously quantify the informativeness of rare and common variation to detect signatures of fine-scale population structure. We show that rare variation affords unique insights into patterns of recent population structure. Furthermore, to empirically assess our theoretical findings, we analyzed high-coverage exome sequences in 6,515 European and African American individuals. As predicted, rare variants are more informative than common polymorphisms in revealing a distinct cluster of European-American individuals, and subsequent analyses demonstrate that these individuals are likely of Ashkenazi Jewish ancestry. Our results provide new insights into the population structure using rare variation, which will be an important factor to account for in rare variant association studies. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  17. Fast and accurate inference of local ancestry in Latino populations

    PubMed Central

    Baran, Yael; Pasaniuc, Bogdan; Sankararaman, Sriram; Torgerson, Dara G.; Gignoux, Christopher; Eng, Celeste; Rodriguez-Cintron, William; Chapela, Rocio; Ford, Jean G.; Avila, Pedro C.; Rodriguez-Santana, Jose; Burchard, Esteban Gonzàlez; Halperin, Eran

    2012-01-01

    Motivation: It is becoming increasingly evident that the analysis of genotype data from recently admixed populations is providing important insights into medical genetics and population history. Such analyses have been used to identify novel disease loci, to understand recombination rate variation and to detect recent selection events. The utility of such studies crucially depends on accurate and unbiased estimation of the ancestry at every genomic locus in recently admixed populations. Although various methods have been proposed and shown to be extremely accurate in two-way admixtures (e.g. African Americans), only a few approaches have been proposed and thoroughly benchmarked on multi-way admixtures (e.g. Latino populations of the Americas). Results: To address these challenges we introduce here methods for local ancestry inference which leverage the structure of linkage disequilibrium in the ancestral population (LAMP-LD), and incorporate the constraint of Mendelian segregation when inferring local ancestry in nuclear family trios (LAMP-HAP). Our algorithms uniquely combine hidden Markov models (HMMs) of haplotype diversity within a novel window-based framework to achieve superior accuracy as compared with published methods. Further, unlike previous methods, the structure of our HMM does not depend on the number of reference haplotypes but on a fixed constant, and it is thereby capable of utilizing large datasets while remaining highly efficient and robust to over-fitting. Through simulations and analysis of real data from 489 nuclear trio families from the mainland US, Puerto Rico and Mexico, we demonstrate that our methods achieve superior accuracy compared with published methods for local ancestry inference in Latinos. Availability: http://lamp.icsi.berkeley.edu/lamp/lampld/ Contact: bpasaniu@hsph.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22495753

  18. Population genetics inference for longitudinally-sampled mutants under strong selection.

    PubMed

    Lacerda, Miguel; Seoighe, Cathal

    2014-11-01

    Longitudinal allele frequency data are becoming increasingly prevalent. Such samples permit statistical inference of the population genetics parameters that influence the fate of mutant variants. To infer these parameters by maximum likelihood, the mutant frequency is often assumed to evolve according to the Wright-Fisher model. For computational reasons, this discrete model is commonly approximated by a diffusion process that requires the assumption that the forces of natural selection and mutation are weak. This assumption is not always appropriate. For example, mutations that impart drug resistance in pathogens may evolve under strong selective pressure. Here, we present an alternative approximation to the mutant-frequency distribution that does not make any assumptions about the magnitude of selection or mutation and is much more computationally efficient than the standard diffusion approximation. Simulation studies are used to compare the performance of our method to that of the Wright-Fisher and Gaussian diffusion approximations. For large populations, our method is found to provide a much better approximation to the mutant-frequency distribution when selection is strong, while all three methods perform comparably when selection is weak. Importantly, maximum-likelihood estimates of the selection coefficient are severely attenuated when selection is strong under the two diffusion models, but not when our method is used. This is further demonstrated with an application to mutant-frequency data from an experimental study of bacteriophage evolution. We therefore recommend our method for estimating the selection coefficient when the effective population size is too large to utilize the discrete Wright-Fisher model. Copyright © 2014 by the Genetics Society of America.
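
    For readers unfamiliar with the discrete Wright-Fisher model that this abstract uses as its baseline, the sketch below simulates a mutant-frequency trajectory under selection and drift. It is not the authors' approximation. It assumes a haploid population of fixed size, mutant relative fitness 1 + s, and no mutation; the function name and parameters are hypothetical.

```python
import random


def wright_fisher_trajectory(pop_size, s, p0, generations, seed=0):
    """Simulate mutant frequency under a haploid Wright-Fisher model.

    Assumptions (illustrative only): constant pop_size, mutant relative
    fitness 1 + s with s > -1, no mutation, non-overlapping generations.
    Returns the list of frequencies, starting with p0.
    """
    rng = random.Random(seed)
    freq = p0
    trajectory = [freq]
    for _ in range(generations):
        # Selection: expected mutant frequency after fitness weighting.
        p_sel = freq * (1.0 + s) / (1.0 + s * freq)
        # Drift: each of pop_size offspring is a mutant with probability p_sel.
        mutants = sum(1 for _ in range(pop_size) if rng.random() < p_sel)
        freq = mutants / pop_size
        trajectory.append(freq)
    return trajectory


if __name__ == "__main__":
    print(wright_fisher_trajectory(pop_size=200, s=0.5, p0=0.05, generations=30))
```

    The per-generation cost of this direct simulation grows with the population size, which gives some intuition for why approximations are attractive when the effective population size is large.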

  19. Inference of population splits and mixtures from genome-wide allele frequency data.

    PubMed

    Pickrell, Joseph K; Pritchard, Jonathan K

    2012-01-01

    Many aspects of the historical relationships between populations in a species are reflected in genetic data. Inferring these relationships from genetic data, however, remains a challenging task. In this paper, we present a statistical model for inferring the patterns of population splits and mixtures in multiple populations. In our model, the sampled populations in a species are related to their common ancestor through a graph of ancestral populations. Using genome-wide allele frequency data and a Gaussian approximation to genetic drift, we infer the structure of this graph. We applied this method to a set of 55 human populations and a set of 82 dog breeds and wild canids. In both species, we show that a simple bifurcating tree does not fully describe the data; in contrast, we infer many migration events. While some of the migration events that we find have been detected previously, many have not. For example, in the human data, we infer that Cambodians trace approximately 16% of their ancestry to a population ancestral to other extant East Asian populations. In the dog data, we infer that both the boxer and basenji trace a considerable fraction of their ancestry (9% and 25%, respectively) to wolves subsequent to domestication and that East Asian toy breeds (the Shih Tzu and the Pekingese) result from admixture between modern toy breeds and "ancient" Asian breeds. Software implementing the model described here, called TreeMix, is available at http://treemix.googlecode.com.

  20. Inferring infection hazard in wildlife populations by linking data across individual and population scales.

    PubMed

    Pepin, Kim M; Kay, Shannon L; Golas, Ben D; Shriner, Susan A; Gilbert, Amy T; Miller, Ryan S; Graham, Andrea L; Riley, Steven; Cross, Paul C; Samuel, Michael D; Hooten, Mevin B; Hoeting, Jennifer A; Lloyd-Smith, James O; Webb, Colleen T; Buhnerkempe, Michael G

    2017-03-01

    Our ability to infer unobservable disease-dynamic processes such as force of infection (infection hazard for susceptible hosts) has transformed our understanding of disease transmission mechanisms and capacity to predict disease dynamics. Conventional methods for inferring FOI estimate a time-averaged value and are based on population-level processes. Because many pathogens exhibit epidemic cycling and FOI is the result of processes acting across the scales of individuals and populations, a flexible framework that extends to epidemic dynamics and links within-host processes to FOI is needed. Specifically, within-host antibody kinetics in wildlife hosts can be short-lived and produce patterns that are repeatable across individuals, suggesting individual-level antibody concentrations could be used to infer time since infection and hence FOI. Using simulations and case studies (influenza A in lesser snow geese and Yersinia pestis in coyotes), we argue that with careful experimental and surveillance design, the population-level FOI signal can be recovered from individual-level antibody kinetics, despite substantial individual-level variation. In addition to improving inference, the cross-scale quantitative antibody approach we describe can reveal insights into drivers of individual-based variation in disease response, and the role of poorly understood processes such as secondary infections, in population-level dynamics of disease. © 2017 John Wiley & Sons Ltd/CNRS.

  1. Inferring infection hazard in wildlife populations by linking data across individual and population scales

    USGS Publications Warehouse

    Pepin, Kim M.; Kay, Shannon L.; Golas, Ben D.; Shriner, Susan A.; Gilbert, Amy T.; Miller, Ryan S.; Graham, Andrea L.; Riley, Steven; Cross, Paul C.; Samuel, Michael D.; Hooten, Mevin B.; Hoeting, Jennifer A.; Lloyd-Smith, James O.; Webb, Colleen T.; Buhnerkempe, Michael G.

    2017-01-01

    Our ability to infer unobservable disease-dynamic processes such as force of infection (infection hazard for susceptible hosts) has transformed our understanding of disease transmission mechanisms and capacity to predict disease dynamics. Conventional methods for inferring FOI estimate a time-averaged value and are based on population-level processes. Because many pathogens exhibit epidemic cycling and FOI is the result of processes acting across the scales of individuals and populations, a flexible framework that extends to epidemic dynamics and links within-host processes to FOI is needed. Specifically, within-host antibody kinetics in wildlife hosts can be short-lived and produce patterns that are repeatable across individuals, suggesting individual-level antibody concentrations could be used to infer time since infection and hence FOI. Using simulations and case studies (influenza A in lesser snow geese and Yersinia pestis in coyotes), we argue that with careful experimental and surveillance design, the population-level FOI signal can be recovered from individual-level antibody kinetics, despite substantial individual-level variation. In addition to improving inference, the cross-scale quantitative antibody approach we describe can reveal insights into drivers of individual-based variation in disease response, and the role of poorly understood processes such as secondary infections, in population-level dynamics of disease.

  2. INFERENCE FOR INDIVIDUAL-LEVEL MODELS OF INFECTIOUS DISEASES IN LARGE POPULATIONS.

    PubMed

    Deardon, Rob; Brooks, Stephen P; Grenfell, Bryan T; Keeling, Matthew J; Tildesley, Michael J; Savill, Nicholas J; Shaw, Darren J; Woolhouse, Mark E J

    2010-01-01

    Individual Level Models (ILMs), a new class of models, are being applied to infectious epidemic data to aid in the understanding of the spatio-temporal dynamics of infectious diseases. These models are highly flexible and intuitive, and can be parameterised under a Bayesian framework via Markov chain Monte Carlo (MCMC) methods. Unfortunately, this parameterisation can be difficult to implement due to intense computational requirements when calculating the full posterior for large, or even moderately large, susceptible populations, or when missing data are present. Here we detail a methodology that can be used to estimate parameters for such large, and/or incomplete, data sets. This is done in the context of a study of the UK 2001 foot-and-mouth disease (FMD) epidemic.

  3. Hierarchical animal movement models for population-level inference

    USGS Publications Warehouse

    Hooten, Mevin B.; Buderman, Frances E.; Brost, Brian M.; Hanks, Ephraim M.; Ivans, Jacob S.

    2016-01-01

    New methods for modeling animal movement based on telemetry data are developed regularly. With advances in telemetry capabilities, animal movement models are becoming increasingly sophisticated. Despite a need for population-level inference, animal movement models are still predominantly developed for individual-level inference. Most efforts to upscale the inference to the population level are either post hoc or complicated enough that only the developer can implement the model. Hierarchical Bayesian models provide an ideal platform for the development of population-level animal movement models but can be challenging to fit due to computational limitations or extensive tuning required. We propose a two-stage procedure for fitting hierarchical animal movement models to telemetry data. The two-stage approach is statistically rigorous and allows one to fit individual-level movement models separately, then resample them using a secondary MCMC algorithm. The primary advantages of the two-stage approach are that the first stage is easily parallelizable and the second stage is completely unsupervised, allowing for an automated fitting procedure in many cases. We demonstrate the two-stage procedure with two applications of animal movement models. The first application involves a spatial point process approach to modeling telemetry data, and the second involves a more complicated continuous-time discrete-space animal movement model. We fit these models to simulated data and real telemetry data arising from a population of monitored Canada lynx in Colorado, USA.

  4. Using DNA metabarcoding for simultaneous inference of common vampire bat diet and population structure.

    PubMed

    Bohmann, Kristine; Gopalakrishnan, Shyam; Nielsen, Martin; Nielsen, Luisa Dos Santos Bay; Jones, Gareth; Streicker, Daniel G; Gilbert, M Thomas P

    2018-04-19

    Metabarcoding diet analysis has become a valuable tool in animal ecology; however, co-amplified predator sequences are not generally used for anything other than to validate predator identity. Exemplified by the common vampire bat, we demonstrate the use of metabarcoding to infer predator population structure alongside diet assessments. Growing populations of common vampire bats impact human, livestock and wildlife health in Latin America through transmission of pathogens, such as lethal rabies viruses. Techniques to determine large-scale variation in vampire bat diet and bat population structure would empower locality- and species-specific projections of disease transmission risks. However, previously used methods are not cost-effective and efficient for large-scale applications. Using bloodmeal and faecal samples from common vampire bats from coastal, Andean and Amazonian regions of Peru, we showcase metabarcoding as a scalable tool to assess vampire bat population structure and feeding preferences. Dietary metabarcoding was highly effective, detecting vertebrate prey in 93.2% of the samples. Bats predominantly preyed on domestic animals, but fed on tapirs at one Amazonian site. In addition, we identified arthropods in 9.3% of samples, likely reflecting consumption of ectoparasites. Using the same data, we document mitochondrial geographic population structure in the common vampire bat in Peru. Such simultaneous inference of vampire bat diet and population structure can enable new insights into the interplay between vampire bat ecology and disease transmission risks. Importantly, the methodology can be incorporated into metabarcoding diet studies of other animals to couple information on diet and population structure. © 2018 The Authors. Molecular Ecology Resources Published by John Wiley & Sons Ltd.

  5. Algorithm of OMA for large-scale orthology inference

    PubMed Central

    Roth, Alexander CJ; Gonnet, Gaston H; Dessimoz, Christophe

    2008-01-01

    Background: OMA is a project that aims to identify orthologs within publicly available, complete genomes. With 657 genomes analyzed to date, OMA is one of the largest projects of its kind. Results: The algorithm of OMA improves upon the standard bidirectional best-hit approach in several respects: it uses evolutionary distances instead of scores, considers distance inference uncertainty, includes many-to-many orthologous relations, and accounts for differential gene losses. Herein, we describe in detail the algorithm for inference of orthology and provide the rationale for parameter selection through multiple tests. Conclusion: OMA contains several novel improvement ideas for orthology inference and provides a unique dataset of large-scale orthology assignments. PMID:19055798

  6. Inference in the brain: Statistics flowing in redundant population codes

    PubMed Central

    Pitkow, Xaq; Angelaki, Dora E

    2017-01-01

    It is widely believed that the brain performs approximate probabilistic inference to estimate causal variables in the world from ambiguous sensory data. To understand these computations, we need to analyze how information is represented and transformed by the actions of nonlinear recurrent neural networks. We propose that these probabilistic computations function by a message-passing algorithm operating at the level of redundant neural populations. To explain this framework, we review its underlying concepts, including graphical models, sufficient statistics, and message-passing, and then describe how these concepts could be implemented by recurrently connected probabilistic population codes. The relevant information flow in these networks will be most interpretable at the population level, particularly for redundant neural codes. We therefore outline a general approach to identify the essential features of a neural message-passing algorithm. Finally, we argue that to reveal the most important aspects of these neural computations, we must study large-scale activity patterns during moderately complex, naturalistic behaviors. PMID:28595050

  7. Multi-InDel Analysis for Ancestry Inference of Sub-Populations in China

    PubMed Central

    Sun, Kuan; Ye, Yi; Luo, Tao; Hou, Yiping

    2016-01-01

    Ancestry inference is of great interest in diverse areas of scientific research, including forensic biology, medical genetics and anthropology. Various methods have been published for distinguishing populations. However, few reports address sub-populations (such as ethnic groups) within Asian populations, owing to the limited number of suitable markers. We treated several InDel loci located very close together in physical position as one marker, termed a multi-InDel. The multi-InDel shows potential as an Ancestry Inference Marker (AIM). In this study, we performed a genome-wide scan for multi-InDels as AIMs. After examining the FST distributions in the 1000 Genomes Database, 12 candidates were selected and validated for eastern Asian populations. A multiplexed assay was developed as a panel to genotype the 12 multi-InDel markers simultaneously. Ancestry component analysis with STRUCTURE and principal component analysis (PCA) were employed to estimate its capability for ancestry inference. Furthermore, ancestry assignments of trial individuals were conducted. The panel proved to be very effective when 210 samples from Han and Tibetan individuals in China were tested. The panel consisting of multi-InDel markers exhibited considerable potency in ancestry inference, and is suggested for application in forensic practice and genetic population studies. PMID:28004788

  8. Bayesian Parameter Inference and Model Selection by Population Annealing in Systems Biology

    PubMed Central

    Murakami, Yohei

    2014-01-01

    Parameter inference and model selection are very important for mathematical modeling in systems biology. Bayesian statistics can be used to conduct both parameter inference and model selection. In particular, the framework named approximate Bayesian computation is often used for parameter inference and model selection in systems biology. However, Monte Carlo methods need to be used to compute Bayesian posterior distributions. In addition, the posterior distributions of parameters are sometimes almost uniform or very similar to their prior distributions. In such cases, it is difficult to choose one specific value of a parameter with high credibility as the representative value of the distribution. To overcome these problems, we introduced one of the population Monte Carlo algorithms, population annealing. Although population annealing is usually used in statistical mechanics, we showed that population annealing can be used to compute Bayesian posterior distributions in the approximate Bayesian computation framework. To deal with un-identifiability of the representative values of parameters, we proposed to run the simulations with the parameter ensemble sampled from the posterior distribution, named “posterior parameter ensemble”. We showed that population annealing is an efficient and convenient algorithm to generate the posterior parameter ensemble. We also showed that the simulations with the posterior parameter ensemble can not only reproduce the data used for parameter inference, but also capture and predict data which were not used for parameter inference. Lastly, we introduced the marginal likelihood in the approximate Bayesian computation framework for Bayesian model selection. We showed that population annealing enables us to compute the marginal likelihood in the approximate Bayesian computation framework and conduct model selection depending on the Bayes factor. PMID:25089832

  9. Inferences about ungulate population dynamics derived from age ratios

    USGS Publications Warehouse

    Harris, N.C.; Kauffman, M.J.; Mills, L.S.

    2008-01-01

    Age ratios (e.g., calf:cow for elk and fawn:doe for deer) are used regularly to monitor ungulate populations. However, it remains unclear what inferences are appropriate from this index because multiple vital rate changes can influence the observed ratio. We used modeling based on elk (Cervus elaphus) life-history to evaluate both how age ratios are influenced by stage-specific fecundity and survival and how well age ratios track population dynamics. Although all vital rates have the potential to influence calf:adult female ratios (i.e., calf:cow ratios), calf survival explained the vast majority of variation in calf:adult female ratios due to its temporal variation compared to other vital rates. Calf:adult female ratios were positively correlated with population growth rate (λ) and often successfully indicated population trajectories. However, calf:adult female ratios performed poorly at detecting imposed declines in calf survival, suggesting that only the most severe declines would be rapidly detected. Our analyses clarify that managers can use accurate, unbiased age ratios to monitor arguably the most important components contributing to sustainable ungulate populations, survival rate of young and λ. However, age ratios are not useful for detecting gradual declines in survival of young or making inferences about fecundity or adult survival in ungulate populations. Therefore, age ratios coupled with independent estimates of population growth or population size are necessary to monitor ungulate population demography and dynamics closely through time.

  10. Circadian analysis of large human populations: inferences from the power grid.

    PubMed

    Stowie, Adam C; Amicarelli, Mario J; Crosier, Caitlin J; Mymko, Ryan; Glass, J David

    2015-03-01

    Few, if any, studies have focused on the daily rhythmic nature of modern industrialized populations. The present study utilized real-time load data from the U.S. Pacific Northwest electrical power grid as a reflection of human operative household activity. This approach involved actigraphic analyses of continuously streaming internet data (provided in 5 min bins) from a human subject pool of approximately 43 million primarily residential users. Rhythm analyses reveal striking seasonal and intra-week differences in human activity patterns, largely devoid of manufacturing and automated load interference. Length of the diurnal activity period (alpha) is longer during the spring than the summer (16.64 h versus 15.98 h, respectively; p < 0.01). As expected, significantly more activity occurs in the solar dark phase during the winter than during the summer (6.29 h versus 2.03 h, respectively; p < 0.01). Interestingly, throughout the year a "weekend effect" is evident, where morning activity onset occurs approximately 1 h later than during the work week (5:54 am versus 6:52 am, respectively; p < 0.01). This indicates a general phase-delaying response to the absence of job-related or other weekday morning arousal cues, substantiating a preference or need to sleep longer on weekends. Finally, a shift in onset time can be seen during the transition to Daylight Saving Time, but not the transition back to Standard Time. The use of grid power load as a means for human actimetry assessment thus offers new insights into the collective diurnal activity patterns of large human populations.

  11. Inferring Admixture Histories of Human Populations Using Linkage Disequilibrium

    PubMed Central

    Loh, Po-Ru; Lipson, Mark; Patterson, Nick; Moorjani, Priya; Pickrell, Joseph K.; Reich, David; Berger, Bonnie

    2013-01-01

    Long-range migrations and the resulting admixtures between populations have been important forces shaping human genetic diversity. Most existing methods for detecting and reconstructing historical admixture events are based on allele frequency divergences or patterns of ancestry segments in chromosomes of admixed individuals. An emerging new approach harnesses the exponential decay of admixture-induced linkage disequilibrium (LD) as a function of genetic distance. Here, we comprehensively develop LD-based inference into a versatile tool for investigating admixture. We present a new weighted LD statistic that can be used to infer mixture proportions as well as dates with fewer constraints on reference populations than previous methods. We define an LD-based three-population test for admixture and identify scenarios in which it can detect admixture events that previous formal tests cannot. We further show that we can uncover phylogenetic relationships among populations by comparing weighted LD curves obtained using a suite of references. Finally, we describe several improvements to the computation and fitting of weighted LD curves that greatly increase the robustness and speed of the calculations. We implement all of these advances in a software package, ALDER, which we validate in simulations and apply to test for admixture among all populations from the Human Genome Diversity Project (HGDP), highlighting insights into the admixture history of Central African Pygmies, Sardinians, and Japanese. PMID:23410830
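
    The exponential decay of admixture LD with genetic distance can be illustrated with a small curve fit. The sketch below is not the ALDER implementation and omits its weighting scheme and affine term. It assumes a curve of the form A * exp(-n * d), with d in Morgans and n the number of generations since admixture, and it assumes all LD values are positive so that a log-linear least-squares fit applies; the function name is hypothetical.

```python
import math


def fit_exponential_decay(distances_morgans, ld_values):
    """Fit ld = A * exp(-n * d) by ordinary least squares on log(ld).

    Returns (A, n). Requires strictly positive ld_values; a full analysis
    would use nonlinear fitting and allow a constant offset.
    """
    xs = list(distances_morgans)
    ys = [math.log(v) for v in ld_values]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The decay rate is the negative slope of the log-linear fit.
    return math.exp(intercept), -slope
```

    With distances expressed in Morgans, the fitted decay rate can be read as an estimate of the admixture date in generations, under the single-pulse assumption stated above.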

  12. Large-scale parentage inference with SNPs: an efficient algorithm for statistical confidence of parent pair allocations.

    PubMed

    Anderson, Eric C

    2012-11-08

    Advances in genotyping that allow tens of thousands of individuals to be genotyped at a moderate number of single nucleotide polymorphisms (SNPs) permit parentage inference to be pursued on a very large scale. The intergenerational tagging this capacity allows is revolutionizing the management of cultured organisms (cows, salmon, etc.) and is poised to do the same for scientific studies of natural populations. Currently, however, there are no likelihood-based methods of parentage inference which are implemented in a manner that allows them to quickly handle a very large number of potential parents or parent pairs. Here we introduce an efficient likelihood-based method applicable to the specialized case of cultured organisms in which both parents can be reliably sampled. We develop a Markov chain representation for the cumulative number of Mendelian incompatibilities between an offspring and its putative parents and we exploit it to develop a fast algorithm for simulation-based estimates of statistical confidence in SNP-based assignments of offspring to pairs of parents. The method is implemented in the freely available software SNPPIT. We describe the method in detail, then assess its performance in a large simulation study using known allele frequencies at 96 SNPs from ten hatchery salmon populations. The simulations verify that the method is fast and accurate and that 96 well-chosen SNPs can provide sufficient power to identify the correct pair of parents from amongst millions of candidate pairs.
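
    The quantity tracked by the Markov chain in this abstract, the number of Mendelian incompatibilities between an offspring and a putative parent pair, can be counted directly for a single trio. The sketch below is not SNPPIT's algorithm. It assumes biallelic SNP genotypes coded as 0, 1, or 2 copies of one allele, with no missing data and no genotyping error; the names are hypothetical.

```python
# Alleles (0 or 1) that a parent with a given genotype can transmit.
GAMETES = {0: (0,), 1: (0, 1), 2: (1,)}


def is_compatible(offspring, parent1, parent2):
    """True if the offspring genotype can arise from the two parental genotypes."""
    possible = {a + b for a in GAMETES[parent1] for b in GAMETES[parent2]}
    return offspring in possible


def count_incompatibilities(offspring, parent1, parent2):
    """Count loci at which the trio violates Mendelian inheritance."""
    return sum(
        not is_compatible(o, p1, p2)
        for o, p1, p2 in zip(offspring, parent1, parent2)
    )
```

    In practice genotyping error means that true parent pairs can show a small number of incompatibilities, which is why the abstract describes simulation-based estimates of statistical confidence rather than a hard exclusion rule.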

  13. Comparing population structure as inferred from genealogical versus genetic information.

    PubMed

    Colonna, Vincenza; Nutile, Teresa; Ferrucci, Ronald R; Fardella, Giulio; Aversano, Mario; Barbujani, Guido; Ciullo, Marina

    2009-12-01

    Algorithms for inferring population structure from genetic data (ie, population assignment methods) have been shown to effectively recognize genetic clusters in human populations. However, their performance in identifying groups of genealogically related individuals, especially in scantily differentiated populations, has not been tested empirically thus far. For this study, we had access to both genealogical and genetic data from two closely related, isolated villages in southern Italy. We found that nearly all living individuals were included in a single pedigree, with multiple inbreeding loops. Despite F(st) between villages being a low 0.008, genetic clustering analysis identified two clusters roughly corresponding to the two villages. Average kinship between individuals (estimated from genealogies) increased at increasing values of group membership (estimated from the genetic data), showing that the observed genetic clusters represent individuals who are more closely related to each other than to random members of the population. Further, average kinship within clusters and F(st) between clusters increase with increasingly stringent membership threshold requirements. We conclude that a limited number of genetic markers is sufficient to detect structuring, and that the results of genetic analyses faithfully mirror the structuring inferred from detailed analyses of population genealogies, even when F(st) values are low, as in the case of the two villages. We then estimate the impact of observed levels of population structure on association studies using simulated data.

  14. Comparing population structure as inferred from genealogical versus genetic information

    PubMed Central

    Colonna, Vincenza; Nutile, Teresa; Ferrucci, Ronald R; Fardella, Giulio; Aversano, Mario; Barbujani, Guido; Ciullo, Marina

    2009-01-01

    Algorithms for inferring population structure from genetic data (ie, population assignment methods) have been shown to effectively recognize genetic clusters in human populations. However, their performance in identifying groups of genealogically related individuals, especially in scantily differentiated populations, has not been tested empirically thus far. For this study, we had access to both genealogical and genetic data from two closely related, isolated villages in southern Italy. We found that nearly all living individuals were included in a single pedigree, with multiple inbreeding loops. Despite Fst between villages being a low 0.008, genetic clustering analysis identified two clusters roughly corresponding to the two villages. Average kinship between individuals (estimated from genealogies) increased at increasing values of group membership (estimated from the genetic data), showing that the observed genetic clusters represent individuals who are more closely related to each other than to random members of the population. Further, average kinship within clusters and Fst between clusters increase with increasingly stringent membership threshold requirements. We conclude that a limited number of genetic markers is sufficient to detect structuring, and that the results of genetic analyses faithfully mirror the structuring inferred from detailed analyses of population genealogies, even when Fst values are low, as in the case of the two villages. We then estimate the impact of observed levels of population structure on association studies using simulated data. PMID:19550436

  15. Use of genetic data to infer population-specific ecological and phenotypic traits from mixed aggregations

    USGS Publications Warehouse

    Moran, Paul; Bromaghin, Jeffrey F.; Masuda, Michele

    2014-01-01

    Many applications in ecological genetics involve sampling individuals from a mixture of multiple biological populations and subsequently associating those individuals with the populations from which they arose. Analytical methods that assign individuals to their putative population of origin have utility in both basic and applied research, providing information about population-specific life history and habitat use, ecotoxins, pathogen and parasite loads, and many other non-genetic ecological, or phenotypic traits. Although the question is initially directed at the origin of individuals, in most cases the ultimate desire is to investigate the distribution of some trait among populations. Current practice is to assign individuals to a population of origin and study properties of the trait among individuals within population strata as if they constituted independent samples. It seemed that approach might bias population-specific trait inference. In this study we made trait inferences directly through modeling, bypassing individual assignment. We extended a Bayesian model for population mixture analysis to incorporate parameters for the phenotypic trait and compared its performance to that of individual assignment with a minimum probability threshold for assignment. The Bayesian mixture model outperformed individual assignment under some trait inference conditions. However, by discarding individuals whose origins are most uncertain, the individual assignment method provided a less complex analytical technique whose performance may be adequate for some common trait inference problems. Our results provide specific guidance for method selection under various genetic relationships among populations with different trait distributions.

  16. Use of Genetic Data to Infer Population-Specific Ecological and Phenotypic Traits from Mixed Aggregations

    PubMed Central

    Moran, Paul; Bromaghin, Jeffrey F.; Masuda, Michele

    2014-01-01

    Many applications in ecological genetics involve sampling individuals from a mixture of multiple biological populations and subsequently associating those individuals with the populations from which they arose. Analytical methods that assign individuals to their putative population of origin have utility in both basic and applied research, providing information about population-specific life history and habitat use, ecotoxins, pathogen and parasite loads, and many other non-genetic ecological, or phenotypic traits. Although the question is initially directed at the origin of individuals, in most cases the ultimate desire is to investigate the distribution of some trait among populations. Current practice is to assign individuals to a population of origin and study properties of the trait among individuals within population strata as if they constituted independent samples. It seemed that approach might bias population-specific trait inference. In this study we made trait inferences directly through modeling, bypassing individual assignment. We extended a Bayesian model for population mixture analysis to incorporate parameters for the phenotypic trait and compared its performance to that of individual assignment with a minimum probability threshold for assignment. The Bayesian mixture model outperformed individual assignment under some trait inference conditions. However, by discarding individuals whose origins are most uncertain, the individual assignment method provided a less complex analytical technique whose performance may be adequate for some common trait inference problems. Our results provide specific guidance for method selection under various genetic relationships among populations with different trait distributions. PMID:24905464

  17. iNJclust: Iterative Neighbor-Joining Tree Clustering Framework for Inferring Population Structure.

    PubMed

    Limpiti, Tulaya; Amornbunchornvej, Chainarong; Intarapanich, Apichart; Assawamakin, Anunchai; Tongsima, Sissades

    2014-01-01

    Understanding genetic differences among populations is one of the most important issues in population genetics. Genetic variations, e.g., single nucleotide polymorphisms, are used to characterize commonality and difference of individuals from various populations. This paper presents an efficient graph-based clustering framework which operates iteratively on the Neighbor-Joining (NJ) tree called the iNJclust algorithm. The framework uses well-known genetic measurements, namely the allele-sharing distance, the neighbor-joining tree, and the fixation index. The behavior of the fixation index is utilized in the algorithm's stopping criterion. The algorithm provides an estimated number of populations, individual assignments, and relationships between populations as outputs. The clustering result is reported in the form of a binary tree, whose terminal nodes represent the final inferred populations and the tree structure preserves the genetic relationships among them. The clustering performance and the robustness of the proposed algorithm are tested extensively using simulated and real data sets from bovine, sheep, and human populations. The result indicates that the number of populations within each data set is reasonably estimated, the individual assignment is robust, and the structure of the inferred population tree corresponds to the intrinsic relationships among populations within the data.

  18. Diagnostic test accuracy and prevalence inferences based on joint and sequential testing with finite population sampling.

    PubMed

    Su, Chun-Lung; Gardner, Ian A; Johnson, Wesley O

    2004-07-30

    The two-test two-population model, originally formulated by Hui and Walter, for estimating test accuracy and prevalence assumes conditionally independent tests, constant accuracy across populations and binomial sampling. The binomial assumption is incorrect if all individuals in a population (e.g. a child-care centre, a village in Africa, or a cattle herd) are sampled or if the sample size is large relative to the population size. In this paper, we develop statistical methods for evaluating diagnostic test accuracy and prevalence estimation based on finite sample data in the absence of a gold standard. Moreover, two tests are often applied simultaneously for the purpose of obtaining a 'joint' testing strategy that has either higher overall sensitivity or specificity than either of the two tests considered singly. Sequential versions of such strategies are often applied in order to reduce the cost of testing. We thus discuss joint (simultaneous and sequential) testing strategies and inference for them. Using the developed methods, we analyse two real and one simulated data sets, and we compare 'hypergeometric' and 'binomial-based' inferences. Our findings indicate that the posterior standard deviations for prevalence (but not sensitivity and specificity) based on finite population sampling tend to be smaller than their counterparts for infinite population sampling. Finally, we make recommendations about how small the sample size should be relative to the population size to warrant use of the binomial model for prevalence estimation. Copyright 2004 John Wiley & Sons, Ltd.
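
    The contrast between binomial and finite-population (hypergeometric) sampling can be illustrated with the textbook sampling standard deviation of a sample proportion. This is a simplified sketch, not the Bayesian analysis described above: it assumes a perfect diagnostic test and gives frequentist standard deviations rather than posterior ones, and the function names are hypothetical.

```python
import math


def binomial_sd(p, n):
    """SD of the sample prevalence when n individuals are drawn from an
    effectively infinite population with true prevalence p."""
    return math.sqrt(p * (1 - p) / n)


def hypergeometric_sd(p, n, population_size):
    """SD of the sample prevalence when n individuals are drawn without
    replacement from a finite population of the given size."""
    if not 0 < n <= population_size:
        raise ValueError("need 0 < n <= population_size")
    if population_size == 1:
        return 0.0
    # Finite population correction: shrinks to 0 when everyone is sampled.
    fpc = (population_size - n) / (population_size - 1)
    return math.sqrt(p * (1 - p) / n * fpc)


for n in (10, 50, 100):
    print(n, binomial_sd(0.2, n), hypergeometric_sd(0.2, n, 100))
```

    With a herd of 100 animals and a true prevalence of 0.2, sampling 50 animals gives roughly 0.057 under the binomial model and 0.040 under the hypergeometric model, and the hypergeometric value drops to zero when the whole herd is sampled.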

  19. Inference and Analysis of Population Structure Using Genetic Data and Network Theory

    PubMed Central

    Greenbaum, Gili; Templeton, Alan R.; Bar-David, Shirli

    2016-01-01

    Clustering individuals to subpopulations based on genetic data has become commonplace in many genetic studies. Inference about population structure is most often done by applying model-based approaches, aided by visualization using distance-based approaches such as multidimensional scaling. While existing distance-based approaches suffer from a lack of statistical rigor, model-based approaches entail assumptions of prior conditions such as that the subpopulations are at Hardy-Weinberg equilibria. Here we present a distance-based approach for inference about population structure using genetic data by defining population structure using network theory terminology and methods. A network is constructed from a pairwise genetic-similarity matrix of all sampled individuals. The community partition, a partition of a network to dense subgraphs, is equated with population structure, a partition of the population to genetically related groups. Community-detection algorithms are used to partition the network into communities, interpreted as a partition of the population to subpopulations. The statistical significance of the structure can be estimated by using permutation tests to evaluate the significance of the partition’s modularity, a network theory measure indicating the quality of community partitions. To further characterize population structure, a new measure of the strength of association (SA) for an individual to its assigned community is presented. The strength of association distribution (SAD) of the communities is analyzed to provide additional population structure characteristics, such as the relative amount of gene flow experienced by the different subpopulations and identification of hybrid individuals. Human genetic data and simulations are used to demonstrate the applicability of the analyses. The approach presented here provides a novel, computationally efficient model-free method for inference about population structure that does not entail assumption of

  20. Inference and Analysis of Population Structure Using Genetic Data and Network Theory.

    PubMed

    Greenbaum, Gili; Templeton, Alan R; Bar-David, Shirli

    2016-04-01

    Clustering individuals to subpopulations based on genetic data has become commonplace in many genetic studies. Inference about population structure is most often done by applying model-based approaches, aided by visualization using distance-based approaches such as multidimensional scaling. While existing distance-based approaches suffer from a lack of statistical rigor, model-based approaches entail assumptions of prior conditions such as that the subpopulations are at Hardy-Weinberg equilibria. Here we present a distance-based approach for inference about population structure using genetic data by defining population structure using network theory terminology and methods. A network is constructed from a pairwise genetic-similarity matrix of all sampled individuals. The community partition, a partition of a network to dense subgraphs, is equated with population structure, a partition of the population to genetically related groups. Community-detection algorithms are used to partition the network into communities, interpreted as a partition of the population to subpopulations. The statistical significance of the structure can be estimated by using permutation tests to evaluate the significance of the partition's modularity, a network theory measure indicating the quality of community partitions. To further characterize population structure, a new measure of the strength of association (SA) for an individual to its assigned community is presented. The strength of association distribution (SAD) of the communities is analyzed to provide additional population structure characteristics, such as the relative amount of gene flow experienced by the different subpopulations and identification of hybrid individuals. Human genetic data and simulations are used to demonstrate the applicability of the analyses. The approach presented here provides a novel, computationally efficient model-free method for inference about population structure that does not entail assumption of
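
    The modularity measure mentioned above can be sketched for a weighted network as follows. This is a minimal sketch using the standard Newman modularity formula, assuming a symmetric similarity matrix with a zero diagonal; how the authors build or threshold the similarity matrix is not stated in the abstract, and the function name is hypothetical.

```python
def modularity(weights, labels):
    """Newman modularity Q of a partition of a weighted, undirected network.

    weights: symmetric square matrix (list of lists) of non-negative edge
             weights, e.g. pairwise genetic similarities, with a zero diagonal.
    labels:  labels[i] is the community assigned to individual i.
    """
    n = len(weights)
    strength = [sum(row) for row in weights]  # weighted degree k_i
    two_m = sum(strength)                     # equals 2m for a symmetric matrix
    if two_m == 0:
        raise ValueError("network has no edges")
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # Observed weight minus the weight expected by chance.
                q += weights[i][j] - strength[i] * strength[j] / two_m
    return q / two_m


# Two unconnected pairs of individuals.
w = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]
print(modularity(w, [0, 0, 1, 1]))  # 0.5
print(modularity(w, [0, 0, 0, 0]))  # 0.0
```

    A permutation test of the kind described in the abstract would compare the observed Q with the values obtained after shuffling the data; that step is omitted here.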

  1. Expectation propagation for large scale Bayesian inference of non-linear molecular networks from perturbation data.

    PubMed

    Narimani, Zahra; Beigy, Hamid; Ahmad, Ashar; Masoudi-Nejad, Ali; Fröhlich, Holger

    2017-01-01

    Inferring the structure of molecular networks from time series protein or gene expression data provides valuable information about the complex biological processes of the cell. Causal network structure inference has been approached using different methods in the past. Most causal network inference techniques, such as Dynamic Bayesian Networks and ordinary differential equations, are limited by their computational complexity and thus make large scale inference infeasible. This is specifically true if a Bayesian framework is applied in order to deal with the unavoidable uncertainty about the correct model. We devise a novel Bayesian network reverse engineering approach using ordinary differential equations with the ability to include non-linearity. Besides modeling arbitrary, possibly combinatorial and time dependent perturbations with unknown targets, one of our main contributions is the use of Expectation Propagation, an algorithm for approximate Bayesian inference over large scale network structures in short computation time. We further explore the possibility of integrating prior knowledge into network inference. We evaluate the proposed model on DREAM4 and DREAM8 data and find it competitive against several state-of-the-art existing network inference methods.

  2. Integrative inference of population history in the Ibero-Maghrebian endemic Pleurodeles waltl (Salamandridae).

    PubMed

    Gutiérrez-Rodríguez, Jorge; Barbosa, A Márcia; Martínez-Solano, Íñigo

    2017-07-01

    Inference of population histories from the molecular signatures of past demographic processes is challenging, but recent methodological advances in species distribution models and their integration in time-calibrated phylogeographic studies allow detailed reconstruction of complex biogeographic scenarios. We apply an integrative approach to infer the evolutionary history of the Iberian ribbed newt (Pleurodeles waltl), an Ibero-Maghrebian endemic with populations north and south of the Strait of Gibraltar. We analyzed an extensive multilocus dataset (mitochondrial and nuclear DNA sequences and ten polymorphic microsatellite loci) and found a deep east-west phylogeographic break in Iberian populations dating back to the Plio-Pleistocene. This break is inferred to result from vicariance associated with the formation of the Guadalquivir river basin. In contrast with previous studies, North African populations showed exclusive mtDNA haplotypes, and formed a monophyletic clade within the Eastern Iberian lineage in the mtDNA genealogy. On the other hand, microsatellites failed to recover Moroccan populations as a differentiated genetic cluster. This is interpreted to result from post-divergence gene flow based on the results of IMA2 and Migrate analyses. Thus, Moroccan populations would have originated after overseas dispersal from the Iberian Peninsula in the Pleistocene, with subsequent gene flow in more recent times, implying at least two trans-marine dispersal events. We modeled the distribution of the species and of each lineage, and projected these models back in time to infer climatically favourable areas during the mid-Holocene, the last glacial maximum (LGM) and the last interglacial (LIG), to reconstruct more recent population dynamics. We found minor differences in climatic favourability across lineages, suggesting intraspecific niche conservatism. 
    Genetic diversity was significantly correlated with the intersection of environmental favourability in the LIG and

  3. Inference of Evolutionary Jumps in Large Phylogenies using Lévy Processes

    PubMed Central

    Duchen, Pablo; Leuenberger, Christoph; Szilágyi, Sándor M.; Harmon, Luke; Eastman, Jonathan; Schweizer, Manuel

    2017-01-01

    Although it is now widely accepted that the rate of phenotypic evolution may not necessarily be constant across large phylogenies, the frequency and phylogenetic position of periods of rapid evolution remain unclear. In his highly influential view of evolution, G. G. Simpson supposed that such evolutionary jumps occur when organisms transition into so-called new adaptive zones, for instance after dispersal into a new geographic area, after rapid climatic changes, or following the appearance of an evolutionary novelty. Only recently, large, accurate and well calibrated phylogenies have become available that allow testing this hypothesis directly, yet inferring evolutionary jumps remains computationally very challenging. Here, we develop a computationally highly efficient algorithm to accurately infer the rate and strength of evolutionary jumps as well as their phylogenetic location. Following previous work we model evolutionary jumps as a compound process, but introduce a novel approach to sample jump configurations that does not require matrix inversions and thus naturally scales to large trees. We then make use of this development to infer evolutionary jumps in Anolis lizards and Loriini parrots where we find strong signal for such jumps at the basis of clades that transitioned into new adaptive zones, just as postulated by Simpson’s hypothesis. [evolutionary jump; Lévy process; phenotypic evolution; punctuated equilibrium; quantitative traits.] PMID:28204787

  4. Inference of Evolutionary Jumps in Large Phylogenies using Lévy Processes.

    PubMed

    Duchen, Pablo; Leuenberger, Christoph; Szilágyi, Sándor M; Harmon, Luke; Eastman, Jonathan; Schweizer, Manuel; Wegmann, Daniel

    2017-11-01

    Although it is now widely accepted that the rate of phenotypic evolution may not necessarily be constant across large phylogenies, the frequency and phylogenetic position of periods of rapid evolution remain unclear. In his highly influential view of evolution, G. G. Simpson supposed that such evolutionary jumps occur when organisms transition into so-called new adaptive zones, for instance after dispersal into a new geographic area, after rapid climatic changes, or following the appearance of an evolutionary novelty. Only recently, large, accurate and well calibrated phylogenies have become available that allow testing this hypothesis directly, yet inferring evolutionary jumps remains computationally very challenging. Here, we develop a computationally highly efficient algorithm to accurately infer the rate and strength of evolutionary jumps as well as their phylogenetic location. Following previous work we model evolutionary jumps as a compound process, but introduce a novel approach to sample jump configurations that does not require matrix inversions and thus naturally scales to large trees. We then make use of this development to infer evolutionary jumps in Anolis lizards and Loriini parrots where we find strong signal for such jumps at the basis of clades that transitioned into new adaptive zones, just as postulated by Simpson's hypothesis. [evolutionary jump; Lévy process; phenotypic evolution; punctuated equilibrium; quantitative traits.] © The Author(s) 2017. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.

  5. Copy-number analysis and inference of subclonal populations in cancer genomes using Sclust.

    PubMed

    Cun, Yupeng; Yang, Tsun-Po; Achter, Viktor; Lang, Ulrich; Peifer, Martin

    2018-06-01

    The genomes of cancer cells constantly change during pathogenesis. This evolutionary process can lead to the emergence of drug-resistant mutations in subclonal populations, which can hinder therapeutic intervention in patients. Data derived from massively parallel sequencing can be used to infer these subclonal populations using tumor-specific point mutations. The accurate determination of copy-number changes and tumor impurity is necessary to reliably infer subclonal populations by mutational clustering. This protocol describes how to use Sclust, a copy-number analysis method with a recently developed mutational clustering approach. In a series of simulations and comparisons with alternative methods, we have previously shown that Sclust accurately determines copy-number states and subclonal populations. Performance tests show that the method is computationally efficient, with copy-number analysis and mutational clustering taking <10 min. Sclust is designed such that even non-experts in computational biology or bioinformatics with basic knowledge of the Linux/Unix command-line syntax should be able to carry out analyses of subclonal populations.

  6. Molecular hyperdiversity and evolution in very large populations.

    PubMed

    Cutter, Asher D; Jovelin, Richard; Dey, Alivia

    2013-04-01

    The genomic density of sequence polymorphisms critically affects the sensitivity of inferences about ongoing sequence evolution, function and demographic history. Most animal and plant genomes have relatively low densities of polymorphisms, but some species are hyperdiverse with neutral nucleotide heterozygosity exceeding 5%. Eukaryotes with extremely large populations, mimicking bacterial and viral populations, present novel opportunities for studying molecular evolution in sexually reproducing taxa with complex development. In particular, hyperdiverse species can help answer controversial questions about the evolution of genome complexity, the limits of natural selection, modes of adaptation and subtleties of the mutation process. However, such systems have some inherent complications and here we identify topics in need of theoretical developments. Close relatives of the model organisms Caenorhabditis elegans and Drosophila melanogaster provide known examples of hyperdiverse eukaryotes, encouraging functional dissection of resulting molecular evolutionary patterns. We recommend how best to exploit hyperdiverse populations for analysis, for example, in quantifying the impact of noncrossover recombination in genomes and for determining the identity and micro-evolutionary selective pressures on noncoding regulatory elements. © 2013 Blackwell Publishing Ltd.

  7. SHIPS: Spectral Hierarchical Clustering for the Inference of Population Structure in Genetic Studies

    PubMed Central

    Bouaziz, Matthieu; Paccard, Caroline; Guedj, Mickael; Ambroise, Christophe

    2012-01-01

    Inferring the structure of populations has many applications for genetic research. In addition to providing information for evolutionary studies, it can be used to account for the bias induced by population stratification in association studies. To this end, many algorithms have been proposed to cluster individuals into genetically homogeneous sub-populations. The parametric algorithms, such as Structure, are very popular but their underlying complexity and their high computational cost led to the development of faster parametric alternatives such as Admixture. Alternatives to these methods are the non-parametric approaches. Among this category, AWclust has proven efficient but fails to properly identify population structure for complex datasets. We present in this article a new clustering algorithm called Spectral Hierarchical clustering for the Inference of Population Structure (SHIPS), based on a divisive hierarchical clustering strategy, allowing a progressive investigation of population structure. This method takes genetic data as input to cluster individuals into homogeneous sub-populations and with the use of the gap statistic estimates the optimal number of such sub-populations. SHIPS was applied to a set of simulated discrete and admixed datasets and to real SNP datasets, that are data from the HapMap and Pan-Asian SNP consortium. The programs Structure, Admixture, AWclust and PCAclust were also investigated in a comparison study. SHIPS and the parametric approach Structure were the most accurate when applied to simulated datasets both in terms of individual assignments and estimation of the correct number of clusters. The analysis of the results on the real datasets highlighted that the clusterings of SHIPS were the more consistent with the population labels or those produced by the Admixture program. The performances of SHIPS when applied to SNP data, along with its relatively low computational cost and its ease of use make this method a promising

  8. Inferring the demographic history of European Ficedula flycatcher populations

    PubMed Central

    2013-01-01

    Background: Inference of population and species histories and population stratification using genetic data is important for discriminating between different speciation scenarios and for correct interpretation of genome scans for signs of adaptive evolution and trait association. Here we use data from 24 intronic loci re-sequenced in population samples of two closely related species, the pied flycatcher and the collared flycatcher. Results: We applied Isolation-Migration models, assignment analyses and estimated the genetic differentiation and diversity between species and between populations within species. The data indicate a divergence time between the species of <1 million years, significantly shorter than previous estimates using mtDNA, point to a scenario with unidirectional gene-flow from the pied flycatcher into the collared flycatcher and imply that barriers to hybridisation are still permeable in a recently established hybrid zone. Furthermore, we detect significant population stratification, predominantly between the Spanish population and other pied flycatcher populations. Conclusions: Our results provide further evidence for a divergence process where different genomic regions may be at different stages of speciation. We also conclude that forthcoming analyses of genotype-phenotype relations in these ecological model species should be designed to take population stratification into account. PMID:23282063

  9. PyClone: statistical inference of clonal population structure in cancer.

    PubMed

    Roth, Andrew; Khattra, Jaswinder; Yap, Damian; Wan, Adrian; Laks, Emma; Biele, Justina; Ha, Gavin; Aparicio, Samuel; Bouchard-Côté, Alexandre; Shah, Sohrab P

    2014-04-01

    We introduce PyClone, a statistical model for inference of clonal population structures in cancers. PyClone is a Bayesian clustering method for grouping sets of deeply sequenced somatic mutations into putative clonal clusters while estimating their cellular prevalences and accounting for allelic imbalances introduced by segmental copy-number changes and normal-cell contamination. Single-cell sequencing validation demonstrates PyClone's accuracy.

  10. Fine-scale population dynamics in a marine fish species inferred from dynamic state-space models.

    PubMed

    Rogers, Lauren A; Storvik, Geir O; Knutsen, Halvor; Olsen, Esben M; Stenseth, Nils C

    2017-07-01

    Identifying the spatial scale of population structuring is critical for the conservation of natural populations and for drawing accurate ecological inferences. However, population studies often use spatially aggregated data to draw inferences about population trends and drivers, potentially masking ecologically relevant population sub-structure and dynamics. The goals of this study were to investigate how population dynamics models with and without spatial structure affect inferences on population trends and the identification of intrinsic drivers of population dynamics (e.g. density dependence). Specifically, we developed dynamic, age-structured, state-space models to test different hypotheses regarding the spatial structure of a population complex of coastal Atlantic cod (Gadus morhua). Data were from a 93-year survey of juvenile (age 0 and 1) cod sampled along >200 km of the Norwegian Skagerrak coast. We compared two models: one which assumes all sampled cod belong to one larger population, and a second which assumes that each fjord contains a unique population with locally determined dynamics. Using the best supported model, we then reconstructed the historical spatial and temporal dynamics of Skagerrak coastal cod. Cross-validation showed that the spatially structured model with local dynamics had better predictive ability. Furthermore, posterior predictive checks showed that a model which assumes one homogeneous population failed to capture the spatial correlation pattern present in the survey data. The spatially structured model indicated that population trends differed markedly among fjords, as did estimates of population parameters including density-dependent survival. Recent biomass was estimated to be at a near-record low all along the coast, but the finer scale model indicated that the decline occurred at different times in different regions. 
    Warm temperatures were associated with poor recruitment, but local changes in habitat and fishing pressure may

  11. Improving inferences in population studies of rare species that are detected imperfectly

    USGS Publications Warehouse

    MacKenzie, D.I.; Nichols, J.D.; Sutton, N.; Kawanishi, K.; Bailey, L.L.

    2005-01-01

    For the vast majority of cases, it is highly unlikely that all the individuals of a population will be encountered during a study. Furthermore, it is unlikely that a constant fraction of the population is encountered over times, locations, or species to be compared. Hence, simple counts usually will not be good indices of population size. We recommend that detection probabilities (the probability of including an individual in a count) be estimated and incorporated into inference procedures. However, most techniques for estimating detection probability require moderate sample sizes, which may not be achievable when studying rare species. In order to improve the reliability of inferences from studies of rare species, we suggest two general approaches that researchers may wish to consider that incorporate the concept of imperfect detectability: (1) borrowing information about detectability or the other quantities of interest from other times, places, or species; and (2) using state variables other than abundance (e.g., species richness and occupancy). We illustrate these suggestions with examples and discuss the relative benefits and drawbacks of each approach.
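
    The reason simple counts are poor indices of population size can be made concrete with the usual detection-corrected estimator, in which a count is divided by the estimated detection probability. This is a minimal sketch with a hypothetical function name; it treats the detection probability as known, whereas in practice it is estimated and carries its own uncertainty.

```python
def abundance_estimate(count, detection_prob):
    """Detection-corrected abundance: count divided by the probability
    that an individual present is included in the count."""
    if count < 0:
        raise ValueError("count must be non-negative")
    if not 0 < detection_prob <= 1:
        raise ValueError("detection probability must be in (0, 1]")
    return count / detection_prob


# The same raw count implies different abundances when detection differs.
print(abundance_estimate(30, 0.5))   # 60.0
print(abundance_estimate(30, 0.25))  # 120.0
```

    Two sites with identical counts of 30 would be judged equally abundant from the counts alone, yet the corrected estimates differ by a factor of two.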

  12. Explaining Inference on a Population of Independent Agents Using Bayesian Networks

    ERIC Educational Resources Information Center

    Sutovsky, Peter

    2013-01-01

    The main goal of this research is to design, implement, and evaluate a novel explanation method, the hierarchical explanation method (HEM), for explaining Bayesian network (BN) inference when the network is modeling a population of conditionally independent agents, each of which is modeled as a subnetwork. For example, consider disease-outbreak…

  13. Sample Size and Correlational Inference

    ERIC Educational Resources Information Center

    Anderson, Richard B.; Doherty, Michael E.; Friedrich, Jeff C.

    2008-01-01

    In 4 studies, the authors examined the hypothesis that the structure of the informational environment makes small samples more informative than large ones for drawing inferences about population correlations. The specific purpose of the studies was to test predictions arising from the signal detection simulations of R. B. Anderson, M. E. Doherty,…

  14. Hierarchical modeling and inference in ecology: The analysis of data from populations, metapopulations and communities

    USGS Publications Warehouse

    Royle, J. Andrew; Dorazio, Robert M.

    2008-01-01

    A guide to data collection, modeling and inference strategies for biological survey data using Bayesian and classical statistical methods. This book describes a general and flexible framework for modeling and inference in ecological systems based on hierarchical models, with a strict focus on the use of probability models and parametric inference. Hierarchical models represent a paradigm shift in the application of statistics to ecological inference problems because they combine explicit models of ecological system structure or dynamics with models of how ecological systems are observed. The principles of hierarchical modeling are developed and applied to problems in population, metapopulation, community, and metacommunity systems. The book provides the first synthetic treatment of many recent methodological advances in ecological modeling and unifies disparate methods and procedures. The authors apply principles of hierarchical modeling to ecological problems, including: occurrence or occupancy models for estimating species distribution; abundance models based on many sampling protocols, including distance sampling; capture-recapture models with individual effects; spatial capture-recapture models based on camera trapping and related methods; population and metapopulation dynamic models; and models of biodiversity, community structure and dynamics.
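
    The idea of combining a model of the ecological state with a model of how it is observed can be sketched with the standard single-season occupancy likelihood for one site. This is an illustrative sketch, not code from the book: it assumes a constant occupancy probability psi, a constant per-visit detection probability p, independent visits and no false positives, and the function name is hypothetical.

```python
def site_likelihood(history, psi, p):
    """Likelihood of one site's detection history (list of 0/1 per visit).

    State model:       the site is occupied with probability psi.
    Observation model: if occupied, the species is detected on each visit
                       independently with probability p; if unoccupied,
                       it is never detected.
    """
    if not (0 <= psi <= 1 and 0 <= p <= 1):
        raise ValueError("psi and p must be probabilities")
    detections = sum(history)
    visits = len(history)
    occupied = psi * p ** detections * (1 - p) ** (visits - detections)
    # An all-zero history can also arise because the site is unoccupied.
    unoccupied = (1 - psi) if detections == 0 else 0.0
    return occupied + unoccupied


print(site_likelihood([0, 0], psi=0.5, p=0.5))  # 0.625
print(site_likelihood([1, 0], psi=0.5, p=0.5))  # 0.125
```

    The all-zero history mixes two explanations, an occupied site that was missed on every visit and a site that is truly unoccupied.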

  15. Microarray Data Processing Techniques for Genome-Scale Network Inference from Large Public Repositories.

    PubMed

    Chockalingam, Sriram; Aluru, Maneesha; Aluru, Srinivas

    2016-09-19

    Pre-processing of microarray data is a well-studied problem. Furthermore, all popular platforms come with their own recommended best practices for differential analysis of genes. However, for genome-scale network inference using microarray data collected from large public repositories, these methods filter out a considerable number of genes. This is primarily due to the effects of aggregating a diverse array of experiments with different technical and biological scenarios. Here we introduce a pre-processing pipeline suitable for inferring genome-scale gene networks from large microarray datasets. We show that partitioning of the available microarray datasets according to biological relevance into tissue- and process-specific categories significantly extends the limits of downstream network construction. We demonstrate the effectiveness of our pre-processing pipeline by inferring genome-scale networks for the model plant Arabidopsis thaliana using two different construction methods and a collection of 11,760 Affymetrix ATH1 microarray chips. Our pre-processing pipeline and the datasets used in this paper are made available at http://alurulab.cc.gatech.edu/microarray-pp.

  16. TITAN: inference of copy number architectures in clonal cell populations from tumor whole-genome sequence data

    PubMed Central

    Roth, Andrew; Khattra, Jaswinder; Ho, Julie; Yap, Damian; Prentice, Leah M.; Melnyk, Nataliya; McPherson, Andrew; Bashashati, Ali; Laks, Emma; Biele, Justina; Ding, Jiarui; Le, Alan; Rosner, Jamie; Shumansky, Karey; Marra, Marco A.; Gilks, C. Blake; Huntsman, David G.; McAlpine, Jessica N.; Aparicio, Samuel

    2014-01-01

    The evolution of cancer genomes within a single tumor creates mixed cell populations with divergent somatic mutational landscapes. Inference of tumor subpopulations has been disproportionately focused on the assessment of somatic point mutations, whereas computational methods targeting evolutionary dynamics of copy number alterations (CNA) and loss of heterozygosity (LOH) in whole-genome sequencing data remain underdeveloped. We present a novel probabilistic model, TITAN, to infer CNA and LOH events while accounting for mixtures of cell populations, thereby estimating the proportion of cells harboring each event. We evaluate TITAN on idealized mixtures, simulating clonal populations from whole-genome sequences taken from genomically heterogeneous ovarian tumor sites collected from the same patient. In addition, we show in 23 whole genomes of breast tumors that the inference of CNA and LOH using TITAN critically informs population structure and the nature of the evolving cancer genome. Finally, we experimentally validated subclonal predictions using fluorescence in situ hybridization (FISH) and single-cell sequencing from an ovarian cancer patient sample, thereby recapitulating the key modeling assumptions of TITAN. PMID:25060187

  17. Causal Inferences with Large Scale Assessment Data: Using a Validity Framework

    ERIC Educational Resources Information Center

    Rutkowski, David; Delandshere, Ginette

    2016-01-01

    To answer the calls for stronger evidence by the policy community, educational researchers and their associated organizations increasingly demand more studies that can yield causal inferences. International large scale assessments (ILSAs) have been targeted as a rich data sources for causal research. It is in this context that we take up a…

  18. TITAN: inference of copy number architectures in clonal cell populations from tumor whole-genome sequence data.

    PubMed

    Ha, Gavin; Roth, Andrew; Khattra, Jaswinder; Ho, Julie; Yap, Damian; Prentice, Leah M; Melnyk, Nataliya; McPherson, Andrew; Bashashati, Ali; Laks, Emma; Biele, Justina; Ding, Jiarui; Le, Alan; Rosner, Jamie; Shumansky, Karey; Marra, Marco A; Gilks, C Blake; Huntsman, David G; McAlpine, Jessica N; Aparicio, Samuel; Shah, Sohrab P

    2014-11-01

    The evolution of cancer genomes within a single tumor creates mixed cell populations with divergent somatic mutational landscapes. Inference of tumor subpopulations has been disproportionately focused on the assessment of somatic point mutations, whereas computational methods targeting evolutionary dynamics of copy number alterations (CNA) and loss of heterozygosity (LOH) in whole-genome sequencing data remain underdeveloped. We present a novel probabilistic model, TITAN, to infer CNA and LOH events while accounting for mixtures of cell populations, thereby estimating the proportion of cells harboring each event. We evaluate TITAN on idealized mixtures, simulating clonal populations from whole-genome sequences taken from genomically heterogeneous ovarian tumor sites collected from the same patient. In addition, we show in 23 whole genomes of breast tumors that the inference of CNA and LOH using TITAN critically informs population structure and the nature of the evolving cancer genome. Finally, we experimentally validated subclonal predictions using fluorescence in situ hybridization (FISH) and single-cell sequencing from an ovarian cancer patient sample, thereby recapitulating the key modeling assumptions of TITAN. © 2014 Ha et al.; Published by Cold Spring Harbor Laboratory Press.

  19. Foundational Principles for Large-Scale Inference: Illustrations Through Correlation Mining

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hero, Alfred O.; Rajaratnam, Bala

    When can reliable inference be drawn in the “Big Data” context? This article presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large-scale inference. In large-scale data applications like genomics, connectomics, and eco-informatics, the data set is often variable rich but sample starved: a regime where the number n of acquired samples (statistical replicates) is far fewer than the number p of observed variables (genes, neurons, voxels, or chemical constituents). Much of recent work has focused on understanding the computational complexity of proposed methods for “Big Data.” Sample complexity, however, has received relatively less attention, especially in the setting when the sample size n is fixed, and the dimension p grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime where both variable dimension and sample size go to infinity at comparable rates; and 3) the purely high-dimensional asymptotic regime where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche but only the latter regime applies to exa-scale data dimension. We illustrate this high-dimensional framework for the problem of correlation mining, where it is the matrix of pairwise and partial correlations among the variables that are of interest. Correlation mining arises in numerous applications and subsumes the regression context as a special case. We demonstrate various regimes of correlation mining based on the unifying perspective of high-dimensional learning rates and sample complexity for different structured covariance models and different

  20. Foundational Principles for Large-Scale Inference: Illustrations Through Correlation Mining

    PubMed Central

    Hero, Alfred O.; Rajaratnam, Bala

    2015-01-01

    When can reliable inference be drawn in the “Big Data” context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large scale inference. In large scale data applications like genomics, connectomics, and eco-informatics the dataset is often variable-rich but sample-starved: a regime where the number n of acquired samples (statistical replicates) is far fewer than the number p of observed variables (genes, neurons, voxels, or chemical constituents). Much of recent work has focused on understanding the computational complexity of proposed methods for “Big Data”. Sample complexity however has received relatively less attention, especially in the setting when the sample size n is fixed, and the dimension p grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime where both variable dimension and sample size go to infinity at comparable rates; 3) the purely high dimensional asymptotic regime where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche but only the latter regime applies to exascale data dimension. We illustrate this high dimensional framework for the problem of correlation mining, where it is the matrix of pairwise and partial correlations among the variables that are of interest. Correlation mining arises in numerous applications and subsumes the regression context as a special case. We demonstrate various regimes of correlation mining based on the unifying perspective of high dimensional learning rates and sample complexity for different structured covariance models and different inference tasks. PMID:27087700

  1. Foundational Principles for Large-Scale Inference: Illustrations Through Correlation Mining

    DOE PAGES

    Hero, Alfred O.; Rajaratnam, Bala

    2015-12-09

    When can reliable inference be drawn in the “Big Data” context? This article presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large-scale inference. In large-scale data applications like genomics, connectomics, and eco-informatics, the data set is often variable rich but sample starved: a regime where the number n of acquired samples (statistical replicates) is far fewer than the number p of observed variables (genes, neurons, voxels, or chemical constituents). Much of recent work has focused on understanding the computational complexity of proposed methods for “Big Data.” Sample complexity, however, has received relatively less attention, especially in the setting when the sample size n is fixed, and the dimension p grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime where both variable dimension and sample size go to infinity at comparable rates; and 3) the purely high-dimensional asymptotic regime where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche but only the latter regime applies to exa-scale data dimension. We illustrate this high-dimensional framework for the problem of correlation mining, where it is the matrix of pairwise and partial correlations among the variables that are of interest. Correlation mining arises in numerous applications and subsumes the regression context as a special case. We demonstrate various regimes of correlation mining based on the unifying perspective of high-dimensional learning rates and sample complexity for different structured covariance models and different

  2. Foundational Principles for Large-Scale Inference: Illustrations Through Correlation Mining.

    PubMed

    Hero, Alfred O; Rajaratnam, Bala

    2016-01-01

    When can reliable inference be drawn in the "Big Data" context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large scale inference. In large scale data applications like genomics, connectomics, and eco-informatics the dataset is often variable-rich but sample-starved: a regime where the number n of acquired samples (statistical replicates) is far fewer than the number p of observed variables (genes, neurons, voxels, or chemical constituents). Much of recent work has focused on understanding the computational complexity of proposed methods for "Big Data". Sample complexity however has received relatively less attention, especially in the setting when the sample size n is fixed, and the dimension p grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime where both variable dimension and sample size go to infinity at comparable rates; 3) the purely high dimensional asymptotic regime where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche but only the latter regime applies to exascale data dimension. We illustrate this high dimensional framework for the problem of correlation mining, where it is the matrix of pairwise and partial correlations among the variables that are of interest. Correlation mining arises in numerous applications and subsumes the regression context as a special case. We demonstrate various regimes of correlation mining based on the unifying perspective of high dimensional learning rates and sample complexity for different structured covariance models and different inference tasks.
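
The sample-starved regime described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' framework; it only shows why a fixed n with large p is hard. It draws p mutually independent Gaussian variables, so every true correlation is zero, and counts how many variable pairs still show a large sample correlation. The function name `count_spurious_pairs` and the values of `n`, `p`, and `rho` are assumptions chosen for illustration.

```python
import numpy as np

def count_spurious_pairs(n, p, rho, seed=0):
    """Count variable pairs whose |sample correlation| exceeds rho.

    The p variables are simulated as independent standard normals, so any
    pair that passes the threshold is a false discovery.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))          # n samples (rows), p variables (columns)
    R = np.corrcoef(X, rowvar=False)         # p x p sample correlation matrix
    upper = np.triu_indices(p, k=1)          # each unordered pair once
    return int(np.sum(np.abs(R[upper]) > rho))

# Sample-starved: n is far smaller than p.
few = count_spurious_pairs(n=20, p=500, rho=0.6)
# Classical regime: n is larger than p.
many = count_spurious_pairs(n=2000, p=500, rho=0.6)
print(few, many)
```

With only 20 samples, many of the 124,750 pairs exceed the threshold by chance alone, whereas with 2,000 samples none do. This is the kind of behaviour that sample-complexity analysis is meant to quantify.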

  3. A Scalable Approach to Probabilistic Latent Space Inference of Large-Scale Networks

    PubMed Central

    Yin, Junming; Ho, Qirong; Xing, Eric P.

    2014-01-01

    We propose a scalable approach for making inference about latent spaces of large networks. With a succinct representation of networks as a bag of triangular motifs, a parsimonious statistical model, and an efficient stochastic variational inference algorithm, we are able to analyze real networks with over a million vertices and hundreds of latent roles on a single machine in a matter of hours, a setting that is out of reach for many existing methods. When compared to the state-of-the-art probabilistic approaches, our method is several orders of magnitude faster, with competitive or improved accuracy for latent space recovery and link prediction. PMID:25400487

  4. Robust inference of population structure for ancestry prediction and correction of stratification in the presence of relatedness.

    PubMed

    Conomos, Matthew P; Miller, Michael B; Thornton, Timothy A

    2015-05-01

    Population structure inference with genetic data has been motivated by a variety of applications in population genetics and genetic association studies. Several approaches have been proposed for the identification of genetic ancestry differences in samples where study participants are assumed to be unrelated, including principal components analysis (PCA), multidimensional scaling (MDS), and model-based methods for proportional ancestry estimation. Many genetic studies, however, include individuals with some degree of relatedness, and existing methods for inferring genetic ancestry fail in related samples. We present a method, PC-AiR, for robust population structure inference in the presence of known or cryptic relatedness. PC-AiR utilizes genome-screen data and an efficient algorithm to identify a diverse subset of unrelated individuals that is representative of all ancestries in the sample. The PC-AiR method directly performs PCA on the identified ancestry representative subset and then predicts components of variation for all remaining individuals based on genetic similarities. In simulation studies and in applications to real data from Phase III of the HapMap Project, we demonstrate that PC-AiR provides a substantial improvement over existing approaches for population structure inference in related samples. We also demonstrate significant efficiency gains, where a single axis of variation from PC-AiR provides better prediction of ancestry in a variety of structure settings than using 10 (or more) components of variation from widely used PCA and MDS approaches. Finally, we illustrate that PC-AiR can provide improved population stratification correction over existing methods in genetic association studies with population structure and relatedness. © 2015 WILEY PERIODICALS, INC.
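
The two-stage idea in this abstract (PCA on an unrelated subset, then prediction for the remaining individuals) can be sketched as follows. This is a simplified illustration, not the PC-AiR implementation: it assumes the unrelated subset is already known, it centers genotypes without the standardization a real analysis would use, and it predicts scores by projecting onto the SNP loadings. The function name `pca_fit_project` and the toy data are hypothetical.

```python
import numpy as np

def pca_fit_project(G, unrelated_idx, k=2):
    """Fit PCA on an unrelated subset and project every individual.

    G             : (individuals x SNPs) matrix of 0/1/2 allele counts
    unrelated_idx : row indices of the (assumed known) unrelated subset
    k             : number of principal components to keep
    """
    G = np.asarray(G, dtype=float)
    ref = G[unrelated_idx]
    mean = ref.mean(axis=0)                       # per-SNP mean from the subset only
    _, _, Vt = np.linalg.svd(ref - mean, full_matrices=False)
    loadings = Vt[:k].T                           # (SNPs x k) axes of variation
    return (G - mean) @ loadings                  # scores for all individuals

# Toy data: 12 individuals, 50 SNPs; even-numbered rows treated as unrelated.
rng = np.random.default_rng(1)
G = rng.integers(0, 3, size=(12, 50))
unrelated = [0, 2, 4, 6, 8, 10]
scores = pca_fit_project(G, unrelated, k=2)
print(scores.shape)
```

Fitting the axes on the unrelated subset keeps family structure from dominating the leading components; the related individuals are then placed on those axes rather than contributing to them.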

  5. Inference of Population Structure using Dense Haplotype Data

    PubMed Central

    Lawson, Daniel John; Hellenthal, Garrett

    2012-01-01

    The advent of genome-wide dense variation data provides an opportunity to investigate ancestry in unprecedented detail, but presents new statistical challenges. We propose a novel inference framework that aims to efficiently capture information on population structure provided by patterns of haplotype similarity. Each individual in a sample is considered in turn as a recipient, whose chromosomes are reconstructed using chunks of DNA donated by the other individuals. Results of this “chromosome painting” can be summarized as a “coancestry matrix,” which directly reveals key information about ancestral relationships among individuals. If markers are viewed as independent, we show that this matrix almost completely captures the information used by both standard Principal Components Analysis (PCA) and model-based approaches such as STRUCTURE in a unified manner. Furthermore, when markers are in linkage disequilibrium, the matrix combines information across successive markers to increase the ability to discern fine-scale population structure using PCA. In parallel, we have developed an efficient model-based approach to identify discrete populations using this matrix, which offers advantages over PCA in terms of interpretability and over existing clustering algorithms in terms of speed, number of separable populations, and sensitivity to subtle population structure. We analyse Human Genome Diversity Panel data for 938 individuals and 641,000 markers, and we identify 226 populations reflecting differences on continental, regional, local, and family scales. We present multiple lines of evidence that, while many methods capture similar information among strongly differentiated groups, more subtle population structure in human populations is consistently present at a much finer level than currently available geographic labels and is only captured by the haplotype-based approach. The software used for this article, ChromoPainter and fineSTRUCTURE, is available from

  6. Inferring human population size and separation history from multiple genome sequences.

    PubMed

    Schiffels, Stephan; Durbin, Richard

    2014-08-01

    The availability of complete human genome sequences from populations across the world has given rise to new population genetic inference methods that explicitly model ancestral relationships under recombination and mutation. So far, application of these methods to evolutionary history more recent than 20,000-30,000 years ago and to population separations has been limited. Here we present a new method that overcomes these shortcomings. The multiple sequentially Markovian coalescent (MSMC) analyzes the observed pattern of mutations in multiple individuals, focusing on the first coalescence between any two individuals. Results from applying MSMC to genome sequences from nine populations across the world suggest that the genetic separation of non-African ancestors from African Yoruban ancestors started long before 50,000 years ago and give information about human population history as recent as 2,000 years ago, including the bottleneck in the peopling of the Americas and separations within Africa, East Asia and Europe.

  7. Thinking too positive? Revisiting current methods of population genetic selection inference.

    PubMed

    Bank, Claudia; Ewing, Gregory B; Ferrer-Admettla, Anna; Foll, Matthieu; Jensen, Jeffrey D

    2014-12-01

    In the age of next-generation sequencing, the availability of increasing amounts and improved quality of data at decreasing cost ought to allow for a better understanding of how natural selection is shaping the genome than ever before. However, alternative forces, such as demography and background selection (BGS), obscure the footprints of positive selection that we would like to identify. In this review, we illustrate recent developments in this area, and outline a roadmap for improved selection inference. We argue (i) that the development and obligatory use of advanced simulation tools is necessary for improved identification of selected loci, (ii) that genomic information from multiple time points will enhance the power of inference, and (iii) that results from experimental evolution should be utilized to better inform population genomic studies. Copyright © 2014 Elsevier Ltd. All rights reserved.

  8. Southeast Asian origins of five Hill Tribe populations and correlation of genetic to linguistic relationships inferred with genome-wide SNP data

    PubMed Central

    Listman, JB; Malison, RT; Sanichwankul, K; Ittiwut, C; Mutirangura, A; Gelernter, J

    2010-01-01

    In Thailand, the term Hill Tribe is used to describe populations whose members traditionally practice slash and burn agriculture and reside in the mountains. These tribes are thought to have migrated throughout Asia for up to 5,000 years, including migrations through Southern China and/or Southeast Asia. There have been continuous migrations southward from China into Thailand for approximately the past thousand years and the present geographic range of any given tribe straddles multiple political borders. As none of these populations have autochthonous scripts, written histories have, until recently, been externally produced. Northern Asian, Tibetan, and Siberian origins of Hill Tribes have been proposed. All purport endogamy and have non-mutually intelligible languages. In order to test hypotheses regarding the geographic origins of these populations, relatedness and migrations among them and neighboring populations, and whether their genetic relationships correspond with their linguistic relationships, we analyzed 2445 genome-wide SNP markers in 118 individuals from five Thai Hill Tribe populations (Akha, Hmong, Karen, Lahu, and Lisu), 90 individuals from majority Thai populations, and 826 individuals from Asian and Oceanian HGDP and HapMap populations using a Bayesian clustering method. Considering these results within the context of results of recent large-scale studies of Asian geographic genetic variation allows us to infer a shared Southeast Asian origin of these five Hill Tribe populations as well as ancestry components that distinguish among them seen in successive levels of clustering. In addition, the inferred level of shared ancestry among the Hill Tribes corresponds well to relationships among their languages. PMID:20979205

  9. Southeast Asian origins of five Hill Tribe populations and correlation of genetic to linguistic relationships inferred with genome-wide SNP data.

    PubMed

    Listman, J B; Malison, R T; Sanichwankul, K; Ittiwut, C; Mutirangura, A; Gelernter, J

    2011-02-01

    In Thailand, the term Hill Tribe is used to describe populations whose members traditionally practice slash and burn agriculture and reside in the mountains. These tribes are thought to have migrated throughout Asia for up to 5,000 years, including migrations through Southern China and/or Southeast Asia. There have been continuous migrations southward from China into Thailand for approximately the past thousand years and the present geographic range of any given tribe straddles multiple political borders. As none of these populations have autochthonous scripts, written histories have, until recently, been externally produced. Northern Asian, Tibetan, and Siberian origins of Hill Tribes have been proposed. All purport endogamy and have nonmutually intelligible languages. To test hypotheses regarding the geographic origins of these populations, relatedness and migrations among them and neighboring populations, and whether their genetic relationships correspond with their linguistic relationships, we analyzed 2,445 genome-wide SNP markers in 118 individuals from five Thai Hill Tribe populations (Akha, Hmong, Karen, Lahu, and Lisu), 90 individuals from majority Thai populations, and 826 individuals from Asian and Oceanian HGDP and HapMap populations using a Bayesian clustering method. Considering these results within the context of results of recent large-scale studies of Asian geographic genetic variation allows us to infer a shared Southeast Asian origin of these five Hill Tribe populations as well as ancestry components that distinguish among them seen in successive levels of clustering. In addition, the inferred level of shared ancestry among the Hill Tribes corresponds well to relationships among their languages. © 2010 Wiley-Liss, Inc.

  10. Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation.

    PubMed

    Cornuet, Jean-Marie; Santos, Filipe; Beaumont, Mark A; Robert, Christian P; Marin, Jean-Michel; Balding, David J; Guillemaud, Thomas; Estoup, Arnaud

    2008-12-01

    Genetic data obtained on population samples convey information about their evolutionary history. Inference methods can extract part of this information but they require sophisticated statistical techniques that have been made available to the biologist community (through computer programs) only for simple and standard situations typically involving a small number of samples. We propose here a computer program (DIY ABC) for inference based on approximate Bayesian computation (ABC), in which scenarios can be customized by the user to fit many complex situations involving any number of populations and samples. Such scenarios involve any combination of population divergences, admixtures and population size changes. DIY ABC can be used to compare competing scenarios, estimate parameters for one or more scenarios and compute bias and precision measures for a given scenario and known values of parameters (the current version applies to unlinked microsatellite data). This article describes key methods used in the program and provides its main features. The analysis of one simulated and one real dataset, both with complex evolutionary scenarios, illustrates the main possibilities of DIY ABC. The software DIY ABC is freely available at http://www.montpellier.inra.fr/CBGP/diyabc.
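
The core idea of approximate Bayesian computation mentioned in this abstract can be sketched with the basic rejection scheme: draw a parameter from the prior, simulate data, and keep the draw only if a summary statistic of the simulation lands close to the observed one. The example below is a generic toy, not DIY ABC's algorithm or interface; the Poisson model, the uniform prior, the tolerance, and all function names are assumptions made for illustration.

```python
import numpy as np

def abc_rejection(observed_stat, simulate, prior_sample, n_draws, tol, seed=0):
    """Basic ABC rejection sampler.

    Keeps each prior draw whose simulated summary statistic falls within
    `tol` of the observed one; the kept draws approximate the posterior.
    """
    rng = np.random.default_rng(seed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        if abs(simulate(theta, rng) - observed_stat) <= tol:
            accepted.append(theta)
    return np.array(accepted)

def simulate(theta, rng):
    # Toy model: 50 Poisson counts with rate theta, summarized by their mean.
    return rng.poisson(theta, size=50).mean()

def prior_sample(rng):
    # Flat prior on the rate over [0, 10].
    return rng.uniform(0.0, 10.0)

posterior = abc_rejection(observed_stat=3.0, simulate=simulate,
                          prior_sample=prior_sample, n_draws=20000, tol=0.1)
print(len(posterior), posterior.mean())
```

Only a small fraction of draws is accepted, and the accepted values concentrate around a rate of 3. Because the likelihood is never evaluated, the same loop works for scenarios where it is intractable, provided the scenario can be simulated.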

  11. Bayesian Population Genomic Inference of Crossing Over and Gene Conversion

    PubMed Central

    Padhukasahasram, Badri; Rannala, Bruce

    2011-01-01

    Meiotic recombination is a fundamental cellular mechanism in sexually reproducing organisms and its different forms, crossing over and gene conversion, both play an important role in shaping genetic variation in populations. Here, we describe a coalescent-based full-likelihood Markov chain Monte Carlo (MCMC) method for jointly estimating the crossing-over, gene-conversion, and mean tract length parameters from population genomic data under a Bayesian framework. Although computationally more expensive than methods that use approximate likelihoods, the relative efficiency of our method is expected to be optimal in theory. Furthermore, it is also possible to obtain a posterior sample of genealogies for the data using this method. We first check the performance of the new method on simulated data and verify its correctness. We also extend the method for inference under models with variable gene-conversion and crossing-over rates and demonstrate its ability to identify recombination hotspots. Then, we apply the method to two empirical data sets that were sequenced in the telomeric regions of the X chromosome of Drosophila melanogaster. Our results indicate that gene conversion occurs more frequently than crossing over in the su-w and su-s gene sequences while the local rates of crossing over as inferred by our program are not low. The mean tract lengths for gene-conversion events are estimated to be ∼70 bp and 430 bp, respectively, for these data sets. Finally, we discuss ideas and optimizations for reducing the execution time of our algorithm. PMID:21840857

  12. [Efficiency of 27-plex single nucleotide polymorphism multiplex system for ancestry inference in different populations].

    PubMed

    Feng, Xing-Ling; Sun, Qi-Fan; Liu, Hong; Wei, Yi-Liang; Du, Wei-An; Li, Cai-Xia; Chen, Ling; Liu, Chao

    2016-04-20

    To validate the efficiency of a 27-plex single nucleotide polymorphism (SNP) multiplex system for ancestry inference. The 27-plex SNP system was validated for its sensitivity and species specificity. A total of 533 samples were collected from African, Southern Chinese Han, China's ethnic minorities (Yi, Hui, Miao, Tibet, and Uygur), European, Central Asian, Western Asian, Southern Asian, Southeast Asian and South American populations for clustering analysis of the genotypes by citing 3 representative continental ancestral groups [East Asia (CHB), Europe (CEU), and Africa (YRI)] from the HapMap database. The system sensitivity is 0.125 ng. Twenty and six genotypes were detected in chimpanzee and monkeys, respectively. Except in rs10496971, no more products were found in other animals. The system was capable of differentiating intercontinental populations but not of distinguishing between East Asian and Southeast Asian populations or between the Southern Chinese Han population and Chinese ethnic populations (Hui, Miao, Yi and Tibet). This system achieved 100% accuracy for intercontinental population source inference for 46 blind test samples. The 27-plex SNP multiplex system has high sensitivity and species specificity and can correctly differentiate the ancestry origins of individuals from African, European and East Asian populations for criminal case investigation. But this system is not capable of distinguishing subpopulation groups, and more specific ancestry-informative markers are needed to improve its recognition of Southeast Asian and Chinese ethnic populations.

  13. Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation

    PubMed Central

    Cornuet, Jean-Marie; Santos, Filipe; Beaumont, Mark A.; Robert, Christian P.; Marin, Jean-Michel; Balding, David J.; Guillemaud, Thomas; Estoup, Arnaud

    2008-01-01

    Summary: Genetic data obtained on population samples convey information about their evolutionary history. Inference methods can extract part of this information but they require sophisticated statistical techniques that have been made available to the biologist community (through computer programs) only for simple and standard situations typically involving a small number of samples. We propose here a computer program (DIY ABC) for inference based on approximate Bayesian computation (ABC), in which scenarios can be customized by the user to fit many complex situations involving any number of populations and samples. Such scenarios involve any combination of population divergences, admixtures and population size changes. DIY ABC can be used to compare competing scenarios, estimate parameters for one or more scenarios and compute bias and precision measures for a given scenario and known values of parameters (the current version applies to unlinked microsatellite data). This article describes key methods used in the program and provides its main features. The analysis of one simulated and one real dataset, both with complex evolutionary scenarios, illustrates the main possibilities of DIY ABC. Availability: The software DIY ABC is freely available at http://www.montpellier.inra.fr/CBGP/diyabc. Contact: j.cornuet@imperial.ac.uk Supplementary information: Supplementary data are also available at http://www.montpellier.inra.fr/CBGP/diyabc PMID:18842597

  14. Large-scale two-photon imaging revealed super-sparse population codes in the V1 superficial layer of awake monkeys.

    PubMed

    Tang, Shiming; Zhang, Yimeng; Li, Zhihao; Li, Ming; Liu, Fang; Jiang, Hongfei; Lee, Tai Sing

    2018-04-26

    One general principle of sensory information processing is that the brain must optimize efficiency by reducing the number of neurons that process the same information. The sparseness of the sensory representations in a population of neurons reflects the efficiency of the neural code. Here, we employ large-scale two-photon calcium imaging to examine the responses of a large population of neurons within the superficial layers of area V1 with single-cell resolution, while simultaneously presenting a large set of natural visual stimuli, to provide the first direct measure of the population sparseness in awake primates. The results show that only 0.5% of neurons respond strongly to any given natural image - indicating a ten-fold increase in the inferred sparseness over previous measurements. These population activities are nevertheless necessary and sufficient to discriminate visual stimuli with high accuracy, suggesting that the neural code in the primary visual cortex is both super-sparse and highly efficient. © 2018, Tang et al.

  15. Inferring the Joint Demographic History of Multiple Populations from Multidimensional SNP Frequency Data

    PubMed Central

    Gutenkunst, Ryan N.; Hernandez, Ryan D.; Williamson, Scott H.; Bustamante, Carlos D.

    2009-01-01

    Demographic models built from genetic data play important roles in illuminating prehistorical events and serving as null models in genome scans for selection. We introduce an inference method based on the joint frequency spectrum of genetic variants within and between populations. For candidate models we numerically compute the expected spectrum using a diffusion approximation to the one-locus, two-allele Wright-Fisher process, involving up to three simultaneous populations. Our approach is a composite likelihood scheme, since linkage between neutral loci alters the variance but not the expectation of the frequency spectrum. We thus use bootstraps incorporating linkage to estimate uncertainties for parameters and significance values for hypothesis tests. Our method can also incorporate selection on single sites, predicting the joint distribution of selected alleles among populations experiencing a bevy of evolutionary forces, including expansions, contractions, migrations, and admixture. We model human expansion out of Africa and the settlement of the New World, using 5 Mb of noncoding DNA resequenced in 68 individuals from 4 populations (YRI, CHB, CEU, and MXL) by the Environmental Genome Project. We infer divergence between West African and Eurasian populations 140 thousand years ago (95% confidence interval: 40–270 kya). This is earlier than other genetic studies, in part because we incorporate migration. We estimate the European (CEU) and East Asian (CHB) divergence time to be 23 kya (95% c.i.: 17–43 kya), long after archeological evidence places modern humans in Europe. Finally, we estimate divergence between East Asians (CHB) and Mexican-Americans (MXL) of 22 kya (95% c.i.: 16.3–26.9 kya), and our analysis yields no evidence for subsequent migration. Furthermore, combining our demographic model with a previously estimated distribution of selective effects among newly arising amino acid mutations accurately predicts the frequency spectrum of

  16. Hierarchical Bayesian inference of the initial mass function in composite stellar populations

    NASA Astrophysics Data System (ADS)

    Dries, M.; Trager, S. C.; Koopmans, L. V. E.; Popping, G.; Somerville, R. S.

    2018-03-01

    The initial mass function (IMF) is a key ingredient in many studies of galaxy formation and evolution. Although the IMF is often assumed to be universal, there is continuing evidence that it is not universal. Spectroscopic studies that derive the IMF of the unresolved stellar populations of a galaxy often assume that this spectrum can be described by a single stellar population (SSP). To alleviate these limitations, in this paper we have developed a unique hierarchical Bayesian framework for modelling composite stellar populations (CSPs). Within this framework, we use a parametrized IMF prior to regulate a direct inference of the IMF. We use this new framework to determine the number of SSPs that is required to fit a set of realistic CSP mock spectra. The CSP mock spectra that we use are based on semi-analytic models and have an IMF that varies as a function of stellar velocity dispersion of the galaxy. Our results suggest that using a single SSP biases the determination of the IMF slope to a higher value than the true slope, although the trend with stellar velocity dispersion is overall recovered. If we include more SSPs in the fit, the Bayesian evidence increases significantly and the inferred IMF slopes of our mock spectra converge, within the errors, to their true values. Most of the bias is already removed by using two SSPs instead of one. We show that we can reconstruct the variable IMF of our mock spectra for signal-to-noise ratios exceeding ∼75.

  17. Hierarchical mark-recapture models: a framework for inference about demographic processes

    USGS Publications Warehouse

    Link, W.A.; Barker, R.J.

    2004-01-01

    The development of sophisticated mark-recapture models over the last four decades has provided fundamental tools for the study of wildlife populations, allowing reliable inference about population sizes and demographic rates based on clearly formulated models for the sampling processes. Mark-recapture models are now routinely described by large numbers of parameters. These large models provide the next challenge to wildlife modelers: the extraction of signal from noise in large collections of parameters. Pattern among parameters can be described by strong, deterministic relations (as in ultrastructural models) but is more flexibly and credibly modeled using weaker, stochastic relations. Trend in survival rates is not likely to be manifest by a sequence of values falling precisely on a given parametric curve; rather, if we could somehow know the true values, we might anticipate a regression relation between parameters and explanatory variables, in which true value equals signal plus noise. Hierarchical models provide a useful framework for inference about collections of related parameters. Instead of regarding parameters as fixed but unknown quantities, we regard them as realizations of stochastic processes governed by hyperparameters. Inference about demographic processes is based on investigation of these hyperparameters. We advocate the Bayesian paradigm as a natural, mathematically and scientifically sound basis for inference about hierarchical models. We describe analysis of capture-recapture data from an open population based on hierarchical extensions of the Cormack-Jolly-Seber model. In addition to recaptures of marked animals, we model first captures of animals and losses on capture, and are thus able to estimate survival probabilities w (i.e., the complement of death or permanent emigration) and per capita growth rates f (i.e., the sum of recruitment and immigration rates). Covariation in these rates, a feature of demographic interest, is explicitly

  18. Inferring population-level contact heterogeneity from common epidemic data

    PubMed Central

    Stack, J. Conrad; Bansal, Shweta; Kumar, V. S. Anil; Grenfell, Bryan

    2013-01-01

    Models of infectious disease spread that incorporate contact heterogeneity through contact networks are an important tool for epidemiologists studying disease dynamics and assessing intervention strategies. One of the challenges of contact network epidemiology has been the difficulty of collecting individual and population-level data needed to develop an accurate representation of the underlying host population's contact structure. In this study, we evaluate the utility of common epidemiological measures (R0, epidemic peak size, duration and final size) for inferring the degree of heterogeneity in a population's unobserved contact structure through a Bayesian approach. We test the method using ground truth data and find that some of these epidemiological metrics are effective at classifying contact heterogeneity. The classification is also consistent across pathogen transmission probabilities, and so can be applied even when this characteristic is unknown. In particular, the reproductive number, R0, turns out to be a poor classifier of the degree heterogeneity, while, unexpectedly, final epidemic size is a powerful predictor of network structure across the range of heterogeneity. We also evaluate our framework on empirical epidemiological data from past and recent outbreaks to demonstrate its application in practice and to gather insights about the relevance of particular contact structures for both specific systems and general classes of infectious disease. We thus introduce a simple approach that can shed light on the unobserved connectivity of a host population given epidemic data. Our study has the potential to inform future data-collection efforts and study design by driving our understanding of germane epidemic measures, and highlights a general inferential approach to learning about host contact structure in contemporary or historic populations of humans and animals. PMID:23034353

  19. Scabies in residential care homes: Modelling, inference and interventions for well-connected population sub-units

    PubMed Central

    Middleton, Jo; Güttel, Stefan; Cassell, Jackie; Ross, Joshua

    2018-01-01

    In the context of an ageing population, understanding the transmission of infectious diseases such as scabies through well-connected sub-units of the population, such as residential care homes, is particularly important for the design of efficient interventions to mitigate against the effects of those diseases. Here, we present a modelling methodology based on the efficient solution of a large-scale system of linear differential equations that allows statistical calibration of individual-based random models to real data on scabies in residential care homes. In particular, we review and benchmark different numerical methods for the integration of the differential equation system, and then select the most appropriate of these methods to perform inference using Markov chain Monte Carlo. We test the goodness-of-fit of this model using posterior predictive intervals and propagate forward the resulting parameter uncertainty in a Bayesian framework to consider the economic cost of delayed interventions against scabies, quantifying the benefits of prompt action in the event of detection. We also revisit the previous methodology used to assess the safety of treatments in small population sub-units—in this context ivermectin—and demonstrate that even a very slight relaxation of the implicit assumption of homogeneous death rates significantly increases the plausibility of the hypothesis that ivermectin does not cause excess mortality based upon the data of Barkwell and Shields. PMID:29579037

  20. Geographic population structure analysis of worldwide human populations infers their biogeographical origins

    PubMed Central

    Elhaik, Eran; Tatarinova, Tatiana; Chebotarev, Dmitri; Piras, Ignazio S.; Maria Calò, Carla; De Montis, Antonella; Atzori, Manuela; Marini, Monica; Tofanelli, Sergio; Francalacci, Paolo; Pagani, Luca; Tyler-Smith, Chris; Xue, Yali; Cucca, Francesco; Schurr, Theodore G.; Gaieski, Jill B.; Melendez, Carlalynne; Vilar, Miguel G.; Owings, Amanda C.; Gómez, Rocío; Fujita, Ricardo; Santos, Fabrício R.; Comas, David; Balanovsky, Oleg; Balanovska, Elena; Zalloua, Pierre; Soodyall, Himla; Pitchappan, Ramasamy; GaneshPrasad, ArunKumar; Hammer, Michael; Matisoo-Smith, Lisa; Wells, R. Spencer; Acosta, Oscar; Adhikarla, Syama; Adler, Christina J.; Bertranpetit, Jaume; Clarke, Andrew C.; Cooper, Alan; Der Sarkissian, Clio S. I.; Haak, Wolfgang; Haber, Marc; Jin, Li; Kaplan, Matthew E.; Li, Hui; Li, Shilin; Martínez-Cruz, Begoña; Merchant, Nirav C.; Mitchell, John R.; Parida, Laxmi; Platt, Daniel E.; Quintana-Murci, Lluis; Renfrew, Colin; Lacerda, Daniela R.; Royyuru, Ajay K.; Sandoval, Jose Raul; Santhakumari, Arun Varatharajan; Soria Hernanz, David F.; Swamikrishnan, Pandikumar; Ziegle, Janet S.

    2014-01-01

    The search for a method that utilizes biological information to predict humans’ place of origin has occupied scientists for millennia. Over the past four decades, scientists have employed genetic data in an effort to achieve this goal but with limited success. While biogeographical algorithms using next-generation sequencing data have achieved an accuracy of 700 km in Europe, they were inaccurate elsewhere. Here we describe the Geographic Population Structure (GPS) algorithm and demonstrate its accuracy with three data sets using 40,000–130,000 SNPs. GPS placed 83% of worldwide individuals in their country of origin. Applied to over 200 Sardinians villagers, GPS placed a quarter of them in their villages and most of the rest within 50 km of their villages. GPS’s accuracy and power to infer the biogeography of worldwide individuals down to their country or, in some cases, village, of origin, underscores the promise of admixture-based methods for biogeography and has ramifications for genetic ancestry testing. PMID:24781250

  1. Multi-agent based control of large-scale complex systems employing distributed dynamic inference engine

    NASA Astrophysics Data System (ADS)

    Zhang, Daili

    Increasing societal demand for automation has led to considerable efforts to control large-scale complex systems, especially in the area of autonomous intelligent control methods. The control system of a large-scale complex system needs to satisfy four system level requirements: robustness, flexibility, reusability, and scalability. Corresponding to the four system level requirements, there arise four major challenges. First, it is difficult to get accurate and complete information. Second, the system may be physically highly distributed. Third, the system evolves very quickly. Fourth, emergent global behaviors of the system can be caused by small disturbances at the component level. The Multi-Agent Based Control (MABC) method as an implementation of distributed intelligent control has been the focus of research since the 1970s, in an effort to solve the above-mentioned problems in controlling large-scale complex systems. However, to the author's best knowledge, all MABC systems for large-scale complex systems with significant uncertainties are problem-specific and thus difficult to extend to other domains or larger systems. This situation is partly due to the control architecture of multiple agents being determined by agent to agent coupling and interaction mechanisms. Therefore, the research objective of this dissertation is to develop a comprehensive, generalized framework for the control system design of general large-scale complex systems with significant uncertainties, with the focus on distributed control architecture design and distributed inference engine design. A Hybrid Multi-Agent Based Control (HyMABC) architecture is proposed by combining hierarchical control architecture and module control architecture with logical replication rings. 
    First, it decomposes a complex system hierarchically; second, it combines the components in the same level as a module, and then designs common interfaces for all of the components in the same module; third, replications

  2. The role of familiarity in binary choice inferences.

    PubMed

    Honda, Hidehito; Abe, Keiga; Matsuka, Toshihiko; Yamagishi, Kimihiko

    2011-07-01

    In research on the recognition heuristic (Goldstein & Gigerenzer, Psychological Review, 109, 75-90, 2002), knowledge of recognized objects has been categorized as "recognized" or "unrecognized" without regard to the degree of familiarity of the recognized object. In the present article, we propose a new inference model--familiarity-based inference. We hypothesize that when subjective knowledge levels (familiarity) of recognized objects differ, the degree of familiarity of recognized objects will influence inferences. Specifically, people are predicted to infer that the more familiar object in a pair of two objects has a higher criterion value on the to-be-judged dimension. In two experiments, using a binary choice task, we examined inferences about populations in a pair of two cities. Results support predictions of familiarity-based inference. Participants inferred that the more familiar city in a pair was more populous. Statistical modeling showed that individual differences in familiarity-based inference lie in the sensitivity to differences in familiarity. In addition, we found that familiarity-based inference can be generally regarded as an ecologically rational inference. Furthermore, when cue knowledge about the inference criterion was available, participants made inferences based on the cue knowledge about population instead of familiarity. Implications of the role of familiarity in psychological processes are discussed.

  3. Unusually large earthquakes inferred from tsunami deposits along the Kuril trench

    USGS Publications Warehouse

    Nanayama, F.; Satake, K.; Furukawa, R.; Shimokawa, K.; Atwater, B.F.; Shigeno, K.; Yamaki, S.

    2003-01-01

    The Pacific plate converges with northeastern Eurasia at a rate of 8-9 m per century along the Kamchatka, Kuril and Japan trenches. Along the southern Kuril trench, which faces the Japanese island of Hokkaido, this fast subduction has recurrently generated earthquakes with magnitudes of up to ~8 over the past two centuries. These historical events, on rupture segments 100-200 km long, have been considered characteristic of Hokkaido's plate-boundary earthquakes. But here we use deposits of prehistoric tsunamis to infer the infrequent occurrence of larger earthquakes generated from longer ruptures. Many of these tsunami deposits form sheets of sand that extend kilometres inland from the deposits of historical tsunamis. Stratigraphic series of extensive sand sheets, intercalated with dated volcanic-ash layers, show that such unusually large tsunamis occurred about every 500 years on average over the past 2,000-7,000 years, most recently ~350 years ago. Numerical simulations of these tsunamis are best explained by earthquakes that individually rupture multiple segments along the southern Kuril trench. We infer that such multi-segment earthquakes persistently recur among a larger number of single-segment events.

  4. Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment.

    PubMed

    Lee, Wei-Po; Hsiao, Yu-Ting; Hwang, Wei-Che

    2014-01-16

    To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. 
    By coupling the parallel model population-based optimization method and the parallel computational framework, high

  5. Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment

    PubMed Central

    2014-01-01

    Background To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. Results This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Conclusions Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. 
    By coupling the parallel model population-based optimization method and the parallel

  6. Spatially explicit inference for open populations: estimating demographic parameters from camera-trap studies

    USGS Publications Warehouse

    Gardner, Beth; Reppucci, Juan; Lucherini, Mauro; Royle, J. Andrew

    2010-01-01

    We develop a hierarchical capture–recapture model for demographically open populations when auxiliary spatial information about location of capture is obtained. Such spatial capture–recapture data arise from studies based on camera trapping, DNA sampling, and other situations in which a spatial array of devices records encounters of unique individuals. We integrate an individual-based formulation of a Jolly-Seber type model with recently developed spatially explicit capture–recapture models to estimate density and demographic parameters for survival and recruitment. We adopt a Bayesian framework for inference under this model using the method of data augmentation which is implemented in the software program WinBUGS. The model was motivated by a camera trapping study of Pampas cats Leopardus colocolo from Argentina, which we present as an illustration of the model in this paper. We provide estimates of density and the first quantitative assessment of vital rates for the Pampas cat in the High Andes. The precision of these estimates is poor due likely to the sparse data set. Unlike conventional inference methods which usually rely on asymptotic arguments, Bayesian inferences are valid in arbitrary sample sizes, and thus the method is ideal for the study of rare or endangered species for which small data sets are typical.

  7. Spatially explicit inference for open populations: estimating demographic parameters from camera-trap studies.

    PubMed

    Gardner, Beth; Reppucci, Juan; Lucherini, Mauro; Royle, J Andrew

    2010-11-01

    We develop a hierarchical capture-recapture model for demographically open populations when auxiliary spatial information about location of capture is obtained. Such spatial capture-recapture data arise from studies based on camera trapping, DNA sampling, and other situations in which a spatial array of devices records encounters of unique individuals. We integrate an individual-based formulation of a Jolly-Seber type model with recently developed spatially explicit capture-recapture models to estimate density and demographic parameters for survival and recruitment. We adopt a Bayesian framework for inference under this model using the method of data augmentation which is implemented in the software program WinBUGS. The model was motivated by a camera trapping study of Pampas cats Leopardus colocolo from Argentina, which we present as an illustration of the model in this paper. We provide estimates of density and the first quantitative assessment of vital rates for the Pampas cat in the High Andes. The precision of these estimates is poor due likely to the sparse data set. Unlike conventional inference methods which usually rely on asymptotic arguments, Bayesian inferences are valid in arbitrary sample sizes, and thus the method is ideal for the study of rare or endangered species for which small data sets are typical.

  8. Population genetic inference from personal genome data: impact of ancestry and admixture on human genomic variation.

    PubMed

    Kidd, Jeffrey M; Gravel, Simon; Byrnes, Jake; Moreno-Estrada, Andres; Musharoff, Shaila; Bryc, Katarzyna; Degenhardt, Jeremiah D; Brisbin, Abra; Sheth, Vrunda; Chen, Rong; McLaughlin, Stephen F; Peckham, Heather E; Omberg, Larsson; Bormann Chung, Christina A; Stanley, Sarah; Pearlstein, Kevin; Levandowsky, Elizabeth; Acevedo-Acevedo, Suehelay; Auton, Adam; Keinan, Alon; Acuña-Alonzo, Victor; Barquera-Lozano, Rodrigo; Canizales-Quinteros, Samuel; Eng, Celeste; Burchard, Esteban G; Russell, Archie; Reynolds, Andy; Clark, Andrew G; Reese, Martin G; Lincoln, Stephen E; Butte, Atul J; De La Vega, Francisco M; Bustamante, Carlos D

    2012-10-05

    Full sequencing of individual human genomes has greatly expanded our understanding of human genetic variation and population history. Here, we present a systematic analysis of 50 human genomes from 11 diverse global populations sequenced at high coverage. Our sample includes 12 individuals who have admixed ancestry and who have varying degrees of recent (within the last 500 years) African, Native American, and European ancestry. We found over 21 million single-nucleotide variants that contribute to a 1.75-fold range in nucleotide heterozygosity across diverse human genomes. This heterozygosity ranged from a high of one heterozygous site per kilobase in west African genomes to a low of 0.57 heterozygous sites per kilobase in segments inferred to have diploid Native American ancestry from the genomes of Mexican and Puerto Rican individuals. We show evidence of all three continental ancestries in the genomes of Mexican, Puerto Rican, and African American populations, and the genome-wide statistics are highly consistent across individuals from a population once ancestry proportions have been accounted for. Using a generalized linear model, we identified subtle variations across populations in the proportion of neutral versus deleterious variation and found that genome-wide statistics vary in admixed populations even once ancestry proportions have been factored in. We further infer that multiple periods of gene flow shaped the diversity of admixed populations in the Americas: 70% of the European ancestry in today's African Americans dates back to European gene flow happening only 7-8 generations ago. Copyright © 2012 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  9. Population Genetic Inference from Personal Genome Data: Impact of Ancestry and Admixture on Human Genomic Variation

    PubMed Central

    Kidd, Jeffrey M.; Gravel, Simon; Byrnes, Jake; Moreno-Estrada, Andres; Musharoff, Shaila; Bryc, Katarzyna; Degenhardt, Jeremiah D.; Brisbin, Abra; Sheth, Vrunda; Chen, Rong; McLaughlin, Stephen F.; Peckham, Heather E.; Omberg, Larsson; Bormann Chung, Christina A.; Stanley, Sarah; Pearlstein, Kevin; Levandowsky, Elizabeth; Acevedo-Acevedo, Suehelay; Auton, Adam; Keinan, Alon; Acuña-Alonzo, Victor; Barquera-Lozano, Rodrigo; Canizales-Quinteros, Samuel; Eng, Celeste; Burchard, Esteban G.; Russell, Archie; Reynolds, Andy; Clark, Andrew G.; Reese, Martin G.; Lincoln, Stephen E.; Butte, Atul J.; De La Vega, Francisco M.; Bustamante, Carlos D.

    2012-01-01

    Full sequencing of individual human genomes has greatly expanded our understanding of human genetic variation and population history. Here, we present a systematic analysis of 50 human genomes from 11 diverse global populations sequenced at high coverage. Our sample includes 12 individuals who have admixed ancestry and who have varying degrees of recent (within the last 500 years) African, Native American, and European ancestry. We found over 21 million single-nucleotide variants that contribute to a 1.75-fold range in nucleotide heterozygosity across diverse human genomes. This heterozygosity ranged from a high of one heterozygous site per kilobase in west African genomes to a low of 0.57 heterozygous sites per kilobase in segments inferred to have diploid Native American ancestry from the genomes of Mexican and Puerto Rican individuals. We show evidence of all three continental ancestries in the genomes of Mexican, Puerto Rican, and African American populations, and the genome-wide statistics are highly consistent across individuals from a population once ancestry proportions have been accounted for. Using a generalized linear model, we identified subtle variations across populations in the proportion of neutral versus deleterious variation and found that genome-wide statistics vary in admixed populations even once ancestry proportions have been factored in. We further infer that multiple periods of gene flow shaped the diversity of admixed populations in the Americas—70% of the European ancestry in today’s African Americans dates back to European gene flow happening only 7–8 generations ago. PMID:23040495

  10. Inferred vs Realized Patterns of Gene Flow: An Analysis of Population Structure in the Andros Island Rock Iguana

    PubMed Central

    Colosimo, Giuliano; Knapp, Charles R.; Wallace, Lisa E.; Welch, Mark E.

    2014-01-01

    Ecological data, the primary source of information on patterns and rates of migration, can be integrated with genetic data to more accurately describe the realized connectivity between geographically isolated demes. In this paper we implement this approach and discuss its implications for managing populations of the endangered Andros Island Rock Iguana, Cyclura cychlura cychlura. This iguana is endemic to Andros, a highly fragmented landmass of large islands and smaller cays. Field observations suggest that geographically isolated demes were panmictic due to high, inferred rates of gene flow. We expand on these observations using 16 polymorphic microsatellites to investigate the genetic structure and rates of gene flow from 188 Andros Iguanas collected across 23 island sites. Bayesian clustering of specimens assigned individuals to three distinct genotypic clusters. An analysis of molecular variance (AMOVA) indicates that allele frequency differences are responsible for a significant portion of the genetic variance across the three defined clusters (Fst = 0.117, p < 0.01). These clusters are associated with larger islands and satellite cays isolated by broad water channels with strong currents. These findings imply that broad water channels present greater obstacles to gene flow than was inferred from field observation alone. Additionally, rates of gene flow were indirectly estimated using BAYESASS 3.0. The proportion of individuals originating from within each identified cluster varied from 94.5 to 98.7%, providing further support for local isolation. Our assessment reveals a major disparity between inferred and realized gene flow. We discuss our results in a conservation perspective for species inhabiting highly fragmented landscapes. PMID:25229344
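As a rough illustration of what a fixation index such as the Fst reported above measures, the sketch below computes Wright's F_ST for a single biallelic locus from per-deme allele frequencies. This is a deliberate simplification and not the AMOVA procedure the authors applied to multi-allelic microsatellites; the function name `wright_fst` and the example frequencies are hypothetical.

```python
def wright_fst(freqs, sizes):
    """Wright's F_ST for one biallelic locus (illustrative only, not AMOVA).

    freqs: allele frequency of one allele in each deme.
    sizes: sample size of each deme, used as weights.
    """
    total = sum(sizes)
    weights = [n / total for n in sizes]
    # Pooled allele frequency across all demes.
    p_bar = sum(w * p for w, p in zip(weights, freqs))
    # Expected heterozygosity of the pooled population.
    h_t = 2 * p_bar * (1 - p_bar)
    # Weighted mean expected heterozygosity within demes.
    h_s = sum(w * 2 * p * (1 - p) for w, p in zip(weights, freqs))
    if h_t == 0:
        raise ValueError("locus is monomorphic; F_ST is undefined")
    return (h_t - h_s) / h_t


# Hypothetical frequencies for three equally sampled clusters.
print(round(wright_fst([0.2, 0.5, 0.8], [50, 50, 50]), 3))
```

Values near 0 indicate that demes share similar allele frequencies, while larger values indicate that more of the total variation lies between demes.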

  11. How Large Asexual Populations Adapt

    NASA Astrophysics Data System (ADS)

    Desai, Michael

    2007-03-01

    We often think of beneficial mutations as being rare, and of adaptation as a sequence of selected substitutions: a beneficial mutation occurs, spreads through a population in a selective sweep, then later another beneficial mutation occurs, and so on. This simple picture is the basis for much of our intuition about adaptive evolution, and underlies a number of practical techniques for analyzing sequence data. Yet many large and mostly asexual populations -- including a wide variety of unicellular organisms and viruses -- live in a very different world. In these populations, beneficial mutations are common, and frequently interfere or cooperate with one another as they all attempt to sweep simultaneously. This radically changes the way these populations adapt: rather than an orderly sequence of selective sweeps, evolution is a constant swarm of competing and interfering mutations. I will describe some aspects of these dynamics, including why large asexual populations cannot evolve very quickly and the character of the diversity they maintain. I will explain how this changes our expectations of sequence data, how sex can help a population adapt, and the potential role of ``mutator'' phenotypes with abnormally high mutation rates. Finally, I will discuss comparisons of these predictions with evolution experiments in laboratory yeast populations.

  12. Inference about density and temporary emigration in unmarked populations

    USGS Publications Warehouse

    Chandler, Richard B.; Royle, J. Andrew; King, David I.

    2011-01-01

    Few species are distributed uniformly in space, and populations of mobile organisms are rarely closed with respect to movement, yet many models of density rely upon these assumptions. We present a hierarchical model allowing inference about the density of unmarked populations subject to temporary emigration and imperfect detection. The model can be fit to data collected using a variety of standard survey methods such as repeated point counts in which removal sampling, double-observer sampling, or distance sampling is used during each count. Simulation studies demonstrated that parameter estimators are unbiased when temporary emigration is either "completely random" or is determined by the size and location of home ranges relative to survey points. We also applied the model to repeated removal sampling data collected on Chestnut-sided Warblers (Dendroica pensylvanica) in the White Mountain National Forest, USA. The density estimate from our model, 1.09 birds/ha, was similar to an estimate of 1.11 birds/ha produced by an intensive spot-mapping effort. Our model is also applicable when processes other than temporary emigration affect the probability of being available for detection, such as in studies using cue counts. Functions to implement the model have been added to the R package unmarked.
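To make the removal-sampling component mentioned above more concrete, the following minimal sketch computes the standard multinomial cell probabilities for a removal protocol with a constant per-pass detection probability. It assumes a closed group of available individuals and omits the abundance and temporary-emigration layers of the hierarchical model; the function name `removal_cell_probs` is hypothetical and is not part of the R package unmarked.

```python
def removal_cell_probs(p, n_passes):
    """Cell probabilities for removal sampling with detection probability p.

    Returns a list of length n_passes + 1: entry j is the probability that an
    available individual is first detected (and removed) on pass j + 1, and
    the final entry is the probability that it is never detected.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # Missed on the first j passes, then detected on the next one.
    probs = [p * (1 - p) ** j for j in range(n_passes)]
    # Missed on every pass.
    probs.append((1 - p) ** n_passes)
    return probs


print(removal_cell_probs(0.5, 3))
```

In a full model these probabilities would parameterize a multinomial likelihood for the observed removal counts, with the number of available individuals treated as a latent variable.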

  13. Low-Pass Genome-Wide Sequencing and Variant Inference Using Identity-by-Descent in an Isolated Human Population

    PubMed Central

    Gusev, A.; Shah, M. J.; Kenny, E. E.; Ramachandran, A.; Lowe, J. K.; Salit, J.; Lee, C. C.; Levandowsky, E. C.; Weaver, T. N.; Doan, Q. C.; Peckham, H. E.; McLaughlin, S. F.; Lyons, M. R.; Sheth, V. N.; Stoffel, M.; De La Vega, F. M.; Friedman, J. M.; Breslow, J. L.

    2012-01-01

    Whole-genome sequencing in an isolated population with few founders directly ascertains variants from the population bottleneck that may be rare elsewhere. In such populations, shared haplotypes allow imputation of variants in unsequenced samples without resorting to complex statistical methods as in studies of outbred cohorts. We focus on an isolated population cohort from the Pacific Island of Kosrae, Micronesia, where we previously collected SNP array and rich phenotype data for the majority of the population. We report identification of long regions with haplotypes co-inherited between pairs of individuals and methodology to leverage such shared genetic content for imputation. Our estimates show that sequencing as few as 40 personal genomes allows for inference in up to 60% of the 3000-person cohort at the average locus. We ascertained a pilot data set of whole-genome sequences from seven Kosraean individuals, with average 5× coverage. This assay identified 5,735,306 unique sites of which 1,212,831 were previously unknown. Additionally, these variants are unusually enriched for alleles that are rare in other populations when compared to geographic neighbors (published Korean genome SJK). We used the presence of shared haplotypes between the seven Kosraean individuals to estimate expected imputation accuracy of known and novel homozygous variants at 99.6% and 97.3%, respectively. This study presents whole-genome analysis of a homogenous isolate population with emphasis on optimal rare variant inference. PMID:22135348

  14. HIERARCHICAL PROBABILISTIC INFERENCE OF COSMIC SHEAR

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schneider, Michael D.; Dawson, William A.; Hogg, David W.

    2015-07-01

    Point estimators for the shearing of galaxy images induced by gravitational lensing involve a complex inverse problem in the presence of noise, pixelization, and model uncertainties. We present a probabilistic forward modeling approach to gravitational lensing inference that has the potential to mitigate the biased inferences in most common point estimators and is practical for upcoming lensing surveys. The first part of our statistical framework requires specification of a likelihood function for the pixel data in an imaging survey given parameterized models for the galaxies in the images. We derive the lensing shear posterior by marginalizing over all intrinsic galaxy properties that contribute to the pixel data (i.e., not limited to galaxy ellipticities) and learn the distributions for the intrinsic galaxy properties via hierarchical inference with a suitably flexible conditional probability distribution specification. We use importance sampling to separate the modeling of small imaging areas from the global shear inference, thereby rendering our algorithm computationally tractable for large surveys. With simple numerical examples we demonstrate the improvements in accuracy from our importance sampling approach, as well as the significance of the conditional distribution specification for the intrinsic galaxy properties when the data are generated from an unknown number of distinct galaxy populations with different morphological characteristics.

  15. Hierarchical Probabilistic Inference of Cosmic Shear

    NASA Astrophysics Data System (ADS)

    Schneider, Michael D.; Hogg, David W.; Marshall, Philip J.; Dawson, William A.; Meyers, Joshua; Bard, Deborah J.; Lang, Dustin

    2015-07-01

    Point estimators for the shearing of galaxy images induced by gravitational lensing involve a complex inverse problem in the presence of noise, pixelization, and model uncertainties. We present a probabilistic forward modeling approach to gravitational lensing inference that has the potential to mitigate the biased inferences in most common point estimators and is practical for upcoming lensing surveys. The first part of our statistical framework requires specification of a likelihood function for the pixel data in an imaging survey given parameterized models for the galaxies in the images. We derive the lensing shear posterior by marginalizing over all intrinsic galaxy properties that contribute to the pixel data (i.e., not limited to galaxy ellipticities) and learn the distributions for the intrinsic galaxy properties via hierarchical inference with a suitably flexible conditional probability distribution specification. We use importance sampling to separate the modeling of small imaging areas from the global shear inference, thereby rendering our algorithm computationally tractable for large surveys. With simple numerical examples we demonstrate the improvements in accuracy from our importance sampling approach, as well as the significance of the conditional distribution specification for the intrinsic galaxy properties when the data are generated from an unknown number of distinct galaxy populations with different morphological characteristics.

  16. A Bayesian random effects discrete-choice model for resource selection: Population-level selection inference

    USGS Publications Warehouse

    Thomas, D.L.; Johnson, D.; Griffith, B.

    2006-01-01

    Modeling the probability of use of land units characterized by discrete and continuous measures, we present a Bayesian random-effects model to assess resource selection. This model provides simultaneous estimation of both individual- and population-level selection. Deviance information criterion (DIC), a Bayesian alternative to AIC that is sample-size specific, is used for model selection. Aerial radiolocation data from 76 adult female caribou (Rangifer tarandus) and calf pairs during 1 year on an Arctic coastal plain calving ground were used to illustrate models and assess population-level selection of landscape attributes, as well as individual heterogeneity of selection. Landscape attributes included elevation, NDVI (a measure of forage greenness), and land cover-type classification. Results from the first of a 2-stage model-selection procedure indicated that there is substantial heterogeneity among cow-calf pairs with respect to selection of the landscape attributes. In the second stage, selection of models with heterogeneity included indicated that at the population-level, NDVI and land cover class were significant attributes for selection of different landscapes by pairs on the calving ground. Population-level selection coefficients indicate that the pairs generally select landscapes with higher levels of NDVI, but the relationship is quadratic. The highest rate of selection occurs at values of NDVI less than the maximum observed. Results for land cover-class selection coefficients indicate that wet sedge, moist sedge, herbaceous tussock tundra, and shrub tussock tundra are selected at approximately the same rate, while alpine and sparsely vegetated landscapes are selected at a lower rate. Furthermore, the variability in selection by individual caribou for moist sedge and sparsely vegetated landscapes is large relative to the variability in selection of other land cover types. The example analysis illustrates that, while sometimes computationally intense, a

  17. Inferring individual-level processes from population-level patterns in cultural evolution.

    PubMed

    Kandler, Anne; Wilder, Bryan; Fortunato, Laura

    2017-09-01

    Our species is characterized by a great degree of cultural variation, both within and between populations. Understanding how group-level patterns of culture emerge from individual-level behaviour is a long-standing question in the biological and social sciences. We develop a simulation model capturing demographic and cultural dynamics relevant to human cultural evolution, focusing on the interface between population-level patterns and individual-level processes. The model tracks the distribution of variants of cultural traits across individuals in a population over time, conditioned on different pathways for the transmission of information between individuals. From these data, we obtain theoretical expectations for a range of statistics commonly used to capture population-level characteristics (e.g. the degree of cultural diversity). Consistent with previous theoretical work, our results show that the patterns observed at the level of groups are rooted in the interplay between the transmission pathways and the age structure of the population. We also explore whether, and under what conditions, the different pathways can be distinguished based on their group-level signatures, in an effort to establish theoretical limits to inference. Our results show that the temporal dynamic of cultural change over time retains a stronger signature than the cultural composition of the population at a specific point in time. Overall, the results suggest a shift in focus from identifying the one individual-level process that likely produced the observed data to excluding those that likely did not. We conclude by discussing the implications for empirical studies of human cultural evolution.

  18. Inferring individual-level processes from population-level patterns in cultural evolution

    PubMed Central

    Wilder, Bryan

    2017-01-01

    Our species is characterized by a great degree of cultural variation, both within and between populations. Understanding how group-level patterns of culture emerge from individual-level behaviour is a long-standing question in the biological and social sciences. We develop a simulation model capturing demographic and cultural dynamics relevant to human cultural evolution, focusing on the interface between population-level patterns and individual-level processes. The model tracks the distribution of variants of cultural traits across individuals in a population over time, conditioned on different pathways for the transmission of information between individuals. From these data, we obtain theoretical expectations for a range of statistics commonly used to capture population-level characteristics (e.g. the degree of cultural diversity). Consistent with previous theoretical work, our results show that the patterns observed at the level of groups are rooted in the interplay between the transmission pathways and the age structure of the population. We also explore whether, and under what conditions, the different pathways can be distinguished based on their group-level signatures, in an effort to establish theoretical limits to inference. Our results show that the temporal dynamic of cultural change over time retains a stronger signature than the cultural composition of the population at a specific point in time. Overall, the results suggest a shift in focus from identifying the one individual-level process that likely produced the observed data to excluding those that likely did not. We conclude by discussing the implications for empirical studies of human cultural evolution. PMID:28989786

  19. Large Magellanic Cloud Planetary Nebula Morphology: Probing Stellar Populations and Evolution.

    PubMed

    Stanghellini; Shaw; Balick; Blades

    2000-05-10

    Planetary nebulae (PNe) in the Large Magellanic Cloud (LMC) offer the unique opportunity to study both the population and evolution of low- and intermediate-mass stars, by means of the morphological type of the nebula. Using observations from our LMC PN morphological survey, and including images available in the Hubble Space Telescope Data Archive and published chemical abundances, we find that asymmetry in PNe is strongly correlated with a younger stellar population, as indicated by the abundance of elements that are unaltered by stellar evolution (Ne, Ar, and S). While similar results have been obtained for Galactic PNe, this is the first demonstration of the relationship for extragalactic PNe. We also examine the relation between morphology and abundance of the products of stellar evolution. We found that asymmetric PNe have higher nitrogen and lower carbon abundances than symmetric PNe. Our two main results are broadly consistent with the predictions of stellar evolution if the progenitors of asymmetric PNe have on average larger masses than the progenitors of symmetric PNe. The results bear on the question of formation mechanisms for asymmetric PNe-specifically, that the genesis of PNe structure should relate strongly to the population type, and by inference the mass, of the progenitor star and less strongly on whether the central star is a member of a close binary system.

  20. Velocity-based movement modeling for individual and population level inference

    USGS Publications Warehouse

    Hanks, Ephraim M.; Hooten, Mevin B.; Johnson, Devin S.; Sterling, Jeremy T.

    2011-01-01

    Understanding animal movement and resource selection provides important information about the ecology of the animal, but an animal's movement and behavior are not typically constant in time. We present a velocity-based approach for modeling animal movement in space and time that allows for temporal heterogeneity in an animal's response to the environment, allows for temporal irregularity in telemetry data, and accounts for the uncertainty in the location information. Population-level inference on movement patterns and resource selection can then be made through cluster analysis of the parameters related to movement and behavior. We illustrate this approach through a study of northern fur seal (Callorhinus ursinus) movement in the Bering Sea, Alaska, USA. Results show sex differentiation, with female northern fur seals exhibiting stronger response to environmental variables.

  1. Velocity-Based Movement Modeling for Individual and Population Level Inference

    PubMed Central

    Hanks, Ephraim M.; Hooten, Mevin B.; Johnson, Devin S.; Sterling, Jeremy T.

    2011-01-01

    Understanding animal movement and resource selection provides important information about the ecology of the animal, but an animal's movement and behavior are not typically constant in time. We present a velocity-based approach for modeling animal movement in space and time that allows for temporal heterogeneity in an animal's response to the environment, allows for temporal irregularity in telemetry data, and accounts for the uncertainty in the location information. Population-level inference on movement patterns and resource selection can then be made through cluster analysis of the parameters related to movement and behavior. We illustrate this approach through a study of northern fur seal (Callorhinus ursinus) movement in the Bering Sea, Alaska, USA. Results show sex differentiation, with female northern fur seals exhibiting stronger response to environmental variables. PMID:21931584

  2. Causal inference between bioavailability of heavy metals and environmental factors in a large-scale region.

    PubMed

    Liu, Yuqiong; Du, Qingyun; Wang, Qi; Yu, Huanyun; Liu, Jianfeng; Tian, Yu; Chang, Chunying; Lei, Jing

    2017-07-01

    Causal relationships between the bioavailability of heavy metals and environmental factors are currently derived mostly from field experiments at local scales, and lack sufficient evidence at large scales. However, inferring causation between the bioavailability of heavy metals and environmental factors across large-scale regions is challenging, because the conventional correlation-based approaches used for causation assessments across large-scale regions can yield spurious insights at the expense of actual causation. In this study, a general framework, Intervention calculus when the directed acyclic graph (DAG) is absent (IDA) combined with the backdoor criterion (BC), was introduced to identify causation between the bioavailability of heavy metals and the potential environmental factors across large-scale regions. We take the Pearl River Delta (PRD) in China as a case study. The causal structures and effects were identified based on the concentrations of heavy metals (Zn, As, Cu, Hg, Pb, Cr, Ni and Cd) in soil (0-20 cm depth) and vegetable (lettuce) and 40 environmental factors (soil properties, extractable heavy metals and weathering indices) in 94 samples across the PRD. Results show that the bioavailability of heavy metals (Cd, Zn, Cr, Ni and As) was causally influenced by soil properties and soil weathering factors, whereas no causal factor impacted the bioavailability of Cu, Hg and Pb. No latent factor was found between the bioavailability of heavy metals and environmental factors. The causation between the bioavailability of heavy metals and environmental factors in field experiments is consistent with that on a large scale. The IDA combined with the BC provides a powerful tool to identify causation between the bioavailability of heavy metals and environmental factors across large-scale regions. Causal inference in a large system with dynamic changes has great implications for system-based risk management. Copyright © 2017 Elsevier Ltd. All

  3. Inferring population structure and demographic history using Y-STR data from worldwide populations.

    PubMed

    Xu, Hongyang; Wang, Chuan-Chao; Shrestha, Rukesh; Wang, Ling-Xiang; Zhang, Manfei; He, Yungang; Kidd, Judith R; Kidd, Kenneth K; Jin, Li; Li, Hui

    2015-02-01

    The Y chromosome is one of the best genetic materials to explore the evolutionary history of human populations. Global analyses of Y-chromosomal short tandem repeat (STR) data can reveal very interesting world population structures and histories. However, previous Y-STR works tended to focus on small geographical ranges or only included limited sample sizes. In this study, we have investigated population structure and demographic history using data on 17 Y-chromosomal STRs from 979 males from 44 worldwide populations. The largest genetic distances have been observed between pairs of African and non-African populations. American populations with the lowest genetic diversities also showed large genetic distances and coancestry coefficients with other populations, whereas Eurasian populations displayed close genetic affinities. African populations tend to have the oldest time to the most recent common ancestors (TMRCAs), the largest effective population sizes and the earliest expansion times, whereas the American, Siberian, Melanesian, and isolated Atayal populations have the most recent TMRCAs and expansion times, and the smallest effective population sizes. This clear geographic pattern is consistent with the serial founder model for the origin of populations outside Africa. The Y-STR dataset presented here provides the most detailed view of worldwide population structure and human male demographic history, and additionally will be of great benefit to future forensic applications and population genetic studies.

  4. Inferring Demographic History Using Two-Locus Statistics.

    PubMed

    Ragsdale, Aaron P; Gutenkunst, Ryan N

    2017-06-01

    Population demographic history may be learned from contemporary genetic variation data. Methods based on aggregating the statistics of many single loci into an allele frequency spectrum (AFS) have proven powerful, but such methods ignore potentially informative patterns of linkage disequilibrium (LD) between neighboring loci. To leverage such patterns, we developed a composite-likelihood framework for inferring demographic history from aggregated statistics of pairs of loci. Using this framework, we show that two-locus statistics are more sensitive to demographic history than single-locus statistics such as the AFS. In particular, two-locus statistics escape the notorious confounding of depth and duration of a bottleneck, and they provide a means to estimate effective population size based on the recombination rather than mutation rate. We applied our approach to a Zambian population of Drosophila melanogaster. Notably, using both single- and two-locus statistics, we inferred a substantially lower ancestral effective population size than previous works and did not infer a bottleneck history. Together, our results demonstrate the broad potential for two-locus statistics to enable powerful population genetic inference. Copyright © 2017 by the Genetics Society of America.
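
    As a simple illustration of what a statistic on a pair of loci looks like, the sketch below computes the standard linkage disequilibrium coefficient D and the squared correlation r² from the four haplotype counts at two biallelic loci. These are textbook summaries, not the aggregated two-locus statistics or the composite likelihood of the record above; the function name and counts are hypothetical.

```python
def ld_stats(n_AB: int, n_Ab: int, n_aB: int, n_ab: int):
    """Return (D, r_squared) for two biallelic loci from haplotype counts.

    D = p_AB - p_A * p_B, and r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)).
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total                 # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total         # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / total         # frequency of allele B at locus 2
    d = p_AB - p_A * p_B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        # r^2 is undefined when either locus is monomorphic in the sample.
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denom


# Hypothetical sample of 100 haplotypes with an excess of AB and ab.
d, r2 = ld_stats(40, 10, 10, 40)
print(d, r2)
```

    When the four haplotypes are equally frequent, D and r² are both zero, which corresponds to linkage equilibrium.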

  5. Quantitative inference of population response properties across eccentricity from motion-induced maps in macaque V1

    PubMed Central

    Chen, Ming; Wu, Si; Lu, Haidong D.; Roe, Anna W.

    2013-01-01

    Interpreting population responses in the primary visual cortex (V1) remains a challenge especially with the advent of techniques measuring activations of large cortical areas simultaneously with high precision. For successful interpretation, a quantitatively precise model prediction is of great importance. In this study, we investigate how accurately a spatiotemporal filter (STF) model predicts average response profiles to coherently drifting random dot motion obtained by optical imaging of intrinsic signals in V1 of anesthetized macaques. We establish that orientation difference maps, obtained by subtracting orthogonal axis-of-motion, invert with increasing drift speeds, consistent with the motion streak effect. Consistent with perception, the speed at which the map inverts (the critical speed) depends on cortical eccentricity and systematically increases from foveal to parafoveal. We report that critical speeds and response maps to drifting motion are excellently reproduced by the STF model. Our study thus suggests that the STF model is quantitatively accurate enough to be used as a first model of choice for interpreting responses obtained with intrinsic imaging methods in V1. We show further that this good quantitative correspondence opens the possibility to infer otherwise not easily accessible population receptive field properties from responses to complex stimuli, such as drifting random dot motions. PMID:23197457

  6. Multimodel inference to quantify the relative importance of abiotic factors in the population dynamics of marine zooplankton

    NASA Astrophysics Data System (ADS)

    Everaert, Gert; Deschutter, Yana; De Troch, Marleen; Janssen, Colin R.; De Schamphelaere, Karel

    2018-05-01

    The effect of multiple stressors on marine ecosystems remains poorly understood and most of the knowledge available is related to phytoplankton. To partly address this knowledge gap, we tested if combining multimodel inference with generalized additive modelling could quantify the relative contribution of environmental variables to the population dynamics of a zooplankton species in the Belgian part of the North Sea. Hence, we have quantified the relative contribution of oceanographic variables (e.g. water temperature, salinity, nutrient concentrations, and chlorophyll a concentrations) and anthropogenic chemicals (i.e. polychlorinated biphenyls) to the density of Acartia clausi. We found that models with water temperature and chlorophyll a concentration explained ca. 73% of the population density of the marine copepod. Multimodel inference in combination with regression-based models is a generic way to disentangle and quantify multiple stressor-induced changes in marine ecosystems. Future-oriented simulations of copepod densities suggested increased copepod densities under predicted environmental changes.
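
    Multimodel inference is commonly carried out with Akaike weights, and the relative importance of a variable is then the summed weight of the candidate models that include it. The sketch below shows that calculation under the assumption that AIC-based weights are used; the abstract does not state which criterion the authors applied, and the model sets and AIC values here are made up for illustration.

```python
import math


def akaike_weights(aic_values):
    """Convert a list of AIC values into Akaike weights that sum to 1."""
    best = min(aic_values)
    # Relative likelihood of each model given its AIC difference to the best.
    rel = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]


def relative_importance(models, weights, variable):
    """Sum the weights of all candidate models that contain `variable`."""
    return sum(w for m, w in zip(models, weights) if variable in m)


# Hypothetical candidate models (sets of predictors) and hypothetical AICs.
models = [{"temperature", "chlorophyll"}, {"temperature"}, {"salinity"}]
aic = [100.0, 102.0, 110.0]

weights = akaike_weights(aic)
print(relative_importance(models, weights, "temperature"))
```

    Subtracting the smallest AIC before exponentiating keeps the computation numerically stable and does not change the weights.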

  7. Reliability of dose volume constraint inference from clinical data.

    PubMed

    Lutz, C M; Møller, D S; Hoffmann, L; Knap, M M; Alber, M

    2017-04-21

    Dose volume histogram points (DVHPs) frequently serve as dose constraints in radiotherapy treatment planning. An experiment was designed to investigate the reliability of DVHP inference from clinical data for multiple cohort sizes and complication incidence rates. The experimental background was radiation pneumonitis in non-small cell lung cancer and the DVHP inference method was based on logistic regression. From 102 NSCLC real-life dose distributions and a postulated DVHP model, an 'ideal' cohort was generated where the most predictive model was equal to the postulated model. A bootstrap and a Cohort Replication Monte Carlo (CoRepMC) approach were applied to create 1000 equally sized populations each. The cohorts were then analyzed to establish inference frequency distributions. This was applied to nine scenarios for cohort sizes of 102 (1), 500 (2) to 2000 (3) patients (by sampling with replacement) and three postulated DVHP models. The Bootstrap was repeated for a 'non-ideal' cohort, where the most predictive model did not coincide with the postulated model. The Bootstrap produced chaotic results for all models of cohort size 1 for both the ideal and non-ideal cohorts. For cohort size 2 and 3, the distributions for all populations were more concentrated around the postulated DVHP. For the CoRepMC, the inference frequency increased with cohort size and incidence rate. Correct inference rates >85 % were only achieved by cohorts with more than 500 patients. Both Bootstrap and CoRepMC indicate that inference of the correct or approximate DVHP for typical cohort sizes is highly uncertain. CoRepMC results were less spurious than Bootstrap results, demonstrating the large influence that randomness in dose-response has on the statistical analysis.

  8. Reliability of dose volume constraint inference from clinical data

    NASA Astrophysics Data System (ADS)

    Lutz, C. M.; Møller, D. S.; Hoffmann, L.; Knap, M. M.; Alber, M.

    2017-04-01

    Dose volume histogram points (DVHPs) frequently serve as dose constraints in radiotherapy treatment planning. An experiment was designed to investigate the reliability of DVHP inference from clinical data for multiple cohort sizes and complication incidence rates. The experimental background was radiation pneumonitis in non-small cell lung cancer and the DVHP inference method was based on logistic regression. From 102 NSCLC real-life dose distributions and a postulated DVHP model, an ‘ideal’ cohort was generated where the most predictive model was equal to the postulated model. A bootstrap and a Cohort Replication Monte Carlo (CoRepMC) approach were applied to create 1000 equally sized populations each. The cohorts were then analyzed to establish inference frequency distributions. This was applied to nine scenarios for cohort sizes of 102 (1), 500 (2) to 2000 (3) patients (by sampling with replacement) and three postulated DVHP models. The Bootstrap was repeated for a ‘non-ideal’ cohort, where the most predictive model did not coincide with the postulated model. The Bootstrap produced chaotic results for all models of cohort size 1 for both the ideal and non-ideal cohorts. For cohort size 2 and 3, the distributions for all populations were more concentrated around the postulated DVHP. For the CoRepMC, the inference frequency increased with cohort size and incidence rate. Correct inference rates  >85 % were only achieved by cohorts with more than 500 patients. Both Bootstrap and CoRepMC indicate that inference of the correct or approximate DVHP for typical cohort sizes is highly uncertain. CoRepMC results were less spurious than Bootstrap results, demonstrating the large influence that randomness in dose-response has on the statistical analysis.

  9. Statistical inference from capture data on closed animal populations

    USGS Publications Warehouse

    Otis, David L.; Burnham, Kenneth P.; White, Gary C.; Anderson, David R.

    1978-01-01

    The estimation of animal abundance is an important problem in both the theoretical and applied biological sciences. Serious work to develop estimation methods began during the 1950s, with a few attempts before that time. The literature on estimation methods has increased tremendously during the past 25 years (Cormack 1968, Seber 1973). However, in large part, the problem remains unsolved. Past efforts toward comprehensive and systematic estimation of density (D) or population size (N) have been inadequate, in general. While more than 200 papers have been published on the subject, one is generally left without a unified approach to the estimation of abundance of an animal population. This situation is unfortunate because a number of pressing research problems require such information. In addition, a wide array of environmental assessment studies and biological inventory programs require the estimation of animal abundance. These needs have been further emphasized by the requirement for the preparation of Environmental Impact Statements imposed by the National Environmental Policy Act in 1970. This publication treats inference procedures for certain types of capture data on closed animal populations. This includes multiple capture-recapture studies (variously called capture-mark-recapture, mark-recapture, or tag-recapture studies) involving livetrapping techniques and removal studies involving kill traps or at least temporary removal of captured individuals during the study. Animals do not necessarily need to be physically trapped; visual sightings of marked animals and electrofishing studies also produce data suitable for the methods described in this monograph. To provide a frame of reference for what follows, we give an example of a capture-recapture experiment to estimate population size of small animals using live traps. The general field experiment is similar for all capture-recapture studies (a removal study is, of course, slightly different). A typical

  10. SNPs and Haplotypes in Native American Populations

    PubMed Central

    Kidd, Judith R.; Friedlaender, Françoise; Pakstis, Andrew J.; Furtado, Manohar; Fang, Rixun; Wang, Xudong; Nievergelt, Caroline M.; Kidd, Kenneth K.

    2013-01-01

    Autosomal DNA polymorphisms can provide new information and understanding of both the origins of and relationships among modern Native American populations. At the same time that autosomal markers can be highly informative, they are also susceptible to ascertainment biases in the selection of the markers to use. Identifying markers that can be used for ancestry inference among Native American populations can be considered separate from identifying markers to further the quest for history. In the current study we are using data on nine Native American populations to compare the results based on a large haplotype-based dataset with relatively small independent sets of SNPs. We are interested in what types of limited datasets an individual laboratory might be able to collect are best for addressing two different questions of interest. First, how well can we differentiate the Native American populations and/or infer ancestry by assigning an individual to her population(s) of origin? Second, how well can we infer the historical/evolutionary relationships among Native American populations and their Eurasian origins? We conclude that only a large comprehensive dataset involving multiple autosomal markers on multiple populations will be able to answer both questions; different small sets of markers are able to answer only one or the other of these questions. Using our largest dataset we see a general increasing distance from Old World populations from North to South in the New World except for an unexplained close relationship between our Maya and Quechua samples. PMID:21913176

  11. The large impact process inferred from the geology of lunar multiring basins

    NASA Technical Reports Server (NTRS)

    Spudis, Paul D.

    1992-01-01

    The nature of the impact process has been inferred through the study of the geology of a wide variety of impact crater types and sizes. Some of the largest craters known are the multiring basins found in ancient terrains of the terrestrial planets. Of these features, those found on the Moon possess the most extensive and diverse data coverage, including morphological, geochemical, geophysical, and sample data. The study of the geology of lunar basins over the past 10 years has given us a rudimentary understanding of how these large structures have formed and evolved. The topics covered include basin morphology, basin ejecta, basin excavation, and basin ring formation.

  12. Gaussian process-based Bayesian nonparametric inference of population size trajectories from gene genealogies.

    PubMed

    Palacios, Julia A; Minin, Vladimir N

    2013-03-01

    Changes in population size influence genetic diversity of the population and, as a result, leave a signature of these changes in individual genomes in the population. We are interested in the inverse problem of reconstructing past population dynamics from genomic data. We start with a standard framework based on the coalescent, a stochastic process that generates genealogies connecting randomly sampled individuals from the population of interest. These genealogies serve as a glue between the population demographic history and genomic sequences. It turns out that only the times of genealogical lineage coalescences contain information about population size dynamics. Viewing these coalescent times as a point process, estimating population size trajectories is equivalent to estimating a conditional intensity of this point process. Therefore, our inverse problem is similar to estimating an inhomogeneous Poisson process intensity function. We demonstrate how recent advances in Gaussian process-based nonparametric inference for Poisson processes can be extended to Bayesian nonparametric estimation of population size dynamics under the coalescent. We compare our Gaussian process (GP) approach to one of the state-of-the-art Gaussian Markov random field (GMRF) methods for estimating population trajectories. Using simulated data, we demonstrate that our method has better accuracy and precision. Next, we analyze two genealogies reconstructed from real sequences of hepatitis C and human Influenza A viruses. In both cases, we recover more believed aspects of the viral demographic histories than the GMRF approach. We also find that our GP method produces more reasonable uncertainty estimates than the GMRF method. Copyright © 2013, The International Biometric Society.
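
    The statement that only the coalescence times carry information about population size can be made concrete in the simplest special case of a constant population size. Under Kingman's coalescent, the waiting time while k lineages remain is exponential with rate C(k,2)/N, so the maximum-likelihood estimate of N has a closed form. The sketch below shows only this special case, not the Gaussian process method of the record above; it assumes time is measured in units for which a pair of lineages coalesces at rate 1/N, and the function name and input times are hypothetical.

```python
from math import comb


def constant_size_mle(intercoalescent_times):
    """MLE of a constant population size N from intercoalescent times.

    intercoalescent_times[i] is the waiting time while n - i lineages
    remain, where n = len(intercoalescent_times) + 1 is the sample size.
    Maximizing the product of exponential densities with rates C(k,2)/N
    gives N_hat = sum_k C(k,2) * t_k / (n - 1).
    """
    n = len(intercoalescent_times) + 1
    total = 0.0
    for i, t in enumerate(intercoalescent_times):
        k = n - i                      # number of lineages during this interval
        total += comb(k, 2) * t
    return total / (n - 1)


# Hypothetical genealogy of 4 samples: times with 4, 3, and 2 lineages.
n_hat = constant_size_mle([50.0, 100.0, 300.0])
print(n_hat)
```

    Methods such as the one described above replace the single constant N with a function of time and estimate it nonparametrically.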

  13. How to infer relative fitness from a sample of genomic sequences.

    PubMed

    Dayarian, Adel; Shraiman, Boris I

    2014-07-01

    Mounting evidence suggests that natural populations can harbor extensive fitness diversity with numerous genomic loci under selection. It is also known that genealogical trees for populations under selection are quantifiably different from those expected under neutral evolution and described statistically by Kingman's coalescent. While differences in the statistical structure of genealogies have long been used as a test for the presence of selection, the full extent of the information that they contain has not been exploited. Here we demonstrate that the shape of the reconstructed genealogical tree for a moderately large number of random genomic samples taken from a fitness diverse, but otherwise unstructured, asexual population can be used to predict the relative fitness of individuals within the sample. To achieve this we define a heuristic algorithm, which we test in silico, using simulations of a Wright-Fisher model for a realistic range of mutation rates and selection strength. Our inferred fitness ranking is based on a linear discriminator that identifies rapidly coalescing lineages in the reconstructed tree. Inferred fitness ranking correlates strongly with actual fitness, with a genome in the top 10% ranked being in the top 20% fittest with false discovery rate of 0.1-0.3, depending on the mutation/selection parameters. The ranking also enables us to predict the genotypes that future populations inherit from the present one. While the inference accuracy increases monotonically with sample size, samples of 200 nearly saturate the performance. We propose that our approach can be used for inferring relative fitness of genomes obtained in single-cell sequencing of tumors and in monitoring viral outbreaks. Copyright © 2014 by the Genetics Society of America.

  14. Population genetics of mouse lemur vomeronasal receptors: current versus past selection and demographic inference.

    PubMed

    Hohenbrink, Philipp; Mundy, Nicholas I; Radespiel, Ute

    2017-01-21

    A major effort is underway to use population genetic approaches to identify loci involved in adaptation. One issue that has so far received limited attention is whether loci that show a phylogenetic signal of positive selection in the past also show evidence of ongoing positive selection at the population level. We address this issue using vomeronasal receptors (VRs), a diverse gene family in mammals involved in intraspecific communication and predator detection. In mouse lemurs, we previously demonstrated that both subfamilies of VRs (V1Rs and V2Rs) show a strong signal of directional selection in interspecific analyses. We predicted that ongoing sexual selection and/or co-evolution with predators may lead to current directional or balancing selection on VRs. Here, we re-sequence 17 VRs and perform a suite of selection and demographic analyses in sympatric populations of two species of mouse lemurs (Microcebus murinus and M. ravelobensis) in northwestern Madagascar. M. ravelobensis had consistently higher genetic diversity at VRs than M. murinus. In general, we find little evidence for positive selection, with most loci evolving under purifying selection and one locus even showing evidence of functional loss in M. ravelobensis. However, a few loci in M. ravelobensis show potential evidence of positive selection. Using mismatch distributions and expansion models, we infer a more recent colonisation of the habitat by M. murinus than by M. ravelobensis, which most likely speciated in this region earlier on. These findings suggest that the analysis of VR variation is useful in inferring demographic and phylogeographic history of mouse lemurs. In conclusion, this study reveals a substantial heterogeneity over time in selection on VR loci, suggesting that VR evolution is episodic.

  15. Inferred Paternity and Male Reproductive Success in a Killer Whale (Orcinus orca) Population.

    PubMed

    Ford, Michael J; Hanson, M Bradley; Hempelmann, Jennifer A; Ayres, Katherine L; Emmons, Candice K; Schorr, Gregory S; Baird, Robin W; Balcomb, Kenneth C; Wasser, Samuel K; Parsons, Kim M; Balcomb-Bartok, Kelly

    2011-01-01

    We used data from 78 individuals at 26 microsatellite loci to infer parental and sibling relationships within a community of fish-eating ("resident") eastern North Pacific killer whales (Orcinus orca). Paternity analysis involving 15 mother/calf pairs and 8 potential fathers and whole-pedigree analysis of the entire sample produced consistent results. The variance in male reproductive success was greater than expected by chance and similar to that of other aquatic mammals. Although the number of confirmed paternities was small, reproductive success appeared to increase with male age and size. We found no evidence that males from outside this small population sired any of the sampled individuals. In contrast to previous results in a different population, many offspring were the result of matings within the same "pod" (long-term social group). Despite this pattern of breeding within social groups, we found no evidence of offspring produced by matings between close relatives, and the average internal relatedness of individuals was significantly less than expected if mating were random. The population's estimated effective size was <30 or about 1/3 of the current census size. Patterns of allele frequency variation were consistent with a population bottleneck.

  16. Inferring the distribution and demography of an invasive species from sighting data: the red fox incursion into Tasmania.

    PubMed

    Caley, Peter; Ramsey, David S L; Barry, Simon C

    2015-01-01

    A recent study has inferred that the red fox (Vulpes vulpes) is now widespread in Tasmania as of 2010, based on the extraction of fox DNA from predator scats. Heuristically, this inference appears at first glance to be at odds with the lack of recent confirmed discoveries of either road-killed foxes (the last of which occurred in 2006) or hunter-killed foxes (the most recent in 2001). This paper demonstrates a method to codify this heuristic analysis and produce inferences consistent with assumptions and data. It does this by formalising the analysis in a transparent and repeatable manner to make inference on the past, present and future distribution of an invasive species. It utilizes Approximate Bayesian Computation to make inferences. Importantly, the method is able to inform management of invasive species within realistic time frames, and can be applied widely. We illustrate the technique using the Tasmanian fox data. Based on the pattern of carcass discoveries of foxes in Tasmania, we infer that the population of foxes in Tasmania is most likely extinct, or restricted in distribution and demographically weak as of 2013. It is possible, though unlikely, that that population is widespread and/or demographically robust. This inference is largely at odds with the inference from the predator scat survey data. Our results suggest the chances of successfully eradicating the introduced red fox population in Tasmania may be significantly higher than previously thought.
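
    For readers unfamiliar with Approximate Bayesian Computation, the idea can be sketched with a generic rejection sampler. This is a toy illustration only, not the model used in the paper: it infers a binomial success probability from made-up counts (3 successes in 10 trials) under a Uniform(0, 1) prior. The function name `abc_rejection` and all parameter values are assumptions introduced for the example.

```python
import random

def abc_rejection(observed_successes, n_trials, n_draws=20000, seed=0):
    """Rejection ABC for a binomial success probability (toy example).

    Draw a parameter from the prior, simulate data from the model, and keep
    the parameter only if the simulated data match the observation.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        p = rng.random()  # draw from the Uniform(0, 1) prior
        simulated = sum(rng.random() < p for _ in range(n_trials))
        if simulated == observed_successes:  # exact-match acceptance rule
            accepted.append(p)
    return accepted

# Made-up observation: 3 successes in 10 trials (not data from the study).
posterior = abc_rejection(observed_successes=3, n_trials=10)
posterior_mean = sum(posterior) / len(posterior)
```

    With an exact-match acceptance rule, the accepted draws are samples from the true posterior, here Beta(4, 8) with mean 1/3. Realistic applications usually compare summary statistics within a tolerance instead of requiring an exact match.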

  17. Inferring the Distribution and Demography of an Invasive Species from Sighting Data: The Red Fox Incursion into Tasmania

    PubMed Central

    Caley, Peter; Ramsey, David S. L.; Barry, Simon C.

    2015-01-01

    A recent study has inferred that the red fox (Vulpes vulpes) is now widespread in Tasmania as of 2010, based on the extraction of fox DNA from predator scats. Heuristically, this inference appears at first glance to be at odds with the lack of recent confirmed discoveries of either road-killed foxes—the last of which occurred in 2006, or hunter killed foxes—the most recent in 2001. This paper demonstrates a method to codify this heuristic analysis and produce inferences consistent with assumptions and data. It does this by formalising the analysis in a transparent and repeatable manner to make inference on the past, present and future distribution of an invasive species. It utilizes Approximate Bayesian Computation to make inferences. Importantly, the method is able to inform management of invasive species within realistic time frames, and can be applied widely. We illustrate the technique using the Tasmanian fox data. Based on the pattern of carcass discoveries of foxes in Tasmania, we infer that the population of foxes in Tasmania is most likely extinct, or restricted in distribution and demographically weak as of 2013. It is possible, though unlikely, that that population is widespread and/or demographically robust. This inference is largely at odds with the inference from the predator scat survey data. Our results suggest the chances of successfully eradicating the introduced red fox population in Tasmania may be significantly higher than previously thought. PMID:25602618

  18. Inferring Microbial Fitness Landscapes

    DTIC Science & Technology

    2016-02-25

    Final Report: Inferring Microbial Fitness Landscapes (report date 25-02-2016; dates covered 1-Oct-2012 to 30-Sep-2015). ... infer from data the determinants of microbial evolution with sufficient resolution that we can quantify ... Subject terms: evolution, fitness landscapes, epistasis, microbial populations.

  19. ddClone: joint statistical inference of clonal populations from single cell and bulk tumour sequencing data.

    PubMed

    Salehi, Sohrab; Steif, Adi; Roth, Andrew; Aparicio, Samuel; Bouchard-Côté, Alexandre; Shah, Sohrab P

    2017-03-01

    Next-generation sequencing (NGS) of bulk tumour tissue can identify constituent cell populations in cancers and measure their abundance. This requires computational deconvolution of allelic counts from somatic mutations, which may be incapable of fully resolving the underlying population structure. Single cell sequencing (SCS) is a more direct method, although its replacement of NGS is impeded by technical noise and sampling limitations. We propose ddClone, which analytically integrates NGS and SCS data, leveraging their complementary attributes through joint statistical inference. We show on real and simulated datasets that ddClone produces more accurate results than can be achieved by either method alone.

  20. High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs.

    PubMed

    Dilthey, Alexander T; Gourraud, Pierre-Antoine; Mentzer, Alexander J; Cereb, Nezih; Iqbal, Zamin; McVean, Gil

    2016-10-01

    Genetic variation at the Human Leucocyte Antigen (HLA) genes is associated with many autoimmune and infectious disease phenotypes, is an important element of the immunological distinction between self and non-self, and shapes immune epitope repertoires. Determining the allelic state of the HLA genes (HLA typing) as a by-product of standard whole-genome sequencing data would therefore be highly desirable and enable the immunogenetic characterization of samples in currently ongoing population sequencing projects. Extensive hyperpolymorphism and sequence similarity between the HLA genes, however, pose problems for accurate read mapping and make HLA type inference from whole-genome sequencing data a challenging problem. We describe how to address these challenges in a Population Reference Graph (PRG) framework. First, we construct a PRG for 46 (mostly HLA) genes and pseudogenes, their genomic context and their characterized sequence variants, integrating a database of over 10,000 known allele sequences. Second, we present a sequence-to-PRG paired-end read mapping algorithm that enables accurate read mapping for the HLA genes. Third, we infer the most likely pair of underlying alleles at G group resolution from the IMGT/HLA database at each locus, employing a simple likelihood framework. We show that HLA*PRG, our algorithm, outperforms existing methods by a wide margin. We evaluate HLA*PRG on six classical class I and class II HLA genes (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1) and on a set of 14 samples (3 samples with 2 x 100bp, 11 samples with 2 x 250bp Illumina HiSeq data). Of 158 alleles tested, we correctly infer 157 alleles (99.4%). We also identify and re-type two erroneous alleles in the original validation data. We conclude that HLA*PRG for the first time achieves accuracies comparable to gold-standard reference methods from standard whole-genome sequencing data, though high computational demands (currently ~30-250 CPU hours per sample) remain a significant

  1. High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs

    PubMed Central

    Dilthey, Alexander T.; Gourraud, Pierre-Antoine; McVean, Gil

    2016-01-01

    Genetic variation at the Human Leucocyte Antigen (HLA) genes is associated with many autoimmune and infectious disease phenotypes, is an important element of the immunological distinction between self and non-self, and shapes immune epitope repertoires. Determining the allelic state of the HLA genes (HLA typing) as a by-product of standard whole-genome sequencing data would therefore be highly desirable and enable the immunogenetic characterization of samples in currently ongoing population sequencing projects. Extensive hyperpolymorphism and sequence similarity between the HLA genes, however, pose problems for accurate read mapping and make HLA type inference from whole-genome sequencing data a challenging problem. We describe how to address these challenges in a Population Reference Graph (PRG) framework. First, we construct a PRG for 46 (mostly HLA) genes and pseudogenes, their genomic context and their characterized sequence variants, integrating a database of over 10,000 known allele sequences. Second, we present a sequence-to-PRG paired-end read mapping algorithm that enables accurate read mapping for the HLA genes. Third, we infer the most likely pair of underlying alleles at G group resolution from the IMGT/HLA database at each locus, employing a simple likelihood framework. We show that HLA*PRG, our algorithm, outperforms existing methods by a wide margin. We evaluate HLA*PRG on six classical class I and class II HLA genes (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1) and on a set of 14 samples (3 samples with 2 x 100bp, 11 samples with 2 x 250bp Illumina HiSeq data). Of 158 alleles tested, we correctly infer 157 alleles (99.4%). We also identify and re-type two erroneous alleles in the original validation data. We conclude that HLA*PRG for the first time achieves accuracies comparable to gold-standard reference methods from standard whole-genome sequencing data, though high computational demands (currently ~30–250 CPU hours per sample) remain a significant

  2. Natural Selection in Large Populations

    NASA Astrophysics Data System (ADS)

    Desai, Michael

    2011-03-01

    I will discuss theoretical and experimental approaches to the evolutionary dynamics and population genetics of natural selection in large populations. In these populations, many mutations are often present simultaneously, and because recombination is limited, selection cannot act on them all independently. Rather, it can only affect whole combinations of mutations linked together on the same chromosome. Methods common in theoretical population genetics have been of limited utility in analyzing this coupling between the fates of different mutations. In the past few years it has become increasingly clear that this is a crucial gap in our understanding, as sequence data has begun to show that selection appears to act pervasively on many linked sites in a wide range of populations, including viruses, microbes, Drosophila, and humans. I will describe approaches that combine analytical tools drawn from statistical physics and dynamical systems with traditional methods in theoretical population genetics to address this problem, and describe how experiments in budding yeast can help us directly observe these evolutionary dynamics.

  3. Making inference from wildlife collision data: inferring predator absence from prey strikes

    PubMed Central

    Hosack, Geoffrey R.; Barry, Simon C.

    2017-01-01

    Wildlife collision data are ubiquitous, though challenging for making ecological inference due to typically irreducible uncertainty relating to the sampling process. We illustrate a new approach that is useful for generating inference from predator data arising from wildlife collisions. By simply conditioning on a second prey species sampled via the same collision process, and by using a biologically realistic numerical response function, we can produce a coherent numerical response relationship between predator and prey. This relationship can then be used to make inference on the population size of the predator species, including the probability of extinction. The statistical conditioning enables us to account for unmeasured variation in factors influencing the runway strike incidence for individual airports and to enable valid comparisons. A practical application of the approach for testing hypotheses about the distribution and abundance of a predator species is illustrated using the hypothesized red fox incursion into Tasmania, Australia. We estimate that, conditional on the numerical response between fox and lagomorph runway strikes on mainland Australia, the predictive probability of observing no runway strikes of foxes in Tasmania after observing 15 lagomorph strikes is 0.001. We conclude there is enough evidence to safely reject the null hypothesis that there is a widespread red fox population in Tasmania at a population density consistent with prey availability. The method is novel and has potential wider application. PMID:28243534

  4. Efficient computation of the joint sample frequency spectra for multiple populations.

    PubMed

    Kamm, John A; Terhorst, Jonathan; Song, Yun S

    2017-01-01

    A wide range of studies in population genetics have employed the sample frequency spectrum (SFS), a summary statistic which describes the distribution of mutant alleles at a polymorphic site in a sample of DNA sequences and provides a highly efficient dimensional reduction of large-scale population genomic variation data. Recently, there has been much interest in analyzing the joint SFS data from multiple populations to infer parameters of complex demographic histories, including variable population sizes, population split times, migration rates, admixture proportions, and so on. SFS-based inference methods require accurate computation of the expected SFS under a given demographic model. Although much methodological progress has been made, existing methods suffer from numerical instability and high computational complexity when multiple populations are involved and the sample size is large. In this paper, we present new analytic formulas and algorithms that enable accurate, efficient computation of the expected joint SFS for thousands of individuals sampled from hundreds of populations related by a complex demographic model with arbitrary population size histories (including piecewise-exponential growth). Our results are implemented in a new software package called momi (MOran Models for Inference). Through an empirical study we demonstrate our improvements to numerical stability and computational complexity.
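
    The summary statistic described above can be illustrated with a short sketch. This is not the momi software: it shows how a single-population SFS is tabulated from a 0/1 haplotype matrix, together with the textbook expectation theta/i for the number of sites with i mutant copies under a neutral, constant-size coalescent model. The function names and the toy matrix are assumptions introduced for the example.

```python
def sample_frequency_spectrum(haplotypes):
    """Tabulate the SFS from a list of haplotypes (rows of 0/1 alleles).

    Returns a list whose entry i-1 is the number of polymorphic sites at
    which exactly i of the n sampled sequences carry the mutant allele (1).
    """
    n = len(haplotypes)
    sfs = [0] * (n - 1)  # mutant counts 1 .. n-1
    for site in zip(*haplotypes):  # iterate over columns (sites)
        count = sum(site)
        if 0 < count < n:  # skip sites that are monomorphic in the sample
            sfs[count - 1] += 1
    return sfs

def expected_sfs_constant_size(theta, n):
    """Expected SFS under neutrality and constant population size: theta / i."""
    return [theta / i for i in range(1, n)]

# Toy data: three sampled sequences typed at four sites.
toy_haplotypes = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
]
toy_sfs = sample_frequency_spectrum(toy_haplotypes)
```

    SFS-based inference compares an observed spectrum such as `toy_sfs` with the expected spectrum under a candidate demographic model. The constant-size formula is only the simplest case, and the joint multi-population spectra treated in the paper require the more involved computations it describes.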

  5. Efficient computation of the joint sample frequency spectra for multiple populations

    PubMed Central

    Kamm, John A.; Terhorst, Jonathan; Song, Yun S.

    2016-01-01

    A wide range of studies in population genetics have employed the sample frequency spectrum (SFS), a summary statistic which describes the distribution of mutant alleles at a polymorphic site in a sample of DNA sequences and provides a highly efficient dimensional reduction of large-scale population genomic variation data. Recently, there has been much interest in analyzing the joint SFS data from multiple populations to infer parameters of complex demographic histories, including variable population sizes, population split times, migration rates, admixture proportions, and so on. SFS-based inference methods require accurate computation of the expected SFS under a given demographic model. Although much methodological progress has been made, existing methods suffer from numerical instability and high computational complexity when multiple populations are involved and the sample size is large. In this paper, we present new analytic formulas and algorithms that enable accurate, efficient computation of the expected joint SFS for thousands of individuals sampled from hundreds of populations related by a complex demographic model with arbitrary population size histories (including piecewise-exponential growth). Our results are implemented in a new software package called momi (MOran Models for Inference). Through an empirical study we demonstrate our improvements to numerical stability and computational complexity. PMID:28239248

  6. The effects of inference method, population sampling, and gene sampling on species tree inferences: an empirical study in slender salamanders (Plethodontidae: Batrachoseps).

    PubMed

    Jockusch, Elizabeth L; Martínez-Solano, Iñigo; Timpe, Elizabeth K

    2015-01-01

    Species tree methods are now widely used to infer the relationships among species from multilocus data sets. Many methods have been developed, which differ in whether gene and species trees are estimated simultaneously or sequentially, and in how gene trees are used to infer the species tree. While these methods perform well on simulated data, less is known about what impacts their performance on empirical data. We used a data set including five nuclear genes and one mitochondrial gene for 22 species of Batrachoseps to compare the effects of method of analysis, within-species sampling and gene sampling on species tree inferences. For this data set, the choice of inference method had the largest effect on the species tree topology. Exclusion of individual loci had large effects in *BEAST and STEM, but not in MP-EST. Different loci carried the greatest leverage in these different methods, showing that the causes of their disproportionate effects differ. Even though substantial information was present in the nuclear loci, the mitochondrial gene dominated the *BEAST species tree. This leverage is inherent to the mtDNA locus and results from its high variation and lower assumed ploidy. This mtDNA leverage may be problematic when mtDNA has undergone introgression, as is likely in this data set. By contrast, the leverage of RAG1 in STEM analyses does not reflect properties inherent to the locus, but rather results from a gene tree that is strongly discordant with all others, and is best explained by introgression between distantly related species. Within-species sampling was also important, especially in *BEAST analyses, as shown by differences in tree topology across 100 subsampled data sets. Despite the sensitivity of the species tree methods to multiple factors, five species groups, the relationships among these, and some relationships within them, are generally consistently resolved for Batrachoseps. © The Author(s) 2014. Published by Oxford University Press, on

  7. Measuring happiness in large population

    NASA Astrophysics Data System (ADS)

    Wenas, Annabelle; Sjahputri, Smita; Takwin, Bagus; Primaldhi, Alfindra; Muhamad, Roby

    2016-01-01

    The ability to gauge the emotional states of a large number of people is important, for example, to ensure the effectiveness of public policies. In this study, we propose a measure of happiness for large-scale populations that is based on the analysis of Indonesian-language lexicons. We incorporate human assessments of Indonesian words, then quantify happiness in a large body of text gathered from Twitter conversations. We used two psychological constructs to measure happiness: valence and arousal. We found that Indonesian words tend towards positive emotions. We also identified several happiness patterns across days of the week, hours of the day, and selected conversation topics.

  8. Inference of the Distribution of Selection Coefficients for New Nonsynonymous Mutations Using Large Samples

    PubMed Central

    Kim, Bernard Y.; Huber, Christian D.; Lohmueller, Kirk E.

    2017-01-01

    The distribution of fitness effects (DFE) has considerable importance in population genetics. To date, estimates of the DFE come from studies using a small number of individuals. Thus, estimates of the proportion of moderately to strongly deleterious new mutations may be unreliable because such variants are unlikely to be segregating in the data. Additionally, the true functional form of the DFE is unknown, and estimates of the DFE differ significantly between studies. Here we present a flexible and computationally tractable method, called Fit∂a∂i, to estimate the DFE of new mutations using the site frequency spectrum from a large number of individuals. We apply our approach to the frequency spectrum of 1300 Europeans from the Exome Sequencing Project ESP6400 data set, 1298 Danes from the LuCamp data set, and 432 Europeans from the 1000 Genomes Project to estimate the DFE of deleterious nonsynonymous mutations. We infer significantly fewer (0.38–0.84 fold) strongly deleterious mutations with selection coefficient |s| > 0.01 and more (1.24–1.43 fold) weakly deleterious mutations with selection coefficient |s| < 0.001 compared to previous estimates. Furthermore, a DFE that is a mixture distribution of a point mass at neutrality plus a gamma distribution fits better than a gamma distribution in two of the three data sets. Our results suggest that nearly neutral forces play a larger role in human evolution than previously thought. PMID:28249985

  9. Benchmarking Relatedness Inference Methods with Genome-Wide Data from Thousands of Relatives.

    PubMed

    Ramstetter, Monica D; Dyer, Thomas D; Lehman, Donna M; Curran, Joanne E; Duggirala, Ravindranath; Blangero, John; Mezey, Jason G; Williams, Amy L

    2017-09-01

    Inferring relatedness from genomic data is an essential component of genetic association studies, population genetics, forensics, and genealogy. While numerous methods exist for inferring relatedness, thorough evaluation of these approaches in real data has been lacking. Here, we report an assessment of 12 state-of-the-art pairwise relatedness inference methods using a data set with 2485 individuals contained in several large pedigrees that span up to six generations. We find that all methods have high accuracy (92-99%) when detecting first- and second-degree relationships, but their accuracy dwindles to <43% for seventh-degree relationships. However, most identical by descent (IBD) segment-based methods inferred seventh-degree relatives correct to within one relatedness degree for >76% of relative pairs. Overall, the most accurate methods are Estimation of Recent Shared Ancestry (ERSA) and approaches that compute total IBD sharing using the output from GERMLINE and Refined IBD to infer relatedness. Combining information from the most accurate methods provides little accuracy improvement, indicating that novel approaches, such as new methods that leverage relatedness signals from multiple samples, are needed to achieve a sizeable jump in performance. Copyright © 2017 Ramstetter et al.

  10. WKB theory of large deviations in stochastic populations

    NASA Astrophysics Data System (ADS)

    Assaf, Michael; Meerson, Baruch

    2017-06-01

    Stochasticity can play an important role in the dynamics of biologically relevant populations. These span a broad range of scales: from intra-cellular populations of molecules to populations of cells and then to groups of plants, animals and people. Large deviations in stochastic population dynamics, such as those determining population extinction, fixation or switching between different states, are presently a focus of attention for statistical physicists. We review recent progress in applying different variants of the dissipative WKB approximation (after Wentzel, Kramers and Brillouin) to this class of problems. The WKB approximation allows one to evaluate the mean time and/or probability of population extinction, fixation and switches resulting from either intrinsic (demographic) noise, or a combination of the demographic noise and environmental variations, deterministic or random. We mostly cover well-mixed populations, single and multiple, but also briefly consider populations on heterogeneous networks and spatial populations. The spatial setting also allows one to study large fluctuations of the speed of biological invasions. Finally, we briefly discuss possible directions of future work.

  11. Orientation Encoding and Viewpoint Invariance in Face Recognition: Inferring Neural Properties from Large-Scale Signals.

    PubMed

    Ramírez, Fernando M

    2018-05-01

    Viewpoint-invariant face recognition is thought to be subserved by a distributed network of occipitotemporal face-selective areas that, except for the human anterior temporal lobe, have been shown to also contain face-orientation information. This review begins by highlighting the importance of bilateral symmetry for viewpoint-invariant recognition and face-orientation perception. Then, monkey electrophysiological evidence is surveyed describing key tuning properties of face-selective neurons (including neurons bimodally tuned to mirror-symmetric face-views), followed by studies combining functional magnetic resonance imaging (fMRI) and multivariate pattern analyses to probe the representation of face-orientation and identity information in humans. Altogether, neuroimaging studies suggest that face-identity is gradually disentangled from face-orientation information along the ventral visual processing stream. The evidence seems to diverge, however, regarding the prevalent form of tuning of neural populations in human face-selective areas. In this context, caveats possibly leading to erroneous inferences regarding mirror-symmetric coding are exposed, including the need to distinguish angular from Euclidean distances when interpreting multivariate pattern analyses. On this basis, this review argues that evidence from the fusiform face area is best explained by a view-sensitive code reflecting head angular disparity, consistent with a role of this area in face-orientation perception. Finally, the importance of explicit models relating neural properties to large-scale signals is stressed.

  12. Streamlining and Large Ancestral Genomes in Archaea Inferred with a Phylogenetic Birth-and-Death Model

    PubMed Central

    Miklós, István

    2009-01-01

    Homologous genes originate from a common ancestor through vertical inheritance, duplication, or horizontal gene transfer. Entire homolog families spawned by a single ancestral gene can be identified across multiple genomes based on protein sequence similarity. The sequences, however, do not always reveal conclusively the history of large families. To study the evolution of complete gene repertoires, we propose here a mathematical framework that does not rely on resolved gene family histories. We show that so-called phylogenetic profiles, formed by family sizes across multiple genomes, are sufficient to infer principal evolutionary trends. The main novelty in our approach is an efficient algorithm to compute the likelihood of a phylogenetic profile in a model of birth-and-death processes acting on a phylogeny. We examine known gene families in 28 archaeal genomes using a probabilistic model that involves lineage- and family-specific components of gene acquisition, duplication, and loss. The model enables us to consider all possible histories when inferring statistics about archaeal evolution. According to our reconstruction, most lineages are characterized by a net loss of gene families. Major increases in gene repertoire have occurred only a few times. Our reconstruction underlines the importance of persistent streamlining processes in shaping genome composition in Archaea. It also suggests that early archaeal genomes were as complex as typical modern ones, and even show signs, in the case of the methanogenic ancestor, of an extremely large gene repertoire. PMID:19570746

  13. Evaluating the Influence of the Microsatellite Marker Set on the Genetic Structure Inferred in Pyrus communis L.

    PubMed Central

    Urrestarazu, Jorge; Royo, José B.; Santesteban, Luis G.; Miranda, Carlos

    2015-01-01

    Fingerprinting information can be used to elucidate in a robust manner the genetic structure of germplasm collections, allowing a more rational and fine assessment of genetic resources. Bayesian model-based approaches are now widely preferred to infer genetic structure, but it is still largely unresolved how marker sets should be built in order to obtain a robust inference. The objective was to evaluate, in Pyrus germplasm collections, the influence of the SSR marker set size on the genetic structure inferred, also evaluating the influence of the criterion used to select those markers. Inferences were performed considering an increasing number of SSR markers that ranged from just two up to 25, incorporated one at a time into the analysis. The influence of the number of SSR markers used was evaluated comparing the number of populations and the strength of the signal detected, and also the similarity of the genotype assignments to populations between analyses. In order to test if those results were influenced by the criterion used to select the SSRs, several choosing scenarios based on the discrimination power or the fixation index values of the SSRs were tested. Our results indicate that population structure could be inferred accurately once a certain SSR number threshold was reached, which depended on the underlying structure within the genotypes, but the method used to select the markers included on each set appeared not to be very relevant. The minimum number of SSRs required to provide robust structure inferences and adequate measurements of the differentiation, even when low differentiation levels exist within populations, proved similar to that of the complete list of recommended markers for fingerprinting. When an SSR set size similar to the minimum marker sets recommended for fingerprinting is used, only major divisions or moderate (F_ST > 0.05) differentiation of the germplasm are detected. PMID:26382618

  14. COSMOABC: Likelihood-free inference via Population Monte Carlo Approximate Bayesian Computation

    NASA Astrophysics Data System (ADS)

    Ishida, E. E. O.; Vitenti, S. D. P.; Penna-Lima, M.; Cisewski, J.; de Souza, R. S.; Trindade, A. M. M.; Cameron, E.; Busti, V. C.; COIN Collaboration

    2015-11-01

    Approximate Bayesian Computation (ABC) enables parameter inference for complex physical systems in cases where the true likelihood function is unknown, unavailable, or computationally too expensive. It relies on the forward simulation of mock data and comparison between observed and synthetic catalogues. Here we present COSMOABC, a Python ABC sampler featuring a Population Monte Carlo variation of the original ABC algorithm, which uses an adaptive importance sampling scheme. The code is very flexible and can be easily coupled to an external simulator, and it allows arbitrary distance and prior functions to be incorporated. As an example of practical application, we coupled COSMOABC with the NUMCOSMO library and demonstrate how it can be used to estimate posterior probability distributions over cosmological parameters based on measurements of galaxy cluster number counts without computing the likelihood function. COSMOABC is published under the GPLv3 license on PyPI and GitHub and documentation is available at http://goo.gl/SmB8EX.
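
    The core idea described in this abstract (forward-simulate mock data, compare it with the observations, keep the parameters whose mock data are close) can be sketched as follows. This is plain rejection ABC on a toy Gaussian problem, not the Population Monte Carlo scheme with adaptive importance sampling that the abstract describes, and it does not use the COSMOABC API. The simulator, prior range, distance, tolerance and all names are assumptions made for illustration.

```python
import random
import statistics

def simulate(mu, n, rng):
    """Toy forward simulator (assumed): n draws from Normal(mu, 1)."""
    return [rng.gauss(mu, 1.0) for _ in range(n)]

def rejection_abc(observed, n_draws, tolerance, rng):
    """Keep prior draws whose mock data have a sample mean close to the observed one."""
    obs_mean = statistics.fmean(observed)
    accepted = []
    for _ in range(n_draws):
        mu = rng.uniform(0.0, 6.0)               # assumed flat prior on [0, 6]
        mock = simulate(mu, len(observed), rng)  # forward simulation of mock data
        distance = abs(statistics.fmean(mock) - obs_mean)
        if distance < tolerance:                 # accept without any likelihood evaluation
            accepted.append(mu)
    return accepted

rng = random.Random(42)               # fixed seed so the run is reproducible
observed = simulate(3.0, 50, rng)     # stand-in "observed" data with true mu = 3
posterior_sample = rejection_abc(observed, n_draws=5000, tolerance=0.1, rng=rng)
posterior_mean = statistics.fmean(posterior_sample)

print(len(posterior_sample), posterior_mean)
```

    The accepted values approximate a posterior sample for mu. Shrinking the tolerance improves the approximation at the cost of fewer accepted draws, which is the inefficiency that sequential schemes such as Population Monte Carlo are designed to reduce.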

  15. In defence of model-based inference in phylogeography

    PubMed Central

    Beaumont, Mark A.; Nielsen, Rasmus; Robert, Christian; Hey, Jody; Gaggiotti, Oscar; Knowles, Lacey; Estoup, Arnaud; Panchal, Mahesh; Corander, Jukka; Hickerson, Mike; Sisson, Scott A.; Fagundes, Nelson; Chikhi, Lounès; Beerli, Peter; Vitalis, Renaud; Cornuet, Jean-Marie; Huelsenbeck, John; Foll, Matthieu; Yang, Ziheng; Rousset, Francois; Balding, David; Excoffier, Laurent

    2017-01-01

    Recent papers have promoted the view that model-based methods in general, and those based on Approximate Bayesian Computation (ABC) in particular, are flawed in a number of ways, and are therefore inappropriate for the analysis of phylogeographic data. These papers further argue that Nested Clade Phylogeographic Analysis (NCPA) offers the best approach in statistical phylogeography. In order to remove the confusion and misconceptions introduced by these papers, we justify and explain the reasoning behind model-based inference. We argue that ABC is a statistically valid approach, alongside other computational statistical techniques that have been successfully used to infer parameters and compare models in population genetics. We also examine the NCPA method and highlight numerous deficiencies, either when used with single or multiple loci. We further show that the ages of clades are carelessly used to infer ages of demographic events, that these ages are estimated under a simple model of panmixia and population stationarity but are then used under different and unspecified models to test hypotheses, a usage that invalidates these testing procedures. We conclude by encouraging researchers to study and use model-based inference in population genetics. PMID:29284924

  16. Inference of higher-order relationships in the cycads from a large chloroplast data set.

    PubMed

    Rai, Hardeep S; O'Brien, Heath E; Reeves, Patrick A; Olmstead, Richard G; Graham, Sean W

    2003-11-01

    We investigated higher-order relationships in the cycads, an ancient group of seed-bearing plants, by examining a large portion of the chloroplast genome from seven species chosen to exemplify our current understanding of taxonomic diversity in the order. The regions considered span approximately 13.5 kb of unaligned data per taxon, and comprise a diverse range of coding sequences, introns and intergenic spacers dispersed throughout the plastid genome. Our results provide substantial support for most of the inferred backbone of cycad phylogeny, and weak evidence that the sister-group of the cycads among living seed plants is Ginkgo biloba. Cycas (representing Cycadaceae) is the sister-group of the remaining cycads; Dioon is part of the next most basal split. Two of the three commonly recognized families of cycads (Zamiaceae and Stangeriaceae) are not monophyletic; Stangeria is embedded within Zamiaceae, close to Zamia and Ceratozamia, and not closely allied to the other genus of Stangeriaceae, Bowenia. In contrast to the other seed plants, cycad chloroplast genomes share two features with Ginkgo: a reduced rate of evolution and an elevated transition:transversion ratio. We demonstrate that the latter aspect of their molecular evolution is unlikely to have affected inference of cycad relationships in the context of seed-plant wide analyses.

  17. minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information.

    PubMed

    Meyer, Patrick E; Lafitte, Frédéric; Bontempi, Gianluca

    2008-10-29

    This paper presents the R/Bioconductor package minet (version 1.1.6) which provides a set of functions to infer mutual information networks from a dataset. Once fed with a microarray dataset, the package returns a network where nodes denote genes, edges model statistical dependencies between genes and the weight of an edge quantifies the statistical evidence of a specific (e.g. transcriptional) gene-to-gene interaction. Four different entropy estimators are made available in the package minet (empirical, Miller-Madow, Schurmann-Grassberger and shrink) as well as four different inference methods, namely relevance networks, ARACNE, CLR and MRNET. Also, the package integrates accuracy assessment tools, like F-scores, PR-curves and ROC-curves in order to compare the inferred network with a reference one. The package minet provides a series of tools for inferring transcriptional networks from microarray data. It is freely available from the Comprehensive R Archive Network (CRAN) as well as from the Bioconductor website.
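
    The simplest combination named in this abstract, the empirical entropy estimator with a relevance network, can be sketched in a few lines. This is a Python illustration of the general technique, not the minet R API. The discretized expression profiles, gene names, function names and threshold are all hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Empirical (plug-in) entropy in nats of a discrete sample."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), using the empirical estimator."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def relevance_network(profiles, threshold):
    """Keep an edge for every gene pair whose mutual information exceeds the threshold."""
    genes = sorted(profiles)
    edges = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            mi = mutual_information(profiles[g], profiles[h])
            if mi > threshold:
                edges[(g, h)] = mi  # the edge weight is the mutual information
    return edges

# Hypothetical expression profiles already discretized into low (0) / high (1).
profiles = {
    "geneA": [0, 0, 1, 1, 0, 1, 0, 1],
    "geneB": [0, 0, 1, 1, 0, 1, 0, 1],  # identical to geneA
    "geneC": [0, 1, 0, 1, 1, 0, 0, 1],  # balanced against geneA, so empirically independent
}

edges = relevance_network(profiles, threshold=0.1)
print(edges)
```

    Only the geneA/geneB pair survives the threshold here. Methods such as ARACNE, CLR and MRNET start from the same pairwise mutual information matrix but apply further filtering or scoring that this sketch does not attempt.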

  18. Generative inference for cultural evolution.

    PubMed

    Kandler, Anne; Powell, Adam

    2018-04-05

    One of the major challenges in cultural evolution is to understand why and how various forms of social learning are used in human populations, both now and in the past. To date, much of the theoretical work on social learning has been done in isolation of data, and consequently many insights focus on revealing the learning processes or the distributions of cultural variants that are expected to have evolved in human populations. In population genetics, recent methodological advances have allowed a greater understanding of the explicit demographic and/or selection mechanisms that underlie observed allele frequency distributions across the globe, and their change through time. In particular, generative frameworks, often using coalescent-based simulation coupled with approximate Bayesian computation (ABC), have provided robust inferences on the human past, with no reliance on a priori assumptions of equilibrium. Here, we demonstrate the applicability and utility of generative inference approaches to the field of cultural evolution. The framework advocated here uses observed population-level frequency data directly to establish the likely presence or absence of particular hypothesized learning strategies. In this context, we discuss the problem of equifinality and argue that, in the light of sparse cultural data and the multiplicity of possible social learning processes, the exclusion of those processes inconsistent with the observed data might be the most instructive outcome. Finally, we summarize the findings of generative inference approaches applied to a number of case studies. This article is part of the theme issue 'Bridging cultural gaps: interdisciplinary studies in human cultural evolution'. © 2018 The Author(s).

  19. A generative inference framework for analysing patterns of cultural change in sparse population data with evidence for fashion trends in LBK culture.

    PubMed

    Kandler, Anne; Shennan, Stephen

    2015-12-06

    Cultural change can be quantified by temporal changes in frequency of different cultural artefacts and it is a central question to identify what underlying cultural transmission processes could have caused the observed frequency changes. Observed changes, however, often describe the dynamics in samples of the population of artefacts, whereas transmission processes act on the whole population. Here we develop a modelling framework aimed at addressing this inference problem. To do so, we firstly generate population structures from which the observed sample could have been drawn randomly and then determine theoretical samples at a later time t2 produced under the assumption that changes in frequencies are caused by a specific transmission process. Thereby we also account for the potential effect of time-averaging processes in the generation of the observed sample. Subsequent statistical comparisons (e.g. using Bayesian inference) of the theoretical and observed samples at t2 can establish which processes could have produced the observed frequency data. In this way, we infer underlying transmission processes directly from available data without any equilibrium assumption. We apply this framework to a dataset describing pottery from settlements of some of the first farmers in Europe (the LBK culture) and conclude that the observed frequency dynamic of different types of decorated pottery is consistent with age-dependent selection, a preference for 'young' pottery types which is potentially indicative of fashion trends. © 2015 The Author(s).

  20. Inferring genetic connectivity in real populations, exemplified by coastal and oceanic Atlantic cod.

    PubMed

    Spies, Ingrid; Hauser, Lorenz; Jorde, Per Erik; Knutsen, Halvor; Punt, André E; Rogers, Lauren A; Stenseth, Nils Chr

    2018-05-08

    Genetic data are commonly used to estimate connectivity between putative populations, but translating them to demographic dispersal rates is complicated. Theoretical equations that infer a migration rate based on the genetic estimator F_ST, such as Wright's equation, F_ST ≈ 1/(4 N_e m + 1), make assumptions that do not apply to most real populations. How complexities inherent to real populations affect migration was exemplified by Atlantic cod in the North Sea and Skagerrak and was examined within an age-structured model that incorporated genetic markers. Migration was determined under various scenarios by varying the number of simulated migrants until the mean simulated level of genetic differentiation matched a fixed level of genetic differentiation equal to empirical estimates. Parameters that decreased the N_e/N_t ratio (where N_e is the effective and N_t is the total population size), such as high fishing mortality and high fishing gear selectivity, increased the number of migrants required to achieve empirical levels of genetic differentiation. Higher maturity-at-age and lower selectivity increased N_e/N_t and decreased migration when genetic differentiation was fixed. Changes in natural mortality, fishing gear selectivity, and maturity-at-age within expected limits had a moderate effect on migration when genetic differentiation was held constant. Changes in population size had the greatest effect on the number of migrants to achieve fixed levels of F_ST, particularly when genetic differentiation was low, F_ST ≈ 10^-3. Highly variable migration patterns, compared with constant migration, resulted in higher variance in genetic differentiation and higher extreme values. Results are compared with and provide insight into the use of theoretical equations to estimate migration among real populations. Copyright © 2018 the Author(s). Published by PNAS.
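
    Wright's equation quoted in this abstract can be evaluated and inverted directly, as in the short sketch below. The function names are hypothetical, and the calculation inherits the idealized assumptions that, according to the abstract, do not hold for most real populations. It shows only what the theoretical formula implies, not the age-structured simulation used in the study.

```python
def fst_from_migration(ne, m):
    """Wright's approximation: F_ST ~ 1 / (4 * N_e * m + 1)."""
    return 1.0 / (4.0 * ne * m + 1.0)

def migrants_from_fst(fst):
    """Invert the approximation to get N_e * m, the effective migrants per generation."""
    return (1.0 / fst - 1.0) / 4.0

# The low-differentiation case mentioned in the abstract, F_ST of about 10^-3.
nem = migrants_from_fst(1e-3)
print(nem)  # roughly 250 effective migrants per generation under the formula

# Round trip with an assumed N_e = 1000 and m = 0.25 (so N_e * m = 250).
print(fst_from_migration(1000, 0.25))
```

    Because N_e m scales roughly as 1/(4 F_ST) when F_ST is small, small errors in a low F_ST estimate translate into large changes in the implied number of migrants, which is consistent with the abstract's remark that effects were greatest when differentiation was low.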

  1. Studies in the extensively automatic construction of large odds-based inference networks from structured data. Examples from medical, bioinformatics, and health insurance claims data.

    PubMed

    Robson, B; Boray, S

    2018-04-01

    Theoretical and methodological principles are presented for the construction of very large inference nets for odds calculations, composed of hundreds or many thousands or more of elements, in this paper generated by structured data mining. It is argued that the usual small inference nets can sometimes represent rather simple, arbitrary estimates. Examples of applications in clinical and public health data analysis, medical claims data and detection of irregular entries, and bioinformatics data, are presented. Construction of large nets benefits from application of a theory of expected information for sparse data and the Dirac notation and algebra. The extent to which these are important here is briefly discussed. Purposes of the study include (a) exploration of the properties of large inference nets and a perturbation and tacit conditionality models, (b) using these to propose simpler models including one that a physician could use routinely, analogous to a "risk score", (c) examination of the merit of describing optimal performance in a single measure that combines accuracy, specificity, and sensitivity in place of a ROC curve, and (d) relationship to methods for detecting anomalous and potentially fraudulent data. Copyright © 2018 Elsevier Ltd. All rights reserved.

  2. Network Model-Assisted Inference from Respondent-Driven Sampling Data.

    PubMed

    Gile, Krista J; Handcock, Mark S

    2015-06-01

    Respondent-Driven Sampling is a widely-used method for sampling hard-to-reach human populations by link-tracing over their social networks. Inference from such data requires specialized techniques because the sampling process is both partially beyond the control of the researcher, and partially implicitly defined. Therefore, it is not generally possible to directly compute the sampling weights for traditional design-based inference, and likelihood inference requires modeling the complex sampling process. As an alternative, we introduce a model-assisted approach, resulting in a design-based estimator leveraging a working network model. We derive a new class of estimators for population means and a corresponding bootstrap standard error estimator. We demonstrate improved performance compared to existing estimators, including adjustment for an initial convenience sample. We also apply the method and an extension to the estimation of HIV prevalence in a high-risk population.

  3. Estimating trends in alligator populations from nightlight survey data

    USGS Publications Warehouse

    Fujisaki, Ikuko; Mazzotti, Frank J.; Dorazio, Robert M.; Rice, Kenneth G.; Cherkiss, Michael; Jeffery, Brian

    2011-01-01

    Nightlight surveys are commonly used to evaluate status and trends of crocodilian populations, but imperfect detection caused by survey- and location-specific factors makes it difficult to draw population inferences accurately from uncorrected data. We used a two-stage hierarchical model comprising population abundance and detection probability to examine recent abundance trends of American alligators (Alligator mississippiensis) in subareas of Everglades wetlands in Florida using nightlight survey data. During 2001–2008, there were declining trends in abundance of small and/or medium sized animals in a majority of subareas, whereas abundance of large sized animals showed either an increasing or an unclear trend. For small and large size class animals, estimated detection probability declined as water depth increased. Detection probability of small animals was much lower than for larger size classes. The declining trend of smaller alligators may reflect a natural population response to the fluctuating environment of Everglades wetlands under modified hydrology. It may have negative implications for the future of alligator populations in this region, particularly if habitat conditions do not favor recruitment of offspring in the near term. Our study provides a foundation to improve inferences made from nightlight surveys of other crocodilian populations.

  4. Causal inference and the data-fusion problem

    PubMed Central

    Bareinboim, Elias; Pearl, Judea

    2016-01-01

    We review concepts, principles, and tools that unify current approaches to causal analysis and attend to new challenges presented by big data. In particular, we address the problem of data fusion—piecing together multiple datasets collected under heterogeneous conditions (i.e., different populations, regimes, and sampling methods) to obtain valid answers to queries of interest. The availability of multiple heterogeneous datasets presents new opportunities to big data analysts, because the knowledge that can be acquired from combined data would not be possible from any individual source alone. However, the biases that emerge in heterogeneous environments require new analytical tools. Some of these biases, including confounding, sampling selection, and cross-population biases, have been addressed in isolation, largely in restricted parametric models. We here present a general, nonparametric framework for handling these biases and, ultimately, a theoretical solution to the problem of data fusion in causal inference tasks. PMID:27382148

  5. Inferring epidemiological parameters from phylogenies using regression-ABC: A comparative study

    PubMed Central

    Gascuel, Olivier

    2017-01-01

    Inferring epidemiological parameters such as the R0 from time-scaled phylogenies is a timely challenge. Most current approaches rely on likelihood functions, which raise specific issues that range from computing these functions to finding their maxima numerically. Here, we present a new regression-based Approximate Bayesian Computation (ABC) approach, which we base on a large variety of summary statistics intended to capture the information contained in the phylogeny and its corresponding lineage-through-time plot. The regression step involves the Least Absolute Shrinkage and Selection Operator (LASSO) method, which is a robust machine learning technique. It allows us to readily deal with the large number of summary statistics, while avoiding resorting to Markov Chain Monte Carlo (MCMC) techniques. To compare our approach to existing ones, we simulated target trees under a variety of epidemiological models and settings, and inferred parameters of interest using the same priors. We found that, for large phylogenies, the accuracy of our regression-ABC is comparable to that of likelihood-based approaches involving birth-death processes implemented in BEAST2. Our approach even outperformed these when inferring the host population size with a Susceptible-Infected-Removed epidemiological model. It also clearly outperformed a recent kernel-ABC approach when assuming a Susceptible-Infected epidemiological model with two host types. Lastly, by re-analyzing data from the early stages of the recent Ebola epidemic in Sierra Leone, we showed that regression-ABC provides more realistic estimates for the duration parameters (latency and infectiousness) than the likelihood-based method. Overall, ABC based on a large variety of summary statistics and a regression method able to perform variable selection and avoid overfitting is a promising approach to analyze large phylogenies. PMID:28263987

  6. Network Model-Assisted Inference from Respondent-Driven Sampling Data

    PubMed Central

    Gile, Krista J.; Handcock, Mark S.

    2015-01-01

    Respondent-Driven Sampling is a widely-used method for sampling hard-to-reach human populations by link-tracing over their social networks. Inference from such data requires specialized techniques because the sampling process is both partially beyond the control of the researcher, and partially implicitly defined. Therefore, it is not generally possible to directly compute the sampling weights for traditional design-based inference, and likelihood inference requires modeling the complex sampling process. As an alternative, we introduce a model-assisted approach, resulting in a design-based estimator leveraging a working network model. We derive a new class of estimators for population means and a corresponding bootstrap standard error estimator. We demonstrate improved performance compared to existing estimators, including adjustment for an initial convenience sample. We also apply the method and an extension to the estimation of HIV prevalence in a high-risk population. PMID:26640328

  7. Clumpak: a program for identifying clustering modes and packaging population structure inferences across K.

    PubMed

    Kopelman, Naama M; Mayzel, Jonathan; Jakobsson, Mattias; Rosenberg, Noah A; Mayrose, Itay

    2015-09-01

    The identification of the genetic structure of populations from multilocus genotype data has become a central component of modern population-genetic data analysis. Application of model-based clustering programs often entails a number of steps, in which the user considers different modelling assumptions, compares results across different predetermined values of the number of assumed clusters (a parameter typically denoted K), examines multiple independent runs for each fixed value of K, and distinguishes among runs belonging to substantially distinct clustering solutions. Here, we present Clumpak (Cluster Markov Packager Across K), a method that automates the postprocessing of results of model-based population structure analyses. For analysing multiple independent runs at a single K value, Clumpak identifies sets of highly similar runs, separating distinct groups of runs that represent distinct modes in the space of possible solutions. This procedure, which generates a consensus solution for each distinct mode, is performed by the use of a Markov clustering algorithm that relies on a similarity matrix between replicate runs, as computed by the software Clumpp. Next, Clumpak identifies an optimal alignment of inferred clusters across different values of K, extending a similar approach implemented for a fixed K in Clumpp and simplifying the comparison of clustering results across different K values. Clumpak incorporates additional features, such as implementations of methods for choosing K and comparing solutions obtained by different programs, models, or data subsets. Clumpak, available at http://clumpak.tau.ac.il, simplifies the use of model-based analyses of population structure in population genetics and molecular ecology. © 2015 John Wiley & Sons Ltd.

  8. Bayesian Inference on the Radio-quietness of Gamma-ray Pulsars

    NASA Astrophysics Data System (ADS)

    Yu, Hoi-Fung; Hui, Chung Yue; Kong, Albert K. H.; Takata, Jumpei

    2018-04-01

    For the first time we demonstrate the use of a robust Bayesian approach to analyze the populations of radio-quiet (RQ) and radio-loud (RL) gamma-ray pulsars. We quantify their differences and obtain their distributions of the radio-cone opening half-angle δ and the magnetic inclination angle α by Bayesian inference. In contrast to the conventional frequentist point estimations that might be non-representative when the distribution is highly skewed or multi-modal, which is often the case when data points are scarce, Bayesian statistics displays the complete posterior distribution, from which the uncertainties can be readily obtained regardless of the skewness and modality. We found that the spin period, the magnetic field strength at the light cylinder, the spin-down power, the gamma-ray-to-X-ray flux ratio, and the spectral curvature significance of the two groups of pulsars exhibit significant differences at the 99% level. Using Bayesian inference, we are able to infer the values and uncertainties of δ and α from the distribution of RQ and RL pulsars. We found that δ is between 10° and 35° and the distribution of α is skewed toward large values.

  9. A Unified Approach to Genotype Imputation and Haplotype-Phase Inference for Large Data Sets of Trios and Unrelated Individuals

    PubMed Central

    Browning, Brian L.; Browning, Sharon R.

    2009-01-01

    We present methods for imputing data for ungenotyped markers and for inferring haplotype phase in large data sets of unrelated individuals and parent-offspring trios. Our methods make use of known haplotype phase when it is available, and our methods are computationally efficient so that the full information in large reference panels with thousands of individuals is utilized. We demonstrate that substantial gains in imputation accuracy accrue with increasingly large reference panel sizes, particularly when imputing low-frequency variants, and that unphased reference panels can provide highly accurate genotype imputation. We place our methodology in a unified framework that enables the simultaneous use of unphased and phased data from trios and unrelated individuals in a single analysis. For unrelated individuals, our imputation methods produce well-calibrated posterior genotype probabilities and highly accurate allele-frequency estimates. For trios, our haplotype-inference method is four orders of magnitude faster than the gold-standard PHASE program and has excellent accuracy. Our methods enable genotype imputation to be performed with unphased trio or unrelated reference panels, thus accounting for haplotype-phase uncertainty in the reference panel. We present a useful measure of imputation accuracy, allelic R2, and show that this measure can be estimated accurately from posterior genotype probabilities. Our methods are implemented in version 3.0 of the BEAGLE software package. PMID:19200528

  10. The Recombination Landscape in Wild House Mice Inferred Using Population Genomic Data.

    PubMed

    Booker, Tom R; Ness, Rob W; Keightley, Peter D

    2017-09-01

    Characterizing variation in the rate of recombination across the genome is important for understanding several evolutionary processes. Previous analysis of the recombination landscape in laboratory mice has revealed that the different subspecies have different suites of recombination hotspots. It is unknown, however, whether hotspots identified in laboratory strains reflect the hotspot diversity of natural populations or whether broad-scale variation in the rate of recombination is conserved between subspecies. In this study, we constructed fine-scale recombination rate maps for a natural population of the Eastern house mouse, Mus musculus castaneus. We performed simulations to assess the accuracy of recombination rate inference in the presence of phase errors, and we used a novel approach to quantify phase error. The spatial distribution of recombination events is strongly positively correlated between our castaneus map and a map constructed using inbred lines derived predominantly from M. m. domesticus. Recombination hotspots in wild castaneus show little overlap, however, with the locations of double-strand breaks in wild-derived house mouse strains. Finally, we also find that genetic diversity in M. m. castaneus is positively correlated with the rate of recombination, consistent with pervasive natural selection operating in the genome. Our study suggests that recombination rate variation is conserved at broad scales between house mouse subspecies, but it is not strongly conserved at fine scales. Copyright © 2017 by the Genetics Society of America.

  11. A potential large and persistent black carbon forcing over Northern Pacific inferred from satellite observations.

    PubMed

    Li, Zhongshu; Liu, Junfeng; Mauzerall, Denise L; Li, Xiaoyuan; Fan, Songmiao; Horowitz, Larry W; He, Cenlin; Yi, Kan; Tao, Shu

    2017-03-07

    Black carbon (BC) aerosol strongly absorbs solar radiation, which warms climate. However, accurate estimation of BC's climate effect is limited by the uncertainties of its spatiotemporal distribution, especially over remote oceanic areas. The HIAPER Pole-to-Pole Observation (HIPPO) program from 2009 to 2011 intercepted multiple snapshots of BC profiles over Pacific in various seasons, and revealed a 2 to 5 times overestimate of BC by current global models. In this study, we compared the measurements from aircraft campaigns and satellites, and found a robust association between BC concentrations and satellite-retrieved CO, tropospheric NO₂, and aerosol optical depth (AOD) (R² > 0.8). This establishes a basis to construct a satellite-based column BC approximation (sBC*) over remote oceans. The inferred sBC* shows that Asian outflows in spring bring much more BC aerosols to the mid-Pacific than those occurring in other seasons. In addition, inter-annual variability of sBC* is seen over the Northern Pacific, with abundances varying consistently with the springtime Pacific/North American (PNA) index. Our sBC* dataset infers a widespread overestimation of BC loadings and BC Direct Radiative Forcing by current models over North Pacific, which further suggests that large uncertainties exist on aerosol-climate interactions over other remote oceanic areas beyond Pacific.

  12. Genealogical and evolutionary inference with the human Y chromosome.

    PubMed

    Stumpf, M P; Goldstein, D B

    2001-03-02

    Population genetics has emerged as a powerful tool for unraveling human history. In addition to the study of mitochondrial and autosomal DNA, attention has recently focused on Y-chromosome variation. Ambiguities and inaccuracies in data analysis, however, pose an important obstacle to further development of the field. Here we review the methods available for genealogical inference using Y-chromosome data. Approaches can be divided into those that do and those that do not use an explicit population model in genealogical inference. We describe the strengths and weaknesses of these model-based and model-free approaches, as well as difficulties associated with the mutation process that affect both methods. In the case of genealogical inference using microsatellite loci, we use coalescent simulations to show that relatively simple generalizations of the mutation process can greatly increase the accuracy of genealogical inference. Because model-free and model-based approaches have different biases and limitations, we conclude that there is considerable benefit in the continued use of both types of approaches.

  13. Linked 4-Way Multimodal Brain Differences in Schizophrenia in a Large Chinese Han Population.

    PubMed

    Liu, Shengfeng; Wang, Haiying; Song, Ming; Lv, Luxian; Cui, Yue; Liu, Yong; Fan, Lingzhong; Zuo, Nianming; Xu, Kaibin; Du, Yuhui; Yu, Qingbao; Luo, Na; Qi, Shile; Yang, Jian; Xie, Sangma; Li, Jian; Chen, Jun; Chen, Yunchun; Wang, Huaning; Guo, Hua; Wan, Ping; Yang, Yongfeng; Li, Peng; Lu, Lin; Yan, Hao; Yan, Jun; Wang, Huiling; Zhang, Hongxing; Zhang, Dai; Calhoun, Vince D; Jiang, Tianzi; Sui, Jing

    2018-04-20

    Multimodal fusion has been regarded as a promising tool to discover covarying patterns of multiple imaging types impaired in brain diseases, such as schizophrenia (SZ). In this article, we aim to investigate the covarying abnormalities underlying SZ in a large Chinese Han population (307 SZs, 298 healthy controls [HCs]). Four types of magnetic resonance imaging (MRI) features, including regional homogeneity (ReHo) from resting-state functional MRI, gray matter volume (GM) from structural MRI, fractional anisotropy (FA) from diffusion MRI, and functional network connectivity (FNC) resulted from group independent component analysis, were jointly analyzed by a data-driven multivariate fusion method. Results suggest that a widely distributed network disruption appears in SZ patients, with synchronous changes in both functional and structural regions, especially the basal ganglia network, salience network (SAN), and the frontoparietal network. Such a multimodal coalteration was also replicated in another independent Chinese sample (40 SZs, 66 HCs). Our results on auditory verbal hallucination (AVH) also provide evidence for the hypothesis that prefrontal hypoactivation and temporal hyperactivation in SZ may lead to failure of executive control and inhibition, which is relevant to AVH. In addition, impaired working memory performance was found associated with GM reduction and FA decrease in SZ in prefrontal and superior temporal area, in both discovery and replication datasets. In summary, by leveraging multiple imaging and clinical information into one framework to observe brain in multiple views, we can integrate multiple inferences about SZ from large-scale population and offer unique perspectives regarding the missing links between the brain function and structure that may not be achieved by separate unimodal analyses.

  14. Estimating trends in alligator populations from nightlight survey data

    USGS Publications Warehouse

    Fujisaki, Ikuko; Mazzotti, F.J.; Dorazio, R.M.; Rice, K.G.; Cherkiss, M.; Jeffery, B.

    2011-01-01

    Nightlight surveys are commonly used to evaluate status and trends of crocodilian populations, but imperfect detection caused by survey- and location-specific factors makes it difficult to draw population inferences accurately from uncorrected data. We used a two-stage hierarchical model comprising population abundance and detection probability to examine recent abundance trends of American alligators (Alligator mississippiensis) in subareas of Everglades wetlands in Florida using nightlight survey data. During 2001-2008, there were declining trends in abundance of small and/or medium sized animals in a majority of subareas, whereas abundance of large sized animals had either demonstrated an increased or unclear trend. For small and large sized class animals, estimated detection probability declined as water depth increased. Detection probability of small animals was much lower than for larger size classes. The declining trend of smaller alligators may reflect a natural population response to the fluctuating environment of Everglades wetlands under modified hydrology. It may have negative implications for the future of alligator populations in this region, particularly if habitat conditions do not favor recruitment of offspring in the near term. Our study provides a foundation to improve inferences made from nightlight surveys of other crocodilian populations. © 2011 US Government.
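
    The abstract above rests on separating true abundance from detection probability. The following is only a minimal sketch of that idea, not the authors' two-stage hierarchical model: it assumes each animal is seen independently with a known, constant probability, and the names `simulate_counts` and `corrected_abundance` are hypothetical helpers introduced here for illustration.

```python
import random


def simulate_counts(true_abundance, detection_prob, n_surveys, seed=0):
    """Simulate survey counts where each animal is detected independently.

    Assumption (not from the source): detection is a Bernoulli trial per
    animal with the same probability in every survey.
    """
    rng = random.Random(seed)
    return [
        sum(rng.random() < detection_prob for _ in range(true_abundance))
        for _ in range(n_surveys)
    ]


def corrected_abundance(counts, detection_prob):
    """Scale the mean raw count by the (assumed known) detection probability."""
    mean_count = sum(counts) / len(counts)
    return mean_count / detection_prob


if __name__ == "__main__":
    counts = simulate_counts(true_abundance=200, detection_prob=0.25, n_surveys=50)
    print("mean raw count:", sum(counts) / len(counts))
    print("corrected estimate:", corrected_abundance(counts, 0.25))
```

    In this toy setting the raw counts understate abundance by roughly the detection probability, which is why uncorrected nightlight counts can mislead. In practice detection probability is itself estimated, often as a function of covariates such as water depth.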

  15. Straightforward Inference of Ancestry and Admixture Proportions through Ancestry-Informative Insertion Deletion Multiplexing

    PubMed Central

    Pereira, Rui; Phillips, Christopher; Pinto, Nádia; Santos, Carla; dos Santos, Sidney Emanuel Batista; Amorim, António; Carracedo, Ángel; Gusmão, Leonor

    2012-01-01

    Ancestry-informative markers (AIMs) show high allele frequency divergence between different ancestral or geographically distant populations. These genetic markers are especially useful in inferring the likely ancestral origin of an individual or estimating the apportionment of ancestry components in admixed individuals or populations. The study of AIMs is of great interest in clinical genetics research, particularly to detect and correct for population substructure effects in case-control association studies, but also in population and forensic genetics studies. This work presents a set of 46 ancestry-informative insertion deletion polymorphisms selected to efficiently measure population admixture proportions of four different origins (African, European, East Asian and Native American). All markers are analyzed in short fragments (under 230 basepairs) through a single PCR followed by capillary electrophoresis (CE) allowing a very simple one tube PCR-to-CE approach. HGDP-CEPH diversity panel samples from the four groups, together with Oceanians, were genotyped to evaluate the efficiency of the assay in clustering populations from different continental origins and to establish reference databases. In addition, other populations from diverse geographic origins were tested using the HGDP-CEPH samples as reference data. The results revealed that the AIM-INDEL set developed is highly efficient at inferring the ancestry of individuals and provides good estimates of ancestry proportions at the population level. In conclusion, we have optimized the multiplexed genotyping of 46 AIM-INDELs in a simple and informative assay, enabling a more straightforward alternative to the commonly available AIM-SNP typing methods dependent on complex, multi-step protocols or implementation of large-scale genotyping technologies. PMID:22272242

  16. Low but significant genetic differentiation underlies biologically meaningful phenotypic divergence in a large Atlantic salmon population.

    PubMed

    Aykanat, Tutku; Johnston, Susan E; Orell, Panu; Niemelä, Eero; Erkinaro, Jaakko; Primmer, Craig R

    2015-10-01

    Despite decades of research assessing the genetic structure of natural populations, the biological meaning of low yet significant genetic divergence often remains unclear due to a lack of associated phenotypic and ecological information. At the same time, structured populations with low genetic divergence and overlapping boundaries can potentially provide excellent models to study adaptation and reproductive isolation in cases where high-resolution genetic markers and relevant phenotypic and life history information are available. Here, we combined single nucleotide polymorphism (SNP)-based population inference with extensive phenotypic and life history data to identify potential biological mechanisms driving fine-scale subpopulation differentiation in Atlantic salmon (Salmo salar) from the Teno River, a major salmon river in Europe. Two sympatrically occurring subpopulations had low but significant genetic differentiation (FST = 0.018) and displayed marked differences in the distribution of life history strategies, including variation in juvenile growth rate, age at maturity and size within age classes. Large, late-maturing individuals were virtually absent from one of the two subpopulations, and there were significant differences in juvenile growth rates and size at age after oceanic migration between individuals in the respective subpopulations. Our findings suggest that different evolutionary processes affect each subpopulation and that hybridization and subsequent selection may maintain low genetic differentiation without hindering adaptive divergence. © 2015 John Wiley & Sons Ltd.
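
    To give a sense of scale for an FST value such as the one reported above, here is a minimal sketch of Wright's FST for a single biallelic locus in two equally weighted subpopulations. This is an illustrative textbook formula, not necessarily the estimator the authors used, and the function name `fst_two_pops` and the example allele frequencies are assumptions.

```python
def fst_two_pops(p1: float, p2: float) -> float:
    """Wright's F_ST = (H_T - H_S) / H_T for one biallelic locus.

    p1, p2: frequency of the same allele in each subpopulation.
    Assumption: both subpopulations carry equal weight.
    """
    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity if the two subpopulations were pooled.
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    # Mean expected heterozygosity within subpopulations.
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_t == 0.0:
        return 0.0  # locus is monomorphic overall; no differentiation to measure
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    # Hypothetical frequencies: a 0.2 difference in allele frequency
    # around 0.5 already gives F_ST of about 0.04.
    print(fst_two_pops(0.4, 0.6))
```

    Under these assumptions, a value near 0.018 corresponds to fairly small allele-frequency differences at any one locus, which is consistent with the abstract's description of the differentiation as low but significant.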

  17. Causal inference as an emerging statistical approach in neurology: an example for epilepsy in the elderly.

    PubMed

    Moura, Lidia Mvr; Westover, M Brandon; Kwasnik, David; Cole, Andrew J; Hsu, John

    2017-01-01

    The elderly population faces an increasing number of cases of chronic neurological conditions, such as epilepsy and Alzheimer's disease. Because the elderly with epilepsy are commonly excluded from randomized controlled clinical trials, there are few rigorous studies to guide clinical practice. When the elderly are eligible for trials, they either rarely participate or frequently have poor adherence to therapy, thus limiting both generalizability and validity. In contrast, large observational data sets are increasingly available, but are susceptible to bias when using common analytic approaches. Recent developments in causal inference-analytic approaches also introduce the possibility of emulating randomized controlled trials to yield valid estimates. We provide a practical example of the application of the principles of causal inference to a large observational data set of patients with epilepsy. This review also provides a framework for comparative-effectiveness research in chronic neurological conditions.

  18. Inferring the economic standard of living and health from cohort height: Evidence from modern populations in developing countries.

    PubMed

    Akachi, Yoko; Canning, David

    2015-12-01

    Average adult height is a physical measure of the biological standard of living of a population. While the biological and economic standards of living of a population are very different concepts, they are linked and may empirically move together. If this is so, then cohort heights can also be used to make inferences about the economic standard of living and health of a population when other data are not available. We investigate how informative this approach is in terms of inferring income, nutrition, and mortality using data on heights from developing countries over the last 50 years for female cohorts born 1951-1992. We find no evidence that the absolute differences in adult height across countries are associated with different economic living standards. Within countries, however, faster increases in adult cohort height over time are associated with more rapid growth of GDP per capita, life expectancy, and nutritional intake. Using our instrumental variable approach, each centimeter gain in height is associated with a 6% increase in income per capita, a reduction in infant mortality of 7 per thousand (or an 1.25 year increase in life expectancy), and an increase in nutrition of 64 calories and 2 grams of protein per person per day relative to the global trend. We find that increases in cohort height can predict increases in income even for countries not used in the estimation of the relationship. This suggests our approach has predictive power out of sample for countries where we lack income and health data. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.

  19. Effects of Sample Selection Bias on the Accuracy of Population Structure and Ancestry Inference

    PubMed Central

    Shringarpure, Suyash; Xing, Eric P.

    2014-01-01

    Population stratification is an important task in genetic analyses. It provides information about the ancestry of individuals and can be an important confounder in genome-wide association studies. Public genotyping projects have made a large number of datasets available for study. However, practical constraints dictate that of a geographical/ethnic population, only a small number of individuals are genotyped. The resulting data are a sample from the entire population. If the distribution of sample sizes is not representative of the populations being sampled, the accuracy of population stratification analyses of the data could be affected. We attempt to understand the effect of biased sampling on the accuracy of population structure analysis and individual ancestry recovery. We examined two commonly used methods for analyses of such datasets, ADMIXTURE and EIGENSOFT, and found that the accuracy of recovery of population structure is affected to a large extent by the sample used for analysis and how representative it is of the underlying populations. Using simulated data and real genotype data from cattle, we show that sample selection bias can affect the results of population structure analyses. We develop a mathematical framework for sample selection bias in models for population structure and also proposed a correction for sample selection bias using auxiliary information about the sample. We demonstrate that such a correction is effective in practice using simulated and real data. PMID:24637351

  20. An algorithm for computing the gene tree probability under the multispecies coalescent and its application in the inference of population tree

    PubMed Central

    2016-01-01

    Motivation: Gene tree represents the evolutionary history of gene lineages that originate from multiple related populations. Under the multispecies coalescent model, lineages may coalesce outside the species (population) boundary. Given a species tree (with branch lengths), the gene tree probability is the probability of observing a specific gene tree topology under the multispecies coalescent model. There are two existing algorithms for computing the exact gene tree probability. The first algorithm is due to Degnan and Salter, where they enumerate all the so-called coalescent histories for the given species tree and the gene tree topology. Their algorithm runs in exponential time in the number of gene lineages in general. The second algorithm is the STELLS algorithm (2012), which is usually faster but also runs in exponential time in almost all the cases. Results: In this article, we present a new algorithm, called CompactCH, for computing the exact gene tree probability. This new algorithm is based on the notion of compact coalescent histories: multiple coalescent histories are represented by a single compact coalescent history. The key advantage of our new algorithm is that it runs in polynomial time in the number of gene lineages if the number of populations is fixed to be a constant. The new algorithm is more efficient than the STELLS algorithm both in theory and in practice when the number of populations is small and there are multiple gene lineages from each population. As an application, we show that CompactCH can be applied in the inference of population tree (i.e. the population divergence history) from population haplotypes. Simulation results show that the CompactCH algorithm enables efficient and accurate inference of population trees with much more haplotypes than a previous approach. Availability: The CompactCH algorithm is implemented in the STELLS software package, which is available for download at http

  1. Adaptation in Coding by Large Populations of Neurons in the Retina

    NASA Astrophysics Data System (ADS)

    Ioffe, Mark L.

    A comprehensive theory of neural computation requires an understanding of the statistical properties of the neural population code. The focus of this work is the experimental study and theoretical analysis of the statistical properties of neural activity in the tiger salamander retina. This is an accessible yet complex system, for which we control the visual input and record from a substantial portion--greater than a half--of the ganglion cell population generating the spiking output. Our experiments probe adaptation of the retina to visual statistics: a central feature of sensory systems which have to adjust their limited dynamic range to a far larger space of possible inputs. In Chapter 1 we place our work in context with a brief overview of the relevant background. In Chapter 2 we describe the experimental methodology of recording from 100+ ganglion cells in the tiger salamander retina. In Chapter 3 we first present the measurements of adaptation of individual cells to changes in stimulation statistics and then investigate whether pairwise correlations in fluctuations of ganglion cell activity change across different stimulation conditions. We then transition to a study of the population-level probability distribution of the retinal response captured with maximum-entropy models. Convergence of the model inference is presented in Chapter 4. In Chapter 5 we first test the empirical presence of a phase transition in such models fitting the retinal response to different experimental conditions, and then proceed to develop other characterizations which are sensitive to complexity in the interaction matrix. This includes an analysis of the dynamics of sampling at finite temperature, which demonstrates a range of subtle attractor-like properties in the energy landscape. These are largely conserved when ambient illumination is varied 1000-fold, a result not necessarily apparent from the measured low-order statistics of the distribution. Our results form a consistent

  2. Inferring Recent Demography from Isolation by Distance of Long Shared Sequence Blocks

    PubMed Central

    Ringbauer, Harald; Coop, Graham

    2017-01-01

    Recently it has become feasible to detect long blocks of nearly identical sequence shared between pairs of genomes. These identity-by-descent (IBD) blocks are direct traces of recent coalescence events and, as such, contain ample signal to infer recent demography. Here, we examine sharing of such blocks in two-dimensional populations with local migration. Using a diffusion approximation to trace genetic ancestry, we derive analytical formulas for patterns of isolation by distance of IBD blocks, which can also incorporate recent population density changes. We introduce an inference scheme that uses a composite-likelihood approach to fit these formulas. We then extensively evaluate our theory and inference method on a range of scenarios using simulated data. We first validate the diffusion approximation by showing that the theoretical results closely match the simulated block-sharing patterns. We then demonstrate that our inference scheme can accurately and robustly infer dispersal rate and effective density, as well as bounds on recent dynamics of population density. To demonstrate an application, we use our estimation scheme to explore the fit of a diffusion model to Eastern European samples in the Population Reference Sample data set. We show that ancestry diffusing with a rate of σ ≈ 50–100 km/gen during the last centuries, combined with accelerating population growth, can explain the observed exponential decay of block sharing with increasing pairwise sample distance. PMID:28108588

  3. Fast half-sibling population reconstruction: theory and algorithms.

    PubMed

    Dexter, Daniel; Brown, Daniel G

    2013-07-12

    Kinship inference is the task of identifying genealogically related individuals. Kinship information is important for determining mating structures, notably in endangered populations. Although many solutions exist for reconstructing full sibling relationships, few exist for half-siblings. We consider the problem of determining whether a proposed half-sibling population reconstruction is valid under Mendelian inheritance assumptions. We show that this problem is NP-complete and provide a 0/1 integer program that identifies the minimum number of individuals that must be removed from a population in order for the reconstruction to become valid. We also present SibJoin, a heuristic-based clustering approach based on Mendelian genetics, which is strikingly fast. The software is available at http://github.com/ddexter/SibJoin.git+. Our SibJoin algorithm is reasonably accurate and thousands of times faster than existing algorithms. The heuristic is used to infer a half-sibling structure for a population which was, until recently, too large to evaluate.

  4. A potential large and persistent black carbon forcing over Northern Pacific inferred from satellite observations

    PubMed Central

    Li, Zhongshu; Liu, Junfeng; Mauzerall, Denise L.; Li, Xiaoyuan; Fan, Songmiao; Horowitz, Larry W.; He, Cenlin; Yi, Kan; Tao, Shu

    2017-01-01

    Black carbon (BC) aerosol strongly absorbs solar radiation, which warms climate. However, accurate estimation of BC’s climate effect is limited by the uncertainties of its spatiotemporal distribution, especially over remote oceanic areas. The HIAPER Pole-to-Pole Observation (HIPPO) program from 2009 to 2011 intercepted multiple snapshots of BC profiles over Pacific in various seasons, and revealed a 2 to 5 times overestimate of BC by current global models. In this study, we compared the measurements from aircraft campaigns and satellites, and found a robust association between BC concentrations and satellite-retrieved CO, tropospheric NO2, and aerosol optical depth (AOD) (R2 > 0.8). This establishes a basis to construct a satellite-based column BC approximation (sBC*) over remote oceans. The inferred sBC* shows that Asian outflows in spring bring much more BC aerosols to the mid-Pacific than those occurring in other seasons. In addition, inter-annual variability of sBC* is seen over the Northern Pacific, with abundances varying consistently with the springtime Pacific/North American (PNA) index. Our sBC* dataset infers a widespread overestimation of BC loadings and BC Direct Radiative Forcing by current models over North Pacific, which further suggests that large uncertainties exist on aerosol-climate interactions over other remote oceanic areas beyond Pacific. PMID:28266532

  5. A potential large and persistent black carbon forcing over Northern Pacific inferred from satellite observations

    NASA Astrophysics Data System (ADS)

    Li, Zhongshu; Liu, Junfeng; Mauzerall, Denise L.; Li, Xiaoyuan; Fan, Songmiao; Horowitz, Larry W.; He, Cenlin; Yi, Kan; Tao, Shu

    2017-03-01

    Black carbon (BC) aerosol strongly absorbs solar radiation, which warms climate. However, accurate estimation of BC’s climate effect is limited by the uncertainties of its spatiotemporal distribution, especially over remote oceanic areas. The HIAPER Pole-to-Pole Observation (HIPPO) program from 2009 to 2011 intercepted multiple snapshots of BC profiles over Pacific in various seasons, and revealed a 2 to 5 times overestimate of BC by current global models. In this study, we compared the measurements from aircraft campaigns and satellites, and found a robust association between BC concentrations and satellite-retrieved CO, tropospheric NO2, and aerosol optical depth (AOD) (R2 > 0.8). This establishes a basis to construct a satellite-based column BC approximation (sBC*) over remote oceans. The inferred sBC* shows that Asian outflows in spring bring much more BC aerosols to the mid-Pacific than those occurring in other seasons. In addition, inter-annual variability of sBC* is seen over the Northern Pacific, with abundances varying consistently with the springtime Pacific/North American (PNA) index. Our sBC* dataset infers a widespread overestimation of BC loadings and BC Direct Radiative Forcing by current models over North Pacific, which further suggests that large uncertainties exist on aerosol-climate interactions over other remote oceanic areas beyond Pacific.

  6. Reconstructing population histories from single nucleotide polymorphism data.

    PubMed

    Sirén, Jukka; Marttinen, Pekka; Corander, Jukka

    2011-01-01

    Population genetics encompasses a strong theoretical and applied research tradition on the multiple demographic processes that shape genetic variation present within a species. When several distinct populations exist in the current generation, it is often natural to consider the pattern of their divergence from a single ancestral population in terms of a binary tree structure. Inference about such population histories based on molecular data has been an intensive research topic in the recent years. The most common approach uses coalescent theory to model genealogies of individuals sampled from the current populations. Such methods are able to compare several different evolutionary scenarios and to estimate demographic parameters. However, their major limitation is the enormous computational complexity associated with the indirect modeling of the demographies, which limits the application to small data sets. Here, we propose a novel Bayesian method for inferring population histories from unlinked single nucleotide polymorphisms, which is applicable also to data sets harboring large numbers of individuals from distinct populations. We use an approximation to the neutral Wright-Fisher diffusion to model random fluctuations in allele frequencies. The population histories are modeled as binary rooted trees that represent the historical order of divergence of the different populations. A combination of analytical, numerical, and Monte Carlo integration techniques are utilized for the inferences. A particularly important feature of our approach is that it provides intuitive measures of statistical uncertainty related with the estimates computed, which may be entirely lacking for the alternative methods in this context. The potential of our approach is illustrated by analyses of both simulated and real data sets.

  7. RENT+: an improved method for inferring local genealogical trees from haplotypes with recombination

    PubMed Central

    Mirzaei, Sajad; Wu, Yufeng

    2017-01-01

    Motivation: Haplotypes from one or multiple related populations share a common genealogical history. If this shared genealogy can be inferred from haplotypes, it can be very useful for many population genetics problems. However, with the presence of recombination, the genealogical history of haplotypes is complex and cannot be represented by a single genealogical tree. Therefore, inference of genealogical history with recombination is much more challenging than the case of no recombination. Results: In this paper, we present a new approach called RENT+ for the inference of local genealogical trees from haplotypes with the presence of recombination. RENT+ builds on a previous genealogy inference approach called RENT, which infers a set of related genealogical trees at different genomic positions. RENT+ represents a significant improvement over RENT in the sense that it is more effective in extracting information contained in the haplotype data about the underlying genealogy than RENT. The key components of RENT+ are several greatly enhanced genealogy inference rules. Through simulation, we show that RENT+ is more efficient and accurate than several existing genealogy inference methods. As an application, we apply RENT+ in the inference of population demographic history from haplotypes, which outperforms several existing methods. Availability and Implementation: RENT+ is implemented in Java, and is freely available for download from: https://github.com/SajadMirzaei/RentPlus. Contacts: sajad@engr.uconn.edu or ywu@engr.uconn.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28065901

  8. Show Me the Pragmatic Contribution: A Developmental Investigation of Contrastive Inference

    ERIC Educational Resources Information Center

    Kronmuller, Edmundo; Morisseau, Tiffany; Noveck, Ira A.

    2014-01-01

    An utterance such as "Show me the large rabbit" potentially generates a "contrastive inference," i.e., the article "the" and the adjective "large" allow listeners to pragmatically infer the existence of other entities having the same noun (e.g. a "small" rabbit). The primary way to measure…

  9. Demographic inference under the coalescent in a spatial continuum.

    PubMed

    Guindon, Stéphane; Guo, Hongbin; Welch, David

    2016-10-01

    Understanding population dynamics from the analysis of molecular and spatial data requires sound statistical modeling. Current approaches assume that populations are naturally partitioned into discrete demes, thereby failing to be relevant in cases where individuals are scattered on a spatial continuum. Other models predict the formation of increasingly tight clusters of individuals in space, which, again, conflicts with biological evidence. Building on recent theoretical work, we introduce a new genealogy-based inference framework that alleviates these issues. This approach effectively implements a stochastic model in which the distribution of individuals is homogeneous and stationary, thereby providing a relevant null model for the fluctuation of genetic diversity in time and space. Importantly, the spatial density of individuals in a population and their range of dispersal during the course of evolution are two parameters that can be inferred separately with this method. The validity of the new inference framework is confirmed with extensive simulations and the analysis of influenza sequences collected over five seasons in the USA. Copyright © 2016 Elsevier Inc. All rights reserved.

  10. Surfing among species, populations and morphotypes: Inferring boundaries between two species of new world silversides (Atherinopsidae).

    PubMed

    González-Castro, Mariano; Rosso, Juan José; Mabragaña, Ezequiel; Díaz de Astarloa, Juan Martín

    2016-01-01

    Atherinopsidae are widespread freshwater and shallow marine fish with singular economic importance. Morphological, genetical and life cycles differences between marine and estuarine populations were already reported in this family, suggesting ongoing speciation. Also, coexistence and interbreeding between closely related species were documented. The aim of this study was to infer boundaries among: (A) Odontesthes bonariensis and O. argentinensis at species level, and intermediate morphs; (B) the population of O. argentinensis of Mar Chiquita Lagoon and its marine conspecifics. To achieve this, we integrated, meristic, Geometrics Morphometrics and DNA Barcode approaches. Four groups were discriminated and subsequently characterized according to their morphological traits, shape and meristic characters. No shared haplotypes between O. bonariensis and O. argentinensis were found. Significative-meristic and body shape differences between the Mar Chiquita and marine individuals of O. argentinensis were found, suggesting they behave as well differentiated populations, or even incipient ecological species. The fact that the Odontesthes morphotypes shared haplotypes with both, O. argentinensis and O. bonariensis, but also possess meristic and morphometric distinctive traits open new questions related to the origin of this morphogroup. Copyright © 2015 Académie des sciences. Published by Elsevier SAS. All rights reserved.

  11. Inferring the Mode of Selection from the Transient Response to Demographic Perturbations

    NASA Astrophysics Data System (ADS)

    Balick, Daniel; Do, Ron; Reich, David; Sunyaev, Shamil

    2014-03-01

    Despite substantial recent progress in theoretical population genetics, most models work under the assumption of a constant population size. Deviations from fixed population sizes are ubiquitous in natural populations, many of which experience population bottlenecks and re-expansions. The non-equilibrium dynamics introduced by a large perturbation in population size are generally viewed as a confounding factor. In the present work, we take advantage of the transient response to a population bottleneck to infer features of the mode of selection and the distribution of selective effects. We develop an analytic framework and a corresponding statistical test that qualitatively differentiates between alleles under additive and those under recessive or more general epistatic selection. This statistic can be used to bound the joint distribution of selective effects and dominance effects in any diploid sexual organism. We apply this technique to human population genetic data, and severely restrict the space of allowed selective coefficients in humans. Additionally, one can test a set of functionally or medically relevant alleles for the primary mode of selection, or determine the local regional variation in dominance coefficients along the genome.

  12. Automatic physical inference with information maximizing neural networks

    NASA Astrophysics Data System (ADS)

    Charnock, Tom; Lavaux, Guilhem; Wandelt, Benjamin D.

    2018-04-01

    Compressing large data sets to a manageable number of summaries that are informative about the underlying parameters vastly simplifies both frequentist and Bayesian inference. When only simulations are available, these summaries are typically chosen heuristically, so they may inadvertently miss important information. We introduce a simulation-based machine learning technique that trains artificial neural networks to find nonlinear functionals of data that maximize Fisher information: information maximizing neural networks (IMNNs). In test cases where the posterior can be derived exactly, likelihood-free inference based on automatically derived IMNN summaries produces nearly exact posteriors, showing that these summaries are good approximations to sufficient statistics. In a series of numerical examples of increasing complexity and astrophysical relevance we show that IMNNs are robustly capable of automatically finding optimal, nonlinear summaries of the data even in cases where linear compression fails: inferring the variance of Gaussian signal in the presence of noise, inferring cosmological parameters from mock simulations of the Lyman-α forest in quasar spectra, and inferring frequency-domain parameters from LISA-like detections of gravitational waveforms. In this final case, the IMNN summary outperforms linear data compression by avoiding the introduction of spurious likelihood maxima. We anticipate that the automatic physical inference method described in this paper will be essential to obtain both accurate and precise cosmological parameter estimates from complex and large astronomical data sets, including those from LSST and Euclid.

  13. Single nucleotide polymorphism coverage and inference of N-acetyltransferase-2 acetylator phenotypes in worldwide population groups.

    PubMed

    Suarez-Kurtz, Guilherme; Fuchshuber-Moraes, Mateus; Struchiner, Claudio J; Parra, Esteban J

    2016-08-01

    Several algorithms have been proposed to reduce the genotyping effort and cost, while retaining the accuracy of N-acetyltransferase-2 (NAT2) phenotype prediction. Data from the 1000 Genomes (1KG) project and an admixed cohort of Black Brazilians were used to assess the accuracy of NAT2 phenotype prediction using algorithms based on paired single nucleotide polymorphisms (SNPs) (rs1041983 and rs1801280) or a tag SNP (rs1495741). NAT2 haplotypes comprising SNPs rs1801279, rs1041983, rs1801280, rs1799929, rs1799930, rs1208 and rs1799931 were assigned according to the arylamine N-acetyltransferases database. Contingency tables were used to visualize the agreement between the NAT2 acetylator phenotypes on the basis of these haplotypes versus phenotypes inferred by the prediction algorithms. The paired and tag SNP algorithms provided more than 96% agreement with the 7-SNP derived phenotypes in Europeans, East Asians, South Asians and Admixed Americans, but discordance of phenotype prediction occurred in 30.2 and 24.8% of 1KG Africans and in 14.4 and 18.6% of Black Brazilians, respectively. Paired SNP panel misclassification occurs in carriers of NAT2 haplotypes *13A (282T alone), *12B (282T and 803G), *6B (590A alone) and *14A (191A alone), whereas haplotype *14, defined by the 191A allele, is the major culprit of misclassification by the tag allele. Both the paired SNP and the tag SNP algorithms may be used, with economy of scale, to infer NAT2 acetylator phenotypes, including the ultra-slow phenotype, in European, East Asian, South Asian and American populations represented in the 1KG cohort. Both algorithms, however, perform poorly in populations of predominant African descent, including admixed African-Americans, African Caribbeans and Black Brazilians.

  14. DESCARTES' RULE OF SIGNS AND THE IDENTIFIABILITY OF POPULATION DEMOGRAPHIC MODELS FROM GENOMIC VARIATION DATA.

    PubMed

    Bhaskar, Anand; Song, Yun S

    2014-01-01

    The sample frequency spectrum (SFS) is a widely-used summary statistic of genomic variation in a sample of homologous DNA sequences. It provides a highly efficient dimensional reduction of large-scale population genomic data and its mathematical dependence on the underlying population demography is well understood, thus enabling the development of efficient inference algorithms. However, it has been recently shown that very different population demographies can actually generate the same SFS for arbitrarily large sample sizes. Although in principle this nonidentifiability issue poses a thorny challenge to statistical inference, the population size functions involved in the counterexamples are arguably not so biologically realistic. Here, we revisit this problem and examine the identifiability of demographic models under the restriction that the population sizes are piecewise-defined where each piece belongs to some family of biologically-motivated functions. Under this assumption, we prove that the expected SFS of a sample uniquely determines the underlying demographic model, provided that the sample is sufficiently large. We obtain a general bound on the sample size sufficient for identifiability; the bound depends on the number of pieces in the demographic model and also on the type of population size function in each piece. In the cases of piecewise-constant, piecewise-exponential and piecewise-generalized-exponential models, which are often assumed in population genomic inferences, we provide explicit formulas for the bounds as simple functions of the number of pieces. Lastly, we obtain analogous results for the "folded" SFS, which is often used when there is ambiguity as to which allelic type is ancestral. Our results are proved using a generalization of Descartes' rule of signs for polynomials to the Laplace transform of piecewise continuous functions.
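
    To make the summary statistic in the abstract above concrete, the following minimal Python sketch computes an unfolded and a folded SFS from a small 0/1 haplotype matrix (1 marks the derived allele). The function names and the toy data are hypothetical illustrations and are not taken from the paper; the sketch only shows how the statistic is tabulated, not the identifiability results.

```python
from collections import Counter


def sample_frequency_spectrum(haplotypes):
    """Unfolded SFS of n aligned 0/1 sequences.

    Entry i-1 counts the sites at which exactly i sequences carry the
    derived allele, for 1 <= i <= n-1 (monomorphic sites are ignored).
    """
    n = len(haplotypes)
    num_sites = len(haplotypes[0])
    derived_counts = Counter(
        sum(h[site] for h in haplotypes) for site in range(num_sites)
    )
    return [derived_counts.get(i, 0) for i in range(1, n)]


def fold(sfs):
    """Folded SFS: pool classes i and n-i, for use when the ancestral
    allele is unknown. Entry i-1 counts sites with minor-allele count i."""
    n = len(sfs) + 1
    folded = []
    for i in range(1, n // 2 + 1):
        j = n - i
        folded.append(sfs[i - 1] if i == j else sfs[i - 1] + sfs[j - 1])
    return folded


# Toy data (hypothetical): 4 sequences, 5 sites.
toy_haplotypes = [
    [0, 1, 0, 1, 1],
    [0, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
toy_sfs = sample_frequency_spectrum(toy_haplotypes)
toy_folded = fold(toy_sfs)
```

    In this toy matrix the third site is monomorphic and therefore contributes to neither spectrum.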

  15. RENT+: an improved method for inferring local genealogical trees from haplotypes with recombination.

    PubMed

    Mirzaei, Sajad; Wu, Yufeng

    2017-04-01

    Motivation: Haplotypes from one or multiple related populations share a common genealogical history. If this shared genealogy can be inferred from haplotypes, it can be very useful for many population genetics problems. However, with the presence of recombination, the genealogical history of haplotypes is complex and cannot be represented by a single genealogical tree. Therefore, inference of genealogical history with recombination is much more challenging than the case of no recombination. Results: In this paper, we present a new approach called RENT+ for the inference of local genealogical trees from haplotypes with the presence of recombination. RENT+ builds on a previous genealogy inference approach called RENT, which infers a set of related genealogical trees at different genomic positions. RENT+ represents a significant improvement over RENT in the sense that it is more effective in extracting information contained in the haplotype data about the underlying genealogy than RENT. The key components of RENT+ are several greatly enhanced genealogy inference rules. Through simulation, we show that RENT+ is more efficient and accurate than several existing genealogy inference methods. As an application, we apply RENT+ in the inference of population demographic history from haplotypes, which outperforms several existing methods. Availability and implementation: RENT+ is implemented in Java, and is freely available for download from: https://github.com/SajadMirzaei/RentPlus. Contact: sajad@engr.uconn.edu or ywu@engr.uconn.edu. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.

  16. Ancestral inference from haplotypes and mutations.

    PubMed

    Griffiths, Robert C; Tavaré, Simon

    2018-04-25

    We consider inference about the history of a sample of DNA sequences, conditional upon the haplotype counts and the number of segregating sites observed at the present time. After deriving some theoretical results in the coalescent setting, we implement rejection sampling and importance sampling schemes to perform the inference. The importance sampling scheme addresses an extension of the Ewens Sampling Formula for a configuration of haplotypes and the number of segregating sites in the sample. The implementations include both constant and variable population size models. The methods are illustrated by two human Y chromosome datasets. Copyright © 2018. Published by Elsevier Inc.

  17. Approximate Bayesian computation in large-scale structure: constraining the galaxy-halo connection

    NASA Astrophysics Data System (ADS)

    Hahn, ChangHoon; Vakili, Mohammadjavad; Walsh, Kilian; Hearin, Andrew P.; Hogg, David W.; Campbell, Duncan

    2017-08-01

    Standard approaches to Bayesian parameter inference in large-scale structure assume a Gaussian functional form (chi-squared form) for the likelihood. This assumption, in detail, cannot be correct. Likelihood-free inference methods such as approximate Bayesian computation (ABC) relax these restrictions and make inference possible without making any assumptions on the likelihood. Instead, ABC relies on a forward generative model of the data and a metric for measuring the distance between the model and data. In this work, we demonstrate that ABC is feasible for LSS parameter inference by using it to constrain parameters of the halo occupation distribution (HOD) model for populating dark matter haloes with galaxies. Using a specific implementation of ABC supplemented with population Monte Carlo importance sampling, a generative forward model using HOD and a distance metric based on galaxy number density, two-point correlation function and galaxy group multiplicity function, we constrain the HOD parameters of a mock observation generated from selected 'true' HOD parameters. The parameter constraints we obtain from ABC are consistent with the 'true' HOD parameters, demonstrating that ABC can be reliably used for parameter inference in LSS. Furthermore, we compare our ABC constraints to constraints we obtain using a pseudo-likelihood function of Gaussian form with MCMC and find consistent HOD parameter constraints. Ultimately, our results suggest that ABC can and should be applied in parameter inference for LSS analyses.
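
    The two ingredients named in the abstract above, a forward generative model and a distance between simulated and observed summaries, can be illustrated with a minimal Python sketch. Note that this is plain rejection ABC on a toy Gaussian-mean problem, not the population Monte Carlo variant or the HOD forward model used by the authors; every name, prior and tolerance below is a hypothetical choice made for illustration.

```python
import random
import statistics


def rejection_abc(observed_summary, simulate, prior_sample, distance,
                  epsilon, num_draws, rng):
    """Keep prior draws whose simulated summary lies within epsilon
    of the observed summary (plain rejection ABC)."""
    accepted = []
    for _ in range(num_draws):
        theta = prior_sample(rng)
        if distance(simulate(theta, rng), observed_summary) <= epsilon:
            accepted.append(theta)
    return accepted


# Toy problem (assumed): infer the mean of a unit-variance Gaussian,
# using the mean of 30 simulated points as the summary statistic.
rng = random.Random(0)
posterior_draws = rejection_abc(
    observed_summary=2.0,
    simulate=lambda theta, r: statistics.fmean(
        r.gauss(theta, 1.0) for _ in range(30)
    ),
    prior_sample=lambda r: r.uniform(-5.0, 5.0),
    distance=lambda a, b: abs(a - b),
    epsilon=0.1,
    num_draws=10000,
    rng=rng,
)
posterior_mean = statistics.fmean(posterior_draws)
```

    Shrinking the tolerance brings the accepted draws closer to the exact posterior at the cost of a lower acceptance rate, which is the inefficiency that sequential schemes such as population Monte Carlo are designed to reduce.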

  18. Mutation load and the extinction of large populations

    NASA Astrophysics Data System (ADS)

    Bernardes, A. T.

    1996-02-01

    In the time evolution of finite populations, the accumulation of harmful mutations in further generations might lead to a temporal decay in the mean fitness of the whole population that, after sufficient time, would reduce population size and so lead to extinction. This joint action of mutation load and population reduction is called Mutational Meltdown and is usually considered only to occur in small asexual or very small sexual populations. However, the problem of extinction cannot be discussed in a proper way if one previously assumes the existence of an equilibrium state, as initially discussed in this paper. By performing simulations in a genetically inspired model for time-changing populations, we show that mutational meltdown also occurs in large asexual populations and that the mean time to extinction is a nonmonotonic function of the selection coefficient. The stochasticity of the extinction process is also discussed. The extinction of small sexual populations (N ∼ 700) is shown and our results confirm the assumption that the existence of recombination might be a powerful mechanism to avoid extinction.

  19. Application of a time-dependent coalescence process for inferring the history of population size changes from DNA sequence data.

    PubMed

    Polanski, A; Kimmel, M; Chakraborty, R

    1998-05-12

    Distribution of pairwise differences of nucleotides from data on a sample of DNA sequences from a given segment of the genome has been used in the past to draw inferences about the past history of population size changes. However, all earlier methods assume a given model of population size changes (such as sudden expansion), parameters of which (e.g., time and amplitude of expansion) are fitted to the observed distributions of nucleotide differences among pairwise comparisons of all DNA sequences in the sample. Our theory indicates that for any time-dependent population size, N(tau) (in which time tau is counted backward from present), a time-dependent coalescence process yields the distribution, p(tau), of the time of coalescence between two DNA sequences randomly drawn from the population. Prediction of p(tau) and N(tau) requires the use of a reverse Laplace transform known to be unstable. Nevertheless, simulated data obtained from three models of monotone population change (stepwise, exponential, and logistic) indicate that the pattern of a past population size change leaves its signature on the pattern of DNA polymorphism. Application of the theory to the published mtDNA sequences indicates that the current mtDNA sequence variation is not inconsistent with a logistic growth of the human population.
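
    The relationship between N(tau) and p(tau) mentioned in the abstract above can be sketched numerically. The Python example below assumes the standard pairwise coalescent form p(tau) = (1/N(tau)) * exp(-integral from 0 to tau of du/N(u)), with tau measured in generations and N the number of gene copies; this scaling and all names are assumptions made for illustration and may differ from the paper's own parametrization. It covers only the forward direction (from N to p), not the unstable inversion the authors discuss.

```python
import math


def pairwise_coalescence_density(pop_size, taus):
    """Evaluate p(tau) on an increasing grid that starts at tau = 0.

    pop_size: function giving N(tau), with tau counted backward from
              the present.
    The cumulative coalescence hazard is integrated with the
    trapezoid rule along the grid.
    """
    density = []
    cumulative_hazard = 0.0
    prev_tau = taus[0]
    prev_rate = 1.0 / pop_size(prev_tau)
    for tau in taus:
        rate = 1.0 / pop_size(tau)
        cumulative_hazard += 0.5 * (rate + prev_rate) * (tau - prev_tau)
        density.append(rate * math.exp(-cumulative_hazard))
        prev_tau, prev_rate = tau, rate
    return density


# Sanity check with a constant population of 100 gene copies, for which
# p(tau) reduces to an exponential density with mean 100 generations.
taus = [float(t) for t in range(0, 2001)]
constant_density = pairwise_coalescence_density(lambda tau: 100.0, taus)
total_probability = sum(
    0.5 * (constant_density[i] + constant_density[i + 1])
    * (taus[i + 1] - taus[i])
    for i in range(len(taus) - 1)
)
```

    Passing a different function for pop_size, for example one that shrinks backward in time, shows how a past expansion shifts the coalescence-time distribution toward the growth period.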

  20. Accurate HLA type inference using a weighted similarity graph.

    PubMed

    Xie, Minzhu; Li, Jing; Jiang, Tao

    2010-12-14

    The human leukocyte antigen system (HLA) contains many highly variable genes. HLA genes play an important role in the human immune system, and HLA gene matching is crucial for the success of human organ transplantations. Numerous studies have demonstrated that variation in HLA genes is associated with many autoimmune, inflammatory and infectious diseases. However, typing HLA genes by serology or PCR is time consuming and expensive, which limits large-scale studies involving HLA genes. Since it is much easier and cheaper to obtain single nucleotide polymorphism (SNP) genotype data, accurate computational algorithms to infer HLA gene types from SNP genotype data are in need. To infer HLA types from SNP genotypes, the first step is to infer SNP haplotypes from genotypes. However, for the same SNP genotype data set, the haplotype configurations inferred by different methods are usually inconsistent, and it is often difficult to decide which one is true. In this paper, we design an accurate HLA gene type inference algorithm by utilizing SNP genotype data from pedigrees, known HLA gene types of some individuals and the relationship between inferred SNP haplotypes and HLA gene types. Given a set of haplotypes inferred from the genotypes of a population consisting of many pedigrees, the algorithm first constructs a weighted similarity graph based on a new haplotype similarity measure and derives constraint edges from known HLA gene types. Based on the principle that different HLA gene alleles should have different background haplotypes, the algorithm searches for an optimal labeling of all the haplotypes with unknown HLA gene types such that the total weight among the same HLA gene types is maximized. To deal with ambiguous haplotype solutions, we use a genetic algorithm to select haplotype configurations that tend to maximize the same optimization criterion. Our experiments on a previously typed subset of the HapMap data show that the algorithm is highly accurate

  1. Clonal population structure of Legionella pneumophila inferred from allelic profiling.

    PubMed

    Edwards, Martin T; Fry, Norman K; Harrison, Timothy G

    2008-03-01

    The population structure of Legionella pneumophila was investigated by analysing nucleotide sequences from six loci (flaA, pilE, asd, mip, mompS and proA) of 335 globally distributed isolates from clinical and environmental sources over a 29-year period (1977-2006). Data were obtained from unrelated isolates from Europe (n=270), Japan (n=31), Canada (n=7), the USA (n=24) and Australia (n=1). The country of origin of two strains was unknown. Analysis of these isolates indicated significant linkage disequilibrium between the six loci. Application of six sequence-based recombination detection tests did not reveal evidence of recombination, but estimates of rates of recombination and mutation made by a seventh test suggested that recombination could have occurred at a rate similar to, but probably lower than, that of mutation. Genealogies inferred under models with and without recombination were congruent with each other, providing no definitive evidence regarding recombination, and were in agreement with sequence clusters identified by graph methods. Further evidence supporting the distinct nature of two of the three subspecies of L. pneumophila, subsp. fraseri and subsp. pascullei, was also found. The ratios of non-synonymous to synonymous nucleotide polymorphisms for each of the allele sets were examined and revealed that the putative virulence loci mompS and pilE are under diversifying pressure, while the allelic regions of three other loci linked to virulence (flaA, proA and mip) do not appear to be.

  2. Inferences of Recent and Ancient Human Population History Using Genetic and Non-Genetic Data

    ERIC Educational Resources Information Center

    Kitchen, Andrew

    2008-01-01

    I have adopted complementary approaches to inferring human demographic history utilizing human and non-human genetic data as well as cultural data. These complementary approaches form an interdisciplinary perspective that allows one to make inferences of human history at varying timescales, from the events that occurred tens of thousands of years…

  3. Life-History Traits of the Miocene Hipparion concudense (Spain) Inferred from Bone Histological Structure

    PubMed Central

    Martinez-Maza, Cayetana; Alberdi, Maria Teresa; Nieto-Diaz, Manuel; Prado, José Luis

    2014-01-01

    Histological analyses of fossil bones have provided clues on the growth patterns and life history traits of several extinct vertebrates that would be unavailable for classical morphological studies. We analyzed the bone histology of Hipparion to infer features of its life history traits and growth pattern. Microscope analysis of thin sections of a large sample of humeri, femora, tibiae and metapodials of Hipparion concudense from the upper Miocene site of Los Valles de Fuentidueña (Segovia, Spain) has shown that the number of growth marks is similar among the different limb bones, suggesting that equivalent skeletochronological inferences for this Hipparion population might be achieved by means of any of the elements studied. Considering their abundance, we conducted a skeletochronological study based on the large sample of third metapodials from Los Valles de Fuentidueña together with another large sample from the Upper Miocene locality of Concud (Teruel, Spain). The data obtained enabled us to distinguish four age groups in both samples and to determine that Hipparion concudense tended to reach skeletal maturity during its third year of life. Integration of bone microstructure and skeletochronological data allowed us to identify ontogenetic changes in bone structure and growth rate and to distinguish three histologic ontogenetic stages corresponding to immature, subadult and adult individuals. Data on secondary osteon density revealed an increase in bone remodeling throughout the ontogenetic stages and a lesser degree thereof in the Concud population, which indicates different biomechanical stresses in the two populations, likely due to environmental differences. Several individuals showed atypical growth patterns in the Concud sample, which may also reflect environmental differences between the two localities. Finally, classification of the specimens’ age within groups enabled us to characterize the age structure of both samples, which is typical of

  4. Inferring personal economic status from social network location

    NASA Astrophysics Data System (ADS)

    Luo, Shaojun; Morone, Flaviano; Sarraute, Carlos; Travizano, Matías; Makse, Hernán A.

    2017-05-01

    It is commonly believed that patterns of social ties affect individuals' economic status. Here we translate this concept into an operational definition at the network level, which allows us to infer the economic well-being of individuals through a measure of their location and influence in the social network. We analyse two large-scale sources: telecommunications and financial data of a whole country's population. Our results show that an individual's location, measured as the optimal collective influence to the structural integrity of the social network, is highly correlated with personal economic status. The observed social network patterns of influence mimic the patterns of economic inequality. For pragmatic use and validation, we carry out a marketing campaign that shows a threefold increase in response rate by targeting individuals identified by our social network metrics as compared to random targeting. Our strategy can also be useful in maximizing the effects of large-scale economic stimulus policies.

  5. Inferring personal economic status from social network location.

    PubMed

    Luo, Shaojun; Morone, Flaviano; Sarraute, Carlos; Travizano, Matías; Makse, Hernán A

    2017-05-16

    It is commonly believed that patterns of social ties affect individuals' economic status. Here we translate this concept into an operational definition at the network level, which allows us to infer the economic well-being of individuals through a measure of their location and influence in the social network. We analyse two large-scale sources: telecommunications and financial data of a whole country's population. Our results show that an individual's location, measured as the optimal collective influence to the structural integrity of the social network, is highly correlated with personal economic status. The observed social network patterns of influence mimic the patterns of economic inequality. For pragmatic use and validation, we carry out a marketing campaign that shows a threefold increase in response rate by targeting individuals identified by our social network metrics as compared to random targeting. Our strategy can also be useful in maximizing the effects of large-scale economic stimulus policies.

  6. The first large population based twin study of coeliac disease

    PubMed Central

    Greco, L; Romino, R; Coto, I; Di Cosmo, N; Percopo, S; Maglio, M; Paparo, F; Gasperi, V; Limongelli, M G; Cotichini, R; D'Agate, C; Tinto, N; Sacchetti, L; Tosi, R; Stazi, M A

    2002-01-01

    Background and aims: The genetic load in coeliac disease has hitherto been inferred from case series or anecdotally referred twin pairs. We have evaluated the genetic component in coeliac disease by estimating the concordance rate for the disease among twin pairs in a large population based study. Methods: The Italian Twin Registry was matched with the membership lists of a patient support group. Forty seven twin pairs were recruited and screened for antiendomysial (EMA) and antihuman-tissue transglutaminase (anti-tTG) antibodies; zygosity was verified by DNA fingerprinting and twins were typed for HLA class II DRB1 and DQB1 molecules. Results: Concordance rates for coeliac disease differ significantly between monozygotic (MZ) (0.86 probandwise and 0.75 pairwise) and dizygotic (DZ) (0.20 probandwise and 0.11 pairwise) twins. This is the highest concordance so far reported for a multifactorial disease. A logistic regression model, adjusted for age, sex, number of shared HLA haplotypes, and zygosity, showed that genotypes DQA1*0501/DQB1*0201 and DQA1*0301/DQB1*0302 (encoding for heterodimers DQ2 and DQ8, respectively) conferred to the non-index twin a risk of contracting the disease of 3.3 and 1.4, respectively. The risk of being concordant for coeliac disease estimated for the non-index twin of MZ pairs was 17 (95% confidence interval 2.1–134), independent of the DQ at risk genotype. Conclusion: This study provides substantial evidence for a very strong genetic component in coeliac disease, which is only partially due to the HLA region. PMID:11950806
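
    The two concordance measures quoted in the abstract above can be written out as a short Python sketch. It assumes the usual definitions under complete ascertainment, with C concordant and D discordant pairs: pairwise = C / (C + D) and probandwise = 2C / (2C + D). The pair counts in the example are hypothetical, chosen only to give a pairwise rate of 0.75; they are not the study's data.

```python
def pairwise_concordance(concordant_pairs, discordant_pairs):
    """Share of affected pairs in which both twins are affected."""
    return concordant_pairs / (concordant_pairs + discordant_pairs)


def probandwise_concordance(concordant_pairs, discordant_pairs):
    """Probability that the co-twin of an affected twin is also affected,
    assuming every affected twin is counted as a proband."""
    return 2 * concordant_pairs / (2 * concordant_pairs + discordant_pairs)


# Hypothetical counts: 15 concordant and 5 discordant pairs.
example_pairwise = pairwise_concordance(15, 5)
example_probandwise = probandwise_concordance(15, 5)
```

    Under these definitions a pairwise rate of 0.75 corresponds to a probandwise rate of about 0.86, and a pairwise rate of 0.11 to about 0.20, which is consistent with the rounded MZ and DZ figures reported in the abstract.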

  7. multi-dice: r package for comparative population genomic inference under hierarchical co-demographic models of independent single-population size changes.

    PubMed

    Xue, Alexander T; Hickerson, Michael J

    2017-11-01

    Population genetic data from multiple taxa can address comparative phylogeographic questions about community-scale response to environmental shifts, and a useful strategy to this end is to employ hierarchical co-demographic models that directly test multi-taxa hypotheses within a single, unified analysis. This approach has been applied to classical phylogeographic data sets such as mitochondrial barcodes as well as reduced-genome polymorphism data sets that can yield 10,000s of SNPs, produced by emergent technologies such as RAD-seq and GBS. A strategy for the latter has been accomplished by adapting the site frequency spectrum to a novel summarization of population genomic data across multiple taxa called the aggregate site frequency spectrum (aSFS), which potentially can be deployed under various inferential frameworks including approximate Bayesian computation, random forest and composite likelihood optimization. Here, we introduce the R package multi-dice, a wrapper program that exploits existing simulation software for flexible execution of hierarchical model-based inference using the aSFS, which is derived from reduced genome data, as well as mitochondrial data. We validate several novel software features such as applying alternative inferential frameworks, enforcing a minimal threshold of time surrounding co-demographic pulses and specifying flexible hyperprior distributions. In sum, multi-dice provides comparative analysis within the familiar R environment while allowing a high degree of user customization, and will thus serve as a tool for comparative phylogeography and population genomics. © 2017 The Authors. Molecular Ecology Resources Published by John Wiley & Sons Ltd.

  8. A parametric interpretation of Bayesian Nonparametric Inference from Gene Genealogies: Linking ecological, population genetics and evolutionary processes.

    PubMed

    Ponciano, José Miguel

    2017-11-22

    Using a nonparametric Bayesian approach, Palacios and Minin (2013) dramatically improved the accuracy and precision of Bayesian inference of population size trajectories from gene genealogies. These authors proposed an extension of a Gaussian Process (GP) nonparametric inferential method for the intensity function of non-homogeneous Poisson processes. They found not only that the statistical properties of the estimators were improved with their method, but also that key aspects of the demographic histories were recovered. The authors' work represents the first Bayesian nonparametric solution to this inferential problem because they specify a convenient prior belief without a particular functional form on the population trajectory. Their approach works so well and provides such a profound understanding of the biological process, that the question arises as to how truly "biology-free" their approach really is. Using well-known concepts of stochastic population dynamics, here I demonstrate that in fact, Palacios and Minin's GP model can be cast as a parametric population growth model with density dependence and environmental stochasticity. Making this link between population genetics and stochastic population dynamics modeling provides novel insights into eliciting biologically meaningful priors for the trajectory of the effective population size. The results presented here also bring novel understanding of GP as models for the evolution of a trait. Thus, the ecological principles foundation of Palacios and Minin (2013)'s prior adds to the conceptual and scientific value of these authors' inferential approach. I conclude this note by listing a series of insights brought about by this connection with Ecology. Copyright © 2017 The Author. Published by Elsevier Inc. All rights reserved.

  9. Genetic history of the population of Corsica (western Mediterranean) as inferred from autosomal STR analysis.

    PubMed

    Tofanelli, Sergio; Taglioli, Luca; Varesi, Laurent; Paoli, Giorgio

    2004-04-01

    To genetically reconstruct the demographic history of the human population of Corsica (western Mediterranean), we analyzed the variability at eight autosomal STR loci (FES, VWA, CSF1PO, TH01, F13A1, TPOX, CD4, and D3S1358) in a sample of 179 native blood donors from 4 out of the 5 administrative districts. The main line of genetic discontinuity inferred from the spatial distribution of STR variability overlapped the linguistic and geographic boundaries. In the innermost areas (Corte district) several estimators had larger stochastic effects on allele frequencies. Genetic distance measures underlying different evolutionary models all pointed to a higher variability within Corsicans than within the rest of the Mediterranean reference populations. All Corsican subsamples showed the highest distance with a pooled sample from central Sardinia, thus making recent gene flow between the two neighboring islands unlikely. Hierarchical AMOVA and distance-based multivariate genetic spaces stressed the closeness of Tuscan and Corsican frequency distributions, which could reflect peopling events with different time depths. Anyway, estimated separation times well support the linguistic hypothesis that Neolithic/Chalcolithic events have been far more important than Paleolithic or historical processes in the shaping of present Corsican variability.

  10. Large-scale inference of gene function through phylogenetic annotation of Gene Ontology terms: case study of the apoptosis and autophagy cellular processes.

    PubMed

    Feuermann, Marc; Gaudet, Pascale; Mi, Huaiyu; Lewis, Suzanna E; Thomas, Paul D

    2016-01-01

    We previously reported a paradigm for large-scale phylogenomic analysis of gene families that takes advantage of the large corpus of experimentally supported Gene Ontology (GO) annotations. This 'GO Phylogenetic Annotation' approach integrates GO annotations from evolutionarily related genes across ∼100 different organisms in the context of a gene family tree, in which curators build an explicit model of the evolution of gene functions. GO Phylogenetic Annotation models the gain and loss of functions in a gene family tree, which is used to infer the functions of uncharacterized (or incompletely characterized) gene products, even for human proteins that are relatively well studied. Here, we report our results from applying this paradigm to two well-characterized cellular processes, apoptosis and autophagy. This revealed several important observations with respect to GO annotations and how they can be used for function inference. Notably, we applied only a small fraction of the experimentally supported GO annotations to infer function in other family members. The majority of other annotations describe indirect effects, phenotypes or results from high throughput experiments. In addition, we show here how feedback from phylogenetic annotation leads to significant improvements in the PANTHER trees, the GO annotations and GO itself. Thus GO phylogenetic annotation both increases the quantity and improves the accuracy of the GO annotations provided to the research community. We expect these phylogenetically based annotations to be of broad use in gene enrichment analysis as well as other applications of GO annotations. Database URL: http://amigo.geneontology.org/amigo. © The Author(s) 2016. Published by Oxford University Press.

  11. Reward inference by primate prefrontal and striatal neurons.

    PubMed

    Pan, Xiaochuan; Fan, Hongwei; Sawa, Kosuke; Tsuda, Ichiro; Tsukada, Minoru; Sakagami, Masamichi

    2014-01-22

    The brain contains multiple yet distinct systems involved in reward prediction. To understand the nature of these processes, we recorded single-unit activity from the lateral prefrontal cortex (LPFC) and the striatum in monkeys performing a reward inference task using an asymmetric reward schedule. We found that neurons both in the LPFC and in the striatum predicted reward values for stimuli that had been previously well experienced with set reward quantities in the asymmetric reward task. Importantly, these LPFC neurons could predict the reward value of a stimulus using transitive inference even when the monkeys had not yet learned the stimulus-reward association directly; whereas these striatal neurons did not show such an ability. Nevertheless, because there were two set amounts of reward (large and small), the selected striatal neurons were able to exclusively infer the reward value (e.g., large) of one novel stimulus from a pair after directly experiencing the alternative stimulus with the other reward value (e.g., small). Our results suggest that although neurons that predict reward value for old stimuli in the LPFC could also do so for new stimuli via transitive inference, those in the striatum could only predict reward for new stimuli via exclusive inference. Moreover, the striatum showed more complex functions than was surmised previously for model-free learning.

  12. Reward Inference by Primate Prefrontal and Striatal Neurons

    PubMed Central

    Pan, Xiaochuan; Fan, Hongwei; Sawa, Kosuke; Tsuda, Ichiro; Tsukada, Minoru

    2014-01-01

    The brain contains multiple yet distinct systems involved in reward prediction. To understand the nature of these processes, we recorded single-unit activity from the lateral prefrontal cortex (LPFC) and the striatum in monkeys performing a reward inference task using an asymmetric reward schedule. We found that neurons both in the LPFC and in the striatum predicted reward values for stimuli that had been previously well experienced with set reward quantities in the asymmetric reward task. Importantly, these LPFC neurons could predict the reward value of a stimulus using transitive inference even when the monkeys had not yet learned the stimulus–reward association directly; whereas these striatal neurons did not show such an ability. Nevertheless, because there were two set amounts of reward (large and small), the selected striatal neurons were able to exclusively infer the reward value (e.g., large) of one novel stimulus from a pair after directly experiencing the alternative stimulus with the other reward value (e.g., small). Our results suggest that although neurons that predict reward value for old stimuli in the LPFC could also do so for new stimuli via transitive inference, those in the striatum could only predict reward for new stimuli via exclusive inference. Moreover, the striatum showed more complex functions than was surmised previously for model-free learning. PMID:24453328

  13. Bayesian Inference for Generalized Linear Models for Spiking Neurons

    PubMed Central

    Gerwinn, Sebastian; Macke, Jakob H.; Bethge, Matthias

    2010-01-01

    Generalized Linear Models (GLMs) are commonly used statistical methods for modelling the relationship between neural population activity and presented stimuli. When the dimension of the parameter space is large, strong regularization has to be used in order to fit GLMs to datasets of realistic size without overfitting. By imposing properly chosen priors over parameters, Bayesian inference provides an effective and principled approach for achieving regularization. Here we show how the posterior distribution over model parameters of GLMs can be approximated by a Gaussian using the Expectation Propagation algorithm. In this way, we obtain an estimate of the posterior mean and posterior covariance, allowing us to calculate Bayesian confidence intervals that characterize the uncertainty about the optimal solution. From the posterior we also obtain a different point estimate, namely the posterior mean as opposed to the commonly used maximum a posteriori estimate. We systematically compare the different inference techniques on simulated as well as on multi-electrode recordings of retinal ganglion cells, and explore the effects of the chosen prior and the performance measure used. We find that good performance can be achieved by choosing a Laplace prior together with the posterior mean estimate. PMID:20577627

  14. Inference and quantification of peptidoforms in large sample cohorts by SWATH-MS

    PubMed Central

    Röst, Hannes L; Ludwig, Christina; Buil, Alfonso; Bensimon, Ariel; Soste, Martin; Spector, Tim D; Dermitzakis, Emmanouil T; Collins, Ben C; Malmström, Lars; Aebersold, Ruedi

    2017-01-01

    The consistent detection and quantification of protein post-translational modifications (PTMs) across sample cohorts is an essential prerequisite for the functional analysis of biological processes. Data-independent acquisition (DIA), a bottom-up mass spectrometry based proteomic strategy, exemplified by SWATH-MS, provides complete precursor and fragment ion information of a sample and thus, in principle, the information to identify peptidoforms, the modified variants of a peptide. However, due to the convoluted structure of DIA data sets the confident and systematic identification and quantification of peptidoforms has remained challenging. Here we present IPF (Inference of PeptidoForms), a fully automated algorithm that uses spectral libraries to query, validate and quantify peptidoforms in DIA data sets. The method was developed on data acquired by SWATH-MS and benchmarked using a synthetic phosphopeptide reference data set and phosphopeptide-enriched samples. The data indicate that IPF reduced false site-localization by more than 7-fold in comparison to previous approaches, while recovering 85.4% of the true signals. IPF was applied to detect and quantify peptidoforms carrying ten different types of PTMs in DIA data acquired from more than 200 samples of undepleted blood plasma of a human twin cohort. The data apportioned, for the first time, the contribution of heritable, environmental and longitudinal effects on the observed quantitative variability of specific modifications in blood plasma of a human population. PMID:28604659

  15. Reconstructing a Large-Scale Population for Social Simulation

    NASA Astrophysics Data System (ADS)

    Fan, Zongchen; Meng, Rongqing; Ge, Yuanzheng; Qiu, Xiaogang

    The advent of social simulation has provided an opportunity to research social systems. More and more researchers tend to describe the components of social systems at a more detailed level. Any simulation needs the support of population data to initialize and implement the simulation systems. However, it is impossible to obtain data that provide full information about individuals and households. We propose a two-step method to reconstruct a large-scale population for a Chinese city according to Chinese culture. Firstly, a baseline population is generated by gathering individuals into households one by one; secondly, social relationships such as friendship are assigned to the baseline population. Through a case study, a population of 3,112,559 individuals gathered in 1,133,835 households is reconstructed for Urumqi city, and the results show that the generated data match the real data quite well. The generated data can be applied to support modeling of some social phenomena.

  16. Inferring time derivatives including cell growth rates using Gaussian processes

    NASA Astrophysics Data System (ADS)

    Swain, Peter S.; Stevenson, Keiran; Leary, Allen; Montano-Gutierrez, Luis F.; Clark, Ivan B. N.; Vogel, Jackie; Pilizota, Teuta

    2016-12-01

    Often the time derivative of a measured variable is of as much interest as the variable itself. For a growing population of biological cells, for example, the population's growth rate is typically more important than its size. Here we introduce a non-parametric method to infer first and second time derivatives as a function of time from time-series data. Our approach is based on Gaussian processes and applies to a wide range of data. In tests, the method is at least as accurate as others, but has several advantages: it estimates errors both in the inference and in any summary statistics, such as lag times, and allows interpolation with the corresponding error estimation. As illustrations, we infer growth rates of microbial cells, the rate of assembly of an amyloid fibril and both the speed and acceleration of two separating spindle pole bodies. Our algorithm should thus be broadly applicable.

  17. HLA Type Inference via Haplotypes Identical by Descent

    NASA Astrophysics Data System (ADS)

    Setty, Manu N.; Gusev, Alexander; Pe'Er, Itsik

    The Human Leukocyte Antigen (HLA) genes play a major role in adaptive immune response and are used to differentiate self antigens from non self ones. HLA genes are hyper variable with nearly every locus harboring over a dozen alleles. This variation plays an important role in susceptibility to multiple autoimmune diseases and needs to be matched on for organ transplantation. Unfortunately, HLA typing by serological methods is time consuming and expensive compared to high throughput Single Nucleotide Polymorphism (SNP) data. We present a new computational method to infer per-locus HLA types using shared segments Identical By Descent (IBD), inferred from SNP genotype data. IBD information is modeled as a graph where shared haplotypes are explored among clusters of individuals with known and unknown HLA types to identify the latter. We analyze performance of the method in a previously typed subset of the HapMap population, achieving accuracy of 96% in HLA-A, 94% in HLA-B, 95% in HLA-C, 77% in HLA-DR1, 93% in HLA-DQA1 and 90% in HLA-DQB1 genes. We compare our method to a tag SNP based approach and demonstrate higher sensitivity and specificity. Our method demonstrates the power of using shared haplotype segments for large-scale imputation at the HLA locus.

  18. Efficient probabilistic inference in generic neural networks trained with non-probabilistic feedback.

    PubMed

    Orhan, A Emin; Ma, Wei Ji

    2017-07-26

    Animals perform near-optimal probabilistic inference in a wide range of psychophysical tasks. Probabilistic inference requires trial-to-trial representation of the uncertainties associated with task variables and subsequent use of this representation. Previous work has implemented such computations using neural networks with hand-crafted and task-dependent operations. We show that generic neural networks trained with a simple error-based learning rule perform near-optimal probabilistic inference in nine common psychophysical tasks. In a probabilistic categorization task, error-based learning in a generic network simultaneously explains a monkey's learning curve and the evolution of qualitative aspects of its choice behavior. In all tasks, the number of neurons required for a given level of performance grows sublinearly with the input population size, a substantial improvement on previous implementations of probabilistic inference. The trained networks develop a novel sparsity-based probabilistic population code. Our results suggest that probabilistic inference emerges naturally in generic neural networks trained with error-based learning rules. Behavioural tasks often require probability distributions to be inferred about task specific variables. Here, the authors demonstrate that generic neural networks can be trained using a simple error-based learning rule to perform such probabilistic computations efficiently without any need for task specific operations.

  19. Statistics for nuclear engineers and scientists. Part 1. Basic statistical inference

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Beggs, W.J.

    1981-02-01

    This report is intended for the use of engineers and scientists working in the nuclear industry, especially at the Bettis Atomic Power Laboratory. It serves as the basis for several Bettis in-house statistics courses. The objectives of the report are to introduce the reader to the language and concepts of statistics and to provide a basic set of techniques to apply to problems of the collection and analysis of data. Part 1 covers subjects of basic inference. The subjects include: descriptive statistics; probability; simple inference for normally distributed populations, and for non-normal populations as well; comparison of two populations; the analysis of variance; quality control procedures; and linear regression analysis.

  20. The Genealogical Population Dynamics of HIV-1 in a Large Transmission Chain: Bridging within and among Host Evolutionary Rates

    PubMed Central

    Vrancken, Bram; Rambaut, Andrew; Suchard, Marc A.; Drummond, Alexei; Baele, Guy; Derdelinckx, Inge; Van Wijngaerden, Eric; Vandamme, Anne-Mieke; Van Laethem, Kristel; Lemey, Philippe

    2014-01-01

    Transmission lies at the interface of human immunodeficiency virus type 1 (HIV-1) evolution within and among hosts and separates distinct selective pressures that impose differences in both the mode of diversification and the tempo of evolution. In the absence of comprehensive direct comparative analyses of the evolutionary processes at different biological scales, our understanding of how fast within-host HIV-1 evolutionary rates translate to lower rates at the between host level remains incomplete. Here, we address this by analyzing pol and env data from a large HIV-1 subtype C transmission chain for which both the timing and the direction is known for most transmission events. To this purpose, we develop a new transmission model in a Bayesian genealogical inference framework and demonstrate how to constrain the viral evolutionary history to be compatible with the transmission history while simultaneously inferring the within-host evolutionary and population dynamics. We show that accommodating a transmission bottleneck affords the best fit to our data, but the sparse within-host HIV-1 sampling prevents accurate quantification of the concomitant loss in genetic diversity. We draw inference under the transmission model to estimate HIV-1 evolutionary rates among epidemiologically-related patients and demonstrate that they lie in between fast intra-host rates and lower rates among epidemiologically unrelated individuals infected with HIV subtype C. Using a new molecular clock approach, we quantify and find support for a lower evolutionary rate along branches that accommodate a transmission event or branches that represent the entire backbone of transmitted lineages in our transmission history. Finally, we recover the rate differences at the different biological scales for both synonymous and non-synonymous substitution rates, which is only compatible with the ‘store and retrieve’ hypothesis positing that viruses stored early in latently infected cells

  1. DESCARTES’ RULE OF SIGNS AND THE IDENTIFIABILITY OF POPULATION DEMOGRAPHIC MODELS FROM GENOMIC VARIATION DATA

    PubMed Central

    Bhaskar, Anand; Song, Yun S.

    2016-01-01

    The sample frequency spectrum (SFS) is a widely-used summary statistic of genomic variation in a sample of homologous DNA sequences. It provides a highly efficient dimensional reduction of large-scale population genomic data and its mathematical dependence on the underlying population demography is well understood, thus enabling the development of efficient inference algorithms. However, it has been recently shown that very different population demographies can actually generate the same SFS for arbitrarily large sample sizes. Although in principle this nonidentifiability issue poses a thorny challenge to statistical inference, the population size functions involved in the counterexamples are arguably not so biologically realistic. Here, we revisit this problem and examine the identifiability of demographic models under the restriction that the population sizes are piecewise-defined where each piece belongs to some family of biologically-motivated functions. Under this assumption, we prove that the expected SFS of a sample uniquely determines the underlying demographic model, provided that the sample is sufficiently large. We obtain a general bound on the sample size sufficient for identifiability; the bound depends on the number of pieces in the demographic model and also on the type of population size function in each piece. In the cases of piecewise-constant, piecewise-exponential and piecewise-generalized-exponential models, which are often assumed in population genomic inferences, we provide explicit formulas for the bounds as simple functions of the number of pieces. Lastly, we obtain analogous results for the “folded” SFS, which is often used when there is ambiguity as to which allelic type is ancestral. Our results are proved using a generalization of Descartes’ rule of signs for polynomials to the Laplace transform of piecewise continuous functions. PMID:28018011
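
    As an illustration of the summary statistic this abstract discusses, the following minimal sketch computes an unfolded SFS from a small 0/1 haplotype matrix and then folds it. The function names, the toy data, and the convention that 1 marks the derived allele are assumptions made for this example; they are not taken from the paper.

```python
from collections import Counter


def sample_frequency_spectrum(haplotypes):
    """Unfolded SFS for n sequences.

    haplotypes is a list of n equal-length lists of 0/1 values, where 1 is
    assumed to mark the derived allele. Entry i-1 of the result counts the
    sites at which the derived allele appears in exactly i sequences,
    for i = 1 .. n-1 (sites that are absent or fixed are not counted).
    """
    n = len(haplotypes)
    num_sites = len(haplotypes[0])
    derived_counts = Counter(
        sum(h[site] for h in haplotypes) for site in range(num_sites)
    )
    return [derived_counts.get(i, 0) for i in range(1, n)]


def fold(sfs):
    """Folded SFS: merge classes i and n-i, for use when the ancestral
    allele is unknown. Entry i-1 counts sites whose minor allele appears
    in i sequences, for i = 1 .. floor(n/2)."""
    n = len(sfs) + 1
    folded = []
    for i in range(1, n // 2 + 1):
        j = n - i
        folded.append(sfs[i - 1] if i == j else sfs[i - 1] + sfs[j - 1])
    return folded


# Toy data (hypothetical): 4 sequences, 5 sites.
haplotypes = [
    [1, 0, 1, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
unfolded = sample_frequency_spectrum(haplotypes)
folded = fold(unfolded)
```

    In this toy input, two sites are singletons, one is a doubleton, one is monomorphic and one is fixed, so only three sites contribute to the spectrum.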

  2. Genetic rescue of an insular population of large mammals

    PubMed Central

    Hogg, John T; Forbes, Stephen H; Steele, Brian M; Luikart, Gordon

    2006-01-01

    Natural populations worldwide are increasingly fragmented by habitat loss. Isolation at small population size is thought to reduce individual and population fitness via inbreeding depression. However, little is known about the time-scale over which adverse genetic effects may develop in natural populations or the number and types of traits likely to be affected. The benefits of restoring gene flow to isolates are therefore also largely unknown. In contrast, the potential costs of migration (e.g. disease spread) are readily apparent. Management for ecological connectivity has therefore been controversial and sometimes avoided. Using pedigree and life-history data collected during 25 years of study, we evaluated genetic decline and rescue in a population of bighorn sheep founded by 12 individuals in 1922 and isolated at an average size of 42 animals for 10–12 generations. Immigration was restored experimentally, beginning in 1985. We detected marked improvements in reproduction, survival and five fitness-related traits among descendants of the 15 recent migrants. Trait values were increased by 23–257% in maximally outbred individuals. This is the first demonstration, to our knowledge, of increased male and female fitness attributable to outbreeding realized in a fully competitive natural setting. Our findings suggest that genetic principles deserve broader recognition as practical management tools with near-term consequences for large-mammal conservation. PMID:16777743

  3. Different Evolutionary Paths to Complexity for Small and Large Populations of Digital Organisms

    PubMed Central

    2016-01-01

    A major aim of evolutionary biology is to explain the respective roles of adaptive versus non-adaptive changes in the evolution of complexity. While selection is certainly responsible for the spread and maintenance of complex phenotypes, this does not automatically imply that strong selection enhances the chance for the emergence of novel traits, that is, the origination of complexity. Population size is one parameter that alters the relative importance of adaptive and non-adaptive processes: as population size decreases, selection weakens and genetic drift grows in importance. Because of this relationship, many theories invoke a role for population size in the evolution of complexity. Such theories are difficult to test empirically because of the time required for the evolution of complexity in biological populations. Here, we used digital experimental evolution to test whether large or small asexual populations tend to evolve greater complexity. We find that both small and large—but not intermediate-sized—populations are favored to evolve larger genomes, which provides the opportunity for subsequent increases in phenotypic complexity. However, small and large populations followed different evolutionary paths towards these novel traits. Small populations evolved larger genomes by fixing slightly deleterious insertions, while large populations fixed rare beneficial insertions that increased genome size. These results demonstrate that genetic drift can lead to the evolution of complexity in small populations and that purifying selection is not powerful enough to prevent the evolution of complexity in large populations. PMID:27923053

  4. A review of causal inference for biomedical informatics

    PubMed Central

    Kleinberg, Samantha; Hripcsak, George

    2011-01-01

    Causality is an important concept throughout the health sciences and is particularly vital for informatics work such as finding adverse drug events or risk factors for disease using electronic health records. While philosophers and scientists working for centuries on formalizing what makes something a cause have not reached a consensus, new methods for inference show that we can make progress in this area in many practical cases. This article reviews core concepts in understanding and identifying causality and then reviews current computational methods for inference and explanation, focusing on inference from large-scale observational data. While the problem is not fully solved, we show that graphical models and Granger causality provide useful frameworks for inference and that a more recent approach based on temporal logic addresses some of the limitations of these methods. PMID:21782035

  5. A Framework for Inferring Taxonomic Class of Asteroids.

    NASA Technical Reports Server (NTRS)

    Dotson, J. L.; Mathias, D. L.

    2017-01-01

    Introduction: Taxonomic classification of asteroids based on their visible / near-infrared spectra or multi band photometry has proven to be a useful tool to infer other properties about asteroids. Meteorite analogs have been identified for several taxonomic classes, permitting detailed inference about asteroid composition. Trends have been identified between taxonomy and measured asteroid density. Thanks to NEOWise (Near-Earth-Object Wide-field Infrared Survey Explorer) and Spitzer (Spitzer Space Telescope), approximately twice as many asteroids have measured albedos as have taxonomic classifications. (If one only considers spectroscopically determined classifications, the ratio is greater than 40.) We present a Bayesian framework that provides probabilistic estimates of the taxonomic class of an asteroid based on its albedo. Although probabilistic estimates of taxonomic classes are not a replacement for spectroscopic or photometric determinations, they can be a useful tool for identifying objects for further study or for asteroid threat assessment models. Inputs and Framework: The framework relies upon two inputs: the expected fraction of each taxonomic class in the population and the albedo distribution of each class. Luckily, numerous authors have addressed both of these questions. For example, the taxonomic distribution by number, surface area and mass of the main belt has been estimated and a diameter limited estimate of fractional abundances of the near earth asteroid population was made. Similarly, the albedo distributions for taxonomic classes have been estimated for the combined main belt and NEA (Near Earth Asteroid) populations in different taxonomic systems and for the NEA population specifically. The framework utilizes a Bayesian inference appropriate for categorical data. The population fractions provide the prior while the albedo distributions allow calculation of the likelihood an albedo measurement is consistent with a given taxonomic
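
    The prior-times-likelihood structure described in this record can be sketched as follows. This is only an illustration under stated assumptions: the two class labels, the prior fractions, and the log-normal albedo model with its parameters are placeholder values invented for the example, not figures from the report.

```python
import math


def lognormal_pdf(x, mu, sigma):
    """Density of a log-normal distribution at x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def class_posterior(albedo, priors, albedo_models):
    """Posterior probability of each class given one albedo measurement.

    priors: class -> expected population fraction (the prior).
    albedo_models: class -> (mu, sigma) of an assumed log-normal albedo
    distribution (the likelihood). Both inputs are hypothetical here.
    """
    unnormalized = {
        c: priors[c] * lognormal_pdf(albedo, *albedo_models[c]) for c in priors
    }
    total = sum(unnormalized.values())
    return {c: value / total for c, value in unnormalized.items()}


# Placeholder inputs, chosen only to make the example run.
priors = {"dark": 0.5, "bright": 0.5}
albedo_models = {
    "dark": (math.log(0.05), 0.5),
    "bright": (math.log(0.25), 0.5),
}

posterior_low = class_posterior(0.05, priors, albedo_models)
posterior_high = class_posterior(0.25, priors, albedo_models)
```

    With these placeholder inputs, a low albedo shifts most of the posterior mass to the "dark" class and a high albedo to the "bright" class; real use would need published class fractions and albedo distributions.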

  6. Ensemble stacking mitigates biases in inference of synaptic connectivity.

    PubMed

    Chambers, Brendan; Levy, Maayan; Dechery, Joseph B; MacLean, Jason N

    2018-01-01

    A promising alternative to directly measuring the anatomical connections in a neuronal population is inferring the connections from the activity. We employ simulated spiking neuronal networks to compare and contrast commonly used inference methods that identify likely excitatory synaptic connections using statistical regularities in spike timing. We find that simple adjustments to standard algorithms improve inference accuracy: A signing procedure improves the power of unsigned mutual-information-based approaches and a correction that accounts for differences in mean and variance of background timing relationships, such as those expected to be induced by heterogeneous firing rates, increases the sensitivity of frequency-based methods. We also find that different inference methods reveal distinct subsets of the synaptic network and each method exhibits different biases in the accurate detection of reciprocity and local clustering. To correct for errors and biases specific to single inference algorithms, we combine methods into an ensemble. Ensemble predictions, generated as a linear combination of multiple inference algorithms, are more sensitive than the best individual measures alone, and are more faithful to ground-truth statistics of connectivity, mitigating biases specific to single inference methods. These weightings generalize across simulated datasets, emphasizing the potential for the broad utility of ensemble-based approaches.

  7. Design-based and model-based inference in surveys of freshwater mollusks

    USGS Publications Warehouse

    Dorazio, R.M.

    1999-01-01

    Well-known concepts in statistical inference and sampling theory are used to develop recommendations for planning and analyzing the results of quantitative surveys of freshwater mollusks. Two methods of inference commonly used in survey sampling (design-based and model-based) are described and illustrated using examples relevant in surveys of freshwater mollusks. The particular objectives of a survey and the type of information observed in each unit of sampling can be used to help select the sampling design and the method of inference. For example, the mean density of a sparsely distributed population of mollusks can be estimated with higher precision by using model-based inference or by using design-based inference with adaptive cluster sampling than by using design-based inference with conventional sampling. More experience with quantitative surveys of natural assemblages of freshwater mollusks is needed to determine the actual benefits of different sampling designs and inferential procedures.

  8. Sampling through time and phylodynamic inference with coalescent and birth–death models

    PubMed Central

    Volz, Erik M.; Frost, Simon D. W.

    2014-01-01

    Many population genetic models have been developed for the purpose of inferring population size and growth rates from random samples of genetic data. We examine two popular approaches to this problem, the coalescent and the birth–death-sampling model (BDM), in the context of estimating population size and birth rates in a population growing exponentially according to the birth–death branching process. For sequences sampled at a single time, we found the coalescent and the BDM gave virtually indistinguishable results in terms of the growth rates and fraction of the population sampled, even when sampling from a small population. For sequences sampled at multiple time points, we find that the birth–death model estimators are subject to large bias if the sampling process is misspecified. Since BDMs incorporate a model of the sampling process, we show how much of the statistical power of BDMs arises from the sequence of sample times and not from the genealogical tree. This motivates the development of a new coalescent estimator, which is augmented with a model of the known sampling process and is potentially more precise than the coalescent that does not use sample time information. PMID:25401173

  9. Bayesian Inference and Online Learning in Poisson Neuronal Networks.

    PubMed

    Huang, Yanping; Rao, Rajesh P N

    2016-08-01

    Motivated by the growing evidence for Bayesian computation in the brain, we show how a two-layer recurrent network of Poisson neurons can perform both approximate Bayesian inference and learning for any hidden Markov model. The lower-layer sensory neurons receive noisy measurements of hidden world states. The higher-layer neurons infer a posterior distribution over world states via Bayesian inference from inputs generated by sensory neurons. We demonstrate how such a neuronal network with synaptic plasticity can implement a form of Bayesian inference similar to Monte Carlo methods such as particle filtering. Each spike in a higher-layer neuron represents a sample of a particular hidden world state. The spiking activity across the neural population approximates the posterior distribution over hidden states. In this model, variability in spiking is regarded not as a nuisance but as an integral feature that provides the variability necessary for sampling during inference. We demonstrate how the network can learn the likelihood model, as well as the transition probabilities underlying the dynamics, using a Hebbian learning rule. We present results illustrating the ability of the network to perform inference and learning for arbitrary hidden Markov models.
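
    The abstract compares the network's sampling-based inference to particle filtering. The sketch below is a plain bootstrap particle filter for a small discrete hidden Markov model, checked against exact forward filtering; it is not the spiking network model from the paper, and the two-state model parameters are hypothetical values chosen for illustration.

```python
import random


def forward_filter(prior, trans, emit, observations):
    """Exact filtering: P(state_t | obs_1..t) for a discrete HMM.

    prior[i] is the distribution of the first hidden state, trans[i][j] the
    probability of moving from state i to j, emit[i][o] the probability of
    observing symbol o in state i.
    """
    k = len(prior)
    belief = list(prior)
    for t, obs in enumerate(observations):
        if t > 0:  # predict step
            belief = [sum(belief[i] * trans[i][j] for i in range(k)) for j in range(k)]
        belief = [belief[j] * emit[j][obs] for j in range(k)]  # update step
        norm = sum(belief)
        belief = [b / norm for b in belief]
    return belief


def particle_filter(prior, trans, emit, observations, num_particles, rng):
    """Bootstrap particle filter: each particle is one sampled hidden state."""
    k = len(prior)
    states = range(k)
    particles = rng.choices(states, weights=prior, k=num_particles)
    for t, obs in enumerate(observations):
        if t > 0:  # propagate each particle through the transition model
            particles = [rng.choices(states, weights=trans[p])[0] for p in particles]
        weights = [emit[p][obs] for p in particles]  # likelihood weighting
        particles = rng.choices(particles, weights=weights, k=num_particles)  # resample
    # The fraction of particles in each state approximates the posterior.
    return [particles.count(s) / num_particles for s in states]


# Hypothetical two-state model and observation sequence.
prior = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.8, 0.2], [0.3, 0.7]]
observations = [0, 0, 1, 1]

exact = forward_filter(prior, trans, emit, observations)
approx = particle_filter(prior, trans, emit, observations, 5000, random.Random(0))
```

    The particle counts play the role the abstract assigns to spikes: each sample stands for one hidden state, and the population of samples approximates the posterior.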

  10. Population Genetic Analysis Infers Migration Pathways of Phytophthora ramorum in US Nurseries

    PubMed Central

    Goss, Erica M.; Larsen, Meg; Chastagner, Gary A.; Givens, Donald R.; Grünwald, Niklaus J.

    2009-01-01

    Recently introduced, exotic plant pathogens may exhibit low genetic diversity and be limited to clonal reproduction. However, rapidly mutating molecular markers such as microsatellites can reveal genetic variation within these populations and be used to model putative migration patterns. Phytophthora ramorum is the exotic pathogen, discovered in the late 1990s, that is responsible for sudden oak death in California forests and ramorum blight of common ornamentals. The nursery trade has moved this pathogen from source populations on the West Coast to locations across the United States, thus risking introduction to other native forests. We examined the genetic diversity of P. ramorum in United States nurseries by microsatellite genotyping 279 isolates collected from 19 states between 2004 and 2007. Of the three known P. ramorum clonal lineages, the most common and genetically diverse lineage in the sample was NA1. Two eastward migration pathways were revealed in the clustering of NA1 isolates into two groups, one containing isolates from Connecticut, Oregon, and Washington and the other isolates from California and the remaining states. This finding is consistent with trace forward analyses conducted by the US Department of Agriculture's Animal and Plant Health Inspection Service. At the same time, genetic diversities in several states equaled those observed in California, Oregon, and Washington and two-thirds of multilocus genotypes exhibited limited geographic distributions, indicating that mutation was common during or subsequent to migration. Together, these data suggest that migration, rapid mutation, and genetic drift all play a role in structuring the genetic diversity of P. ramorum in US nurseries. This work demonstrates that fast-evolving genetic markers can be used to examine the evolutionary processes acting on recently introduced pathogens and to infer their putative migration patterns, thus showing promise for the application of forensics to plant

  11. Statistical learning and selective inference.

    PubMed

    Taylor, Jonathan; Tibshirani, Robert J

    2015-06-23

    We describe the problem of "selective inference." This addresses the following challenge: Having mined a set of data to find potential associations, how do we properly assess the strength of these associations? The fact that we have "cherry-picked"--searched for the strongest associations--means that we must set a higher bar for declaring significant the associations that we see. This challenge becomes more important in the era of big data and complex statistical modeling. The cherry tree (dataset) can be very large and the tools for cherry picking (statistical learning methods) are now very sophisticated. We describe some recent new developments in selective inference and illustrate their use in forward stepwise regression, the lasso, and principal components analysis.

  12. Argentine Population Genetic Structure: Large Variance in Amerindian Contribution

    PubMed Central

    Seldin, Michael F.; Tian, Chao; Shigeta, Russell; Scherbarth, Hugo R.; Silva, Gabriel; Belmont, John W.; Kittles, Rick; Gamron, Susana; Allevi, Alberto; Palatnik, Simon A.; Alvarellos, Alejandro; Paira, Sergio; Caprarulo, Cesar; Guillerón, Carolina; Catoggio, Luis J.; Prigione, Cristina; Berbotto, Guillermo A.; García, Mercedes A.; Perandones, Carlos E.; Pons-Estel, Bernardo A.; Alarcon-Riquelme, Marta E.

    2011-01-01

    Argentine population genetic structure was examined using a set of 78 ancestry informative markers (AIMs) to assess the contributions of European, Amerindian, and African ancestry in 94 individual members of this population. Using the Bayesian clustering algorithm STRUCTURE, the mean European contribution was 78%, the Amerindian contribution was 19.4%, and the African contribution was 2.5%. Similar results were found using weighted least mean square method: European, 80.2%; Amerindian, 18.1%; and African, 1.7%. Consistent with previous studies the current results showed very few individuals (four of 94) with greater than 10% African admixture. Notably, when individual admixture was examined, the Amerindian and European admixture showed a very large variance and individual Amerindian contribution ranged from 1.5 to 84.5% in the 94 individual Argentine subjects. These results indicate that admixture must be considered when clinical epidemiology or case control genetic analyses are studied in this population. Moreover, the current study provides a set of informative SNPs that can be used to ascertain or control for this potentially hidden stratification. In addition, the large variance in admixture proportions in individual Argentine subjects shown by this study suggests that this population is appropriate for future admixture mapping studies. PMID:17177183

  13. Inferences about binary stellar populations using gravitational wave observations

    NASA Astrophysics Data System (ADS)

    Wysocki, Daniel; Gerosa, Davide; O'Shaughnessy, Richard; Belczynski, Krzysztof; Gladysz, Wojciech; Berti, Emanuele; Kesden, Michael; Holz, Daniel

    2018-01-01

    With the dawn of gravitational wave astronomy, enabled by the LIGO and Virgo interferometers, we now have a new window into the Universe. In the short time these detectors have been in use, multiple confirmed detections of gravitational waves from compact binary coalescences have been made. Stellar binary systems are one of the likely progenitors of the observed compact binary sources. If this is indeed the case, then we can use measured properties of these binary systems to learn about their progenitors. We will discuss the Bayesian framework in which we make these inferences, and results which include mass and spin distributions.

  14. Multilocus nuclear DNA markers reveal population structure and demography of Anopheles minimus.

    PubMed

    Dixit, Jyotsana; Arunyawat, Uraiwan; Huong, Ngo Thi; Das, Aparup

    2014-11-01

    Utilization of multiple putatively neutral DNA markers is considered to be the most robust approach for inferring the evolutionary history of species populations. Molecular population genetic studies have been conducted in many species of the Anopheles genus, but studies based on single nucleotide polymorphism (SNP) data are still very scarce. Anopheles minimus is one of the principal malaria vectors of Southeast (SE) Asia, including Northeastern (NE) India. Although population genetic studies with mitochondrial genetic variation data have been utilized to infer phylogeography of the SE Asian populations of this species, limited information on the population structure and demography of Indian An. minimus is available. We have developed a multilocus nuclear genetic approach with SNP markers located on the X chromosome of An. minimus in eight Indian and two SE Asian population samples (121 individual mosquitoes in total) to infer population history and test several hypotheses on the phylogeography of this species. While the Thai population sample of An. minimus presented the highest nucleotide diversity, the majority of the Indian samples were also fairly diverse. In general, An. minimus populations were moderately substructured in the distribution range covering SE Asia and NE India, largely falling under three distinct genetic clusters. Moreover, demographic expansion events could be detected in the majority of the presently studied populations of An. minimus. Additional DNA sequencing of the mitochondrial COII region in a subset of the samples (40 individual mosquitoes) corroborated the existing hypothesis of Indian An. minimus falling under the earlier reported mitochondrial lineage B. © 2014 John Wiley & Sons Ltd.

  15. Inference of directional selection and mutation parameters assuming equilibrium.

    PubMed

    Vogl, Claus; Bergman, Juraj

    2015-12-01

    In a classical study, Wright (1931) proposed a model for the evolution of a biallelic locus under the influence of mutation, directional selection and drift. He derived the equilibrium distribution of the allelic proportion conditional on the scaled mutation rate, the mutation bias and the scaled strength of directional selection. The equilibrium distribution can be used for inference of these parameters with genome-wide datasets of "site frequency spectra" (SFS). Assuming that the scaled mutation rate is low, Wright's model can be approximated by a boundary-mutation model, where mutations are introduced into the population exclusively from sites fixed for the preferred or unpreferred allelic states. With the boundary-mutation model, inference can be partitioned: (i) the shape of the SFS distribution within the polymorphic region is determined by random drift and directional selection, but not by the mutation parameters, such that inference of the selection parameter relies exclusively on the polymorphic sites in the SFS; (ii) the mutation parameters can be inferred from the amount of polymorphic and monomorphic preferred and unpreferred alleles, conditional on the selection parameter. Herein, we derive maximum likelihood estimators for the mutation and selection parameters in equilibrium and apply the method to simulated SFS data as well as empirical data from a Madagascar population of Drosophila simulans. Copyright © 2015 Elsevier Inc. All rights reserved.

  16. Prior robust empirical Bayes inference for large-scale data by conditioning on rank with application to microarray data

    PubMed Central

    Liao, J. G.; Mcmurry, Timothy; Berg, Arthur

    2014-01-01

    Empirical Bayes methods have been extensively used for microarray data analysis by modeling the large number of unknown parameters as random effects. Empirical Bayes allows borrowing information across genes and can automatically adjust for multiple testing and selection bias. However, the standard empirical Bayes model can perform poorly if the assumed working prior deviates from the true prior. This paper proposes a new rank-conditioned inference in which the shrinkage and confidence intervals are based on the distribution of the error conditioned on rank of the data. Our approach is in contrast to a Bayesian posterior, which conditions on the data themselves. The new method is almost as efficient as standard Bayesian methods when the working prior is close to the true prior, and it is much more robust when the working prior is not close. In addition, it allows a more accurate (but also more complex) non-parametric estimate of the prior to be easily incorporated, resulting in improved inference. The new method’s prior robustness is demonstrated via simulation experiments. Application to a breast cancer gene expression microarray dataset is presented. Our R package rank.Shrinkage provides a ready-to-use implementation of the proposed methodology. PMID:23934072

  17. Nonidentifiability of population size from capture-recapture data with heterogeneous detection probabilities

    USGS Publications Warehouse

    Link, W.A.

    2003-01-01

    Heterogeneity in detection probabilities has long been recognized as problematic in mark-recapture studies, and numerous models developed to accommodate its effects. Individual heterogeneity is especially problematic, in that reasonable alternative models may predict essentially identical observations from populations of substantially different sizes. Thus even with very large samples, the analyst will not be able to distinguish among reasonable models of heterogeneity, even though these yield quite distinct inferences about population size. The problem is illustrated with models for closed and open populations.

  18. Single board system for fuzzy inference

    NASA Technical Reports Server (NTRS)

    Symon, James R.; Watanabe, Hiroyuki

    1991-01-01

    The very large scale integration (VLSI) implementation of a fuzzy logic inference mechanism allows the use of rule-based control and decision making in demanding real-time applications. Researchers designed a full custom VLSI inference engine. The chip was fabricated using CMOS technology. The chip consists of 688,000 transistors of which 476,000 are used for RAM memory. The fuzzy logic inference engine board system incorporates the custom designed integrated circuit into a standard VMEbus environment. The Fuzzy Logic system uses Transistor-Transistor Logic (TTL) parts to provide the interface between the Fuzzy chip and a standard, double height VMEbus backplane, allowing the chip to perform application process control through the VMEbus host. High level C language functions hide details of the hardware system interface from the applications level programmer. The first version of the board was installed on a robot at Oak Ridge National Laboratory in January of 1990.

  19. Joint Inference of Population Assignment and Demographic History

    PubMed Central

    Choi, Sang Chul; Hey, Jody

    2011-01-01

    A new approach to assigning individuals to populations using genetic data is described. Most existing methods work by maximizing Hardy–Weinberg and linkage equilibrium within populations, neither of which will apply for many demographic histories. By including a demographic model, within a likelihood framework based on coalescent theory, we can jointly study demographic history and population assignment. Genealogies and population assignments are sampled from a posterior distribution using a general isolation-with-migration model for multiple populations. A measure of partition distance between assignments facilitates not only the summary of a posterior sample of assignments, but also the estimation of the posterior density for the demographic history. It is shown that joint estimates of assignment and demographic history are possible, including estimation of population phylogeny for samples from three populations. The new method is compared to results of a widely used assignment method, using simulated and published empirical data sets. PMID:21775468

  20. Inferring relationships between pairs of individuals from locus heterozygosities

    PubMed Central

    Presciuttini, Silvano; Toni, Chiara; Tempestini, Elena; Verdiani, Simonetta; Casarino, Lucia; Spinetti, Isabella; Stefano, Francesco De; Domenici, Ranieri; Bailey-Wilson, Joan E

    2002-01-01

    Background The traditional exact method for inferring relationships between individuals from genetic data is not easily applicable in all situations that may be encountered in several fields of applied genetics. This study describes an approach that gives affordable results and is easily applicable; it is based on the probabilities that two individuals share 0, 1 or both alleles at a locus identical by state. Results We show that these probabilities (zi) depend on locus heterozygosity (H), and are scarcely affected by variation of the distribution of allele frequencies. This allows us to obtain empirical curves relating zi's to H for a series of common relationships, so that the likelihood ratio of a pair of relationships between any two individuals, given their genotypes at a locus, is a function of a single parameter, H. Application to large samples of mother-child and full-sib pairs shows that the statistical power of this method to infer the correct relationship is not much lower than the exact method. Analysis of a large database of STR data proves that locus heterozygosity does not vary significantly among Caucasian populations, apart from special cases, so that the likelihood ratio of the more common relationships between pairs of individuals may be obtained by looking at tabulated zi values. Conclusions A simple method is provided, which may be used by any scientist with the help of a calculator or a spreadsheet to compute the likelihood ratios of common alternative relationships between pairs of individuals. PMID:12441003
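
    The quantities named in this abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' empirical curves: it computes locus heterozygosity H as one minus the sum of squared allele frequencies, counts alleles shared identical by state (0, 1 or 2) between two genotypes, and enumerates the sharing probabilities only for unrelated pairs under Hardy-Weinberg proportions. The function names and the two-allele example locus are hypothetical.

```python
from itertools import product


def heterozygosity(freqs):
    """Expected heterozygosity H = 1 - sum(p_i^2) for one locus."""
    return 1.0 - sum(p * p for p in freqs.values())


def ibs_state(genotype1, genotype2):
    """Number of alleles (0, 1 or 2) two unordered genotypes share by state."""
    (a, b), (c, d) = genotype1, genotype2
    # Try both ways of pairing the alleles and keep the better match.
    return max((a == c) + (b == d), (a == d) + (b == c))


def ibs_probs_unrelated(freqs):
    """[z0, z1, z2] for two unrelated individuals.

    Assumption: alleles are drawn independently (Hardy-Weinberg), so every
    ordered 4-tuple of alleles is weighted by the product of its frequencies.
    """
    z = [0.0, 0.0, 0.0]
    for a, b, c, d in product(list(freqs), repeat=4):
        weight = freqs[a] * freqs[b] * freqs[c] * freqs[d]
        z[ibs_state((a, b), (c, d))] += weight
    return z


# Hypothetical biallelic locus with equal allele frequencies.
example_freqs = {"A": 0.5, "B": 0.5}
H = heterozygosity(example_freqs)
z0, z1, z2 = ibs_probs_unrelated(example_freqs)
print(H, z0, z1, z2)
```

    Repeating the enumeration for other relationships would need the corresponding identity-by-descent probabilities, which are not given in the abstract and are left out here.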

  1. Alternating event processes during lifetimes: population dynamics and statistical inference.

    PubMed

    Shinohara, Russell T; Sun, Yifei; Wang, Mei-Cheng

    2018-01-01

    In the literature studying recurrent event data, a large amount of work has been focused on univariate recurrent event processes where the occurrence of each event is treated as a single point in time. There are many applications, however, in which univariate recurrent events are insufficient to characterize the feature of the process because patients experience nontrivial durations associated with each event. This results in an alternating event process where the disease status of a patient alternates between exacerbations and remissions. In this paper, we consider the dynamics of a chronic disease and its associated exacerbation-remission process over two time scales: calendar time and time-since-onset. In particular, over calendar time, we explore population dynamics and the relationship between incidence, prevalence and duration for such alternating event processes. We provide nonparametric estimation techniques for characteristic quantities of the process. In some settings, exacerbation processes are observed from an onset time until death; to account for the relationship between the survival and alternating event processes, nonparametric approaches are developed for estimating exacerbation process over lifetime. By understanding the population dynamics and within-process structure, the paper provides a new and general way to study alternating event processes.

  2. Genetic diversity and population structure in Bactrocera correcta (Diptera: Tephritidae) inferred from mtDNA cox1 and microsatellite markers

    PubMed Central

    Qin, Yu-Jia; Buahom, Nopparat; Krosch, Matthew N.; Du, Yu; Wu, Yi; Malacrida, Anna R.; Deng, Yu-Liang; Liu, Jia-Qi; Jiang, Xiao-Long; Li, Zhi-Hong

    2016-01-01

    Bactrocera correcta is one of the most destructive pests of horticultural crops in tropical and subtropical regions. Despite the economic risk, the population genetics of this pest have remained relatively unexplored. This study explores population genetic structure and contemporary gene flow in B. correcta in Chinese Yunnan Province and attempts to place observed patterns within the broader geographical context of the species’ total range. Based on combined data from mtDNA cox1 sequences and 12 microsatellite loci obtained from 793 individuals located in 7 countries, overall genetic structuring was low. The expansion history of this species, including likely human-mediated dispersal, may have played a role in shaping the observed weak structure. The study suggested a close relationship between Yunnan Province and adjacent countries, with evidence for Western and/or Southern Yunnan as the invasive origin of B. correcta within Yunnan Province. The information gleaned from this analysis of gene flow and population structure has broad implications for quarantine, trade and management of this pest, especially in China where it is expanding northward. Future studies should concentrate effort on sampling South Asian populations, which would enable better inferences of the ancestral location of B. correcta and its invasion history into and throughout Asia. PMID:27929126

  3. Inferring Centrality from Network Snapshots

    NASA Astrophysics Data System (ADS)

    Shao, Haibin; Mesbahi, Mehran; Li, Dewei; Xi, Yugeng

    2017-01-01

    The topology and dynamics of a complex network shape its functionality. However, the topologies of many large-scale networks are either unavailable or incomplete. Without the explicit knowledge of network topology, we show how the data generated from the network dynamics can be utilised to infer the tempo centrality, which is proposed to quantify the influence of nodes in a consensus network. We show that the tempo centrality can be used to construct an accurate estimate of both the propagation rate of influence exerted on consensus networks and the Kirchhoff index of the underlying graph. Moreover, the tempo centrality also encodes the disturbance rejection of nodes in a consensus network. Our findings provide an approach to infer the performance of a consensus network from its temporal data.

  4. Inference of cell-cell interactions from population density characteristics and cell trajectories on static and growing domains.

    PubMed

    Ross, Robert J H; Yates, C A; Baker, R E

    2015-06-01

    A key feature of cell migration is how cell movement is affected by cell-cell interactions. Furthermore, many cell migratory processes such as neural crest stem cell migration [Thomas and Erickson, 2008; McLennan et al., 2012] occur on growing domains or in the presence of a chemoattractant. Therefore, it is important to study interactions between migrating cells in the context of domain growth and directed motility. Here we compare discrete and continuum models describing the spatial and temporal evolution of a cell population for different types of cell-cell interactions on static and growing domains. We suggest that cell-cell interactions can be inferred from population density characteristics in the presence of motility bias, and these population density characteristics for different cell-cell interactions are conserved on both static and growing domains. We also study the expected displacement of a tagged cell, and show that different types of cell-cell interactions can give rise to cell trajectories with different characteristics. These characteristics are conserved in the presence of domain growth; however, they are diminished in the presence of motility bias. Our results are relevant for researchers who study the existence and role of cell-cell interactions in biological systems, insofar as we suggest that different types of cell-cell interactions could be identified from cell density and trajectory data. Copyright © 2015 Elsevier Inc. All rights reserved.

  5. Long-term dynamics of Hawaiian volcanoes inferred by large-scale relative relocations of earthquakes

    NASA Astrophysics Data System (ADS)

    Got, J.-L.; Okubo, P.

    2003-04-01

    We investigated the microseismicity recorded in an active volcano to infer information concerning the volcano structure and long-term dynamics, by using relative relocations and focal mechanisms of microearthquakes. 32000 earthquakes of Mauna Loa and Kilauea volcanoes were recorded by more than 8 stations of the Hawaiian Volcano Observatory seismic network between 1988 and 1999. We studied 17000 of these events and relocated more than 70% with an accuracy ranging from 10 to 500 meters. About 75% of these relocated events are located in the vicinity of subhorizontal decollement planes, at 8 to 11 km depth. However, the striking features revealed by these relocation results are steep south-east dipping fault planes working as reverse faults, clearly located below the decollement plane and which intersect it. If this decollement plane coincides with the pre-Mauna Loa seafloor, as hypothesized by numerous authors, such reverse faults rupture the pre-Mauna Loa oceanic crust. The weight of the volcano and pressure in the magma storage system are possible causes of these ruptures, fully compatible with the local stress tensor computed by Gillard et al. (1996). Reverse faults are suspected of producing scarps revealed by km-long horizontal slip-perpendicular lineations along the decollement surface, and therefore large-scale roughness, asperities and normal stress variations. These are capable of generating stick-slip, large magnitude earthquakes, the spatial microseismic pattern observed in the south flank of Kilauea volcano, and Hilina-type instabilities. Ruptures intersecting the decollement surface, causing its large-scale roughness, may be an important parameter controlling the growth of Hawaiian volcanoes. Are there more or less rough decollement planes existing near the base of other volcanoes, such as Piton de la Fournaise or Etna, and able to explain part of their deformation and seismicity?

  6. Decline and recovery of a large carnivore: environmental change and long-term trends in an endangered brown bear population

    PubMed Central

    Naves, Javier; Fernández-Gil, Alberto

    2016-01-01

    Understanding what factors drive fluctuations in the abundance of endangered species is a difficult ecological problem but a major requirement to attain effective management and conservation success. The ecological traits of large mammals make this task even more complicated, calling for integrative approaches. We develop a framework combining individual-based modelling and statistical inference to assess alternative hypotheses on brown bear dynamics in the Cantabrian range (Iberian Peninsula). Models including the effect of environmental factors on mortality rates were able to reproduce three decades of variation in the number of females with cubs of the year (Fcoy), including the decline that put the population close to extinction in the mid-nineties, and the following increase in brown bear numbers. This external effect prevailed over density-dependent mechanisms (sexually selected infanticide and female reproductive suppression), with a major impact of climate driven changes in resource availability and a secondary role of changes in human pressure. Predicted changes in population structure revealed a nonlinear relationship between total abundance and the number of Fcoy, highlighting the risk of simple projections based on indirect abundance indices. This study demonstrates the advantages of integrative, mechanistic approaches and provides a widely applicable framework to improve our understanding of wildlife dynamics. PMID:27903871

  7. Decline and recovery of a large carnivore: environmental change and long-term trends in an endangered brown bear population.

    PubMed

    Martínez Cano, Isabel; Taboada, Fernando González; Naves, Javier; Fernández-Gil, Alberto; Wiegand, Thorsten

    2016-11-30

    Understanding what factors drive fluctuations in the abundance of endangered species is a difficult ecological problem but a major requirement to attain effective management and conservation success. The ecological traits of large mammals make this task even more complicated, calling for integrative approaches. We develop a framework combining individual-based modelling and statistical inference to assess alternative hypotheses on brown bear dynamics in the Cantabrian range (Iberian Peninsula). Models including the effect of environmental factors on mortality rates were able to reproduce three decades of variation in the number of females with cubs of the year (Fcoy), including the decline that put the population close to extinction in the mid-nineties, and the following increase in brown bear numbers. This external effect prevailed over density-dependent mechanisms (sexually selected infanticide and female reproductive suppression), with a major impact of climate driven changes in resource availability and a secondary role of changes in human pressure. Predicted changes in population structure revealed a nonlinear relationship between total abundance and the number of Fcoy, highlighting the risk of simple projections based on indirect abundance indices. This study demonstrates the advantages of integrative, mechanistic approaches and provides a widely applicable framework to improve our understanding of wildlife dynamics. © 2016 The Author(s).

  8. Inferring Fitness Effects from Time-Resolved Sequence Data with a Delay-Deterministic Model.

    PubMed

    Nené, Nuno R; Dunham, Alistair S; Illingworth, Christopher J R

    2018-05-01

    A common challenge arising from the observation of an evolutionary system over time is to infer the magnitude of selection acting upon a specific genetic variant, or variants, within the population. The inference of selection may be confounded by the effects of genetic drift in a system, leading to the development of inference procedures to account for these effects. However, recent work has suggested that deterministic models of evolution may be effective in capturing the effects of selection even under complex models of demography, suggesting the more general application of deterministic approaches to inference. Responding to this literature, we here note a case in which a deterministic model of evolution may give highly misleading inferences, resulting from the nondeterministic properties of mutation in a finite population. We propose an alternative approach that acts to correct for this error, and which we denote the delay-deterministic model. Applying our model to a simple evolutionary system, we demonstrate its performance in quantifying the extent of selection acting within that system. We further consider the application of our model to sequence data from an evolutionary experiment. We outline scenarios in which our model may produce improved results for the inference of selection, noting that such situations can be easily identified via the use of a regular deterministic model. Copyright © 2018 Nené et al.

  9. Inferring Fitness Effects from Time-Resolved Sequence Data with a Delay-Deterministic Model

    PubMed Central

    Nené, Nuno R.; Dunham, Alistair S.; Illingworth, Christopher J. R.

    2018-01-01

    A common challenge arising from the observation of an evolutionary system over time is to infer the magnitude of selection acting upon a specific genetic variant, or variants, within the population. The inference of selection may be confounded by the effects of genetic drift in a system, leading to the development of inference procedures to account for these effects. However, recent work has suggested that deterministic models of evolution may be effective in capturing the effects of selection even under complex models of demography, suggesting the more general application of deterministic approaches to inference. Responding to this literature, we here note a case in which a deterministic model of evolution may give highly misleading inferences, resulting from the nondeterministic properties of mutation in a finite population. We propose an alternative approach that acts to correct for this error, and which we denote the delay-deterministic model. Applying our model to a simple evolutionary system, we demonstrate its performance in quantifying the extent of selection acting within that system. We further consider the application of our model to sequence data from an evolutionary experiment. We outline scenarios in which our model may produce improved results for the inference of selection, noting that such situations can be easily identified via the use of a regular deterministic model. PMID:29500183

  10. The Value of Large-Scale Randomised Control Trials in System-Wide Improvement: The Case of the Reading Catch-Up Programme

    ERIC Educational Resources Information Center

    Fleisch, Brahm; Taylor, Stephen; Schöer, Volker; Mabogoane, Thabo

    2017-01-01

    This article illustrates the value of large-scale impact evaluations with counterfactual components. It begins by exploring the limitations of small-scale impact studies, which do not allow reliable inference to a wider population or which do not use valid comparison groups. The paper then describes the design features of a recent large-scale…

  11. sick: The Spectroscopic Inference Crank

    NASA Astrophysics Data System (ADS)

    Casey, Andrew R.

    2016-03-01

    There exists an inordinate amount of spectral data in both public and private astronomical archives that remain severely under-utilized. The lack of reliable open-source tools for analyzing large volumes of spectra contributes to this situation, which is poised to worsen as large surveys successively release orders of magnitude more spectra. In this article I introduce sick, the spectroscopic inference crank, a flexible and fast Bayesian tool for inferring astrophysical parameters from spectra. sick is agnostic to the wavelength coverage, resolving power, or general data format, allowing any user to easily construct a generative model for their data, regardless of its source. sick can be used to provide a nearest-neighbor estimate of model parameters, a numerically optimized point estimate, or full Markov Chain Monte Carlo sampling of the posterior probability distributions. This generality empowers any astronomer to capitalize on the plethora of published synthetic and observed spectra, and make precise inferences for a host of astrophysical (and nuisance) quantities. Model intensities can be reliably approximated from existing grids of synthetic or observed spectra using linear multi-dimensional interpolation, or a Cannon-based model. Additional phenomena that transform the data (e.g., redshift, rotational broadening, continuum, spectral resolution) are incorporated as free parameters and can be marginalized away. Outlier pixels (e.g., cosmic rays or poorly modeled regimes) can be treated with a Gaussian mixture model, and a noise model is included to account for systematically underestimated variance. Combining these phenomena into a scalar-justified, quantitative model permits precise inferences with credible uncertainties on noisy data. I describe the common model features, the implementation details, and the default behavior, which is balanced to be suitable for most astronomical applications. Using a forward model on low-resolution, high signal

  12. SICK: THE SPECTROSCOPIC INFERENCE CRANK

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Casey, Andrew R., E-mail: arc@ast.cam.ac.uk

    2016-03-15

    There exists an inordinate amount of spectral data in both public and private astronomical archives that remain severely under-utilized. The lack of reliable open-source tools for analyzing large volumes of spectra contributes to this situation, which is poised to worsen as large surveys successively release orders of magnitude more spectra. In this article I introduce sick, the spectroscopic inference crank, a flexible and fast Bayesian tool for inferring astrophysical parameters from spectra. sick is agnostic to the wavelength coverage, resolving power, or general data format, allowing any user to easily construct a generative model for their data, regardless of its source. sick can be used to provide a nearest-neighbor estimate of model parameters, a numerically optimized point estimate, or full Markov Chain Monte Carlo sampling of the posterior probability distributions. This generality empowers any astronomer to capitalize on the plethora of published synthetic and observed spectra, and make precise inferences for a host of astrophysical (and nuisance) quantities. Model intensities can be reliably approximated from existing grids of synthetic or observed spectra using linear multi-dimensional interpolation, or a Cannon-based model. Additional phenomena that transform the data (e.g., redshift, rotational broadening, continuum, spectral resolution) are incorporated as free parameters and can be marginalized away. Outlier pixels (e.g., cosmic rays or poorly modeled regimes) can be treated with a Gaussian mixture model, and a noise model is included to account for systematically underestimated variance. Combining these phenomena into a scalar-justified, quantitative model permits precise inferences with credible uncertainties on noisy data. I describe the common model features, the implementation details, and the default behavior, which is balanced to be suitable for most astronomical applications. Using a forward model on low-resolution, high signal

  13. Inference of human continental origin and admixture proportions using a highly discriminative ancestry informative 41-SNP panel

    PubMed Central

    2013-01-01

    Background: Accurate determination of genetic ancestry is of high interest for many areas such as biomedical research, personal genomics and forensics. It remains an important topic in genetic association studies, as it has been shown that population stratification, if not appropriately considered, can lead to false-positive and -negative results. While large association studies typically extract ancestry information from available genome-wide SNP genotypes, many important clinical data sets on rare phenotypes and historical collections assembled before the GWAS era are in need of a feasible method (i.e., ease of genotyping, small number of markers) to infer the geographic origin and potential admixture of the study subjects. Here we report on the development, application and limitations of a small, multiplexable ancestry informative marker (AIM) panel of SNPs (or AISNP) developed specifically for this purpose. Results: Based on worldwide populations from the HGDP, a 41-AIM AISNP panel for multiplex application with the ABI SNPlex and a subset with 31 AIMs for the Sequenom iPLEX system were selected and found to be highly informative for inferring ancestry among the seven continental regions Africa, the Middle East, Europe, Central/South Asia, East Asia, the Americas and Oceania. The panel was found to be least informative for Eurasian populations, and additional AIMs for a higher resolution are suggested. A large reference set including over 4,000 subjects collected from 120 global populations was assembled to facilitate accurate ancestry determination. We show practical applications of this AIM panel, discuss its limitations for admixed individuals and suggest ways to incorporate ancestry information into genetic association studies. Conclusion: We demonstrated the utility of a small AISNP panel specifically developed to discern global ancestry. We believe that it will find wide application because of its feasibility and potential for a wide range of applications.

  14. Response of human populations to large-scale emergencies

    NASA Astrophysics Data System (ADS)

    Bagrow, James; Wang, Dashun; Barabási, Albert-László

    2010-03-01

    Until recently, little quantitative data regarding collective human behavior during dangerous events such as bombings and riots have been available, despite its importance for emergency management, safety and urban planning. Understanding how populations react to danger is critical for prediction, detection and intervention strategies. Using a large telecommunications dataset, we study for the first time the spatiotemporal, social and demographic response properties of people during several disasters, including a bombing, a city-wide power outage, and an earthquake. Call activity rapidly increases after an event and we find that, when faced with a truly life-threatening emergency, information rapidly propagates through a population's social network. Other events, such as sports games, do not exhibit this propagation.

  15. Inferring Centrality from Network Snapshots

    PubMed Central

    Shao, Haibin; Mesbahi, Mehran; Li, Dewei; Xi, Yugeng

    2017-01-01

    The topology and dynamics of a complex network shape its functionality. However, the topologies of many large-scale networks are either unavailable or incomplete. Without the explicit knowledge of network topology, we show how the data generated from the network dynamics can be utilised to infer the tempo centrality, which is proposed to quantify the influence of nodes in a consensus network. We show that the tempo centrality can be used to construct an accurate estimate of both the propagation rate of influence exerted on consensus networks and the Kirchhoff index of the underlying graph. Moreover, the tempo centrality also encodes the disturbance rejection of nodes in a consensus network. Our findings provide an approach to infer the performance of a consensus network from its temporal data. PMID:28098166

  16. Genome-Wide SNP Discovery, Genotyping and Their Preliminary Applications for Population Genetic Inference in Spotted Sea Bass (Lateolabrax maculatus)

    PubMed Central

    Wang, Juan; Xue, Dong-Xiu; Zhang, Bai-Dong; Li, Yu-Long; Liu, Bing-Jian; Liu, Jin-Xian

    2016-01-01

    Next-generation sequencing and the collection of genome-wide single-nucleotide polymorphisms (SNPs) allow identifying fine-scale population genetic structure and genomic regions under selection. The spotted sea bass (Lateolabrax maculatus) is a non-model species of ecological and commercial importance and widely distributed in northwestern Pacific. A total of 22 648 SNPs was discovered across the genome of L. maculatus by paired-end sequencing of restriction-site associated DNA (RAD-PE) for 30 individuals from two populations. The nucleotide diversity (π) for each population was 0.0028±0.0001 in Dandong and 0.0018±0.0001 in Beihai, respectively. Shallow but significant genetic differentiation was detected between the two populations analyzed by using both the whole data set (FST = 0.0550, P < 0.001) and the putatively neutral SNPs (FST = 0.0347, P < 0.001). However, the two populations were highly differentiated based on the putatively adaptive SNPs (FST = 0.6929, P < 0.001). Moreover, a total of 356 SNPs representing 298 unique loci were detected as outliers putatively under divergent selection by FST-based outlier tests as implemented in BAYESCAN and LOSITAN. Functional annotation of the contigs containing putatively adaptive SNPs yielded hits for 22 of 55 (40%) significant BLASTX matches. Candidate genes for local selection constituted a wide array of functions, including binding, catalytic and metabolic activities, etc. The analyses with the SNPs developed in the present study highlighted the importance of genome-wide genetic variation for inference of population structure and local adaptation in L. maculatus. PMID:27336696

  17. Genome-Wide SNP Discovery, Genotyping and Their Preliminary Applications for Population Genetic Inference in Spotted Sea Bass (Lateolabrax maculatus).

    PubMed

    Wang, Juan; Xue, Dong-Xiu; Zhang, Bai-Dong; Li, Yu-Long; Liu, Bing-Jian; Liu, Jin-Xian

    2016-01-01

    Next-generation sequencing and the collection of genome-wide single-nucleotide polymorphisms (SNPs) allow identifying fine-scale population genetic structure and genomic regions under selection. The spotted sea bass (Lateolabrax maculatus) is a non-model species of ecological and commercial importance and widely distributed in northwestern Pacific. A total of 22 648 SNPs was discovered across the genome of L. maculatus by paired-end sequencing of restriction-site associated DNA (RAD-PE) for 30 individuals from two populations. The nucleotide diversity (π) for each population was 0.0028±0.0001 in Dandong and 0.0018±0.0001 in Beihai, respectively. Shallow but significant genetic differentiation was detected between the two populations analyzed by using both the whole data set (FST = 0.0550, P < 0.001) and the putatively neutral SNPs (FST = 0.0347, P < 0.001). However, the two populations were highly differentiated based on the putatively adaptive SNPs (FST = 0.6929, P < 0.001). Moreover, a total of 356 SNPs representing 298 unique loci were detected as outliers putatively under divergent selection by FST-based outlier tests as implemented in BAYESCAN and LOSITAN. Functional annotation of the contigs containing putatively adaptive SNPs yielded hits for 22 of 55 (40%) significant BLASTX matches. Candidate genes for local selection constituted a wide array of functions, including binding, catalytic and metabolic activities, etc. The analyses with the SNPs developed in the present study highlighted the importance of genome-wide genetic variation for inference of population structure and local adaptation in L. maculatus.

  18. Nonleaky Population Transfer in a Transmon Qutrit via Largely-Detuned Drivings

    NASA Astrophysics Data System (ADS)

    Yan, Run-Ying; Feng, Zhi-Bo

    2018-06-01

    We propose an efficient scheme to implement nonleaky population transfer in a transmon qutrit via largely-detuned drivings. Due to weak level anharmonicity of the transmon system, the remarkable quantum leakages need to be considered in quantum coherent operations. Under the conditions of two-photon resonance and large detunings, the robust population transfer within a qutrit can be implemented via the technique of stimulated Raman adiabatic passage. Based on the accessible parameters, the feasible approach can remove the leakage error effectively, and then provides a potential approach for enhancing the transfer fidelity with transmon-regime artificial atoms experimentally.

  19. Analysis of genetic population structure in Acacia caven (Leguminosae, Mimosoideae), comparing one exploratory and two Bayesian-model-based methods

    PubMed Central

    Pometti, Carolina L.; Bessega, Cecilia F.; Saidman, Beatriz O.; Vilardi, Juan C.

    2014-01-01

    Bayesian clustering as implemented in STRUCTURE or GENELAND software is widely used to form genetic groups of populations or individuals. On the other hand, in order to satisfy the need for less computer-intensive approaches, multivariate analyses are specifically devoted to extracting information from large datasets. In this paper, we report the use of a dataset of AFLP markers belonging to 15 sampling sites of Acacia caven for studying the genetic structure and comparing the consistency of three methods: STRUCTURE, GENELAND and DAPC. Of these methods, DAPC was the fastest one and showed accuracy in inferring the K number of populations (K = 12 using the find.clusters option and K = 15 with a priori information of populations). GENELAND in turn, provides information on the area of membership probabilities for individuals or populations in the space, when coordinates are specified (K = 12). STRUCTURE also inferred the number of K populations and the membership probabilities of individuals based on ancestry, presenting the result K = 11 without prior information of populations and K = 15 using the LOCPRIOR option. Finally, in this work all three methods showed high consistency in estimating the population structure, inferring similar numbers of populations and the membership probabilities of individuals to each group, with a high correlation between each other. PMID:24688293

  20. Analysis of genetic population structure in Acacia caven (Leguminosae, Mimosoideae), comparing one exploratory and two Bayesian-model-based methods.

    PubMed

    Pometti, Carolina L; Bessega, Cecilia F; Saidman, Beatriz O; Vilardi, Juan C

    2014-03-01

    Bayesian clustering as implemented in STRUCTURE or GENELAND software is widely used to form genetic groups of populations or individuals. On the other hand, in order to satisfy the need for less computer-intensive approaches, multivariate analyses are specifically devoted to extracting information from large datasets. In this paper, we report the use of a dataset of AFLP markers belonging to 15 sampling sites of Acacia caven for studying the genetic structure and comparing the consistency of three methods: STRUCTURE, GENELAND and DAPC. Of these methods, DAPC was the fastest one and showed accuracy in inferring the K number of populations (K = 12 using the find.clusters option and K = 15 with a priori information of populations). GENELAND in turn, provides information on the area of membership probabilities for individuals or populations in the space, when coordinates are specified (K = 12). STRUCTURE also inferred the number of K populations and the membership probabilities of individuals based on ancestry, presenting the result K = 11 without prior information of populations and K = 15 using the LOCPRIOR option. Finally, in this work all three methods showed high consistency in estimating the population structure, inferring similar numbers of populations and the membership probabilities of individuals to each group, with a high correlation between each other.

  1. Children's and adults' evaluation of the certainty of deductive inferences, inductive inferences, and guesses.

    PubMed

    Pillow, Bradford H

    2002-01-01

    Two experiments investigated kindergarten through fourth-grade children's and adults' (N = 128) ability to (1) evaluate the certainty of deductive inferences, inductive inferences, and guesses; and (2) explain the origins of inferential knowledge. When judging their own cognitive state, children in first grade and older rated deductive inferences as more certain than guesses; but when judging another person's knowledge, children did not distinguish valid inferences from invalid inferences and guesses until fourth grade. By third grade, children differentiated their own deductive inferences from inductive inferences and guesses, but only adults both differentiated deductive inferences from inductive inferences and differentiated inductive inferences from guesses. Children's recognition of their own inferences may contribute to the development of knowledge about cognitive processes, scientific reasoning, and a constructivist epistemology.

  2. Detecting population recovery using gametic disequilibrium-based effective population size estimates

    Treesearch

    David A. Tallmon; Robin S. Waples; Dave Gregovich; Michael K. Schwartz

    2012-01-01

    Recovering populations often must meet specific growth rate or abundance targets before their legal status can be changed from endangered or threatened. While the efficacy, power, and performance of population metrics to infer trends in declining populations has received considerable attention, how these same metrics perform when populations are increasing is less...

  3. Not-so-simple stellar populations in the intermediate-age Large Magellanic Cloud star clusters NGC 1831 and NGC 1868

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Chengyuan; De Grijs, Richard; Deng, Licai

    2014-04-01

    Using a combination of high-resolution Hubble Space Telescope/Wide-Field and Planetary Camera-2 observations, we explore the physical properties of the stellar populations in two intermediate-age star clusters, NGC 1831 and NGC 1868, in the Large Magellanic Cloud based on their color-magnitude diagrams. We show that both clusters exhibit extended main-sequence turn offs. To explain the observations, we consider variations in helium abundance, binarity, age dispersions, and the fast rotation of the clusters' member stars. The observed narrow main sequence excludes significant variations in helium abundance in both clusters. We first establish the clusters' main-sequence binary fractions using the bulk of the clusters' main-sequence stellar populations ≳ 1 mag below their turn-offs. The extent of the turn-off regions in color-magnitude space, corrected for the effects of binarity, implies that age spreads of order 300 Myr may be inferred for both clusters if the stellar distributions in color-magnitude space were entirely due to the presence of multiple populations characterized by an age range. Invoking rapid rotation of the population of cluster members characterized by a single age also allows us to match the observed data in detail. However, when taking into account the extent of the red clump in color-magnitude space, we encounter an apparent conflict for NGC 1831 between the age dispersion derived from that based on the extent of the main-sequence turn off and that implied by the compact red clump. We therefore conclude that, for this cluster, variations in stellar rotation rate are preferred over an age dispersion. For NGC 1868, both models perform equally well.

  4. Coalescent Inference Using Serially Sampled, High-Throughput Sequencing Data from Intrahost HIV Infection

    PubMed Central

    Dialdestoro, Kevin; Sibbesen, Jonas Andreas; Maretty, Lasse; Raghwani, Jayna; Gall, Astrid; Kellam, Paul; Pybus, Oliver G.; Hein, Jotun; Jenkins, Paul A.

    2016-01-01

    Human immunodeficiency virus (HIV) is a rapidly evolving pathogen that causes chronic infections, so genetic diversity within a single infection can be very high. High-throughput “deep” sequencing can now measure this diversity in unprecedented detail, particularly since it can be performed at different time points during an infection, and this offers a potentially powerful way to infer the evolutionary dynamics of the intrahost viral population. However, population genomic inference from HIV sequence data is challenging because of high rates of mutation and recombination, rapid demographic changes, and ongoing selective pressures. In this article we develop a new method for inference using HIV deep sequencing data, using an approach based on importance sampling of ancestral recombination graphs under a multilocus coalescent model. The approach further extends recent progress in the approximation of so-called conditional sampling distributions, a quantity of key interest when approximating coalescent likelihoods. The chief novelties of our method are that it is able to infer rates of recombination and mutation, as well as the effective population size, while handling sampling over different time points and missing data without extra computational difficulty. We apply our method to a data set of HIV-1, in which several hundred sequences were obtained from an infected individual at seven time points over 2 years. We find mutation rate and effective population size estimates to be comparable to those produced by the software BEAST. Additionally, our method is able to produce local recombination rate estimates. The software underlying our method, Coalescenator, is freely available. PMID:26857628

  5. Nonparametric Bayesian inference of the microcanonical stochastic block model

    NASA Astrophysics Data System (ADS)

    Peixoto, Tiago P.

    2017-01-01

    A principled approach to characterize the hidden modular structure of networks is to formulate generative models and then infer their parameters from data. When the desired structure is composed of modules or "communities," a suitable choice for this task is the stochastic block model (SBM), where nodes are divided into groups, and the placement of edges is conditioned on the group memberships. Here, we present a nonparametric Bayesian method to infer the modular structure of empirical networks, including the number of modules and their hierarchical organization. We focus on a microcanonical variant of the SBM, where the structure is imposed via hard constraints, i.e., the generated networks are not allowed to violate the patterns imposed by the model. We show how this simple model variation allows simultaneously for two important improvements over more traditional inference approaches: (1) deeper Bayesian hierarchies, with noninformative priors replaced by sequences of priors and hyperpriors, which not only remove limitations that seriously degrade the inference on large networks but also reveal structures at multiple scales; (2) a very efficient inference algorithm that scales well not only for networks with a large number of nodes and edges but also with an unlimited number of modules. We show also how this approach can be used to sample modular hierarchies from the posterior distribution, as well as to perform model selection. We discuss and analyze the differences between sampling from the posterior and simply finding the single parameter estimate that maximizes it. Furthermore, we expose a direct equivalence between our microcanonical approach and alternative derivations based on the canonical SBM.

  6. Inference of biogeographical ancestry across central regions of Eurasia.

    PubMed

    Bulbul, O; Filoglu, G; Zorlu, T; Altuncul, H; Freire-Aradas, A; Söchtig, J; Ruiz, Y; Klintschar, M; Triki-Fendri, S; Rebai, A; Phillips, C; Lareu, M V; Carracedo, Á; Schneider, P M

    2016-01-01

    The inference of biogeographical ancestry (BGA) can provide useful information for forensic investigators when there are no suspects to be compared with DNA collected at the crime scene or when no DNA database matches exist. Although public databases are increasing in size and population scope, there is a lack of information regarding genetic variation in Eurasian populations, especially in central regions such as the Middle East. Inhabitants of these regions show a high degree of genetic admixture, characterized by an allele frequency cline running from NW Europe to East Asia. Although a proper differentiation has been established between the cline extremes of western Europe and South Asia, populations geographically located in between, i.e., Middle East and Mediterranean populations, require more detailed study in order to characterize their genetic background as well as to further understand their demographic histories. To initiate these studies, three ancestry informative SNP (AI-SNP) multiplex panels: the SNPforID 34-plex, Eurasiaplex and a novel 33-plex assay were used to describe the ancestry patterns of a total of 24 populations ranging across the longitudinal axis from NW Europe to East Asia. Different ancestry inference approaches, including STRUCTURE, PCA, DAPC and Snipper Bayes analysis, were applied to determine relationships among populations. The structure results show differentiation between continental groups and a NW to SE allele frequency cline running across Eurasian populations. This study adds useful population data that could be used as reference genotypes for future ancestry investigations in forensic cases. The 33-plex assay also includes pigmentation predictive SNPs, but this study primarily focused on Eurasian population differentiation using 33-plex and its combination with the other two AI-SNP sets.

  7. Population dynamics of HIV-1 inferred from gene sequences.

    PubMed Central

    Grassly, N C; Harvey, P H; Holmes, E C

    1999-01-01

    A method for the estimation of population dynamic history from sequence data is described and used to investigate the past population dynamics of HIV-1 subtypes A and B. Using both gag and env gene alignments the effective population size of each subtype is estimated and found to be surprisingly small. This may be a result of the selective sweep of mutations through the population, or may indicate an important role of genetic drift in the fixation of mutations. The implications of these results for the spread of drug-resistant mutations and transmission dynamics, and also the roles of selection and recombination in shaping HIV-1 genetic diversity, are discussed. A larger estimated effective population size for subtype A may be the result of differences in time of origin, transmission dynamics, and/or population structure. To investigate the importance of population structure a model of population subdivision was fitted to each subtype, although the improvement in likelihood was found to be nonsignificant. PMID:9927440

  8. The genetic heterogeneity of Arab populations as inferred from HLA genes

    PubMed Central

    Almawi, Wassim Y.; Arnaiz-Villena, Antonio; Hattab, Lasmar; Hmida, Slama

    2018-01-01

    This is the first genetic anthropology study on Arabs in the MENA (Middle East and North Africa) region. The present meta-analysis included 100 populations from 36 Arab and non-Arab communities, comprising 16,006 individuals, and evaluates the genetic profile of Arabs using HLA class I (A, B) and class II (DRB1, DQB1) genes. A total of 56 Arab populations comprising 10,283 individuals were selected from several databases, and were compared with 44 Mediterranean, Asian, and sub-Saharan populations. The most frequent alleles in Arabs are A*01, A*02, B*35, B*51, DRB1*03:01, DRB1*07:01, DQB1*02:01, and DQB1*03:01, while DRB1*03:01-DQB1*02:01 and DRB1*07:01-DQB1*02:02 are the most frequent class II haplotypes. Dendrograms, correspondence analyses, genetic distances, and haplotype analysis indicate that Arabs could be stratified into four groups. The first consists of North Africans (Algerians, Tunisians, Moroccans, and Libyans), and the first Arabian Peninsula cluster (Saudis, Kuwaitis, and Yemenis), who appear to be related to Western Mediterraneans, including Iberians; this might be explained by a massive migration into these areas when the Sahara underwent a relatively rapid desiccation, starting about 10,000 years BC. The second includes Levantine Arabs (Palestinians, Jordanians, Lebanese, and Syrians), along with Iraqis and Egyptians, who are related to Eastern Mediterraneans. The third comprises Sudanese and Comorians, who tend to cluster with Sub-Saharans. The fourth comprises the second Arabian Peninsula cluster, made up of Omanis, Emiratis, and Bahrainis. It is noteworthy that the two large minorities (Berbers and Kurds) are indigenous (autochthonous), and are not genetically different from “host” and neighboring populations. In conclusion, this study confirmed high genetic heterogeneity among present-day Arabs, and especially those of the Arabian Peninsula. PMID:29522542

  9. PROBABILITY SAMPLING AND POPULATION INFERENCE IN MONITORING PROGRAMS

    EPA Science Inventory

    A fundamental difference between probability sampling and conventional statistics is that "sampling" deals with real, tangible populations, whereas "conventional statistics" usually deals with hypothetical populations that have no real-world realization. The focus here is on real ...

  10. Turnover and accumulation of genetic diversity across large time-scale cycles of isolation and connection of populations

    PubMed Central

    Alcala, Nicolas; Vuilleumier, Séverine

    2014-01-01

    Major climatic and geological events but also population history (secondary contacts) have generated cycles of population isolation and connection of long and short periods. Recent empirical and theoretical studies suggest that fast evolutionary processes might be triggered by such events, as commonly illustrated in ecology by the adaptive radiation of cichlid fishes (isolation and reconnection of lakes and watersheds) and in epidemiology by the fast adaptation of the influenza virus (isolation and reconnection in hosts). We test whether cyclic population isolation and connection provide the raw material (standing genetic variation) for species evolution and diversification. Our analytical results demonstrate that population isolation and connection can provide, to populations, a high excess of genetic diversity compared with what is expected at equilibrium. This excess is either cyclic (high allele turnover) or cumulates with time depending on the duration of the isolation and the connection periods and the mutation rate. We show that diversification rates of animal clades are associated with specific periods of climatic cycles in the Quaternary. We finally discuss the importance of our results for macroevolutionary patterns and for the inference of population history from genomic data. PMID:25253456

  11. The genetic assimilation in language borrowing inferred from Jing People.

    PubMed

    Huang, Xiufeng; Zhou, Qinghui; Bin, Xiaoyun; Lai, Shu; Lin, Chaowen; Hu, Rong; Xiao, Jiashun; Luo, Dajun; Li, Yingxiang; Wei, Lan-Hai; Yeh, Hui-Yuan; Chen, Gang; Wang, Chuan-Chao

    2018-02-28

    The Jing people are a recognized ethnic group in Guangxi, southwest China, whose ancestors immigrated from Vietnam during the 16th century. They speak Vietnamese but with many borrowings from Cantonese, Zhuang, and Mandarin. However, it is unclear whether there was large-scale gene flow from surrounding populations into the Jing people during their language change, because genetic information on this population is very limited. We collected blood samples from 37 Jing and 3 Han Chinese individuals from Wanwei, Shanxin, and Wutou islands in Guangxi and genotyped about 600,000 genome-wide single nucleotide polymorphisms (SNPs). We used Principal Component Analysis (PCA), ADMIXTURE analysis, f statistics, qpWave and qpAdm to infer the population genetic structure and admixture. Our data revealed that the Jing people are genetically similar to the populations in southwest China and mainland Southeast Asia. But compared with Vietnamese, they show significant evidence of gene flow from surrounding East Asians. The admixture proportion is estimated to be around 35-42% in different Jing groups using southern Han Chinese as a proxy. The majority of the paternal lineages of Jing people are most likely from surrounding East Asians. We conclude that the formation and language change of present-day Jing people have involved genetic assimilation of surrounding East Asian populations. The language borrowing, in this case, is not only a cultural phenomenon but has involved demic diffusion. © 2018 Wiley Periodicals, Inc.

  12. Evolution in Mind: Evolutionary Dynamics, Cognitive Processes, and Bayesian Inference.

    PubMed

    Suchow, Jordan W; Bourgin, David D; Griffiths, Thomas L

    2017-07-01

    Evolutionary theory describes the dynamics of population change in settings affected by reproduction, selection, mutation, and drift. In the context of human cognition, evolutionary theory is most often invoked to explain the origins of capacities such as language, metacognition, and spatial reasoning, framing them as functional adaptations to an ancestral environment. However, evolutionary theory is useful for understanding the mind in a second way: as a mathematical framework for describing evolving populations of thoughts, ideas, and memories within a single mind. In fact, deep correspondences exist between the mathematics of evolution and of learning, with perhaps the deepest being an equivalence between certain evolutionary dynamics and Bayesian inference. This equivalence permits reinterpretation of evolutionary processes as algorithms for Bayesian inference and has relevance for understanding diverse cognitive capacities, including memory and creativity. Copyright © 2017 Elsevier Ltd. All rights reserved.

  13. The Discovery of Single-Nucleotide Polymorphisms—and Inferences about Human Demographic History

    PubMed Central

    Wakeley, John; Nielsen, Rasmus; Liu-Cordero, Shau Neen; Ardlie, Kristin

    2001-01-01

    A method of historical inference that accounts for ascertainment bias is developed and applied to single-nucleotide polymorphism (SNP) data in humans. The data consist of 84 short fragments of the genome that were selected, from three recent SNP surveys, to contain at least two polymorphisms in their respective ascertainment samples and that were then fully resequenced in 47 globally distributed individuals. Ascertainment bias is the deviation, from what would be observed in a random sample, caused either by discovery of polymorphisms in small samples or by locus selection based on levels or patterns of polymorphism. The three SNP surveys from which the present data were derived differ both in their protocols for ascertainment and in the size of the samples used for discovery. We implemented a Monte Carlo maximum-likelihood method to fit a subdivided-population model that includes a possible change in effective size at some time in the past. Incorrectly assuming that ascertainment bias does not exist causes errors in inference, affecting both estimates of migration rates and historical changes in size. Migration rates are overestimated when ascertainment bias is ignored. However, the direction of error in inferences about changes in effective population size (whether the population is inferred to be shrinking or growing) depends on whether either the numbers of SNPs per fragment or the SNP-allele frequencies are analyzed. We use the abbreviation “SDL,” for “SNP-discovered locus,” in recognition of the genomic-discovery context of SNPs. When ascertainment bias is modeled fully, both the number of SNPs per SDL and their allele frequencies support a scenario of growth in effective size in the context of a subdivided population. If subdivision is ignored, however, the hypothesis of constant effective population size cannot be rejected. An important conclusion of this work is that, in demographic or other studies, SNP data are useful only to the extent that

  14. Inferring sex-specific demographic history from SNP data

    PubMed Central

    Gautier, Mathieu

    2018-01-01

    The relative female and male contributions to demography are of great importance to better understand the history and dynamics of populations. While earlier studies relied on uniparental markers to investigate sex-specific questions, the increasing amount of sequence data now enables us to take advantage of tens to hundreds of thousands of independent loci from autosomes and the X chromosome. Here, we develop a novel method to estimate effective sex ratios or ESR (defined as the female proportion of the effective population) from allele count data for each branch of a rooted tree topology that summarizes the history of the populations of interest. Our method relies on Kimura’s time-dependent diffusion approximation for genetic drift, and is based on a hierarchical Bayesian model to integrate over the allele frequencies along the branches. We show via simulations that parameters are inferred robustly, even under scenarios that violate some of the model assumptions. Analyzing bovine SNP data, we infer a strongly female-biased ESR in both dairy and beef cattle, as expected from the underlying breeding scheme. Conversely, we observe a strongly male-biased ESR in early domestication times, consistent with an easier taming and management of cows, and/or introgression from wild auroch males, that would both cause a relative increase in male effective population size. In humans, analyzing a subsample of non-African populations, we find a male-biased ESR in Oceanians that may reflect complex marriage patterns in Aboriginal Australians. Because our approach relies on allele count data, it may be applied on a wide range of species. PMID:29385127
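
    As a rough illustration of why autosomal and X-linked loci together carry information about the effective sex ratio, the sketch below uses the textbook expressions for autosomal and X-linked effective sizes with unequal numbers of breeding females and males. It is not the hierarchical Bayesian method described in the abstract, and the function names are hypothetical, chosen here for illustration only.

```python
def x_to_autosome_ratio(esr: float) -> float:
    """Expected ratio N_e(X) / N_e(autosomes) for an effective sex ratio `esr`.

    Assumes the standard results N_eA = 4*Nf*Nm / (Nf + Nm) and
    N_eX = 9*Nf*Nm / (4*Nm + 2*Nf). With esr = Nf / (Nf + Nm) the ratio
    simplifies to 9 / (8 * (2 - esr)).
    """
    if not 0.0 < esr < 1.0:
        raise ValueError("esr must lie strictly between 0 and 1")
    return 9.0 / (8.0 * (2.0 - esr))


def esr_from_ratio(ratio: float) -> float:
    """Invert the relation above to recover esr from an X/autosome ratio."""
    esr = 2.0 - 9.0 / (8.0 * ratio)
    if not 0.0 < esr < 1.0:
        raise ValueError("ratio is outside the range implied by the model")
    return esr


if __name__ == "__main__":
    for esr in (0.2, 0.5, 0.8):
        print(esr, x_to_autosome_ratio(esr))
```

    Under these assumptions a balanced sex ratio gives the familiar X/autosome ratio of 3/4, and a female-biased ESR pushes the ratio above that value.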

  15. Inference of beliefs and emotions in patients with Alzheimer's disease.

    PubMed

    Zaitchik, Deborah; Koff, Elissa; Brownell, Hiram; Winner, Ellen; Albert, Marilyn

    2006-01-01

    The present study compared 20 patients with mild to moderate Alzheimer's disease with 20 older controls (ages 69-94 years) on their ability to make inferences about emotions and beliefs in others. Six tasks tested their ability to make 1st-order and 2nd-order inferences as well as to offer explanations and moral evaluations of human action by appeal to emotions and beliefs. Results showed that the ability to infer emotions and beliefs in 1st-order tasks remains largely intact in patients with mild to moderate Alzheimer's. Patients were able to use mental states in the prediction, explanation, and moral evaluation of behavior. Impairment on 2nd-order tasks involving inference of mental states was equivalent to impairment on control tasks, suggesting that patients' difficulty is secondary to their cognitive impairments. ((c) 2006 APA, all rights reserved).

  16. Maximum caliber inference of nonequilibrium processes

    NASA Astrophysics Data System (ADS)

    Otten, Moritz; Stock, Gerhard

    2010-07-01

    Thirty years ago, Jaynes suggested a general theoretical approach to nonequilibrium statistical mechanics, called maximum caliber (MaxCal) [Annu. Rev. Phys. Chem. 31, 579 (1980)]. MaxCal is a variational principle for dynamics in the same spirit that maximum entropy is a variational principle for equilibrium statistical mechanics. Motivated by the success of maximum entropy inference methods for equilibrium problems, in this work the MaxCal formulation is applied to the inference of nonequilibrium processes. That is, given some time-dependent observables of a dynamical process, one constructs a model that reproduces these input data and moreover, predicts the underlying dynamics of the system. For example, the observables could be some time-resolved measurements of the folding of a protein, which are described by a few-state model of the free energy landscape of the system. MaxCal then calculates the probabilities of an ensemble of trajectories such that on average the data are reproduced. From this probability distribution, any dynamical quantity of the system can be calculated, including population probabilities, fluxes, or waiting time distributions. After briefly reviewing the formalism, the practical numerical implementation of MaxCal in the case of an inference problem is discussed. Adopting various few-state models of increasing complexity, it is demonstrated that the MaxCal principle indeed works as a practical method of inference: The scheme is fairly robust and yields correct results as long as the input data are sufficient. As the method is unbiased and general, it can deal with any kind of time dependency such as oscillatory transients and multitime decays.

  17. Landscape attributes and life history variability shape genetic structure of trout populations in a stream network

    USGS Publications Warehouse

    Neville, H.M.; Dunham, J.B.; Peacock, M.M.

    2006-01-01

    Spatial and temporal landscape patterns have long been recognized to influence biological processes, but these processes often operate at scales that are difficult to study by conventional means. Inferences from genetic markers can overcome some of these limitations. We used a landscape genetics approach to test hypotheses concerning landscape processes influencing the demography of Lahontan cutthroat trout in a complex stream network in the Great Basin desert of the western US. Predictions were tested with population- and individual-based analyses of microsatellite DNA variation, reflecting patterns of dispersal, population stability, and local effective population sizes. Complementary genetic inferences suggested samples from migratory corridors housed a mixture of fish from tributaries, as predicted based on assumed migratory life histories in those habitats. Also as predicted, populations presumed to have greater proportions of migratory fish or from physically connected, large, or high quality habitats had higher genetic variability and reduced genetic differentiation from other populations. Populations thought to contain largely non-migratory individuals generally showed the opposite pattern, suggesting behavioral isolation. Estimated effective sizes were small, and we identified significant and severe genetic bottlenecks in several populations that were isolated, recently founded, or that inhabit streams that desiccate frequently. Overall, this work suggested that Lahontan cutthroat trout populations in stream networks are affected by a combination of landscape and metapopulation processes. Results also demonstrated that genetic patterns can reveal unexpected processes, even within a system that is well studied from a conventional ecological perspective. © Springer 2006.

  18. A method for inferring regional origins of neurodegeneration.

    PubMed

    Torok, Justin; Maia, Pedro D; Powell, Fon; Pandya, Sneha; Raj, Ashish

    2018-02-02

    Alzheimer's disease, the most common form of dementia, is characterized by the emergence and spread of senile plaques and neurofibrillary tangles, causing widespread neurodegeneration. Though the progression of Alzheimer's disease is considered to be stereotyped, the significant variability within clinical populations obscures this interpretation on the individual level. Of particular clinical importance is understanding where exactly pathology, e.g. tau, emerges in each patient and how the incipient atrophy pattern relates to future spread of disease. Here we demonstrate a newly developed graph theoretical method of inferring prior disease states in patients with Alzheimer's disease and mild cognitive impairment using an established network diffusion model and an L1-penalized optimization algorithm. Although the 'seeds' of origin using our inference method successfully reproduce known trends in Alzheimer's disease staging on a population level, we observed that the high degree of heterogeneity between patients at baseline is also reflected in their seeds. Additionally, the individualized seeds are significantly more predictive of future atrophy than a single seed placed at the hippocampus. Our findings illustrate that understanding where disease originates in individuals is critical to determining how it progresses and that our method allows us to infer early stages of disease from atrophy patterns observed at diagnosis. © The Author(s) (2018). Published by Oxford University Press on behalf of the Guarantors of Brain. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  19. Context-dependence of long-term responses of terrestrial gastropod populations to large-scale disturbance.

    Treesearch

    Christopher P. Bloch; Michael R. Willi

    2006-01-01

    Large-scale natural disturbances, such as hurricanes, can have profound effects on animal populations. Nonetheless, generalizations about the effects of disturbance are elusive, and few studies consider long-term responses of a single population or community to multiple large-scale disturbance events. In the last 20 y, two major hurricanes (Hugo and Georges) have struck...

  20. Inference or Observation?

    ERIC Educational Resources Information Center

    Finson, Kevin D.

    2010-01-01

    Learning about what inferences are, and what a good inference is, will help students become more scientifically literate and better understand the nature of science in inquiry. Students in K-4 should be able to give explanations about what they investigate (NSTA 1997) and that includes doing so through inferring. This article provides some tips…

  1. Inferring multi-scale neural mechanisms with brain network modelling

    PubMed Central

    Schirner, Michael; McIntosh, Anthony Randal; Jirsa, Viktor; Deco, Gustavo

    2018-01-01

    The neurophysiological processes underlying non-invasive brain activity measurements are incompletely understood. Here, we developed a connectome-based brain network model that integrates individual structural and functional data with neural population dynamics to support multi-scale neurophysiological inference. Simulated populations were linked by structural connectivity and, as a novelty, driven by electroencephalography (EEG) source activity. Simulations not only predicted subjects' individual resting-state functional magnetic resonance imaging (fMRI) time series and spatial network topologies over 20 minutes of activity, but more importantly, they also revealed precise neurophysiological mechanisms that underlie and link six empirical observations from different scales and modalities: (1) resting-state fMRI oscillations, (2) functional connectivity networks, (3) excitation-inhibition balance, (4, 5) inverse relationships between α-rhythms, spike-firing and fMRI on short and long time scales, and (6) fMRI power-law scaling. These findings underscore the potential of this new modelling framework for general inference and integration of neurophysiological knowledge to complement empirical studies. PMID:29308767

  2. How Generalizable Is Your Experiment? An Index for Comparing Experimental Samples and Populations

    ERIC Educational Resources Information Center

    Tipton, Elizabeth

    2014-01-01

    Although a large-scale experiment can provide an estimate of the average causal impact for a program, the sample of sites included in the experiment is often not drawn randomly from the inference population of interest. In this article, we provide a generalizability index that can be used to assess the degree of similarity between the sample of…

  3. Multi-scale approaches for high-speed imaging and analysis of large neural populations

    PubMed Central

    Ahrens, Misha B.; Yuste, Rafael; Peterka, Darcy S.; Paninski, Liam

    2017-01-01

    Progress in modern neuroscience critically depends on our ability to observe the activity of large neuronal populations with cellular spatial and high temporal resolution. However, two bottlenecks constrain efforts towards fast imaging of large populations. First, the resulting large video data is challenging to analyze. Second, there is an explicit tradeoff between imaging speed, signal-to-noise, and field of view: with current recording technology we cannot image very large neuronal populations with simultaneously high spatial and temporal resolution. Here we describe multi-scale approaches for alleviating both of these bottlenecks. First, we show that spatial and temporal decimation techniques based on simple local averaging provide order-of-magnitude speedups in spatiotemporally demixing calcium video data into estimates of single-cell neural activity. Second, once the shapes of individual neurons have been identified at fine scale (e.g., after an initial phase of conventional imaging with standard temporal and spatial resolution), we find that the spatial/temporal resolution tradeoff shifts dramatically: after demixing we can accurately recover denoised fluorescence traces and deconvolved neural activity of each individual neuron from coarse scale data that has been spatially decimated by an order of magnitude. This offers a cheap method for compressing this large video data, and also implies that it is possible to either speed up imaging significantly, or to “zoom out” by a corresponding factor to image order-of-magnitude larger neuronal populations with minimal loss in accuracy or temporal resolution. PMID:28771570
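
    The decimation by simple local averaging mentioned above can be sketched in a few lines. This is a minimal pure-Python illustration, not the authors' pipeline: the function names are hypothetical, a video is assumed to be a list of 2-D frames stored as lists of rows, and incomplete blocks at the edges are simply dropped.

```python
def decimate_frame(frame, factor):
    """Average non-overlapping factor x factor pixel blocks of one 2-D frame."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    n_rows = len(frame) // factor
    n_cols = len(frame[0]) // factor if frame else 0
    out = []
    for bi in range(n_rows):
        row = []
        for bj in range(n_cols):
            total = 0.0
            for i in range(bi * factor, (bi + 1) * factor):
                for j in range(bj * factor, (bj + 1) * factor):
                    total += frame[i][j]
            row.append(total / (factor * factor))
        out.append(row)
    return out


def decimate_video(video, spatial_factor, temporal_factor):
    """Decimate each frame spatially, then average groups of consecutive frames."""
    if temporal_factor < 1:
        raise ValueError("temporal_factor must be a positive integer")
    small = [decimate_frame(frame, spatial_factor) for frame in video]
    out = []
    for t in range(len(small) // temporal_factor):
        group = small[t * temporal_factor:(t + 1) * temporal_factor]
        n_rows = len(group[0])
        n_cols = len(group[0][0]) if n_rows else 0
        out.append([
            [sum(f[i][j] for f in group) / temporal_factor for j in range(n_cols)]
            for i in range(n_rows)
        ])
    return out
```

    Decimating by a factor of k in each spatial dimension shrinks every frame by k squared, which is where the data-compression benefit described in the abstract would come from.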

  4. Neural correlates of species-typical illogical cognitive bias in human inference.

    PubMed

    Ogawa, Akitoshi; Yamazaki, Yumiko; Ueno, Kenichi; Cheng, Kang; Iriki, Atsushi

    2010-09-01

    The ability to think logically is a hallmark of human intelligence, yet our innate inferential abilities are marked by implicit biases that often lead to illogical inference. For example, given A→B ("if A then B"), people frequently but fallaciously infer the inverse, B→A. This mode of inference, called symmetry, is logically invalid because, although it may be true, it is not necessarily true. Given pairs of conditional relations, such as A→B and B→C, humans reflexively perform two additional modes of inference: transitivity, whereby one (validly) infers A→C; and equivalence, whereby one (invalidly) infers C→A. In sharp contrast, nonhuman animals can handle transitivity but can rarely be made to acquire symmetry or equivalence. In the present study, human subjects performed logical and illogical inferences about the relations between abstract, visually presented figures while their brain activation was monitored with fMRI. The prefrontal, medial frontal, and intraparietal cortices were activated during all modes of inference. Additional activation in the precuneus and posterior parietal cortex was observed during transitivity and equivalence, which may reflect the need to retrieve the intermediate stimulus (B) from memory. Surprisingly, the patterns of brain activation in illogical and logical inference were very similar. We conclude that the observed inference-related fronto-parietal network is adapted for processing categorical, but not logical, structures of association among stimuli. Humans might prefer categorization over the memorization of logical structures in order to minimize the cognitive working memory load when processing large volumes of information.
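
    The validity claims for the three modes of inference can be checked mechanically with a truth table. The sketch below assumes that "if A then B" is read as a material conditional, which is a modelling choice made here rather than something stated in the abstract; the helper names are hypothetical.

```python
from itertools import product


def implies(p, q):
    """Material conditional: 'if p then q'."""
    return (not p) or q


def is_valid(premises, conclusion):
    """True if the conclusion holds in every assignment of (a, b, c)
    that satisfies all premises."""
    for a, b, c in product([False, True], repeat=3):
        if all(p(a, b, c) for p in premises) and not conclusion(a, b, c):
            return False
    return True


def a_then_b(a, b, c):
    return implies(a, b)


def b_then_c(a, b, c):
    return implies(b, c)


# Symmetry: from "if A then B" infer "if B then A".
symmetry = is_valid([a_then_b], lambda a, b, c: implies(b, a))
# Transitivity: from "if A then B" and "if B then C" infer "if A then C".
transitivity = is_valid([a_then_b, b_then_c], lambda a, b, c: implies(a, c))
# Equivalence: from the same two premises infer "if C then A".
equivalence = is_valid([a_then_b, b_then_c], lambda a, b, c: implies(c, a))

print(symmetry, transitivity, equivalence)  # prints: False True False
```

    Only transitivity survives the exhaustive check, matching the abstract's labelling of symmetry and equivalence as invalid.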

  5. SLUG - stochastically lighting up galaxies - III. A suite of tools for simulated photometry, spectroscopy, and Bayesian inference with stochastic stellar populations

    NASA Astrophysics Data System (ADS)

    Krumholz, Mark R.; Fumagalli, Michele; da Silva, Robert L.; Rendahl, Theodore; Parra, Jonathan

    2015-09-01

    Stellar population synthesis techniques for predicting the observable light emitted by a stellar population have extensive applications in numerous areas of astronomy. However, accurate predictions for small populations of young stars, such as those found in individual star clusters, star-forming dwarf galaxies, and small segments of spiral galaxies, require that the population be treated stochastically. Conversely, accurate deductions of the properties of such objects also require consideration of stochasticity. Here we describe a comprehensive suite of modular, open-source software tools for tackling these related problems. These include the following: a greatly-enhanced version of the SLUG code introduced by da Silva et al., which computes spectra and photometry for stochastically or deterministically sampled stellar populations with nearly arbitrary star formation histories, clustering properties, and initial mass functions; CLOUDY_SLUG, a tool that automatically couples SLUG-computed spectra with the CLOUDY radiative transfer code in order to predict stochastic nebular emission; BAYESPHOT, a general-purpose tool for performing Bayesian inference on the physical properties of stellar systems based on unresolved photometry; and CLUSTER_SLUG and SFR_SLUG, a pair of tools that use BAYESPHOT on a library of SLUG models to compute the mass, age, and extinction of mono-age star clusters, and the star formation rate of galaxies, respectively. The latter two tools make use of an extensive library of pre-computed stellar population models, which are included in the software. The complete package is available at http://www.slugsps.com.

  6. Entropic Inference

    NASA Astrophysics Data System (ADS)

    Caticha, Ariel

    2011-03-01

    In this tutorial we review the essential arguments behind entropic inference. We focus on the epistemological notion of information and its relation to the Bayesian beliefs of rational agents. The problem of updating from a prior to a posterior probability distribution is tackled through an eliminative induction process that singles out the logarithmic relative entropy as the unique tool for inference. The resulting method of Maximum relative Entropy (ME) includes as special cases both MaxEnt and Bayes' rule, and therefore unifies the two themes of these workshops—the Maximum Entropy and the Bayesian methods—into a single general inference scheme.
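
    The claim that Bayes' rule is a special case can be made concrete with a short worked derivation. The notation below (prior joint q, posterior joint p, observed datum x') is chosen here for illustration and is not taken from the abstract; it follows the usual presentation of the ME method, in which the data enter as a constraint on the marginal of x.

```latex
% Relative entropy of a candidate posterior p with respect to the prior q
S[p, q] = -\int dx\, d\theta\; p(x, \theta)\, \log \frac{p(x, \theta)}{q(x, \theta)},
\qquad q(x, \theta) = q(\theta)\, q(x \mid \theta).

% Observing x = x' constrains the marginal of the posterior
\int d\theta\; p(x, \theta) = \delta(x - x').

% Maximizing S[p, q] subject to this constraint (and normalization) gives
p(x, \theta) = \delta(x - x')\, q(\theta \mid x),

% so the updated distribution over \theta is Bayes' rule
p(\theta) = q(\theta \mid x') = \frac{q(\theta)\, q(x' \mid \theta)}{q(x')}.
```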

  7. The Dimensionality of Inference Making: Are Local and Global Inferences Distinguishable?

    ERIC Educational Resources Information Center

    Muijselaar, Marloes M. L.

    2018-01-01

    We investigated the dimensionality of inference making in samples of 4- to 9-year-olds (Ns = 416-783) to determine if local and global coherence inferences could be distinguished. In addition, we examined the validity of our experimenter-developed inference measure by comparing with three additional measures of listening comprehension. Multitrait,…

  8. Statistical inference on genetic data reveals the complex demographic history of human populations in central Asia.

    PubMed

    Palstra, Friso P; Heyer, Evelyne; Austerlitz, Frédéric

    2015-06-01

    The demographic history of modern humans constitutes a combination of expansions, colonizations, contractions, and remigrations. The advent of large scale genetic data combined with statistically refined methods facilitates inference of this complex history. Here we study the demographic history of two genetically admixed ethnic groups in Central Asia, an area characterized by high levels of genetic diversity and a history of recurrent immigration. Using Approximate Bayesian Computation, we infer that the timing of admixture markedly differs between the two groups. Admixture in the traditionally agricultural Tajiks could be dated back to the onset of the Neolithic transition in the region, whereas admixture in Kyrgyz is more recent, and may have involved the westward movement of Turkic peoples. These results are confirmed by a coalescent method that fits an isolation-with-migration model to the genetic data, with both Central Asian groups having received gene flow from the extremities of Eurasia. Interestingly, our analyses also uncover signatures of gene flow from Eastern to Western Eurasia during Paleolithic times. In conclusion, the high genetic diversity currently observed in these two Central Asian peoples most likely reflects the effects of recurrent immigration that likely started before historical times. Conversely, conquests during historical times may have had a relatively limited genetic impact. These results emphasize the need for a better understanding of the genetic consequences of transmission of culture and technological innovations, as well as those of invasions and conquests. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  9. STRIDE: Species Tree Root Inference from Gene Duplication Events.

    PubMed

    Emms, David M; Kelly, Steven

    2017-12-01

    The correct interpretation of any phylogenetic tree is dependent on that tree being correctly rooted. We present STRIDE, a fast, effective, and outgroup-free method for identification of gene duplication events and species tree root inference in large-scale molecular phylogenetic analyses. STRIDE identifies sets of well-supported in-group gene duplication events from a set of unrooted gene trees, and analyses these events to infer a probability distribution over an unrooted species tree for the location of its root. We show that STRIDE correctly identifies the root of the species tree in multiple large-scale molecular phylogenetic data sets spanning a wide range of timescales and taxonomic groups. We demonstrate that the novel probability model implemented in STRIDE can accurately represent the ambiguity in species tree root assignment for data sets where information is limited. Furthermore, application of STRIDE to outgroup-free inference of the origin of the eukaryotic tree resulted in a root probability distribution that provides additional support for leading hypotheses for the origin of the eukaryotes. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  10. Global population structure of the tope (Galeorhinus galeus) inferred by mitochondrial control region sequence data.

    PubMed

    Chabot, C L; Allen, L G

    2009-02-01

    In order to properly manage and conserve exploited shark species, detailed analyses of their population structure are needed. Global populations of Galeorhinus galeus are in decline due to the exploitation of the fishery over the past 80 years. Currently, the genetic structure of eastern Pacific populations of G. galeus is not known and recent observations in the northeastern Pacific suggest an increase in numbers. To evaluate gene flow among populations of G. galeus, 116 samples were collected and analysed from six geographically dispersed locations: Australia, North America, South Africa, South America (Argentina and Peru), and the UK. Analysis of 968 to 1006 bp of the 1068-bp mitochondrial control region revealed 38 unique haplotypes that were largely restricted to their collecting locality. Significant genetic structure was detected among populations (Phi(ST) = 0.84; P < 0.000001) and migration estimates were low (Nm = 0.05-0.97). Due to an apparent lack of migration, populations of G. galeus appear to be isolated from each other with little to no gene flow occurring among them. As a consequence of this isolation, increasing numbers of G. galeus in the northeastern Pacific can be best explained by local recruitment and not by input from geographically distant populations.

  11. The total satellite population of the Milky Way

    NASA Astrophysics Data System (ADS)

    Newton, Oliver; Cautun, Marius; Jenkins, Adrian; Frenk, Carlos S.; Helly, John C.

    2018-05-01

    The total number and luminosity function of the population of dwarf galaxies of the Milky Way (MW) provide important constraints on the nature of the dark matter and on the astrophysics of galaxy formation at low masses. However, only a partial census of this population exists because of the flux limits and restricted sky coverage of existing Galactic surveys. We combine the sample of satellites recently discovered by the Dark Energy Survey (DES) with the satellites found in Sloan Digital Sky Survey (SDSS) Data Release 9 (together these surveys cover nearly half the sky) to estimate the total luminosity function of satellites down to M_V = 0. We apply a new Bayesian inference method in which we assume that the radial distribution of satellites independently of absolute magnitude follows that of subhaloes selected according to their peak maximum circular velocity. We find that there should be at least 124^{+40}_{-27} (68% CL, statistical error) satellites brighter than M_V = 0 within 300 kpc of the Sun. As a result of our use of new data and better simulations, and a more robust statistical method, we infer a much smaller population of satellites than reported in previous studies using earlier SDSS data only; we also address an underestimation of the uncertainties in earlier work by accounting for stochastic effects. We find that the inferred number of faint satellites depends only weakly on the assumed mass of the MW halo and we provide scaling relations to extend our results to different assumed halo masses and outer radii. We predict that half of our estimated total satellite population of the MW should be detected by the Large Synoptic Survey Telescope (LSST). The code implementing our estimation method is available online.

  12. Controller certification: The generalized stability margin inference for a large number of MIMO controllers

    NASA Astrophysics Data System (ADS)

    Park, Jisang

    In this dissertation, we investigate MIMO stability margin inference of a large number of controllers using pre-established stability margins of a small number of nu-gap-wise adjacent controllers. The generalized stability margin and the nu-gap metric are inherently able to handle MIMO system analysis without the necessity of repeating multiple channel-by-channel SISO analyses. This research consists of three parts: (i) development of a decision support tool for inference of the stability margin, (ii) computational considerations for yielding the maximal stability margin with the minimal nu-gap metric in a less conservative manner, and (iii) experiment design for estimating the generalized stability margin with an assured error bound. A modern problem from aerospace control involves the certification of a large set of potential controllers with either a single plant or a fleet of potential plant systems, with both plants and controllers being MIMO and, for the moment, linear. Experiments on a limited number of controller/plant pairs should establish the stability and a certain level of margin of the complete set. We consider this certification problem for a set of controllers and provide algorithms for selecting an efficient subset for testing. This is done for a finite set of candidate controllers and, at least for SISO plants, for an infinite set. In doing this, the nu-gap metric will be the main tool. We provide a theorem restricting a radius of a ball in the parameter space so that the controller can guarantee a prescribed level of stability and performance if parameters of the controllers are contained in the ball. Computational examples are given, including one of certification of an aircraft engine controller. The overarching aim is to introduce truly MIMO margin calculations and to understand their efficacy in certifying stability over a set of controllers and in replacing legacy single-loop gain and phase margin calculations. We consider methods for the

  13. Self-enforcing Private Inference Control

    NASA Astrophysics Data System (ADS)

    Yang, Yanjiang; Li, Yingjiu; Weng, Jian; Zhou, Jianying; Bao, Feng

    Private inference control enables simultaneous enforcement of inference control and protection of users' query privacy. Private inference control is a useful tool for database applications, especially now that users are increasingly concerned about individual privacy. However, protection of query privacy on top of inference control is a double-edged sword: without letting the database server know the content of user queries, users can easily launch DoS attacks. To mitigate DoS attacks in private inference control, we propose the concept of self-enforcing private inference control, whose intuition is to force users to make only inference-free queries by enforcing inference control themselves; otherwise, a penalty is imposed on the violating users.

  14. Bayesian Inference of High-Dimensional Dynamical Ocean Models

    NASA Astrophysics Data System (ADS)

    Lin, J.; Lermusiaux, P. F. J.; Lolla, S. V. T.; Gupta, A.; Haley, P. J., Jr.

    2015-12-01

    This presentation addresses a holistic set of challenges in high-dimensional Bayesian nonlinear estimation for the ocean: (i) predict the probability distribution functions (pdfs) of large nonlinear dynamical systems using stochastic partial differential equations (PDEs); (ii) assimilate data using Bayes' law with these pdfs; (iii) predict the future data that optimally reduce uncertainties; and (iv) rank the known and learn the new model formulations themselves. Overall, we allow the joint inference of the state, equations, geometry, boundary conditions and initial conditions of dynamical models. Examples are provided for time-dependent fluid and ocean flows, including cavity, double-gyre and Strait flows with jets and eddies. The Bayesian model inference, based on limited observations, is illustrated first by the estimation of obstacle shapes and positions in fluid flows. Next, the Bayesian inference of biogeochemical reaction equations and of their states and parameters is presented, illustrating how PDE-based machine learning can rigorously guide the selection and discovery of complex ecosystem models. Finally, the inference of multiscale bottom gravity current dynamics is illustrated, motivated in part by classic overflows and dense water formation sites and their relevance to climate monitoring and dynamics. This is joint work with our MSEAS group at MIT.

  15. Bayesian Inference on the Effect of Density Dependence and Weather on a Guanaco Population from Chile

    PubMed Central

    Zubillaga, María; Skewes, Oscar; Soto, Nicolás; Rabinovich, Jorge E.; Colchero, Fernando

    2014-01-01

    Understanding the mechanisms that drive population dynamics is fundamental for management of wild populations. The guanaco (Lama guanicoe) is one of two wild camelid species in South America. We evaluated the effects of density dependence and weather variables on population regulation based on a time series of 36 years of population sampling of guanacos in Tierra del Fuego, Chile. The population density varied between 2.7 and 30.7 guanacos/km², with an apparent monotonic growth during the first 25 years; however, in the last 10 years the population has shown large fluctuations, suggesting that it might have reached its carrying capacity. We used a Bayesian state-space framework and model selection to determine the effect of density and environmental variables on guanaco population dynamics. Our results show that the population is under density-dependent regulation and that it is currently fluctuating around an average carrying capacity of 45,000 guanacos. We also found a significant positive effect of previous winter temperature, while sheep density has a strong negative effect on the guanaco population growth. We conclude that there are significant density-dependent processes and that climate as well as competition with domestic species have important effects determining the population size of guanacos, with important implications for management and conservation. PMID:25514510
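
    To make the notion of density-dependent regulation around a carrying capacity concrete, the sketch below iterates a deterministic Ricker-type model. It is not the Bayesian state-space model used in the study: the growth rate, the starting population and the function names are illustrative assumptions, and only the carrying capacity of 45,000 comes from the abstract.

```python
import math


def ricker_step(n, r, k):
    """One time step of the Ricker model: growth slows as n approaches k
    and turns negative when n exceeds k."""
    return n * math.exp(r * (1.0 - n / k))


def simulate(n0, r, k, steps):
    """Return the trajectory [n0, n1, ..., n_steps]."""
    trajectory = [n0]
    for _ in range(steps):
        trajectory.append(ricker_step(trajectory[-1], r, k))
    return trajectory


if __name__ == "__main__":
    K = 45_000.0   # carrying capacity reported in the abstract
    R = 0.3        # assumed intrinsic growth rate (illustrative only)
    for year, n in enumerate(simulate(5_000.0, R, K, 40)):
        if year % 10 == 0:
            print(year, round(n))
```

    With a small growth rate the trajectory rises monotonically and settles at the carrying capacity; the fluctuations reported in the abstract would require stochastic terms or covariates such as weather and sheep density, which this sketch omits.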

  16. Bayesian inference on the effect of density dependence and weather on a guanaco population from Chile.

    PubMed

    Zubillaga, María; Skewes, Oscar; Soto, Nicolás; Rabinovich, Jorge E; Colchero, Fernando

    2014-01-01

    Understanding the mechanisms that drive population dynamics is fundamental for management of wild populations. The guanaco (Lama guanicoe) is one of two wild camelid species in South America. We evaluated the effects of density dependence and weather variables on population regulation based on a time series of 36 years of population sampling of guanacos in Tierra del Fuego, Chile. The population density varied between 2.7 and 30.7 guanaco/km², with an apparent monotonic growth during the first 25 years; however, in the last 10 years the population has shown large fluctuations, suggesting that it might have reached its carrying capacity. We used a Bayesian state-space framework and model selection to determine the effect of density and environmental variables on guanaco population dynamics. Our results show that the population is under density dependent regulation and that it is currently fluctuating around an average carrying capacity of 45,000 guanacos. We also found a significant positive effect of previous winter temperature while sheep density has a strong negative effect on the guanaco population growth. We conclude that there are significant density dependent processes and that climate as well as competition with domestic species have important effects determining the population size of guanacos, with important implications for management and conservation.

  17. More than one kind of inference: re-examining what's learned in feature inference and classification.

    PubMed

    Sweller, Naomi; Hayes, Brett K

    2010-08-01

    Three studies examined how task demands that impact on attention to typical or atypical category features shape the category representations formed through classification learning and inference learning. During training, categories were learned via exemplar classification or by inferring missing exemplar features. In the latter condition, inferences were made about missing typical features alone (typical feature inference) or about both missing typical and atypical features (mixed feature inference). Classification and mixed feature inference led to the incorporation of typical and atypical features into category representations, with both kinds of features influencing inferences about familiar (Experiments 1 and 2) and novel (Experiment 3) test items. Those in the typical inference condition focused primarily on typical features. Together with formal modelling, these results challenge previous accounts that have characterized inference learning as producing a focus on typical category features. The results show that two different kinds of inference learning are possible and that these are subserved by different kinds of category representations.

  18. Perceptual inference.

    PubMed

    Aggelopoulos, Nikolaos C

    2015-08-01

    Perceptual inference refers to the ability to infer sensory stimuli from predictions that result from internal neural representations built through prior experience. Methods of Bayesian statistical inference and decision theory model cognition adequately by using error sensing either in guiding action or in "generative" models that predict the sensory information. In this framework, perception can be seen as a process qualitatively distinct from sensation, a process of information evaluation using previously acquired and stored representations (memories) that is guided by sensory feedback. The stored representations can be utilised as internal models of sensory stimuli enabling long term associations, for example in operant conditioning. Evidence for perceptual inference is contributed by such phenomena as the cortical co-localisation of object perception with object memory, the response invariance in the responses of some neurons to variations in the stimulus, as well as from situations in which perception can be dissociated from sensation. In the context of perceptual inference, sensory areas of the cerebral cortex that have been facilitated by a priming signal may be regarded as comparators in a closed feedback loop, similar to the better known motor reflexes in the sensorimotor system. The adult cerebral cortex can be regarded as similar to a servomechanism, in using sensory feedback to correct internal models, producing predictions of the outside world on the basis of past experience. Copyright © 2015 Elsevier Ltd. All rights reserved.

  19. About recent star formation rates inferences

    NASA Astrophysics Data System (ADS)

    Cerviño, M.; Bongiovanni, A.; Hidalgo, S.

    2017-03-01

    Star Formation Rate (SFR) inferences are based on the so-called constant SFR approximation, in which synthesis models are required to provide a calibration; we aim to study the key points of this approximation in order to produce accurate SFR inferences. We use the intrinsic algebra of synthesis models, and we explore how the SFR can be inferred from the integrated light without any assumption about the underlying Star Formation History (SFH). We show that the constant SFR approximation is actually a simplified expression of deeper characteristics of synthesis models: it is a characterization of the evolution of single stellar populations (SSPs), with the SSPs acting as a sensitivity curve over which different measures of the SFH can be obtained. As results, we find that (1) the best age to calibrate SFR indices is the age of the observed system (i.e. about 13 Gyr for z = 0 systems); (2) constant SFR and steady-state luminosities are not requirements to calibrate the SFR; (3) it is not possible to define a single SFR time scale over which the recent SFH is averaged, and we suggest using typical SFR indices (ionizing flux, UV fluxes) together with non-typical ones (optical/IR fluxes) to correct the SFR for the contribution of the old component of the SFH. We also show how to use galaxy colors to quote age ranges where the recent component of the SFH is stronger/softer than the older component. Particular values of SFR calibrations are (almost) not affected by this work, but the meaning of what is obtained by SFR inferences is. In our framework, results such as the correlation of SFR time scales with galaxy colors, or the sensitivity of different SFR indices to short and long scale variations in the SFH, fit naturally. In addition, the present framework provides a theoretical guideline to optimize the available information from data/numerical experiments to improve the accuracy of SFR inferences. More information in Cerviño, Bongiovanni & Hidalgo A&A 588, 108C (2016)

  20. Large Scale Flood Risk Analysis using a New Hyper-resolution Population Dataset

    NASA Astrophysics Data System (ADS)

    Smith, A.; Neal, J. C.; Bates, P. D.; Quinn, N.; Wing, O.

    2017-12-01

    Here we present the first national scale flood risk analyses, using high resolution Facebook Connectivity Lab population data and data from a hyper resolution flood hazard model. In recent years the field of large scale hydraulic modelling has been transformed by new remotely sensed datasets, improved process representation, highly efficient flow algorithms and increases in computational power. These developments have allowed flood risk analysis to be undertaken in previously unmodeled territories and from continental to global scales. Flood risk analyses are typically conducted via the integration of modelled water depths with an exposure dataset. Over large scales and in data poor areas, these exposure data typically take the form of a gridded population dataset, estimating population density using remotely sensed data and/or locally available census data. The local nature of flooding dictates that for robust flood risk analysis to be undertaken both hazard and exposure data should sufficiently resolve local scale features. Global flood frameworks are enabling flood hazard data to be produced at 90m resolution, resulting in a mismatch with available population datasets, which are typically more coarsely resolved. Moreover, these exposure data are typically focused on urban areas and struggle to represent rural populations. In this study we integrate a new population dataset with a global flood hazard model. The population dataset was produced by the Connectivity Lab at Facebook, providing gridded population data at 5m resolution, representing a resolution increase over previous countrywide data sets of multiple orders of magnitude. Flood risk analyses undertaken over a number of developing countries are presented, along with a comparison of flood risk analyses undertaken using pre-existing population datasets.
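
    The integration of modelled water depths with a gridded exposure dataset, as described in this abstract, can be sketched as below. This is only an illustrative sketch, not the authors' workflow: the function name `exposed_population`, the depth threshold, and the toy grids are assumptions, and both grids are assumed to be already aligned on the same cells.

```python
from typing import Sequence


def exposed_population(
    depth_m: Sequence[Sequence[float]],
    population: Sequence[Sequence[float]],
    threshold_m: float = 0.0,
) -> float:
    """Sum the population of every grid cell whose water depth exceeds threshold_m.

    Hypothetical helper: assumes the hazard grid (depth_m) and the exposure
    grid (population) have identical shape and cell alignment.
    """
    if len(depth_m) != len(population):
        raise ValueError("grids must have the same number of rows")
    total = 0.0
    for depth_row, pop_row in zip(depth_m, population):
        if len(depth_row) != len(pop_row):
            raise ValueError("grid rows must have the same length")
        for depth, people in zip(depth_row, pop_row):
            if depth > threshold_m:
                total += people
    return total


# Toy 2 x 2 grids (made-up values, for illustration only).
depth = [[0.0, 0.4], [1.2, 0.0]]
people = [[10.0, 5.0], [2.0, 30.0]]
print(exposed_population(depth, people))       # cells with any water
print(exposed_population(depth, people, 0.5))  # cells deeper than 0.5 m
```

    In practice the two grids usually differ in resolution, which is the mismatch the abstract discusses; resampling them to a common grid is outside the scope of this sketch.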

  1. Efficient characterisation of large deviations using population dynamics

    NASA Astrophysics Data System (ADS)

    Brewer, Tobias; Clark, Stephen R.; Bradford, Russell; Jack, Robert L.

    2018-05-01

    We consider population dynamics as implemented by the cloning algorithm for analysis of large deviations of time-averaged quantities. We use the simple symmetric exclusion process with periodic boundary conditions as a prototypical example and investigate the convergence of the results with respect to the algorithmic parameters, focussing on the dynamical phase transition between homogeneous and inhomogeneous states, where convergence is relatively difficult to achieve. We discuss how the performance of the algorithm can be optimised, and how it can be efficiently exploited on parallel computing platforms.

  2. Using DNA fingerprints to infer familial relationships within NHANES III households

    PubMed Central

    Katki, Hormuzd A.; Sanders, Christopher L.; Graubard, Barry I.; Bergen, Andrew W.

    2009-01-01

    Developing, targeting, and evaluating genomic strategies for population-based disease prevention require population-based data. In response to this urgent need, genotyping has been conducted within the Third National Health and Nutrition Examination Survey (NHANES III), the nationally-representative household-interview health survey in the U.S. However, before these genetic analyses can occur, family relationships within households must be accurately ascertained. Unfortunately, reported family relationships within NHANES III households based on questionnaire data are incomplete and inconclusive with regard to actual biological relatedness of family members. We inferred family relationships within households using DNA fingerprints (Identifiler®) that contain the DNA loci used by law enforcement agencies for forensic identification of individuals. However, performance of these loci for relationship inference is not well understood. We evaluated two competing statistical methods for relationship inference on pairs of household members: an exact likelihood ratio relying on allele frequencies and an Identical By State (IBS) likelihood ratio that only requires matching alleles. We modified these methods to account for genotyping errors and population substructure. The two methods usually agree on the rankings of the most likely relationships. However, the IBS method underestimates the likelihood ratio by not accounting for the informativeness of matching rare alleles. The likelihood ratio is sensitive to estimates of population substructure, and parent-child relationships are sensitive to the specified genotyping error rate. These loci were unable to distinguish second-degree relationships and cousins from being unrelated. The genetic data are also useful for verifying reported relationships and identifying data quality issues. An important by-product is the first explicitly nationally-representative estimates of allele frequencies at these ubiquitous forensic loci. PMID
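
    The "matching alleles" input to an Identical By State comparison can be illustrated with a minimal sketch. It only counts the alleles two unphased genotypes share at one locus (0, 1 or 2); it is not the likelihood ratio evaluated in the paper, and the function name `ibs_count` and the example allele labels are assumptions made for illustration.

```python
from typing import Hashable, Tuple

Genotype = Tuple[Hashable, Hashable]


def ibs_count(g1: Genotype, g2: Genotype) -> int:
    """Return how many alleles (0, 1 or 2) two unphased genotypes share by state.

    Genotypes are unordered pairs, so both possible pairings of the alleles
    are tried and the better match is kept.
    """
    a1, a2 = g1
    b1, b2 = g2
    direct = int(a1 == b1) + int(a2 == b2)
    crossed = int(a1 == b2) + int(a2 == b1)
    return max(direct, crossed)


# Hypothetical repeat-number alleles at a single locus.
print(ibs_count((12, 14), (14, 12)))  # same genotype, different order
print(ibs_count((12, 14), (14, 15)))  # one shared allele
print(ibs_count((10, 11), (12, 13)))  # no shared allele
```

    Because this count ignores how common each allele is, it treats a match on a rare allele the same as a match on a common one, which is the limitation of the IBS method noted in the abstract.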

  3. Paleolithic Contingent in Modern Japanese: Estimation and Inference using Genome-wide Data

    PubMed Central

    He, Yungang; Wang, Wei R.; Xu, Shuhua; Jin, Li; SNP Consortium, Pan-Asia

    2012-01-01

    The genetic origins of Japanese populations have been controversial. Upper Paleolithic Japanese, i.e. Jomon, developed independently in the Japanese islands for more than 10,000 years until the isolation was ended by the influx of continental immigrants about 2,000 years ago. However, knowledge of the origin of Jomon and its contribution to the genetic pool of contemporary Japanese is still limited, despite extensive studies using mtDNA and Y chromosomes. In this report, we aimed to infer the origin of Jomon and to estimate its contribution to Japanese by fitting an admixture model with missing data from Jomon to genome-wide data from 94 worldwide populations. Our results showed that the genetic contributions of Jomon, the Paleolithic contingent in Japanese, are 54.3∼62.3% in Ryukyuans and 23.1∼39.5% in mainland Japanese, respectively. Utilizing inferred allele frequencies of the Jomon population, we further showed that the Paleolithic contingent in Japanese had a Northeast Asia origin. PMID:22482036

  4. Bayesian inferences suggest that Amazon Yunga Natives diverged from Andeans less than 5000 ybp: implications for South American prehistory.

    PubMed

    Scliar, Marilia O; Gouveia, Mateus H; Benazzo, Andrea; Ghirotto, Silvia; Fagundes, Nelson J R; Leal, Thiago P; Magalhães, Wagner C S; Pereira, Latife; Rodrigues, Maira R; Soares-Souza, Giordano B; Cabrera, Lilia; Berg, Douglas E; Gilman, Robert H; Bertorelle, Giorgio; Tarazona-Santos, Eduardo

    2014-09-30

    Archaeology reports millenary cultural contacts between Peruvian Coast-Andes and the Amazon Yunga, a rainforest transitional region between Andes and Lower Amazonia. To clarify the relationships between cultural and biological evolution of these populations, in particular between Amazon Yungas and Andeans, we used DNA-sequence data, a model-based Bayesian approach and several statistical validations to infer a set of demographic parameters. We found that the genetic diversity of the Shimaa (an Amazon Yunga population) is a subset of that of Quechuas from Central-Andes. Using the Isolation-with-Migration population genetics model, we inferred that the Shimaa ancestors were a small subgroup that split less than 5300 years ago (after the development of complex societies) from an ancestral Andean population. After the split, the most plausible scenario compatible with our results is that the ancestors of Shimaas moved toward the Peruvian Amazon Yunga and incorporated the culture and language of some of their neighbors, but not a substantial amount of their genes. We validated our results using Approximate Bayesian Computations, posterior predictive tests and the analysis of pseudo-observed datasets. We presented a case study in which model-based Bayesian approaches, combined with necessary statistical validations, shed light on the prehistoric demographic relationship between Andeans and a population from the Amazon Yunga. Our results offer a testable model for the peopling of this large transitional environmental region between the Andes and the Lower Amazonia. However, studies on larger samples and involving more populations of these regions are necessary to confirm whether the predominant Andean biological origin of the Shimaas is the rule, and not the exception.

  5. Genealogies of rapidly adapting populations

    PubMed Central

    Neher, Richard A.; Hallatschek, Oskar

    2013-01-01

    The genetic diversity of a species is shaped by its recent evolutionary history and can be used to infer demographic events or selective sweeps. Most inference methods are based on the null hypothesis that natural selection is a weak or infrequent evolutionary force. However, many species, particularly pathogens, are under continuous pressure to adapt in response to changing environments. A statistical framework for inference from diversity data of such populations is currently lacking. Towards this goal, we explore the properties of genealogies in a model of continual adaptation in asexual populations. We show that lineages trace back to a small pool of highly fit ancestors, in which almost simultaneous coalescence of more than two lineages frequently occurs. Whereas such multiple mergers are unlikely under the neutral coalescent, they create a unique genetic footprint in adapting populations. The site frequency spectrum of derived neutral alleles, for example, is nonmonotonic and has a peak at high frequencies, whereas Tajima’s D becomes more and more negative with increasing sample size. Because multiple merger coalescents emerge in many models of rapid adaptation, we argue that they should be considered as a null model for adapting populations. PMID:23269838
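
    The site frequency spectrum of derived alleles mentioned in this abstract can be computed from a sample as sketched below. This is a generic illustration, not code from the paper: the function name `site_frequency_spectrum`, the 0/1 encoding (0 = ancestral, 1 = derived) and the toy sample are assumptions.

```python
from typing import List, Sequence


def site_frequency_spectrum(haplotypes: Sequence[Sequence[int]]) -> List[int]:
    """Count segregating sites by the number of samples carrying the derived allele.

    haplotypes[i][j] is 0 (ancestral) or 1 (derived) for sample i at site j.
    The returned list has n - 1 entries; entry k - 1 is the number of sites
    at which exactly k of the n samples carry the derived allele.
    Sites that are monomorphic in the sample (count 0 or n) are skipped.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled haplotypes")
    n_sites = len(haplotypes[0])
    if any(len(h) != n_sites for h in haplotypes):
        raise ValueError("all haplotypes must cover the same sites")
    sfs = [0] * (n - 1)
    for site in range(n_sites):
        derived = sum(h[site] for h in haplotypes)
        if 0 < derived < n:
            sfs[derived - 1] += 1
    return sfs


# Toy sample: 4 haplotypes, 4 sites (made-up data).
sample = [
    [1, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]
print(site_frequency_spectrum(sample))
```

    With real data, the shape of this vector is what is compared against model expectations, for example the high-frequency peak the abstract attributes to multiple-merger genealogies.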

  6. Integrating evolutionary and functional approaches to infer adaptation at specific loci.

    PubMed

    Storz, Jay F; Wheat, Christopher W

    2010-09-01

    Inferences about adaptation at specific loci are often exclusively based on the static analysis of DNA sequence variation. Ideally, population-genetic evidence for positive selection serves as a stepping-off point for experimental studies to elucidate the functional significance of the putatively adaptive variation. We argue that inferences about adaptation at specific loci are best achieved by integrating the indirect, retrospective insights provided by population-genetic analyses with the more direct, mechanistic insights provided by functional experiments. Integrative studies of adaptive genetic variation may sometimes be motivated by experimental insights into molecular function, which then provide the impetus to perform population genetic tests to evaluate whether the functional variation is of adaptive significance. In other cases, studies may be initiated by genome scans of DNA variation to identify candidate loci for recent adaptation. Results of such analyses can then motivate experimental efforts to test whether the identified candidate loci do in fact contribute to functional variation in some fitness-related phenotype. Functional studies can provide corroborative evidence for positive selection at particular loci, and can potentially reveal specific molecular mechanisms of adaptation.

  7. Investigating large-scale brain dynamics using field potential recordings: analysis and interpretation.

    PubMed

    Pesaran, Bijan; Vinck, Martin; Einevoll, Gaute T; Sirota, Anton; Fries, Pascal; Siegel, Markus; Truccolo, Wilson; Schroeder, Charles E; Srinivasan, Ramesh

    2018-06-25

    New technologies to record electrical activity from the brain on a massive scale offer tremendous opportunities for discovery. Electrical measurements of large-scale brain dynamics, termed field potentials, are especially important to understanding and treating the human brain. Here, our goal is to provide best practices on how field potential recordings (electroencephalograms, magnetoencephalograms, electrocorticograms and local field potentials) can be analyzed to identify large-scale brain dynamics, and to highlight critical issues and limitations of interpretation in current work. We focus our discussion of analyses around the broad themes of activation, correlation, communication and coding. We provide recommendations for interpreting the data using forward and inverse models. The forward model describes how field potentials are generated by the activity of populations of neurons. The inverse model describes how to infer the activity of populations of neurons from field potential recordings. A recurring theme is the challenge of understanding how field potentials reflect neuronal population activity given the complexity of the underlying brain systems.

  8. Modeling coverage gaps in haplotype frequencies via Bayesian inference to improve stem cell donor selection.

    PubMed

    Louzoun, Yoram; Alter, Idan; Gragert, Loren; Albrecht, Mark; Maiers, Martin

    2018-05-01

    Regardless of sampling depth, accurate genotype imputation is limited in regions of high polymorphism which often have a heavy-tailed haplotype frequency distribution. Many rare haplotypes are thus unobserved. Statistical methods to improve imputation by extending reference haplotype distributions using linkage disequilibrium patterns that relate allele and haplotype frequencies have not yet been explored. In the field of unrelated stem cell transplantation, imputation of highly polymorphic human leukocyte antigen (HLA) genes has an important application in identifying the best-matched stem cell donor when searching large registries totaling over 28,000,000 donors worldwide. Despite these large registry sizes, a significant proportion of searched patients present novel HLA haplotypes. Supporting this observation, HLA population genetic models have indicated that many extant HLA haplotypes remain unobserved. The absent haplotypes are a significant cause of error in haplotype matching. We have applied a Bayesian inference methodology for extending haplotype frequency distributions, using a model where new haplotypes are created by recombination of observed alleles. Applications of this joint probability model offer significant improvement in frequency distribution estimates over the best existing alternative methods, as we illustrate using five-locus HLA frequency data from the National Marrow Donor Program registry. Transplant matching algorithms and disease association studies involving phasing and imputation of rare variants may benefit from this statistical inference framework.

  9. Self-fertilization, long-distance flash invasion and biogeography shape the population structure of Pseudosuccinea columella at the worldwide scale.

    PubMed

    Lounnas, M; Correa, A C; Vázquez, A A; Dia, A; Escobar, J S; Nicot, A; Arenas, J; Ayaqui, R; Dubois, M P; Gimenez, T; Gutiérrez, A; González-Ramírez, C; Noya, O; Prepelitchi, L; Uribe, N; Wisnivesky-Colli, C; Yong, M; David, P; Loker, E S; Jarne, P; Pointier, J P; Hurtrez-Boussès, S

    2017-02-01

    Population genetic studies are efficient for inferring the invasion history based on a comparison of native and invasive populations, especially when conducted at species scale. An expected outcome in invasive populations is variability loss, and this is especially true in self-fertilizing species. We here focus on the self-fertilizing Pseudosuccinea columella, an invasive hermaphroditic freshwater snail that has greatly expanded its geographic distribution and that acts as intermediate host of Fasciola hepatica, the causative agent of human and veterinary fasciolosis. We evaluated the distribution of genetic diversity at the largest geographic scale analysed to date in this species by surveying 80 populations collected during 16 years from 14 countries, using eight nuclear microsatellites and two mitochondrial genes. As expected, populations from North America, the putative origin area, were strongly structured by selfing and history and harboured much more genetic variability than invasive populations. We found high selfing rates (when it was possible to infer it), none-to-low genetic variability and strong population structure in most invasive populations. Strikingly, we found a unique genotype/haplotype in populations from eight invaded regions sampled all over the world. Moreover, snail populations resistant to infection by the parasite are genetically distinct from susceptible populations. Our results are compatible with repeated introductions in South America and flash worldwide invasion by this unique genotype/haplotype. Our study illustrates the population genetic consequences of biological invasion in a highly selfing species at very large geographic scale. We discuss how such a large-scale flash invasion may affect the spread of fasciolosis. © 2016 John Wiley & Sons Ltd.

  10. With or without you: predictive coding and Bayesian inference in the brain

    PubMed Central

    Aitchison, Laurence; Lengyel, Máté

    2018-01-01

    Two theoretical ideas have emerged recently with the ambition to provide a unifying functional explanation of neural population coding and dynamics: predictive coding and Bayesian inference. Here, we describe the two theories and their combination into a single framework: Bayesian predictive coding. We clarify how the two theories can be distinguished, despite sharing core computational concepts and addressing an overlapping set of empirical phenomena. We argue that predictive coding is an algorithmic / representational motif that can serve several different computational goals of which Bayesian inference is but one. Conversely, while Bayesian inference can utilize predictive coding, it can also be realized by a variety of other representations. We critically evaluate the experimental evidence supporting Bayesian predictive coding and discuss how to test it more directly. PMID:28942084

  11. From Coexpression to Coregulation: An Approach to Inferring Transcriptional Regulation Among Gene Classes from Large-Scale Expression Data

    NASA Technical Reports Server (NTRS)

    Mjolsness, Eric; Castano, Rebecca; Mann, Tobias; Wold, Barbara

    2000-01-01

    We provide preliminary evidence that existing algorithms for inferring small-scale gene regulation networks from gene expression data can be adapted to large-scale gene expression data coming from hybridization microarrays. The essential steps are (1) clustering many genes by their expression time-course data into a minimal set of clusters of co-expressed genes, (2) theoretically modeling the various conditions under which the time-courses are measured using a continuous-time analog recurrent neural network for the cluster mean time-courses, (3) fitting such a regulatory model to the cluster mean time courses by simulated annealing with weight decay, and (4) analysing several such fits for commonalities in the circuit parameter sets including the connection matrices. This procedure can be used to assess the adequacy of existing and future gene expression time-course data sets for determining transcriptional regulatory relationships such as coregulation.

  12. King penguin demography since the last glaciation inferred from genome-wide data

    PubMed Central

    Trucchi, Emiliano; Gratton, Paolo; Whittington, Jason D.; Cristofari, Robin; Le Maho, Yvon; Stenseth, Nils Chr; Le Bohec, Céline

    2014-01-01

    How natural climate cycles, such as past glacial/interglacial patterns, have shaped species distributions at the high-latitude regions of the Southern Hemisphere is still largely unclear. Here, we show how the post-glacial warming following the Last Glacial Maximum (ca 18 000 years ago), allowed the (re)colonization of the fragmented sub-Antarctic habitat by an upper-level marine predator, the king penguin Aptenodytes patagonicus. Using restriction site-associated DNA sequencing and standard mitochondrial data, we tested the behaviour of subsets of anonymous nuclear loci in inferring past demography through coalescent-based and allele frequency spectrum analyses. Our results show that the king penguin population breeding on Crozet archipelago steeply increased in size, closely following the Holocene warming recorded in the Epica Dome C ice core. The following population growth can be explained by a threshold model in which the ecological requirements of this species (year-round ice-free habitat for breeding and access to a major source of food such as the Antarctic Polar Front) were met on Crozet soon after the Pleistocene/Holocene climatic transition. PMID:24920481

  13. Sociocultural behavior, sex-biased admixture, and effective population sizes in Central African Pygmies and non-Pygmies.

    PubMed

    Verdu, Paul; Becker, Noémie S A; Froment, Alain; Georges, Myriam; Grugni, Viola; Quintana-Murci, Lluis; Hombert, Jean-Marie; Van der Veen, Lolke; Le Bomin, Sylvie; Bahuchet, Serge; Heyer, Evelyne; Austerlitz, Frédéric

    2013-04-01

    Sociocultural phenomena, such as exogamy or philopatry, can largely determine human sex-specific demography. In Central Africa, diverging patterns of sex-specific genetic variation have been observed between mobile hunter-gatherer Pygmies and sedentary agricultural non-Pygmies. However, their sex-specific demography remains largely unknown. Using population genetics and approximate Bayesian computation approaches, we inferred male and female effective population sizes, sex-specific migration, and admixture rates in 23 Central African Pygmy and non-Pygmy populations, genotyped for autosomal, X-linked, Y-linked, and mitochondrial markers. We found much larger effective population sizes and migration rates among non-Pygmy populations than among Pygmies, in agreement with the recent expansions and migrations of non-Pygmies and, conversely, the isolation and stationary demography of Pygmy groups. We found larger effective sizes and migration rates for males than for females for Pygmies, and vice versa for non-Pygmies. Thus, although most Pygmy populations have patrilocal customs, their sex-specific genetic patterns resemble those of matrilocal populations. In fact, our results are consistent with a lower prevalence of polygyny and patrilocality in Pygmies compared with non-Pygmies and a potential female transmission of reproductive success in Pygmies. Finally, Pygmy populations showed variable admixture levels with the non-Pygmies, with often much larger introgression from male than from female lineages. Social discrimination against Pygmies triggering complex movements of spouses in intermarriages can explain these male-biased admixture patterns in a patrilocal context. We show how gender-related sociocultural phenomena can determine highly variable sex-specific demography among populations, and how population genetic approaches contrasting chromosomal types allow inferring detailed human sex-specific demographic history.

  14. Sociocultural Behavior, Sex-Biased Admixture, and Effective Population Sizes in Central African Pygmies and Non-Pygmies

    PubMed Central

    Verdu, Paul; Becker, Noémie S.A.; Froment, Alain; Georges, Myriam; Grugni, Viola; Quintana-Murci, Lluis; Hombert, Jean-Marie; Van der Veen, Lolke; Le Bomin, Sylvie; Bahuchet, Serge; Heyer, Evelyne; Austerlitz, Frédéric

    2013-01-01

    Sociocultural phenomena, such as exogamy or philopatry, can largely determine human sex-specific demography. In Central Africa, diverging patterns of sex-specific genetic variation have been observed between mobile hunter–gatherer Pygmies and sedentary agricultural non-Pygmies. However, their sex-specific demography remains largely unknown. Using population genetics and approximate Bayesian computation approaches, we inferred male and female effective population sizes, sex-specific migration, and admixture rates in 23 Central African Pygmy and non-Pygmy populations, genotyped for autosomal, X-linked, Y-linked, and mitochondrial markers. We found much larger effective population sizes and migration rates among non-Pygmy populations than among Pygmies, in agreement with the recent expansions and migrations of non-Pygmies and, conversely, the isolation and stationary demography of Pygmy groups. We found larger effective sizes and migration rates for males than for females for Pygmies, and vice versa for non-Pygmies. Thus, although most Pygmy populations have patrilocal customs, their sex-specific genetic patterns resemble those of matrilocal populations. In fact, our results are consistent with a lower prevalence of polygyny and patrilocality in Pygmies compared with non-Pygmies and a potential female transmission of reproductive success in Pygmies. Finally, Pygmy populations showed variable admixture levels with the non-Pygmies, with often much larger introgression from male than from female lineages. Social discrimination against Pygmies triggering complex movements of spouses in intermarriages can explain these male-biased admixture patterns in a patrilocal context. We show how gender-related sociocultural phenomena can determine highly variable sex-specific demography among populations, and how population genetic approaches contrasting chromosomal types allow inferring detailed human sex-specific demographic history. PMID:23300254

  15. Inferring genetic interactions from comparative fitness data.

    PubMed

    Crona, Kristina; Gavryushkin, Alex; Greene, Devin; Beerenwinkel, Niko

    2017-12-20

    Darwinian fitness is a central concept in evolutionary biology. In practice, however, it is hardly possible to measure fitness for all genotypes in a natural population. Here, we present quantitative tools to make inferences about epistatic gene interactions when the fitness landscape is only incompletely determined due to imprecise measurements or missing observations. We demonstrate that genetic interactions can often be inferred from fitness rank orders, where all genotypes are ordered according to fitness, and even from partial fitness orders. We provide a complete characterization of rank orders that imply higher order epistasis. Our theory applies to all common types of gene interactions and facilitates comprehensive investigations of diverse genetic interactions. We analyzed various genetic systems comprising HIV-1, the malaria-causing parasite Plasmodium vivax, the fungus Aspergillus niger, and the TEM-family of β-lactamase associated with antibiotic resistance. For all systems, our approach revealed higher order interactions among mutations.
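
    The idea that a rank order alone can reveal epistasis can be illustrated in the simplest setting, two biallelic loci. The sketch below is an assumption-laden illustration, not the authors' general characterization: it takes epistasis as e = w00 + w11 - w01 - w10 and asks whether every set of fitness values consistent with a strict ranking gives e the same sign. The function name and genotype labels are hypothetical.

```python
from itertools import permutations

def implied_epistasis_sign(rank_order):
    """Sign of e = w00 + w11 - w01 - w10 forced by a strict fitness ranking.

    rank_order lists the four genotypes "00", "01", "10", "11" from fittest
    to least fit. Returns +1 or -1 if every fitness assignment consistent
    with the ranking gives e that sign, and 0 if the ranking leaves the
    sign undetermined.
    """
    # Genotypes entering e with a plus sign count +1, the others -1.
    signs = [1 if g in ("00", "11") else -1 for g in rank_order]
    prefix_sums = []
    running = 0
    for s in signs:
        running += s
        prefix_sums.append(running)
    # If plus-genotypes never fall behind while reading from the top, each
    # minus-genotype can be paired with a fitter plus-genotype, so e > 0.
    if all(p >= 0 for p in prefix_sums):
        return 1
    # Mirror-image argument for e < 0.
    if all(p <= 0 for p in prefix_sums):
        return -1
    return 0

# Count how many of the 24 strict rankings force a sign on e.
determined = sum(
    implied_epistasis_sign(order) != 0
    for order in permutations(("00", "01", "10", "11"))
)
```

    For example, the ranking 11 > 01 > 00 > 10 forces e > 0 because w11 > w01 and w00 > w10, whereas 11 > 01 > 10 > 00 is compatible with either sign.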

  16. Complex Population Dynamics and the Coalescent Under Neutrality

    PubMed Central

    Volz, Erik M.

    2012-01-01

    Estimates of the coalescent effective population size Ne can be poorly correlated with the true population size. The relationship between Ne and the population size is sensitive to the way in which birth and death rates vary over time. The problem of inference is exacerbated when the mechanisms underlying population dynamics are complex and depend on many parameters. In instances where nonparametric estimators of Ne such as the skyline struggle to reproduce the correct demographic history, model-based estimators that can draw on prior information about population size and growth rates may be more efficient. A coalescent model is developed for a large class of populations such that the demographic history is described by a deterministic nonlinear dynamical system of arbitrary dimension. This class of demographic model differs from those typically used in population genetics. Birth and death rates are not fixed, and no assumptions are made regarding the fraction of the population sampled. Furthermore, the population may be structured in such a way that gene copies reproduce both within and across demes. For this large class of models, it is shown how to derive the rate of coalescence, as well as the likelihood of a gene genealogy with heterochronous sampling and labeled taxa, and how to simulate a coalescent tree conditional on a complex demographic history. This theoretical framework encapsulates many of the models used by ecologists and epidemiologists and should facilitate the integration of population genetics with the study of mathematical population dynamics. PMID:22042576

  17. Movements of Diadromous Fish in Large Unregulated Tropical Rivers Inferred from Geochemical Tracers

    PubMed Central

    Walther, Benjamin D.; Dempster, Tim; Letnic, Mike; McCulloch, Malcolm T.

    2011-01-01

    Patterns of migration and habitat use in diadromous fishes can be highly variable among individuals. Most investigations into diadromous movement patterns have been restricted to populations in regulated rivers, and little information exists for those in unregulated catchments. We quantified movements of migratory barramundi Lates calcarifer (Bloch) in two large unregulated rivers in northern Australia using both elemental (Sr/Ba) and isotope (87Sr/86Sr) ratios in aragonitic ear stones, or otoliths. Chemical life history profiles indicated significant individual variation in habitat use, particularly among chemically distinct freshwater habitats within a catchment. A global zoning algorithm was used to quantify distinct changes in chemical signatures across profiles. This algorithm identified between 2 and 6 distinct chemical habitats in individual profiles, indicating variable movement among habitats. Profiles of 87Sr/86Sr ratios were notably distinct among individuals, with highly radiogenic values recorded in some otoliths. This variation suggested that fish made full use of habitats across the entire catchment basin. Our results show that unrestricted movement among freshwater habitats is an important component of diadromous life histories for populations in unregulated systems. PMID:21494693

  18. Bayesian pedigree inference with small numbers of single nucleotide polymorphisms via a factor-graph representation.

    PubMed

    Anderson, Eric C; Ng, Thomas C

    2016-02-01

    We develop a computational framework for addressing pedigree inference problems using small numbers (80-400) of single nucleotide polymorphisms (SNPs). Our approach relaxes the assumptions, which are commonly made, that sampling is complete with respect to the pedigree and that there is no genotyping error. It relies on representing the inferred pedigree as a factor graph and invoking the Sum-Product algorithm to compute and store quantities that allow the joint probability of the data to be rapidly computed under a large class of rearrangements of the pedigree structure. This allows efficient MCMC sampling over the space of pedigrees, and, hence, Bayesian inference of pedigree structure. In this paper we restrict ourselves to inference of pedigrees without loops using SNPs assumed to be unlinked. We present the methodology in general for multigenerational inference, and we illustrate the method by applying it to the inference of full sibling groups in a large sample (n=1157) of Chinook salmon typed at 95 SNPs. The results show that our method provides a better point estimate and estimate of uncertainty than the currently best-available maximum-likelihood sibling reconstruction method. Extensions of this work to more complex scenarios are briefly discussed. Published by Elsevier Inc.

  19. Computational Precision of Mental Inference as Critical Source of Human Choice Suboptimality.

    PubMed

    Drugowitsch, Jan; Wyart, Valentin; Devauchelle, Anne-Dominique; Koechlin, Etienne

    2016-12-21

    Making decisions in uncertain environments often requires combining multiple pieces of ambiguous information from external cues. In such conditions, human choices resemble optimal Bayesian inference, but typically show a large suboptimal variability whose origin remains poorly understood. In particular, this choice suboptimality might arise from imperfections in mental inference rather than in peripheral stages, such as sensory processing and response selection. Here, we dissociate these three sources of suboptimality in human choices based on combining multiple ambiguous cues. Using a novel quantitative approach for identifying the origin and structure of choice variability, we show that imperfections in inference alone cause a dominant fraction of suboptimal choices. Furthermore, two-thirds of this suboptimality appear to derive from the limited precision of neural computations implementing inference rather than from systematic deviations from Bayes-optimal inference. These findings set an upper bound on the accuracy and ultimate predictability of human choices in uncertain environments. Copyright © 2016 Elsevier Inc. All rights reserved.

  20. Optimal inference with suboptimal models: Addiction and active Bayesian inference

    PubMed Central

    Schwartenbeck, Philipp; FitzGerald, Thomas H.B.; Mathys, Christoph; Dolan, Ray; Wurst, Friedrich; Kronbichler, Martin; Friston, Karl

    2015-01-01

    When casting behaviour as active (Bayesian) inference, optimal inference is defined with respect to an agent’s beliefs – based on its generative model of the world. This contrasts with normative accounts of choice behaviour, in which optimal actions are considered in relation to the true structure of the environment – as opposed to the agent’s beliefs about worldly states (or the task). This distinction shifts an understanding of suboptimal or pathological behaviour away from aberrant inference as such, to understanding the prior beliefs of a subject that cause them to behave less ‘optimally’ than our prior beliefs suggest they should behave. Put simply, suboptimal or pathological behaviour does not speak against understanding behaviour in terms of (Bayes optimal) inference, but rather calls for a more refined understanding of the subject’s generative model upon which their (optimal) Bayesian inference is based. Here, we discuss this fundamental distinction and its implications for understanding optimality, bounded rationality and pathological (choice) behaviour. We illustrate our argument using addictive choice behaviour in a recently described ‘limited offer’ task. Our simulations of pathological choices and addictive behaviour also generate some clear hypotheses, which we hope to pursue in ongoing empirical work. PMID:25561321

  1. Single cell versus large population analysis: cell variability in elemental intracellular concentration and distribution.

    PubMed

    Malucelli, Emil; Procopio, Alessandra; Fratini, Michela; Gianoncelli, Alessandra; Notargiacomo, Andrea; Merolle, Lucia; Sargenti, Azzurra; Castiglioni, Sara; Cappadone, Concettina; Farruggia, Giovanna; Lombardo, Marco; Lagomarsino, Stefano; Maier, Jeanette A; Iotti, Stefano

    2018-01-01

    The quantification of elemental concentration in cells is usually performed by analytical assays on large populations, missing peculiar but important rare cells. The present article aims at comparing the elemental quantification in single cells and cell populations in three different cell types using a new approach for single-cell elemental analysis performed at sub-micrometer scale, combining X-ray fluorescence microscopy and atomic force microscopy. The attention is focused on the light element Mg, exploiting the opportunity to compare the single cell quantification to the cell population analysis carried out by a highly Mg-selective fluorescent chemosensor. The results show that the single cell analysis reveals the same Mg differences found in large populations of the different cell strains studied. However, in one of the cell strains, single cell analysis reveals two cells with an exceptionally high intracellular Mg content compared with the other cells of the same strain. The single cell analysis allows mapping Mg and other light elements in whole cells at sub-micrometer scale. A detailed intensity correlation analysis on the two cells with the highest Mg content reveals that Mg subcellular localization correlates with oxygen in a different fashion with respect to the other sister cells of the same strain. Graphical abstract: Single cells or large population analysis, this is the question!

  2. Exploring differences in pain beliefs within and between a large nonclinical (workplace) population and a clinical (chronic low back pain) population using the pain beliefs questionnaire.

    PubMed

    Baird, Andrew J; Haslam, Roger A

    2013-12-01

    Beliefs, cognitions, and behaviors relating to pain can be associated with a range of negative outcomes. In patients, certain beliefs are associated with increased levels of pain and related disability. There are few data, however, showing the extent to which beliefs of patients differ from those of the general population. This study explored pain beliefs in a large nonclinical population and a chronic low back pain (CLBP) sample using the Pain Beliefs Questionnaire (PBQ) to identify differences in scores and factor structures between and within the samples. This was a cross-sectional study. The samples comprised patients attending a rehabilitation program and respondents to a workplace survey. Pain beliefs were assessed using the PBQ, which incorporates 2 scales: organic and psychological. Exploratory factor analysis was used to explore variations in factor structure within and between samples. The relationship between the 2 scales also was examined. Patients reported higher organic scores and lower psychological scores than the nonclinical sample. Within the nonclinical sample, those who reported frequent pain scored higher on the organic scale than those who did not. Factor analysis showed variations in relation to the presence of pain. The relationship between scales was stronger in those not reporting frequent pain. This was a cross-sectional study; therefore, no causal inferences can be made. Patients experiencing CLBP adopt a more biomedical perspective on pain than nonpatients. The presence of pain is also associated with increased biomedical thinking in a nonclinical sample. However, the impact is not only on the strength of beliefs, but also on the relationship between elements of belief and the underlying belief structure.

  3. Dopamine, reward learning, and active inference.

    PubMed

    FitzGerald, Thomas H B; Dolan, Raymond J; Friston, Karl

    2015-01-01

    Temporal difference learning models propose that phasic dopamine signaling encodes reward prediction errors that drive learning. This is supported by studies where optogenetic stimulation of dopamine neurons can stand in lieu of actual reward. Nevertheless, a large body of data also shows that dopamine is not necessary for learning, and that dopamine depletion primarily affects task performance. We offer a resolution to this paradox based on a hypothesis that dopamine encodes the precision of beliefs about alternative actions, and thus controls the outcome-sensitivity of behavior. We extend an active inference scheme for solving Markov decision processes to include learning, and show that simulated dopamine dynamics strongly resemble those actually observed during instrumental conditioning. Furthermore, simulated dopamine depletion impairs performance but spares learning, while simulated excitation of dopamine neurons drives reward learning, through aberrant inference about outcome states. Our formal approach provides a novel and parsimonious reconciliation of apparently divergent experimental findings.

  4. Complex postglacial recolonization inferred from population genetic structure of mottled sculpin Cottus bairdii in tributaries of eastern Lake Michigan, U.S.A.

    PubMed

    Homola, J J; Ruetz, C R; Kohler, S L; Thum, R A

    2016-11-01

    This study used analyses of the genetic structure of a non-game fish species, the mottled sculpin Cottus bairdii, to hypothesize probable recolonization routes used by cottids and possibly other Laurentian Great Lakes fishes following glacial recession. Based on samples from 16 small streams in five major Lake Michigan, U.S.A., tributary basins, significant interpopulation differentiation was documented (overall FST = 0.235). Differentiation was complex, however, with unexpectedly high genetic similarity among basins as well as occasionally strong differentiation within basins, despite relatively close geographic proximity of populations. Genetic dissimilarities were identified between eastern and western populations within river basins, with similarities existing between eastern and western populations across basins. Given such patterns, recolonization is hypothesized to have occurred on three occasions from more than one glacial refugium, with a secondary vicariant event resulting from reduction in the water level of ancestral Lake Michigan. By studying the phylogeography of a small, non-game fish species, this study provides insight into recolonization dynamics of the region that could be difficult to infer from game species that are often broadly dispersed by humans. © 2016 The Fisheries Society of the British Isles.
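
    For readers unfamiliar with the differentiation statistic quoted above, the following sketch shows one textbook way to compute Wright's FST at a single biallelic locus, as (HT - HS) / HT with equally weighted subpopulations. The abstract does not say which estimator or marker type the study used, so this is only an illustration; the function name and input frequencies are hypothetical.

```python
def fst_biallelic(freqs):
    """Wright's F_ST for one biallelic locus.

    freqs holds the frequency of one allele in each subpopulation; all
    subpopulations are weighted equally (an assumption of this sketch).
    """
    n = len(freqs)
    p_bar = sum(freqs) / n
    # Expected heterozygosity of the pooled (total) population.
    h_t = 2 * p_bar * (1 - p_bar)
    # Mean expected heterozygosity within subpopulations.
    h_s = sum(2 * p * (1 - p) for p in freqs) / n
    if h_t == 0:
        # The locus is fixed everywhere, so there is no variation to partition.
        return 0.0
    return (h_t - h_s) / h_t
```

    Identical allele frequencies in every subpopulation give 0, and alternative fixation in two subpopulations gives 1; real data sets usually call for sample-size-corrected estimators averaged over many loci.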

  5. Linkages between large-scale climate patterns and the dynamics of Alaskan caribou populations

    Treesearch

    Kyle Joly; David R. Klein; David L. Verbyla; T. Scott Rupp; F. Stuart Chapin

    2011-01-01

    Recent research has linked climate warming to global declines in caribou and reindeer (both Rangifer tarandus) populations. We hypothesize large-scale climate patterns are a contributing factor explaining why these declines are not universal. To test our hypothesis for such relationships among Alaska caribou herds, we calculated the population growth...

  6. Inferring Molecular Processes Heterogeneity from Transcriptional Data.

    PubMed

    Gogolewski, Krzysztof; Wronowska, Weronika; Lech, Agnieszka; Lesyng, Bogdan; Gambin, Anna

    2017-01-01

    RNA microarrays and RNA-seq are nowadays standard technologies to study the transcriptional activity of cells. Most studies focus on tracking transcriptional changes caused by specific experimental conditions. Information on gene up- and downregulation is evaluated by analyzing the behaviour of a relatively large population of cells and averaging its properties. However, even assuming perfect sample homogeneity, different subpopulations of cells can exhibit diverse transcriptomic profiles, as they may follow different regulatory/signaling pathways. The purpose of this study is to provide a novel methodological scheme to account for possible internal, functional heterogeneity in homogeneous cell lines, including cancer ones. We propose a novel computational method to infer the proportion between subpopulations of cells that manifest various functional behaviours in a given sample. Our method was validated using two datasets from RNA microarray experiments. Both experiments aimed to examine cell viability in specific experimental conditions. The presented methodology can be easily extended to RNA-seq data as well as other molecular processes. Moreover, it complements standard tools to indicate the most important networks from transcriptomic data and in particular could be useful in the analysis of cancer cell lines affected by biologically active compounds or drugs.

  7. Inferring Molecular Processes Heterogeneity from Transcriptional Data

    PubMed Central

    Wronowska, Weronika; Lesyng, Bogdan; Gambin, Anna

    2017-01-01

    RNA microarrays and RNA-seq are nowadays standard technologies to study the transcriptional activity of cells. Most studies focus on tracking transcriptional changes caused by specific experimental conditions. Information on gene up- and downregulation is evaluated by analyzing the behaviour of a relatively large population of cells and averaging its properties. However, even assuming perfect sample homogeneity, different subpopulations of cells can exhibit diverse transcriptomic profiles, as they may follow different regulatory/signaling pathways. The purpose of this study is to provide a novel methodological scheme to account for possible internal, functional heterogeneity in homogeneous cell lines, including cancer ones. We propose a novel computational method to infer the proportion between subpopulations of cells that manifest various functional behaviours in a given sample. Our method was validated using two datasets from RNA microarray experiments. Both experiments aimed to examine cell viability in specific experimental conditions. The presented methodology can be easily extended to RNA-seq data as well as other molecular processes. Moreover, it complements standard tools to indicate the most important networks from transcriptomic data and in particular could be useful in the analysis of cancer cell lines affected by biologically active compounds or drugs. PMID:29362714

  8. Demographic Divergence History of Pied Flycatcher and Collared Flycatcher Inferred from Whole-Genome Re-sequencing Data

    PubMed Central

    Nadachowska-Brzyska, Krystyna; Burri, Reto; Olason, Pall I.; Kawakami, Takeshi; Smeds, Linnéa; Ellegren, Hans

    2013-01-01

    Profound knowledge of demographic history is a prerequisite for the understanding and inference of processes involved in the evolution of population differentiation and speciation. Together with new coalescent-based methods, the recent availability of genome-wide data enables investigation of differentiation and divergence processes at unprecedented depth. We combined two powerful approaches, full Approximate Bayesian Computation analysis (ABC) and pairwise sequentially Markovian coalescent modeling (PSMC), to reconstruct the demographic history of the split between two avian speciation model species, the pied flycatcher and collared flycatcher. Using whole-genome re-sequencing data from 20 individuals, we investigated 15 demographic models including different levels and patterns of gene flow, and changes in effective population size over time. ABC provided high support for recent (mode 0.3 my, range <0.7 my) species divergence, declines in effective population size of both species since their initial divergence, and unidirectional recent gene flow from pied flycatcher into collared flycatcher. The estimated divergence time and population size changes, supported by PSMC results, suggest that the ancestral species persisted through one of the glacial periods of middle Pleistocene and then split into two large populations that first increased in size before going through severe bottlenecks and expanding into their current ranges. Secondary contact appears to have been established after the last glacial maximum. The severity of the bottlenecks at the last glacial maximum is indicated by the discrepancy between current effective population sizes (20,000–80,000) and census sizes (5–50 million birds) of the two species. The recent divergence time challenges the supposition that avian speciation is a relatively slow process with extended times for intrinsic postzygotic reproductive barriers to evolve. Our study emphasizes the importance of using genome-wide data to

  9. Demographic divergence history of pied flycatcher and collared flycatcher inferred from whole-genome re-sequencing data.

    PubMed

    Nadachowska-Brzyska, Krystyna; Burri, Reto; Olason, Pall I; Kawakami, Takeshi; Smeds, Linnéa; Ellegren, Hans

    2013-11-01

    Profound knowledge of demographic history is a prerequisite for the understanding and inference of processes involved in the evolution of population differentiation and speciation. Together with new coalescent-based methods, the recent availability of genome-wide data enables investigation of differentiation and divergence processes at unprecedented depth. We combined two powerful approaches, full Approximate Bayesian Computation analysis (ABC) and pairwise sequentially Markovian coalescent modeling (PSMC), to reconstruct the demographic history of the split between two avian speciation model species, the pied flycatcher and collared flycatcher. Using whole-genome re-sequencing data from 20 individuals, we investigated 15 demographic models including different levels and patterns of gene flow, and changes in effective population size over time. ABC provided high support for recent (mode 0.3 my, range <0.7 my) species divergence, declines in effective population size of both species since their initial divergence, and unidirectional recent gene flow from pied flycatcher into collared flycatcher. The estimated divergence time and population size changes, supported by PSMC results, suggest that the ancestral species persisted through one of the glacial periods of middle Pleistocene and then split into two large populations that first increased in size before going through severe bottlenecks and expanding into their current ranges. Secondary contact appears to have been established after the last glacial maximum. The severity of the bottlenecks at the last glacial maximum is indicated by the discrepancy between current effective population sizes (20,000-80,000) and census sizes (5-50 million birds) of the two species. The recent divergence time challenges the supposition that avian speciation is a relatively slow process with extended times for intrinsic postzygotic reproductive barriers to evolve. Our study emphasizes the importance of using genome-wide data to

  10. 4P: fast computing of population genetics statistics from large DNA polymorphism panels

    PubMed Central

    Benazzo, Andrea; Panziera, Alex; Bertorelle, Giorgio

    2015-01-01

    Massive DNA sequencing has significantly increased the amount of data available for population genetics and molecular ecology studies. However, the parallel computation of simple statistics within and between populations from large panels of polymorphic sites is not yet available, making the exploratory analyses of a set or subset of data a very laborious task. Here, we present 4P (parallel processing of polymorphism panels), a stand-alone software program for the rapid computation of genetic variation statistics (including the joint frequency spectrum) from millions of DNA variants in multiple individuals and multiple populations. It handles a standard input file format commonly used to store DNA variation from empirical or simulation experiments. The computational performance of 4P was evaluated using large SNP (single nucleotide polymorphism) datasets from human genomes or obtained by simulations. 4P was faster or much faster than other comparable programs, and the impact of parallel computing using multicore computers or servers was evident. 4P is a useful tool for biologists who need a simple and rapid computer program to run exploratory population genetics analyses in large panels of genomic data. It is also particularly suitable to analyze multiple data sets produced in simulation studies. Unix, Windows, and MacOs versions are provided, as well as the source code for easier pipeline implementations. PMID:25628874
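
    As a rough illustration of the kind of statistic mentioned above, the sketch below tabulates a joint frequency spectrum for two populations from 0/1 haplotype data. It is not 4P's implementation or input format, which the abstract does not describe; the function name, the list-of-lists layout, and the coding 0 = ancestral and 1 = derived are assumptions made for this example.

```python
def joint_sfs(pop_a, pop_b):
    """Joint site frequency spectrum for two populations.

    pop_a and pop_b are lists of haplotypes; each haplotype is a sequence
    of 0/1 values (0 = ancestral, 1 = derived) over the same sites.
    Entry [i][j] of the result counts the sites where the derived allele
    appears on i haplotypes of pop_a and j haplotypes of pop_b.
    """
    n_a, n_b = len(pop_a), len(pop_b)
    jsfs = [[0] * (n_b + 1) for _ in range(n_a + 1)]
    # zip(*pop) transposes haplotype-major data into one tuple per site.
    for site_a, site_b in zip(zip(*pop_a), zip(*pop_b)):
        jsfs[sum(site_a)][sum(site_b)] += 1
    return jsfs

# Toy data: two haplotypes in population A, one in population B, three sites.
example = joint_sfs([[0, 1, 1], [0, 1, 0]], [[1, 1, 0]])
```

    Each site contributes exactly one count, so the entries sum to the number of sites; on panels with millions of variants this per-site tally is the kind of loop that parallel execution speeds up.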

  11. Inferring Characteristics of Sensorimotor Behavior by Quantifying Dynamics of Animal Locomotion

    NASA Astrophysics Data System (ADS)

    Leung, KaWai

    Locomotion is one of the most well-studied topics in animal behavioral studies. Much fundamental and clinical research makes use of the locomotion of an animal model to explore various aspects of sensorimotor behavior. In the past, most of these studies focused on the population average of a specific trait due to limitations of data collection and processing power. With recent advances in computer vision and statistical modeling techniques, it is now possible to track and analyze large amounts of behavioral data. In this thesis, I present two projects that aim to infer the characteristics of sensorimotor behavior by quantifying the dynamics of locomotion of the nematode Caenorhabditis elegans and the fruit fly Drosophila melanogaster, shedding light on statistical dependence between sensing and behavior. In the first project, I investigate the possibility of inferring noxious sensory information from the behavior of Caenorhabditis elegans. I develop a statistical model to infer the heat stimulus level perceived by individual animals from their stereotyped escape responses after stimulation by an IR laser. The model allows quantification of analgesic-like effects of chemical agents or genetic mutations in the worm. At the same time, the method is able to differentiate perturbations of locomotion behavior that go beyond affecting the sensory system. With this model I propose experimental designs that allow statistically significant identification of analgesic-like effects. In the second project, I investigate the relationship of energy budget and stability of locomotion in determining the walking speed distribution of Drosophila melanogaster during aging. The locomotion stability at different age groups is estimated from video recordings using Floquet theory. I calculate the power consumption at different locomotion speeds using a biomechanics model. In conclusion, the power consumption, not stability, predicts the locomotion speed distribution at different ages.

  12. Inferring cortical function in the mouse visual system through large-scale systems neuroscience.

    PubMed

    Hawrylycz, Michael; Anastassiou, Costas; Arkhipov, Anton; Berg, Jim; Buice, Michael; Cain, Nicholas; Gouwens, Nathan W; Gratiy, Sergey; Iyer, Ramakrishnan; Lee, Jung Hoon; Mihalas, Stefan; Mitelut, Catalin; Olsen, Shawn; Reid, R Clay; Teeter, Corinne; de Vries, Saskia; Waters, Jack; Zeng, Hongkui; Koch, Christof

    2016-07-05

    The scientific mission of the Project MindScope is to understand neocortex, the part of the mammalian brain that gives rise to perception, memory, intelligence, and consciousness. We seek to quantitatively evaluate the hypothesis that neocortex is a relatively homogeneous tissue, with smaller functional modules that perform a common computational function replicated across regions. We here focus on the mouse as a mammalian model organism with genetics, physiology, and behavior that can be readily studied and manipulated in the laboratory. We seek to describe the operation of cortical circuitry at the computational level by comprehensively cataloging and characterizing its cellular building blocks along with their dynamics and their cell type-specific connectivities. The project is also building large-scale experimental platforms (i.e., brain observatories) to record the activity of large populations of cortical neurons in behaving mice subject to visual stimuli. A primary goal is to understand the series of operations from visual input in the retina to behavior by observing and modeling the physical transformations of signals in the corticothalamic system. We here focus on the contribution that computer modeling and theory make to this long-term effort.

  13. Inferring genetic interactions from comparative fitness data

    PubMed Central

    2017-01-01

    Darwinian fitness is a central concept in evolutionary biology. In practice, however, it is hardly possible to measure fitness for all genotypes in a natural population. Here, we present quantitative tools to make inferences about epistatic gene interactions when the fitness landscape is only incompletely determined due to imprecise measurements or missing observations. We demonstrate that genetic interactions can often be inferred from fitness rank orders, where all genotypes are ordered according to fitness, and even from partial fitness orders. We provide a complete characterization of rank orders that imply higher order epistasis. Our theory applies to all common types of gene interactions and facilitates comprehensive investigations of diverse genetic interactions. We analyzed various genetic systems comprising HIV-1, the malaria-causing parasite Plasmodium vivax, the fungus Aspergillus niger, and the TEM-family of β-lactamase associated with antibiotic resistance. For all systems, our approach revealed higher order interactions among mutations. PMID:29260711

  14. Setting population targets for mammals using body mass as a predictor of population persistence.

    PubMed

    Hilbers, Jelle P; Santini, Luca; Visconti, Piero; Schipper, Aafke M; Pinto, Cecilia; Rondinini, Carlo; Huijbregts, Mark A J

    2017-04-01

    Conservation planning and biodiversity assessments need quantitative targets to optimize planning options and assess the adequacy of current species protection. However, targets aiming at persistence require population-specific data, which limit their use in favor of fixed and nonspecific targets, likely leading to unequal distribution of conservation efforts among species. We devised a method to derive equitable population targets; that is, quantitative targets of population size that ensure equal probabilities of persistence across a set of species and that can be easily inferred from species-specific traits. In our method, we used models of population dynamics across a range of life-history traits related to species' body mass to estimate minimum viable population targets. We applied our method to a range of body masses of mammals, from 2 g to 3825 kg. The minimum viable population targets decreased asymptotically with increasing body mass and were on the same order of magnitude as minimum viable population estimates from species- and context-specific studies. Our approach provides a compromise between pragmatic, nonspecific population targets and detailed context-specific estimates of population viability for which only limited data are available. It enables a first estimation of species-specific population targets based on a readily available trait and thus allows setting equitable targets for population persistence in large-scale and multispecies conservation assessments and planning. © 2016 The Authors. Conservation Biology published by Wiley Periodicals, Inc. on behalf of Society for Conservation Biology.

  15. Correction of Population Stratification in Large Multi-Ethnic Association Studies

    PubMed Central

    Serre, David; Montpetit, Alexandre; Paré, Guillaume; Engert, James C.; Yusuf, Salim; Keavney, Bernard; Hudson, Thomas J.; Anand, Sonia

    2008-01-01

    Background: The vast majority of genetic risk factors for complex diseases have, taken individually, a small effect on the end phenotype. Population-based association studies therefore need very large sample sizes to detect significant differences between affected and non-affected individuals. Including thousands of affected individuals in a study requires recruitment in numerous centers, possibly from different geographic regions. Unfortunately such a recruitment strategy is likely to complicate the study design and to generate concerns regarding population stratification. Methodology/Principal Findings: We analyzed 9,751 individuals representing three main ethnic groups - Europeans, Arabs and South Asians - that had been enrolled from 154 centers involving 52 countries for a global case/control study of acute myocardial infarction. All individuals were genotyped at 103 candidate genes using 1,536 SNPs selected with a tagging strategy that captures most of the genetic diversity in different populations. We show that relying solely on self-reported ethnicity is not sufficient to exclude population stratification and we present additional methods to identify and correct for stratification. Conclusions/Significance: Our results highlight the importance of carefully addressing population stratification and of carefully “cleaning” the sample prior to analyses to obtain stronger signals of association and to avoid spurious results. PMID:18196181

  16. Population-expression models of immune response

    NASA Astrophysics Data System (ADS)

    Stromberg, Sean P.; Antia, Rustom; Nemenman, Ilya

    2013-06-01

    The immune response to a pathogen has two basic features. The first is the expansion of a few pathogen-specific cells to form a population large enough to control the pathogen. The second is the process of differentiation of cells from an initial naive phenotype to an effector phenotype which controls the pathogen, and subsequently to a memory phenotype that is maintained and responsible for long-term protection. The expansion and the differentiation have been considered largely independently. Changes in cell populations are typically described using ecologically based ordinary differential equation models. In contrast, differentiation of single cells is studied within systems biology and is frequently modeled by considering changes in gene and protein expression in individual cells. Recent advances in experimental systems biology make available for the first time data to allow the coupling of population and high dimensional expression data of immune cells during infections. Here we describe and develop population-expression models which integrate these two processes into systems biology on the multicellular level. When translated into mathematical equations, these models result in non-conservative, non-local advection-diffusion equations. We describe situations where the population-expression approach can make correct inference from data while previous modeling approaches based on common simplifying assumptions would fail. We also explore how model reduction techniques can be used to build population-expression models, minimizing the complexity of the model while keeping the essential features of the system. While we consider problems in immunology in this paper, we expect population-expression models to be more broadly applicable.

  17. Statistical Estimation of Orbital Debris Populations with a Spectrum of Object Size

    NASA Technical Reports Server (NTRS)

    Xu, Y. -l; Horstman, M.; Krisko, P. H.; Liou, J. -C; Matney, M.; Stansbery, E. G.; Stokely, C. L.; Whitlock, D.

    2008-01-01

    Orbital debris is a real concern for the safe operations of satellites. In general, the hazard of debris impact is a function of the size and spatial distributions of the debris populations. To describe and characterize the debris environment as reliably as possible, the current NASA Orbital Debris Engineering Model (ORDEM2000) is being upgraded to a new version based on new and better quality data. The data-driven ORDEM model covers a wide range of object sizes from 10 microns to greater than 1 meter. This paper reviews the statistical process for the estimation of the debris populations in the new ORDEM upgrade, and discusses the representation of large-size (greater than or equal to 1 m and greater than or equal to 10 cm) populations by SSN catalog objects and the validation of the statistical approach. Also, it presents results for the populations with sizes of greater than or equal to 3.3 cm, greater than or equal to 1 cm, greater than or equal to 100 micrometers, and greater than or equal to 10 micrometers. The orbital debris populations used in the new version of ORDEM are inferred from data based upon appropriate reference (or benchmark) populations instead of the binning of the multi-dimensional orbital-element space. This paper describes all of the major steps used in the population-inference procedure for each size-range. Detailed discussions on data analysis, parameter definition, the correlation between parameters and data, and uncertainty assessment are included.

  18. Probabilistic measures of persistence and extinction in measles (meta)populations.

    PubMed

    Gunning, Christian E; Wearing, Helen J

    2013-08-01

    Persistence and extinction are fundamental processes in ecological systems that are difficult to accurately measure due to stochasticity and incomplete observation. Moreover, these processes operate on multiple scales, from individual populations to metapopulations. Here, we examine an extensive new data set of measles case reports and associated demographics in pre-vaccine era US cities, alongside a classic England & Wales data set. We first infer the per-population quasi-continuous distribution of log incidence. We then use stochastic, spatially implicit metapopulation models to explore the frequency of rescue events and apparent extinctions. We show that, unlike critical community size, the inferred distributions account for observational processes, allowing direct comparisons between metapopulations. The inferred distributions scale with population size. We use these scalings to estimate extinction boundary probabilities. We compare these predictions with measurements in individual populations and random aggregates of populations, highlighting the importance of medium-sized populations in metapopulation persistence. © 2013 John Wiley & Sons Ltd/CNRS.

  19. Exact Bayesian Inference for Phylogenetic Birth-Death Models.

    PubMed

    Parag, K V; Pybus, O G

    2018-04-26

    Inferring the rates of change of a population from a reconstructed phylogeny of genetic sequences is a central problem in macro-evolutionary biology, epidemiology, and many other disciplines. A popular solution involves estimating the parameters of a birth-death process (BDP), which links the shape of the phylogeny to its birth and death rates. Modern BDP estimators rely on random Markov chain Monte Carlo (MCMC) sampling to infer these rates. Such methods, while powerful and scalable, cannot be guaranteed to converge, leading to results that may be hard to replicate or difficult to validate. We present a conceptually and computationally different parametric BDP inference approach using flexible and easy to implement Snyder filter (SF) algorithms. This method is deterministic so its results are provable, guaranteed, and reproducible. We validate the SF on constant rate BDPs and find that it solves BDP likelihoods known to produce robust estimates. We then examine more complex BDPs with time-varying rates. Our estimates compare well with a recently developed parametric MCMC inference method. Lastly, we perform model selection on an empirical Agamid species phylogeny, obtaining results consistent with the literature. The SF makes no approximations, beyond those required for parameter quantisation and numerical integration, and directly computes the posterior distribution of model parameters. It is a promising alternative inference algorithm that may serve either as a standalone Bayesian estimator or as a useful diagnostic reference for validating more involved MCMC strategies. The Snyder filter is implemented in Matlab and the time-varying BDP models are simulated in R. The source code and data are freely available at https://github.com/kpzoo/snyder-birth-death-code. Contact: kris.parag@zoo.ox.ac.uk. Supplementary material is available at Bioinformatics online.

  20. Inferring Binary and Trinary Stellar Populations in Photometric and Astrometric Surveys

    NASA Astrophysics Data System (ADS)

    Widmark, Axel; Leistedt, Boris; Hogg, David W.

    2018-04-01

    Multiple stellar systems are ubiquitous in the Milky Way but are often unresolved and seen as single objects in spectroscopic, photometric, and astrometric surveys. However, modeling them is essential for developing a full understanding of large surveys such as Gaia and connecting them to stellar and Galactic models. In this paper, we address this problem by jointly fitting the Gaia and Two Micron All Sky Survey photometric and astrometric data using a data-driven Bayesian hierarchical model that includes populations of binary and trinary systems. This allows us to classify observations into singles, binaries, and trinaries, in a robust and efficient manner, without resorting to external models. We are able to identify multiple systems and, in some cases, make strong predictions for the properties of their unresolved stars. We will be able to compare such predictions with Gaia Data Release 4, which will contain astrometric identification and analysis of binary systems.

  1. Comparison of Drive Counts and Mark-Resight As Methods of Population Size Estimation of Highly Dense Sika Deer (Cervus nippon) Populations

    PubMed Central

    Takeshita, Kazutaka; Yoshida, Tsuyoshi; Igota, Hiromasa; Matsuura, Yukiko

    2016-01-01

    Assessing temporal changes in abundance indices is an important issue in the management of large herbivore populations. The drive counts method has been frequently used as a deer abundance index in mountainous regions. However, despite an inherent risk for observation errors in drive counts, which increase with deer density, evaluations of the utility of drive counts at a high deer density remain scarce. We compared the drive counts and mark-resight (MR) methods in the evaluation of a highly dense sika deer population (MR estimates ranged between 11 and 53 individuals/km²) on Nakanoshima Island, Hokkaido, Japan, between 1999 and 2006. This deer population experienced two large reductions in density; approximately 200 animals in total were taken from the population through a large-scale population removal and a separate winter mass mortality event. Although the drive counts tracked temporal changes in deer abundance on the island, they overestimated the counts for all years in comparison to the MR method. Increased overestimation in drive count estimates after the winter mass mortality event may be due to a double count derived from increased deer movement and recovery of body condition secondary to the mitigation of density-dependent food limitations. Drive counts are unreliable because they are affected by unfavorable factors such as bad weather, and they are cost-prohibitive to repeat, which precludes the calculation of confidence intervals. Therefore, the use of drive counts to infer the deer abundance needs to be reconsidered. PMID:27711181

  2. Comparison of Drive Counts and Mark-Resight As Methods of Population Size Estimation of Highly Dense Sika Deer (Cervus nippon) Populations.

    PubMed

    Takeshita, Kazutaka; Ikeda, Takashi; Takahashi, Hiroshi; Yoshida, Tsuyoshi; Igota, Hiromasa; Matsuura, Yukiko; Kaji, Koichi

    2016-01-01

    Assessing temporal changes in abundance indices is an important issue in the management of large herbivore populations. The drive counts method has been frequently used as a deer abundance index in mountainous regions. However, despite an inherent risk for observation errors in drive counts, which increase with deer density, evaluations of the utility of drive counts at a high deer density remain scarce. We compared the drive counts and mark-resight (MR) methods in the evaluation of a highly dense sika deer population (MR estimates ranged between 11 and 53 individuals/km²) on Nakanoshima Island, Hokkaido, Japan, between 1999 and 2006. This deer population experienced two large reductions in density; approximately 200 animals in total were taken from the population through a large-scale population removal and a separate winter mass mortality event. Although the drive counts tracked temporal changes in deer abundance on the island, they overestimated the counts for all years in comparison to the MR method. Increased overestimation in drive count estimates after the winter mass mortality event may be due to a double count derived from increased deer movement and recovery of body condition secondary to the mitigation of density-dependent food limitations. Drive counts are unreliable because they are affected by unfavorable factors such as bad weather, and they are cost-prohibitive to repeat, which precludes the calculation of confidence intervals. Therefore, the use of drive counts to infer the deer abundance needs to be reconsidered.

  3. Genomic inferences of domestication events are corroborated by written records in Brassica rapa.

    PubMed

    Qi, Xinshuai; An, Hong; Ragsdale, Aaron P; Hall, Tara E; Gutenkunst, Ryan N; Chris Pires, J; Barker, Michael S

    2017-07-01

    Demographic modelling is often used with population genomic data to infer the relationships and ages among populations. However, relatively few analyses are able to validate these inferences with independent data. Here, we leverage written records that describe distinct Brassica rapa crops to corroborate demographic models of domestication. Brassica rapa crops are renowned for their outstanding morphological diversity, but the relationships and order of domestication remain unclear. We generated genomewide SNPs from 126 accessions collected globally using high-throughput transcriptome data. Analyses of more than 31,000 SNPs across the B. rapa genome revealed evidence for five distinct genetic groups and supported a European-Central Asian origin of B. rapa crops. Our results supported the traditionally recognized South Asian and East Asian B. rapa groups with evidence that pak choi, Chinese cabbage and yellow sarson are likely monophyletic groups. In contrast, the oil-type B. rapa subsp. oleifera and brown sarson were polyphyletic. We also found no evidence to support the contention that rapini is the wild type or the earliest domesticated subspecies of B. rapa. Demographic analyses suggested that B. rapa was introduced to Asia 2,400-4,100 years ago, and that Chinese cabbage originated 1,200-2,100 years ago via admixture of pak choi and European-Central Asian B. rapa. We also inferred significantly different levels of founder effect among the B. rapa subspecies. Written records from antiquity that document these crops are consistent with these inferences. The concordance between our age estimates of domestication events with historical records provides unique support for our demographic inferences. © 2017 John Wiley & Sons Ltd.

  4. Evolution of prokaryote and eukaryote lines inferred from sequence evidence

    NASA Technical Reports Server (NTRS)

    Hunt, L. T.; George, D. G.; Yeh, L.-S.; Dayhoff, M. O.

    1984-01-01

    This paper describes the evolution of prokaryotes and early eukaryotes, including their symbiotic relationships, as inferred from phylogenetic trees of bacterial ferredoxin, 5S ribosomal RNA, ribulose-1,5-biphosphate carboxylase large chain, and mitochondrial cytochrome oxidase polypeptide II.

  5. Visual recognition and inference using dynamic overcomplete sparse learning.

    PubMed

    Murray, Joseph F; Kreutz-Delgado, Kenneth

    2007-09-01

    We present a hierarchical architecture and learning algorithm for visual recognition and other visual inference tasks such as imagination, reconstruction of occluded images, and expectation-driven segmentation. Using properties of biological vision for guidance, we posit a stochastic generative world model and from it develop a simplified world model (SWM) based on a tractable variational approximation that is designed to enforce sparse coding. Recent developments in computational methods for learning overcomplete representations (Lewicki & Sejnowski, 2000; Teh, Welling, Osindero, & Hinton, 2003) suggest that overcompleteness can be useful for visual tasks, and we use an overcomplete dictionary learning algorithm (Kreutz-Delgado, et al., 2003) as a preprocessing stage to produce accurate, sparse codings of images. Inference is performed by constructing a dynamic multilayer network with feedforward, feedback, and lateral connections, which is trained to approximate the SWM. Learning is done with a variant of the back-propagation-through-time algorithm, which encourages convergence to desired states within a fixed number of iterations. Vision tasks require large networks, and to make learning efficient, we take advantage of the sparsity of each layer to update only a small subset of elements in a large weight matrix at each iteration. Experiments on a set of rotated objects demonstrate various types of visual inference and show that increasing the degree of overcompleteness improves recognition performance in difficult scenes with occluded objects in clutter.

  6. Dopamine, reward learning, and active inference

    PubMed Central

    FitzGerald, Thomas H. B.; Dolan, Raymond J.; Friston, Karl

    2015-01-01

    Temporal difference learning models propose phasic dopamine signaling encodes reward prediction errors that drive learning. This is supported by studies where optogenetic stimulation of dopamine neurons can stand in lieu of actual reward. Nevertheless, a large body of data also shows that dopamine is not necessary for learning, and that dopamine depletion primarily affects task performance. We offer a resolution to this paradox based on an hypothesis that dopamine encodes the precision of beliefs about alternative actions, and thus controls the outcome-sensitivity of behavior. We extend an active inference scheme for solving Markov decision processes to include learning, and show that simulated dopamine dynamics strongly resemble those actually observed during instrumental conditioning. Furthermore, simulated dopamine depletion impairs performance but spares learning, while simulated excitation of dopamine neurons drives reward learning, through aberrant inference about outcome states. Our formal approach provides a novel and parsimonious reconciliation of apparently divergent experimental findings. PMID:26581305

  7. Functional Inference of Complex Anatomical Tendinous Networks at a Macroscopic Scale via Sparse Experimentation

    PubMed Central

    Saxena, Anupam; Lipson, Hod; Valero-Cuevas, Francisco J.

    2012-01-01

    In systems and computational biology, much effort is devoted to functional identification of systems and networks at the molecular or cellular scale. However, similarly important networks exist at anatomical scales such as the tendon network of human fingers: the complex array of collagen fibers that transmits and distributes muscle forces to finger joints. This network is critical to the versatility of the human hand, and its function has been debated since at least the 16th century. Here, we experimentally infer the structure (both topology and parameter values) of this network through sparse interrogation with force inputs. A population of models representing this structure co-evolves in simulation with a population of informative future force inputs via the predator-prey estimation-exploration algorithm. Model fitness depends on their ability to explain experimental data, while the fitness of future force inputs depends on causing maximal functional discrepancy among current models. We validate our approach by inferring two known synthetic Latex networks, and one anatomical tendon network harvested from a cadaver's middle finger. We find that functionally similar but structurally diverse models can exist within a narrow range of the training set and cross-validation errors. For the Latex networks, models with low training set error [<4%] and resembling the known network have the smallest cross-validation errors [∼5%]. The low training set [<4%] and cross validation [<7.2%] errors for models for the cadaveric specimen demonstrate what, to our knowledge, is the first experimental inference of the functional structure of complex anatomical networks. This work expands current bioinformatics inference approaches by demonstrating that sparse, yet informative interrogation of biological specimens holds significant computational advantages in accurate and efficient inference over random testing, or assuming model topology and only inferring parameter values.

  8. Functional inference of complex anatomical tendinous networks at a macroscopic scale via sparse experimentation.

    PubMed

    Saxena, Anupam; Lipson, Hod; Valero-Cuevas, Francisco J

    2012-01-01

    In systems and computational biology, much effort is devoted to functional identification of systems and networks at the molecular or cellular scale. However, similarly important networks exist at anatomical scales such as the tendon network of human fingers: the complex array of collagen fibers that transmits and distributes muscle forces to finger joints. This network is critical to the versatility of the human hand, and its function has been debated since at least the 16th century. Here, we experimentally infer the structure (both topology and parameter values) of this network through sparse interrogation with force inputs. A population of models representing this structure co-evolves in simulation with a population of informative future force inputs via the predator-prey estimation-exploration algorithm. Model fitness depends on their ability to explain experimental data, while the fitness of future force inputs depends on causing maximal functional discrepancy among current models. We validate our approach by inferring two known synthetic Latex networks, and one anatomical tendon network harvested from a cadaver's middle finger. We find that functionally similar but structurally diverse models can exist within a narrow range of the training set and cross-validation errors. For the Latex networks, models with low training set error [<4%] and resembling the known network have the smallest cross-validation errors [∼5%]. The low training set [<4%] and cross validation [<7.2%] errors for models for the cadaveric specimen demonstrate what, to our knowledge, is the first experimental inference of the functional structure of complex anatomical networks. This work expands current bioinformatics inference approaches by demonstrating that sparse, yet informative interrogation of biological specimens holds significant computational advantages in accurate and efficient inference over random testing, or assuming model topology and only inferring parameter values.

  9. Metis: A Pure Metropolis Markov Chain Monte Carlo Bayesian Inference Library

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bates, Cameron Russell; Mckigney, Edward Allen

    The use of Bayesian inference in data analysis has become the standard for large scientific experiments [1, 2]. The Monte Carlo Codes Group (XCP-3) at Los Alamos has developed a simple set of algorithms currently implemented in C++ and Python to easily perform flat-prior Markov Chain Monte Carlo Bayesian inference with pure Metropolis sampling. These implementations are designed to be user friendly and extensible for customization based on specific application requirements. This document describes the algorithmic choices made and presents two use cases.
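
    As an illustration of what flat-prior inference with pure Metropolis sampling involves, the sketch below samples the posterior of a single parameter. It is not the Metis API: the function name `metropolis`, its arguments, and the toy data are all hypothetical, and the Gaussian random-walk proposal is an assumed choice.

```python
import math
import random


def metropolis(log_likelihood, start, step, n_samples, lower, upper, seed=0):
    """Random-walk Metropolis with a flat prior on [lower, upper].

    Hypothetical helper for illustration; not taken from the Metis library.
    """
    rng = random.Random(seed)
    x = start
    logp = log_likelihood(x)
    samples = []
    for _ in range(n_samples):
        # Symmetric Gaussian proposal, so no Hastings correction is needed.
        proposal = x + rng.gauss(0.0, step)
        # Flat prior: proposals outside the support have zero posterior mass.
        if lower <= proposal <= upper:
            logp_new = log_likelihood(proposal)
            # Accept with probability min(1, L(proposal) / L(x)).
            # 1 - random() lies in (0, 1], so the logarithm is always defined.
            if math.log(1.0 - rng.random()) < logp_new - logp:
                x, logp = proposal, logp_new
        samples.append(x)
    return samples


# Toy problem (made-up data): infer the mean of Gaussian data with known sigma.
data = [4.8, 5.1, 5.3, 4.9, 5.2]
sigma = 0.2


def log_like(mu):
    return -0.5 * sum((d - mu) ** 2 for d in data) / sigma ** 2


samples = metropolis(log_like, start=0.0, step=0.2, n_samples=20000,
                     lower=-10.0, upper=10.0)
burned = samples[5000:]  # discard burn-in
posterior_mean = sum(burned) / len(burned)
```

    With a flat prior the acceptance ratio reduces to a likelihood ratio, which is why only `log_likelihood` and the prior bounds are needed.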

  10. Cancer Evolution: Mathematical Models and Computational Inference

    PubMed Central

    Beerenwinkel, Niko; Schwarz, Roland F.; Gerstung, Moritz; Markowetz, Florian

    2015-01-01

    Cancer is a somatic evolutionary process characterized by the accumulation of mutations, which contribute to tumor growth, clinical progression, immune escape, and drug resistance development. Evolutionary theory can be used to analyze the dynamics of tumor cell populations and to make inference about the evolutionary history of a tumor from molecular data. We review recent approaches to modeling the evolution of cancer, including population dynamics models of tumor initiation and progression, phylogenetic methods to model the evolutionary relationship between tumor subclones, and probabilistic graphical models to describe dependencies among mutations. Evolutionary modeling helps to understand how tumors arise and will also play an increasingly important prognostic role in predicting disease progression and the outcome of medical interventions, such as targeted therapy. PMID:25293804

  11. The mass distribution of Population III stars

    NASA Astrophysics Data System (ADS)

    Fraser, M.; Casey, A. R.; Gilmore, G.; Heger, A.; Chan, C.

    2017-06-01

    Extremely metal-poor (EMP) stars are uniquely informative on the nature of massive Population III stars. Modulo a few elements that vary with stellar evolution, the present-day photospheric abundances observed in EMP stars are representative of their natal gas cloud composition. For this reason, the chemistry of EMP stars closely reflects the nucleosynthetic yields of supernovae from massive Population III stars. Here we collate detailed abundances of 53 EMP stars from the literature and infer the masses of their Population III progenitors. We fit a simple initial mass function (IMF) to a subset of 29 of the inferred Population III star masses, and find that the mass distribution is well represented by a power-law IMF with exponent α = 2.35^{+0.29}_{-0.24}. The inferred maximum progenitor mass for supernovae from massive Population III stars is M_{max} = 87^{+13}_{-33} M⊙, and we find no evidence in our sample for a contribution from stars with masses above ~120 M⊙. The minimum mass is strongly consistent with the theoretical lower mass limit for Population III supernovae. We conclude that the IMF for massive Population III stars is consistent with the IMF of present-day massive stars and there may well have formed stars much below the supernova mass limit that could have survived to the present day.
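
    For reference, a power-law IMF with exponent α is conventionally written as below. The abstract does not state its convention explicitly, so the form dN/dM ∝ M^{-α} and the symbol M_min for the lower mass limit are assumptions.

```latex
\frac{dN}{dM} \propto M^{-\alpha},
\qquad M_{\min} \le M \le M_{\max},
\qquad \alpha = 2.35^{+0.29}_{-0.24},
\qquad M_{\max} = 87^{+13}_{-33}\, M_\odot
```

    Under this convention, α = 2.35 corresponds to the classical Salpeter slope for present-day massive stars.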

  12. Stan: Statistical inference

    NASA Astrophysics Data System (ADS)

    Stan Development Team

    2018-01-01

    Stan facilitates statistical inference at the frontiers of applied statistics and provides both a modeling language for specifying complex statistical models and a library of statistical algorithms for computing inferences with those models. These components are exposed through interfaces in environments such as R, Python, and the command line.

  13. Inference from Samples of DNA Sequences Using a Two-Locus Model

    PubMed Central

    Griffiths, Robert C.

    2011-01-01

    Performing inference on contemporary samples of DNA sequence data is an important and challenging task. Computationally intensive methods such as importance sampling (IS) are attractive because they make full use of the available data, but in the presence of recombination the large state space of genealogies can be prohibitive. In this article, we make progress by developing an efficient IS proposal distribution for a two-locus model of sequence data. We show that the proposal developed here leads to much greater efficiency, outperforming existing IS methods that could be adapted to this model. Among several possible applications, the algorithm can be used to find maximum likelihood estimates for mutation and crossover rates, and to perform ancestral inference. We illustrate the method on previously reported sequence data covering two loci either side of the well-studied TAP2 recombination hotspot. The two loci are themselves largely non-recombining, so we obtain a gene tree at each locus and are able to infer in detail the effect of the hotspot on their joint ancestry. We summarize this joint ancestry by introducing the gene graph, a summary of the well-known ancestral recombination graph. PMID:21210733

  14. Road Traffic Anomaly Detection via Collaborative Path Inference from GPS Snippets

    PubMed Central

    Wang, Hongtao; Wen, Hui; Yi, Feng; Zhu, Hongsong; Sun, Limin

    2017-01-01

    Road traffic anomaly denotes a road segment that is anomalous in terms of traffic flow of vehicles. Detecting road traffic anomalies from GPS (Global Positioning System) snippet data is becoming critical in urban computing since they often suggest underlying events. However, the noisy and sparse nature of GPS snippet data raises multiple problems that make the detection of road traffic anomalies very challenging. To address these issues, we propose a two-stage solution which consists of two components: a Collaborative Path Inference (CPI) model and a Road Anomaly Test (RAT) model. The CPI model performs path inference by incorporating both static and dynamic features into a Conditional Random Field (CRF). Dynamic context features are learned collaboratively from large GPS snippets via a tensor decomposition technique. RAT then calculates the anomalous degree for each road segment from the inferred fine-grained trajectories in given time intervals. We evaluated our method using a large-scale real-world dataset, which includes one month of GPS location data from more than eight thousand taxicabs in Beijing. The evaluation results show the advantages of our method over other baseline techniques. PMID:28282948

  15. Efficient Moment-Based Inference of Admixture Parameters and Sources of Gene Flow

    PubMed Central

    Levin, Alex; Reich, David; Patterson, Nick; Berger, Bonnie

    2013-01-01

    The recent explosion in available genetic data has led to significant advances in understanding the demographic histories of and relationships among human populations. It is still a challenge, however, to infer reliable parameter values for complicated models involving many populations. Here, we present MixMapper, an efficient, interactive method for constructing phylogenetic trees including admixture events using single nucleotide polymorphism (SNP) genotype data. MixMapper implements a novel two-phase approach to admixture inference using moment statistics, first building an unadmixed scaffold tree and then adding admixed populations by solving systems of equations that express allele frequency divergences in terms of mixture parameters. Importantly, all features of the model, including topology, sources of gene flow, branch lengths, and mixture proportions, are optimized automatically from the data and include estimates of statistical uncertainty. MixMapper also uses a new method to express branch lengths in easily interpretable drift units. We apply MixMapper to recently published data for Human Genome Diversity Cell Line Panel individuals genotyped on a SNP array designed especially for use in population genetics studies, obtaining confident results for 30 populations, 20 of them admixed. Notably, we confirm a signal of ancient admixture in European populations—including previously undetected admixture in Sardinians and Basques—involving a proportion of 20–40% ancient northern Eurasian ancestry. PMID:23709261

  16. Inferring probabilistic stellar rotation periods using Gaussian processes

    NASA Astrophysics Data System (ADS)

    Angus, Ruth; Morton, Timothy; Aigrain, Suzanne; Foreman-Mackey, Daniel; Rajpaul, Vinesh

    2018-02-01

    Variability in the light curves of spotted, rotating stars is often non-sinusoidal and quasi-periodic - spots move on the stellar surface and have finite lifetimes, causing stellar flux variations to slowly shift in phase. A strictly periodic sinusoid therefore cannot accurately model a rotationally modulated stellar light curve. Physical models of stellar surfaces have many drawbacks preventing effective inference, such as highly degenerate or high-dimensional parameter spaces. In this work, we test an appropriate effective model: a Gaussian Process with a quasi-periodic covariance kernel function. This highly flexible model allows sampling of the posterior probability density function of the periodic parameter, marginalizing over the other kernel hyperparameters using a Markov Chain Monte Carlo approach. To test the effectiveness of this method, we infer rotation periods from 333 simulated stellar light curves, demonstrating that the Gaussian process method produces periods that are more accurate than both a sine-fitting periodogram and an autocorrelation function method. We also demonstrate that it works well on real data, by inferring rotation periods for 275 Kepler stars with previously measured periods. We provide a table of rotation periods for these and many more, altogether 1102 Kepler objects of interest, and their posterior probability density function samples. Because this method delivers posterior probability density functions, it will enable hierarchical studies involving stellar rotation, particularly those involving population modelling, such as inferring stellar ages, obliquities in exoplanet systems, or characterizing star-planet interactions. The code used to implement this method is available online.
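
    The quasi-periodic covariance kernel mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the commonly used form k(dt) = A exp(-dt^2 / (2 l^2) - Gamma sin^2(pi dt / P)), i.e. a squared-exponential decay multiplied by a periodic term. The function name, parameter names, and numeric values are illustrative assumptions and are not taken from the authors' code.

```python
import numpy as np

def quasi_periodic_kernel(t1, t2, amplitude, length_scale, gamma, period):
    """Covariance matrix between times t1 and t2 under an assumed
    quasi-periodic kernel (squared-exponential times periodic term)."""
    # Pairwise time lags, shape (len(t1), len(t2)).
    dt = np.subtract.outer(np.asarray(t1, dtype=float),
                           np.asarray(t2, dtype=float))
    # Slow decay: correlations fade as spots evolve.
    decay = dt ** 2 / (2.0 * length_scale ** 2)
    # Periodic term: correlations peak at multiples of the period.
    periodic = gamma * np.sin(np.pi * dt / period) ** 2
    return amplitude * np.exp(-decay - periodic)

# Hypothetical example values: a 10-day period observed over 50 days.
times = np.linspace(0.0, 50.0, 200)
K = quasi_periodic_kernel(times, times, amplitude=1.0,
                          length_scale=20.0, gamma=1.0, period=10.0)
```

    In this sketch the covariance at a lag of one full period is larger than at half a period, which is the behaviour that lets such a model track a slowly drifting periodic signal.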

  17. Scalable Probabilistic Inference for Global Seismic Monitoring

    NASA Astrophysics Data System (ADS)

    Arora, N. S.; Dear, T.; Russell, S.

    2011-12-01

    We describe a probabilistic generative model for seismic events, their transmission through the earth, and their detection (or mis-detection) at seismic stations. We also describe an inference algorithm that constructs the most probable event bulletin explaining the observed set of detections. The model and inference are called NET-VISA (network processing vertically integrated seismic analysis) and are designed to replace the current automated network processing at the IDC, the SEL3 bulletin. Our results (attached table) demonstrate that NET-VISA significantly outperforms SEL3 by reducing the missed events from 30.3% down to 12.5%. The difference is even more dramatic for smaller magnitude events. NET-VISA has no difficulty in locating nuclear explosions as well. The attached figure demonstrates the location predicted by NET-VISA versus other bulletins for the second DPRK event. Further evaluation on dense regional networks demonstrates that NET-VISA finds many events missed in the LEB bulletin, which is produced by the human analysts. Large aftershock sequences, as produced by the 2004 December Sumatra earthquake and the 2011 March Tohoku earthquake, can pose a significant load for automated processing, often delaying the IDC bulletins by weeks or months. Indeed these sequences can overload the serial NET-VISA inference as well. We describe an enhancement to NET-VISA to make it multi-threaded, and hence take full advantage of the processing power of multi-core and -cpu machines. Our experiments show that the new inference algorithm is able to achieve 80% efficiency in parallel speedup.

  18. HIV populations are large and accumulate high genetic diversity in a nonlinear fashion.

    PubMed

    Maldarelli, Frank; Kearney, Mary; Palmer, Sarah; Stephens, Robert; Mican, JoAnn; Polis, Michael A; Davey, Richard T; Kovacs, Joseph; Shao, Wei; Rock-Kress, Diane; Metcalf, Julia A; Rehm, Catherine; Greer, Sarah E; Lucey, Daniel L; Danley, Kristen; Alter, Harvey; Mellors, John W; Coffin, John M

    2013-09-01

    HIV infection is characterized by rapid and error-prone viral replication resulting in genetically diverse virus populations. The rate of accumulation of diversity and the mechanisms involved are under intense study to provide useful information to understand immune evasion and the development of drug resistance. To characterize the development of viral diversity after infection, we carried out an in-depth analysis of single genome sequences of HIV pro-pol to assess diversity and divergence and to estimate replicating population sizes in a group of treatment-naive HIV-infected individuals sampled at single (n = 22) or multiple, longitudinal (n = 11) time points. Analysis of single genome sequences revealed nonlinear accumulation of sequence diversity during the course of infection. Diversity accumulated in recently infected individuals at rates 30-fold higher than in patients with chronic infection. Accumulation of synonymous changes accounted for most of the diversity during chronic infection. Accumulation of diversity resulted in population shifts, but the rates of change were low relative to estimated replication cycle times, consistent with relatively large population sizes. Analysis of changes in allele frequencies revealed effective population sizes that are substantially higher than previous estimates of approximately 1,000 infectious particles/infected individual. Taken together, these observations indicate that HIV populations are large, diverse, and slow to change in chronic infection and that the emergence of new mutations, including drug resistance mutations, is governed by both selection forces and drift.

  19. Is awareness necessary for true inference?

    PubMed

    Leo, Peter D; Greene, Anthony J

    2008-09-01

    In transitive inference, participants learn a set of context-dependent discriminations that can be organized into a hierarchy that supports inference. Several studies show that inference occurs with or without task awareness. However, some studies assert that without awareness, performance is attributable to pseudoinference. By this account, inference-like performance is achieved by differential stimulus weighting according to the stimuli's proximity to the end items of the hierarchy. We implement an inference task that cannot be based on differential stimulus weighting. The design itself rules out pseudoinference strategies. Success on the task without evidence of deliberative strategies would therefore suggest that true inference can be achieved implicitly. We found that accurate performance on the inference task was not dependent on explicit awareness. The finding is consistent with a growing body of evidence that indicates that forms of learning and memory supporting inference and flexibility do not necessarily depend on task awareness.

  20. On the Inference of Functional Circadian Networks Using Granger Causality

    PubMed Central

    Pourzanjani, Arya; Herzog, Erik D.; Petzold, Linda R.

    2015-01-01

    Being able to infer one-way direct connections in an oscillatory network such as the suprachiasmatic nucleus (SCN) of the mammalian brain using time series data is difficult but crucial to understanding network dynamics. Although techniques have been developed for inferring networks from time series data, there have been no attempts to adapt these techniques to infer directional connections in oscillatory time series, while accurately distinguishing between direct and indirect connections. In this paper an adaptation of Granger Causality is proposed that allows for inference of circadian networks and oscillatory networks in general, called Adaptive Frequency Granger Causality (AFGC). Additionally, an extension of this method is proposed to infer networks with large numbers of cells, called LASSO AFGC. The method was validated using simulated data from several different networks. For the smaller networks the method was able to identify all one-way direct connections without identifying connections that were not present. For larger networks of up to twenty cells the method shows excellent performance in identifying true and false connections; this is quantified by an area-under-the-curve (AUC) of 96.88%. We note that this method, like other Granger Causality-based methods, is based on the detection of high frequency signals propagating between cell traces. Thus it requires a relatively high sampling rate and a network that can propagate high frequency signals. PMID:26413748

  1. Inferring Geographic Coordinates of Origin for Europeans Using Small Panels of Ancestry Informative Markers

    PubMed Central

    Paschou, Peristera

    2010-01-01

    Recent large-scale studies of European populations have demonstrated the existence of population genetic structure within Europe and the potential to accurately infer individual ancestry when information from hundreds of thousands of genetic markers is used. In fact, when genomewide genetic variation of European populations is projected down to a two-dimensional Principal Components Analysis plot, a surprising correlation with actual geographic coordinates of self-reported ancestry has been reported. This substructure can hamper the search of susceptibility genes for common complex disorders leading to spurious correlations. The identification of genetic markers that can correct for population stratification becomes therefore of paramount importance. Analyzing 1,200 individuals from 11 populations genotyped for more than 500,000 SNPs (Population Reference Sample), we present a systematic exploration of the extent to which geographic coordinates of origin within Europe can be predicted, with small panels of SNPs. Markers are selected to correlate with the top principal components of the dataset, as we have previously demonstrated. Performing thorough cross-validation experiments we show that it is indeed possible to predict individual ancestry within Europe down to a few hundred kilometers from actual individual origin, using information from carefully selected panels of 500 or 1,000 SNPs. Furthermore, we show that these panels can be used to correctly assign the HapMap Phase 3 European populations to their geographic origin. The SNPs that we propose can prove extremely useful in a variety of different settings, such as stratification correction or genetic ancestry testing, and the study of the history of European populations. PMID:20805874

  2. Population Genomics Reveals Seahorses (Hippocampus erectus) of the Western Mid-Atlantic Coast to Be Residents Rather than Vagrants

    PubMed Central

    Boehm, J. T.; Waldman, John; Robinson, John D.; Hickerson, Michael J.

    2015-01-01

    Understanding population structure and areas of demographic persistence and transients is critical for effective species management. However, direct observational evidence to address the geographic scale and delineation of ephemeral or persistent populations for many marine fishes is limited. The Lined seahorse (Hippocampus erectus) can be commonly found in three western Atlantic zoogeographic provinces, though inhabitants of the temperate northern Virginia Province are often considered tropical vagrants that only arrive during warm seasons from the southern provinces and perish as temperatures decline. Although genetics can locate regions of historical population persistence and isolation, previous evidence of Virginia Province persistence is only provisional due to limited genetic sampling (i.e., mitochondrial DNA and five nuclear loci). To test alternative hypotheses of historical persistence versus the ephemerality of a northern Virginia Province population we used a RADseq generated dataset consisting of 11,708 single nucleotide polymorphisms (SNP) sampled from individuals collected from the eastern Gulf of Mexico to Long Island, NY. Concordant results from genomic analyses all infer three genetically divergent subpopulations, and strongly support Virginia Province inhabitants as a genetically diverged and a historically persistent ancestral gene pool. These results suggest that individuals that emerge in coastal areas during the warm season can be considered “local” and support offshore migration during the colder months. This research demonstrates how a large number of genes sampled across a geographical range can capture the diversity of coalescent histories (across loci) while inferring population history. Moreover, these results clearly demonstrate the utility of population genomic data to infer peripheral subpopulation persistence in difficult-to-observe species. PMID:25629166

  3. Population genomics reveals seahorses (Hippocampus erectus) of the western mid-Atlantic coast to be residents rather than vagrants.

    PubMed

    Boehm, J T; Waldman, John; Robinson, John D; Hickerson, Michael J

    2015-01-01

    Understanding population structure and areas of demographic persistence and transients is critical for effective species management. However, direct observational evidence to address the geographic scale and delineation of ephemeral or persistent populations for many marine fishes is limited. The Lined seahorse (Hippocampus erectus) can be commonly found in three western Atlantic zoogeographic provinces, though inhabitants of the temperate northern Virginia Province are often considered tropical vagrants that only arrive during warm seasons from the southern provinces and perish as temperatures decline. Although genetics can locate regions of historical population persistence and isolation, previous evidence of Virginia Province persistence is only provisional due to limited genetic sampling (i.e., mitochondrial DNA and five nuclear loci). To test alternative hypotheses of historical persistence versus the ephemerality of a northern Virginia Province population we used a RADseq generated dataset consisting of 11,708 single nucleotide polymorphisms (SNP) sampled from individuals collected from the eastern Gulf of Mexico to Long Island, NY. Concordant results from genomic analyses all infer three genetically divergent subpopulations, and strongly support Virginia Province inhabitants as a genetically diverged and a historically persistent ancestral gene pool. These results suggest that individuals that emerge in coastal areas during the warm season can be considered "local" and support offshore migration during the colder months. This research demonstrates how a large number of genes sampled across a geographical range can capture the diversity of coalescent histories (across loci) while inferring population history. Moreover, these results clearly demonstrate the utility of population genomic data to infer peripheral subpopulation persistence in difficult-to-observe species.

  4. King penguin demography since the last glaciation inferred from genome-wide data.

    PubMed

    Trucchi, Emiliano; Gratton, Paolo; Whittington, Jason D; Cristofari, Robin; Le Maho, Yvon; Stenseth, Nils Chr; Le Bohec, Céline

    2014-07-22

    How natural climate cycles, such as past glacial/interglacial patterns, have shaped species distributions at the high-latitude regions of the Southern Hemisphere is still largely unclear. Here, we show how the post-glacial warming following the Last Glacial Maximum (ca 18 000 years ago), allowed the (re)colonization of the fragmented sub-Antarctic habitat by an upper-level marine predator, the king penguin Aptenodytes patagonicus. Using restriction site-associated DNA sequencing and standard mitochondrial data, we tested the behaviour of subsets of anonymous nuclear loci in inferring past demography through coalescent-based and allele frequency spectrum analyses. Our results show that the king penguin population breeding on Crozet archipelago steeply increased in size, closely following the Holocene warming recorded in the Epica Dome C ice core. The following population growth can be explained by a threshold model in which the ecological requirements of this species (year-round ice-free habitat for breeding and access to a major source of food such as the Antarctic Polar Front) were met on Crozet soon after the Pleistocene/Holocene climatic transition. © 2014 The Author(s) Published by the Royal Society. All rights reserved.

  5. Comparing Inference Approaches for RD Designs: A Reexamination of the Effect of Head Start on Child Mortality

    ERIC Educational Resources Information Center

    Cattaneo, Matias D.; Titiunik, Rocío; Vazquez-Bare, Gonzalo

    2017-01-01

    The regression discontinuity (RD) design is a popular quasi-experimental design for causal inference and policy evaluation. The most common inference approaches in RD designs employ "flexible" parametric and nonparametric local polynomial methods, which rely on extrapolation and large-sample approximations of conditional expectations…

  6. Populational equilibrium through exosome-mediated Wnt signaling in tumor progression of diffuse large B-cell lymphoma.

    PubMed

    Koch, Raphael; Demant, Martin; Aung, Thiha; Diering, Nina; Cicholas, Anna; Chapuy, Bjoern; Wenzel, Dirk; Lahmann, Marlen; Güntsch, Annemarie; Kiecke, Christina; Becker, Sabrina; Hupfeld, Timo; Venkataramani, Vivek; Ziepert, Marita; Opitz, Lennart; Klapper, Wolfram; Trümper, Lorenz; Wulf, Gerald G

    2014-04-03

    Tumors are composed of phenotypically heterogeneous cell populations. The nongenomic mechanisms underlying transitions and interactions between cell populations are largely unknown. Here, we show that diffuse large B-cell lymphomas possess a self-organized infrastructure comprising side population (SP) and non-SP cells, where transitions between clonogenic states are modulated by exosome-mediated Wnt signaling. DNA methylation modulated SP-non-SP transitions and was correlated with the reciprocal expressions of Wnt signaling pathway agonist Wnt3a in SP cells and the antagonist secreted frizzled-related protein 4 in non-SP cells. Lymphoma SP cells exhibited autonomous clonogenicity and exported Wnt3a via exosomes to neighboring cells, thus modulating population equilibrium in the tumor.

  7. The NIFTy way of Bayesian signal inference

    NASA Astrophysics Data System (ADS)

    Selig, Marco

    2014-12-01

    We introduce NIFTy, "Numerical Information Field Theory", a software package for the development of Bayesian signal inference algorithms that operate independently from any underlying spatial grid and its resolution. A large number of Bayesian and Maximum Entropy methods for 1D signal reconstruction, 2D imaging, as well as 3D tomography, appear formally similar, but one often finds individualized implementations that are neither flexible nor easily transferable. Signal inference in the framework of NIFTy can be done in an abstract way, such that algorithms, prototyped in 1D, can be applied to real world problems in higher-dimensional settings. NIFTy as a versatile library is applicable and already has been applied in 1D, 2D, 3D and spherical settings. A recent application is the D3PO algorithm targeting the non-trivial task of denoising, deconvolving, and decomposing photon observations in high energy astronomy.

  8. Children's Category-Based Inferences Affect Classification

    ERIC Educational Resources Information Center

    Ross, Brian H.; Gelman, Susan A.; Rosengren, Karl S.

    2005-01-01

    Children learn many new categories and make inferences about these categories. Much work has examined how children make inferences on the basis of category knowledge. However, inferences may also affect what is learned about a category. Four experiments examine whether category-based inferences during category learning influence category knowledge…

  9. Using Approximate Bayesian Computation to infer sex ratios from acoustic data.

    PubMed

    Lehnen, Lisa; Schorcht, Wigbert; Karst, Inken; Biedermann, Martin; Kerth, Gerald; Puechmaille, Sebastien J

    2018-01-01

    Population sex ratios are of high ecological relevance, but are challenging to determine in species lacking conspicuous external cues indicating their sex. Acoustic sexing is an option if vocalizations differ between sexes, but is precluded by overlapping distributions of the values of male and female vocalizations in many species. A method allowing the inference of sex ratios despite such an overlap will therefore greatly increase the information extractable from acoustic data. To meet this demand, we developed a novel approach using Approximate Bayesian Computation (ABC) to infer the sex ratio of populations from acoustic data. Additionally, parameters characterizing the male and female distribution of acoustic values (mean and standard deviation) are inferred. This information is then used to probabilistically assign a sex to a single acoustic signal. We furthermore develop a simpler means of sex ratio estimation based on the exclusion of calls from the overlap zone. Applying our methods to simulated data demonstrates that sex ratio and acoustic parameter characteristics of males and females are reliably inferred by the ABC approach. Applying both the ABC and the exclusion method to empirical datasets (echolocation calls recorded in colonies of lesser horseshoe bats, Rhinolophus hipposideros) yields sex ratios similar to those obtained by molecular sexing. Our methods aim to facilitate evidence-based conservation, and to benefit scientists investigating ecological or conservation questions related to sex- or group-specific behaviour across a wide range of organisms emitting acoustic signals. The developed methodology is non-invasive, low-cost and time-efficient, thus allowing the study of many sites and individuals. We provide an R-script for the easy application of the method and discuss potential future extensions and fields of applications. The script can be easily adapted to account for numerous biological systems by adjusting the type and number of groups to be

  10. Approximation of epidemic models by diffusion processes and their statistical inference.

    PubMed

    Guy, Romain; Larédo, Catherine; Vergu, Elisabeta

    2015-02-01

    Multidimensional continuous-time Markov jump processes [Formula: see text] on [Formula: see text] form a usual set-up for modeling [Formula: see text]-like epidemics. However, when facing incomplete epidemic data, inference based on [Formula: see text] is not easy to achieve. Here, we start building a new framework for the estimation of key parameters of epidemic models based on statistics of diffusion processes approximating [Formula: see text]. First, previous results on the approximation of density-dependent [Formula: see text]-like models by diffusion processes with small diffusion coefficient [Formula: see text], where [Formula: see text] is the population size, are generalized to non-autonomous systems. Second, our previous inference results on discretely observed diffusion processes with small diffusion coefficient are extended to time-dependent diffusions. Consistent and asymptotically Gaussian estimates are obtained for a fixed number [Formula: see text] of observations, which corresponds to the epidemic context, and for [Formula: see text]. A correction term, which yields better estimates non-asymptotically, is also included. Finally, the performance and robustness of our estimators with respect to various parameters such as [Formula: see text] (the basic reproduction number), [Formula: see text], [Formula: see text] are investigated on simulations. Two models, [Formula: see text] and [Formula: see text], corresponding to single and recurrent outbreaks, respectively, are used to simulate data. The findings indicate that our estimators have good asymptotic properties and behave noticeably well for realistic numbers of observations and population sizes. This study lays the foundations of a generic inference method currently under extension to incompletely observed epidemic data. Indeed, contrary to the majority of current inference techniques for partially observed processes, which necessitate computer intensive simulations, our method being mostly an

  11. Sizing the star cluster population of the Large Magellanic Cloud

    NASA Astrophysics Data System (ADS)

    Piatti, Andrés E.

    2018-04-01

    The number of star clusters that populate the Large Magellanic Cloud (LMC) at deprojected distances <4 deg has been recently found to be nearly double the known size of the system. Because of the unprecedented consequences of this outcome for our knowledge of the LMC cluster formation and dissolution histories, we closely revisited such a compilation of objects and found that only ~35 per cent of the previously known catalogued clusters have been included. The remaining entries are likely related to stellar overdensities of the LMC composite star field, because there is a remarkable enhancement of objects with assigned ages older than log(t yr^-1) ~ 9.4, which contrasts with the existence of the LMC cluster age gap; the assumption of a cluster formation rate similar to that of the LMC star field does not help to reconcile such a large number of clusters either; and nearly 50 per cent of them come from cluster search procedures known to produce more than 90 per cent of false detections. The lack of further analyses to confirm the physical reality of the identified overdensities as genuine star clusters also casts doubt on those results. We argue that the actual size of the LMC main body cluster population is close to that previously known.

  12. Assessing population genetic structure via the maximisation of genetic distance

    PubMed Central

    2009-01-01

    Background: The inference of the hidden structure of a population is an essential issue in population genetics. Recently, several methods have been proposed to infer population structure in population genetics. Methods: In this study, a new method to infer the number of clusters and to assign individuals to the inferred populations is proposed. This approach does not make any assumption on Hardy-Weinberg and linkage equilibrium. The implemented criterion is the maximisation (via a simulated annealing algorithm) of the averaged genetic distance between a predefined number of clusters. The performance of this method is compared with two Bayesian approaches: STRUCTURE and BAPS, using simulated data and also a real human data set. Results: The simulations show that with a reduced number of markers, BAPS overestimates the number of clusters and presents a reduced proportion of correct groupings. The accuracy of the new method is approximately the same as for STRUCTURE. Also, in Hardy-Weinberg and linkage disequilibrium cases, BAPS performs incorrectly. In these situations, STRUCTURE and the new method show an equivalent behaviour with respect to the number of inferred clusters, although the proportion of correct groupings is slightly better with the new method. Re-establishing equilibrium with the randomisation procedures improves the precision of the Bayesian approaches. All methods have a good precision for FST ≥ 0.03, but only STRUCTURE estimates the correct number of clusters for FST as low as 0.01. In situations with a high number of clusters or a more complex population structure, MGD performs better than STRUCTURE and BAPS. The results for a human data set analysed with the new method are congruent with the geographical regions previously found. Conclusion: This new method used to infer the hidden structure in a population, based on the maximisation of the genetic distance and not taking into consideration any assumption about Hardy-Weinberg and linkage equilibrium

  13. Recombination gives a new insight in the effective population size and the history of the old world human populations.

    PubMed

    Melé, Marta; Javed, Asif; Pybus, Marc; Zalloua, Pierre; Haber, Marc; Comas, David; Netea, Mihai G; Balanovsky, Oleg; Balanovska, Elena; Jin, Li; Yang, Yajun; Pitchappan, R M; Arunkumar, G; Parida, Laxmi; Calafell, Francesc; Bertranpetit, Jaume

    2012-01-01

    The information left by recombination in our genomes can be used to make inferences on our recent evolutionary history. Specifically, the number of past recombination events in a population sample is a function of its effective population size (Ne). We have applied a method, Identifying Recombination in Sequences (IRiS), to detect specific past recombination events in 30 Old World populations to infer their Ne. We have found that sub-Saharan African populations have an Ne that is approximately four times greater than those of non-African populations and that outside of Africa, South Asian populations had the largest Ne. We also observe that the patterns of recombinational diversity of these populations correlate with distance out of Africa if that distance is measured along a path crossing South Arabia. No such correlation is found through a Sinai route, suggesting that anatomically modern humans first left Africa through the Bab-el-Mandeb strait rather than through present Egypt.

  14. MetaPIGA v2.0: maximum likelihood large phylogeny estimation using the metapopulation genetic algorithm and other stochastic heuristics.

    PubMed

    Helaers, Raphaël; Milinkovitch, Michel C

    2010-07-15

    The development, in the last decade, of stochastic heuristics implemented in robust application software has made large phylogeny inference a key step in most comparative studies involving molecular sequences. Still, the choice of phylogeny inference software is often dictated by a combination of parameters not related to the raw performance of the implemented algorithm(s) but rather by practical issues such as ergonomics and/or the availability of specific functionalities. Here, we present MetaPIGA v2.0, a robust implementation of several stochastic heuristics for large phylogeny inference (under maximum likelihood), including a Simulated Annealing algorithm, a classical Genetic Algorithm, and the Metapopulation Genetic Algorithm (metaGA) together with complex substitution models, discrete Gamma rate heterogeneity, and the possibility to partition data. MetaPIGA v2.0 also implements the Likelihood Ratio Test, the Akaike Information Criterion, and the Bayesian Information Criterion for automated selection of substitution models that best fit the data. Heuristics and substitution models are highly customizable through manual batch files and command line processing. However, MetaPIGA v2.0 also offers an extensive graphical user interface for parameter setting, generating and running batch files, following run progress, and manipulating result trees. MetaPIGA v2.0 uses standard formats for data sets and trees, is platform independent, runs on 32- and 64-bit systems, and takes advantage of multiprocessor and multicore computers. The metaGA resolves the major problem inherent to classical Genetic Algorithms by maintaining high inter-population variation even under strong intra-population selection. Implementation of the metaGA together with additional stochastic heuristics in a single software package will allow rigorous optimization of each heuristic as well as a meaningful comparison of performances among these algorithms. MetaPIGA v2.0 gives access both to high

  15. Is there a hierarchy of social inferences? The likelihood and speed of inferring intentionality, mind, and personality.

    PubMed

    Malle, Bertram F; Holbrook, Jess

    2012-04-01

    People interpret behavior by making inferences about agents' intentionality, mind, and personality. Past research studied such inferences 1 at a time; in real life, people make these inferences simultaneously. The present studies therefore examined whether 4 major inferences (intentionality, desire, belief, and personality), elicited simultaneously in response to an observed behavior, might be ordered in a hierarchy of likelihood and speed. To achieve generalizability, the studies included a wide range of stimulus behaviors, presented them verbally and as dynamic videos, and assessed inferences both in a retrieval paradigm (measuring the likelihood and speed of accessing inferences immediately after they were made) and in an online processing paradigm (measuring the speed of forming inferences during behavior observation). Five studies provide evidence for a hierarchy of social inferences-from intentionality and desire to belief to personality-that is stable across verbal and visual presentations and that parallels the order found in developmental and primate research. (c) 2012 APA, all rights reserved.

  16. Large-scale control site selection for population monitoring: an example assessing Sage-grouse trends

    USGS Publications Warehouse

    Fedy, Bradley C.; O'Donnell, Michael; Bowen, Zachary H.

    2015-01-01

    Human impacts on wildlife populations are widespread and prolific and understanding wildlife responses to human impacts is a fundamental component of wildlife management. The first step to understanding wildlife responses is the documentation of changes in wildlife population parameters, such as population size. Meaningful assessment of population changes in potentially impacted sites requires the establishment of monitoring at similar, nonimpacted, control sites. However, it is often difficult to identify appropriate control sites in wildlife populations. We demonstrated use of Geographic Information System (GIS) data across large spatial scales to select biologically relevant control sites for population monitoring. Greater sage-grouse (Centrocercus urophasianus; hereafter, sage-grouse) are negatively affected by energy development, and monitoring of sage-grouse population within energy development areas is necessary to detect population-level responses. We used population data (1995–2012) from an energy development area in Wyoming, USA, the Atlantic Rim Project Area (ARPA), and GIS data to identify control sites that were not impacted by energy development for population monitoring. Control sites were surrounded by similar habitat and were within similar climate areas to the ARPA. We developed nonlinear trend models for both the ARPA and control sites and compared long-term trends from the 2 areas. We found little difference between the ARPA and control sites trends over time. This research demonstrated an approach for control site selection across large landscapes and can be used as a template for similar impact-monitoring studies. It is important to note that identification of changes in population parameters between control and treatment sites is only the first step in understanding the mechanisms that underlie those changes. Published 2015. This article is a U.S. Government work and is in the public domain in the USA.

  17. A Hierarchical Poisson Log-Normal Model for Network Inference from RNA Sequencing Data

    PubMed Central

    Gallopin, Mélina; Rau, Andrea; Jaffrézic, Florence

    2013-01-01

    Gene network inference from transcriptomic data is an important methodological challenge and a key aspect of systems biology. Although several methods have been proposed to infer networks from microarray data, there is a need for inference methods able to model RNA-seq data, which are count-based and highly variable. In this work we propose a hierarchical Poisson log-normal model with a Lasso penalty to infer gene networks from RNA-seq data; this model has the advantage of directly modelling discrete data and accounting for inter-sample variance larger than the sample mean. Using real microRNA-seq data from breast cancer tumors and simulations, we compare this method to a regularized Gaussian graphical model on log-transformed data, and a Poisson log-linear graphical model with a Lasso penalty on power-transformed data. For data simulated with large inter-sample dispersion, the proposed model performs better than the other methods in terms of sensitivity, specificity and area under the ROC curve. These results show the necessity of methods specifically designed for gene network inference from RNA-seq data. PMID:24147011
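
    The overdispersion property mentioned in this abstract (variance larger than the mean) can be illustrated with a minimal sketch. It assumes the simplest univariate Poisson log-normal setup, in which a count is Poisson with a rate whose logarithm is normal with mean mu and standard deviation sigma; the function name pln_mean_var is hypothetical and this is not the authors' hierarchical network model.

```python
import math


def pln_mean_var(mu, sigma):
    """Mean and variance of a count Y where Y | rate ~ Poisson(rate)
    and log(rate) ~ Normal(mu, sigma^2).

    Illustrative assumption: a single univariate count, not the
    hierarchical network model described in the abstract.
    """
    # E[Y] equals the mean of the log-normal rate.
    mean = math.exp(mu + 0.5 * sigma ** 2)
    # Var[Y] = E[rate] + Var[rate]; the second term is the extra
    # dispersion contributed by the log-normal layer.
    var = mean + (math.exp(sigma ** 2) - 1.0) * mean ** 2
    return mean, var


if __name__ == "__main__":
    for sigma in (0.0, 0.5, 1.0):
        m, v = pln_mean_var(mu=2.0, sigma=sigma)
        print(f"sigma={sigma}: mean={m:.2f}, variance={v:.2f}")
```

    With sigma = 0 the model reduces to a plain Poisson (variance equal to the mean); any sigma > 0 gives a variance strictly larger than the mean.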

  18. Causal Inference and Explaining Away in a Spiking Network

    PubMed Central

    Moreno-Bote, Rubén; Drugowitsch, Jan

    2015-01-01

    While the brain uses spiking neurons for communication, theoretical research on brain computations has mostly focused on non-spiking networks. The nature of spike-based algorithms that achieve complex computations, such as object probabilistic inference, is largely unknown. Here we demonstrate that a family of high-dimensional quadratic optimization problems with non-negativity constraints can be solved exactly and efficiently by a network of spiking neurons. The network naturally imposes the non-negativity of causal contributions that is fundamental to causal inference, and uses simple operations, such as linear synapses with realistic time constants, and neural spike generation and reset non-linearities. The network infers the set of most likely causes from an observation using explaining away, which is dynamically implemented by spike-based, tuned inhibition. The algorithm performs remarkably well even when the network intrinsically generates variable spike trains, the timing of spikes is scrambled by external sources of noise, or the network is mistuned. This type of network might underlie tasks such as odor identification and classification. PMID:26621426

  19. Causal Inference and Explaining Away in a Spiking Network.

    PubMed

    Moreno-Bote, Rubén; Drugowitsch, Jan

    2015-12-01

    While the brain uses spiking neurons for communication, theoretical research on brain computations has mostly focused on non-spiking networks. The nature of spike-based algorithms that achieve complex computations, such as object probabilistic inference, is largely unknown. Here we demonstrate that a family of high-dimensional quadratic optimization problems with non-negativity constraints can be solved exactly and efficiently by a network of spiking neurons. The network naturally imposes the non-negativity of causal contributions that is fundamental to causal inference, and uses simple operations, such as linear synapses with realistic time constants, and neural spike generation and reset non-linearities. The network infers the set of most likely causes from an observation using explaining away, which is dynamically implemented by spike-based, tuned inhibition. The algorithm performs remarkably well even when the network intrinsically generates variable spike trains, the timing of spikes is scrambled by external sources of noise, or the network is mistuned. This type of network might underlie tasks such as odor identification and classification.

  20. Missing data imputation and haplotype phase inference for genome-wide association studies

    PubMed Central

    Browning, Sharon R.

    2009-01-01

    Imputation of missing data and the use of haplotype-based association tests can improve the power of genome-wide association studies (GWAS). In this article, I review methods for haplotype inference and missing data imputation, and discuss their application to GWAS. I discuss common features of the best algorithms for haplotype phase inference and missing data imputation in large-scale data sets, as well as some important differences between classes of methods, and highlight the methods that provide the highest accuracy and fastest computational performance. PMID:18850115

  1. Smoothed Particle Inference Analysis of SNR RCW 103

    NASA Astrophysics Data System (ADS)

    Frank, Kari A.; Burrows, David N.; Dwarkadas, Vikram

    2016-04-01

    We present preliminary results of applying a novel analysis method, Smoothed Particle Inference (SPI), to an XMM-Newton observation of SNR RCW 103. SPI is a Bayesian modeling process that fits a population of gas blobs ("smoothed particles") such that their superposed emission reproduces the observed spatial and spectral distribution of photons. Emission-weighted distributions of plasma properties, such as abundances and temperatures, are then extracted from the properties of the individual blobs. This technique has important advantages over analysis techniques which implicitly assume that remnants are two-dimensional objects in which each line of sight encompasses a single plasma. By contrast, SPI allows superposition of as many blobs of plasma as are needed to match the spectrum observed in each direction, without the need to bin the data spatially. This RCW 103 analysis is part of a pilot study for the larger SPIES (Smoothed Particle Inference Exploration of SNRs) project, in which SPI will be applied to a sample of 12 bright SNRs.

  2. Mapping the social network: tracking lice in a wild primate (Microcebus rufus) population to infer social contacts and vector potential

    PubMed Central

    2012-01-01

    Background Studies of host-parasite interactions have the potential to provide insights into the ecology of both organisms involved. We monitored the movement of sucking lice (Lemurpediculus verruculosus), parasites that require direct host-host contact to be transferred, in their host population of wild mouse lemurs (Microcebus rufus). These lemurs live in the rainforests of Madagascar, are small (40 g), arboreal, nocturnal, solitary foraging primates for which data on population-wide interactions are difficult to obtain. We developed a simple, cost effective method exploiting the intimate relationship between louse and lemur, whereby individual lice were marked, without removal from their host, with an individualized code, and tracked throughout the lemur population. We then tested the hypotheses that 1) the frequency of louse transfers, and thus interactions, would decrease with increasing distance between paired individual lemurs; 2) due to host polygynandry, social interactions and hence louse transfers would increase during the onset of the breeding season; and 3) individual mouse lemurs would vary in their contributions to the spread of lice. Results We show that louse transfers involved 43.75% of the studied lemur population, exclusively males. Louse transfers peaked during the breeding season, perhaps due to increased social interactions between lemurs. Although trap-based individual lemur ranging patterns are restricted, louse transfer rate does not correlate with the distance between lemur trapping locales, indicating wider host ranging behavior and a greater risk of rapid population-wide pathogen transmission than predicted by standard trapping data alone. Furthermore, relatively few lemur individuals contributed disproportionately to the rapid spread of lice throughout the population. Conclusions Using a simple method, we were able to visualize exchanges of lice in a population of cryptic wild primates. This method not only provided insight into the

  3. Inference on cancer screening exam accuracy using population-level administrative data.

    PubMed

    Jiang, H; Brown, P E; Walter, S D

    2016-01-15

    This paper develops a model for cancer screening and cancer incidence data, accommodating the partially unobserved disease status, clustered data structures, general covariate effects, and dependence between exams. The true unobserved cancer and detection status of screening participants are treated as latent variables, and a Markov Chain Monte Carlo algorithm is used to estimate the Bayesian posterior distributions of the diagnostic error rates and disease prevalence. We show how the Bayesian approach can be used to draw inferences about screening exam properties and disease prevalence while allowing for the possibility of conditional dependence between two exams. The techniques are applied to the estimation of the diagnostic accuracy of mammography and clinical breast examination using data from the Ontario Breast Screening Program in Canada. Copyright © 2015 John Wiley & Sons, Ltd.

  4. Distinguishing between statistical significance and practical/clinical meaningfulness using statistical inference.

    PubMed

    Wilkinson, Michael

    2014-03-01

    Decisions about support for predictions of theories in light of data are made using statistical inference. The dominant approach in sport and exercise science is the Neyman-Pearson (N-P) significance-testing approach. When applied correctly it provides a reliable procedure for making dichotomous decisions for accepting or rejecting zero-effect null hypotheses with known and controlled long-run error rates. Type I and type II error rates must be specified in advance and the latter controlled by conducting an a priori sample size calculation. The N-P approach does not provide the probability of hypotheses or indicate the strength of support for hypotheses in light of data, yet many scientists believe it does. Outcomes of analyses allow conclusions only about the existence of non-zero effects, and provide no information about the likely size of true effects or their practical/clinical value. Bayesian inference can show how much support data provide for different hypotheses, and how personal convictions should be altered in light of data, but the approach is complicated by formulating probability distributions about prior subjective estimates of population effects. A pragmatic solution is magnitude-based inference, which allows scientists to estimate the true magnitude of population effects and how likely they are to exceed an effect magnitude of practical/clinical importance, thereby integrating elements of subjective Bayesian-style thinking. While this approach is gaining acceptance, progress might be hastened if scientists appreciate the shortcomings of traditional N-P null hypothesis significance testing.
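
    The a priori sample size calculation mentioned in this abstract can be sketched as follows. This is a minimal illustration assuming a two-sided, two-group comparison of means with a known common standard deviation and the normal approximation; the function name n_per_group and its parameters are hypothetical and are not taken from the article.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample
    comparison of means (normal approximation, known common sigma).

    delta: smallest difference in means worth detecting
    alpha: type I error rate fixed in advance
    power: 1 - type II error rate fixed in advance
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_beta = z.inv_cdf(power)
    return math.ceil(2.0 * ((z_alpha + z_beta) * sigma / delta) ** 2)


if __name__ == "__main__":
    # A standardized effect of 0.5 at alpha = 0.05 and 80% power.
    print(n_per_group(delta=0.5, sigma=1.0))
```

    Halving the effect size of interest roughly quadruples the required sample, which is why the error rates and the target effect have to be fixed before data collection.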

  5. The Effects of Population Size Histories on Estimates of Selection Coefficients from Time-Series Genetic Data

    PubMed Central

    Jewett, Ethan M.; Steinrücken, Matthias; Song, Yun S.

    2016-01-01

    Many approaches have been developed for inferring selection coefficients from time series data while accounting for genetic drift. These approaches have been motivated by the intuition that properly accounting for the population size history can significantly improve estimates of selective strengths. However, the improvement in inference accuracy that can be attained by modeling drift has not been characterized. Here, by comparing maximum likelihood estimates of selection coefficients that account for the true population size history with estimates that ignore drift by assuming allele frequencies evolve deterministically in a population of infinite size, we address the following questions: how much can modeling the population size history improve estimates of selection coefficients? How much can mis-inferred population sizes hurt inferences of selection coefficients? We conduct our analysis under the discrete Wright–Fisher model by deriving the exact probability of an allele frequency trajectory in a population of time-varying size and we replicate our results under the diffusion model. For both models, we find that ignoring drift leads to estimates of selection coefficients that are nearly as accurate as estimates that account for the true population history, even when population sizes are small and drift is high. This result is of interest because inference methods that ignore drift are widely used in evolutionary studies and can be many orders of magnitude faster than methods that account for population sizes. PMID:27550904
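
    The contrast drawn in this abstract, between a trajectory that evolves deterministically in an infinite population and one subject to drift in a population of time-varying size, can be sketched as below. The sketch assumes a simple haploid selection model with selection coefficient s; the function names are hypothetical, and this forward simulation is not the exact trajectory likelihood derived by the authors.

```python
import random


def next_freq_deterministic(x, s):
    """Expected allele frequency after one generation of haploid
    selection (assumed model), ignoring drift."""
    return x * (1.0 + s) / (1.0 + s * x)


def deterministic_trajectory(x0, s, generations):
    """Trajectory in a population of infinite size (no drift)."""
    traj = [x0]
    for _ in range(generations):
        traj.append(next_freq_deterministic(traj[-1], s))
    return traj


def wright_fisher_trajectory(x0, s, sizes, rng):
    """Trajectory under the discrete Wright-Fisher model.

    sizes: population size (number of gene copies) in each generation,
    which may vary over time.
    """
    traj = [x0]
    for n in sizes:
        p = next_freq_deterministic(traj[-1], s)
        # Binomial sampling of n gene copies introduces drift.
        count = sum(1 for _ in range(n) if rng.random() < p)
        traj.append(count / n)
    return traj


if __name__ == "__main__":
    rng = random.Random(1)
    sizes = [50] * 10 + [500] * 10  # a small population that expands
    print(deterministic_trajectory(0.2, 0.05, len(sizes))[-1])
    print(wright_fisher_trajectory(0.2, 0.05, sizes, rng)[-1])
```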

  6. Inferring ontology graph structures using OWL reasoning.

    PubMed

    Rodríguez-García, Miguel Ángel; Hoehndorf, Robert

    2018-01-05

    Ontologies are representations of a conceptualization of a domain. Traditionally, ontologies in biology were represented as directed acyclic graphs (DAG) which represent the backbone taxonomy and additional relations between classes. These graphs are widely exploited for data analysis in the form of ontology enrichment or computation of semantic similarity. More recently, ontologies are developed in a formal language such as the Web Ontology Language (OWL) and consist of a set of axioms through which classes are defined or constrained. While the taxonomy of an ontology can be inferred directly from the axioms of an ontology as one of the standard OWL reasoning tasks, creating general graph structures from OWL ontologies that exploit the ontologies' semantic content remains a challenge. We developed a method to transform ontologies into graphs using an automated reasoner while taking into account all relations between classes. Searching for (existential) patterns in the deductive closure of ontologies, we can identify relations between classes that are implied but not asserted and generate graph structures that encode for a large part of the ontologies' semantic content. We demonstrate the advantages of our method by applying it to inference of protein-protein interactions through semantic similarity over the Gene Ontology and demonstrate that performance is increased when graph structures are inferred using deductive inference according to our method. Our software and experiment results are available at http://github.com/bio-ontology-research-group/Onto2Graph . Onto2Graph is a method to generate graph structures from OWL ontologies using automated reasoning. The resulting graphs can be used for improved ontology visualization and ontology-based data analysis.

  7. Network inference using informative priors.

    PubMed

    Mukherjee, Sach; Speed, Terence P

    2008-09-23

    Recent years have seen much interest in the study of systems characterized by multiple interacting components. A class of statistical models called graphical models, in which graphs are used to represent probabilistic relationships between variables, provides a framework for formal inference regarding such systems. In many settings, the object of inference is the network structure itself. This problem of "network inference" is well known to be a challenging one. However, in scientific settings there is very often existing information regarding network connectivity. A natural idea then is to take account of such information during inference. This article addresses the question of incorporating prior information into network inference. We focus on directed models called Bayesian networks, and use Markov chain Monte Carlo to draw samples from posterior distributions over network structures. We introduce prior distributions on graphs capable of capturing information regarding network features including edges, classes of edges, degree distributions, and sparsity. We illustrate our approach in the context of systems biology, applying our methods to network inference in cancer signaling.

  8. The Large Impact Process Inferred from the Geology of Lunar Multiring Basins

    NASA Technical Reports Server (NTRS)

    Spudis, Paul D.

    1994-01-01

    The study of the geology of multiring impact basins on the Moon over the past ten years has given us a rudimentary understanding of how these large structures have formed and evolved on the Moon and other bodies. Two-ring basins on the Moon begin to form at diameters of about 300 km; the transition diameter at which more than two rings appear is uncertain, but it appears to be between 400 and 500 km in diameter. Inner rings tend to be made up of clusters or aligned segments of massifs and are arranged into a crudely concentric pattern; scarp-like elements may or may not be present. Outer rings are much more scarp-like and massifs are rare to absent. Basins display textured deposits, interpreted as ejecta, extending roughly an apparent basin radius exterior to the main topographic rim. Ejecta may have various morphologies, ranging from wormy and hummocky deposits to knobby surfaces; the causes of these variations are not known, but may be related to the energy regime in which the ejecta are deposited. Outside the limits of the textured ejecta are found both fields of satellitic craters (secondaries) and light plains deposits. Impact melt sheets are observed on the floors of relatively unflooded basins. Samples of impact melts from lunar basins have basaltic major-element chemistry, characterized by K, rare-earth elements (REE), P, and other trace elements of varying concentration (KREEP); ages are between 3.8 and 3.9 Ga. These lithologies cannot be produced through the fusion of known pristine (plutonic) rock types, suggesting the occurrence of unknown lithologies within the Moon. These melts were probably generated at middle to lower crustal levels. Ejecta compositions, preservation of pre-basin topography, and deposit morphologies all indicate that the excavation cavity of multiring basins is between about 0.4 and 0.6 times the diameter of the apparent crater diameter. Basin depths of excavation can be inferred from the composition of basin ejecta. A variety of

  9. Modular Spectral Inference Framework Applied to Young Stars and Brown Dwarfs

    NASA Technical Reports Server (NTRS)

    Gully-Santiago, Michael A.; Marley, Mark S.

    2017-01-01

    In practice, synthetic spectral models are imperfect, causing inaccurate estimates of stellar parameters. Using forward modeling and statistical inference, we derive accurate stellar parameters for a given observed spectrum by emulating a grid of precomputed spectra to track uncertainties. Spectral inference as applied to brown dwarfs re: Synthetic spectral models (Marley et al 1996 and 2014) via the newest grid spans a massive multi-dimensional grid applied to IGRINS spectra, improving atmospheric models for JWST. When applied to young stars (10 Myr) with large starspots, they can be measured spectroscopically, especially in the near-IR with IGRINS.

  10. Genetic relatedness of indigenous ethnic groups in northern Borneo to neighboring populations from Southeast Asia, as inferred from genome-wide SNP data.

    PubMed

    Yew, Chee Wei; Hoque, Mohd Zahirul; Pugh-Kitingan, Jacqueline; Minsong, Alexander; Voo, Christopher Lok Yung; Ransangan, Julian; Lau, Sophia Tiek Ying; Wang, Xu; Saw, Woei Yuh; Ong, Rick Twee-Hee; Teo, Yik-Ying; Xu, Shuhua; Hoh, Boon-Peng; Phipps, Maude E; Kumar, S Vijay

    2018-07-01

    The region of northern Borneo is home to the current state of Sabah, Malaysia. It is located closest to the southern Philippine islands and may have served as a viaduct for ancient human migration onto or off of Borneo Island. In this study, five indigenous ethnic groups from Sabah were subjected to genome-wide SNP genotyping. These individuals represent the "North Borneo"-speaking group of the great Austronesian family. They have traditionally resided in the inland region of Sabah. The dataset was merged with public datasets, and the genetic relatedness of these groups to neighboring populations from the islands of Southeast Asia, mainland Southeast Asia and southern China was inferred. Genetic structure analysis revealed that these groups formed a genetic cluster that was independent of the clusters of neighboring populations. Additionally, these groups exhibited near-absolute proportions of a genetic component that is also common among Austronesians from Taiwan and the Philippines. They showed no genetic admixture with Austro-Melanesian populations. Furthermore, phylogenetic analysis showed that they are closely related to non-Austro-Melanesian Filipinos as well as to Taiwan natives but are distantly related to populations from mainland Southeast Asia. Relatively lower heterozygosity and higher pairwise genetic differentiation index (FST) values than those of nearby populations indicate that these groups might have experienced genetic drift in the past, resulting in their differentiation from other Austronesians. Subsequent formal testing suggested that these populations have received no gene flow from neighboring populations. Taken together, these results imply that the indigenous ethnic groups of northern Borneo shared a common ancestor with Taiwan natives and non-Austro-Melanesian Filipinos and then isolated themselves on the inland of Sabah. This isolation presumably led to no admixture with other populations, and these individuals therefore underwent

  11. Road Traffic Anomaly Detection via Collaborative Path Inference from GPS Snippets.

    PubMed

    Wang, Hongtao; Wen, Hui; Yi, Feng; Zhu, Hongsong; Sun, Limin

    2017-03-09

    A road traffic anomaly denotes a road segment that is anomalous in terms of the traffic flow of vehicles. Detecting road traffic anomalies from GPS (Global Positioning System) snippets data is becoming critical in urban computing since they often suggest underlying events. However, the noisy and sparse nature of GPS snippets data raises multiple problems, which make the detection of road traffic anomalies very challenging. To address these issues, we propose a two-stage solution which consists of two components: a Collaborative Path Inference (CPI) model and a Road Anomaly Test (RAT) model. The CPI model performs path inference incorporating both static and dynamic features into a Conditional Random Field (CRF). Dynamic context features are learned collaboratively from large GPS snippets via a tensor decomposition technique. Then RAT calculates the anomalous degree for each road segment from the inferred fine-grained trajectories in given time intervals. We evaluated our method using a large scale real world dataset, which includes one-month GPS location data from more than eight thousand taxi cabs in Beijing. The evaluation results show the advantages of our method beyond other baseline techniques.

  12. A nonparametric method to generate synthetic populations to adjust for complex sampling design features.

    PubMed

    Dong, Qi; Elliott, Michael R; Raghunathan, Trivellore E

    2014-06-01

    Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Application of these methods to data from complex sample surveys without making allowance for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to developing statistical methods to analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion that inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, making adjustment on the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered unequal-probability of selection sample design, and use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS), and the Medical Expenditure Panel Survey (MEPS), which are stratified, clustered unequal-probability of selection sample designs.

  13. A nonparametric method to generate synthetic populations to adjust for complex sampling design features

    PubMed Central

    Dong, Qi; Elliott, Michael R.; Raghunathan, Trivellore E.

    2017-01-01

    Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Application of these methods to data from complex sample surveys without making allowance for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to developing statistical methods to analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion that inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, making adjustment on the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered unequal-probability of selection sample design, and use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS), and the Medical Expenditure Panel Survey (MEPS), which are stratified, clustered unequal-probability of selection sample designs. PMID:29200608

  14. Bayes factors and multimodel inference

    USGS Publications Warehouse

    Link, W.A.; Barker, R.J.; Thomson, David L.; Cooch, Evan G.; Conroy, Michael J.

    2009-01-01

    Multimodel inference has two main themes: model selection, and model averaging. Model averaging is a means of making inference conditional on a model set, rather than on a selected model, allowing formal recognition of the uncertainty associated with model choice. The Bayesian paradigm provides a natural framework for model averaging, and provides a context for evaluation of the commonly used AIC weights. We review Bayesian multimodel inference, noting the importance of Bayes factors. Noting the sensitivity of Bayes factors to the choice of priors on parameters, we define and propose nonpreferential priors as offering a reasonable standard for objective multimodel inference.
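
    The two quantities contrasted in this abstract, AIC weights and posterior model probabilities obtained from marginal likelihoods (the ingredients of Bayes factors), can be sketched side by side. The function names and inputs are hypothetical; the sketch uses the standard Akaike-weight formula and Bayes' rule over a model set, and does not implement the nonpreferential priors the authors propose.

```python
import math


def aic_weights(aic_values):
    """Akaike weights: exp(-delta_i / 2), normalized over the model set,
    where delta_i is the AIC difference from the best model."""
    best = min(aic_values)
    rel = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]


def posterior_model_probs(log_marginals, prior_probs):
    """Posterior model probabilities from log marginal likelihoods and
    prior model probabilities (Bayes' rule over the model set)."""
    m = max(log_marginals)  # subtract the maximum for numerical stability
    unnorm = [p * math.exp(l - m) for l, p in zip(log_marginals, prior_probs)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def model_average(estimates, weights):
    """Weighted average of per-model estimates of the same quantity."""
    return sum(e * w for e, w in zip(estimates, weights))


if __name__ == "__main__":
    w = aic_weights([102.3, 100.0, 107.9])
    print([round(x, 3) for x in w])
    print(model_average([1.2, 1.5, 0.9], w))
```

    With equal prior model probabilities, the ratio of two posterior model probabilities equals the Bayes factor between those models.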

  15. Sex change and effective population size: implications for population genetic studies in marine fish.

    PubMed

    Coscia, I; Chopelet, J; Waples, R S; Mann, B Q; Mariani, S

    2016-10-01

    Large variance in reproductive success is the primary factor that reduces effective population size (Ne) in natural populations. In sequentially hermaphroditic (sex-changing) fish, the sex ratio is typically skewed and biased towards the 'first' sex, while reproductive success increases considerably after sex change. Therefore, sex-changing fish populations are theoretically expected to have lower Ne than gonochorists (separate sexes), assuming all other parameters are essentially equal. In this study, we estimate Ne from genetic data collected from two ecologically similar species living along the eastern coast of South Africa: one gonochoristic, the 'santer' sea bream Cheimerius nufar, and one protogynous (female-first) sex changer, the 'slinger' sea bream Chrysoblephus puniceus. For both species, no evidence of genetic structuring, nor significant variation in genetic diversity, was found in the study area. Estimates of contemporary Ne were significantly lower in the protogynous species, but the same pattern was not apparent over historical timescales. Overall, our results show that sequential hermaphroditism may affect Ne differently over varying time frames, and that demographic signatures inferred from genetic markers with different inheritance modes also need to be interpreted cautiously, in relation to sex-changing life histories.
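
    The effect of a skewed sex ratio on effective population size can be illustrated with Wright's classical formula, Ne = 4 * Nm * Nf / (Nm + Nf), for a population with Nm breeding males and Nf breeding females. This is a minimal sketch of that textbook formula only; it ignores the variance in reproductive success that the abstract names as the primary factor, and it is not the genetic estimator used in the study. The function name is hypothetical.

```python
def ne_from_sex_ratio(n_males, n_females):
    """Wright's formula for effective size with unequal numbers of
    breeding males and females (assumes otherwise ideal conditions)."""
    return 4.0 * n_males * n_females / (n_males + n_females)


if __name__ == "__main__":
    # Same census size of 100 breeders, increasingly skewed sex ratio.
    for males, females in [(50, 50), (25, 75), (10, 90)]:
        print(males, females, ne_from_sex_ratio(males, females))
```

    For a fixed number of breeders, the effective size is largest at an even sex ratio and falls as the ratio becomes more biased toward one sex.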

  16. Causal language and strength of inference in academic and media articles shared in social media (CLAIMS): A systematic review.

    PubMed

    Haber, Noah; Smith, Emily R; Moscoe, Ellen; Andrews, Kathryn; Audy, Robin; Bell, Winnie; Brennan, Alana T; Breskin, Alexander; Kane, Jeremy C; Karra, Mahesh; McClure, Elizabeth S; Suarez, Elizabeth A

    2018-01-01

    The pathway from evidence generation to consumption contains many steps which can lead to overstatement or misinformation. The proliferation of internet-based health news may encourage selection of media and academic research articles that overstate strength of causal inference. We investigated the state of causal inference in health research as it appears at the end of the pathway, at the point of social media consumption. We screened the NewsWhip Insights database for the most shared media articles on Facebook and Twitter reporting about peer-reviewed academic studies associating an exposure with a health outcome in 2015, extracting the 50 most-shared academic articles and media articles covering them. We designed and utilized a review tool to systematically assess and summarize studies' strength of causal inference, including generalizability, potential confounders, and methods used. These were then compared with the strength of causal language used to describe results in both academic and media articles. Two randomly assigned independent reviewers and one arbitrating reviewer from a pool of 21 reviewers assessed each article. We accepted the most shared 64 media articles pertaining to 50 academic articles for review, representing 68% of Facebook and 45% of Twitter shares in 2015. Thirty-four percent of academic studies and 48% of media articles used language that reviewers considered too strong for their strength of causal inference. Seventy percent of academic studies were considered low or very low strength of inference, with only 6% considered high or very high strength of causal inference. The most severe issues with academic studies' causal inference were reported to be omitted confounding variables and generalizability. Fifty-eight percent of media articles were found to have inaccurately reported the question, results, intervention, or population of the academic study. We find a large disparity between the strength of language as presented to the

  17. Causal language and strength of inference in academic and media articles shared in social media (CLAIMS): A systematic review

    PubMed Central

    Smith, Emily R.; Moscoe, Ellen; Audy, Robin; Bell, Winnie; Brennan, Alana T.; Breskin, Alexander; Kane, Jeremy C.; Suarez, Elizabeth A.

    2018-01-01

    Background The pathway from evidence generation to consumption contains many steps which can lead to overstatement or misinformation. The proliferation of internet-based health news may encourage selection of media and academic research articles that overstate strength of causal inference. We investigated the state of causal inference in health research as it appears at the end of the pathway, at the point of social media consumption. Methods We screened the NewsWhip Insights database for the most shared media articles on Facebook and Twitter reporting about peer-reviewed academic studies associating an exposure with a health outcome in 2015, extracting the 50 most-shared academic articles and media articles covering them. We designed and utilized a review tool to systematically assess and summarize studies’ strength of causal inference, including generalizability, potential confounders, and methods used. These were then compared with the strength of causal language used to describe results in both academic and media articles. Two randomly assigned independent reviewers and one arbitrating reviewer from a pool of 21 reviewers assessed each article. Results We accepted the most shared 64 media articles pertaining to 50 academic articles for review, representing 68% of Facebook and 45% of Twitter shares in 2015. Thirty-four percent of academic studies and 48% of media articles used language that reviewers considered too strong for their strength of causal inference. Seventy percent of academic studies were considered low or very low strength of inference, with only 6% considered high or very high strength of causal inference. The most severe issues with academic studies’ causal inference were reported to be omitted confounding variables and generalizability. Fifty-eight percent of media articles were found to have inaccurately reported the question, results, intervention, or population of the academic study. Conclusions We find a large disparity between the

  18. Asymptotic Distributions of Coalescence Times and Ancestral Lineage Numbers for Populations with Temporally Varying Size

    PubMed Central

    Chen, Hua; Chen, Kun

    2013-01-01

    The distributions of coalescence times and ancestral lineage numbers play an essential role in coalescent modeling and ancestral inference. Both exact distributions of coalescence times and ancestral lineage numbers are expressed as the sum of alternating series, and the terms in the series become numerically intractable for large samples. More computationally attractive are their asymptotic distributions, which were derived in Griffiths (1984) for populations with constant size. In this article, we derive the asymptotic distributions of coalescence times and ancestral lineage numbers for populations with temporally varying size. For a sample of size n, denote by Tm the mth coalescent time, when m + 1 lineages coalesce into m lineages, and An(t) the number of ancestral lineages at time t back from the current generation. Similar to the results in Griffiths (1984), the number of ancestral lineages, An(t), and the coalescence times, Tm, are asymptotically normal, with the mean and variance of these distributions depending on the population size function, N(t). At the very early stage of the coalescent, when t → 0, the number of coalesced lineages n − An(t) follows a Poisson distribution, and as m → n, n(n−1)Tm/2N(0) follows a gamma distribution. We demonstrate the accuracy of the asymptotic approximations by comparing to both exact distributions and coalescent simulations. Several applications of the theoretical results are also shown: deriving statistics related to the properties of gene genealogies, such as the time to the most recent common ancestor (TMRCA) and the total branch length (TBL) of the genealogy, and deriving the allele frequency spectrum for large genealogies. With the advent of genomic-level sequencing data for large samples, the asymptotic distributions are expected to have wide applications in theoretical and methodological development for population genetic inference. PMID:23666939

  19. Asymptotic distributions of coalescence times and ancestral lineage numbers for populations with temporally varying size.

    PubMed

    Chen, Hua; Chen, Kun

    2013-07-01

    The distributions of coalescence times and ancestral lineage numbers play an essential role in coalescent modeling and ancestral inference. Both exact distributions of coalescence times and ancestral lineage numbers are expressed as the sum of alternating series, and the terms in the series become numerically intractable for large samples. More computationally attractive are their asymptotic distributions, which were derived in Griffiths (1984) for populations with constant size. In this article, we derive the asymptotic distributions of coalescence times and ancestral lineage numbers for populations with temporally varying size. For a sample of size n, denote by Tm the mth coalescent time, when m + 1 lineages coalesce into m lineages, and An(t) the number of ancestral lineages at time t back from the current generation. Similar to the results in Griffiths (1984), the number of ancestral lineages, An(t), and the coalescence times, Tm, are asymptotically normal, with the mean and variance of these distributions depending on the population size function, N(t). At the very early stage of the coalescent, when t → 0, the number of coalesced lineages n − An(t) follows a Poisson distribution, and as m → n, n(n−1)Tm/2N(0) follows a gamma distribution. We demonstrate the accuracy of the asymptotic approximations by comparing to both exact distributions and coalescent simulations. Several applications of the theoretical results are also shown: deriving statistics related to the properties of gene genealogies, such as the time to the most recent common ancestor (TMRCA) and the total branch length (TBL) of the genealogy, and deriving the allele frequency spectrum for large genealogies. With the advent of genomic-level sequencing data for large samples, the asymptotic distributions are expected to have wide applications in theoretical and methodological development for population genetic inference.
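
    The quantities named in this abstract can be illustrated with a minimal sketch of the constant-size special case only, not the time-varying result the article derives. It assumes the standard Kingman coalescent for a haploid population of constant size N, in which k lineages wait an exponential time with rate k(k−1)/(2N) generations before the next coalescence. The function names and the parameter `pop_size` are hypothetical and do not come from the article.

```python
import random


def expected_coalescence_time(n, m, pop_size):
    """Expected time T_m (in generations) for n lineages to coalesce down to m.

    Assumes a constant haploid population size: with k lineages the waiting
    time is exponential with mean 2 * pop_size / (k * (k - 1)).
    """
    return sum(2.0 * pop_size / (k * (k - 1)) for k in range(m + 1, n + 1))


def simulate_coalescence_time(n, m, pop_size, rng):
    """Draw one realisation of T_m by summing exponential waiting times."""
    t = 0.0
    for k in range(n, m, -1):  # k = n, n-1, ..., m+1 lineages
        rate = k * (k - 1) / (2.0 * pop_size)
        t += rng.expovariate(rate)
    return t


if __name__ == "__main__":
    rng = random.Random(0)
    n, pop_size = 10, 1000
    draws = [simulate_coalescence_time(n, 1, pop_size, rng) for _ in range(20000)]
    print("simulated mean TMRCA:", sum(draws) / len(draws))
    print("expected TMRCA:      ", expected_coalescence_time(n, 1, pop_size))
```

    Setting m = 1 gives the time to the most recent common ancestor (TMRCA). Under these assumptions the first waiting time, scaled by n(n−1)/(2N), is a unit-rate exponential, which matches the scaling n(n−1)Tm/2N(0) quoted in the abstract.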

  20. An Intuitive Dashboard for Bayesian Network Inference

    NASA Astrophysics Data System (ADS)

    Reddy, Vikas; Charisse Farr, Anna; Wu, Paul; Mengersen, Kerrie; Yarlagadda, Prasad K. D. V.

    2014-03-01

    Current Bayesian network software packages provide good graphical interface for users who design and develop Bayesian networks for various applications. However, the intended end-users of these networks may not necessarily find such an interface appealing and at times it could be overwhelming, particularly when the number of nodes in the network is large. To circumvent this problem, this paper presents an intuitive dashboard, which provides an additional layer of abstraction, enabling the end-users to easily perform inferences over the Bayesian networks. Unlike most software packages, which display the nodes and arcs of the network, the developed tool organises the nodes based on the cause-and-effect relationship, making the user-interaction more intuitive and friendly. In addition to performing various types of inferences, the users can conveniently use the tool to verify the behaviour of the developed Bayesian network. The tool has been developed using QT and SMILE libraries in C++.

  1. Evaluating and improving count-based population inference: A case study from 31 years of monitoring Sandhill Cranes

    USGS Publications Warehouse

    Gerber, Brian D.; Kendall, William L.

    2017-01-01

    Monitoring animal populations can be difficult. Limited resources often force monitoring programs to rely on unadjusted or smoothed counts as an index of abundance. Smoothing counts is commonly done using a moving-average estimator to dampen sampling variation. These indices are commonly used to inform management decisions, although their reliability is often unknown. We outline a process to evaluate the biological plausibility of annual changes in population counts and indices from a typical monitoring scenario and compare results with a hierarchical Bayesian time series (HBTS) model. We evaluated spring and fall counts, fall indices, and model-based predictions for the Rocky Mountain population (RMP) of Sandhill Cranes (Antigone canadensis) by integrating juvenile recruitment, harvest, and survival into a stochastic stage-based population model. We used simulation to evaluate population indices from the HBTS model and the commonly used 3-yr moving average estimator. We found counts of the RMP to exhibit biologically unrealistic annual change, while the fall population index was largely biologically realistic. HBTS model predictions suggested that the RMP changed little over 31 yr of monitoring, but the pattern depended on assumptions about the observational process. The HBTS model fall population predictions were biologically plausible if observed crane harvest mortality was compensatory up to natural mortality, as empirical evidence suggests. Simulations indicated that the predicted mean of the HBTS model was generally a more reliable estimate of the true population than population indices derived using a moving 3-yr average estimator. Practitioners could gain considerable advantages from modeling population counts using a hierarchical Bayesian autoregressive approach. 
    Advantages would include: (1) obtaining measures of uncertainty; (2) incorporating direct knowledge of the observational and population processes; (3) accommodating missing years of data; and (4
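
    The moving-average index mentioned in this abstract can be sketched as follows. This is a minimal illustration, assuming a trailing (not centred) window over consecutive annual counts with no missing years; the abstract does not say which variant the monitoring program uses, and the function name `moving_average` is hypothetical.

```python
def moving_average(counts, window=3):
    """Trailing moving average of annual counts.

    Returns one value per year from the `window`-th year onward; each value
    is the mean of that year's count and the preceding `window - 1` counts.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for i in range(window - 1, len(counts)):
        chunk = counts[i - window + 1 : i + 1]
        smoothed.append(sum(chunk) / window)
    return smoothed


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(moving_average([10, 20, 30, 40]))
```

    A simple average like this dampens year-to-year sampling variation but gives no measure of uncertainty and cannot skip a missing year, two of the gaps the abstract says a hierarchical Bayesian model would address.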

  2. Inference by Exclusion in Goffin Cockatoos (Cacatua goffini)

    PubMed Central

    O’Hara, Mark; Auersperg, Alice M. I.; Bugnyar, Thomas; Huber, Ludwig

    2015-01-01

    Inference by exclusion, the ability to base choices on the systematic exclusion of alternatives, has been studied in many nonhuman species over the past decade. However, the majority of methodologies employed so far are hard to integrate into a comparative framework as they rarely use controls for the effect of neophilia. Here, we present an improved approach that takes neophilia into account, using an abstract two-choice task on a touch screen, which is equally feasible for a large variety of species. To test this approach we chose Goffin cockatoos (Cacatua goffini), a highly explorative Indonesian parrot species, which have recently been reported to have sophisticated cognitive skills in the technical domain. Our results indicate that Goffin cockatoos are able to solve such abstract two-choice tasks employing inference by exclusion but also highlight the importance of other response strategies. PMID:26244692

  3. Fractional Yields Inferred from Halo and Thick Disk Stars

    NASA Astrophysics Data System (ADS)

    Caimmi, R.

    2013-12-01

    Linear [Q/H]-[O/H] relations, Q = Na, Mg, Si, Ca, Ti, Cr, Fe, Ni, are inferred from a sample (N=67) of recently studied FGK-type dwarf stars in the solar neighbourhood including different populations (Nissen and Schuster 2010, Ramirez et al. 2012), namely LH (N=24, low-α halo), HH (N=25, high-α halo), KD (N=16, thick disk), and OL (N=2, globular cluster outliers). Regression line slope and intercept estimators and related variance estimators are determined. With regard to the straight line, $[Q/H]=a_{Q}[O/H]+b_{Q}$, sample stars are displayed along a "main sequence", $[Q,O] = [a_{Q},b_{Q},\Delta b_{Q}]$, leaving aside the two OL stars, which, in most cases (e.g. Na), lie outside. The unit slope, $a_{Q}=1$, implies Q is a primary element synthesised via SNII progenitors in the presence of a universal stellar initial mass function (defined as simple primary element). In this respect, Mg, Si, Ti show $\hat{a}_{Q}=1$ within $\mp 2\hat{\sigma}_{\hat{a}_{Q}}$; Cr, Fe, Ni within $\mp 3\hat{\sigma}_{\hat{a}_{Q}}$; Na, Ca within $\mp r\hat{\sigma}_{\hat{a}_{Q}}$, $r>3$. The empirical, differential element abundance distributions are inferred from LH, HH, KD, HA = HH + KD subsamples, where related regression lines represent their theoretical counterparts within the framework of simple MCBR (multistage closed box + reservoir) chemical evolution models. Hence, the fractional yields, $\hat{p}_{Q}/\hat{p}_{O}$, are determined and (as an example) a comparison is shown with their theoretical counterparts inferred from SNII progenitor nucleosynthesis under the assumption of a power-law stellar initial mass function. The generalized fractional yields, $C_{Q}=Z_{Q}/Z_{O}^{a_{Q}}$, are determined regardless of the chemical evolution model. The ratio of outflow to star formation rate is compared for different populations in the framework of simple MCBR models. The opposite situation of element abundance variation entirely due to cosmic scatter is also considered under reasonable assumptions. The related differential element abundance

  4. Identifying currents in the gene pool for bacterial populations using an integrative approach.

    PubMed

    Tang, Jing; Hanage, William P; Fraser, Christophe; Corander, Jukka

    2009-08-01

    The evolution of bacterial populations has recently become considerably better understood due to large-scale sequencing of population samples. It has become clear that DNA sequences from a multitude of genes, as well as a broad sample coverage of a target population, are needed to obtain a relatively unbiased view of its genetic structure and the patterns of ancestry connected to the strains. However, the traditional statistical methods for evolutionary inference, such as phylogenetic analysis, are associated with several difficulties under such an extensive sampling scenario, in particular when a considerable amount of recombination is anticipated to have taken place. To meet the needs of large-scale analyses of population structure for bacteria, we introduce here several statistical tools for the detection and representation of recombination between populations. Also, we introduce a model-based description of the shape of a population in sequence space, in terms of its molecular variability and affinity towards other populations. Extensive real data from the genus Neisseria are utilized to demonstrate the potential of an approach where these population genetic tools are combined with a phylogenetic analysis. The statistical tools introduced here are freely available in BAPS 5.2 software, which can be downloaded from http://web.abo.fi/fak/mnf/mate/jc/software/baps.html.

  5. Large-Scale Optimization for Bayesian Inference in Complex Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Willcox, Karen; Marzouk, Youssef

    2013-11-12

    The SAGUARO (Scalable Algorithms for Groundwater Uncertainty Analysis and Robust Optimization) Project focused on the development of scalable numerical algorithms for large-scale Bayesian inversion in complex systems that capitalize on advances in large-scale simulation-based optimization and inversion methods. The project was a collaborative effort among MIT, the University of Texas at Austin, Georgia Institute of Technology, and Sandia National Laboratories. The research was directed in three complementary areas: efficient approximations of the Hessian operator, reductions in complexity of forward simulations via stochastic spectral approximations and model reduction, and employing large-scale optimization concepts to accelerate sampling. The MIT-Sandia component of the SAGUARO Project addressed the intractability of conventional sampling methods for large-scale statistical inverse problems by devising reduced-order models that are faithful to the full-order model over a wide range of parameter values; sampling then employs the reduced model rather than the full model, resulting in very large computational savings. Results indicate little effect on the computed posterior distribution. On the other hand, in the Texas-Georgia Tech component of the project, we retain the full-order model, but exploit inverse problem structure (adjoint-based gradients and partial Hessian information of the parameter-to-observation map) to implicitly extract lower dimensional information on the posterior distribution; this greatly speeds up sampling methods, so that fewer sampling points are needed. We can think of these two approaches as "reduce then sample" and "sample then reduce." In fact, these two approaches are complementary, and can be used in conjunction with each other. Moreover, they both exploit deterministic inverse problem structure, in the form of adjoint-based gradient and Hessian information of the underlying parameter-to-observation map, to achieve

  6. Molecular computational elements encode large populations of small objects

    NASA Astrophysics Data System (ADS)

    Prasanna de Silva, A.; James, Mark R.; McKinney, Bernadine O. F.; Pears, David A.; Weir, Sheenagh M.

    2006-10-01

    Since the introduction of molecular computation, experimental molecular computational elements have grown to encompass small-scale integration, arithmetic and games, among others. However, the need for a practical application has been pressing. Here we present molecular computational identification (MCID), a demonstration that molecular logic and computation can be applied to a widely relevant issue. Examples of populations that need encoding in the microscopic world are cells in diagnostics or beads in combinatorial chemistry (tags). Taking advantage of the small size (about 1 nm) and large 'on/off' output ratios of molecular logic gates and using the great variety of logic types, input chemical combinations, switching thresholds and even gate arrays in addition to colours, we produce unique identifiers for members of populations of small polymer beads (about 100 μm) used for synthesis of combinatorial libraries. Many millions of distinguishable tags become available. This method should be extensible to far smaller objects, with the only requirement being a 'wash and watch' protocol. Our focus on converting molecular science into technology concerning analog sensors, turns to digital logic devices in the present work.

  7. Molecular computational elements encode large populations of small objects.

    PubMed

    de Silva, A Prasanna; James, Mark R; McKinney, Bernadine O F; Pears, David A; Weir, Sheenagh M

    2006-10-01

    Since the introduction of molecular computation, experimental molecular computational elements have grown to encompass small-scale integration, arithmetic and games, among others. However, the need for a practical application has been pressing. Here we present molecular computational identification (MCID), a demonstration that molecular logic and computation can be applied to a widely relevant issue. Examples of populations that need encoding in the microscopic world are cells in diagnostics or beads in combinatorial chemistry (tags). Taking advantage of the small size (about 1 nm) and large 'on/off' output ratios of molecular logic gates and using the great variety of logic types, input chemical combinations, switching thresholds and even gate arrays in addition to colours, we produce unique identifiers for members of populations of small polymer beads (about 100 microm) used for synthesis of combinatorial libraries. Many millions of distinguishable tags become available. This method should be extensible to far smaller objects, with the only requirement being a 'wash and watch' protocol. Our focus on converting molecular science into technology concerning analog sensors, turns to digital logic devices in the present work.

  8. Cancer evolution: mathematical models and computational inference.

    PubMed

    Beerenwinkel, Niko; Schwarz, Roland F; Gerstung, Moritz; Markowetz, Florian

    2015-01-01

    Cancer is a somatic evolutionary process characterized by the accumulation of mutations, which contribute to tumor growth, clinical progression, immune escape, and drug resistance development. Evolutionary theory can be used to analyze the dynamics of tumor cell populations and to make inference about the evolutionary history of a tumor from molecular data. We review recent approaches to modeling the evolution of cancer, including population dynamics models of tumor initiation and progression, phylogenetic methods to model the evolutionary relationship between tumor subclones, and probabilistic graphical models to describe dependencies among mutations. Evolutionary modeling helps to understand how tumors arise and will also play an increasingly important prognostic role in predicting disease progression and the outcome of medical interventions, such as targeted therapy. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society of Systematic Biologists.

  9. Inferring the photometric and size evolution of galaxies from image simulations. I. Method

    NASA Astrophysics Data System (ADS)

    Carassou, Sébastien; de Lapparent, Valérie; Bertin, Emmanuel; Le Borgne, Damien

    2017-09-01

    Context. Current constraints on models of galaxy evolution rely on morphometric catalogs extracted from multi-band photometric surveys. However, these catalogs are altered by selection effects that are difficult to model, that correlate in non trivial ways, and that can lead to contradictory predictions if not taken into account carefully. Aims: To address this issue, we have developed a new approach combining parametric Bayesian indirect likelihood (pBIL) techniques and empirical modeling with realistic image simulations that reproduce a large fraction of these selection effects. This allows us to perform a direct comparison between observed and simulated images and to infer robust constraints on model parameters. Methods: We use a semi-empirical forward model to generate a distribution of mock galaxies from a set of physical parameters. These galaxies are passed through an image simulator reproducing the instrumental characteristics of any survey and are then extracted in the same way as the observed data. The discrepancy between the simulated and observed data is quantified, and minimized with a custom sampling process based on adaptive Markov chain Monte Carlo methods. Results: Using synthetic data matching most of the properties of a Canada-France-Hawaii Telescope Legacy Survey Deep field, we demonstrate the robustness and internal consistency of our approach by inferring the parameters governing the size and luminosity functions and their evolutions for different realistic populations of galaxies. We also compare the results of our approach with those obtained from the classical spectral energy distribution fitting and photometric redshift approach. Conclusions: Our pipeline infers efficiently the luminosity and size distribution and evolution parameters with a very limited number of observables (three photometric bands). When compared to SED fitting based on the same set of observables, our method yields results that are more accurate and free from

  10. Fundamental limits on dynamic inference from single-cell snapshots

    PubMed Central

    Weinreb, Caleb; Tusi, Betsabeh K.; Socolovsky, Merav

    2018-01-01

    Single-cell expression profiling reveals the molecular states of individual cells with unprecedented detail. Because these methods destroy cells in the process of analysis, they cannot measure how gene expression changes over time. However, some information on dynamics is present in the data: the continuum of molecular states in the population can reflect the trajectory of a typical cell. Many methods for extracting single-cell dynamics from population data have been proposed. However, all such attempts face a common limitation: for any measured distribution of cell states, there are multiple dynamics that could give rise to it, and by extension, multiple possibilities for underlying mechanisms of gene regulation. Here, we describe the aspects of gene expression dynamics that cannot be inferred from a static snapshot alone and identify assumptions necessary to constrain a unique solution for cell dynamics from static snapshots. We translate these constraints into a practical algorithmic approach, population balance analysis (PBA), which makes use of a method from spectral graph theory to solve a class of high-dimensional differential equations. We use simulations to show the strengths and limitations of PBA, and then apply it to single-cell profiles of hematopoietic progenitor cells (HPCs). Cell state predictions from this analysis agree with HPC fate assays reported in several papers over the past two decades. By highlighting the fundamental limits on dynamic inference faced by any method, our framework provides a rigorous basis for dynamic interpretation of a gene expression continuum and clarifies best experimental designs for trajectory reconstruction from static snapshot measurements. PMID:29463712

  11. Variations on Bayesian Prediction and Inference

    DTIC Science & Technology

    2016-05-09

    inference 2.2.1 Background There are a number of statistical inference problems that are not generally formulated via a full probability model...the problem of inference about an unknown parameter, the Bayesian approach requires a full probability model/likelihood which can be an obstacle

  12. Algorithmic methods to infer the evolutionary trajectories in cancer progression

    PubMed Central

    Graudenzi, Alex; Ramazzotti, Daniele; Sanz-Pamplona, Rebeca; De Sano, Luca; Mauri, Giancarlo; Moreno, Victor; Antoniotti, Marco; Mishra, Bud

    2016-01-01

    The genomic evolution inherent to cancer relates directly to a renewed focus on the voluminous next-generation sequencing data and machine learning for the inference of explanatory models of how the (epi)genomic events are choreographed in cancer initiation and development. However, despite the increasing availability of multiple additional -omics data, this quest has been frustrated by various theoretical and technical hurdles, mostly stemming from the dramatic heterogeneity of the disease. In this paper, we build on our recent work on the “selective advantage” relation among driver mutations in cancer progression and investigate its applicability to the modeling problem at the population level. Here, we introduce PiCnIc (Pipeline for Cancer Inference), a versatile, modular, and customizable pipeline to extract ensemble-level progression models from cross-sectional sequenced cancer genomes. The pipeline has many translational implications because it combines state-of-the-art techniques for sample stratification, driver selection, identification of fitness-equivalent exclusive alterations, and progression model inference. We demonstrate PiCnIc’s ability to reproduce much of the current knowledge on colorectal cancer progression as well as to suggest novel experimentally verifiable hypotheses. PMID:27357673

  13. Population histories of right whales (Cetacea: Eubalaena) inferred from mitochondrial sequence diversities and divergences of their whale lice (Amphipoda: Cyamus).

    PubMed

    Kaliszewska, Zofia A; Seger, Jon; Rowntree, Victoria J; Barco, Susan G; Benegas, Rafael; Best, Peter B; Brown, Moira W; Brownell, Robert L; Carribero, Alejandro; Harcourt, Robert; Knowlton, Amy R; Marshall-Tilas, Kim; Patenaude, Nathalie J; Rivarola, Mariana; Schaeff, Catherine M; Sironi, Mariano; Smith, Wendy A; Yamada, Tadasu K

    2005-10-01

    Right whales carry large populations of three 'whale lice' (Cyamus ovalis, Cyamus gracilis, Cyamus erraticus) that have no other hosts. We used sequence variation in the mitochondrial COI gene to ask (i) whether cyamid population structures might reveal associations among right whale individuals and subpopulations, (ii) whether the divergences of the three nominally conspecific cyamid species on North Atlantic, North Pacific, and southern right whales (Eubalaena glacialis, Eubalaena japonica, Eubalaena australis) might indicate their times of separation, and (iii) whether the shapes of cyamid gene trees might contain information about changes in the population sizes of right whales. We found high levels of nucleotide diversity but almost no population structure within oceans, indicating large effective population sizes and high rates of transfer between whales and subpopulations. North Atlantic and Southern Ocean populations of all three species are reciprocally monophyletic, and North Pacific C. erraticus is well separated from North Atlantic and southern C. erraticus. Mitochondrial clock calibrations suggest that these divergences occurred around 6 million years ago (Ma), and that the Eubalaena mitochondrial clock is very slow. North Pacific C. ovalis forms a clade inside the southern C. ovalis gene tree, implying that at least one right whale has crossed the equator in the Pacific Ocean within the last 1-2 million years (Myr). Low-frequency polymorphisms are more common than expected under neutrality for populations of constant size, but there is no obvious signal of rapid, interspecifically congruent expansion of the kind that would be expected if North Atlantic or southern right whales had experienced a prolonged population bottleneck within the last 0.5 Myr.

  14. Robust Inference of Risks of Large Portfolios

    PubMed Central

    Fan, Jianqing; Han, Fang; Liu, Han; Vickers, Byron

    2016-01-01

    We propose a bootstrap-based robust high-confidence level upper bound (Robust H-CLUB) for assessing the risks of large portfolios. The proposed approach exploits rank-based and quantile-based estimators, and can be viewed as a robust extension of the H-CLUB procedure (Fan et al., 2015). Such an extension allows us to handle possibly misspecified models and heavy-tailed data, which are stylized features in financial returns. Under mixing conditions, we analyze the proposed approach and demonstrate its advantage over H-CLUB. We further provide thorough numerical results to back up the developed theory, and also apply the proposed method to analyze a stock market dataset. PMID:27818569

  15. Generic comparison of protein inference engines.

    PubMed

    Claassen, Manfred; Reiter, Lukas; Hengartner, Michael O; Buhmann, Joachim M; Aebersold, Ruedi

    2012-04-01

    Protein identifications, instead of peptide-spectrum matches, constitute the biologically relevant result of shotgun proteomics studies. How to appropriately infer and report protein identifications has triggered a still ongoing debate. This debate has so far suffered from the lack of appropriate performance measures that allow us to objectively assess protein inference approaches. This study describes an intuitive, generic and yet formal performance measure and demonstrates how it enables experimentalists to select an optimal protein inference strategy for a given collection of fragment ion spectra. We applied the performance measure to systematically explore the benefit of excluding possibly unreliable protein identifications, such as single-hit wonders. Therefore, we defined a family of protein inference engines by extending a simple inference engine by thousands of pruning variants, each excluding a different specified set of possibly unreliable identifications. We benchmarked these protein inference engines on several data sets representing different proteomes and mass spectrometry platforms. Optimally performing inference engines retained all high confidence spectral evidence, without posterior exclusion of any type of protein identifications. Despite the diversity of studied data sets consistently supporting this rule, other data sets might behave differently. In order to ensure maximal reliable proteome coverage for data sets arising in other studies we advocate abstaining from rigid protein inference rules, such as exclusion of single-hit wonders, and instead consider several protein inference approaches and assess these with respect to the presented performance measure in the specific application context.

  16. CellTree: an R/bioconductor package to infer the hierarchical structure of cell populations from single-cell RNA-seq data.

    PubMed

    duVerle, David A; Yotsukura, Sohiya; Nomura, Seitaro; Aburatani, Hiroyuki; Tsuda, Koji

    2016-09-13

    Single-cell RNA sequencing is fast becoming one of the standard methods for gene expression measurement, providing unique insights into cellular processes. A number of methods, based on general dimensionality reduction techniques, have been suggested to help infer and visualise the underlying structure of cell populations from single-cell expression levels, yet their models generally lack proper biological grounding and struggle to identify complex differentiation paths. Here we introduce cellTree: an R/Bioconductor package that uses a novel statistical approach, based on document analysis techniques, to produce tree structures outlining the hierarchical relationship between single-cell samples, while identifying latent groups of genes that can provide biological insights. With cellTree, we provide experimentalists with an easy-to-use tool, based on statistically and biologically-sound algorithms, to efficiently explore and visualise single-cell RNA data. The cellTree package is publicly available in the online Bioconductor repository at: http://bioconductor.org/packages/cellTree/ .

  17. The Effects of Population Size Histories on Estimates of Selection Coefficients from Time-Series Genetic Data.

    PubMed

    Jewett, Ethan M; Steinrücken, Matthias; Song, Yun S

    2016-11-01

    Many approaches have been developed for inferring selection coefficients from time series data while accounting for genetic drift. These approaches have been motivated by the intuition that properly accounting for the population size history can significantly improve estimates of selective strengths. However, the improvement in inference accuracy that can be attained by modeling drift has not been characterized. Here, by comparing maximum likelihood estimates of selection coefficients that account for the true population size history with estimates that ignore drift by assuming allele frequencies evolve deterministically in a population of infinite size, we address the following questions: how much can modeling the population size history improve estimates of selection coefficients? How much can mis-inferred population sizes hurt inferences of selection coefficients? We conduct our analysis under the discrete Wright-Fisher model by deriving the exact probability of an allele frequency trajectory in a population of time-varying size and we replicate our results under the diffusion model. For both models, we find that ignoring drift leads to estimates of selection coefficients that are nearly as accurate as estimates that account for the true population history, even when population sizes are small and drift is high. This result is of interest because inference methods that ignore drift are widely used in evolutionary studies and can be many orders of magnitude faster than methods that account for population sizes. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
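
    The comparison described above, between allele frequencies that evolve deterministically in an infinite population and frequencies that also experience drift in a population of time-varying size, can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: it assumes a haploid Wright-Fisher model in which the focal allele has relative fitness 1 + s, and the function names are hypothetical.

```python
import numpy as np


def deterministic_trajectory(x0, s, generations):
    """Allele frequency under selection alone (infinite population, no drift)."""
    freqs = [x0]
    for _ in range(generations):
        p = freqs[-1]
        # Selection step: focal allele has relative fitness 1 + s (assumed haploid model).
        freqs.append(p * (1 + s) / (1 + s * p))
    return np.array(freqs)


def wright_fisher_trajectory(x0, s, sizes, rng):
    """Same selection step, then binomial sampling of N_t individuals (drift)."""
    freqs = [x0]
    for n in sizes:  # sizes[t] is the population size in generation t + 1
        p = freqs[-1]
        p_sel = p * (1 + s) / (1 + s * p)
        freqs.append(rng.binomial(n, p_sel) / n)
    return np.array(freqs)
```

    Passing a list of varying sizes, for example wright_fisher_trajectory(0.1, 0.05, [200] * 20 + [50] * 20, np.random.default_rng(0)), gives one stochastic trajectory through a bottleneck; as the sizes grow, the trajectory approaches the deterministic one.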

  18. Phylodynamic Inference with Kernel ABC and Its Application to HIV Epidemiology.

    PubMed

    Poon, Art F Y

    2015-09-01

    The shapes of phylogenetic trees relating virus populations are determined by the adaptation of viruses within each host, and by the transmission of viruses among hosts. Phylodynamic inference attempts to reverse this flow of information, estimating parameters of these processes from the shape of a virus phylogeny reconstructed from a sample of genetic sequences from the epidemic. A key challenge to phylodynamic inference is quantifying the similarity between two trees in an efficient and comprehensive way. In this study, I demonstrate that a new distance measure, based on a subset tree kernel function from computational linguistics, confers a significant improvement over previous measures of tree shape for classifying trees generated under different epidemiological scenarios. Next, I incorporate this kernel-based distance measure into an approximate Bayesian computation (ABC) framework for phylodynamic inference. ABC bypasses the need for an analytical solution of model likelihood, as it only requires the ability to simulate data from the model. I validate this "kernel-ABC" method for phylodynamic inference by estimating parameters from data simulated under a simple epidemiological model. Results indicate that kernel-ABC attained greater accuracy for parameters associated with virus transmission than leading software on the same data sets. Finally, I apply the kernel-ABC framework to study a recent outbreak of a recombinant HIV subtype in China. Kernel-ABC provides a versatile framework for phylodynamic inference because it can fit a broader range of models than methods that rely on the computation of exact likelihoods. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
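
    The statement that ABC only requires the ability to simulate data from the model can be made concrete with the simplest variant, rejection ABC. The sketch below is not the kernel-ABC method of this record: it replaces the tree-kernel distance with an absolute difference between scalar summaries, and the toy model (50 draws from a normal distribution with unknown mean) and all function names are assumptions chosen for illustration.

```python
import numpy as np


def abc_rejection(observed_stat, prior_sample, simulate_stat, tolerance, n_draws, rng):
    """Keep prior draws whose simulated summary lies within `tolerance` of the observed one."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        # No likelihood is evaluated: we only simulate and compare summaries.
        if abs(simulate_stat(theta, rng) - observed_stat) <= tolerance:
            accepted.append(theta)
    return np.array(accepted)


# Toy model (hypothetical): 50 observations from Normal(theta, 1), summarised by their mean.
def prior_sample(rng):
    return rng.uniform(-5.0, 5.0)


def simulate_stat(theta, rng):
    return rng.normal(theta, 1.0, size=50).mean()
```

    Calling abc_rejection(2.0, prior_sample, simulate_stat, 0.1, 20000, np.random.default_rng(1)) returns the accepted parameter values, which approximate the posterior for an observed mean of 2.0. A smaller tolerance gives a closer approximation at the cost of fewer accepted draws.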

  19. Massive optimal data compression and density estimation for scalable, likelihood-free inference in cosmology

    NASA Astrophysics Data System (ADS)

    Alsing, Justin; Wandelt, Benjamin; Feeney, Stephen

    2018-07-01

    Many statistical models in cosmology can be simulated forwards but have intractable likelihood functions. Likelihood-free inference methods allow us to perform Bayesian inference from these models using only forward simulations, free from any likelihood assumptions or approximations. Likelihood-free inference generically involves simulating mock data and comparing to the observed data; this comparison in data space suffers from the curse of dimensionality and requires compression of the data to a small number of summary statistics to be tractable. In this paper, we use massive asymptotically optimal data compression to reduce the dimensionality of the data space to just one number per parameter, providing a natural and optimal framework for summary statistic choice for likelihood-free inference. Secondly, we present the first cosmological application of Density Estimation Likelihood-Free Inference (DELFI), which learns a parametrized model for the joint distribution of data and parameters, yielding both the parameter posterior and the model evidence. This approach is conceptually simple, requires less tuning than traditional Approximate Bayesian Computation approaches to likelihood-free inference and can give high-fidelity posteriors from orders of magnitude fewer forward simulations. As an additional bonus, it enables parameter inference and Bayesian model comparison simultaneously. We demonstrate DELFI with massive data compression on an analysis of the joint light-curve analysis supernova data, as a simple validation case study. We show that high-fidelity posterior inference is possible for full-scale cosmological data analyses with as few as ~10^4 simulations, with substantial scope for further improvement, demonstrating the scalability of likelihood-free inference to large and complex cosmological data sets.

  20. Back to BaySICS: a user-friendly program for Bayesian Statistical Inference from Coalescent Simulations.

    PubMed

    Sandoval-Castellanos, Edson; Palkopoulou, Eleftheria; Dalén, Love

    2014-01-01

    Inference of population demographic history has vastly improved in recent years due to a number of technological and theoretical advances including the use of ancient DNA. Approximate Bayesian computation (ABC) stands among the most promising methods due to its simple theoretical fundament and exceptional flexibility. However, limited availability of user-friendly programs that perform ABC analysis renders it difficult to implement, and hence programming skills are frequently required. In addition, there is limited availability of programs able to deal with heterochronous data. Here we present the software BaySICS: Bayesian Statistical Inference of Coalescent Simulations. BaySICS provides an integrated and user-friendly platform that performs ABC analyses by means of coalescent simulations from DNA sequence data. It estimates historical demographic population parameters and performs hypothesis testing by means of Bayes factors obtained from model comparisons. Although providing specific features that improve inference from datasets with heterochronous data, BaySICS also has several capabilities making it a suitable tool for analysing contemporary genetic datasets. Those capabilities include joint analysis of independent tables, a graphical interface and the implementation of Markov-chain Monte Carlo without likelihoods.

  1. Children's and Adults' Evaluation of Their Own Inductive Inferences, Deductive Inferences, and Guesses

    ERIC Educational Resources Information Center

    Pillow, Bradford H.; Pearson, RaeAnne M.

    2009-01-01

    Adults' and kindergarten through fourth-grade children's evaluations and explanations of inductive inferences, deductive inferences, and guesses were assessed. Beginning in kindergarten, participants rated deductions as more certain than weak inductions or guesses. Beginning in third grade, deductions were rated as more certain than strong…

  2. Incorporating non-equilibrium dynamics into demographic history inferences of a migratory marine species.

    PubMed

    Carroll, E L; Alderman, R; Bannister, J L; Bérubé, M; Best, P B; Boren, L; Baker, C S; Constantine, R; Findlay, K; Harcourt, R; Lemaire, L; Palsbøll, P J; Patenaude, N J; Rowntree, V J; Seger, J; Steel, D; Valenzuela, L O; Watson, M; Gaggiotti, O E

    2018-05-03

    Understanding how dispersal and gene flow link geographically separated populations over evolutionary history is challenging, particularly in migratory marine species. In southern right whales (SRWs, Eubalaena australis), patterns of genetic diversity are likely influenced by the glacial climate cycle and recent history of whaling. Here we use a dataset of mitochondrial DNA (mtDNA) sequences (n = 1327) and nuclear markers (17 microsatellite loci, n = 222) from major wintering grounds to investigate circumpolar population structure, historical demography and effective population size. Analyses of nuclear genetic variation identify two population clusters that correspond to the South Atlantic and Indo-Pacific ocean basins that have similar effective breeder estimates. In contrast, all wintering grounds show significant differentiation for mtDNA, but no sex-biased dispersal was detected using the microsatellite genotypes. An approximate Bayesian computation (ABC) approach with microsatellite markers compared the scenarios with gene flow through time, or isolation and secondary contact between ocean basins, while modelling declines in abundance linked to whaling. Secondary-contact scenarios yield the highest posterior probabilities, implying that populations in different ocean basins were largely isolated and came into secondary contact within the last 25,000 years, but the role of whaling in changes in genetic diversity and gene flow over recent generations could not be resolved. We hypothesise that these findings are driven by factors that promote isolation, such as female philopatry, and factors that could promote dispersal, such as oceanographic changes. These findings highlight the application of ABC approaches to infer the connectivity in mobile species with complex population histories and, currently, low levels of differentiation.

  3. LASSIM-A network inference toolbox for genome-wide mechanistic modeling.

    PubMed

    Magnusson, Rasmus; Mariotti, Guido Pio; Köpsén, Mattias; Lövfors, William; Gawel, Danuta R; Jörnsten, Rebecka; Linde, Jörg; Nordling, Torbjörn E M; Nyman, Elin; Schulze, Sylvie; Nestor, Colm E; Zhang, Huan; Cedersund, Gunnar; Benson, Mikael; Tjärnberg, Andreas; Gustafsson, Mika

    2017-06-01

    Recent technological advancements have made time-resolved, quantitative, multi-omics data available for many model systems, which could be integrated for systems pharmacokinetic use. Here, we present large-scale simulation modeling (LASSIM), which is a novel mathematical tool for performing large-scale inference using mechanistically defined ordinary differential equations (ODE) for gene regulatory networks (GRNs). LASSIM integrates structural knowledge about regulatory interactions and non-linear equations with multiple steady state and dynamic response expression datasets. The rationale behind LASSIM is that biological GRNs can be simplified using a limited subset of core genes that are assumed to regulate all other gene transcription events in the network. The LASSIM method is implemented as a general-purpose toolbox using the PyGMO Python package to make the most of multicore computers and high performance clusters, and is available at https://gitlab.com/Gustafsson-lab/lassim. As a method, LASSIM works in two steps, where it first infers a non-linear ODE system of the pre-specified core gene expression. Second, LASSIM in parallel optimizes the parameters that model the regulation of peripheral genes by core system genes. We showed the usefulness of this method by applying LASSIM to infer a large-scale non-linear model of naïve Th2 cell differentiation, made possible by integrating Th2 specific bindings, time-series together with six public and six novel siRNA-mediated knock-down experiments. ChIP-seq showed significant overlap for all tested transcription factors. Next, we performed novel time-series measurements of total T-cells during differentiation towards Th2 and verified that our LASSIM model could monitor those data significantly better than comparable models that used the same Th2 bindings. In summary, the LASSIM toolbox opens the door to a new type of model-based data analysis that combines the strengths of reliable mechanistic models with truly

  4. As the raven flies: using genetic data to infer the history of invasive common raven (Corvus corax) populations in the Mojave Desert.

    PubMed

    Fleischer, Robert C; Boarman, William I; Gonzalez, Elena G; Godinez, Alvaro; Omland, Kevin E; Young, Sarah; Helgen, Lauren; Syed, Gracia; McIntosh, Carl E

    2008-01-01

    Common raven (Corvus corax) populations in Mojave Desert regions of southern California and Nevada have increased dramatically over the past five decades. This growth has been attributed to increased human development in the region, as ravens have a commensal relationship with humans and feed extensively at landfills and on road-killed wildlife. Ravens, as a partially subsidized predator, also represent a problem for native desert wildlife, in particular threatened desert tortoises (Gopherus agassizii). However, it is unclear whether the more than 15-fold population increase is due to in situ population growth or to immigration from adjacent regions where ravens have been historically common. Ravens were sampled for genetic analysis at several local sites within five major areas: the West Mojave Desert (California), East Mojave Desert (southern Nevada), southern coastal California, northern coastal California (Bay Area), and northern Nevada (Great Basin). Analyses of mtDNA control region sequences reveal an increased frequency of raven 'Holarctic clade' haplotypes from south to north inland, with 'California clade' haplotypes nearly fixed in the California populations. There was significant structuring among regions for mtDNA, with high F(ST) values among sampling regions, especially between the Nevada and California samples. Analyses of eight microsatellite loci reveal a mostly similar pattern of regional population structure, with considerably smaller, but mostly significant, values. The greater mtDNA divergences may be due to lower female dispersal relative to males, lower N(e), or effects of high mutation rates on maximal values of F(ST). Analyses indicate recent population growth in the West Mojave Desert and a bottleneck in the northern California populations. While we cannot rule out in situ population growth as a factor, patterns of movement inferred from our data suggest that the increase in raven populations in the West Mojave Desert resulted from

  5. The limits of weak selection and large population size in evolutionary game theory.

    PubMed

    Sample, Christine; Allen, Benjamin

    2017-11-01

    Evolutionary game theory is a mathematical approach to studying how social behaviors evolve. In many recent works, evolutionary competition between strategies is modeled as a stochastic process in a finite population. In this context, two limits are both mathematically convenient and biologically relevant: weak selection and large population size. These limits can be combined in different ways, leading to potentially different results. We consider two orderings: the [Formula: see text] limit, in which weak selection is applied before the large population limit, and the [Formula: see text] limit, in which the order is reversed. Formal mathematical definitions of the [Formula: see text] and [Formula: see text] limits are provided. Applying these definitions to the Moran process of evolutionary game theory, we obtain asymptotic expressions for fixation probability and conditions for success in these limits. We find that the asymptotic expressions for fixation probability, and the conditions for a strategy to be favored over a neutral mutation, are different in the [Formula: see text] and [Formula: see text] limits. However, the ordering of limits does not affect the conditions for one strategy to be favored over another.
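
    For readers who want to experiment with the two limits numerically, the fixation probability of a single mutant in a Moran process has a standard closed-form sum for two-strategy games. The sketch below uses a commonly used formulation in which fitness is 1 - w + w * (average payoff) and self-interaction is excluded; this fitness mapping, the payoff labels a, b, c, d and the function name are assumptions for illustration, and the record's own definitions may differ.

```python
def fixation_probability(a, b, c, d, N, w):
    """Fixation probability of one A mutant among N - 1 B players in a Moran process.

    Payoffs: A vs A = a, A vs B = b, B vs A = c, B vs B = d.
    w is the selection intensity (w = 0 is neutral drift).
    """
    total = 1.0
    prod = 1.0
    for i in range(1, N):  # i = current number of A players
        payoff_a = (a * (i - 1) + b * (N - i)) / (N - 1)
        payoff_b = (c * i + d * (N - i - 1)) / (N - 1)
        f = 1 - w + w * payoff_a  # fitness of A (assumed linear mapping)
        g = 1 - w + w * payoff_b  # fitness of B
        prod *= g / f
        total += prod
    return 1.0 / total
```

    Evaluating this function for small w at fixed N, and then for large N at fixed w, gives a rough numerical feel for why the order in which the two limits are taken can matter; at w = 0 it returns the neutral value 1/N.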

  6. Inference of chromosomal inversion dynamics from Pool-Seq data in natural and laboratory populations of Drosophila melanogaster.

    PubMed

    Kapun, Martin; van Schalkwyk, Hester; McAllister, Bryant; Flatt, Thomas; Schlötterer, Christian

    2014-04-01

    Sequencing of pools of individuals (Pool-Seq) represents a reliable and cost-effective approach for estimating genome-wide SNP and transposable element insertion frequencies. However, Pool-Seq does not provide direct information on haplotypes so that, for example, obtaining inversion frequencies has not been possible until now. Here, we have developed a new set of diagnostic marker SNPs for seven cosmopolitan inversions in Drosophila melanogaster that can be used to infer inversion frequencies from Pool-Seq data. We applied our novel marker set to Pool-Seq data from an experimental evolution study and from North American and Australian latitudinal clines. In the experimental evolution data, we find evidence that positive selection has driven the frequencies of In(3R)C and In(3R)Mo to increase over time. In the clinal data, we confirm the existence of frequency clines for In(2L)t, In(3L)P and In(3R)Payne in both North America and Australia and detect a previously unknown latitudinal cline for In(3R)Mo in North America. The inversion markers developed here provide a versatile and robust tool for characterizing inversion frequencies and their dynamics in Pool-Seq data from diverse D. melanogaster populations. © 2013 The Authors. Molecular Ecology Published by John Wiley & Sons Ltd.

  7. Inference of chromosomal inversion dynamics from Pool-Seq data in natural and laboratory populations of Drosophila melanogaster

    PubMed Central

    Kapun, Martin; van Schalkwyk, Hester; McAllister, Bryant; Flatt, Thomas; Schlötterer, Christian

    2014-01-01

    Sequencing of pools of individuals (Pool-Seq) represents a reliable and cost-effective approach for estimating genome-wide SNP and transposable element insertion frequencies. However, Pool-Seq does not provide direct information on haplotypes so that, for example, obtaining inversion frequencies has not been possible until now. Here, we have developed a new set of diagnostic marker SNPs for seven cosmopolitan inversions in Drosophila melanogaster that can be used to infer inversion frequencies from Pool-Seq data. We applied our novel marker set to Pool-Seq data from an experimental evolution study and from North American and Australian latitudinal clines. In the experimental evolution data, we find evidence that positive selection has driven the frequencies of In(3R)C and In(3R)Mo to increase over time. In the clinal data, we confirm the existence of frequency clines for In(2L)t, In(3L)P and In(3R)Payne in both North America and Australia and detect a previously unknown latitudinal cline for In(3R)Mo in North America. The inversion markers developed here provide a versatile and robust tool for characterizing inversion frequencies and their dynamics in Pool-Seq data from diverse D. melanogaster populations. PMID:24372777

  8. Population structure in Argentina

    PubMed Central

    Motti, Josefina M. B.; Paz Sepulveda, Paula B.; Yee, Muh-ching; Cooke, Thomas; Santos, María R.; Ramallo, Virginia; Alfaro, Emma L.; Dipierri, Jose E.; Bailliet, Graciela; Bravi, Claudio M.; Bustamante, Carlos D.; Kenny, Eimear E.

    2018-01-01

    We analyzed 391 samples from 12 Argentinian populations from the Center-West, East and North-West regions with the Illumina Human Exome Beadchip v1.0 (HumanExome-12v1-A). We did Principal Components analysis to infer patterns of populational divergence and migrations. We identified proportions and patterns of European, African and Native American ancestry and found a correlation between distance to Buenos Aires and proportion of Native American ancestry, where the highest proportion corresponds to the Northernmost populations, which are also the furthest from the Argentinian capital. Most of the European sources are from a South European origin, matching historical records, and we see two different Native American components, one that spreads all over Argentina and another specifically Andean. The highest percentages of African ancestry were in the Center West of Argentina, where the old trade routes took the slaves from Buenos Aires to Chile and Peru. Subcontinentally, sources of this African component are represented by both West Africa and groups influenced by the Bantu expansion, the second slightly higher than the first, unlike North America and the Caribbean, where the main source is West Africa. This is reasonable, considering that a large proportion of the ships arriving at the Southern Hemisphere came from Mozambique, Loango and Angola. PMID:29715266

  9. Population structure in Argentina.

    PubMed

    Muzzio, Marina; Motti, Josefina M B; Paz Sepulveda, Paula B; Yee, Muh-Ching; Cooke, Thomas; Santos, María R; Ramallo, Virginia; Alfaro, Emma L; Dipierri, Jose E; Bailliet, Graciela; Bravi, Claudio M; Bustamante, Carlos D; Kenny, Eimear E

    2018-01-01

    We analyzed 391 samples from 12 Argentinian populations from the Center-West, East and North-West regions with the Illumina Human Exome Beadchip v1.0 (HumanExome-12v1-A). We did Principal Components analysis to infer patterns of populational divergence and migrations. We identified proportions and patterns of European, African and Native American ancestry and found a correlation between distance to Buenos Aires and proportion of Native American ancestry, where the highest proportion corresponds to the Northernmost populations, which are also the furthest from the Argentinian capital. Most of the European sources are from a South European origin, matching historical records, and we see two different Native American components, one that spreads all over Argentina and another specifically Andean. The highest percentages of African ancestry were in the Center West of Argentina, where the old trade routes took the slaves from Buenos Aires to Chile and Peru. Subcontinentally, sources of this African component are represented by both West Africa and groups influenced by the Bantu expansion, the second slightly higher than the first, unlike North America and the Caribbean, where the main source is West Africa. This is reasonable, considering that a large proportion of the ships arriving at the Southern Hemisphere came from Mozambique, Loango and Angola.

  10. Dynamical Inference in the Milky Way

    NASA Astrophysics Data System (ADS)

    Bovy, Jo

    Current and future surveys of the Galaxy contain a wealth of information about the structure and evolution of the Galactic disk and halo. Teasing out this information is complicated by measurement uncertainties, missing data, and sparse sampling. I develop and describe several applications of generative modeling--creating an approximate description of the probability of the data given the physical parameters of the system--to deal with these issues. I develop a method for inferring the Galactic potential from individual observations of stellar kinematics such as will be furnished by the upcoming Gaia space astrometry mission. This method takes uncertainties in our knowledge of the distribution function of stellar tracers into account through marginalization. I demonstrate the method by inferring the force law in the Solar System from observations of the positions and velocities of the eight planets at a single epoch. I apply a similar method to derive the Milky Way's circular velocity from observations of maser kinematics. I infer the velocity distribution of nearby stars from Hipparcos data, which only consist of tangential velocities, by forward modeling the underlying distribution with a flexible multi-Gaussian model. I characterize the contribution of several "moving groups"---overdensities of co-moving stars---to the full distribution. By studying the properties of stars in these moving groups, I show that they do not form a single-burst population and that they are most likely due to transient non-axisymmetric features of the disk, such as transient spiral structure. By forward modeling one such scenario, I show how the Hercules moving group can be traced around the Galaxy by future surveys, which would confirm that the Milky Way bar's outer Lindblad resonance lies near the Solar radius.

  11. Data-driven sensitivity inference for Thomson scattering electron density measurement systems.

    PubMed

    Fujii, Keisuke; Yamada, Ichihiro; Hasuo, Masahiro

    2017-01-01

    We developed a method to infer the calibration parameters of multichannel measurement systems, such as channel variations of sensitivity and noise amplitude, from experimental data. We regard such uncertainties of the calibration parameters as dependent noise. The statistical properties of the dependent noise and that of the latent functions were modeled and implemented in the Gaussian process kernel. Based on their statistical difference, both parameters were inferred from the data. We applied this method to the electron density measurement system by Thomson scattering for the Large Helical Device plasma, which is equipped with 141 spatial channels. Based on the 210 sets of experimental data, we evaluated the correction factor of the sensitivity and noise amplitude for each channel. The correction factor varies by ≈10%, and the random noise amplitude is ≈2%, i.e., the measurement accuracy increases by a factor of 5 after this sensitivity correction. The certainty improvement in the spatial derivative inference was demonstrated.

  12. Timescales alter the inferred strength and temporal consistency of intraspecific diet specialization

    USGS Publications Warehouse

    Novak, Mark; Tinker, M. Tim

    2015-01-01

    Many populations consist of individuals that differ substantially in their diets. Quantification of the magnitude and temporal consistency of such intraspecific diet variation is needed to understand its importance, but the extent to which different approaches for doing so reflect instantaneous vs. time-aggregated measures of individual diets may bias inferences. We used direct observations of sea otter individuals (Enhydra lutris nereis) to assess how: (1) the timescale of sampling, (2) under-sampling, and (3) the incidence- vs. frequency-based consideration of prey species affect the inferred strength and consistency of intraspecific diet variation. Analyses of feeding observations aggregated over hourly to annual intervals revealed a substantial bias associated with time aggregation that decreases the inferred magnitude of specialization and increases the inferred consistency of individuals’ diets. Time aggregation also made estimates of specialization more sensitive to the consideration of prey frequency, which decreased estimates relative to the use of prey incidence; time aggregation did not affect the extent to which under-sampling contributed to its overestimation. Our analyses demonstrate the importance of studying intraspecific diet variation with an explicit consideration of time and thereby suggest guidelines for future empirical efforts. Failure to consider time will likely produce inconsistent predictions regarding the effects of intraspecific variation on predator–prey interactions.

  13. AGN Populations in Large-volume X-Ray Surveys: Photometric Redshifts and Population Types Found in the Stripe 82X Survey

    NASA Astrophysics Data System (ADS)

    Ananna, Tonima Tasnin; Salvato, Mara; LaMassa, Stephanie; Urry, C. Megan; Cappelluti, Nico; Cardamone, Carolin; Civano, Francesca; Farrah, Duncan; Gilfanov, Marat; Glikman, Eilat; Hamilton, Mark; Kirkpatrick, Allison; Lanzuisi, Giorgio; Marchesi, Stefano; Merloni, Andrea; Nandra, Kirpal; Natarajan, Priyamvada; Richards, Gordon T.; Timlin, John

    2017-11-01

    Multiwavelength surveys covering large sky volumes are necessary to obtain an accurate census of rare objects such as high-luminosity and/or high-redshift active galactic nuclei (AGNs). Stripe 82X is a 31.3 deg² X-ray survey with Chandra and XMM-Newton observations overlapping the legacy Sloan Digital Sky Survey Stripe 82 field, which has a rich investment of multiwavelength coverage from the ultraviolet to the radio. The wide-area nature of this survey presents new challenges for photometric redshifts for AGNs compared to previous work on narrow-deep fields because it probes different populations of objects that need to be identified and represented in the library of templates. Here we present an updated X-ray plus multiwavelength matched catalog, including Spitzer counterparts, and estimated photometric redshifts for 5961 (96% of a total of 6181) X-ray sources that have a normalized median absolute deviation, σ_nmad = 0.06, and an outlier fraction, η = 13.7%. The populations found in this survey and the template libraries used for photometric redshifts provide important guiding principles for upcoming large-area surveys such as eROSITA and 3XMM (in X-ray) and the Large Synoptic Survey Telescope (optical).
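
    The two quality figures quoted above, the normalized median absolute deviation and the outlier fraction, are usually computed from the normalized redshift error |z_phot - z_spec| / (1 + z_spec). The sketch below follows a widely used convention (a factor of 1.48 on the median, and a 0.15 cut for outliers); the abstract does not state the exact definitions used, so both constants and the function name are assumptions.

```python
import numpy as np


def photoz_quality(z_phot, z_spec, outlier_threshold=0.15):
    """Return (sigma_nmad, outlier_fraction) for photometric vs spectroscopic redshifts."""
    z_phot = np.asarray(z_phot, dtype=float)
    z_spec = np.asarray(z_spec, dtype=float)
    # Normalized error: |dz| / (1 + z_spec)
    norm_err = np.abs(z_phot - z_spec) / (1.0 + z_spec)
    # 1.48 * median makes the statistic comparable to a Gaussian standard deviation.
    sigma_nmad = 1.48 * np.median(norm_err)
    # Fraction of sources whose normalized error exceeds the (assumed) threshold.
    outlier_fraction = np.mean(norm_err > outlier_threshold)
    return sigma_nmad, outlier_fraction
```

    Because the statistic is built on a median, a handful of catastrophic redshift failures raises the outlier fraction without inflating σ_nmad, which is why the two numbers are reported together.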

  14. Inferring Plasmodium vivax Transmission Networks from Tempo-Spatial Surveillance Data

    PubMed Central

    Shi, Benyun; Liu, Jiming; Zhou, Xiao-Nong; Yang, Guo-Jing

    2014-01-01

    Background: The transmission networks of Plasmodium vivax characterize how the parasite transmits from one location to another, which are informative and insightful for public health policy makers to accurately predict the patterns of its geographical spread. However, such networks are not apparent from surveillance data because P. vivax transmission can be affected by many factors, such as the biological characteristics of mosquitoes and the mobility of human beings. Here, we pay special attention to the problem of how to infer the underlying transmission networks of P. vivax based on available tempo-spatial patterns of reported cases. Methodology: We first define a spatial transmission model, which involves representing both the heterogeneous transmission potential of P. vivax at individual locations and the mobility of infected populations among different locations. Based on the proposed transmission model, we further introduce a recurrent neural network model to infer the transmission networks from surveillance data. Specifically, in this model, we take into account multiple real-world factors, including the length of P. vivax incubation period, the impact of malaria control at different locations, and the total number of imported cases. Principal Findings: We implement our proposed models by focusing on the P. vivax transmission among 62 towns in Yunnan province, People's Republic of China, which have been experiencing high malaria transmission in the past years. By conducting scenario analysis with respect to different numbers of imported cases, we can (i) infer the underlying P. vivax transmission networks, (ii) estimate the number of imported cases for each individual town, and (iii) quantify the roles of individual towns in the geographical spread of P. vivax. Conclusion: The demonstrated models have presented a general means for inferring the underlying transmission networks from surveillance data. The inferred networks will offer new insights into how to

  15. Spontaneous Trait Inferences on Social Media.

    PubMed

    Levordashka, Ana; Utz, Sonja

    2017-01-01

    The present research investigates whether spontaneous trait inferences occur under conditions characteristic of social media and networking sites: nonextreme, ostensibly self-generated content, simultaneous presentation of multiple cues, and self-paced browsing. We used an established measure of trait inferences (false recognition paradigm) and a direct assessment of impressions. Without being asked to do so, participants spontaneously formed impressions of people whose status updates they saw. Our results suggest that trait inferences occurred from nonextreme self-generated content, which is commonly found in social media updates (Experiment 1) and when nine status updates from different people were presented in parallel (Experiment 2). Although inferences did occur during free browsing, the results suggest that participants did not necessarily associate the traits with the corresponding status update authors (Experiment 3). Overall, the findings suggest that spontaneous trait inferences occur on social media. We discuss implications for online communication and research on spontaneous trait inferences.

  16. Population genetic structure, genetic diversity, and natural history of the South American species of Nothofagus subgenus Lophozonia (Nothofagaceae) inferred from nuclear microsatellite data

    PubMed Central

    Vergara, Rodrigo; Gitzendanner, Matthew A; Soltis, Douglas E; Soltis, Pamela S

    2014-01-01

    The effect of glaciation on the levels and patterns of genetic variation has been well studied in the Northern Hemisphere. However, although glaciation has undoubtedly shaped the genetic structure of plants in the Southern Hemisphere, fewer studies have characterized the effect, and almost none of them using microsatellites. Particularly, complex patterns of genetic structure might be expected in areas such as the Andes, where both latitudinal and altitudinal glacial advance and retreat have molded modern plant communities. We therefore studied the population genetics of three closely related, hybridizing species of Nothofagus (N. obliqua, N. alpina, and N. glauca, all of subgenus Lophozonia; Nothofagaceae) from Chile. To estimate population genetic parameters and infer the influence of the last ice age on the spatial and genetic distribution of these species, we examined and analyzed genetic variability at seven polymorphic microsatellite DNA loci in 640 individuals from 40 populations covering most of the ranges of these species in Chile. Populations showed no significant inbreeding and exhibited relatively high levels of genetic diversity (HE = 0.502–0.662) and slight, but significant, genetic structure (RST = 8.7–16.0%). However, in N. obliqua, the small amount of genetic structure was spatially organized into three well-defined latitudinal groups. Our data may also suggest some introgression of N. alpina genes into N. obliqua in the northern populations. These results allowed us to reconstruct the influence of the last ice age on the genetic structure of these species, suggesting several centers of genetic diversity for N. obliqua and N. alpina, in agreement with the multiple refugia hypothesis. PMID:25360279
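
    The genetic diversity values quoted above are expected heterozygosities (HE). As a minimal sketch, assuming the textbook definition HE = 1 - sum(p_i^2) over allele frequencies p_i at one locus (the study may have used a sample-size-corrected estimator), the quantity can be computed as follows. The function name and the allele sizes are hypothetical and are not taken from the study.

```python
from collections import Counter


def expected_heterozygosity(alleles):
    """Return HE = 1 - sum(p_i^2) for the allele copies observed at one locus.

    `alleles` is a flat list of allele labels (two per diploid individual),
    for example microsatellite fragment lengths. No small-sample correction
    is applied in this sketch.
    """
    counts = Counter(alleles)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


# Hypothetical locus: two alleles at equal frequency.
balanced_locus = [120, 120, 124, 124]
# Hypothetical locus fixed for a single allele.
fixed_locus = [120, 120, 120, 120]

he_balanced = expected_heterozygosity(balanced_locus)
he_fixed = expected_heterozygosity(fixed_locus)
```

    Averaging this value across loci gives a per-population summary comparable in kind, though not necessarily in estimator, to the range reported in the abstract.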

  17. cosmoabc: Likelihood-free inference for cosmology

    NASA Astrophysics Data System (ADS)

    Ishida, Emille E. O.; Vitenti, Sandro D. P.; Penna-Lima, Mariana; Trindade, Arlindo M.; Cisewski, Jessi; M.; de Souza, Rafael; Cameron, Ewan; Busti, Vinicius C.

    2015-05-01

    Approximate Bayesian Computation (ABC) enables parameter inference for complex physical systems in cases where the true likelihood function is unknown, unavailable, or computationally too expensive. It relies on the forward simulation of mock data and comparison between observed and synthetic catalogs. cosmoabc is a Python ABC sampler featuring a Population Monte Carlo variation of the original ABC algorithm, which uses an adaptive importance sampling scheme. The code can be coupled to an external simulator to allow incorporation of arbitrary distance and prior functions. When coupled with the numcosmo library, it has been used to estimate posterior probability distributions over cosmological parameters based on measurements of galaxy cluster number counts without computing the likelihood function.
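
    The forward-simulate-and-compare idea behind ABC can be illustrated with a minimal sketch. The code below is the plain ABC rejection sampler, not the Population Monte Carlo variant and not the cosmoabc API. All function names, the toy Gaussian model, the uniform prior, and the tolerance are assumptions made for illustration only.

```python
import random
import statistics


def abc_rejection(observed, simulate, prior_sample, distance, epsilon, n_draws, seed=0):
    """Plain ABC rejection: keep prior draws whose mock data lie within epsilon."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)      # draw a candidate parameter from the prior
        mock = simulate(theta, rng)    # forward-simulate a synthetic data set
        if distance(mock, observed) <= epsilon:
            accepted.append(theta)     # no likelihood is ever evaluated
    return accepted


# Toy problem (hypothetical): infer the mean of a unit-variance Gaussian.
def prior_sample(rng):
    return rng.uniform(-5.0, 5.0)


def simulate(theta, rng, n=50):
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def distance(mock, observed):
    # Compare data sets through one summary statistic, the sample mean.
    return abs(statistics.fmean(mock) - statistics.fmean(observed))


observed = simulate(2.0, random.Random(42))
posterior = abc_rejection(observed, simulate, prior_sample, distance,
                          epsilon=0.1, n_draws=5000)
```

    The accepted draws approximate the posterior. A Population Monte Carlo scheme, as described in the abstract, would instead reuse accepted draws as an importance-sampling proposal while the tolerance is tightened over successive iterations.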

  18. The inference from a single case: moral versus scientific inferences in implementing new biotechnologies.

    PubMed

    Hofmann, B

    2008-06-01

    Are there similarities between scientific and moral inference? This is the key question in this article. It takes as its point of departure an instance of one person's story in the media changing both Norwegian public opinion and a brand-new Norwegian law prohibiting the use of saviour siblings. The case appears to falsify existing norms and to establish new ones. The analysis of this case reveals similarities in the modes of inference in science and morals, inasmuch as (a) a single case functions as a counter-example to an existing rule; (b) there is a common presupposition of stability, similarity and order, which makes it possible to reason from a few cases to a general rule; and (c) this makes it possible to hold things together and retain order. In science, these modes of inference are referred to as falsification, induction and consistency. In morals, they have a variety of other names. Hence, even without abandoning the fact-value divide, there appear to be similarities between inference in science and inference in morals, which may encourage communication across the boundaries between "the two cultures" and which are relevant to medical humanities.

  19. Inferring Stop-Locations from WiFi.

    PubMed

    Wind, David Kofoed; Sapiezynski, Piotr; Furman, Magdalena Anna; Lehmann, Sune

    2016-01-01

    Human mobility patterns are inherently complex. In terms of understanding these patterns, the process of converting raw data into series of stop-locations and transitions is an important first step which greatly reduces the volume of data, thus simplifying the subsequent analyses. Previous research into the mobility of individuals has focused on inferring 'stop locations' (places of stationarity) from GPS or CDR data, or on detection of state (static/active). In this paper we bridge the gap between the two approaches: we introduce methods for detecting both mobility state and stop-locations. In addition, our methods are based exclusively on WiFi data. We study two months of WiFi data collected every two minutes by a smartphone, and infer stop-locations in the form of labelled time-intervals. For this purpose, we investigate two algorithms, both of which scale to large datasets: a greedy approach to select the most important routers and one which uses a density-based clustering algorithm to detect router fingerprints. We validate our results using participants' GPS data as well as ground truth data collected during a two month period.

  20. Inferring Stop-Locations from WiFi

    PubMed Central

    Wind, David Kofoed; Sapiezynski, Piotr; Furman, Magdalena Anna; Lehmann, Sune

    2016-01-01

    Human mobility patterns are inherently complex. In terms of understanding these patterns, the process of converting raw data into series of stop-locations and transitions is an important first step which greatly reduces the volume of data, thus simplifying the subsequent analyses. Previous research into the mobility of individuals has focused on inferring ‘stop locations’ (places of stationarity) from GPS or CDR data, or on detection of state (static/active). In this paper we bridge the gap between the two approaches: we introduce methods for detecting both mobility state and stop-locations. In addition, our methods are based exclusively on WiFi data. We study two months of WiFi data collected every two minutes by a smartphone, and infer stop-locations in the form of labelled time-intervals. For this purpose, we investigate two algorithms, both of which scale to large datasets: a greedy approach to select the most important routers and one which uses a density-based clustering algorithm to detect router fingerprints. We validate our results using participants’ GPS data as well as ground truth data collected during a two month period. PMID:26901663

  1. Management of fish populations in large rivers: a review of tools and approaches

    USGS Publications Warehouse

    Petts, Geoffrey E.; Imhoff, Jack G.; Manny, Bruce A.; Maher, John F. B.; Weisberg, Stephen B.

    1989-01-01

    In common with most branches of science, the management of riverine fish populations is characterised by reductionist and isolationist philosophies. Traditional fish management focuses on stocking and controls on fishing. This paper presents a consensus of scientists involved in the LARS workshop on the management of fish populations in large rivers. A move towards a more holistic philosophy is advocated, with fish management forming an integral part of sustainable river development. Based upon a questionnaire survey of LARS members, with wide-ranging expertise and experience from all parts of the world, lists of management tools currently in use are presented. Four categories of tools are described: flow, water-quality, habitat, and biological. The potential applications of tools for fish management in large rivers are discussed and research needs are identified. The lack of scientific evaluations of the different tools remains the major constraint to their wider application.

  2. Inferring responses to climate dynamics from historical demography in neotropical forest lizards

    PubMed Central

    Xue, Alexander T.; Brown, Jason L.; Alvarado-Serrano, Diego F.; Rodrigues, Miguel T.; Hickerson, Michael J.; Carnaval, Ana C.

    2016-01-01

    We apply a comparative framework to test for concerted demographic changes in response to climate shifts in the neotropical lowland forests, learning from the past to inform projections of the future. Using reduced genomic (SNP) data from three lizard species codistributed in Amazonia and the Atlantic Forest (Anolis punctatus, Anolis ortonii, and Polychrus marmoratus), we first reconstruct former population history and test for assemblage-level responses to cycles of moisture transport recently implicated in changes of forest distribution during the Late Quaternary. We find support for population shifts within the time frame of inferred precipitation fluctuations (the last 250,000 y) but detect idiosyncratic responses across species and uniformity of within-species responses across forest regions. These results are incongruent with expectations of concerted population expansion in response to increased rainfall and fail to detect out-of-phase demographic syndromes (expansions vs. contractions) across forest regions. Using reduced genomic data to infer species-specific demographical parameters, we then model the plausible spatial distribution of genetic diversity in the Atlantic Forest into future climates (2080) under a medium carbon emission trajectory. The models forecast very distinct trajectories for the lizard species, reflecting unique estimated population densities and dispersal abilities. Ecological and demographic constraints seemingly lead to distinct and asynchronous responses to climatic regimes in the tropics, even among similarly distributed taxa. Incorporating such constraints is key to improve modeling of the distribution of biodiversity in the past and future. PMID:27432951

  3. Inferring responses to climate dynamics from historical demography in neotropical forest lizards.

    PubMed

    Prates, Ivan; Xue, Alexander T; Brown, Jason L; Alvarado-Serrano, Diego F; Rodrigues, Miguel T; Hickerson, Michael J; Carnaval, Ana C

    2016-07-19

    We apply a comparative framework to test for concerted demographic changes in response to climate shifts in the neotropical lowland forests, learning from the past to inform projections of the future. Using reduced genomic (SNP) data from three lizard species codistributed in Amazonia and the Atlantic Forest (Anolis punctatus, Anolis ortonii, and Polychrus marmoratus), we first reconstruct former population history and test for assemblage-level responses to cycles of moisture transport recently implicated in changes of forest distribution during the Late Quaternary. We find support for population shifts within the time frame of inferred precipitation fluctuations (the last 250,000 y) but detect idiosyncratic responses across species and uniformity of within-species responses across forest regions. These results are incongruent with expectations of concerted population expansion in response to increased rainfall and fail to detect out-of-phase demographic syndromes (expansions vs. contractions) across forest regions. Using reduced genomic data to infer species-specific demographical parameters, we then model the plausible spatial distribution of genetic diversity in the Atlantic Forest into future climates (2080) under a medium carbon emission trajectory. The models forecast very distinct trajectories for the lizard species, reflecting unique estimated population densities and dispersal abilities. Ecological and demographic constraints seemingly lead to distinct and asynchronous responses to climatic regimes in the tropics, even among similarly distributed taxa. Incorporating such constraints is key to improve modeling of the distribution of biodiversity in the past and future.

  4. Sociodemographic characteristics of members of a large, integrated health care system: comparison with US Census Bureau data.

    PubMed

    Koebnick, Corinna; Langer-Gould, Annette M; Gould, Michael K; Chao, Chun R; Iyer, Rajan L; Smith, Ning; Chen, Wansu; Jacobsen, Steven J

    2012-01-01

    Data from the memberships of large, integrated health care systems can be valuable for clinical, epidemiologic, and health services research, but a potential selection bias may threaten the inference to the population of interest. We reviewed administrative records of members of Kaiser Permanente Southern California (KPSC) in 2000 and 2010, and we compared their sociodemographic characteristics with those of the underlying population in the coverage area on the basis of US Census Bureau data. We identified 3,328,579 KPSC members in 2000 and 3,357,959 KPSC members in 2010, representing approximately 16% of the population in the coverage area. The distribution of sex and age of KPSC members appeared to be similar to the census reference population in 2000 and 2010 except with a slightly higher proportion of 40 to 64 year olds. The proportion of Hispanics/Latinos was comparable between KPSC and the census reference population (37.5% vs 38.2%, respectively, in 2000 and 45.2% vs 43.3% in 2010). However, KPSC members included more blacks (14.9% vs 7.0% in 2000 and 10.8% vs 6.5% in 2010). Neighborhood educational levels and neighborhood household incomes were generally similar between KPSC members and the census reference population, but with a marginal underrepresentation of individuals with extremely low income and high education. The membership of KPSC reflects the socioeconomic diversity of the Southern California census population, suggesting that findings from this setting may provide valid inference for clinical, epidemiologic, and health services research.

  5. High prices for rare species can drive large populations extinct: the anthropogenic Allee effect revisited.

    PubMed

    Holden, Matthew H; McDonald-Madden, Eve

    2017-09-21

    Consumer demand for plant and animal products threatens many populations with extinction. The anthropogenic Allee effect (AAE) proposes that such extinctions can be caused by prices for wildlife products increasing with species rarity. This price-rarity relationship creates financial incentives to extract the last remaining individuals of a population, despite higher search and harvest costs. The AAE has become a standard approach for conceptualizing the threat of economic markets on endangered species. Despite its potential importance for conservation, AAE theory is based on a simple graphical model with limited analysis of possible population trajectories. By specifying a general class of functions for price-rarity relationships, we show that the classic theory can understate the risk of species extinction. AAE theory proposes that only populations below a critical Allee threshold will go extinct due to increasing price-rarity relationships. Our analysis shows that this threshold can be much higher than the original theory suggests, depending on initial harvest effort. More alarmingly, even species with population sizes above this Allee threshold, for which AAE predicts persistence, can be destined to extinction. Introducing even a minimum price for harvested individuals, close to zero, can cause large populations to cross the classic anthropogenic Allee threshold on a trajectory towards extinction. These results suggest that traditional AAE theory may give a false sense of security when managing large harvested populations. Copyright © 2017 Elsevier Ltd. All rights reserved.

  6. Network inference using informative priors

    PubMed Central

    Mukherjee, Sach; Speed, Terence P.

    2008-01-01

    Recent years have seen much interest in the study of systems characterized by multiple interacting components. A class of statistical models called graphical models, in which graphs are used to represent probabilistic relationships between variables, provides a framework for formal inference regarding such systems. In many settings, the object of inference is the network structure itself. This problem of “network inference” is well known to be a challenging one. However, in scientific settings there is very often existing information regarding network connectivity. A natural idea then is to take account of such information during inference. This article addresses the question of incorporating prior information into network inference. We focus on directed models called Bayesian networks, and use Markov chain Monte Carlo to draw samples from posterior distributions over network structures. We introduce prior distributions on graphs capable of capturing information regarding network features including edges, classes of edges, degree distributions, and sparsity. We illustrate our approach in the context of systems biology, applying our methods to network inference in cancer signaling. PMID:18799736

  7. An empirical evaluation of two-stage species tree inference strategies using a multilocus dataset from North American pines

    Treesearch

    Michael DeGiorgio; John Syring; Andrew J. Eckert; Aaron Liston; Richard Cronn; David B. Neale; Noah A. Rosenberg

    2014-01-01

    Background: As it becomes increasingly possible to obtain DNA sequences of orthologous genes from diverse sets of taxa, species trees are frequently being inferred from multilocus data. However, the behavior of many methods for performing this inference has remained largely unexplored. Some methods have been proven to be consistent given certain evolutionary models,...

  8. Bayesian power spectrum inference with foreground and target contamination treatment

    NASA Astrophysics Data System (ADS)

    Jasche, J.; Lavaux, G.

    2017-10-01

    This work presents a joint and self-consistent Bayesian treatment of various foreground and target contaminations when inferring cosmological power spectra and three-dimensional density fields from galaxy redshift surveys. This is achieved by introducing additional block-sampling procedures for unknown coefficients of foreground and target contamination templates to the previously presented ARES framework for Bayesian large-scale structure analyses. As a result, the method infers jointly and fully self-consistently three-dimensional density fields, cosmological power spectra, luminosity-dependent galaxy biases, noise levels of the respective galaxy distributions, and coefficients for a set of a priori specified foreground templates. In addition, this fully Bayesian approach permits detailed quantification of correlated uncertainties amongst all inferred quantities and correctly marginalizes over observational systematic effects. We demonstrate the validity and efficiency of our approach in obtaining unbiased estimates of power spectra via applications to realistic mock galaxy observations that are subject to stellar contamination and dust extinction. While simultaneously accounting for galaxy biases and unknown noise levels, our method reliably and robustly infers three-dimensional density fields and corresponding cosmological power spectra from deep galaxy surveys. Furthermore, our approach correctly accounts for joint and correlated uncertainties between unknown coefficients of foreground templates and the amplitudes of the power spectrum. This effect amounts to correlations and anti-correlations of up to 10 per cent across wide ranges in Fourier space.

  9. Active inference, communication and hermeneutics

    PubMed Central

    Friston, Karl J.; Frith, Christopher D.

    2015-01-01

    Hermeneutics refers to interpretation and translation of text (typically ancient scriptures) but also applies to verbal and non-verbal communication. In a psychological setting it nicely frames the problem of inferring the intended content of a communication. In this paper, we offer a solution to the problem of neural hermeneutics based upon active inference. In active inference, action fulfils predictions about how we will behave (e.g., predicting we will speak). Crucially, these predictions can be used to predict both self and others – during speaking and listening respectively. Active inference mandates the suppression of prediction errors by updating an internal model that generates predictions – both at fast timescales (through perceptual inference) and slower timescales (through perceptual learning). If two agents adopt the same model, then – in principle – they can predict each other and minimise their mutual prediction errors. Heuristically, this ensures they are singing from the same hymn sheet. This paper builds upon recent work on active inference and communication to illustrate perceptual learning using simulated birdsongs. Our focus here is the neural hermeneutics implicit in learning, where communication facilitates long-term changes in generative models that are trying to predict each other. In other words, communication induces perceptual learning and enables others to (literally) change our minds and vice versa. PMID:25957007

  10. Active inference, communication and hermeneutics.

    PubMed

    Friston, Karl J; Frith, Christopher D

    2015-07-01

    Hermeneutics refers to interpretation and translation of text (typically ancient scriptures) but also applies to verbal and non-verbal communication. In a psychological setting it nicely frames the problem of inferring the intended content of a communication. In this paper, we offer a solution to the problem of neural hermeneutics based upon active inference. In active inference, action fulfils predictions about how we will behave (e.g., predicting we will speak). Crucially, these predictions can be used to predict both self and others--during speaking and listening respectively. Active inference mandates the suppression of prediction errors by updating an internal model that generates predictions--both at fast timescales (through perceptual inference) and slower timescales (through perceptual learning). If two agents adopt the same model, then--in principle--they can predict each other and minimise their mutual prediction errors. Heuristically, this ensures they are singing from the same hymn sheet. This paper builds upon recent work on active inference and communication to illustrate perceptual learning using simulated birdsongs. Our focus here is the neural hermeneutics implicit in learning, where communication facilitates long-term changes in generative models that are trying to predict each other. In other words, communication induces perceptual learning and enables others to (literally) change our minds and vice versa. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.

  11. Kernel learning at the first level of inference.

    PubMed

    Cawley, Gavin C; Talbot, Nicola L C

    2014-05-01

    Kernel learning methods, whether Bayesian or frequentist, typically involve multiple levels of inference, with the coefficients of the kernel expansion being determined at the first level and the kernel and regularisation parameters carefully tuned at the second level, a process known as model selection. Model selection for kernel machines is commonly performed via optimisation of a suitable model selection criterion, often based on cross-validation or theoretical performance bounds. However, if there are a large number of kernel parameters, as for instance in the case of automatic relevance determination (ARD), there is a substantial risk of over-fitting the model selection criterion, resulting in poor generalisation performance. In this paper we investigate the possibility of learning the kernel, for the Least-Squares Support Vector Machine (LS-SVM) classifier, at the first level of inference, i.e. parameter optimisation. The kernel parameters and the coefficients of the kernel expansion are jointly optimised at the first level of inference, minimising a training criterion with an additional regularisation term acting on the kernel parameters. The key advantage of this approach is that the values of only two regularisation parameters need be determined in model selection, substantially alleviating the problem of over-fitting the model selection criterion. The benefits of this approach are demonstrated using a suite of synthetic and real-world binary classification benchmark problems, where kernel learning at the first level of inference is shown to be statistically superior to the conventional approach, improves on our previous work (Cawley and Talbot, 2007) and is competitive with Multiple Kernel Learning approaches, but with reduced computational expense. Copyright © 2014 Elsevier Ltd. All rights reserved.

  12. Stan: A Probabilistic Programming Language for Bayesian Inference and Optimization

    ERIC Educational Resources Information Center

    Gelman, Andrew; Lee, Daniel; Guo, Jiqiang

    2015-01-01

    Stan is a free and open-source C++ program that performs Bayesian inference or optimization for arbitrary user-specified models and can be called from the command line, R, Python, Matlab, or Julia and has great promise for fitting large and complex statistical models in many areas of application. We discuss Stan from users' and developers'…

  13. Social Dynamics Modeling and Inference

    DTIC Science & Technology

    2018-03-29

    AFRL-AFOSR-JP-TR-2018-0027 Social Dynamics Modeling and Inference Kwang-Cheng Chen NATIONAL TAIWAN UNIVERSITY Final Report 03/29/2018 DISTRIBUTION A...DATES COVERED (From - To)      14 May 2014 to 13 May 2017 4.  TITLE AND SUBTITLE Social Dynamics Modeling and Inference 5a.  CONTRACT NUMBER 5b.  GRANT...behavior in human society, to set up the foundation of future possible inference and even control of social collective behavior. Two primary

  14. Network inference from multimodal data: A review of approaches from infectious disease transmission.

    PubMed

    Ray, Bisakha; Ghedin, Elodie; Chunara, Rumi

    2016-12-01

    Network inference problems are commonly found in multiple biomedical subfields such as genomics, metagenomics, neuroscience, and epidemiology. Networks are useful for representing a wide range of complex interactions ranging from those between molecular biomarkers, neurons, and microbial communities, to those found in human or animal populations. Recent technological advances have resulted in an increasing amount of healthcare data in multiple modalities, increasing the preponderance of network inference problems. Multi-domain data can now be used to improve the robustness and reliability of recovered networks from unimodal data. For infectious diseases in particular, there is a body of knowledge that has been focused on combining multiple pieces of linked information. Combining or analyzing disparate modalities in concert has demonstrated greater insight into disease transmission than could be obtained from any single modality in isolation. This has been particularly helpful in understanding incidence and transmission at early stages of infections that have pandemic potential. Novel pieces of linked information in the form of spatial, temporal, and other covariates including high-throughput sequence data, clinical visits, social network information, pharmaceutical prescriptions, and clinical symptoms (reported as free-text data) also encourage further investigation of these methods. The purpose of this review is to provide an in-depth analysis of multimodal infectious disease transmission network inference methods with a specific focus on Bayesian inference. We focus on analytical Bayesian inference-based methods as this enables recovering multiple parameters simultaneously, for example, not just the disease transmission network, but also parameters of epidemic dynamics. Our review studies their assumptions, key inference parameters and limitations, and ultimately provides insights about improving future network inference methods in multiple applications

  15. Spatial Inference for Distributed Remote Sensing Data

    NASA Astrophysics Data System (ADS)

    Braverman, A. J.; Katzfuss, M.; Nguyen, H.

    2014-12-01

    Remote sensing data are inherently spatial, and a substantial portion of their value for scientific analyses derives from the information they can provide about spatially dependent processes. Geophysical variables such as atmospheric temperature, cloud properties, humidity, aerosols and carbon dioxide all exhibit spatial patterns, and satellite observations can help us learn about the physical mechanisms driving them. However, remote sensing observations are often noisy and incomplete, so inferring properties of true geophysical fields from them requires some care. These data can also be massive, which is both a blessing and a curse: using more data drives uncertainties down, but also drives costs up, particularly when data are stored on different computers or in different physical locations. In this talk I will discuss a methodology for spatial inference on massive, distributed data sets that does not require moving large volumes of data. The idea is based on a combination of ideas including modeling spatial covariance structures with low-rank covariance matrices, and distributed estimation in sensor or wireless networks.

  16. Drawing causal inferences using propensity scores: a practical guide for community psychologists.

    PubMed

    Lanza, Stephanie T; Moore, Julia E; Butera, Nicole M

    2013-12-01

    Confounding present in observational data impedes community psychologists' ability to draw causal inferences. This paper describes propensity score methods as a conceptually straightforward approach to drawing causal inferences from observational data. A step-by-step demonstration of three propensity score methods (weighting, matching, and subclassification) is presented in the context of an empirical examination of the causal effect of preschool experiences (Head Start vs. parental care) on reading development in kindergarten. Although the unadjusted population estimate indicated that children with parental care had substantially higher reading scores than children who attended Head Start, all propensity score adjustments reduced the size of this overall causal effect by more than half. The causal effect was also defined and estimated among children who attended Head Start. Results provide no evidence for improved reading if those children had instead received parental care. We carefully define different causal effects and discuss their respective policy implications, summarize advantages and limitations of each propensity score method, and provide SAS and R syntax so that community psychologists may conduct causal inference in their own research.

  17. Drawing Causal Inferences Using Propensity Scores: A Practical Guide for Community Psychologists

    PubMed Central

    Lanza, Stephanie T.; Moore, Julia E.; Butera, Nicole M.

    2014-01-01

    Confounding present in observational data impedes community psychologists’ ability to draw causal inferences. This paper describes propensity score methods as a conceptually straightforward approach to drawing causal inferences from observational data. A step-by-step demonstration of three propensity score methods – weighting, matching, and subclassification – is presented in the context of an empirical examination of the causal effect of preschool experiences (Head Start vs. parental care) on reading development in kindergarten. Although the unadjusted population estimate indicated that children with parental care had substantially higher reading scores than children who attended Head Start, all propensity score adjustments reduce the size of this overall causal effect by more than half. The causal effect was also defined and estimated among children who attended Head Start. Results provide no evidence for improved reading if those children had instead received parental care. We carefully define different causal effects and discuss their respective policy implications, summarize advantages and limitations of each propensity score method, and provide SAS and R syntax so that community psychologists may conduct causal inference in their own research. PMID:24185755

  18. MetaPIGA v2.0: maximum likelihood large phylogeny estimation using the metapopulation genetic algorithm and other stochastic heuristics

    PubMed Central

    2010-01-01

    Background: The development, in the last decade, of stochastic heuristics implemented in robust application softwares has made large phylogeny inference a key step in most comparative studies involving molecular sequences. Still, the choice of a phylogeny inference software is often dictated by a combination of parameters not related to the raw performance of the implemented algorithm(s) but rather by practical issues such as ergonomics and/or the availability of specific functionalities. Results: Here, we present MetaPIGA v2.0, a robust implementation of several stochastic heuristics for large phylogeny inference (under maximum likelihood), including a Simulated Annealing algorithm, a classical Genetic Algorithm, and the Metapopulation Genetic Algorithm (metaGA) together with complex substitution models, discrete Gamma rate heterogeneity, and the possibility to partition data. MetaPIGA v2.0 also implements the Likelihood Ratio Test, the Akaike Information Criterion, and the Bayesian Information Criterion for automated selection of substitution models that best fit the data. Heuristics and substitution models are highly customizable through manual batch files and command line processing. However, MetaPIGA v2.0 also offers an extensive graphical user interface for parameters setting, generating and running batch files, following run progress, and manipulating result trees. MetaPIGA v2.0 uses standard formats for data sets and trees, is platform independent, runs in 32 and 64-bits systems, and takes advantage of multiprocessor and multicore computers. Conclusions: The metaGA resolves the major problem inherent to classical Genetic Algorithms by maintaining high inter-population variation even under strong intra-population selection. Implementation of the metaGA together with additional stochastic heuristics into a single software will allow rigorous optimization of each heuristic as well as a meaningful comparison of performances among these algorithms. MetaPIGA v2

  19. CompareSVM: supervised, Support Vector Machine (SVM) inference of gene regularity networks.

    PubMed

    Gillani, Zeeshan; Akash, Muhammad Sajid Hamid; Rahaman, M D Matiur; Chen, Ming

    2014-11-30

    Prediction of gene regularity network (GRN) from expression data is a challenging task. Many methods have been developed to address this challenge, ranging from supervised to unsupervised methods. The most promising methods are based on support vector machine (SVM). There is a need for comprehensive analysis of the prediction accuracy of the supervised SVM method using different kernels under different biological experimental conditions and network sizes. We developed a tool (CompareSVM) based on SVM to compare different kernel methods for inference of GRN. Using CompareSVM, we investigated and evaluated different SVM kernel methods on simulated microarray datasets of different sizes in detail. The results obtained from CompareSVM showed that the accuracy of an inference method depends upon the nature of the experimental condition and the size of the network. For networks with fewer than 200 nodes, and on average over all network sizes, the SVM Gaussian kernel outperforms all the other inference methods on knockout, knockdown, and multifactorial datasets. For networks with a large number of nodes (~500), the choice of inference method depends upon the nature of the experimental condition. CompareSVM is available at http://bis.zju.edu.cn/CompareSVM/ .

  20. Using MOEA with Redistribution and Consensus Branches to Infer Phylogenies.

    PubMed

    Min, Xiaoping; Zhang, Mouzhao; Yuan, Sisi; Ge, Shengxiang; Liu, Xiangrong; Zeng, Xiangxiang; Xia, Ningshao

    2017-12-26

    In recent years, to infer phylogenies, which are NP-hard problems, more and more research has focused on using metaheuristics. Maximum Parsimony and Maximum Likelihood are two effective ways to conduct inference. Based on these methods, which can also be considered as the optimal criteria for phylogenies, various kinds of multi-objective metaheuristics have been used to reconstruct phylogenies. However, combining these two time-consuming methods results in those multi-objective metaheuristics being slower than a single objective. Therefore, we propose a novel, multi-objective optimization algorithm, MOEA-RC, to accelerate the processes of rebuilding phylogenies using structural information of elites in current populations. We compare MOEA-RC with two representative multi-objective algorithms, MOEA/D and NSGA-II, and a non-consensus version of MOEA-RC on three real-world datasets. The result is that, within a given number of iterations, MOEA-RC achieves better solutions than the other algorithms.

  1. AD-LIBS: inferring ancestry across hybrid genomes using low-coverage sequence data.

    PubMed

    Schaefer, Nathan K; Shapiro, Beth; Green, Richard E

    2017-04-04

    Inferring the ancestry of each region of admixed individuals' genomes is useful in studies ranging from disease gene mapping to speciation genetics. Current methods require high-coverage genotype data and phased reference panels, and are therefore inappropriate for many data sets. We present a software application, AD-LIBS, that uses a hidden Markov model to infer ancestry across hybrid genomes without requiring variant calling or phasing. This approach is useful for non-model organisms and in cases of low-coverage data, such as ancient DNA. We demonstrate the utility of AD-LIBS with synthetic data. We then use AD-LIBS to infer ancestry in two published data sets: European human genomes with Neanderthal ancestry and brown bear genomes with polar bear ancestry. AD-LIBS correctly infers 87-91% of ancestry in simulations and produces ancestry maps that agree with published results and global ancestry estimates in humans. In brown bears, we find more polar bear ancestry than has been published previously, using both AD-LIBS and an existing software application for local ancestry inference, HAPMIX. We validate AD-LIBS polar bear ancestry maps by recovering a geographic signal within bears that mirrors what is seen in SNP data. Finally, we demonstrate that AD-LIBS is more effective than HAPMIX at inferring ancestry when preexisting phased reference data are unavailable and genomes are sequenced to low coverage. AD-LIBS is an effective tool for ancestry inference that can be used even when few individuals are available for comparison or when genomes are sequenced to low coverage. AD-LIBS is therefore likely to be useful in studies of non-model or ancient organisms that lack large amounts of genomic DNA. AD-LIBS can therefore expand the range of studies in which admixture mapping is a viable tool.

  2. Cultural effects on the association between election outcomes and face-based trait inferences

    PubMed Central

    Adolphs, Ralph; Alvarez, R. Michael

    2017-01-01

    How competent a politician looks, as assessed in the laboratory, is correlated with whether the politician wins in real elections. This finding has led many to investigate whether the association between candidate appearances and election outcomes transcends cultures. However, these studies have largely focused on European countries and Caucasian candidates. To the best of our knowledge, there are only four cross-cultural studies that have directly investigated how face-based trait inferences correlate with election outcomes across Caucasian and Asian cultures. These prior studies have provided some initial evidence regarding cultural differences, but methodological problems and inconsistent findings have complicated our understanding of how culture mediates the effects of candidate appearances on election outcomes. Additionally, these four past studies have focused on positive traits, with a relative neglect of negative traits, resulting in an incomplete picture of how culture may impact a broader range of trait inferences. To study Caucasian-Asian cultural effects with a more balanced experimental design, and to explore a more complete profile of traits, here we compared how Caucasian and Korean participants’ inferences of positive and negative traits correlated with U.S. and Korean election outcomes. Contrary to previous reports, we found that inferences of competence (made by participants from both cultures) correlated with both U.S. and Korean election outcomes. Inferences of open-mindedness and threat, two traits neglected in previous cross-cultural studies, were correlated with Korean but not U.S. election outcomes. This differential effect was found in trait judgments made by both Caucasian and Korean participants. Interestingly, the faster the participants made face-based trait inferences, the more strongly those inferences were correlated with real election outcomes. These findings provide new insights into cultural effects and the difficult question of

  3. Cultural effects on the association between election outcomes and face-based trait inferences.

    PubMed

    Lin, Chujun; Adolphs, Ralph; Alvarez, R Michael

    2017-01-01

    How competent a politician looks, as assessed in the laboratory, is correlated with whether the politician wins in real elections. This finding has led many to investigate whether the association between candidate appearances and election outcomes transcends cultures. However, these studies have largely focused on European countries and Caucasian candidates. To the best of our knowledge, there are only four cross-cultural studies that have directly investigated how face-based trait inferences correlate with election outcomes across Caucasian and Asian cultures. These prior studies have provided some initial evidence regarding cultural differences, but methodological problems and inconsistent findings have complicated our understanding of how culture mediates the effects of candidate appearances on election outcomes. Additionally, these four past studies have focused on positive traits, with a relative neglect of negative traits, resulting in an incomplete picture of how culture may impact a broader range of trait inferences. To study Caucasian-Asian cultural effects with a more balanced experimental design, and to explore a more complete profile of traits, here we compared how Caucasian and Korean participants' inferences of positive and negative traits correlated with U.S. and Korean election outcomes. Contrary to previous reports, we found that inferences of competence (made by participants from both cultures) correlated with both U.S. and Korean election outcomes. Inferences of open-mindedness and threat, two traits neglected in previous cross-cultural studies, were correlated with Korean but not U.S. election outcomes. This differential effect was found in trait judgments made by both Caucasian and Korean participants. Interestingly, the faster the participants made face-based trait inferences, the more strongly those inferences were correlated with real election outcomes. These findings provide new insights into cultural effects and the difficult question of

  4. Children's and Adults' Judgments of the Certainty of Deductive Inferences, Inductive Inferences, and Guesses

    ERIC Educational Resources Information Center

    Pillow, Bradford H.; Pearson, RaeAnne M.; Hecht, Mary; Bremer, Amanda

    2010-01-01

    Children and adults rated their own certainty following inductive inferences, deductive inferences, and guesses. Beginning in kindergarten, participants rated deductions as more certain than weak inductions or guesses. Deductions were rated as more certain than strong inductions beginning in Grade 3, and fourth-grade children and adults…

  5. It's a Girl! Random Numbers, Simulations, and the Law of Large Numbers

    ERIC Educational Resources Information Center

    Goodwin, Chris; Ortiz, Enrique

    2015-01-01

    Modeling using mathematics and making inferences about mathematical situations are becoming more prevalent in most fields of study. Descriptive statistics cannot be used to generalize about a population or make predictions of what can occur. Instead, inference must be used. Simulation and sampling are essential in building a foundation for…
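
    The role of simulation described in this abstract can be sketched with a short program. The example below is an assumption-laden illustration, not material from the article: it simulates independent births with a hypothetical probability `p_girl` of a girl and tracks the running proportion, which settles toward `p_girl` as the number of births grows (the law of large numbers).

```python
import random


def running_proportion_of_girls(n_births, p_girl=0.5, seed=0):
    """Simulate n_births independent births and return the running
    proportion of girls after each birth."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    girls = 0
    proportions = []
    for i in range(1, n_births + 1):
        if rng.random() < p_girl:
            girls += 1
        proportions.append(girls / i)
    return proportions


if __name__ == "__main__":
    props = running_proportion_of_girls(100_000)
    # Print the running proportion at increasing sample sizes.
    for n in (10, 100, 1_000, 10_000, 100_000):
        print(n, round(props[n - 1], 4))
```

    Early proportions can sit far from `p_girl`; the point of the exercise is how the spread narrows as the sample grows.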

  6. Inferences about population dynamics from count data using multistate models: a comparison to capture–recapture approaches

    PubMed Central

    Zipkin, Elise F; Sillett, T Scott; Grant, Evan H Campbell; Chandler, Richard B; Royle, J Andrew

    2014-01-01

    intensive data collection efforts (such as capture–recapture). Integrated population models that combine data from both intensive and extensive sources are likely to be the most efficient approach for estimating demographic rates at large spatial and temporal scales. PMID:24634726

  7. Metapopulation models for historical inference.

    PubMed

    Wakeley, John

    2004-04-01

    The genealogical process for a sample from a metapopulation, in which local populations are connected by migration and can undergo extinction and subsequent recolonization, is shown to have a relatively simple structure in the limit as the number of populations in the metapopulation approaches infinity. The result, which is an approximation to the ancestral behaviour of samples from a metapopulation with a large number of populations, is the same as that previously described for other metapopulation models, namely that the genealogical process is closely related to Kingman's unstructured coalescent. The present work considers a more general class of models that includes two kinds of extinction and recolonization, and the possibility that gamete production precedes extinction. In addition, following other recent work, this result for a metapopulation divided into many populations is shown to hold both for finite population sizes and in the usual diffusion limit, which assumes that population sizes are large. Examples illustrate when the usual diffusion limit is appropriate and when it is not. Some shortcomings and extensions of the model are considered, and the relevance of such models to understanding human history is discussed.

  8. Safe Upper-Bounds Inference of Energy Consumption for Java Bytecode Applications

    NASA Technical Reports Server (NTRS)

    Navas, Jorge; Mendez-Lojo, Mario; Hermenegildo, Manuel V.

    2008-01-01

    Many space applications such as sensor networks, on-board satellite-based platforms, on-board vehicle monitoring systems, etc. handle large amounts of data and analysis of such data is often critical for the scientific mission. Transmitting such large amounts of data to the remote control station for analysis is usually too expensive for time-critical applications. Instead, modern space applications are increasingly relying on autonomous on-board data analysis. All these applications face many resource constraints. A key requirement is to minimize energy consumption. Several approaches have been developed for estimating the energy consumption of such applications (e.g. [3, 1]) based on measuring actual consumption at run-time for large sets of random inputs. However, this approach has the limitation that it is in general not possible to cover all possible inputs. Using formal techniques offers the potential for inferring safe energy consumption bounds, thus being specially interesting for space exploration and safety-critical systems. We have proposed and implemented a general framework for resource usage analysis of Java bytecode [2]. The user defines a set of resource(s) of interest to be tracked and some annotations that describe the cost of some elementary elements of the program for those resources. These values can be constants or, more generally, functions of the input data sizes. The analysis then statically derives an upper bound on the amount of those resources that the program as a whole will consume or provide, also as functions of the input data sizes. This article develops a novel application of the analysis of [2] to inferring safe upper bounds on the energy consumption of Java bytecode applications. We first use a resource model that describes the cost of each bytecode instruction in terms of the joules it consumes. With this resource model, we then generate energy consumption cost relations, which are then used to infer safe upper bounds. How

  9. Inference of gene regulatory networks from time series by Tsallis entropy

    PubMed Central

    2011-01-01

    Background: The inference of gene regulatory networks (GRNs) from large-scale expression profiles is one of the most challenging problems of Systems Biology nowadays. Many techniques and models have been proposed for this task. However, it is not generally possible to recover the original topology with great accuracy, mainly due to the short time series data in face of the high complexity of the networks and the intrinsic noise of the expression measurements. In order to improve the accuracy of GRNs inference methods based on entropy (mutual information), a new criterion function is here proposed. Results: In this paper we introduce the use of generalized entropy proposed by Tsallis, for the inference of GRNs from time series expression profiles. The inference process is based on a feature selection approach and the conditional entropy is applied as criterion function. In order to assess the proposed methodology, the algorithm is applied to recover the network topology from temporal expressions generated by an artificial gene network (AGN) model as well as from the DREAM challenge. The adopted AGN is based on theoretical models of complex networks and its gene transference function is obtained from random drawing on the set of possible Boolean functions, thus creating its dynamics. On the other hand, DREAM time series data presents variation of network size and its topologies are based on real networks. The dynamics are generated by continuous differential equations with noise and perturbation. By adopting both data sources, it is possible to estimate the average quality of the inference with respect to different network topologies, transfer functions and network sizes. Conclusions: A remarkable improvement of accuracy was observed in the experimental results by reducing the number of false connections in the inferred topology by the non-Shannon entropy. The obtained best free parameter of the Tsallis entropy was on average in the range 2.5 ≤ q ≤ 3.5 (hence
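
    For readers unfamiliar with the generalized entropy mentioned in this abstract, the sketch below evaluates the standard Tsallis entropy S_q(p) = (1 - Σ p_i^q) / (q - 1) of a discrete distribution, falling back to Shannon entropy (natural logarithm) at q = 1. It shows only the textbook definition, not the paper's conditional-entropy criterion function; the function name and the restriction to q > 0 are choices made for this illustration.

```python
import math


def tsallis_entropy(probs, q):
    """Tsallis entropy of a discrete probability distribution.

    probs : non-negative probabilities summing to 1
    q     : entropic index; restricted to q > 0 in this sketch
    """
    if q <= 0:
        raise ValueError("this sketch only handles q > 0")
    if any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")

    if math.isclose(q, 1.0):
        # Limit q -> 1 recovers Shannon entropy in nats.
        return -sum(p * math.log(p) for p in probs if p > 0)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


if __name__ == "__main__":
    uniform = [0.25, 0.25, 0.25, 0.25]
    for q in (1.0, 2.5, 3.0, 3.5):
        print(q, tsallis_entropy(uniform, q))
```

    A distribution concentrated on a single outcome has zero entropy for every q, which is a quick sanity check on any implementation.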

  10. Boosting Bayesian parameter inference of stochastic differential equation models with methods from statistical physics

    NASA Astrophysics Data System (ADS)

    Albert, Carlo; Ulzega, Simone; Stoop, Ruedi

    2016-04-01

    Measured time-series of both precipitation and runoff are known to exhibit highly non-trivial statistical properties. For making reliable probabilistic predictions in hydrology, it is therefore desirable to have stochastic models with output distributions that share these properties. When parameters of such models have to be inferred from data, we also need to quantify the associated parametric uncertainty. For non-trivial stochastic models, however, this latter step is typically very demanding, both conceptually and numerically, and almost never done in hydrology. Here, we demonstrate that methods developed in statistical physics make a large class of stochastic differential equation (SDE) models amenable to a full-fledged Bayesian parameter inference. For concreteness we demonstrate these methods by means of a simple yet non-trivial toy SDE model. We consider a natural catchment that can be described by a linear reservoir, at the scale of observation. All the neglected processes are assumed to happen at much shorter time-scales and are therefore modeled with a Gaussian white noise term, the standard deviation of which is assumed to scale linearly with the system state (water volume in the catchment). Even for constant input, the outputs of this simple non-linear SDE model show a wealth of desirable statistical properties, such as fat-tailed distributions and long-range correlations. Standard algorithms for Bayesian inference fail, for models of this kind, because their likelihood functions are extremely high-dimensional intractable integrals over all possible model realizations. The use of Kalman filters is illegitimate due to the non-linearity of the model. Particle filters could be used but become increasingly inefficient with growing number of data points. Hamiltonian Monte Carlo algorithms allow us to translate this inference problem to the problem of simulating the dynamics of a statistical mechanics system and give us access to most sophisticated methods
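
    The toy model described in this abstract can be made concrete with a forward simulation. The sketch below assumes the SDE takes the form dV = (r - V/k) dt + σ V dW, which is one reading of "linear reservoir with noise whose standard deviation scales linearly with the state"; the abstract does not give the equation, so this form, the parameter names, and the Euler-Maruyama discretization are all assumptions. It simulates the model only and does not perform the Bayesian inference the authors describe.

```python
import math
import random


def simulate_reservoir(v0, inflow, k, sigma, dt, n_steps, seed=0):
    """Euler-Maruyama simulation of dV = (inflow - V/k) dt + sigma * V dW.

    v0      : initial water volume
    inflow  : constant input rate r (assumed constant here)
    k       : reservoir retention time
    sigma   : noise intensity; noise std is sigma * V
    dt      : time step
    n_steps : number of steps to simulate
    """
    rng = random.Random(seed)  # fixed seed for repeatable runs
    v = v0
    path = [v]
    for _ in range(n_steps):
        drift = inflow - v / k
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        v = v + drift * dt + sigma * v * dw
        v = max(v, 0.0)  # discrete steps can overshoot below zero; clip
        path.append(v)
    return path


if __name__ == "__main__":
    path = simulate_reservoir(v0=1.0, inflow=1.0, k=2.0, sigma=0.3,
                              dt=0.01, n_steps=10_000)
    print(min(path), max(path), path[-1])
```

    Setting `sigma=0` turns the scheme into a plain Euler integration of the deterministic reservoir, whose volume relaxes to `inflow * k`.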

  11. Temporal dynamics and impact of event interactions in cyber-social populations

    NASA Astrophysics Data System (ADS)

    Zhang, Yi-Qing; Li, Xiang

    2013-03-01

    The advance of information technologies provides powerful measures to digitize social interactions and facilitate quantitative investigations. To explore large-scale indoor interactions of a social population, we analyze 18 715 users' Wi-Fi access logs recorded in a Chinese university campus during 3 months, and define event interaction (EI) to characterize the concurrent interactions of multiple users inferred by their geographic coincidences: co-locating in the same small region at the same time. We propose three rules to construct a transmission graph, which depicts the topological and temporal features of event interactions. The vertex dynamics of the transmission graph show that the active durations of EIs follow truncated power-law distributions, which are independent of the number of involved individuals. The edge dynamics of the transmission graph show that the transmission durations present a truncated power-law pattern independent of the daily and weekly periodicities. Besides, in the aggregated transmission graph, low-degree vertices previously neglected in the aggregated static networks may participate in the large-degree EIs, which is verified by three data sets covering different sizes of social populations with various rendezvouses. This work highlights the temporal significance of event interactions in cyber-social populations.

  12. Inferring species divergence times using pairwise sequential Markovian coalescent modelling and low-coverage genomic data.

    PubMed

    Cahill, James A; Soares, André E R; Green, Richard E; Shapiro, Beth

    2016-07-19

    Understanding when species diverged aids in identifying the drivers of speciation, but the end of gene flow between populations can be difficult to ascertain from genetic data. We explore the use of pairwise sequential Markovian coalescent (PSMC) modelling to infer the timing of divergence between species and populations. PSMC plots generated using artificial hybrid genomes show rapid increases in effective population size at the time when the two parent lineages diverge, and this approach has been used previously to infer divergence between human lineages. We show that, even without high coverage or phased input data, PSMC can detect the end of significant gene flow between populations by comparing the PSMC output from artificial hybrids to the output of simulations with known demographic histories. We then apply PSMC to detect divergence times among lineages within two real datasets: great apes and bears within the genus Ursus. Our results confirm most previously proposed divergence times for these lineages, and suggest that gene flow between recently diverged lineages may have been common among bears and great apes, including up to one million years of continued gene flow between chimpanzees and bonobos after the formation of the Congo River. This article is part of the themed issue 'Dating species divergences using rocks and clocks'. © 2016 The Author(s).

  13. Explosive genetic evidence for explosive human population growth

    PubMed Central

    Gao, Feng; Keinan, Alon

    2016-01-01

    The advent of next-generation sequencing technology has allowed the collection of vast amounts of genetic variation data. A recurring discovery from studying larger and larger samples of individuals has been the extreme, previously unexpected, excess of very rare genetic variants, which has been shown to be mostly due to the recent explosive growth of human populations. Here, we review recent literature that inferred recent changes in population size in different human populations and with different methodologies, with many pointing to recent explosive growth, especially in European populations for which more data has been available. We also review the state-of-the-art methods and software for the inference of historical population size changes that led to these discoveries. Finally, we discuss the implications of recent population growth on personalized genomics, on purifying selection in the non-equilibrium state it entails and, as a consequence, on the genetic architecture underlying complex disease and the performance of mapping methods in discovering rare variants that contribute to complex disease risk. PMID:27710906

  14. Genetic diversity and population structure inferred from the partially duplicated genome of domesticated carp, Cyprinus carpio L.

    PubMed

    David, Lior; Rosenberg, Noah A; Lavi, Uri; Feldman, Marcus W; Hillel, Jossi

    2007-01-01

    Genetic relationships among eight populations of domesticated carp (Cyprinus carpio L.), a species with a partially duplicated genome, were studied using 12 microsatellites and 505 AFLP bands. The populations included three aquacultured carp strains and five ornamental carp (koi) variants. Grass carp (Ctenopharyngodon idella) was used as an outgroup. AFLP-based gene diversity varied from 5% (grass carp) to 32% (koi) and reflected the reasonably well understood histories and breeding practices of the populations. A large fraction of the molecular variance was due to differences between aquacultured and ornamental carps. Further analyses based on microsatellite data, including cluster analysis and neighbor-joining trees, supported the genetic distinctiveness of aquacultured and ornamental carps, despite the recent divergence of the two groups. In contrast to what was observed for AFLP-based diversity, the frequency of heterozygotes based on microsatellites was comparable among all populations. This discrepancy can potentially be explained by duplication of some loci in Cyprinus carpio L., and a model that shows how duplication can increase heterozygosity estimates for microsatellites but not for AFLP loci is discussed. Our analyses in carp can help in understanding the consequences of genotyping duplicated loci and in interpreting discrepancies between dominant and co-dominant markers in species with recent genome duplication.

  15. Genetic diversity and population structure inferred from the partially duplicated genome of domesticated carp, Cyprinus carpio L.

    PubMed Central

    David, Lior; Rosenberg, Noah A; Lavi, Uri; Feldman, Marcus W; Hillel, Jossi

    2007-01-01

    Genetic relationships among eight populations of domesticated carp (Cyprinus carpio L.), a species with a partially duplicated genome, were studied using 12 microsatellites and 505 AFLP bands. The populations included three aquacultured carp strains and five ornamental carp (koi) variants. Grass carp (Ctenopharyngodon idella) was used as an outgroup. AFLP-based gene diversity varied from 5% (grass carp) to 32% (koi) and reflected the reasonably well understood histories and breeding practices of the populations. A large fraction of the molecular variance was due to differences between aquacultured and ornamental carps. Further analyses based on microsatellite data, including cluster analysis and neighbor-joining trees, supported the genetic distinctiveness of aquacultured and ornamental carps, despite the recent divergence of the two groups. In contrast to what was observed for AFLP-based diversity, the frequency of heterozygotes based on microsatellites was comparable among all populations. This discrepancy can potentially be explained by duplication of some loci in Cyprinus carpio L., and a model that shows how duplication can increase heterozygosity estimates for microsatellites but not for AFLP loci is discussed. Our analyses in carp can help in understanding the consequences of genotyping duplicated loci and in interpreting discrepancies between dominant and co-dominant markers in species with recent genome duplication. PMID:17433244

  16. When is an image a health claim? A false-recollection method to detect implicit inferences about products' health benefits.

    PubMed

    Klepacz, Naomi A; Nash, Robert A; Egan, M Bernadette; Hodgkins, Charo E; Raats, Monique M

    2016-08-01

    Images on food and dietary supplement packaging might lead people to infer (appropriately or inappropriately) certain health benefits of those products. Research on this issue largely involves direct questions, which could (a) elicit inferences that would not be made unprompted, and (b) fail to capture inferences made implicitly. Using a novel memory-based method, in the present research, we explored whether packaging imagery elicits health inferences without prompting, and the extent to which these inferences are made implicitly. In 3 experiments, participants saw fictional product packages accompanied by written claims. Some packages contained an image that implied a health-related function (e.g., a brain), and some contained no image. Participants studied these packages and claims, and subsequently their memory for seen and unseen claims was tested. When a health image was featured on a package, participants often subsequently recognized health claims that, despite being implied by the image, were not truly presented. In Experiment 2, these recognition errors persisted despite an explicit warning against treating the images as informative. In Experiment 3, these findings were replicated in a large consumer sample from 5 European countries, and with a cued-recall test. These findings confirm that images can act as health claims, by leading people to infer health benefits without prompting. These inferences appear often to be implicit, and could therefore be highly pervasive. The data underscore the importance of regulating imagery on product packaging; memory-based methods represent innovative ways to measure how leading (or misleading) specific images can be. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  17. Generating Models of Infinite-State Communication Protocols Using Regular Inference with Abstraction

    NASA Astrophysics Data System (ADS)

    Aarts, Fides; Jonsson, Bengt; Uijen, Johan

    In order to facilitate model-based verification and validation, effort is underway to develop techniques for generating models of communication system components from observations of their external behavior. Most previous such work has employed regular inference techniques which generate modest-size finite-state models. They typically suppress parameters of messages, although these have a significant impact on control flow in many communication protocols. We present a framework, which adapts regular inference to include data parameters in messages and states for generating components with large or infinite message alphabets. A main idea is to adapt the framework of predicate abstraction, successfully used in formal verification. Since we are in a black-box setting, the abstraction must be supplied externally, using information about how the component manages data parameters. We have implemented our techniques by connecting the LearnLib tool for regular inference with the protocol simulator ns-2, and generated a model of the SIP component as implemented in ns-2.

  18. Forward and backward inference in spatial cognition.

    PubMed

    Penny, Will D; Zeidman, Peter; Burgess, Neil

    2013-01-01

    This paper shows that the various computations underlying spatial cognition can be implemented using statistical inference in a single probabilistic model. Inference is implemented using a common set of 'lower-level' computations involving forward and backward inference over time. For example, to estimate where you are in a known environment, forward inference is used to optimally combine location estimates from path integration with those from sensory input. To decide which way to turn to reach a goal, forward inference is used to compute the likelihood of reaching that goal under each option. To work out which environment you are in, forward inference is used to compute the likelihood of sensory observations under the different hypotheses. For reaching sensory goals that require a chaining together of decisions, forward inference can be used to compute a state trajectory that will lead to that goal, and backward inference to refine the route and estimate control signals that produce the required trajectory. We propose that these computations are reflected in recent findings of pattern replay in the mammalian brain. Specifically, that theta sequences reflect decision making, theta flickering reflects model selection, and remote replay reflects route and motor planning. We also propose a mapping of the above computational processes onto lateral and medial entorhinal cortex and hippocampus.
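
    Illustrative sketch (not taken from the paper): the forward and backward passes the abstract refers to can be shown on a small discrete hidden Markov model. The two "locations", the transition and emission tables, and the function name forward_backward are all hypothetical, chosen only to make the recursion concrete. The forward pass accumulates the likelihood of the observations so far, and the backward pass refines the state estimate at each earlier time step.

```python
def forward_backward(init, trans, emit, obs):
    """Return (posteriors, likelihood) for a discrete hidden Markov model.

    init[i]     : P(state_0 = i)
    trans[i][j] : P(state_{t+1} = j | state_t = i)
    emit[i][o]  : P(observation = o | state = i)
    obs         : list of observed symbol indices
    """
    n, T = len(init), len(obs)

    # Forward pass: alpha[t][i] = P(obs[0..t], state_t = i)
    alpha = [[init[i] * emit[i][obs[0]] for i in range(n)]]
    for t in range(1, T):
        prev = alpha[-1]
        alpha.append([
            emit[j][obs[t]] * sum(prev[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ])

    # Backward pass: beta[t][i] = P(obs[t+1..T-1] | state_t = i)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [
            sum(trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j]
                for j in range(n))
            for i in range(n)
        ]

    # Likelihood of the whole observation sequence (forward inference).
    likelihood = sum(alpha[-1])

    # Smoothed state posteriors combine both passes.
    posteriors = [
        [alpha[t][i] * beta[t][i] / likelihood for i in range(n)]
        for t in range(T)
    ]
    return posteriors, likelihood


# Hypothetical two-location environment with two possible sensory cues.
INIT = [0.5, 0.5]
TRANS = [[0.9, 0.1], [0.2, 0.8]]
EMIT = [[0.8, 0.2], [0.3, 0.7]]
OBS = [0, 0, 1]

posteriors, likelihood = forward_backward(INIT, TRANS, EMIT, OBS)
for t, row in enumerate(posteriors):
    print(t, [round(p, 3) for p in row])
```

    Comparing the likelihood returned for different candidate models is one simple way to picture the "which environment am I in" computation described above; the paper's own model is richer than this toy example.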

  19. Galactic dual population models of gamma-ray bursts

    NASA Technical Reports Server (NTRS)

    Higdon, J. C.; Lingenfelter, R. E.

    1994-01-01

    We investigate in more detail the properties of two-population models for gamma-ray bursts in the galactic disk and halo. We calculate the gamma-ray burst statistical properties, the mean values $\langle V/V_{\max}\rangle$, $\langle\cos\Theta\rangle$, and $\langle\sin^{2} b\rangle$, as functions of the detection flux threshold for bursts coming from both Galactic disk and massive halo populations. We consider halo models inferred from the observational constraints on the large-scale Galactic structure and we compare the expected values of $\langle V/V_{\max}\rangle$, $\langle\cos\Theta\rangle$, and $\langle\sin^{2} b\rangle$ with those measured by Burst and Transient Source Experiment (BATSE) and other detectors. We find that the measured values are consistent with solely Galactic populations having a range of halo distributions, mixed with local disk distributions, which can account for as much as approximately 25% of the observed BATSE bursts. M31 does not contribute to these modeled bursts. We also demonstrate, contrary to recent arguments, that the size-frequency distributions of dual population models are quite consistent with the BATSE observations.

  20. Inference of Gorilla Demographic and Selective History from Whole-Genome Sequence Data

    PubMed Central

    McManus, Kimberly F.; Kelley, Joanna L.; Song, Shiya; Veeramah, Krishna R.; Woerner, August E.; Stevison, Laurie S.; Ryder, Oliver A.; Ape Genome Project, Great; Kidd, Jeffrey M.; Wall, Jeffrey D.; Bustamante, Carlos D.; Hammer, Michael F.

    2015-01-01

    Although population-level genomic sequence data have been gathered extensively for humans, similar data from our closest living relatives are just beginning to emerge. Examination of genomic variation within great apes offers many opportunities to increase our understanding of the forces that have differentially shaped the evolutionary history of hominid taxa. Here, we expand upon the work of the Great Ape Genome Project by analyzing medium to high coverage whole-genome sequences from 14 western lowland gorillas (Gorilla gorilla gorilla), 2 eastern lowland gorillas (G. beringei graueri), and a single Cross River individual (G. gorilla diehli). We infer that the ancestors of western and eastern lowland gorillas diverged from a common ancestor approximately 261 ka, and that the ancestors of the Cross River population diverged from the western lowland gorilla lineage approximately 68 ka. Using a diffusion approximation approach to model the genome-wide site frequency spectrum, we infer a history of western lowland gorillas that includes an ancestral population expansion of 1.4-fold around 970 ka and a recent 5.6-fold contraction in population size 23 ka. The latter may correspond to a major reduction in African equatorial forests around the Last Glacial Maximum. We also analyze patterns of variation among western lowland gorillas to identify several genomic regions with strong signatures of recent selective sweeps. We find that processes related to taste, pancreatic and saliva secretion, sodium ion transmembrane transport, and cardiac muscle function are overrepresented in genomic regions predicted to have experienced recent positive selection. PMID:25534031

  1. PedNavigator: a pedigree drawing servlet for large and inbred populations.

    PubMed

    Mancosu, Gianmaria; Ledda, Giuseppe; Melis, Paola M

    2003-03-22

    PedNavigator is a pedigree drawing application for large and complex pedigrees. It has been developed especially for genetic and epidemiological studies of isolated populations characterized by high inbreeding and multiple matrimonies. PedNavigator is written in Java and is intended as a server-side web application, allowing researchers to 'walk' through family ties by point-and-clicking on person's symbols. The application is able to enrich the pedigree drawings with genotypic and phenotypic information taken from the underlying relational database.

  2. When can the cause of a population decline be determined?

    USGS Publications Warehouse

    Hefley, Trevor J.; Hooten, Mevin B.; Drake, John M.; Russell, Robin E.; Walsh, Daniel P.

    2016-01-01

    Inferring the factors responsible for declines in abundance is a prerequisite to preventing the extinction of wild populations. Many of the policies and programmes intended to prevent extinctions operate on the assumption that the factors driving the decline of a population can be determined. Exogenous factors that cause declines in abundance can be statistically confounded with endogenous factors such as density dependence. To demonstrate the potential for confounding, we used an experiment where replicated populations were driven to extinction by gradually manipulating habitat quality. In many of the replicated populations, habitat quality and density dependence were confounded, which obscured causal inference. Our results show that confounding is likely to occur when the exogenous factors that are driving the decline change gradually over time. Our study has direct implications for wild populations, because many factors that could drive a population to extinction change gradually through time.

  3. Method of fuzzy inference for one class of MISO-structure systems with non-singleton inputs

    NASA Astrophysics Data System (ADS)

    Sinuk, V. G.; Panchenko, M. V.

    2018-03-01

    In fuzzy modeling, the inputs of the simulated systems can receive both crisp values and non-Singleton. Computational complexity of fuzzy inference with fuzzy non-Singleton inputs corresponds to an exponential. This paper describes a new method of inference, based on the theorem of decomposition of a multidimensional fuzzy implication and a fuzzy truth value. This method is considered for fuzzy inputs and has a polynomial complexity, which makes it possible to use it for modeling large-dimensional MISO-structure systems.

  4. Combining phylogenetic and demographic inferences to assess the origin of the genetic diversity in an isolated wolf population

    PubMed Central

    Fabbri, Elena; Ahmed, Atidje; Bolfíková, Barbora Černá; Czarnomska, Sylwia D.; Galov, Ana; Godinho, Raquel; Hindrikson, Maris; Hulva, Pavel; Jędrzejewska, Bogumiła; Jelenčič, Maja; Kutal, Miroslav; Saarma, Urmas; Skrbinšek, Tomaž; Randi, Ettore

    2017-01-01

    The survival of isolated small populations is threatened by both demographic and genetic factors. Large carnivores declined for centuries in most of Europe due to habitat changes, overhunting of their natural prey and direct persecution. However, the current rewilding trends are driving many carnivore populations to expand again, possibly reverting the erosion of their genetic diversity. In this study we reassessed the extent and origin of the genetic variation of the Italian wolf population, which is expanding after centuries of decline and isolation. We genotyped wolves from Italy and nine other populations at four mtDNA regions (control-region, ATP6, COIII and ND4) and 39 autosomal microsatellites. Results of phylogenetic analyses and assignment procedures confirmed in the Italian wolves a second private mtDNA haplotype, which belongs to a haplogroup distributed mostly in southern Europe. Coalescent analyses showed that the unique mtDNA haplotypes in the Italian wolves likely originated during the late Pleistocene. ABC simulations concordantly showed that the extant wolf populations in Italy and in south-western Europe started to be isolated and declined right after the last glacial maximum. Thus, the standing genetic variation in the Italian wolves principally results from the historical isolation south of the Alps. PMID:28489863

  5. Population Explosion in the Yellow-Spined Bamboo Locust Ceracris kiangsu and Inferences for the Impact of Human Activity

    PubMed Central

    Fan, Zhou; Jiang, Guo-Fang; Liu, Yu-Xiang; He, Qi-Xin; Blanchard, Benjamin

    2014-01-01

    Geographic distance and geographical barriers likely play a considerable role in structuring genetic variation in species, although some migratory species may have less phylogeographic structure on a smaller spatial scale. Here, genetic diversity and the phylogenetic structure among geographical populations of the yellow-spined bamboo locust, Ceracris kiangsu, were examined with 16S rDNA and amplified fragment length polymorphisms (AFLPs). In this study, no conspicuous phylogeographical structure was discovered from either Maximum parsimony (MP) or Neighbor-joining (NJ) phylogenetic analyses. The effect of geographical isolation was not conspicuous on a large spatial scale. At smaller spatial scales, local diversity of some populations within mountainous areas was detected using Nei's genetic distance and AMOVA. There is a high level of genetic diversity and a low genetic differentiation among populations in the C. kiangsu of South and Southeast China. Our analyses indicate that C. kiangsu is a monophyletic group. Our results also support the hypothesis that the C. kiangsu population is in a primary differentiation stage. Given the mismatch distribution, it is likely that a population expansion in C. kiangsu occurred about 0.242 Ma during the Quaternary interglaciation. Based on historical reports, we conjecture that human activities had significant impacts on the C. kiangsu gene flow. PMID:24603526

  6. Feature Inference Learning and Eyetracking

    ERIC Educational Resources Information Center

    Rehder, Bob; Colner, Robert M.; Hoffman, Aaron B.

    2009-01-01

    Besides traditional supervised classification learning, people can learn categories by inferring the missing features of category members. It has been proposed that feature inference learning promotes learning a category's internal structure (e.g., its typical features and interfeature correlations) whereas classification promotes the learning of…

  7. Structural drift: the population dynamics of sequential learning.

    PubMed

    Crutchfield, James P; Whalen, Sean

    2012-01-01

    We introduce a theory of sequential causal inference in which learners in a chain estimate a structural model from their upstream "teacher" and then pass samples from the model to their downstream "student". It extends the population dynamics of genetic drift, recasting Kimura's selectively neutral theory as a special case of a generalized drift process using structured populations with memory. We examine the diffusion and fixation properties of several drift processes and propose applications to learning, inference, and evolution. We also demonstrate how the organization of drift process space controls fidelity, facilitates innovations, and leads to information loss in sequential learning with and without memory.

  8. Inferring social ties from geographic coincidences.

    PubMed

    Crandall, David J; Backstrom, Lars; Cosley, Dan; Suri, Siddharth; Huttenlocher, Daniel; Kleinberg, Jon

    2010-12-28

    We investigate the extent to which social ties between people can be inferred from co-occurrence in time and space: Given that two people have been in approximately the same geographic locale at approximately the same time, on multiple occasions, how likely are they to know each other? Furthermore, how does this likelihood depend on the spatial and temporal proximity of the co-occurrences? Such issues arise in data originating in both online and offline domains as well as settings that capture interfaces between online and offline behavior. Here we develop a framework for quantifying the answers to such questions, and we apply this framework to publicly available data from a social media site, finding that even a very small number of co-occurrences can result in a high empirical likelihood of a social tie. We then present probabilistic models showing how such large probabilities can arise from a natural model of proximity and co-occurrence in the presence of social ties. In addition to providing a method for establishing some of the first quantifiable estimates of these measures, our findings have potential privacy implications, particularly for the ways in which social structures can be inferred from public online records that capture individuals' physical locations over time.
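
    Illustrative sketch (not the authors' model): a naive Bayesian odds update shows how a handful of co-occurrences can push the probability of a social tie from very small to large. The function name tie_posterior, the prior, and the per-event likelihood ratio are hypothetical, and the sketch assumes that co-occurrences are conditionally independent given whether a tie exists, which the paper's probabilistic models do not necessarily assume.

```python
def tie_posterior(prior, likelihood_ratio, k):
    """Posterior probability of a social tie after k co-occurrences.

    prior            : P(tie) before seeing any co-occurrence (0 < prior < 1)
    likelihood_ratio : P(one co-occurrence | tie) / P(one co-occurrence | no tie)
    k                : number of observed co-occurrences

    Assumes co-occurrences are conditionally independent given tie status,
    so each one multiplies the odds by the same likelihood ratio.
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio ** k
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical numbers: ties are rare, but each co-occurrence is far more
# likely between acquaintances than between strangers.
for k in range(5):
    print(k, round(tie_posterior(prior=0.001, likelihood_ratio=50.0, k=k), 4))
```

    With these made-up values the posterior rises steeply within the first few co-occurrences, which is the qualitative effect the abstract reports for real data.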

  9. Spatio-temporal population structuring and genetic diversity retention in depleted Atlantic Bluefin tuna of the Mediterranean Sea

    PubMed Central

    Riccioni, Giulia; Landi, Monica; Ferrara, Giorgia; Milano, Ilaria; Cariani, Alessia; Zane, Lorenzo; Sella, Massimo; Barbujani, Guido; Tinti, Fausto

    2010-01-01

    Fishery genetics have greatly changed our understanding of population dynamics and structuring in marine fish. In this study, we show that the Atlantic Bluefin tuna (ABFT, Thunnus thynnus), an oceanic predatory species exhibiting highly migratory behavior, large population size, and high potential for dispersal during early life stages, displays significant genetic differences over space and time, both at the fine and large scales of variation. We compared microsatellite variation of contemporary (n = 256) and historical (n = 99) biological samples of ABFTs of the central-western Mediterranean Sea, the latter dating back to the early 20th century. Measures of genetic differentiation and a general heterozygote deficit suggest that differences exist among population samples, both now and 96–80 years ago. Thus, ABFTs do not represent a single panmictic population in the Mediterranean Sea. Statistics designed to infer changes in population size, both from current and past genetic variation, suggest that some Mediterranean ABFT populations, although still not severely reduced in their genetic potential, might have suffered from demographic declines. The short-term estimates of effective population size are straddled on the minimum threshold (effective population size = 500) indicated to maintain genetic diversity and evolutionary potential across several generations in natural populations. PMID:20080643

  10. Strong ground motion inferred from liquefaction caused by the 1811-1812 New Madrid, Missouri, earthquakes

    USGS Publications Warehouse

    Holzer, Thomas L.; Noce, Thomas E.; Bennett, Michael J.

    2015-01-01

    Peak ground accelerations (PGAs) in the epicentral region of the 1811–1812 New Madrid, Missouri, earthquakes are inferred from liquefaction to have been no greater than ∼0.35g. PGA is inferred in an 11,380 km² area in the Lower Mississippi Valley in Arkansas and Missouri where liquefaction was extensive in 1811–1812. PGA was inferred by applying liquefaction probability curves, which were originally developed for liquefaction hazard mapping, to detailed maps of liquefaction by Obermeier (1989). The low PGA is inferred because both a shallow (1.5 m deep) water table and a large moment magnitude (M 7.7) earthquake were assumed in the analysis. If a deep (5.0 m) water table and a small magnitude (M 6.8) earthquake are assumed, the maximum inferred PGA is 1.10g. Both inferred PGA values are based on an assumed and poorly constrained correction for sand aging. If an aging correction is not assumed, then the inferred PGA is no greater than 0.22g. A low PGA value may be explained by nonlinear site response. Soils in the study area have an average VS30 of 220±15 m/s. A low inferred PGA is consistent with PGA values estimated from ground‐motion prediction equations that have been proposed for the New Madrid seismic zone when these estimates are corrected for nonlinear soil site effects. This application of liquefaction probability curves demonstrates their potential usefulness in paleoseismology.

  11. Fixation probability of a nonmutator in a large population of asexual mutators.

    PubMed

    Jain, Kavita; James, Ananthu

    2017-11-21

    In an adapted population of mutators in which most mutations are deleterious, a nonmutator that lowers the mutation rate is under indirect selection and can sweep to fixation. Using a multitype branching process, we calculate the fixation probability of a rare nonmutator in a large population of asexual mutators. We show that when beneficial mutations are absent, the fixation probability is a nonmonotonic function of the mutation rate of the mutator: it first increases sublinearly and then decreases exponentially. We also find that beneficial mutations can enhance the fixation probability of a nonmutator. Our analysis is relevant to an understanding of recent experiments in which a reduction in the mutation rates has been observed. Copyright © 2017 Elsevier Ltd. All rights reserved.
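
    Illustrative sketch (a single-type simplification, not the multitype calculation of the paper): in a branching process, the fixation probability of a rare lineage is approximated by its survival probability. Assuming, hypothetically, Poisson-distributed offspring with mean 1 + s, the extinction probability q is the smallest root of q = exp((1 + s)(q - 1)), which the fixed-point iteration below finds starting from q = 0. The function name and parameters are illustrative.

```python
import math


def survival_probability(s, tol=1e-13, max_iter=1_000_000):
    """Survival probability of a single-type branching process.

    Offspring numbers are assumed Poisson with mean 1 + s, where s is the
    (indirect) selective advantage of the rare lineage.
    """
    mean = 1.0 + s
    q = 0.0  # starting from 0 converges to the smallest fixed point
    for _ in range(max_iter):
        # Poisson probability generating function evaluated at q.
        q_next = math.exp(mean * (q - 1.0))
        if abs(q_next - q) < tol:
            q = q_next
            break
        q = q_next
    return 1.0 - q


# For small positive s the result is close to the classical 2s approximation;
# for s < 0 the lineage is lost with probability essentially 1.
for s in (-0.1, 0.01, 0.05):
    print(s, survival_probability(s))
```

    The paper's nonmonotonic dependence on the mutator's mutation rate arises from the multitype structure (lineages accumulating deleterious mutations), which this one-type sketch deliberately leaves out.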

  12. Using GEO Optical Observations to Infer Orbit Populations

    NASA Technical Reports Server (NTRS)

    Matney, Mark; Africano, John

    2002-01-01

    NASA's Orbital Debris measurements program has a goal to characterize the small debris environment in the geosynchronous Earth-orbit (GEO) region using optical telescopes ("small" refers to objects too small to catalog and track with current systems). Traditionally, observations of GEO and near-GEO objects involve following the object with the telescope long enough to obtain an orbit. When observing very dim objects with small field-of-view telescopes, though, the observations are generally too short to obtain accurate orbital elements. However, it is possible to use such observations to statistically characterize the small object environment. A telescope pointed at a particular spot could potentially see objects in a number of different orbits. Inevitably, when looking at one region for certain types of orbits, there are objects in other types of orbits that cannot be seen. Observation campaigns are designed with these limitations in mind and are set up to span a number of regions of the sky, making it possible to sample all potential orbits under consideration. Each orbit is not seen with the same probability, however, so there are observation biases intrinsic to any observation campaign. Fortunately, it is possible to remove such biases and reconstruct a meaningful estimate of the statistical orbit populations of small objects in GEO. This information, in turn, can be used to investigate the nature of debris sources and to characterize the risk to GEO spacecraft. This paper describes these statistical tools and presents estimates of small object GEO populations.
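
    Illustrative sketch (not the paper's statistical tools): one simple way to picture the removal of observation bias is inverse-probability weighting, in which each detected object counts as 1/p objects when its orbit class is seen with probability p. The orbit class labels, the detection probabilities, and the function name debias_counts are hypothetical.

```python
from collections import Counter


def debias_counts(detections, detection_prob):
    """Estimate population size per orbit class from biased detections.

    detections     : iterable of orbit-class labels, one per detected object
    detection_prob : dict mapping each class to the probability (0 < p <= 1)
                     that an object of that class is seen by the campaign
    """
    counts = Counter(detections)
    # Each detection stands in for 1/p objects of its class.
    return {cls: counts[cls] / detection_prob[cls] for cls in counts}


# Hypothetical campaign: low-inclination objects are seen far more easily.
detections = ["low_inclination"] * 8 + ["high_inclination"] * 2
detection_prob = {"low_inclination": 0.8, "high_inclination": 0.1}

print(debias_counts(detections, detection_prob))
```

    A class that is never detected gets no estimate at all here, which reflects the point in the abstract that a campaign must span enough sky regions to sample every orbit type under consideration.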

  13. A Hierarchical Framework for State-Space Matrix Inference and Clustering.

    PubMed

    Zuo, Chandler; Chen, Kailei; Hewitt, Kyle J; Bresnick, Emery H; Keleş, Sündüz

    2016-09-01

    In recent years, a large number of genomic and epigenomic studies have been focusing on the integrative analysis of multiple experimental datasets measured over a large number of observational units. The objectives of such studies include not only inferring a hidden state of activity for each unit over individual experiments, but also detecting highly associated clusters of units based on their inferred states. Although there are a number of methods tailored for specific datasets, there is currently no state-of-the-art modeling framework for this general class of problems. In this paper, we develop the MBASIC (Matrix Based Analysis for State-space Inference and Clustering) framework. MBASIC consists of two parts: state-space mapping and state-space clustering. In state-space mapping, it maps observations onto a finite state-space, representing the activation states of units across conditions. In state-space clustering, MBASIC incorporates a finite mixture model to cluster the units based on their inferred state-space profiles across all conditions. Both the state-space mapping and clustering can be simultaneously estimated through an Expectation-Maximization algorithm. MBASIC flexibly adapts to a large number of parametric distributions for the observed data, as well as the heterogeneity in replicate experiments. It allows for imposing structural assumptions on each cluster, and enables model selection using information criterion. In our data-driven simulation studies, MBASIC showed significant accuracy in recovering both the underlying state-space variables and clustering structures. We applied MBASIC to two genome research problems using large numbers of datasets from the ENCODE project. The first application grouped genes based on transcription factor occupancy profiles of their promoter regions in two different cell types. The second application focused on identifying groups of loci that are similar to a GATA2 binding site that is functional at its

  14. Learning to Observe "and" Infer

    ERIC Educational Resources Information Center

    Hanuscin, Deborah L.; Park Rogers, Meredith A.

    2008-01-01

    Researchers describe the need for students to have multiple opportunities and social interaction to learn about the differences between observation and inference and their role in developing scientific explanations (Harlen 2001; Simpson 2000). Helping children develop their skills of observation and inference in science while emphasizing the…

  15. Causal Inference in Retrospective Studies.

    ERIC Educational Resources Information Center

    Holland, Paul W.; Rubin, Donald B.

    1988-01-01

    The problem of drawing causal inferences from retrospective case-controlled studies is considered. A model for causal inference in prospective studies is applied to retrospective studies. Limitations of case-controlled studies are formulated concerning relevant parameters that can be estimated in such studies. A coffee-drinking/myocardial…

  16. Resolving the ancestry of Austronesian-speaking populations.

    PubMed

    Soares, Pedro A; Trejaut, Jean A; Rito, Teresa; Cavadas, Bruno; Hill, Catherine; Eng, Ken Khong; Mormina, Maru; Brandão, Andreia; Fraser, Ross M; Wang, Tse-Yi; Loo, Jun-Hun; Snell, Christopher; Ko, Tsang-Ming; Amorim, António; Pala, Maria; Macaulay, Vincent; Bulbeck, David; Wilson, James F; Gusmão, Leonor; Pereira, Luísa; Oppenheimer, Stephen; Lin, Marie; Richards, Martin B

    2016-03-01

    There are two very different interpretations of the prehistory of Island Southeast Asia (ISEA), with genetic evidence invoked in support of both. The "out-of-Taiwan" model proposes a major Late Holocene expansion of Neolithic Austronesian speakers from Taiwan. An alternative, proposing that Late Glacial/postglacial sea-level rises triggered largely autochthonous dispersals, accounts for some otherwise enigmatic genetic patterns, but fails to explain the Austronesian language dispersal. Combining mitochondrial DNA (mtDNA), Y-chromosome and genome-wide data, we performed the most comprehensive analysis of the region to date, obtaining highly consistent results across all three systems and allowing us to reconcile the models. We infer a primarily common ancestry for Taiwan/ISEA populations established before the Neolithic, but also detected clear signals of two minor Late Holocene migrations, probably representing Neolithic input from both Mainland Southeast Asia and South China, via Taiwan. This latter may therefore have mediated the Austronesian language dispersal, implying small-scale migration and language shift rather than large-scale expansion.

  17. Category Representation for Classification and Feature Inference

    ERIC Educational Resources Information Center

    Johansen, Mark K.; Kruschke, John K.

    2005-01-01

    This research's purpose was to contrast the representations resulting from learning of the same categories by either classifying instances or inferring instance features. Prior inference learning research, particularly T. Yamauchi and A. B. Markman (1998), has suggested that feature inference learning fosters prototype representation, whereas…

  18. Forward and Backward Inference in Spatial Cognition

    PubMed Central

    Penny, Will D.; Zeidman, Peter; Burgess, Neil

    2013-01-01

    This paper shows that the various computations underlying spatial cognition can be implemented using statistical inference in a single probabilistic model. Inference is implemented using a common set of ‘lower-level’ computations involving forward and backward inference over time. For example, to estimate where you are in a known environment, forward inference is used to optimally combine location estimates from path integration with those from sensory input. To decide which way to turn to reach a goal, forward inference is used to compute the likelihood of reaching that goal under each option. To work out which environment you are in, forward inference is used to compute the likelihood of sensory observations under the different hypotheses. For reaching sensory goals that require a chaining together of decisions, forward inference can be used to compute a state trajectory that will lead to that goal, and backward inference to refine the route and estimate control signals that produce the required trajectory. We propose that these computations are reflected in recent findings of pattern replay in the mammalian brain. Specifically, that theta sequences reflect decision making, theta flickering reflects model selection, and remote replay reflects route and motor planning. We also propose a mapping of the above computational processes onto lateral and medial entorhinal cortex and hippocampus. PMID:24348230

  19. Fair Inference on Outcomes

    PubMed Central

    Nabi, Razieh; Shpitser, Ilya

    2017-01-01

    In this paper, we consider the problem of fair statistical inference involving outcome variables. Examples include classification and regression problems, and estimating treatment effects in randomized trials or observational data. The issue of fairness arises in such problems where some covariates or treatments are “sensitive,” in the sense of having potential of creating discrimination. In this paper, we argue that the presence of discrimination can be formalized in a sensible way as the presence of an effect of a sensitive covariate on the outcome along certain causal pathways, a view which generalizes (Pearl 2009). A fair outcome model can then be learned by solving a constrained optimization problem. We discuss a number of complications that arise in classical statistical inference due to this view and provide workarounds based on recent work in causal and semi-parametric inference.

  20. Productivity and abundance of large sponge populations on Flinders Reef flats, Coral Sea

    NASA Astrophysics Data System (ADS)

    Wilkinson, Clive R.

    1987-04-01

    Large populations of flattened sponges with cyanobacterial symbionts were observed on the shallow reef-flats of the Flinders Reefs, Coral Sea. Estimates of these populations indicated as many as 60 individuals with a total wet biomass of 1.2 kg per m² in some areas. Along a metre wide transect across 1.3 km of reef flat the population was estimated at 530 kg wet weight sponge (mean 411 g m⁻²). The four prominent species had instantaneous P/R ratios between 1.3 and 1.8 at optimum light such that photosynthetic productivity was calculated to provide between 61 and 80% of sponge energy requirements in summer and 48 to 64% in winter. While such sponge beds are a prominent feature of these reefs, they appear to contribute less than 10% of gross reef-flat productivity.

  1. Inferred global connectivity of whale shark Rhincodon typus populations.

    PubMed

    Sequeira, A M M; Mellin, C; Meekan, M G; Sims, D W; Bradshaw, C J A

    2013-02-01

    Ten years have passed since the last synopsis of whale shark Rhincodon typus biogeography. While a recent review of the species' biology and ecology summarized the vast data collected since then, it is clear that information on population geographic connectivity, migration and demography of R. typus is still limited and scattered. Understanding R. typus migratory behaviour is central to its conservation management considering the genetic evidence suggesting local aggregations are connected at the generational scale over entire ocean basins. By collating available data on sightings, tracked movements and distribution information, this review provides evidence for the hypothesis of broad-scale connectivity among populations, and generates a model describing how the world's R. typus are part of a single, global meta-population. Rhincodon typus occurrence timings and distribution patterns make possible a connection between several aggregation sites in the Indian Ocean. The present conceptual model and validating data lend support to the hypothesis that R. typus are able to move among the three largest ocean basins with a minimum total travelling time of around 2-4 years. The model provides a worldwide perspective of possible R. typus migration routes, and suggests a modified focus for additional research to test its predictions. The framework can be used to trim the hypotheses for R. typus movements and aggregation timings, thereby isolating possible mating and breeding areas that are currently unknown. This will assist endeavours to predict the longer-term response of the species to ocean warming and changing patterns of human-induced mortality. © 2013 The Authors. Journal of Fish Biology © 2013 The Fisheries Society of the British Isles.

  2. Fine-scale variation in meiotic recombination in Mimulus inferred from population shotgun sequencing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hellsten, Uffe; Wright, Kevin M.; Jenkins, Jerry

    2013-11-13

    Meiotic recombination rates can vary widely across genomes, with hotspots of intense activity interspersed among cold regions. In yeast, hotspots tend to occur in promoter regions of genes, whereas in humans and mice hotspots are largely defined by binding sites of the PRDM9 protein. To investigate the detailed recombination pattern in a flowering plant we use shotgun resequencing of a wild population of the monkeyflower Mimulus guttatus to precisely locate over 400,000 boundaries of historic crossovers or gene conversion tracts. Their distribution defines some 13,000 hotspots of varying strengths, interspersed with cold regions of undetectably low recombination. Average recombination rates peak near starts of genes and fall off sharply, exhibiting polarity. Within genes, recombination tracts are more likely to terminate in exons than in introns. The general pattern is similar to that observed in yeast, as well as in PRDM9-knockout mice, suggesting that recombination initiation described here in Mimulus may reflect ancient and conserved eukaryotic mechanisms.

  3. Eastern Ross Ice Sheet Deglacial History inferred from the Roosevelt Island Ice Core

    NASA Astrophysics Data System (ADS)

    Fudge, T. J.; Buizert, C.; Lee, J.; Waddington, E. D.; Bertler, N. A. N.; Conway, H.; Brook, E.; Severinghaus, J. P.

    2017-12-01

    The Ross Ice Sheet drains large portions of both West and East Antarctica. Understanding the retreat of the Ross Ice Sheet following the Last Glacial Maximum is particularly difficult in the eastern Ross area where there is no exposed rock and the Ross Ice Shelf prevents extensive bathymetric mapping. Coastal domes, by preserving old ice, can be used to infer the establishment of grounded ice and be used to infer past ice thickness. Here we focus on Roosevelt Island, in the eastern Ross Sea, where the Roosevelt Island Climate Evolution project recently completed an ice core to bedrock. Using ice-flow modeling constrained by the depth-age relationship and an independent estimate of accumulation rate from firn-densification measurements and modeling, we infer ice thickness histories for the LGM (20ka) to present. Preliminary results indicate thinning of 300m between 15ka and 12ka is required. This is similar to the amount and timing of thinning inferred at Siple Dome, in the central Ross Sea (Waddington et al., 2005; Price et al., 2007) and supports the presence of active ice streams throughout the Ross Ice Sheet advance during the LGM.

  4. Probabilistic models for neural populations that naturally capture global coupling and criticality

    PubMed Central

    2017-01-01

    Advances in multi-unit recordings pave the way for statistical modeling of activity patterns in large neural populations. Recent studies have shown that the summed activity of all neurons strongly shapes the population response. A separate recent finding has been that neural populations also exhibit criticality, an anomalously large dynamic range for the probabilities of different population activity patterns. Motivated by these two observations, we introduce a class of probabilistic models which takes into account the prior knowledge that the neural population could be globally coupled and close to critical. These models consist of an energy function which parametrizes interactions between small groups of neurons, and an arbitrary positive, strictly increasing, and twice differentiable function which maps the energy of a population pattern to its probability. We show that: 1) augmenting a pairwise Ising model with a nonlinearity yields an accurate description of the activity of retinal ganglion cells which outperforms previous models based on the summed activity of neurons; 2) prior knowledge that the population is critical translates to prior expectations about the shape of the nonlinearity; 3) the nonlinearity admits an interpretation in terms of a continuous latent variable globally coupling the system whose distribution we can infer from data. Our method is independent of the underlying system’s state space; hence, it can be applied to other systems such as natural scenes or amino acid sequences of proteins which are also known to exhibit criticality. PMID:28926564

  5. TernaryNet: faster deep model inference without GPUs for medical 3D segmentation using sparse and binary convolutions.

    PubMed

    Heinrich, Mattias P; Blendowski, Max; Oktay, Ozan

    2018-05-30

    Deep convolutional neural networks (DCNN) are currently ubiquitous in medical imaging. While their versatility and high-quality results for common image analysis tasks including segmentation, localisation and prediction are astonishing, the large representational power comes at the cost of highly demanding computational effort. This limits their practical applications for image-guided interventions and diagnostic (point-of-care) support using mobile devices without graphics processing units (GPU). We propose a new scheme that approximates both trainable weights and neural activations in deep networks by ternary values and tackles the open question of backpropagation when dealing with non-differentiable functions. Our solution enables the removal of the expensive floating-point matrix multiplications throughout any convolutional neural network and replaces them by energy- and time-preserving binary operators and population counts. We evaluate our approach for the segmentation of the pancreas in CT. Here, our ternary approximation within a fully convolutional network leads to more than 90% memory reductions and high accuracy (without any post-processing) with a Dice overlap of 71.0% that comes close to the one obtained when using networks with high-precision weights and activations. We further provide a concept for sub-second inference without GPUs and demonstrate significant improvements in comparison with binary quantisation and without our proposed ternary hyperbolic tangent continuation. We present a key enabling technique for highly efficient DCNN inference without GPUs that will help to bring the advances of deep learning to practical clinical applications. It also has great promise for improving accuracies in large-scale medical data retrieval.
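
    The replacement of floating-point multiplications by binary operators and population counts can be illustrated with a small sketch. The two-bitmask encoding and the helper names (pack_ternary, ternary_dot) are assumptions made for illustration and are not taken from the paper; the sketch only shows that a dot product of vectors with entries in {-1, 0, +1} can be computed with AND and popcount alone.

```python
# Illustrative sketch only: the bitmask encoding is an assumption,
# not the implementation described in the abstract above.

def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def pack_ternary(values):
    """Encode values in {-1, 0, +1} as two bitmasks: (plus bits, minus bits)."""
    plus = minus = 0
    for i, v in enumerate(values):
        if v == 1:
            plus |= 1 << i
        elif v == -1:
            minus |= 1 << i
        elif v != 0:
            raise ValueError("values must be -1, 0 or +1")
    return plus, minus


def ternary_dot(a, b):
    """Dot product of two packed ternary vectors using only AND and popcount."""
    a_plus, a_minus = a
    b_plus, b_minus = b
    # Positions where both signs agree contribute +1 each.
    same = popcount(a_plus & b_plus) + popcount(a_minus & b_minus)
    # Positions where the signs differ contribute -1 each.
    opposite = popcount(a_plus & b_minus) + popcount(a_minus & b_plus)
    return same - opposite


weights = [1, 0, -1, 1, -1, 0, 1, 1]
activations = [1, -1, -1, 0, 1, 0, -1, 1]
result = ternary_dot(pack_ternary(weights), pack_ternary(activations))
```

    Zeros set neither bit, so they drop out of every AND; this is one plausible reason a ternary scheme can stay cheap while being more expressive than a purely binary one.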

  6. When can the cause of a population decline be determined?

    PubMed

    Hefley, Trevor J; Hooten, Mevin B; Drake, John M; Russell, Robin E; Walsh, Daniel P

    2016-11-01

    Inferring the factors responsible for declines in abundance is a prerequisite to preventing the extinction of wild populations. Many of the policies and programmes intended to prevent extinctions operate on the assumption that the factors driving the decline of a population can be determined. Exogenous factors that cause declines in abundance can be statistically confounded with endogenous factors such as density dependence. To demonstrate the potential for confounding, we used an experiment where replicated populations were driven to extinction by gradually manipulating habitat quality. In many of the replicated populations, habitat quality and density dependence were confounded, which obscured causal inference. Our results show that confounding is likely to occur when the exogenous factors that are driving the decline change gradually over time. Our study has direct implications for wild populations, because many factors that could drive a population to extinction change gradually through time. © 2016 John Wiley & Sons Ltd/CNRS.

  7. Accuracy of genomic selection models in a large population of open-pollinated families in white spruce

    PubMed Central

    Beaulieu, J; Doerksen, T; Clément, S; MacKay, J; Bousquet, J

    2014-01-01

    Genomic selection (GS) is of interest in breeding because of its potential for predicting the genetic value of individuals and increasing genetic gains per unit of time. To date, very few studies have reported empirical results of GS potential in the context of large population sizes and long breeding cycles such as for boreal trees. In this study, we assessed the effectiveness of marker-aided selection in an undomesticated white spruce (Picea glauca (Moench) Voss) population of large effective size using a GS approach. A discovery population of 1694 trees representative of 214 open-pollinated families from 43 natural populations was phenotyped for 12 wood and growth traits and genotyped for 6385 single-nucleotide polymorphisms (SNPs) mined in 2660 gene sequences. GS models were built to predict estimated breeding values using all the available SNPs or SNP subsets of the largest absolute effects, and they were validated using various cross-validation schemes. The accuracy of genomic estimated breeding values (GEBVs) varied from 0.327 to 0.435 when the training and the validation data sets shared half-sibs that were on average 90% of the accuracies achieved through traditionally estimated breeding values. The trend was also the same for validation across sites. As expected, the accuracy of GEBVs obtained after cross-validation with individuals of unknown relatedness was lower with about half of the accuracy achieved when half-sibs were present. We showed that with the marker densities used in the current study, predictions with low to moderate accuracy could be obtained within a large undomesticated population of related individuals, potentially resulting in larger gains per unit of time with GS than with the traditional approach. PMID:24781808

  8. Predictive Inference Using Latent Variables with Covariates*

    PubMed Central

    Schofield, Lynne Steuerle; Junker, Brian; Taylor, Lowell J.; Black, Dan A.

    2014-01-01

    Plausible Values (PVs) are a standard multiple imputation tool for analysis of large education survey data that measures latent proficiency variables. When latent proficiency is the dependent variable, we reconsider the standard institutionally-generated PV methodology and find it applies with greater generality than shown previously. When latent proficiency is an independent variable, we show that the standard institutional PV methodology produces biased inference because the institutional conditioning model places restrictions on the form of the secondary analysts’ model. We offer an alternative approach that avoids these biases based on the mixed effects structural equations (MESE) model of Schofield (2008). PMID:25231627

  9. A Method for Inferring an Individual’s Genetic Ancestry and Degree of Admixture Associated with Six Major Continental Populations

    PubMed Central

    Libiger, Ondrej; Schork, Nicholas J.

    2013-01-01

    The determination of the ancestry and genetic backgrounds of the subjects in genetic and general epidemiology studies is a crucial component in the analysis of relevant outcomes or associations. Although there are many methods for differentiating ancestral subgroups among individuals based on genetic markers only a few of these methods provide actual estimates of the fraction of an individual’s genome that is likely to be associated with different ancestral populations. We propose a method for assigning ancestry that works in stages to refine estimates of ancestral population contributions to individual genomes. The method leverages genotype data in the public domain obtained from individuals with known ancestries. Although we showcase the method in the assessment of ancestral genome proportions leveraging largely continental populations, the strategy can be used for assessing within-continent or more subtle ancestral origins with the appropriate data. PMID:23335941

  10. Causal Inference and Developmental Psychology

    ERIC Educational Resources Information Center

    Foster, E. Michael

    2010-01-01

    Causal inference is of central importance to developmental psychology. Many key questions in the field revolve around improving the lives of children and their families. These include identifying risk factors that if manipulated in some way would foster child development. Such a task inherently involves causal inference: One wants to know whether…

  11. Spiral wave chimera states in large populations of coupled chemical oscillators

    NASA Astrophysics Data System (ADS)

    Totz, Jan Frederik; Rode, Julian; Tinsley, Mark R.; Showalter, Kenneth; Engel, Harald

    2018-03-01

    The coexistence of coherent and incoherent dynamics in a population of identically coupled oscillators is known as a chimera state [1,2]. Discovered in 2002 [3], this counterintuitive dynamical behaviour has inspired extensive theoretical and experimental activity [4-15]. The spiral wave chimera is a particularly remarkable chimera state, in which an ordered spiral wave rotates around a core consisting of asynchronous oscillators. Spiral wave chimeras were theoretically predicted in 2004 [16] and numerically studied in a variety of systems [17-23]. Here, we report their experimental verification using large populations of nonlocally coupled Belousov-Zhabotinsky chemical oscillators [10,18] in a two-dimensional array. We characterize previously unreported spatiotemporal dynamics, including erratic motion of the asynchronous spiral core, growth and splitting of the cores, as well as the transition from the chimera state to disordered behaviour. Spiral wave chimeras are likely to occur in other systems with long-range interactions, such as cortical tissues [24], cilia carpets [25], SQUID metamaterials [26] and arrays of optomechanical oscillators [9].

  12. Coevolutionary dynamics in large, but finite populations

    NASA Astrophysics Data System (ADS)

    Traulsen, Arne; Claussen, Jens Christian; Hauert, Christoph

    2006-07-01

    Coevolving and competing species or game-theoretic strategies exhibit rich and complex dynamics for which a general theoretical framework based on finite populations is still lacking. Recently, an explicit mean-field description in the form of a Fokker-Planck equation was derived for frequency-dependent selection with two strategies in finite populations based on microscopic processes [A. Traulsen, J. C. Claussen, and C. Hauert, Phys. Rev. Lett. 95, 238701 (2005)]. Here we generalize this approach in a twofold way: First, we extend the framework to an arbitrary number of strategies and second, we allow for mutations in the evolutionary process. The deterministic limit of infinite population size of the frequency-dependent Moran process yields the adjusted replicator-mutator equation, which describes the combined effect of selection and mutation. For finite populations, we provide an extension taking random drift into account. In the limit of neutral selection, i.e., whenever the process is determined by random drift and mutations, the stationary strategy distribution is derived. This distribution forms the background for the coevolutionary process. In particular, a critical mutation rate uc is obtained separating two scenarios: above uc the population predominantly consists of a mixture of strategies whereas below uc the population tends to be in homogeneous states. For one of the fundamental problems in evolutionary biology, the evolution of cooperation under Darwinian selection, we demonstrate that the analytical framework provides excellent approximations to individual based simulations even for rather small population sizes. This approach complements simulation results and provides a deeper, systematic understanding of coevolutionary dynamics.
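
    The kind of individual-based simulation referred to above can be sketched as a single birth-death event of a frequency-dependent Moran process with mutation. This is a minimal sketch under stated assumptions: fitness is taken to be the average payoff against the current population (self-interaction included for simplicity), payoffs are assumed positive, and mutants switch to a uniformly chosen other strategy. The abstract does not specify these details, and the function name moran_step is hypothetical.

```python
import random


def moran_step(counts, payoff, mutation_rate, rng):
    """One birth-death event; returns the updated list of strategy counts.

    Assumptions (not from the abstract): fitness is the mean payoff against
    the whole population, payoffs are positive, mutation is uniform over the
    other strategies, and there are at least two strategies.
    """
    n = sum(counts)
    k = len(counts)
    # Mean payoff of each strategy against the current population.
    fitness = [sum(payoff[i][j] * counts[j] for j in range(k)) / n
               for i in range(k)]
    # Reproduction: probability proportional to abundance times fitness.
    birth_weights = [counts[i] * fitness[i] for i in range(k)]
    parent = rng.choices(range(k), weights=birth_weights)[0]
    # Mutation: the offspring adopts a different strategy with small probability.
    child = parent
    if rng.random() < mutation_rate:
        child = rng.choice([s for s in range(k) if s != parent])
    # Death: one individual is removed uniformly at random.
    dead = rng.choices(range(k), weights=counts)[0]
    new_counts = list(counts)
    new_counts[dead] -= 1
    new_counts[child] += 1
    return new_counts


rng = random.Random(0)
neutral_payoff = [[1.0] * 3 for _ in range(3)]  # neutral selection: all payoffs equal
state = [10, 10, 10]
for _ in range(1000):
    state = moran_step(state, neutral_payoff, mutation_rate=0.01, rng=rng)
```

    With equal payoffs the dynamics are driven only by random drift and mutation, which corresponds to the neutral limit mentioned in the abstract; the population size stays fixed because each event pairs one birth with one death.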

  13. Bayesian Nonparametric Inference – Why and How

    PubMed Central

    Müller, Peter; Mitra, Riten

    2013-01-01

    We review inference under models with nonparametric Bayesian (BNP) priors. The discussion follows a set of examples for some common inference problems. The examples are chosen to highlight problems that are challenging for standard parametric inference. We discuss inference for density estimation, clustering, regression and for mixed effects models with random effects distributions. While we focus on arguing for the need for the flexibility of BNP models, we also review some of the more commonly used BNP models, thus hopefully answering a bit of both questions, why and how to use BNP. PMID:24368932

  14. Cosmic shear measurement with maximum likelihood and maximum a posteriori inference

    NASA Astrophysics Data System (ADS)

    Hall, Alex; Taylor, Andy

    2017-06-01

    We investigate the problem of noise bias in maximum likelihood and maximum a posteriori estimators for cosmic shear. We derive the leading and next-to-leading order biases and compute them in the context of galaxy ellipticity measurements, extending previous work on maximum likelihood inference for weak lensing. We show that a large part of the bias on these point estimators can be removed using information already contained in the likelihood when a galaxy model is specified, without the need for external calibration. We test these bias-corrected estimators on simulated galaxy images similar to those expected from planned space-based weak lensing surveys, with promising results. We find that the introduction of an intrinsic shape prior can help with mitigation of noise bias, such that the maximum a posteriori estimate can be made less biased than the maximum likelihood estimate. Second-order terms offer a check on the convergence of the estimators, but are largely subdominant. We show how biases propagate to shear estimates, demonstrating in our simple set-up that shear biases can be reduced by orders of magnitude and potentially to within the requirements of planned space-based surveys at mild signal-to-noise ratio. We find that second-order terms can exhibit significant cancellations at low signal-to-noise ratio when Gaussian noise is assumed, which has implications for inferring the performance of shear-measurement algorithms from simplified simulations. We discuss the viability of our point estimators as tools for lensing inference, arguing that they allow for the robust measurement of ellipticity and shear.

  15. mtDNA variation predicts population size in humans and reveals a major Southern Asian chapter in human prehistory.

    PubMed

    Atkinson, Quentin D; Gray, Russell D; Drummond, Alexei J

    2008-02-01

    The relative timing and size of regional human population growth following our expansion from Africa remain unknown. Human mitochondrial DNA (mtDNA) diversity carries a legacy of our population history. Given a set of sequences, we can use coalescent theory to estimate past population size through time and draw inferences about human population history. However, recent work has challenged the validity of using mtDNA diversity to infer species population sizes. Here we use Bayesian coalescent inference methods, together with a global data set of 357 human mtDNA coding-region sequences, to infer human population sizes through time across 8 major geographic regions. Our estimates of relative population sizes show remarkable concordance with the contemporary regional distribution of humans across Africa, Eurasia, and the Americas, indicating that mtDNA diversity is a good predictor of population size in humans. Plots of population size through time show slow growth in sub-Saharan Africa beginning 143-193 kya, followed by a rapid expansion into Eurasia after the emergence of the first non-African mtDNA lineages 50-70 kya. Outside Africa, the earliest and fastest growth is inferred in Southern Asia approximately 52 kya, followed by a succession of growth phases in Northern and Central Asia (approximately 49 kya), Australia (approximately 48 kya), Europe (approximately 42 kya), the Middle East and North Africa (approximately 40 kya), New Guinea (approximately 39 kya), the Americas (approximately 18 kya), and a second expansion in Europe (approximately 10-15 kya). Comparisons of relative regional population sizes through time suggest that between approximately 45 and 20 kya most of humanity lived in Southern Asia. These findings not only support the use of mtDNA data for estimating human population size but also provide a unique picture of human prehistory and demonstrate the importance of Southern Asia to our recent evolutionary past.

  16. Classification of complex information: inference of co-occurring affective states from their expressions in speech.

    PubMed

    Sobol-Shikler, Tal; Robinson, Peter

    2010-07-01

    We present a classification algorithm for inferring affective states (emotions, mental states, attitudes, and the like) from their nonverbal expressions in speech. It is based on the observations that affective states can occur simultaneously and different sets of vocal features, such as intonation and speech rate, distinguish between nonverbal expressions of different affective states. The input to the inference system was a large set of vocal features and metrics that were extracted from each utterance. The classification algorithm conducted independent pairwise comparisons between nine affective-state groups. The classifier used various subsets of metrics of the vocal features and various classification algorithms for different pairs of affective-state groups. Average classification accuracy of the 36 pairwise machines was 75 percent, using 10-fold cross validation. The comparison results were consolidated into a single ranked list of the nine affective-state groups. This list was the output of the system and represented the inferred combination of co-occurring affective states for the analyzed utterance. The inference accuracy of the combined machine was 83 percent. The system automatically characterized over 500 affective state concepts from the Mind Reading database. The inference of co-occurring affective states was validated by comparing the inferred combinations to the lexical definitions of the labels of the analyzed sentences. The distinguishing capabilities of the system were comparable to human performance.
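
    The count of 36 pairwise machines follows from the number of unordered pairs among nine groups, 9 x 8 / 2 = 36. The sketch below enumerates those pairs and consolidates the pairwise outcomes into one ranked list by counting wins. The vote-counting rule, the function name rank_by_pairwise_votes and the toy decision rule are assumptions for illustration; the abstract does not state how the comparison results were consolidated.

```python
from itertools import combinations


def rank_by_pairwise_votes(groups, pairwise_winner):
    """Rank groups by how many pairwise comparisons each one wins.

    pairwise_winner(a, b) is a hypothetical stand-in for a trained
    pairwise classifier; it must return either a or b.
    """
    wins = {g: 0 for g in groups}
    pairs = list(combinations(groups, 2))
    for a, b in pairs:
        wins[pairwise_winner(a, b)] += 1
    # Most wins first; ties keep their original order.
    ranking = sorted(groups, key=lambda g: wins[g], reverse=True)
    return ranking, len(pairs)


groups = [f"group_{i}" for i in range(9)]


def toy_winner(a, b):
    """Toy rule for demonstration: the group listed earlier always wins."""
    return a if groups.index(a) < groups.index(b) else b


ranking, n_pairs = rank_by_pairwise_votes(groups, toy_winner)
```

    In a real system each pair would have its own classifier and feature subset, as the abstract describes; only the bookkeeping is shown here.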

  17. Transcranial Doppler velocities in a large, healthy population.

    PubMed

    Tegeler, Charles H; Crutchfield, Kevin; Katsnelson, Michael; Kim, Jongyeol; Tang, Rong; Passmore Griffin, Leah; Rundek, Tanja; Evans, Greg

    2013-07-01

    Transcranial Doppler (TCD) ultrasonography has been extensively used in the evaluation and management of patients with cerebrovascular disease since the clinical application was first described in 1982 by Aaslid and colleagues. TCD is a painless, safe, and noninvasive diagnostic technique that measures blood flow velocity in various cerebral arteries. Numerous commercially available TCD devices are currently approved for use worldwide, and TCD is recognized to have an established clinical value for a variety of clinical indications and settings. Although many studies have reported normal values, there have been few recently, and none to include a large cohort of healthy subjects across age, race, and gender. As more objective, automated processes are being developed to assist with the performance and interpretation of TCD studies, and with the potential to easily compare results against a reference population, it is important to define stable normal values and variances across age, race, and gender, with clear understanding of variability of the measurements, as well as the yield from various anatomic segments. To define normal TCD values in a healthy population, we enrolled 364 healthy subjects, ages 18-80 years, to have a complete, nonimaging TCD examination. Subjects with known or suspected cerebrovascular disorders, systemic disorders with cerebrovascular effects, as well as those with known hypertension, diabetes, stroke, coronary artery disease, or myocardial infarction, were excluded. Self-reported ethnicity, handedness, BP, and BMI were recorded. A complete TCD examination was performed by a single experienced sonographer, using a single gate nonimaging TCD device, and a standardized protocol to interrogate up to 23 arterial segments. Individual Doppler spectra were saved for each segment, with velocity and pulsatility index (PI) values calculated using the instrument's automated waveform tracking function. Descriptive analysis was done to determine the mean

  18. Frequency of five disease-causing genetic mutations in a large mixed-breed dog population (2011-2012).

    PubMed

    Zierath, Sharon; Hughes, Angela M; Fretwell, Neale; Dibley, Mark; Ekenstedt, Kari J

    2017-01-01

    A large and growing number of inherited genetic disease mutations are now known in the dog. Frequencies of these mutations are typically examined within the breed of discovery, possibly in related breeds, but nearly always in purebred dogs. No report to date has examined the frequencies of specific genetic disease mutations in a large population of mixed-breed dogs. Further, veterinarians and dog owners typically dismiss inherited/genetic diseases as possibilities for health problems in mixed-breed dogs, assuming hybrid vigor will guarantee that single-gene disease mutations are not a cause for concern. Therefore, the objective of this study was to screen a large mixed-breed canine population for the presence of mutant alleles associated with five autosomal recessive disorders: hyperuricosuria and hyperuricemia (HUU), cystinuria (CYST), factor VII deficiency (FVIID), myotonia congenita (MYC) and phosphofructokinase deficiency (PKFD). Genetic testing was performed in conjunction with breed determination via the commercially-available Wisdom Panel™ test. From a population of nearly 35,000 dogs, homozygous mutant dogs were identified for HUU (n = 57) and FVIID (n = 65). Homozygotes for HUU and FVIID were identified even among dogs with highly mixed breed ancestry. Carriers were identified for all disorders except MYC. HUU and FVIID were of high enough frequency to merit consideration in any mixed-breed dog, while CYST, MYC, and PKFD are vanishingly rare. The assumption that mixed-breed dogs do not suffer from single-gene genetic disorders is shown here to be false. Within the diseases examined, HUU and FVIID should remain on any practitioner's rule-out list, when clinically appropriate, for all mixed-breed dogs, and judicious genetic testing should be performed for diagnosis or screening. Future testing of large mixed-breed dog populations that include additional known canine genetic mutations will refine our knowledge of which genetic diseases can strike mixed

  19. Quantum-Like Representation of Non-Bayesian Inference

    NASA Astrophysics Data System (ADS)

    Asano, M.; Basieva, I.; Khrennikov, A.; Ohya, M.; Tanaka, Y.

    2013-01-01

    This research is related to the problem of "irrational decision making or inference" that has been discussed in cognitive psychology. There are some experimental studies, and these statistical data cannot be described by classical probability theory. The process of decision making generating these data cannot be reduced to the classical Bayesian inference. For this problem, a number of quantum-like cognitive models of decision making were proposed. Our previous work represented in a natural way the classical Bayesian inference in the framework of quantum mechanics. By using this representation, in this paper, we try to discuss the non-Bayesian (irrational) inference that is biased by effects like the quantum interference. Further, we describe "psychological factor" disturbing "rationality" as an "environment" correlating with the "main system" of usual Bayesian inference.

  20. PhySIC_IST: cleaning source trees to infer more informative supertrees

    PubMed Central

    Scornavacca, Celine; Berry, Vincent; Lefort, Vincent; Douzery, Emmanuel JP; Ranwez, Vincent

    2008-01-01

    Background: Supertree methods combine phylogenies with overlapping sets of taxa into a larger one. Topological conflicts frequently arise among source trees for methodological or biological reasons, such as long branch attraction, lateral gene transfers, gene duplication/loss or deep gene coalescence. When topological conflicts occur among source trees, liberal methods infer supertrees containing the most frequent alternative, while veto methods infer supertrees not contradicting any source tree, i.e. discard all conflicting resolutions. When the source trees host a significant number of topological conflicts or have a small taxon overlap, supertree methods of both kinds can propose poorly resolved, hence uninformative, supertrees. Results: To overcome this problem, we propose to infer non-plenary supertrees, i.e. supertrees that do not necessarily contain all the taxa present in the source trees, discarding those whose position greatly differs among source trees or for which insufficient information is provided. We detail a variant of the PhySIC veto method called PhySIC_IST that can infer non-plenary supertrees. PhySIC_IST aims at inferring supertrees that satisfy the same appealing theoretical properties as with PhySIC, while being as informative as possible under this constraint. The informativeness of a supertree is estimated using a variation of the CIC (Cladistic Information Content) criterion, that takes into account both the presence of multifurcations and the absence of some taxa. Additionally, we propose a statistical preprocessing step called STC (Source Trees Correction) to correct the source trees prior to the supertree inference. STC is a liberal step that removes the parts of each source tree that significantly conflict with other source trees. Combining STC with a veto method allows an explicit trade-off between veto and liberal approaches, tuned by a single parameter. Performing large-scale simulations, we observe that STC+PhySIC_IST infers much