Science.gov

Sample records for inferred large population

  1. fastSTRUCTURE: variational inference of population structure in large SNP data sets.

    PubMed

    Raj, Anil; Stephens, Matthew; Pritchard, Jonathan K

    2014-06-01

    Tools for estimating population structure from genetic data are now used in a wide variety of applications in population genetics. However, inferring population structure in large modern data sets imposes severe computational challenges. Here, we develop efficient algorithms for approximate inference of the model underlying the STRUCTURE program using a variational Bayesian framework. Variational methods pose the problem of computing relevant posterior distributions as an optimization problem, allowing us to build on recent advances in optimization theory to develop fast inference tools. In addition, we propose useful heuristic scores to identify the number of populations represented in a data set and a new hierarchical prior to detect weak population structure in the data. We test the variational algorithms on simulated data and illustrate using genotype data from the CEPH-Human Genome Diversity Panel. The variational algorithms are almost two orders of magnitude faster than STRUCTURE and achieve accuracies comparable to those of ADMIXTURE. Furthermore, our results show that the heuristic scores for choosing model complexity provide a reasonable range of values for the number of populations represented in the data, with minimal bias toward detecting structure when it is very weak. Our algorithm, fastSTRUCTURE, is freely available online at http://pritchardlab.stanford.edu/structure.html. Copyright © 2014 by the Genetics Society of America.

  2. Circadian analysis of large human populations: inferences from the power grid.

    PubMed

    Stowie, Adam C; Amicarelli, Mario J; Crosier, Caitlin J; Mymko, Ryan; Glass, J David

    2015-03-01

    Few, if any, studies have focused on the daily rhythmic nature of modern industrialized populations. The present study utilized real-time load data from the U.S. Pacific Northwest electrical power grid as a reflection of human operative household activity. This approach involved actigraphic analyses of continuously streaming internet data (provided in 5 min bins) from a human subject pool of approximately 43 million primarily residential users. Rhythm analyses reveal striking seasonal and intra-week differences in human activity patterns, largely devoid of manufacturing and automated load interference. Length of the diurnal activity period (alpha) is longer during the spring than the summer (16.64 h versus 15.98 h, respectively; p < 0.01). As expected, significantly more activity occurs in the solar dark phase during the winter than during the summer (6.29 h versus 2.03 h, respectively; p < 0.01). Interestingly, throughout the year a "weekend effect" is evident, where morning activity onset occurs approximately 1 h later than during the work week (5:54 am versus 6:52 am, respectively; p < 0.01). This indicates a general phase-delaying response to the absence of job-related or other weekday morning arousal cues, substantiating a preference or need to sleep longer on weekends. Finally, a shift in onset time can be seen during the transition to Daylight Saving Time, but not the transition back to Standard Time. The use of grid power load as a means for human actimetry assessment thus offers new insights into the collective diurnal activity patterns of large human populations.

  3. Inferring Population Size History from Large Samples of Genome-Wide Molecular Data - An Approximate Bayesian Computation Approach

    PubMed Central

    Boitard, Simon; Rodríguez, Willy; Jay, Flora; Mona, Stefano; Austerlitz, Frédéric

    2016-01-01

    Inferring the ancestral dynamics of effective population size is a long-standing question in population genetics, which can now be tackled much more accurately thanks to the massive genomic data available in many species. Several promising methods that take advantage of whole-genome sequences have been recently developed in this context. However, they can only be applied to rather small samples, which limits their ability to estimate recent population size history. Besides, they can be very sensitive to sequencing or phasing errors. Here we introduce a new approximate Bayesian computation approach named PopSizeABC that allows estimating the evolution of the effective population size through time, using a large sample of complete genomes. This sample is summarized using the folded allele frequency spectrum and the average zygotic linkage disequilibrium at different bins of physical distance, two classes of statistics that are widely used in population genetics and can be easily computed from unphased and unpolarized SNP data. Our approach provides accurate estimations of past population sizes, from the very first generations before present back to the expected time to the most recent common ancestor of the sample, as shown by simulations under a wide range of demographic scenarios. When applied to samples of 15 or 25 complete genomes in four cattle breeds (Angus, Fleckvieh, Holstein and Jersey), PopSizeABC revealed a series of population declines, related to historical events such as domestication or modern breed creation. We further highlight that our approach is robust to sequencing errors, provided summary statistics are computed from SNPs with common alleles. PMID:26943927
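
One of the two summary statistics named in the abstract above, the folded allele frequency spectrum, can be illustrated with a minimal sketch. This is not the PopSizeABC implementation; the function name `folded_sfs` and the input layout (one list of per-individual alternate-allele counts per SNP, diploid, unphased) are assumptions made for illustration only.

```python
from collections import Counter


def folded_sfs(genotypes):
    """Folded allele frequency spectrum from unphased diploid genotypes.

    genotypes: list of sites; each site is a list with one entry per
    individual giving the count (0, 1 or 2) of an arbitrarily chosen allele.
    Returns a list whose entry k-1 is the number of sites at which the
    minor allele is carried by exactly k chromosomes (k = 1 .. n_chrom // 2).
    """
    n_chrom = 2 * len(genotypes[0])  # two chromosomes per diploid individual
    counts = Counter()
    for site in genotypes:
        allele_count = sum(site)
        # Folding: without knowing the ancestral state, use the rarer allele.
        minor = min(allele_count, n_chrom - allele_count)
        if minor > 0:  # monomorphic sites are not SNPs and are skipped
            counts[minor] += 1
    return [counts[k] for k in range(1, n_chrom // 2 + 1)]


# Hypothetical toy data: 4 sites genotyped in 3 individuals (6 chromosomes).
toy = [[0, 1, 0], [2, 2, 1], [1, 1, 1], [0, 0, 0]]
print(folded_sfs(toy))
```

Because the spectrum is folded on the minor allele, the result does not depend on which allele is labeled as the reference, which is why it can be computed from unpolarized data.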

  4. CONE: Community Oriented Network Estimation Is a Versatile Framework for Inferring Population Structure in Large Scale Sequencing Data.

    PubMed

    Kuismin, Markku O; Ahlinder, Jon; Sillanpää, Mikko J

    2017-08-22

    Estimation of genetic population structure based on molecular markers is a common task in population genetics and ecology. We apply a generalized linear model with LASSO regularization to infer relationships between individuals and populations from molecular marker data. Specifically, we apply a neighborhood selection algorithm to infer population genetic structure and gene flow between populations. The resulting relationships are used to construct an individual-level population graph. Different network substructures known as communities are then dissociated from each other using a community detection algorithm. Inference of population structure using networks combines the good properties of (i) network theory (a broad collection of tools, including aesthetically pleasing visualization), (ii) principal component analysis (dimension reduction together with simple visual inspection), and (iii) model-based methods (e.g. ancestry coefficient estimates). We have named our process CONE (Community Oriented Network Estimation). CONE has fewer restrictions than conventional assignment methods in that properties such as the number of subpopulations need not be fixed before the analysis, and the sample may include close relatives or involve uneven sampling. Applying CONE to simulated data sets resulted in more accurate estimates of the true number of subpopulations than model-based methods and provided comparable ancestry coefficient estimates. Inference on empirical data sets (teosinte single nucleotide polymorphisms, a bacterial disease outbreak, and the human genome diversity panel) illustrates that population structures estimated with CONE are consistent with earlier findings. Copyright © 2017, G3: Genes, Genomes, Genetics.

  5. Spectral Lags of GRBs observed with INTEGRAL and the inferred large population of low-luminosity GRBs

    NASA Astrophysics Data System (ADS)

    Foley, S.; McGlynn, S.; Hanlon, L.; McBreen, S.; McBreen, B.

    2009-05-01

    The γ-ray instruments on board INTEGRAL detected and localised 47 GRBs from its launch in October 2002 up to July 2007. The peak flux distribution shows that INTEGRAL detects proportionally more weak GRBs than Swift because of its higher sensitivity in a smaller field of view. The all-sky rate of GRBs above ~0.15 ph cm⁻² s⁻¹ is ~1400 yr⁻¹ in the fully coded field of view of IBIS. Spectral lags, i.e. the time delay in the arrival of low-energy γ-rays with respect to high-energy γ-rays, are measured for 31 of the GRBs. Two groups are identified in the spectral lag distribution of INTEGRAL GRBs, one with short lags <0.75 s (between 25-50 keV and 50-300 keV) and one with long lags >0.75 s. Most of the long-lag GRBs are inferred to have low redshifts because of their long spectral lags, their tendency to have low peak energies, and their faint optical and X-ray afterglows. They are mainly observed in the direction of the supergalactic plane with a quadrupole moment of Q = -0.225 ± 0.090 and hence reflect the local large-scale structure of the Universe. The rate of long-lag GRBs with inferred low luminosity is ~25% of Type Ib/c SNe. Some of these bursts could be produced by the collapse of a massive star without a SN. Alternatively, they could result from a different progenitor, such as the merger of two white dwarfs or a white dwarf with a neutron star or black hole, possibly in the cluster environment without a host galaxy.

  6. Population heterogeneity and causal inference.

    PubMed

    Xie, Yu

    2013-04-16

    Population heterogeneity is ubiquitous in social science. The very objective of social science research is not to discover abstract and universal laws but to understand population heterogeneity. Due to population heterogeneity, causal inference with observational data in social science is impossible without strong assumptions. Researchers have long been concerned with two potential sources of bias. The first is bias in unobserved pretreatment factors affecting the outcome even in the absence of treatment. The second is bias due to heterogeneity in treatment effects. In this article, I show how "composition bias" due to population heterogeneity evolves over time when treatment propensity is systematically associated with heterogeneous treatment effects. A form of selection bias, composition bias, arises dynamically at the aggregate level even when the classic assumption of ignorability holds true at the microlevel.

  7. Population heterogeneity and causal inference

    PubMed Central

    Xie, Yu

    2013-01-01

    Population heterogeneity is ubiquitous in social science. The very objective of social science research is not to discover abstract and universal laws but to understand population heterogeneity. Due to population heterogeneity, causal inference with observational data in social science is impossible without strong assumptions. Researchers have long been concerned with two potential sources of bias. The first is bias in unobserved pretreatment factors affecting the outcome even in the absence of treatment. The second is bias due to heterogeneity in treatment effects. In this article, I show how “composition bias” due to population heterogeneity evolves over time when treatment propensity is systematically associated with heterogeneous treatment effects. A form of selection bias, composition bias, arises dynamically at the aggregate level even when the classic assumption of ignorability holds true at the microlevel. PMID:23530202

  8. Global characteristics of GRBs observed with INTEGRAL and the inferred large population of low-luminosity GRBs

    NASA Astrophysics Data System (ADS)

    Foley, S.; McGlynn, S.; Hanlon, L.; McBreen, S.; McBreen, B.

    2008-06-01

    Context: INTEGRAL has two sensitive gamma-ray instruments that have detected and localised 47 gamma-ray bursts (GRBs) from its launch in October 2002 up to July 2007. Aims: We present the spectral, spatial, and temporal properties of the bursts in the INTEGRAL GRB catalogue using data from the imager, IBIS, and spectrometer, SPI. Methods: Spectral properties of the GRBs are determined using power-law and, where appropriate, Band model and quasithermal model fits to the prompt emission. Spectral lags, i.e. the time delay in the arrival of low-energy γ-rays with respect to high-energy γ-rays, are measured for 31 of the GRBs. Results: The photon index distribution of power-law fits to the prompt emission spectra is presented and is consistent with that obtained by Swift. The peak flux distribution shows that INTEGRAL detects proportionally more weak GRBs than Swift because of its higher sensitivity in a smaller field of view. The all-sky rate of GRBs above ~0.15 ph cm⁻² s⁻¹ is ~1400 yr⁻¹ in the fully coded field of view of IBIS. Two groups are identified in the spectral lag distribution of INTEGRAL GRBs, one with short lags <0.75 s (between 25-50 keV and 50-300 keV) and one with long lags >0.75 s. Most of the long-lag GRBs are inferred to have low redshifts because of their long spectral lags, their tendency to have low peak energies, and their faint optical and X-ray afterglows. They are mainly observed in the direction of the supergalactic plane with a quadrupole moment of Q = -0.225 ± 0.090 and hence reflect the local large-scale structure of the Universe. Conclusions: The spectral, spatial, and temporal properties of the 47 GRBs in the INTEGRAL catalogue are presented and compared with the results from other missions. The rate of long-lag GRBs with inferred low luminosity is ~25% of type Ib/c supernovae. Some of these bursts could be produced by the collapse of a massive star without a supernova. Alternatively, they could result from a different progenitor, such

  9. Deep Learning for Population Genetic Inference.

    PubMed

    Sheehan, Sara; Song, Yun S

    2016-03-01

    Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme.

  10. Deep Learning for Population Genetic Inference

    PubMed Central

    Sheehan, Sara; Song, Yun S.

    2016-01-01

    Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme. PMID:27018908

  11. Inferring Past Effective Population Size from Distributions of Coalescent Times

    PubMed Central

    Gattepaille, Lucie; Günther, Torsten; Jakobsson, Mattias

    2016-01-01

    Inferring and understanding changes in effective population size over time is a major challenge for population genetics. Here we investigate some theoretical properties of random-mating populations with varying size over time. In particular, we present an exact solution to compute the population size as a function of time, Ne(t), based on distributions of coalescent times of samples of any size. This result reduces the problem of population size inference to a problem of estimating coalescent time distributions. To illustrate the analytic results, we design a heuristic method using a tree-inference algorithm and investigate simulated and empirical population-genetic data. We investigate the effects of a range of conditions associated with empirical data, for instance number of loci, sample size, mutation rate, and cryptic recombination. We show that our approach performs well with genomic data (≥ 10,000 loci) and that increasing the sample size from 2 to 10 greatly improves the inference of Ne(t) whereas further increase in sample size results in modest improvements, even under a scenario of exponential growth. We also investigate the impact of recombination and characterize the potential biases in inference of Ne(t). The approach can handle large sample sizes and the computations are fast. We apply our method to human genomes from four populations and reconstruct population size profiles that are coherent with previous finds, including the Out-of-Africa bottleneck. Additionally, we uncover a potential difference in population size between African and non-African populations as early as 400 KYA. In summary, we provide an analytic relationship between distributions of coalescent times and Ne(t), which can be incorporated into powerful approaches for inferring past population sizes from population-genomic data. PMID:27638421
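
The link between coalescent-time distributions and Ne(t) described above can be illustrated for the simplest case. The sketch below is not the authors' exact solution for arbitrary sample sizes. It assumes a sample of size 2 from a diploid random-mating population, where standard coalescent theory gives a pairwise coalescence hazard of 1/(2 Ne(t)) per generation, and it estimates a piecewise-constant Ne from observed pairwise coalescent times. The function name `ne_from_pairwise_times` and the bin edges are hypothetical.

```python
import random


def ne_from_pairwise_times(times, edges):
    """Piecewise-constant diploid Ne from pairwise coalescent times.

    times: coalescent times (in generations) of independent pairs of lineages.
    edges: increasing bin boundaries; bin j covers [edges[j], edges[j+1]).
    In each bin the hazard is estimated as events / exposure, and
    Ne = 1 / (2 * hazard). Bins without any coalescence yield None.
    """
    estimates = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Number of pairs that coalesce inside this bin.
        events = sum(1 for t in times if lo <= t < hi)
        # Total time that still-uncoalesced pairs spend inside this bin.
        exposure = sum(min(t, hi) - lo for t in times if t > lo)
        estimates.append(exposure / (2.0 * events) if events else None)
    return estimates


# Simulated check under an assumed constant Ne of 1000: pairwise coalescent
# times are then exponential with mean 2 * Ne generations.
rng = random.Random(0)
sim_times = [rng.expovariate(1.0 / 2000.0) for _ in range(20000)]
print(ne_from_pairwise_times(sim_times, [0, 1000, 2000, 4000]))
```

With real data the coalescent times are not observed directly and have to be estimated, for instance from inferred gene trees, which is the step the abstract refers to as estimating coalescent time distributions.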

  12. The aggregate site frequency spectrum for comparative population genomic inference.

    PubMed

    Xue, Alexander T; Hickerson, Michael J

    2015-12-01

    Understanding how assemblages of species responded to past climate change is a central goal of comparative phylogeography and comparative population genomics, an endeavour that has increasing potential to integrate with community ecology. New sequencing technology now provides the potential to perform complex demographic inference at unprecedented resolution across assemblages of nonmodel species. To this end, we introduce the aggregate site frequency spectrum (aSFS), an expansion of the site frequency spectrum to use single nucleotide polymorphism (SNP) data sets collected from multiple, co-distributed species for assemblage-level demographic inference. We describe how the aSFS is constructed over an arbitrary number of independent population samples and then demonstrate how the aSFS can differentiate various multispecies demographic histories under a wide range of sampling configurations while allowing effective population sizes and expansion magnitudes to vary independently. We subsequently couple the aSFS with a hierarchical approximate Bayesian computation (hABC) framework to estimate degree of temporal synchronicity in expansion times across taxa, including an empirical demonstration with a data set consisting of five populations of the threespine stickleback (Gasterosteus aculeatus). Corroborating what is generally understood about the recent postglacial origins of these populations, the joint aSFS/hABC analysis strongly suggests that the stickleback data are most consistent with synchronous expansion after the Last Glacial Maximum (posterior probability = 0.99). The aSFS will have general application for multilevel statistical frameworks to test models involving assemblages and/or communities, and as large-scale SNP data from nonmodel species become routine, the aSFS expands the potential for powerful next-generation comparative population genomic inference.

  13. Fast and accurate inference of local ancestry in Latino populations

    PubMed Central

    Baran, Yael; Pasaniuc, Bogdan; Sankararaman, Sriram; Torgerson, Dara G.; Gignoux, Christopher; Eng, Celeste; Rodriguez-Cintron, William; Chapela, Rocio; Ford, Jean G.; Avila, Pedro C.; Rodriguez-Santana, Jose; Burchard, Esteban Gonzàlez; Halperin, Eran

    2012-01-01

    Motivation: It is becoming increasingly evident that the analysis of genotype data from recently admixed populations is providing important insights into medical genetics and population history. Such analyses have been used to identify novel disease loci, to understand recombination rate variation and to detect recent selection events. The utility of such studies crucially depends on accurate and unbiased estimation of the ancestry at every genomic locus in recently admixed populations. Although various methods have been proposed and shown to be extremely accurate in two-way admixtures (e.g. African Americans), only a few approaches have been proposed and thoroughly benchmarked on multi-way admixtures (e.g. Latino populations of the Americas). Results: To address these challenges we introduce here methods for local ancestry inference which leverage the structure of linkage disequilibrium in the ancestral population (LAMP-LD), and incorporate the constraint of Mendelian segregation when inferring local ancestry in nuclear family trios (LAMP-HAP). Our algorithms uniquely combine hidden Markov models (HMMs) of haplotype diversity within a novel window-based framework to achieve superior accuracy as compared with published methods. Further, unlike previous methods, the structure of our HMM does not depend on the number of reference haplotypes but on a fixed constant, and it is thereby capable of utilizing large datasets while remaining highly efficient and robust to over-fitting. Through simulations and analysis of real data from 489 nuclear trio families from the mainland US, Puerto Rico and Mexico, we demonstrate that our methods achieve superior accuracy compared with published methods for local ancestry inference in Latinos. Availability: http://lamp.icsi.berkeley.edu/lamp/lampld/ Contact: bpasaniu@hsph.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22495753

  14. Fast and accurate inference of local ancestry in Latino populations.

    PubMed

    Baran, Yael; Pasaniuc, Bogdan; Sankararaman, Sriram; Torgerson, Dara G; Gignoux, Christopher; Eng, Celeste; Rodriguez-Cintron, William; Chapela, Rocio; Ford, Jean G; Avila, Pedro C; Rodriguez-Santana, Jose; Burchard, Esteban Gonzàlez; Halperin, Eran

    2012-05-15

    It is becoming increasingly evident that the analysis of genotype data from recently admixed populations is providing important insights into medical genetics and population history. Such analyses have been used to identify novel disease loci, to understand recombination rate variation and to detect recent selection events. The utility of such studies crucially depends on accurate and unbiased estimation of the ancestry at every genomic locus in recently admixed populations. Although various methods have been proposed and shown to be extremely accurate in two-way admixtures (e.g. African Americans), only a few approaches have been proposed and thoroughly benchmarked on multi-way admixtures (e.g. Latino populations of the Americas). To address these challenges we introduce here methods for local ancestry inference which leverage the structure of linkage disequilibrium in the ancestral population (LAMP-LD), and incorporate the constraint of Mendelian segregation when inferring local ancestry in nuclear family trios (LAMP-HAP). Our algorithms uniquely combine hidden Markov models (HMMs) of haplotype diversity within a novel window-based framework to achieve superior accuracy as compared with published methods. Further, unlike previous methods, the structure of our HMM does not depend on the number of reference haplotypes but on a fixed constant, and it is thereby capable of utilizing large datasets while remaining highly efficient and robust to over-fitting. Through simulations and analysis of real data from 489 nuclear trio families from the mainland US, Puerto Rico and Mexico, we demonstrate that our methods achieve superior accuracy compared with published methods for local ancestry inference in Latinos.

  15. Population metrics for suicide events: A causal inference approach.

    PubMed

    He, Hua; Lu, Naiji; Stephens, Brady; Xia, Yinglin; Bossarte, Robert M; Kane, Cathleen P; Tang, Wan; Tu, Xin M

    2017-01-01

    Large-scale public health prevention initiatives and interventions are an important component of current public health strategies. However, evaluating the effects of such large-scale prevention or intervention efforts faces many challenges due to confounding effects and heterogeneity of the study population. In this paper, we develop metrics to assess the risk of suicide events based on a causal inference framework when the study population is heterogeneous. The proposed metrics deal with the confounding effect by first estimating the risk of suicide events within each risk level, defined by the number of prior attempts, and then taking a weighted sum of the conditional probabilities. The metrics provide unbiased estimates of the risk of suicide events. Simulation studies and a real data example are used to demonstrate the proposed metrics.
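
The two-step computation described above (stratum-specific risks, then a weighted sum) can be sketched as follows. This is only an illustration of the general stratify-and-weight idea, not the authors' metric: the function name `standardized_risk`, the choice of weights, and all numbers are hypothetical.

```python
def standardized_risk(strata, weights):
    """Weighted sum of stratum-specific event risks.

    strata:  dict mapping a risk level (e.g. number of prior attempts)
             to a tuple (events, persons) observed in that level.
    weights: dict mapping the same levels to the share of the target
             population in that level; the shares should sum to 1.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    total = 0.0
    for level, (events, persons) in strata.items():
        conditional_risk = events / persons  # risk within this stratum
        total += weights[level] * conditional_risk
    return total


# Hypothetical counts by number of prior attempts (0, 1, 2 or more).
observed = {0: (5, 1000), 1: (10, 200), 2: (8, 50)}
target_shares = {0: 0.80, 1: 0.15, 2: 0.05}
print(standardized_risk(observed, target_shares))
```

Using the same weights for every group being compared removes differences that arise only because the groups contain different proportions of high-risk individuals.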

  16. Regional population inferences for the American woodcock

    USGS Publications Warehouse

    Dwyer, T.J.; Nichols, J.D.

    1982-01-01

    Woodcock (Philohela minor) bandings and recoveries from 1967 to 1977 were analyzed from two large banding reference areas corresponding to existing Eastern and Central harvest units. We examined temporal, age-specific, sex-specific, and geographic variation in both survival and recovery rates, using recently developed stochastic models. Survival rate estimates for females were higher than those for males, and higher for adults than for young. There was no significant difference in recovery rates between young and adults. Recovery rates of Eastern unit birds were higher, and overall survival rates were lower than those of Central unit birds. Survival rate estimates were used with crude production rate estimates in a simple modeling effort, and resulting rates of population increase were 1.2 to 1.3 times higher in the Central reference area.

  17. Accurate inference of local phased ancestry of modern admixed populations.

    PubMed

    Ma, Yamin; Zhao, Jian; Wong, Jian-Syuan; Ma, Li; Li, Wenzhi; Fu, Guoxing; Xu, Wei; Zhang, Kui; Kittles, Rick A; Li, Yun; Song, Qing

    2014-07-23

    Population stratification is a growing concern in genetic-association studies. Averaged ancestry at the genome level (global ancestry) is insufficient for detecting the population substructures and correcting population stratifications in association studies. Local and phase stratification are needed for human genetic studies, but current technologies cannot be applied on the entire genome data due to various technical caveats. Here we developed a novel approach (aMAP, ancestry of Modern Admixed Populations) for inferring local phased ancestry. It took about 3 seconds on a desktop computer to finish a local ancestry analysis for each human genome with 1.4-million SNPs. This method also exhibits the scalability to larger datasets with respect to the number of SNPs, the number of samples, and the size of reference panels. It can detect the lack of the proxy of reference panels. The accuracy was 99.4%. The aMAP software has a capacity for analyzing 6-way admixed individuals. As the biomedical community continues to expand its efforts to increase the representation of diverse populations, and as the number of large whole-genome sequence datasets continues to grow rapidly, there is an increasing demand on rapid and accurate local ancestry analysis in genetics, pharmacogenomics, population genetics, and clinical diagnosis.

  18. Robust Inference of Risks of Large Portfolios.

    PubMed

    Fan, Jianqing; Han, Fang; Liu, Han; Vickers, Byron

    2016-10-01

    We propose a bootstrap-based robust high-confidence level upper bound (Robust H-CLUB) for assessing the risks of large portfolios. The proposed approach exploits rank-based and quantile-based estimators, and can be viewed as a robust extension of the H-CLUB procedure (Fan et al., 2015). Such an extension allows us to handle possibly misspecified models and heavy-tailed data, which are stylized features in financial returns. Under mixing conditions, we analyze the proposed approach and demonstrate its advantage over H-CLUB. We further provide thorough numerical results to back up the developed theory, and also apply the proposed method to analyze a stock market dataset.

  19. Robust Inference of Risks of Large Portfolios

    PubMed Central

    Fan, Jianqing; Han, Fang; Liu, Han; Vickers, Byron

    2016-01-01

    We propose a bootstrap-based robust high-confidence level upper bound (Robust H-CLUB) for assessing the risks of large portfolios. The proposed approach exploits rank-based and quantile-based estimators, and can be viewed as a robust extension of the H-CLUB procedure (Fan et al., 2015). Such an extension allows us to handle possibly misspecified models and heavy-tailed data, which are stylized features in financial returns. Under mixing conditions, we analyze the proposed approach and demonstrate its advantage over H-CLUB. We further provide thorough numerical results to back up the developed theory, and also apply the proposed method to analyze a stock market dataset. PMID:27818569

  20. Ceres' internal structure as inferred from its large craters

    NASA Astrophysics Data System (ADS)

    Marchi, Simone; Raymond, Carol; Fu, Roger; Ermakov, Anton I.; O'Brien, David P.; De Sanctis, Cristina; Ammannito, Eleonora; Russell, Christopher T.

    2016-10-01

    The Dawn spacecraft has gathered important data about the surface composition, internal structure, and geomorphology of Ceres, revealing a cratered landscape. Digital terrain models and global mosaics have been used to derive a global catalog of impact craters larger than 10 km in diameter. A surface dichotomy appears evident: a large fraction of the northern hemisphere is heavily cratered as the result of several billion years of collisions, while portions of the equatorial region and southern hemisphere are much less cratered. The latter are associated with the presence of the two largest (~270-280 km) impact craters, Kerwan and Yalode. The global crater count shows a severe depletion for diameters larger than 100-150 km with respect to collisional models and other large asteroids, like Vesta. This is a strong indication that a significant population of large cerean craters has been obliterated over geological time-scales. This observation is supported by the overall topographic power spectrum of Ceres, which shows that long wavelengths in topography are suppressed (that is, flatter surface) compared to short wavelengths. Viscous relaxation of topography may be a natural culprit for the observed paucity of large craters. Relaxation accommodated by the creep of water ice is expected to result in much more rapid and complete decay of topography than inferred. In contrast, we favor a strong crust composed of a mixture of silicates and salt species (<30% vol water ice) with viscosity decreasing by two-three orders of magnitude in the top 45-70 km of Ceres' crust. This model can account for the observed topography power spectrum and explain the lack of craters in the size range ~100-600 km. Interestingly, Ceres' surface exhibits an 800-km-wide, 4-km-deep depression, known as Vendimia Planitia. The overall topography of Vendimia Planitia is compatible with a partially relaxed mega impact structure. The presence of such a large scale depression bears implications for

  1. Multi-Agent Inference in Social Networks: A Finite Population Learning Approach

    PubMed Central

    Tong, Xin; Zeng, Yao

    2016-01-01

    When people in a society want to make inference about some parameter, each person may want to use data collected by other people. Information (data) exchange in social networks is usually costly, so to make reliable statistical decisions, people need to trade off the benefits and costs of information acquisition. Conflicts of interests and coordination problems will arise in the process. Classical statistics does not consider people’s incentives and interactions in the data collection process. To address this imperfection, this work explores multi-agent Bayesian inference problems with a game theoretic social network model. Motivated by our interest in aggregate inference at the societal level, we propose a new concept, finite population learning, to address whether with high probability, a large fraction of people in a given finite population network can make “good” inference. Serving as a foundation, this concept enables us to study the long run trend of aggregate inference quality as population grows. PMID:27076691

  2. Multi-Agent Inference in Social Networks: A Finite Population Learning Approach.

    PubMed

    Fan, Jianqing; Tong, Xin; Zeng, Yao

    When people in a society want to make inference about some parameter, each person may want to use data collected by other people. Information (data) exchange in social networks is usually costly, so to make reliable statistical decisions, people need to trade off the benefits and costs of information acquisition. Conflicts of interests and coordination problems will arise in the process. Classical statistics does not consider people's incentives and interactions in the data collection process. To address this imperfection, this work explores multi-agent Bayesian inference problems with a game theoretic social network model. Motivated by our interest in aggregate inference at the societal level, we propose a new concept, finite population learning, to address whether with high probability, a large fraction of people in a given finite population network can make "good" inference. Serving as a foundation, this concept enables us to study the long run trend of aggregate inference quality as population grows.

  3. Methods for Ranking and Selection in Large-Scale Inference

    NASA Astrophysics Data System (ADS)

    Henderson, Nicholas C.

    This thesis addresses two distinct problems: one related to ranking and selection for large-scale inference and another related to latent class modeling of longitudinal count data. The first part of the thesis focuses on the problem of identifying leading measurement units from a large collection with a focus on settings with differing levels of estimation precision across measurement units. The main approach presented is a Bayesian ranking procedure that populates the list of top units in a way that maximizes the expected overlap between the true and reported top lists for all list sizes. This procedure relates unit-specific posterior upper tail probabilities with their empirical distribution to yield a ranking variable. It discounts high-variance units less than other common methods and thus achieves improved operating characteristics in the models considered. In the second part of the thesis, we introduce and describe a finite mixture model for longitudinal count data where, conditional on the class label, the subject-specific observations are assumed to arise from a discrete autoregressive process. This approach offers notable computational advantages over related methods due to the within-class closed form of the likelihood function and, as we describe, has a within-class correlation structure which improves model identifiability. We also outline computational strategies for estimating model parameters, and we describe a novel measure of the underlying separation between latent classes and discuss its relation to posterior classification.

  4. Minimal-assumption inference from population-genomic data

    PubMed Central

    Weissman, Daniel B; Hallatschek, Oskar

    2017-01-01

    Samples of multiple complete genome sequences contain vast amounts of information about the evolutionary history of populations, much of it in the associations among polymorphisms at different loci. We introduce a method, Minimal-Assumption Genomic Inference of Coalescence (MAGIC), that reconstructs key features of the evolutionary history, including the distribution of coalescence times, by integrating information across genomic length scales without using an explicit model of coalescence or recombination, allowing it to analyze arbitrarily large samples without phasing while making no assumptions about ancestral structure, linked selection, or gene conversion. Using simulated data, we show that the performance of MAGIC is comparable to that of PSMC’ even on single diploid samples generated with standard coalescent and recombination models. Applying MAGIC to a sample of human genomes reveals evidence of non-demographic factors driving coalescence. DOI: http://dx.doi.org/10.7554/eLife.24836.001 PMID:28671549

  5. Inferences of demography and selection in an African population of Drosophila melanogaster.

    PubMed

    Singh, Nadia D; Jensen, Jeffrey D; Clark, Andrew G; Aquadro, Charles F

    2013-01-01

    It remains a central problem in population genetics to infer the past action of natural selection, and these inferences pose a challenge because demographic events will also substantially affect patterns of polymorphism and divergence. Thus it is imperative to explicitly model the underlying demographic history of the population whenever making inferences about natural selection. In light of the considerable interest in adaptation in African populations of Drosophila melanogaster, which are considered ancestral to the species, we generated a large polymorphism data set representing 2.1 Mb from each of 20 individuals from a Ugandan population of D. melanogaster. In contrast to previous inferences of a simple population expansion in eastern Africa, our demographic modeling of this ancestral population reveals a strong signature of a population bottleneck followed by population expansion, which has significant implications for future demographic modeling of derived populations of this species. Taking this more complex underlying demographic history into account, we also estimate a mean X-linked region-wide rate of adaptation of 6 × 10^-11 per site per generation and a mean selection coefficient of beneficial mutations of 0.0009. These inferences regarding the rate and strength of selection are largely consistent with most other estimates from D. melanogaster and indicate a relatively high rate of adaptation driven by weakly beneficial mutations.

  6. Using Network Methodology to Infer Population Substructure

    PubMed Central

    Prokopenko, Dmitry; Hecker, Julian; Silverman, Edwin; Nöthen, Markus M.; Schmid, Matthias; Lange, Christoph; Loehlein Fier, Heide

    2015-01-01

    One of the main caveats of association studies is their susceptibility to bias due to population stratification. Existing methods rely on model-based approaches such as STRUCTURE and ADMIXTURE or on principal component analysis, as in EIGENSTRAT. Here we provide a novel visualization technique and describe the problem of population substructure from a graph-theoretical point of view. We group the sequenced individuals into triads, which depict the relational structure, on the basis of a predefined pairwise similarity measure. We then merge the triads into a network and apply community detection algorithms in order to identify homogeneous subgroups or communities, which can further be incorporated as covariates into logistic regression. We apply our method to populations from different continents in the 1000 Genomes Project and evaluate the type 1 error based on the empirical p-values. The application to 1000 Genomes data suggests that the network approach provides a very fine resolution of the underlying ancestral population structure. In addition, we show in simulations that, in the presence of discrete population structures, our approach maintains the type 1 error more precisely than existing approaches. PMID:26098940

  7. Inferences about ungulate population dynamics derived from age ratios

    USGS Publications Warehouse

    Harris, N.C.; Kauffman, M.J.; Mills, L.S.

    2008-01-01

    Age ratios (e.g., calf:cow for elk and fawn:doe for deer) are used regularly to monitor ungulate populations. However, it remains unclear what inferences are appropriate from this index because multiple vital rate changes can influence the observed ratio. We used modeling based on elk (Cervus elaphus) life-history to evaluate both how age ratios are influenced by stage-specific fecundity and survival and how well age ratios track population dynamics. Although all vital rates have the potential to influence calf:adult female ratios (i.e., calf:cow ratios), calf survival explained the vast majority of variation in calf:adult female ratios due to its temporal variation compared to other vital rates. Calf:adult female ratios were positively correlated with population growth rate (λ) and often successfully indicated population trajectories. However, calf:adult female ratios performed poorly at detecting imposed declines in calf survival, suggesting that only the most severe declines would be rapidly detected. Our analyses clarify that managers can use accurate, unbiased age ratios to monitor arguably the most important components contributing to sustainable ungulate populations, survival rate of young and λ. However, age ratios are not useful for detecting gradual declines in survival of young or making inferences about fecundity or adult survival in ungulate populations. Therefore, age ratios coupled with independent estimates of population growth or population size are necessary to monitor ungulate population demography and dynamics closely through time.
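
    The link between calf survival, the calf:cow ratio and the population growth rate (λ) can be illustrated with a toy two-stage projection model. This is a minimal sketch, not the authors' elk model: the two-stage structure, the pre-breeding census and all vital-rate values below are assumptions chosen only for illustration. Calves counted at the census are assumed to join the adult class one year later, surviving at the adult rate.

```python
import math


def growth_rate(calf_survival, adult_survival, fecundity):
    """Dominant eigenvalue of the toy projection matrix [[0, f*sc], [sa, sa]]."""
    # Calves per adult female that are alive at the next census.
    recruitment = fecundity * calf_survival
    disc = adult_survival ** 2 + 4.0 * adult_survival * recruitment
    return (adult_survival + math.sqrt(disc)) / 2.0


def stable_calf_cow_ratio(calf_survival, adult_survival, fecundity):
    """Calf:adult-female ratio at the stable stage distribution."""
    lam = growth_rate(calf_survival, adult_survival, fecundity)
    return fecundity * calf_survival / lam


# Hypothetical vital rates; only calf survival differs between scenarios.
scenarios = {"baseline": 0.5, "reduced calf survival": 0.3}
results = {}
for label, sc in scenarios.items():
    lam = growth_rate(sc, adult_survival=0.9, fecundity=0.45)
    ratio = stable_calf_cow_ratio(sc, adult_survival=0.9, fecundity=0.45)
    results[label] = (lam, ratio)
    print(f"{label}: lambda = {lam:.3f}, calf:cow = {ratio:.3f}")
```

    In this toy model a drop in calf survival lowers both λ and the age ratio, which is the positive correlation the abstract describes. The same ratio could also arise from other combinations of fecundity and survival, which is why the ratio alone cannot identify which vital rate changed.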

  8. Halo detection via large-scale Bayesian inference

    NASA Astrophysics Data System (ADS)

    Merson, Alexander I.; Jasche, Jens; Abdalla, Filipe B.; Lahav, Ofer; Wandelt, Benjamin; Jones, D. Heath; Colless, Matthew

    2016-08-01

    We present a proof-of-concept of a novel and fully Bayesian methodology designed to detect haloes of different masses in cosmological observations subject to noise and systematic uncertainties. Our methodology combines the previously published Bayesian large-scale structure inference algorithm, HAmiltonian Density Estimation and Sampling algorithm (HADES), and a Bayesian chain rule (the Blackwell-Rao estimator), which we use to connect the inferred density field to the properties of dark matter haloes. To demonstrate the capability of our approach, we construct a realistic galaxy mock catalogue emulating the wide-area 6-degree Field Galaxy Survey, which has a median redshift of approximately 0.05. Application of HADES to the catalogue provides us with accurately inferred three-dimensional density fields and corresponding quantification of uncertainties inherent to any cosmological observation. We then use a cosmological simulation to relate the amplitude of the density field to the probability of detecting a halo with mass above a specified threshold. With this information, we can sum over the HADES density field realisations to construct maps of detection probabilities and demonstrate the validity of this approach within our mock scenario. We find that the probability of successful detection of haloes in the mock catalogue increases as a function of the signal to noise of the local galaxy observations. Our proposed methodology can easily be extended to account for more complex scientific questions and is a promising novel tool to analyse the cosmic large-scale structure in observations.
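
    The step of summing over density field realisations can be sketched as a simple Monte Carlo average: for each voxel, the detection probability is approximated by averaging a conditional probability of hosting a halo over posterior samples of the density. The sketch below rests on assumptions. The logistic form of `halo_probability`, its parameters and the sample values are hypothetical stand-ins for the relation the authors calibrate from a cosmological simulation, and the lists stand in for HADES output.

```python
import math


def halo_probability(density_contrast, midpoint=2.0, steepness=1.5):
    """Hypothetical P(halo above mass threshold | local density contrast)."""
    return 1.0 / (1.0 + math.exp(-steepness * (density_contrast - midpoint)))


def detection_probability_map(realisations, conditional_probability):
    """Average the conditional probability over posterior samples, voxel by voxel."""
    n_samples = len(realisations)
    n_voxels = len(realisations[0])
    return [
        sum(conditional_probability(sample[v]) for sample in realisations) / n_samples
        for v in range(n_voxels)
    ]


# Three made-up posterior realisations of the density contrast in four voxels.
posterior_samples = [
    [-0.5, 0.8, 3.1, 5.0],
    [-0.3, 1.2, 2.7, 4.6],
    [-0.6, 0.5, 3.4, 5.3],
]
probability_map = detection_probability_map(posterior_samples, halo_probability)
for voxel, probability in enumerate(probability_map):
    print(f"voxel {voxel}: detection probability = {probability:.3f}")
```

    Averaging the conditional probability, rather than evaluating it once at the mean density, carries the uncertainty of the inferred density field through to the detection map.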

  9. Genealogical lineage sorting leads to significant, but incorrect Bayesian multilocus inference of population structure

    PubMed Central

    Orozco-terWengel, Pablo; Corander, Jukka; Schlötterer, Christian

    2011-01-01

    Over the past decades, the use of molecular markers has revolutionized biology and led to the foundation of a new research discipline—phylogeography. Of particular interest has been the inference of population structure and biogeography. While initial studies focused on mtDNA as a molecular marker, it has become apparent that selection and genealogical lineage sorting could lead to erroneous inferences. As it is not clear to what extent these forces affect a given marker, it has become common practice to use the combined evidence from a set of molecular markers as an attempt to recover the signals that approximate the true underlying demography. Typically, the number of markers used is determined by either budget constraints or by statistical power required to recognize significant population differentiation. Using microsatellite markers from Drosophila and humans, we show that even large numbers of loci (>50) can frequently result in statistically well-supported, but incorrect inference of population structure using the software baps. Most importantly, genomic features, such as chromosomal location, variability of the markers, or recombination rate, cannot explain this observation. Instead, it can be attributed to sampling variation among loci with different realizations of the stochastic lineage sorting. This phenomenon is particularly pronounced for low levels of population differentiation. Our results have important implications for ongoing studies of population differentiation, as we unambiguously demonstrate that statistical significance of population structure inferred from a random set of genetic markers cannot necessarily be taken as evidence for a reliable demographic inference. PMID:21244537

  10. Bayesian inference of population size history from multiple loci.

    PubMed

    Heled, Joseph; Drummond, Alexei J

    2008-10-23

    Effective population size (Ne) is related to genetic variability and is a basic parameter in many models of population genetics. A number of methods for inferring current and past population sizes from genetic data have been developed since JFC Kingman introduced the n-coalescent in 1982. Here we present the Extended Bayesian Skyline Plot, a non-parametric Bayesian Markov chain Monte Carlo algorithm that extends a previous coalescent-based method in several ways, including the ability to analyze multiple loci. Through extensive simulations we show the accuracy and limitations of inferring population size as a function of the amount of data, including recovering information about evolutionary bottlenecks. We also analyzed two real data sets to demonstrate the behavior of the new method: a single-gene Hepatitis C virus data set sampled from Egypt and a 10-locus Drosophila ananassae data set representing 16 different populations. The results demonstrate the essential role of multiple loci in recovering population size dynamics. Multi-locus data from a small number of individuals can precisely recover past bottlenecks in population size which cannot be characterized by analysis of a single locus. We also demonstrate that sequence data quality is important because even moderate levels of sequencing errors result in a considerable decrease in estimation accuracy for realistic levels of population genetic variability.

  11. Hierarchical animal movement models for population-level inference

    USGS Publications Warehouse

    Hooten, Mevin B.; Buderman, Frances E.; Brost, Brian M.; Hanks, Ephraim M.; Ivans, Jacob S.

    2016-01-01

    New methods for modeling animal movement based on telemetry data are developed regularly. With advances in telemetry capabilities, animal movement models are becoming increasingly sophisticated. Despite a need for population-level inference, animal movement models are still predominantly developed for individual-level inference. Most efforts to upscale the inference to the population level are either post hoc or complicated enough that only the developer can implement the model. Hierarchical Bayesian models provide an ideal platform for the development of population-level animal movement models but can be challenging to fit due to computational limitations or extensive tuning required. We propose a two-stage procedure for fitting hierarchical animal movement models to telemetry data. The two-stage approach is statistically rigorous and allows one to fit individual-level movement models separately, then resample them using a secondary MCMC algorithm. The primary advantages of the two-stage approach are that the first stage is easily parallelizable and the second stage is completely unsupervised, allowing for an automated fitting procedure in many cases. We demonstrate the two-stage procedure with two applications of animal movement models. The first application involves a spatial point process approach to modeling telemetry data, and the second involves a more complicated continuous-time discrete-space animal movement model. We fit these models to simulated data and real telemetry data arising from a population of monitored Canada lynx in Colorado, USA.

  12. Inferring Admixture Histories of Human Populations Using Linkage Disequilibrium

    PubMed Central

    Loh, Po-Ru; Lipson, Mark; Patterson, Nick; Moorjani, Priya; Pickrell, Joseph K.; Reich, David; Berger, Bonnie

    2013-01-01

    Long-range migrations and the resulting admixtures between populations have been important forces shaping human genetic diversity. Most existing methods for detecting and reconstructing historical admixture events are based on allele frequency divergences or patterns of ancestry segments in chromosomes of admixed individuals. An emerging new approach harnesses the exponential decay of admixture-induced linkage disequilibrium (LD) as a function of genetic distance. Here, we comprehensively develop LD-based inference into a versatile tool for investigating admixture. We present a new weighted LD statistic that can be used to infer mixture proportions as well as dates with fewer constraints on reference populations than previous methods. We define an LD-based three-population test for admixture and identify scenarios in which it can detect admixture events that previous formal tests cannot. We further show that we can uncover phylogenetic relationships among populations by comparing weighted LD curves obtained using a suite of references. Finally, we describe several improvements to the computation and fitting of weighted LD curves that greatly increase the robustness and speed of the calculations. We implement all of these advances in a software package, ALDER, which we validate in simulations and apply to test for admixture among all populations from the Human Genome Diversity Project (HGDP), highlighting insights into the admixture history of Central African Pygmies, Sardinians, and Japanese. PMID:23410830
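
    The dating idea behind LD-based inference can be sketched with a log-linear fit. Under a single pulse of admixture, admixture-induced LD is expected to decay roughly as A·exp(-n·d), where d is genetic distance in Morgans and n is the number of generations since admixture. The code below is a minimal sketch under that assumption and is not ALDER's implementation: the function name, the synthetic noise-free curve and the parameter values are hypothetical, and a real analysis would also need to handle noise, background LD and standard errors.

```python
import math


def fit_admixture_date(distances_morgans, weighted_ld):
    """Least-squares fit of log(LD) against distance; returns (generations, amplitude)."""
    xs = list(distances_morgans)
    ys = [math.log(value) for value in weighted_ld]
    n_points = len(xs)
    mean_x = sum(xs) / n_points
    mean_y = sum(ys) / n_points
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The decay rate is the negative slope; the amplitude is exp(intercept).
    return -slope, math.exp(intercept)


# Synthetic, noise-free curve: 40 generations since admixture (hypothetical values).
true_generations = 40.0
true_amplitude = 0.002
distances = [0.005 * k for k in range(1, 41)]  # 0.5 cM to 20 cM, in Morgans
curve = [true_amplitude * math.exp(-true_generations * d) for d in distances]

estimated_generations, estimated_amplitude = fit_admixture_date(distances, curve)
print(f"estimated generations since admixture: {estimated_generations:.2f}")
```

    The fitted decay rate gives the date, while the amplitude carries the information about mixture proportions that the abstract mentions.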

  13. Measuring happiness in large population

    NASA Astrophysics Data System (ADS)

    Wenas, Annabelle; Sjahputri, Smita; Takwin, Bagus; Primaldhi, Alfindra; Muhamad, Roby

    2016-01-01

    The ability to know the emotional states of large numbers of people is important, for example, to ensure the effectiveness of public policies. In this study, we propose a measure of happiness for large-scale populations that is based on the analysis of Indonesian language lexicons. Here, we incorporate human assessment of Indonesian words, then quantify happiness in large-scale text gathered from Twitter conversations. We used two psychological constructs to measure happiness: valence and arousal. We found that Indonesian words have a tendency towards positive emotions. We also identified several happiness patterns during days of the week, hours of the day, and selected conversation topics.

  14. Accurate inference of subtle population structure (and other genetic discontinuities) using principal coordinates

    USDA-ARS's Scientific Manuscript database

    Accurate inference of genetic discontinuities between populations is an essential component of intraspecific biodiversity and evolution studies, as well as associative genetics. The most widely used methods to infer population structure are model based, Bayesian MCMC procedures that minimize Hardy...

  15. ABC inference of multi-population divergence with admixture from unphased population genomic data.

    PubMed

    Robinson, John D; Bunnefeld, Lynsey; Hearn, Jack; Stone, Graham N; Hickerson, Michael J

    2014-09-01

    Rapidly developing sequencing technologies and declining costs have made it possible to collect genome-scale data from population-level samples in nonmodel systems. Inferential tools for historical demography given these data sets are, at present, underdeveloped. In particular, approximate Bayesian computation (ABC) has yet to be widely embraced by researchers generating these data. Here, we demonstrate the promise of ABC for analysis of the large data sets that are now attainable from nonmodel taxa through current genomic sequencing technologies. We develop and test an ABC framework for model selection and parameter estimation, given histories of three-population divergence with admixture. We then explore different sampling regimes to illustrate how sampling more loci, longer loci or more individuals affects the quality of model selection and parameter estimation in this ABC framework. Our results show that inferences improved substantially with increases in the number and/or length of sequenced loci, while less benefit was gained by sampling large numbers of individuals. Optimal sampling strategies given our inferential models included at least 2000 loci, each approximately 2 kb in length, sampled from five diploid individuals per population, although specific strategies are model and question dependent. We tested our ABC approach through simulation-based cross-validations and illustrate its application using previously analysed data from the oak gall wasp, Biorhiza pallida. © 2014 The Authors. Molecular Ecology published by John Wiley & Sons Ltd.
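
    The logic of ABC can be shown with a rejection sampler: draw a parameter from the prior, simulate data, and keep the draw only if a summary of the simulated data falls close to the observed summary. The sketch below is a toy illustration under stated assumptions. A binomial model with a uniform prior stands in for the coalescent simulations of divergence with admixture used in the study, and the function names, tolerance and sample sizes are hypothetical.

```python
import random


def simulate_successes(p, n_trials, rng):
    """Simulate the summary statistic: number of successes in n_trials."""
    return sum(1 for _ in range(n_trials) if rng.random() < p)


def abc_rejection(observed, n_trials, n_draws, tolerance, seed=1):
    """Keep prior draws whose simulated summary lies within tolerance of the data."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        p = rng.random()  # draw from a Uniform(0, 1) prior
        simulated = simulate_successes(p, n_trials, rng)
        if abs(simulated - observed) <= tolerance:
            accepted.append(p)
    return accepted


# Toy data: 30 successes observed in 100 trials.
posterior_sample = abc_rejection(observed=30, n_trials=100, n_draws=20000, tolerance=2)
posterior_mean = sum(posterior_sample) / len(posterior_sample)
print(f"accepted {len(posterior_sample)} draws, posterior mean = {posterior_mean:.3f}")
```

    Tightening the tolerance brings the accepted sample closer to the exact posterior at the cost of fewer accepted draws, a trade-off that becomes more demanding as the number of summary statistics grows.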

  16. ABC inference of multi-population divergence with admixture from unphased population genomic data

    PubMed Central

    Robinson, John D; Bunnefeld, Lynsey; Hearn, Jack; Stone, Graham N; Hickerson, Michael J

    2014-01-01

    Rapidly developing sequencing technologies and declining costs have made it possible to collect genome-scale data from population-level samples in nonmodel systems. Inferential tools for historical demography given these data sets are, at present, underdeveloped. In particular, approximate Bayesian computation (ABC) has yet to be widely embraced by researchers generating these data. Here, we demonstrate the promise of ABC for analysis of the large data sets that are now attainable from nonmodel taxa through current genomic sequencing technologies. We develop and test an ABC framework for model selection and parameter estimation, given histories of three-population divergence with admixture. We then explore different sampling regimes to illustrate how sampling more loci, longer loci or more individuals affects the quality of model selection and parameter estimation in this ABC framework. Our results show that inferences improved substantially with increases in the number and/or length of sequenced loci, while less benefit was gained by sampling large numbers of individuals. Optimal sampling strategies given our inferential models included at least 2000 loci, each approximately 2 kb in length, sampled from five diploid individuals per population, although specific strategies are model and question dependent. We tested our ABC approach through simulation-based cross-validations and illustrate its application using previously analysed data from the oak gall wasp, Biorhiza pallida. PMID:25113024

  17. Natural Selection in Large Populations

    NASA Astrophysics Data System (ADS)

    Desai, Michael

    2011-03-01

    I will discuss theoretical and experimental approaches to the evolutionary dynamics and population genetics of natural selection in large populations. In these populations, many mutations are often present simultaneously, and because recombination is limited, selection cannot act on them all independently. Rather, it can only affect whole combinations of mutations linked together on the same chromosome. Methods common in theoretical population genetics have been of limited utility in analyzing this coupling between the fates of different mutations. In the past few years it has become increasingly clear that this is a crucial gap in our understanding, as sequence data has begun to show that selection appears to act pervasively on many linked sites in a wide range of populations, including viruses, microbes, Drosophila, and humans. I will describe approaches that combine analytical tools drawn from statistical physics and dynamical systems with traditional methods in theoretical population genetics to address this problem, and describe how experiments in budding yeast can help us directly observe these evolutionary dynamics.

  18. Statistical inference from capture data on closed animal populations

    USGS Publications Warehouse

    Otis, David L.; Burnham, Kenneth P.; White, Gary C.; Anderson, David R.

    1978-01-01

    The estimation of animal abundance is an important problem in both the theoretical and applied biological sciences. Serious work to develop estimation methods began during the 1950s, with a few attempts before that time. The literature on estimation methods has increased tremendously during the past 25 years (Cormack 1968, Seber 1973). However, in large part, the problem remains unsolved. Past efforts toward comprehensive and systematic estimation of density (D) or population size (N) have been inadequate, in general. While more than 200 papers have been published on the subject, one is generally left without a unified approach to the estimation of abundance of an animal population. This situation is unfortunate because a number of pressing research problems require such information. In addition, a wide array of environmental assessment studies and biological inventory programs require the estimation of animal abundance. These needs have been further emphasized by the requirement for the preparation of Environmental Impact Statements imposed by the National Environmental Protection Act in 1970. This publication treats inference procedures for certain types of capture data on closed animal populations. This includes multiple capture-recapture studies (variously called capture-mark-recapture, mark-recapture, or tag-recapture studies) involving livetrapping techniques and removal studies involving kill traps or at least temporary removal of captured individuals during the study. Animals do not necessarily need to be physically trapped; visual sightings of marked animals and electrofishing studies also produce data suitable for the methods described in this monograph. To provide a frame of reference for what follows, we give an example of a capture-recapture experiment to estimate population size of small animals using live traps. The general field experiment is similar for all capture-recapture studies (a removal study is, of course, slightly different). A typical

  19. Intercoalescence time distribution of incomplete gene genealogies in temporally varying populations, and applications in population genetic inference.

    PubMed

    Chen, Hua

    2013-03-01

    Tracing back to a specific time T in the past, the genealogy of a sample of haplotypes may not have reached their common ancestor and may leave m lineages extant. For such an incomplete genealogy truncated at a specific time T in the past, the distribution and expectation of the intercoalescence times conditional on T are derived in an exact form in this paper for populations of deterministically time-varying sizes, specifically, for populations growing exponentially. The derived intercoalescence time distribution can be integrated to the coalescent-based joint allele frequency spectrum (JAFS) theory, and is useful for population genetic inference from large-scale genomic data, without relying on computationally intensive approaches, such as importance sampling and Markov Chain Monte Carlo (MCMC) methods. The inference of several important parameters relying on this derived conditional distribution is demonstrated: quantifying population growth rate and onset time, and estimating the number of ancestral lineages at a specific ancient time. Simulation studies confirm validity of the derivation and statistical efficiency of the methods using the derived intercoalescence time distribution. Two examples of real data are given to show the inference of the population growth rate of a European sample from the NIEHS Environmental Genome Project, and the number of ancient lineages of 31 mitochondrial genomes from Tibetan populations.

  20. Inference of Population Structure using Dense Haplotype Data

    PubMed Central

    Lawson, Daniel John; Hellenthal, Garrett

    2012-01-01

    The advent of genome-wide dense variation data provides an opportunity to investigate ancestry in unprecedented detail, but presents new statistical challenges. We propose a novel inference framework that aims to efficiently capture information on population structure provided by patterns of haplotype similarity. Each individual in a sample is considered in turn as a recipient, whose chromosomes are reconstructed using chunks of DNA donated by the other individuals. Results of this “chromosome painting” can be summarized as a “coancestry matrix,” which directly reveals key information about ancestral relationships among individuals. If markers are viewed as independent, we show that this matrix almost completely captures the information used by both standard Principal Components Analysis (PCA) and model-based approaches such as STRUCTURE in a unified manner. Furthermore, when markers are in linkage disequilibrium, the matrix combines information across successive markers to increase the ability to discern fine-scale population structure using PCA. In parallel, we have developed an efficient model-based approach to identify discrete populations using this matrix, which offers advantages over PCA in terms of interpretability and over existing clustering algorithms in terms of speed, number of separable populations, and sensitivity to subtle population structure. We analyse Human Genome Diversity Panel data for 938 individuals and 641,000 markers, and we identify 226 populations reflecting differences on continental, regional, local, and family scales. We present multiple lines of evidence that, while many methods capture similar information among strongly differentiated groups, more subtle population structure in human populations is consistently present at a much finer level than currently available geographic labels and is only captured by the haplotype-based approach. The software used for this article, ChromoPainter and fineSTRUCTURE, is available from

  1. Inference of population structure using dense haplotype data.

    PubMed

    Lawson, Daniel John; Hellenthal, Garrett; Myers, Simon; Falush, Daniel

    2012-01-01

    The advent of genome-wide dense variation data provides an opportunity to investigate ancestry in unprecedented detail, but presents new statistical challenges. We propose a novel inference framework that aims to efficiently capture information on population structure provided by patterns of haplotype similarity. Each individual in a sample is considered in turn as a recipient, whose chromosomes are reconstructed using chunks of DNA donated by the other individuals. Results of this "chromosome painting" can be summarized as a "coancestry matrix," which directly reveals key information about ancestral relationships among individuals. If markers are viewed as independent, we show that this matrix almost completely captures the information used by both standard Principal Components Analysis (PCA) and model-based approaches such as STRUCTURE in a unified manner. Furthermore, when markers are in linkage disequilibrium, the matrix combines information across successive markers to increase the ability to discern fine-scale population structure using PCA. In parallel, we have developed an efficient model-based approach to identify discrete populations using this matrix, which offers advantages over PCA in terms of interpretability and over existing clustering algorithms in terms of speed, number of separable populations, and sensitivity to subtle population structure. We analyse Human Genome Diversity Panel data for 938 individuals and 641,000 markers, and we identify 226 populations reflecting differences on continental, regional, local, and family scales. We present multiple lines of evidence that, while many methods capture similar information among strongly differentiated groups, more subtle population structure in human populations is consistently present at a much finer level than currently available geographic labels and is only captured by the haplotype-based approach. The software used for this article, ChromoPainter and fineSTRUCTURE, is available from http://www.paintmychromosomes.com/.
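
    As a rough illustration of how a painting can be summarized into a coancestry matrix, the sketch below counts the contiguous chunks that each recipient copies from each donor. The input format (one donor label per marker for each recipient) and the name `coancestry_counts` are assumptions made for illustration. This is not the ChromoPainter implementation, which infers the painting itself with a probabilistic copying model.

```python
from itertools import groupby


def coancestry_counts(paintings):
    """Count copied chunks for every (recipient, donor) pair.

    `paintings` maps each recipient to a list of donor labels, one per
    marker along the chromosome (a hypothetical, already-inferred painting).
    A chunk is a maximal run of consecutive markers copied from one donor.
    """
    individuals = sorted(paintings)
    # Rows are recipients, columns are donors; nobody donates to themselves.
    matrix = {r: {d: 0 for d in individuals if d != r} for r in individuals}
    for recipient, donors in paintings.items():
        for donor, _run in groupby(donors):  # one iteration per chunk
            matrix[recipient][donor] += 1
    return matrix


# Toy painting of three individuals over six markers (made-up data).
paintings = {
    "A": ["B", "B", "C", "C", "C", "B"],
    "B": ["A", "A", "A", "A", "C", "C"],
    "C": ["B", "B", "B", "A", "A", "A"],
}
coancestry = coancestry_counts(paintings)
```

    Counting chunks rather than markers is one possible summary; total chunk length would be another. The matrix is generally not symmetric, because donating and receiving are separate roles.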

  2. Benchmarking Spike Rate Inference in Population Calcium Imaging.

    PubMed

    Theis, Lucas; Berens, Philipp; Froudarakis, Emmanouil; Reimer, Jacob; Román Rosón, Miroslav; Baden, Tom; Euler, Thomas; Tolias, Andreas S; Bethge, Matthias

    2016-05-04

    A fundamental challenge in calcium imaging has been to infer spike rates of neurons from the measured noisy fluorescence traces. We systematically evaluate different spike inference algorithms on a large benchmark dataset (>100,000 spikes) recorded from varying neural tissue (V1 and retina) using different calcium indicators (OGB-1 and GCaMP6). In addition, we introduce a new algorithm based on supervised learning in flexible probabilistic models and find that it performs better than other published techniques. Importantly, it outperforms other algorithms even when applied to entirely new datasets for which no simultaneously recorded data is available. Future data acquired in new experimental conditions can be used to further improve the spike prediction accuracy and generalization performance of the model. Finally, we show that comparing algorithms on artificial data is not informative about performance on real data, suggesting that benchmarking different methods with real-world datasets may greatly facilitate future algorithmic developments in neuroscience. Copyright © 2016 Elsevier Inc. All rights reserved.

  3. Inferring infection hazard in wildlife populations by linking data across individual and population scales

    USGS Publications Warehouse

    Pepin, Kim M.; Kay, Shannon L.; Golas, Ben D.; Shriner, Susan A.; Gilbert, Amy T.; Miller, Ryan S.; Graham, Andrea L.; Riley, Steven; Cross, Paul C.; Samuel, Michael D.; Hooten, Mevin B.; Hoeting, Jennifer A.; Lloyd-Smith, James O.; Webb, Colleen T.; Buhnerkempe, Michael G.

    2017-01-01

    Our ability to infer unobservable disease-dynamic processes such as force of infection (infection hazard for susceptible hosts) has transformed our understanding of disease transmission mechanisms and capacity to predict disease dynamics. Conventional methods for inferring FOI estimate a time-averaged value and are based on population-level processes. Because many pathogens exhibit epidemic cycling and FOI is the result of processes acting across the scales of individuals and populations, a flexible framework that extends to epidemic dynamics and links within-host processes to FOI is needed. Specifically, within-host antibody kinetics in wildlife hosts can be short-lived and produce patterns that are repeatable across individuals, suggesting individual-level antibody concentrations could be used to infer time since infection and hence FOI. Using simulations and case studies (influenza A in lesser snow geese and Yersinia pestis in coyotes), we argue that with careful experimental and surveillance design, the population-level FOI signal can be recovered from individual-level antibody kinetics, despite substantial individual-level variation. In addition to improving inference, the cross-scale quantitative antibody approach we describe can reveal insights into drivers of individual-based variation in disease response, and the role of poorly understood processes such as secondary infections, in population-level dynamics of disease.

  4. Inferring infection hazard in wildlife populations by linking data across individual and population scales.

    PubMed

    Pepin, Kim M; Kay, Shannon L; Golas, Ben D; Shriner, Susan A; Gilbert, Amy T; Miller, Ryan S; Graham, Andrea L; Riley, Steven; Cross, Paul C; Samuel, Michael D; Hooten, Mevin B; Hoeting, Jennifer A; Lloyd-Smith, James O; Webb, Colleen T; Buhnerkempe, Michael G

    2017-03-01

    Our ability to infer unobservable disease-dynamic processes such as force of infection (infection hazard for susceptible hosts) has transformed our understanding of disease transmission mechanisms and capacity to predict disease dynamics. Conventional methods for inferring FOI estimate a time-averaged value and are based on population-level processes. Because many pathogens exhibit epidemic cycling and FOI is the result of processes acting across the scales of individuals and populations, a flexible framework that extends to epidemic dynamics and links within-host processes to FOI is needed. Specifically, within-host antibody kinetics in wildlife hosts can be short-lived and produce patterns that are repeatable across individuals, suggesting individual-level antibody concentrations could be used to infer time since infection and hence FOI. Using simulations and case studies (influenza A in lesser snow geese and Yersinia pestis in coyotes), we argue that with careful experimental and surveillance design, the population-level FOI signal can be recovered from individual-level antibody kinetics, despite substantial individual-level variation. In addition to improving inference, the cross-scale quantitative antibody approach we describe can reveal insights into drivers of individual-based variation in disease response, and the role of poorly understood processes such as secondary infections, in population-level dynamics of disease.

  5. Accurate Inference of Subtle Population Structure (and Other Genetic Discontinuities) Using Principal Coordinates

    PubMed Central

    Reeves, Patrick A.; Richards, Christopher M.

    2009-01-01

    Background: Accurate inference of genetic discontinuities between populations is an essential component of intraspecific biodiversity and evolution studies, as well as associative genetics. The most widely-used methods to infer population structure are model-based, Bayesian MCMC procedures that minimize Hardy-Weinberg and linkage disequilibrium within subpopulations. These methods are useful, but suffer from large computational requirements and a dependence on modeling assumptions that may not be met in real data sets. Here we describe the development of a new approach, PCO-MC, which couples principal coordinate analysis to a clustering procedure for the inference of population structure from multilocus genotype data. Methodology/Principal Findings: PCO-MC uses data from all principal coordinate axes simultaneously to calculate a multidimensional “density landscape”, from which the number of subpopulations, and the membership within subpopulations, is determined using a valley-seeking algorithm. Using extensive simulations, we show that this approach outperforms a Bayesian MCMC procedure when many loci (e.g. 100) are sampled, but that the Bayesian procedure is marginally superior with few loci (e.g. 10). When presented with sufficient data, PCO-MC accurately delineated subpopulations with population Fst values as low as 0.03 (G'st>0.2), whereas the limit of resolution of the Bayesian approach was Fst = 0.05 (G'st>0.35). Conclusions/Significance: We draw a distinction between population structure inference for describing biodiversity as opposed to Type I error control in associative genetics. We suggest that discrete assignments, like those produced by PCO-MC, are appropriate for circumscribing units of biodiversity whereas expression of population structure as a continuous variable is more useful for case-control correction in structured association studies. PMID:19172174

  6. Trans-dimensional Bayesian inference for large sequential data sets

    NASA Astrophysics Data System (ADS)

    Mandolesi, E.; Dettmer, J.; Dosso, S. E.; Holland, C. W.

    2015-12-01

    This work develops a sequential Monte Carlo method to infer seismic parameters of layered seabeds from large sequential reflection-coefficient data sets. The approach provides parameter estimates and uncertainties along survey tracks with the goal of aiding the detection of unexploded ordnance in shallow water. The sequential data are acquired by a moving platform with source and receiver array towed close to the seabed. This geometry requires consideration of spherical reflection coefficients, computed efficiently by massively parallel implementation of the Sommerfeld integral via Levin integration on a graphics processing unit. The seabed is parametrized with a trans-dimensional model to account for changes in the environment (i.e. changes in layering) along the track. The method combines advanced Markov chain Monte Carlo methods (annealing) with particle filtering (resampling). Since data from closely-spaced source transmissions (pings) often sample similar environments, the solution from one ping can be utilized to efficiently estimate the posterior for data from subsequent pings. Since reflection-coefficient data are highly informative, the likelihood function can be extremely peaked, resulting in little overlap between posteriors of adjacent pings. This is addressed by adding bridging distributions (via annealed importance sampling) between pings for more efficient transitions. The approach assumes the environment to be changing slowly enough to justify the local 1D parametrization. However, bridging allows rapid changes between pings to be addressed and we demonstrate the method to be stable in such situations. Results are in terms of trans-D parameter estimates and uncertainties along the track. The algorithm is examined for realistic simulated data along a track and applied to a dataset collected by an autonomous underwater vehicle on the Malta Plateau, Mediterranean Sea. [Work supported by the SERDP, DoD.]

  7. Inference about density and temporary emigration in unmarked populations

    USGS Publications Warehouse

    Chandler, Richard B.; Royle, J. Andrew; King, David I.

    2011-01-01

    Few species are distributed uniformly in space, and populations of mobile organisms are rarely closed with respect to movement, yet many models of density rely upon these assumptions. We present a hierarchical model allowing inference about the density of unmarked populations subject to temporary emigration and imperfect detection. The model can be fit to data collected using a variety of standard survey methods such as repeated point counts in which removal sampling, double-observer sampling, or distance sampling is used during each count. Simulation studies demonstrated that parameter estimators are unbiased when temporary emigration is either "completely random" or is determined by the size and location of home ranges relative to survey points. We also applied the model to repeated removal sampling data collected on Chestnut-sided Warblers (Dendroica pensylvanica) in the White Mountain National Forest, USA. The density estimate from our model, 1.09 birds/ha, was similar to an estimate of 1.11 birds/ha produced by an intensive spot-mapping effort. Our model is also applicable when processes other than temporary emigration affect the probability of being available for detection, such as in studies using cue counts. Functions to implement the model have been added to the R package unmarked.

  8. Inference about density and temporary emigration in unmarked populations.

    PubMed

    Chandler, Richard B; Royle, J Andrew; King, David I

    2011-07-01

    Few species are distributed uniformly in space, and populations of mobile organisms are rarely closed with respect to movement, yet many models of density rely upon these assumptions. We present a hierarchical model allowing inference about the density of unmarked populations subject to temporary emigration and imperfect detection. The model can be fit to data collected using a variety of standard survey methods such as repeated point counts in which removal sampling, double-observer sampling, or distance sampling is used during each count. Simulation studies demonstrated that parameter estimators are unbiased when temporary emigration is either "completely random" or is determined by the size and location of home ranges relative to survey points. We also applied the model to repeated removal sampling data collected on Chestnut-sided Warblers (Dendroica pensylvanica) in the White Mountain National Forest, U.S.A. The density estimate from our model, 1.09 birds/ha, was similar to an estimate of 1.11 birds/ha produced by an intensive spot-mapping effort. Our model is also applicable when processes other than temporary emigration affect the probability of being available for detection, such as in studies using cue counts. Functions to implement the model have been added to the R package unmarked.

  9. The aggregate site frequency spectrum (aSFS) for comparative population genomic inference

    PubMed Central

    Xue, Alexander T.; Hickerson, Michael J.

    2015-01-01

    Understanding how assemblages of species responded to past climate change is a central goal of comparative phylogeography and comparative population genomics, an endeavor that has increasing potential to integrate with community ecology. New sequencing technology now provides the potential to perform complex demographic inference at unprecedented resolution across assemblages of non-model species. To this end, we introduce the aggregate site frequency spectrum (aSFS), an expansion of the site frequency spectrum to use single nucleotide polymorphism (SNP) datasets collected from multiple, co-distributed species for assemblage-level demographic inference. We describe how the aSFS is constructed over an arbitrary number of independent population samples and then demonstrate how the aSFS can differentiate various multi-species demographic histories under a wide range of sampling configurations while allowing effective population sizes and expansion magnitudes to vary independently. We subsequently couple the aSFS with a hierarchical approximate Bayesian computation (hABC) framework to estimate degree of temporal synchronicity in expansion times across taxa, including an empirical demonstration with a dataset consisting of five populations of the threespine stickleback (Gasterosteus aculeatus). Corroborating what is generally understood about the recent post-glacial origins of these populations, the joint aSFS/hABC analysis strongly suggests that the stickleback data are most consistent with synchronous expansion after the Last Glacial Maximum (posterior probability = 0.99). The aSFS will have general application for multi-level statistical frameworks to test models involving assemblages and/or communities and as large-scale SNP data from non-model species become routine, the aSFS expands the potential for powerful next-generation comparative population genomic inference. PMID:26769405
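
    To make the construction more concrete, the sketch below first computes a per-population site frequency spectrum from derived-allele counts and then aggregates several spectra. The aggregation rule used here (sorting the proportions across populations in descending order within each frequency class, so that population labels no longer matter) is an assumption about how the aSFS is built; the abstract does not spell it out. The function names and toy data are likewise illustrative only.

```python
def site_frequency_spectrum(derived_counts, n_chromosomes):
    """Proportion of SNPs at each derived-allele count 1..n-1 (unfolded SFS)."""
    bins = [0] * (n_chromosomes - 1)
    for count in derived_counts:
        if 0 < count < n_chromosomes:  # keep segregating sites only
            bins[count - 1] += 1
    total = sum(bins)
    return [b / total for b in bins]


def aggregate_sfs(spectra):
    """Combine per-population spectra of equal length.

    Assumed rule: within each frequency class, sort the proportions
    across populations in descending order, discarding population labels.
    """
    n_classes = len(spectra[0])
    return [sorted((s[k] for s in spectra), reverse=True)
            for k in range(n_classes)]


# Two hypothetical populations, each sampled at 4 chromosomes.
sfs_pop1 = site_frequency_spectrum([1, 1, 2, 3], n_chromosomes=4)
sfs_pop2 = site_frequency_spectrum([1, 2, 2, 2], n_chromosomes=4)
asfs = aggregate_sfs([sfs_pop1, sfs_pop2])
```

    With these toy inputs the per-population spectra are [0.5, 0.25, 0.25] and [0.25, 0.75, 0.0], and each row of the aggregate holds one frequency class ordered from the largest proportion to the smallest.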

  10. Inferring mental states from neuroimaging data: From reverse inference to large-scale decoding

    PubMed Central

    Poldrack, Russell A.

    2011-01-01

    A common goal of neuroimaging research is to use imaging data to identify the mental processes that are engaged when a subject performs a mental task. The use of reasoning from activation to mental functions, known as “reverse inference”, has been previously criticized on the basis that it does not take into account how selectively the area is activated by the mental process in question. In this Perspective, I outline the critique of informal reverse inference, and describe a number of new developments that provide the ability to more formally test the predictive power of neuroimaging data. PMID:22153367

  11. Assessing the effect of sequencing depth and sample size in population genetics inferences.

    PubMed

    Fumagalli, Matteo

    2013-01-01

    Next-Generation Sequencing (NGS) technologies have dramatically revolutionised research in many fields of genetics. The ability to sequence many individuals from one or multiple populations at a genomic scale has greatly enhanced population genetics studies and made it a data-driven discipline. Recently, researchers have proposed statistical modelling to address genotyping uncertainty associated with NGS data. However, an ongoing debate is whether it is more beneficial to increase the number of sequenced individuals or the per-sample sequencing depth for estimating genetic variation. Through extensive simulations, I assessed the accuracy of estimating nucleotide diversity, detecting polymorphic sites, and predicting population structure under different experimental scenarios. Results show that the greatest accuracy for estimating population genetics parameters is achieved by employing a large sample size, despite single individuals being sequenced at low depth. Under some circumstances, the minimum sequencing depth for obtaining accurate estimates of allele frequencies and to identify polymorphic sites is [Formula: see text], where both alleles are more likely to have been sequenced. On the other hand, inferences of population structure are more accurate at very large sample sizes, even with extremely low sequencing depth. This all points to the conclusion that under various experimental scenarios, in cost-limited population genetics studies, large sample sizes at low sequencing depth are desirable to achieve high accuracy. These findings will help researchers design their experimental set-ups and guide further investigation on the effect of protocol design for genetic research.

  12. Inferring network dynamics and neuron properties from population recordings.

    PubMed

    Linaro, Daniele; Storace, Marco; Mattia, Maurizio

    2011-01-01

    Understanding the computational capabilities of the nervous system means to "identify" its emergent multiscale dynamics. For this purpose, we propose a novel model-driven identification procedure and apply it to sparsely connected populations of excitatory integrate-and-fire neurons with spike frequency adaptation (SFA). Our method does not characterize the system from its microscopic elements in a bottom-up fashion, and does not resort to any linearization. We investigate networks as a whole, inferring their properties from the response dynamics of the instantaneous discharge rate to brief and aspecific supra-threshold stimulations. While several available methods assume generic expressions for the system as a black box, we adopt a mean-field theory for the evolution of the network transparently parameterized by identified elements (such as dynamic timescales), which are in turn non-trivially related to single-neuron properties. In particular, from the elicited transient responses, the input-output gain function of the neurons in the network is extracted and direct links to the microscopic level are made available: indeed, we show how to extract the decay time constant of the SFA, the absolute refractory period and the average synaptic efficacy. In addition and contrary to previous attempts, our method captures the system dynamics across bifurcations separating qualitatively different dynamical regimes. The robustness and the generality of the methodology are tested on controlled simulations, reporting a good agreement between theoretically expected and identified values. The assumptions behind the underlying theoretical framework make the method readily applicable to biological preparations like cultured neuron networks and in vitro brain slices.

  13. Cu and Zn in different stellar populations: inferring their origin

    NASA Astrophysics Data System (ADS)

    Bisterzo, S.; Gallino, R.; Pignatari, M.; Pompeia, L.; Cunha, K.; Smith, V.

    We analyse recent high-resolution spectroscopic observations of Cu and Zn for stars of different stellar populations and metallicities, using the best available stellar nucleosynthesis expectations. The observations include unevolved stars of the Galactic halo, thick-disk and thin-disk, bulge-like stars and stars of Omega Cen, globular clusters and Dwarf Spheroidal systems. Most cosmic Cu and half the Zn are synthesised in massive stars during the hydrostatic He-burning and C-burning phases by the weak sr-process, which depends linearly on metallicity. A minor primary contribution for Cu derives from explosive nucleosynthesis in SNe II. A large primary contribution to Zn (as 64Zn) is ascribable to the α-rich freeze-out in ν-winds or to SNe II with large explosion energies (hypernovae). AGB stars and type Ia supernovae do not contribute appreciably to either Cu or Zn.

  14. A Novel and Fast Approach for Population Structure Inference Using Kernel-PCA and Optimization

    PubMed Central

    Popescu, Andrei-Alin; Harper, Andrea L.; Trick, Martin; Bancroft, Ian; Huber, Katharina T.

    2014-01-01

    Population structure is a confounding factor in genome-wide association studies, increasing the rate of false positive associations. To correct for it, several model-based algorithms such as ADMIXTURE and STRUCTURE have been proposed. These tend to suffer from the fact that they have a considerable computational burden, limiting their applicability when used with large datasets, such as those produced by next generation sequencing techniques. To address this, nonmodel based approaches such as sparse nonnegative matrix factorization (sNMF) and EIGENSTRAT have been proposed, which scale better with larger data. Here we present a novel nonmodel-based approach, population structure inference using kernel-PCA and optimization (PSIKO), which is based on a unique combination of linear kernel-PCA and least-squares optimization and allows for the inference of admixture coefficients, principal components, and number of founder populations of a dataset. PSIKO has been compared against existing leading methods on a variety of simulation scenarios, as well as on real biological data. We found that in addition to producing results of the same quality as other tested methods, PSIKO scales extremely well with dataset size, being considerably (up to 30 times) faster for longer sequences than even state-of-the-art methods such as sNMF. PSIKO and accompanying manual are freely available at https://www.uea.ac.uk/computing/psiko. PMID:25326237

  15. Large-Scale Optimization for Bayesian Inference in Complex Systems

    SciTech Connect

    Willcox, Karen; Marzouk, Youssef

    2013-11-12

    The SAGUARO (Scalable Algorithms for Groundwater Uncertainty Analysis and Robust Optimization) Project focused on the development of scalable numerical algorithms for large-scale Bayesian inversion in complex systems that capitalize on advances in large-scale simulation-based optimization and inversion methods. The project was a collaborative effort among MIT, the University of Texas at Austin, Georgia Institute of Technology, and Sandia National Laboratories. The research was directed in three complementary areas: efficient approximations of the Hessian operator, reductions in complexity of forward simulations via stochastic spectral approximations and model reduction, and employing large-scale optimization concepts to accelerate sampling. The MIT–Sandia component of the SAGUARO Project addressed the intractability of conventional sampling methods for large-scale statistical inverse problems by devising reduced-order models that are faithful to the full-order model over a wide range of parameter values; sampling then employs the reduced model rather than the full model, resulting in very large computational savings. Results indicate little effect on the computed posterior distribution. On the other hand, in the Texas–Georgia Tech component of the project, we retain the full-order model, but exploit inverse problem structure (adjoint-based gradients and partial Hessian information of the parameter-to-observation map) to implicitly extract lower dimensional information on the posterior distribution; this greatly speeds up sampling methods, so that fewer sampling points are needed. We can think of these two approaches as "reduce then sample" and "sample then reduce." In fact, these two approaches are complementary, and can be used in conjunction with each other. Moreover, they both exploit deterministic inverse problem structure, in the form of adjoint-based gradient and Hessian information of the underlying parameter-to-observation map, to achieve their

  16. An integrative variant analysis pipeline for accurate genotype/haplotype inference in population NGS data.

    PubMed

    Wang, Yi; Lu, James; Yu, Jin; Gibbs, Richard A; Yu, Fuli

    2013-05-01

    Next-generation sequencing is a powerful approach for discovering genetic variation. Sensitive variant calling and haplotype inference from population sequencing data remain challenging. We describe methods for high-quality discovery, genotyping, and phasing of SNPs for low-coverage (approximately 5×) sequencing of populations, implemented in a pipeline called SNPTools. Our pipeline contains several innovations that specifically address challenges caused by low-coverage population sequencing: (1) effective base depth (EBD), a nonparametric statistic that enables more accurate statistical modeling of sequencing data; (2) variance ratio scoring, a variance-based statistic that discovers polymorphic loci with high sensitivity and specificity; and (3) BAM-specific binomial mixture modeling (BBMM), a clustering algorithm that generates robust genotype likelihoods from heterogeneous sequencing data. Last, we develop an imputation engine that refines raw genotype likelihoods to produce high-quality phased genotypes/haplotypes. Designed for large population studies, SNPTools' input/output (I/O) and storage aware design leads to improved computing performance on large sequencing data sets. We apply SNPTools to the International 1000 Genomes Project (1000G) Phase 1 low-coverage data set and obtain genotyping accuracy comparable to that of SNP microarray.
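
    For readers unfamiliar with genotype likelihoods, the sketch below shows the simplest version of the idea: the probability of the observed read counts at one site under each possible genotype, using a plain binomial model with a fixed error rate. This is not the BBMM algorithm described above, which fits a mixture per BAM file; the function name, the 1% error rate and the read counts are all assumptions chosen for illustration.

```python
from math import comb


def genotype_likelihoods(alt_reads, depth, error=0.01):
    """P(alt_reads | genotype) for genotypes carrying 0, 1 or 2 alt alleles.

    Each read is assumed to show the alt allele independently with
    probability `error` (hom-ref), 0.5 (het) or 1 - error (hom-alt).
    """
    alt_prob = {0: error, 1: 0.5, 2: 1.0 - error}
    return {
        genotype: comb(depth, alt_reads)
        * p ** alt_reads
        * (1.0 - p) ** (depth - alt_reads)
        for genotype, p in alt_prob.items()
    }


# A low-coverage site: 2 of 5 reads carry the alternative allele.
lik = genotype_likelihoods(alt_reads=2, depth=5)
best = max(lik, key=lik.get)  # genotype with the highest likelihood
```

    At about 5× coverage the three likelihoods are often not far apart for a single sample, which is one reason pipelines of this kind refine them further by imputation across the population.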

  17. Expectation propagation for large scale Bayesian inference of non-linear molecular networks from perturbation data.

    PubMed

    Narimani, Zahra; Beigy, Hamid; Ahmad, Ashar; Masoudi-Nejad, Ali; Fröhlich, Holger

    2017-01-01

    Inferring the structure of molecular networks from time series protein or gene expression data provides valuable information about the complex biological processes of the cell. Causal network structure inference has been approached using different methods in the past. Most causal network inference techniques, such as Dynamic Bayesian Networks and ordinary differential equations, are limited by their computational complexity and thus make large scale inference infeasible. This is specifically true if a Bayesian framework is applied in order to deal with the unavoidable uncertainty about the correct model. We devise a novel Bayesian network reverse engineering approach using ordinary differential equations with the ability to include non-linearity. Besides modeling arbitrary, possibly combinatorial and time dependent perturbations with unknown targets, one of our main contributions is the use of Expectation Propagation, an algorithm for approximate Bayesian inference over large scale network structures in short computation time. We further explore the possibility of integrating prior knowledge into network inference. We evaluate the proposed model on DREAM4 and DREAM8 data and find it competitive against several state-of-the-art existing network inference methods.

  18. Expectation propagation for large scale Bayesian inference of non-linear molecular networks from perturbation data

    PubMed Central

    Narimani, Zahra; Beigy, Hamid; Ahmad, Ashar; Masoudi-Nejad, Ali; Fröhlich, Holger

    2017-01-01

    Inferring the structure of molecular networks from time series protein or gene expression data provides valuable information about the complex biological processes of the cell. Causal network structure inference has been approached using different methods in the past. Most causal network inference techniques, such as Dynamic Bayesian Networks and ordinary differential equations, are limited by their computational complexity and thus make large scale inference infeasible. This is specifically true if a Bayesian framework is applied in order to deal with the unavoidable uncertainty about the correct model. We devise a novel Bayesian network reverse engineering approach using ordinary differential equations with the ability to include non-linearity. Besides modeling arbitrary, possibly combinatorial and time dependent perturbations with unknown targets, one of our main contributions is the use of Expectation Propagation, an algorithm for approximate Bayesian inference over large scale network structures in short computation time. We further explore the possibility of integrating prior knowledge into network inference. We evaluate the proposed model on DREAM4 and DREAM8 data and find it competitive against several state-of-the-art existing network inference methods. PMID:28166542

  19. Joint Inference of Population Assignment and Demographic History

    PubMed Central

    Choi, Sang Chul; Hey, Jody

    2011-01-01

    A new approach to assigning individuals to populations using genetic data is described. Most existing methods work by maximizing Hardy–Weinberg and linkage equilibrium within populations, neither of which will apply for many demographic histories. By including a demographic model, within a likelihood framework based on coalescent theory, we can jointly study demographic history and population assignment. Genealogies and population assignments are sampled from a posterior distribution using a general isolation-with-migration model for multiple populations. A measure of partition distance between assignments facilitates not only the summary of a posterior sample of assignments, but also the estimation of the posterior density for the demographic history. It is shown that joint estimates of assignment and demographic history are possible, including estimation of population phylogeny for samples from three populations. The new method is compared to results of a widely used assignment method, using simulated and published empirical data sets. PMID:21775468

  20. Population genetics inference for longitudinally-sampled mutants under strong selection.

    PubMed

    Lacerda, Miguel; Seoighe, Cathal

    2014-11-01

    Longitudinal allele frequency data are becoming increasingly prevalent. Such samples permit statistical inference of the population genetics parameters that influence the fate of mutant variants. To infer these parameters by maximum likelihood, the mutant frequency is often assumed to evolve according to the Wright-Fisher model. For computational reasons, this discrete model is commonly approximated by a diffusion process that requires the assumption that the forces of natural selection and mutation are weak. This assumption is not always appropriate. For example, mutations that impart drug resistance in pathogens may evolve under strong selective pressure. Here, we present an alternative approximation to the mutant-frequency distribution that does not make any assumptions about the magnitude of selection or mutation and is much more computationally efficient than the standard diffusion approximation. Simulation studies are used to compare the performance of our method to that of the Wright-Fisher and Gaussian diffusion approximations. For large populations, our method is found to provide a much better approximation to the mutant-frequency distribution when selection is strong, while all three methods perform comparably when selection is weak. Importantly, maximum-likelihood estimates of the selection coefficient are severely attenuated when selection is strong under the two diffusion models, but not when our method is used. This is further demonstrated with an application to mutant-frequency data from an experimental study of bacteriophage evolution. We therefore recommend our method for estimating the selection coefficient when the effective population size is too large to utilize the discrete Wright-Fisher model. Copyright © 2014 by the Genetics Society of America.
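
    The discrete Wright-Fisher model that the abstract takes as its starting point can be simulated directly, as in the sketch below. It assumes a haploid population of constant size, no mutation, and relative fitness 1 + s for the mutant; these simplifications, the function names and the parameter values are illustrative assumptions. It does not implement the authors' approximation to the mutant-frequency distribution.

```python
import random


def expected_next_freq(p, s):
    """Deterministic mutant frequency after selection (fitness 1 + s versus 1)."""
    return p * (1.0 + s) / (1.0 + p * s)


def wright_fisher_step(p, pop_size, s, rng):
    """One generation: selection, then binomial sampling of pop_size offspring."""
    q = expected_next_freq(p, s)
    mutants = sum(1 for _ in range(pop_size) if rng.random() < q)
    return mutants / pop_size


def simulate(p0, pop_size, s, generations, seed=0):
    """Return the mutant-frequency trajectory, starting at p0."""
    rng = random.Random(seed)
    trajectory = [p0]
    for _ in range(generations):
        trajectory.append(wright_fisher_step(trajectory[-1], pop_size, s, rng))
    return trajectory


# Strong selection (s = 0.5) acting on a mutant that starts at 10%.
trajectory = simulate(p0=0.1, pop_size=200, s=0.5, generations=30)
```

    Each generation costs one random draw per individual, so the cost grows with population size. That scaling is the practical reason the exact discrete model becomes expensive for large populations.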

  1. Population Genetics Inference for Longitudinally-Sampled Mutants Under Strong Selection

    PubMed Central

    Lacerda, Miguel; Seoighe, Cathal

    2014-01-01

    Longitudinal allele frequency data are becoming increasingly prevalent. Such samples permit statistical inference of the population genetics parameters that influence the fate of mutant variants. To infer these parameters by maximum likelihood, the mutant frequency is often assumed to evolve according to the Wright–Fisher model. For computational reasons, this discrete model is commonly approximated by a diffusion process that requires the assumption that the forces of natural selection and mutation are weak. This assumption is not always appropriate. For example, mutations that impart drug resistance in pathogens may evolve under strong selective pressure. Here, we present an alternative approximation to the mutant-frequency distribution that does not make any assumptions about the magnitude of selection or mutation and is much more computationally efficient than the standard diffusion approximation. Simulation studies are used to compare the performance of our method to that of the Wright–Fisher and Gaussian diffusion approximations. For large populations, our method is found to provide a much better approximation to the mutant-frequency distribution when selection is strong, while all three methods perform comparably when selection is weak. Importantly, maximum-likelihood estimates of the selection coefficient are severely attenuated when selection is strong under the two diffusion models, but not when our method is used. This is further demonstrated with an application to mutant-frequency data from an experimental study of bacteriophage evolution. We therefore recommend our method for estimating the selection coefficient when the effective population size is too large to utilize the discrete Wright–Fisher model. PMID:25213172

  2. Dynamics of Sequence-Discrete Bacterial Populations Inferred Using Metagenomes

    SciTech Connect

    Stevens, Sarah; Bendall, Matthew; Kang, Dongwan; Froula, Jeff; Egan, Rob; Chan, Leong-Keat; Tringe, Susannah; McMahon, Katherine; Malmstrom, Rex

    2014-03-14

    From a multi-year metagenomic time series of two dissimilar Wisconsin lakes we have assembled dozens of genomes using a novel approach that bins contigs into distinct genomes based on sequence composition, e.g. kmer frequencies, and contig coverage patterns at various time points. Next, we investigated how these genomes, which represent sequence-discrete bacterial populations, evolved over time and used the time series to discover the population dynamics. For example, we explored changes in single nucleotide polymorphism (SNP) frequencies as well as patterns of gene gain and loss in multiple populations. Interestingly, SNP diversity was purged at nearly every genome position in some populations during the course of this study, suggesting these populations may have experienced genome-wide selective sweeps. This represents the first direct, time-resolved observations of periodic selection in natural populations, a key process predicted by the ecotype model of bacterial diversification.

  3. Geographic population structure analysis of worldwide human populations infers their biogeographical origins.

    PubMed

    Elhaik, Eran; Tatarinova, Tatiana; Chebotarev, Dmitri; Piras, Ignazio S; Maria Calò, Carla; De Montis, Antonella; Atzori, Manuela; Marini, Monica; Tofanelli, Sergio; Francalacci, Paolo; Pagani, Luca; Tyler-Smith, Chris; Xue, Yali; Cucca, Francesco; Schurr, Theodore G; Gaieski, Jill B; Melendez, Carlalynne; Vilar, Miguel G; Owings, Amanda C; Gómez, Rocío; Fujita, Ricardo; Santos, Fabrício R; Comas, David; Balanovsky, Oleg; Balanovska, Elena; Zalloua, Pierre; Soodyall, Himla; Pitchappan, Ramasamy; Ganeshprasad, Arunkumar; Hammer, Michael; Matisoo-Smith, Lisa; Wells, R Spencer

    2014-04-29

    The search for a method that utilizes biological information to predict humans' place of origin has occupied scientists for millennia. Over the past four decades, scientists have employed genetic data in an effort to achieve this goal but with limited success. While biogeographical algorithms using next-generation sequencing data have achieved an accuracy of 700 km in Europe, they were inaccurate elsewhere. Here we describe the Geographic Population Structure (GPS) algorithm and demonstrate its accuracy with three data sets using 40,000-130,000 SNPs. GPS placed 83% of worldwide individuals in their country of origin. Applied to over 200 Sardinian villagers, GPS placed a quarter of them in their villages and most of the rest within 50 km of their villages. GPS's accuracy and power to infer the biogeography of worldwide individuals down to their country or, in some cases, village, of origin, underscores the promise of admixture-based methods for biogeography and has ramifications for genetic ancestry testing.

  4. Use of genetic data to infer population-specific ecological and phenotypic traits from mixed aggregations

    USGS Publications Warehouse

    Moran, Paul; Bromaghin, Jeffrey F.; Masuda, Michele

    2014-01-01

    Many applications in ecological genetics involve sampling individuals from a mixture of multiple biological populations and subsequently associating those individuals with the populations from which they arose. Analytical methods that assign individuals to their putative population of origin have utility in both basic and applied research, providing information about population-specific life history and habitat use, ecotoxins, pathogen and parasite loads, and many other non-genetic ecological, or phenotypic traits. Although the question is initially directed at the origin of individuals, in most cases the ultimate desire is to investigate the distribution of some trait among populations. Current practice is to assign individuals to a population of origin and study properties of the trait among individuals within population strata as if they constituted independent samples. It seemed that approach might bias population-specific trait inference. In this study we made trait inferences directly through modeling, bypassing individual assignment. We extended a Bayesian model for population mixture analysis to incorporate parameters for the phenotypic trait and compared its performance to that of individual assignment with a minimum probability threshold for assignment. The Bayesian mixture model outperformed individual assignment under some trait inference conditions. However, by discarding individuals whose origins are most uncertain, the individual assignment method provided a less complex analytical technique whose performance may be adequate for some common trait inference problems. Our results provide specific guidance for method selection under various genetic relationships among populations with different trait distributions.

  5. Robust and scalable inference of population history from hundreds of unphased whole genomes.

    PubMed

    Terhorst, Jonathan; Kamm, John A; Song, Yun S

    2017-02-01

    It has recently been demonstrated that inference methods based on genealogical processes with recombination can uncover past population history in unprecedented detail. However, these methods scale poorly with sample size, limiting resolution in the recent past, and they require phased genomes, which contain switch errors that can catastrophically distort the inferred history. Here we present SMC++, a new statistical tool capable of analyzing orders of magnitude more samples than existing methods while requiring only unphased genomes (its results are independent of phasing). SMC++ can jointly infer population size histories and split times in diverged populations, and it employs a novel spline regularization scheme that greatly reduces estimation error. We apply SMC++ to analyze sequence data from over a thousand human genomes in Africa and Eurasia, hundreds of genomes from a Drosophila melanogaster population in Africa, and tens of genomes from zebra finch and long-tailed finch populations in Australia.

  6. Stellar population effects on the inferred photon density at reionization

    NASA Astrophysics Data System (ADS)

    Stanway, Elizabeth R.; Eldridge, J. J.; Becker, George D.

    2016-02-01

    The relationship between stellar populations and the ionizing flux with which they irradiate their surroundings has profound implications for the evolution of the intergalactic medium (IGM). We quantify the ionizing flux arising from synthetic stellar populations which incorporate the evolution of interacting binary stars. We determine that these show ionizing flux boosted by 60 per cent at 0.05 ≤ Z ≤ 0.3 Z⊙ and a more modest 10-20 per cent at near-solar metallicities relative to star-forming populations in which stars evolve in isolation. The relation of ionizing flux to observables such as 1500 Å continuum and ultraviolet spectral slope is sensitive to attributes of the stellar population including age, star formation history and initial mass function (IMF). For a galaxy forming 1 M⊙ yr⁻¹, observed at >100 Myr after the onset of star formation, we predict a production rate of photons capable of ionizing hydrogen, N_ion = 1.4 × 10⁵³ s⁻¹ at Z = Z⊙ and 3.5 × 10⁵³ s⁻¹ at 0.1 Z⊙, assuming a Salpeter-like IMF. We evaluate the impact of these issues on the ionization of the IGM, finding that the known galaxy populations can maintain the ionization state of the Universe back to z ∼ 9, assuming that their luminosity functions continue to M_UV = -10, and that constraints on the IGM at z ∼ 2-5 can be satisfied with modest Lyman-continuum photon escape fractions of 4-24 per cent depending on assumed metallicity.

  7. Inferred global connectivity of whale shark Rhincodon typus populations.

    PubMed

    Sequeira, A M M; Mellin, C; Meekan, M G; Sims, D W; Bradshaw, C J A

    2013-02-01

    Ten years have passed since the last synopsis of whale shark Rhincodon typus biogeography. While a recent review of the species' biology and ecology summarized the vast data collected since then, it is clear that information on population geographic connectivity, migration and demography of R. typus is still limited and scattered. Understanding R. typus migratory behaviour is central to its conservation management considering the genetic evidence suggesting local aggregations are connected at the generational scale over entire ocean basins. By collating available data on sightings, tracked movements and distribution information, this review provides evidence for the hypothesis of broad-scale connectivity among populations, and generates a model describing how the world's R. typus are part of a single, global meta-population. Rhincodon typus occurrence timings and distribution patterns make possible a connection between several aggregation sites in the Indian Ocean. The present conceptual model and validating data lend support to the hypothesis that R. typus are able to move among the three largest ocean basins with a minimum total travelling time of around 2-4 years. The model provides a worldwide perspective of possible R. typus migration routes, and suggests a modified focus for additional research to test its predictions. The framework can be used to trim the hypotheses for R. typus movements and aggregation timings, thereby isolating possible mating and breeding areas that are currently unknown. This will assist endeavours to predict the longer-term response of the species to ocean warming and changing patterns of human-induced mortality.

  8. Mutational Meltdown in Large Sexual Populations

    NASA Astrophysics Data System (ADS)

    Bernardes, A. T.

    1995-11-01

    When a new individual is formed (independently of the reproduction process) it inherits harmful mutations. Moreover, new mutations are acquired even during the formation of the genetic code, most of them deleterious. This might lead to a decay over time in the mean fitness of the whole population that, given a long enough time, would produce the extinction of the species. This process is called Mutational Meltdown, and it used to be considered in the biological literature as a problem that only occurs in small populations. In contrast with earlier biological assumptions, here we present results obtained in different models showing that the mutational meltdown can occur in large populations, even in sexually reproducing ones. We used a bit-string model introduced to study the time evolution of age-structured populations and a genetically inspired model that allows one to observe the time evolution of the population mean fitness.

  9. Geographic population structure analysis of worldwide human populations infers their biogeographical origins

    PubMed Central

    Elhaik, Eran; Tatarinova, Tatiana; Chebotarev, Dmitri; Piras, Ignazio S.; Maria Calò, Carla; De Montis, Antonella; Atzori, Manuela; Marini, Monica; Tofanelli, Sergio; Francalacci, Paolo; Pagani, Luca; Tyler-Smith, Chris; Xue, Yali; Cucca, Francesco; Schurr, Theodore G.; Gaieski, Jill B.; Melendez, Carlalynne; Vilar, Miguel G.; Owings, Amanda C.; Gómez, Rocío; Fujita, Ricardo; Santos, Fabrício R.; Comas, David; Balanovsky, Oleg; Balanovska, Elena; Zalloua, Pierre; Soodyall, Himla; Pitchappan, Ramasamy; GaneshPrasad, ArunKumar; Hammer, Michael; Matisoo-Smith, Lisa; Wells, R. Spencer; Acosta, Oscar; Adhikarla, Syama; Adler, Christina J.; Bertranpetit, Jaume; Clarke, Andrew C.; Cooper, Alan; Der Sarkissian, Clio S. I.; Haak, Wolfgang; Haber, Marc; Jin, Li; Kaplan, Matthew E.; Li, Hui; Li, Shilin; Martínez-Cruz, Begoña; Merchant, Nirav C.; Mitchell, John R.; Parida, Laxmi; Platt, Daniel E.; Quintana-Murci, Lluis; Renfrew, Colin; Lacerda, Daniela R.; Royyuru, Ajay K.; Sandoval, Jose Raul; Santhakumari, Arun Varatharajan; Soria Hernanz, David F.; Swamikrishnan, Pandikumar; Ziegle, Janet S.

    2014-01-01

    The search for a method that utilizes biological information to predict humans’ place of origin has occupied scientists for millennia. Over the past four decades, scientists have employed genetic data in an effort to achieve this goal but with limited success. While biogeographical algorithms using next-generation sequencing data have achieved an accuracy of 700 km in Europe, they were inaccurate elsewhere. Here we describe the Geographic Population Structure (GPS) algorithm and demonstrate its accuracy with three data sets using 40,000–130,000 SNPs. GPS placed 83% of worldwide individuals in their country of origin. Applied to over 200 Sardinian villagers, GPS placed a quarter of them in their villages and most of the rest within 50 km of their villages. GPS’s accuracy and power to infer the biogeography of worldwide individuals down to their country or, in some cases, village, of origin, underscores the promise of admixture-based methods for biogeography and has ramifications for genetic ancestry testing. PMID:24781250

  10. A Scalable Approach to Probabilistic Latent Space Inference of Large-Scale Networks.

    PubMed

    Yin, Junming; Ho, Qirong; Xing, Eric P

    2013-01-01

    We propose a scalable approach for making inference about latent spaces of large networks. With a succinct representation of networks as a bag of triangular motifs, a parsimonious statistical model, and an efficient stochastic variational inference algorithm, we are able to analyze real networks with over a million vertices and hundreds of latent roles on a single machine in a matter of hours, a setting that is out of reach for many existing methods. When compared to the state-of-the-art probabilistic approaches, our method is several orders of magnitude faster, with competitive or improved accuracy for latent space recovery and link prediction.

  11. Density estimation in tiger populations: combining information for strong inference

    USGS Publications Warehouse

    Gopalaswamy, Arjun M.; Royle, J. Andrew; Delampady, Mohan; Nichols, James D.; Karanth, K. Ullas; Macdonald, David W.

    2012-01-01

    A productive way forward in studies of animal populations is to efficiently make use of all the information available, either as raw data or as published sources, on critical parameters of interest. In this study, we demonstrate two approaches to the use of multiple sources of information on a parameter of fundamental interest to ecologists: animal density. The first approach produces estimates simultaneously from two different sources of data. The second approach was developed for situations in which initial data collection and analysis are followed up by subsequent data collection and prior knowledge is updated with new data using a stepwise process. Both approaches are used to estimate density of a rare and elusive predator, the tiger, by combining photographic and fecal DNA spatial capture–recapture data. The model, which combined information, provided the most precise estimate of density (8.5 ± 1.95 tigers/100 km² [posterior mean ± SD]) relative to a model that utilized only one data source (photographic, 12.02 ± 3.02 tigers/100 km² and fecal DNA, 6.65 ± 2.37 tigers/100 km²). Our study demonstrates that, by accounting for multiple sources of available information, estimates of animal density can be significantly improved.

  12. Density estimation in tiger populations: combining information for strong inference.

    PubMed

    Gopalaswamy, Arjun M; Royle, J Andrew; Delampady, Mohan; Nichols, James D; Karanth, K Ullas; Macdonald, David W

    2012-07-01

    A productive way forward in studies of animal populations is to efficiently make use of all the information available, either as raw data or as published sources, on critical parameters of interest. In this study, we demonstrate two approaches to the use of multiple sources of information on a parameter of fundamental interest to ecologists: animal density. The first approach produces estimates simultaneously from two different sources of data. The second approach was developed for situations in which initial data collection and analysis are followed up by subsequent data collection and prior knowledge is updated with new data using a stepwise process. Both approaches are used to estimate density of a rare and elusive predator, the tiger, by combining photographic and fecal DNA spatial capture-recapture data. The model, which combined information, provided the most precise estimate of density (8.5 ± 1.95 tigers/100 km² [posterior mean ± SD]) relative to a model that utilized only one data source (photographic, 12.02 ± 3.02 tigers/100 km² and fecal DNA, 6.65 ± 2.37 tigers/100 km²). Our study demonstrates that, by accounting for multiple sources of available information, estimates of animal density can be significantly improved.

  13. Accuracy of Demographic Inferences from the Site Frequency Spectrum: The Case of the Yoruba Population.

    PubMed

    Lapierre, Marguerite; Lambert, Amaury; Achaz, Guillaume

    2017-05-01

    Some methods for demographic inference based on the observed genetic diversity of current populations rely on the use of summary statistics such as the Site Frequency Spectrum (SFS). Demographic models can be either model-constrained with numerous parameters, such as growth rates, timing of demographic events, and migration rates, or model-flexible, with an unbounded collection of piecewise constant sizes. It is still debated whether demographic histories can be accurately inferred based on the SFS. Here, we illustrate this theoretical issue on an example of demographic inference for an African population. The SFS of the Yoruba population (data from the 1000 Genomes Project) is fit to a simple model of population growth described with a single parameter (e.g., founding time). We infer a time to the most recent common ancestor of 1.7 million years (MY) for this population. However, we show that the Yoruba SFS is not informative enough to discriminate between several different models of growth. We also show that for such simple demographies, the fit of one-parameter models outperforms the stairway plot, a recently developed model-flexible method. The use of this method on simulated data suggests that it is biased by the noise intrinsically present in the data. Copyright © 2017 by the Genetics Society of America.
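
    As a reminder of what the summary statistic itself is, the unfolded Site Frequency Spectrum can be computed from a small sample as sketched below. This is an illustrative helper with a hypothetical name, assuming haplotypes coded as 0 (ancestral) and 1 (derived) at biallelic sites; it is not the inference procedure of the study.

```python
def site_frequency_spectrum(haplotypes):
    """Unfolded SFS from equal-length 0/1 haplotypes (1 = derived allele).

    Entry k - 1 of the result counts the sites at which exactly k of the
    n sampled haplotypes carry the derived allele, for k = 1 .. n - 1.
    """
    n = len(haplotypes)
    sfs = [0] * (n - 1)
    # zip(*haplotypes) iterates over sites (columns) instead of samples.
    for site in zip(*haplotypes):
        derived_count = sum(site)
        # Sites that are monomorphic in the sample carry no SFS signal.
        if 0 < derived_count < n:
            sfs[derived_count - 1] += 1
    return sfs


sample = [
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1],
]
print(site_frequency_spectrum(sample))  # [2, 0, 1]
```

    In the toy sample, two sites are singletons, none are doubletons and one site has the derived allele in three of four haplotypes; the two monomorphic sites are skipped.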

  14. Multi-InDel Analysis for Ancestry Inference of Sub-Populations in China

    PubMed Central

    Sun, Kuan; Ye, Yi; Luo, Tao; Hou, Yiping

    2016-01-01

    Ancestry inference is of great interest in diverse areas of scientific research, including forensic biology, medical genetics and anthropology. Various methods have been published for distinguishing populations. However, few reports refer to sub-populations (like ethnic groups) within Asian populations because of the limited number of markers. We treated several InDel loci located very close together in physical position as a single marker, termed a multi-InDel. The multi-InDel shows potential as an Ancestry Inference Marker (AIM). In this study, we performed a genome-wide scan for multi-InDels as AIMs. After examining the FST distributions in the 1000 Genomes Database, 12 candidates were selected and validated for eastern Asian populations. A multiplexed assay was developed as a panel to genotype 12 multi-InDel markers simultaneously. Ancestry component analysis with STRUCTURE and principal component analysis (PCA) were employed to estimate its capability for ancestry inference. Furthermore, ancestry assignments of trial individuals were conducted. It proved to be very effective when 210 samples from Han and Tibetan individuals in China were tested. The panel consisting of multi-InDel markers exhibited considerable potency in ancestry inference, and was suggested to be applied in forensic practices and genetic population studies. PMID:28004788

  15. Large-scale parentage inference with SNPs: an efficient algorithm for statistical confidence of parent pair allocations.

    PubMed

    Anderson, Eric C

    2012-11-08

    Advances in genotyping that allow tens of thousands of individuals to be genotyped at a moderate number of single nucleotide polymorphisms (SNPs) permit parentage inference to be pursued on a very large scale. The intergenerational tagging this capacity allows is revolutionizing the management of cultured organisms (cows, salmon, etc.) and is poised to do the same for scientific studies of natural populations. Currently, however, there are no likelihood-based methods of parentage inference which are implemented in a manner that allows them to quickly handle a very large number of potential parents or parent pairs. Here we introduce an efficient likelihood-based method applicable to the specialized case of cultured organisms in which both parents can be reliably sampled. We develop a Markov chain representation for the cumulative number of Mendelian incompatibilities between an offspring and its putative parents and we exploit it to develop a fast algorithm for simulation-based estimates of statistical confidence in SNP-based assignments of offspring to pairs of parents. The method is implemented in the freely available software SNPPIT. We describe the method in detail, then assess its performance in a large simulation study using known allele frequencies at 96 SNPs from ten hatchery salmon populations. The simulations verify that the method is fast and accurate and that 96 well-chosen SNPs can provide sufficient power to identify the correct pair of parents from amongst millions of candidate pairs.
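
    The quantity at the centre of the method, the number of Mendelian incompatibilities between an offspring and a putative parent pair, can be illustrated with a short sketch. It assumes biallelic SNPs coded as the number of copies of one allele (0, 1 or 2) and no genotyping error; the function names are hypothetical, and the code shows only the count, not the Markov chain representation or SNPPIT itself.

```python
# Alleles (0 or 1) that a parent with a given genotype can transmit.
GAMETES = {0: (0,), 1: (0, 1), 2: (1,)}


def is_incompatible(child, mother, father):
    """True if no pair of transmitted alleles can produce the child genotype."""
    return all(
        a + b != child
        for a in GAMETES[mother]
        for b in GAMETES[father]
    )


def count_incompatibilities(child, mother, father):
    """Number of loci at which the trio violates Mendelian inheritance."""
    return sum(
        is_incompatible(c, m, f)
        for c, m, f in zip(child, mother, father)
    )


child = [0, 1, 2, 1]
mother = [2, 0, 2, 1]
father = [0, 0, 1, 1]
print(count_incompatibilities(child, mother, father))  # 2
```

    Here the first two loci are incompatible (a 2 x 0 cross can only yield a heterozygote, and a 0 x 0 cross can only yield genotype 0), while the last two are consistent with the parent pair.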

  16. Population generation for large-scale simulation

    NASA Astrophysics Data System (ADS)

    Hannon, Andrew C.; King, Gary; Morrison, Clayton; Galstyan, Aram; Cohen, Paul

    2005-05-01

    Computer simulation is used to research phenomena ranging from the structure of the space-time continuum to population genetics and future combat [1-3]. Multi-agent simulations in particular are now commonplace in many fields [4, 5]. By modeling populations whose complex behavior emerges from individual interactions, these simulations help to answer questions about effects where closed form solutions are difficult to solve or impossible to derive [6]. To be useful, simulations must accurately model the relevant aspects of the underlying domain. In multi-agent simulation, this means that the modeling must include both the agents and their relationships. Typically, each agent can be modeled as a set of attributes drawn from various distributions (e.g., height, morale, intelligence and so forth). Though these can interact - for example, agent height is related to agent weight - they are usually independent. Modeling relations between agents, on the other hand, adds a new layer of complexity, and tools from graph theory and social network analysis are finding increasing application [7, 8]. Recognizing the role and proper use of these techniques, however, remains the subject of ongoing research. We recently encountered these complexities while building large scale social simulations [9-11]. One of these, the Hats Simulator, is designed to be a lightweight proxy for intelligence analysis problems. Hats models a "society in a box" consisting of many simple agents, called hats. Hats gets its name from the classic spaghetti western, in which the heroes and villains are known by the color of the hats they wear. The Hats society also has its heroes and villains, but the challenge is to identify which color hat they should be wearing based on how they behave. There are three types of hats: benign hats, known terrorists, and covert terrorists. Covert terrorists look just like benign hats but act like terrorists. Population structure can make covert hat identification significantly more

  17. Inference of hazel grouse population structure using multilocus data: a landscape genetic approach.

    PubMed

    Sahlsten, J; Thörngren, H; Höglund, J

    2008-12-01

    In conservation and management of species it is important to make inferences about gene flow, dispersal and population structure. In this study, we used 613 georeferenced tissue samples from hazel grouse (Bonasa bonasia) where each individual was genotyped at 12 microsatellite loci to make inference on population genetic structure, gene flow and dispersal in northern Sweden. Observed levels of genetic diversity suggest that Swedish hazel grouse do not suffer loss of genetic diversity compared with other grouse species. We found significant F(IS) (deviation from Hardy-Weinberg expectations) over the entire sample using jack-knifed estimators over loci, which is most likely explained by a Wahlund effect. With the use of spatial autocorrelation methods, we detected significant isolation by distance among individuals. Neighbourhood size was estimated in the order of 62-158 individuals corresponding to a dispersal distance of 950-1500 m. Using a spatial statistical model for landscape genetics to infer the number of populations and the spatial location of genetic discontinuities between these populations we found indications that Swedish hazel grouse are divided into a northern and a southern population. We could not find a sharp border between these two populations and none of the observed borders appeared to coincide with any potential geographical barriers. These results imply that gene flow appears somewhat unrestricted in the boreal taiga forests of northern Sweden and that the two populations of hazel grouse in Sweden may be explained by the post-glacial reinvasion history of the Scandinavian Peninsula.
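
    The link between a positive F(IS) and a Wahlund effect can be made concrete with a small worked example. The sketch below assumes a single biallelic locus and the simple estimator F_IS = 1 - H_obs / H_exp, without the sample-size corrections or jackknifing over microsatellite loci used in the study; the function name and genotype counts are made up for illustration.

```python
def f_is(n_AA, n_Aa, n_aa):
    """F_IS = 1 - H_obs / H_exp for one biallelic locus.

    Assumes the locus is polymorphic in the sample (H_exp > 0).
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)   # frequency of allele A
    h_exp = 2 * p * (1 - p)           # Hardy-Weinberg heterozygosity
    h_obs = n_Aa / n                  # observed heterozygosity
    return 1 - h_obs / h_exp


# Two subpopulations, each in Hardy-Weinberg proportions on its own.
north = (81, 18, 1)    # allele A at frequency 0.9
south = (1, 18, 81)    # allele A at frequency 0.1
pooled = tuple(a + b for a, b in zip(north, south))

print(round(f_is(*north), 6))   # close to 0
print(round(f_is(*pooled), 6))  # 0.64
```

    Neither subpopulation shows a heterozygote deficit, yet pooling them as one sample produces a strongly positive F_IS, which is the signature the abstract attributes to hidden population structure.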

  18. Directed partial correlation: inferring large-scale gene regulatory network through induced topology disruptions.

    PubMed

    Yuan, Yinyin; Li, Chang-Tsun; Windram, Oliver

    2011-04-06

    Inferring regulatory relationships among many genes based on their temporal variation in transcript abundance has been a popular research topic. Due to the nature of microarray experiments, classical tools for time series analysis lose power since the number of variables far exceeds the number of the samples. In this paper, we describe some of the existing multivariate inference techniques that are applicable to hundreds of variables and show the potential challenges for small-sample, large-scale data. We propose a directed partial correlation (DPC) method as an efficient and effective solution to regulatory network inference using these data. Specifically for genomic data, the proposed method is designed to deal with large-scale datasets. It combines the efficiency of partial correlation for setting up network topology by testing conditional independence, and the concept of Granger causality to assess topology change with induced interruptions. The idea is that when a transcription factor is induced artificially within a gene network, the disruption of the network by the induction signifies a gene's role in transcriptional regulation. The benchmarking results using GeneNetWeaver, the simulator for the DREAM challenges, provide strong evidence of the outstanding performance of the proposed DPC method. When applied to real biological data, the inferred starch metabolism network in Arabidopsis reveals many biologically meaningful network modules worthy of further investigation. These results collectively suggest DPC is a versatile tool for genomics research. The R package DPC is available for download (http://code.google.com/p/dpcnet/).
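
    The building block named in the abstract, partial correlation as a test of conditional independence, can be sketched for the simplest case of three variables. This is the textbook first-order formula, not the DPC method or the API of its R package; the function names and the toy expression profiles are assumptions for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt(
        (1 - r_xz ** 2) * (1 - r_yz ** 2)
    )


# Toy profiles: a regulator z drives both x and y, which are otherwise
# unrelated (their extra components are orthogonal to z and each other).
z = [1.0, 1.0, -1.0, -1.0]
x = [1.5, 0.5, -0.5, -1.5]
y = [1.5, 0.5, -1.5, -0.5]

print(round(pearson(x, y), 6))                 # 0.8
print(round(partial_correlation(x, y, z), 6))  # about 0
```

    The marginal correlation between x and y is high, but it vanishes once z is conditioned on, so a network built from partial correlations would not draw a direct edge between x and y.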

  19. Bayesian Parameter Inference and Model Selection by Population Annealing in Systems Biology

    PubMed Central

    Murakami, Yohei

    2014-01-01

    Parameter inference and model selection are very important for mathematical modeling in systems biology. Bayesian statistics can be used to conduct both parameter inference and model selection. In particular, the framework named approximate Bayesian computation is often used for parameter inference and model selection in systems biology. However, Monte Carlo methods need to be used to compute Bayesian posterior distributions. In addition, the posterior distributions of parameters are sometimes almost uniform or very similar to their prior distributions. In such cases, it is difficult to choose one specific value of parameter with high credibility as the representative value of the distribution. To overcome the problems, we introduced one of the population Monte Carlo algorithms, population annealing. Although population annealing is usually used in statistical mechanics, we showed that population annealing can be used to compute Bayesian posterior distributions in the approximate Bayesian computation framework. To deal with un-identifiability of the representative values of parameters, we proposed to run the simulations with the parameter ensemble sampled from the posterior distribution, named “posterior parameter ensemble”. We showed that population annealing is an efficient and convenient algorithm to generate posterior parameter ensemble. We also showed that the simulations with the posterior parameter ensemble can, not only reproduce the data used for parameter inference, but also capture and predict the data which was not used for parameter inference. Lastly, we introduced the marginal likelihood in the approximate Bayesian computation framework for Bayesian model selection. We showed that population annealing enables us to compute the marginal likelihood in the approximate Bayesian computation framework and conduct model selection depending on the Bayes factor. PMID:25089832
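
    The approximate Bayesian computation framework referred to above can be illustrated with its most basic variant, rejection sampling. The sketch below is deliberately not population annealing; it only shows the accept/reject idea that such algorithms refine. The toy model (a normal mean with known unit variance, summarised by the sample mean), the uniform prior, the tolerance and all function names are assumptions chosen for illustration.

```python
import random


def abc_rejection(observed, simulate, prior_sample, distance,
                  eps, n_draws, seed=0):
    """Keep prior draws whose simulated summary lies within eps of the data."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        if distance(simulate(theta, rng), observed) <= eps:
            accepted.append(theta)
    return accepted


def prior_sample(rng):
    # Hypothetical prior: uniform on [-5, 5].
    return rng.uniform(-5.0, 5.0)


def simulate(theta, rng, n=20):
    # Summary statistic: mean of n draws from Normal(theta, 1).
    return sum(rng.gauss(theta, 1.0) for _ in range(n)) / n


def distance(a, b):
    return abs(a - b)


# The accepted draws form a "posterior parameter ensemble" for theta.
ensemble = abc_rejection(2.0, simulate, prior_sample, distance,
                         eps=0.2, n_draws=5000)
posterior_mean = sum(ensemble) / len(ensemble)
```

    With a wide prior most draws are rejected, which is the inefficiency that motivates population-based schemes; the accepted set can still be used as an ensemble for forward simulation rather than collapsed to a single representative value.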

  20. Bayesian parameter inference and model selection by population annealing in systems biology.

    PubMed

    Murakami, Yohei

    2014-01-01

    Parameter inference and model selection are very important for mathematical modeling in systems biology. Bayesian statistics can be used to conduct both parameter inference and model selection. In particular, the framework named approximate Bayesian computation is often used for parameter inference and model selection in systems biology. However, Monte Carlo methods need to be used to compute Bayesian posterior distributions. In addition, the posterior distributions of parameters are sometimes almost uniform or very similar to their prior distributions. In such cases, it is difficult to choose one specific value of parameter with high credibility as the representative value of the distribution. To overcome the problems, we introduced one of the population Monte Carlo algorithms, population annealing. Although population annealing is usually used in statistical mechanics, we showed that population annealing can be used to compute Bayesian posterior distributions in the approximate Bayesian computation framework. To deal with un-identifiability of the representative values of parameters, we proposed to run the simulations with the parameter ensemble sampled from the posterior distribution, named "posterior parameter ensemble". We showed that population annealing is an efficient and convenient algorithm to generate posterior parameter ensemble. We also showed that the simulations with the posterior parameter ensemble can, not only reproduce the data used for parameter inference, but also capture and predict the data which was not used for parameter inference. Lastly, we introduced the marginal likelihood in the approximate Bayesian computation framework for Bayesian model selection. We showed that population annealing enables us to compute the marginal likelihood in the approximate Bayesian computation framework and conduct model selection depending on the Bayes factor.

  1. Explaining Inference on a Population of Independent Agents Using Bayesian Networks

    ERIC Educational Resources Information Center

    Sutovsky, Peter

    2013-01-01

    The main goal of this research is to design, implement, and evaluate a novel explanation method, the hierarchical explanation method (HEM), for explaining Bayesian network (BN) inference when the network is modeling a population of conditionally independent agents, each of which is modeled as a subnetwork. For example, consider disease-outbreak…

  3. Estimating demographic parameters from large-scale population genomic data using Approximate Bayesian Computation.

    PubMed

    Li, Sen; Jakobsson, Mattias

    2012-03-27

    The Approximate Bayesian Computation (ABC) approach has been used to infer demographic parameters for numerous species, including humans. However, most applications of ABC still use limited amounts of data, from a small number of loci, compared to the large amount of genome-wide population-genetic data which have become available in the last few years. We evaluated the performance of the ABC approach for three 'population divergence' models - similar to the 'isolation with migration' model - when the data consist of several hundred thousand SNPs typed for multiple individuals by simulating data from known demographic models. The ABC approach was used to infer demographic parameters of interest and we compared the inferred values to the true parameter values that were used to generate hypothetical "observed" data. For all three case models, the ABC approach inferred most demographic parameters quite well with narrow credible intervals, for example, population divergence times and past population sizes, but some parameters were more difficult to infer, such as population sizes at present and migration rates. We compared the ability of different summary statistics to infer demographic parameters, including haplotype and LD based statistics, and found that the accuracy of the parameter estimates can be improved by combining summary statistics that capture different parts of information in the data. Furthermore, our results suggest that poor choices of prior distributions can in some circumstances be detected using ABC. Finally, increasing the amount of data beyond some hundred loci will substantially improve the accuracy of many parameter estimates using ABC. We conclude that the ABC approach can accommodate realistic genome-wide population genetic data, which may be difficult to analyze with full likelihood approaches, and that the ABC can provide accurate and precise inference of demographic parameters from these data, suggesting that the ABC approach will be a

  4. Estimating demographic parameters from large-scale population genomic data using Approximate Bayesian Computation

    PubMed Central

    2012-01-01

    Background: The Approximate Bayesian Computation (ABC) approach has been used to infer demographic parameters for numerous species, including humans. However, most applications of ABC still use limited amounts of data, from a small number of loci, compared to the large amount of genome-wide population-genetic data which have become available in the last few years. Results: We evaluated the performance of the ABC approach for three 'population divergence' models - similar to the 'isolation with migration' model - when the data consist of several hundred thousand SNPs typed for multiple individuals by simulating data from known demographic models. The ABC approach was used to infer demographic parameters of interest and we compared the inferred values to the true parameter values that were used to generate hypothetical "observed" data. For all three case models, the ABC approach inferred most demographic parameters quite well with narrow credible intervals, for example, population divergence times and past population sizes, but some parameters were more difficult to infer, such as population sizes at present and migration rates. We compared the ability of different summary statistics to infer demographic parameters, including haplotype and LD based statistics, and found that the accuracy of the parameter estimates can be improved by combining summary statistics that capture different parts of information in the data. Furthermore, our results suggest that poor choices of prior distributions can in some circumstances be detected using ABC. Finally, increasing the amount of data beyond some hundred loci will substantially improve the accuracy of many parameter estimates using ABC. Conclusions: We conclude that the ABC approach can accommodate realistic genome-wide population genetic data, which may be difficult to analyze with full likelihood approaches, and that the ABC can provide accurate and precise inference of demographic parameters from these data, suggesting that

  5. Reducing bias in population and landscape genetic inferences: the effects of sampling related individuals and multiple life stages.

    PubMed

    Peterman, William; Brocato, Emily R; Semlitsch, Raymond D; Eggert, Lori S

    2016-01-01

    In population or landscape genetics studies, an unbiased sampling scheme is essential for generating accurate results, but logistics may lead to deviations from the sample design. Such deviations may come in the form of sampling multiple life stages. Presently, it is largely unknown what effect sampling different life stages can have on population or landscape genetic inference, or how mixing life stages can affect the parameters being measured. Additionally, the removal of siblings from a data set is considered best-practice, but direct comparisons of inferences made with and without siblings are limited. In this study, we sampled embryos, larvae, and adult Ambystoma maculatum from five ponds in Missouri, and analyzed them at 15 microsatellite loci. We calculated allelic richness, heterozygosity and effective population sizes for each life stage at each pond and tested for genetic differentiation (F_ST and D_C) and isolation-by-distance (IBD) among ponds. We tested for differences in each of these measures between life stages, and in a pooled population of all life stages. All calculations were done with and without sibling pairs to assess the effect of sibling removal. We also assessed the effect of reducing the number of microsatellites used to make inference. No statistically significant differences were found among ponds or life stages for any of the population genetic measures, but patterns of IBD differed among life stages. There was significant IBD when using adult samples, but tests using embryos, larvae, or a combination of the three life stages were not significant. We found that increasing the ratio of larval or embryo samples in the analysis of genetic distance weakened the IBD relationship, and when using D_C, the IBD was no longer significant when larvae and embryos exceeded 60% of the population sample. Further, power to detect an IBD relationship was reduced when fewer microsatellites were used in the analysis.

  6. A Bayesian random effects discrete-choice model for resource selection: Population-level selection inference

    USGS Publications Warehouse

    Thomas, D.L.; Johnson, D.; Griffith, B.

    2006-01-01

    Modeling the probability of use of land units characterized by discrete and continuous measures, we present a Bayesian random-effects model to assess resource selection. This model provides simultaneous estimation of both individual- and population-level selection. Deviance information criterion (DIC), a Bayesian alternative to AIC that is sample-size specific, is used for model selection. Aerial radiolocation data from 76 adult female caribou (Rangifer tarandus) and calf pairs during 1 year on an Arctic coastal plain calving ground were used to illustrate models and assess population-level selection of landscape attributes, as well as individual heterogeneity of selection. Landscape attributes included elevation, NDVI (a measure of forage greenness), and land cover-type classification. Results from the first of a 2-stage model-selection procedure indicated that there is substantial heterogeneity among cow-calf pairs with respect to selection of the landscape attributes. In the second stage, selection of models with heterogeneity included indicated that at the population-level, NDVI and land cover class were significant attributes for selection of different landscapes by pairs on the calving ground. Population-level selection coefficients indicate that the pairs generally select landscapes with higher levels of NDVI, but the relationship is quadratic. The highest rate of selection occurs at values of NDVI less than the maximum observed. Results for land cover-class selection coefficients indicate that wet sedge, moist sedge, herbaceous tussock tundra, and shrub tussock tundra are selected at approximately the same rate, while alpine and sparsely vegetated landscapes are selected at a lower rate. Furthermore, the variability in selection by individual caribou for moist sedge and sparsely vegetated landscapes is large relative to the variability in selection of other land cover types. The example analysis illustrates that, while sometimes computationally intense, a

  7. Inferring hidden states in Langevin dynamics on large networks: Average case performance

    NASA Astrophysics Data System (ADS)

    Bravi, B.; Opper, M.; Sollich, P.

    2017-01-01

    We present average performance results for dynamical inference problems in large networks, where a set of nodes is hidden while the time trajectories of the others are observed. Examples of this scenario can occur in signal transduction and gene regulation networks. We focus on the linear stochastic dynamics of continuous variables interacting via random Gaussian couplings of generic symmetry. We analyze the inference error, given by the variance of the posterior distribution over hidden paths, in the thermodynamic limit and as a function of the system parameters and the ratio α between the number of hidden and observed nodes. By applying Kalman filter recursions we find that the posterior dynamics is governed by an "effective" drift that incorporates the effect of the observations. We present two approaches for characterizing the posterior variance that allow us to tackle, respectively, equilibrium and nonequilibrium dynamics. The first appeals to Random Matrix Theory and reveals average spectral properties of the inference error and typical posterior relaxation times; the second is based on dynamical functionals and yields the inference error as the solution of an algebraic equation.

  8. Improving asthma outcomes in large populations.

    PubMed

    Schatz, Michael; Zeiger, Robert S

    2011-08-01

    This article summarizes our experience using administrative, survey, and telephone information to define asthma severity, impairment, risk, and quality of care in our large Kaiser Permanente population. Our data suggest that the 2-year Healthcare Effectiveness Data and Information Set definition of persistent asthma is a good surrogate for survey-defined persistent asthma, and thus it would be reasonable to direct asthma population management and quality-of-care assessments at patients with Healthcare Effectiveness Data and Information Set-defined persistent asthma for 2 years in a row. For population management, the numbers of short-acting β-agonist (SABA) canisters dispensed and validated tools on mail or telephone surveys have been used to assess asthma impairment. Algorithms based on pharmacy data (SABA canister and oral corticosteroid dispensings and prior emergency hospital care) have been used to assess the risk domain of asthma control. The asthma medication ratio (controllers divided by controllers plus SABAs) has been shown to be related to improved outcomes and is recommended as an asthma quality-of-care marker. It is hoped that outreach to patients and providers based on these indicators will improve asthma outcomes in patients cared for in individual practices, as well as in large health plans.
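
    The asthma medication ratio defined above (controllers divided by controllers plus SABAs) is simple enough to write down directly. The sketch below is only an illustration: the function name, the choice to count dispensings, and the decision to return None when nothing was dispensed are assumptions, not details taken from the article.

```python
def asthma_medication_ratio(controllers, sabas):
    """Ratio of controller dispensings to all asthma medication dispensings.

    controllers: number of controller medication dispensings
    sabas: number of short-acting beta-agonist (SABA) dispensings
    Returns None when no medication was dispensed (ratio undefined).
    """
    if controllers < 0 or sabas < 0:
        raise ValueError("dispensing counts must be non-negative")
    total = controllers + sabas
    if total == 0:
        return None
    return controllers / total


# Hypothetical patient: 6 controller and 2 SABA dispensings in the period.
print(asthma_medication_ratio(6, 2))
```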

  9. Inference and Analysis of Population Structure Using Genetic Data and Network Theory.

    PubMed

    Greenbaum, Gili; Templeton, Alan R; Bar-David, Shirli

    2016-04-01

    Clustering individuals to subpopulations based on genetic data has become commonplace in many genetic studies. Inference about population structure is most often done by applying model-based approaches, aided by visualization using distance-based approaches such as multidimensional scaling. While existing distance-based approaches suffer from a lack of statistical rigor, model-based approaches entail assumptions of prior conditions such as that the subpopulations are at Hardy-Weinberg equilibria. Here we present a distance-based approach for inference about population structure using genetic data by defining population structure using network theory terminology and methods. A network is constructed from a pairwise genetic-similarity matrix of all sampled individuals. The community partition, a partition of a network to dense subgraphs, is equated with population structure, a partition of the population to genetically related groups. Community-detection algorithms are used to partition the network into communities, interpreted as a partition of the population to subpopulations. The statistical significance of the structure can be estimated by using permutation tests to evaluate the significance of the partition's modularity, a network theory measure indicating the quality of community partitions. To further characterize population structure, a new measure of the strength of association (SA) for an individual to its assigned community is presented. The strength of association distribution (SAD) of the communities is analyzed to provide additional population structure characteristics, such as the relative amount of gene flow experienced by the different subpopulations and identification of hybrid individuals. Human genetic data and simulations are used to demonstrate the applicability of the analyses. The approach presented here provides a novel, computationally efficient model-free method for inference about population structure that does not entail assumption of

  10. SHIPS: Spectral Hierarchical clustering for the Inference of Population Structure in genetic studies.

    PubMed

    Bouaziz, Matthieu; Paccard, Caroline; Guedj, Mickael; Ambroise, Christophe

    2012-01-01

    Inferring the structure of populations has many applications for genetic research. In addition to providing information for evolutionary studies, it can be used to account for the bias induced by population stratification in association studies. To this end, many algorithms have been proposed to cluster individuals into genetically homogeneous sub-populations. The parametric algorithms, such as Structure, are very popular but their underlying complexity and their high computational cost led to the development of faster parametric alternatives such as Admixture. Alternatives to these methods are the non-parametric approaches. Among this category, AWclust has proven efficient but fails to properly identify population structure for complex datasets. We present in this article a new clustering algorithm called Spectral Hierarchical clustering for the Inference of Population Structure (SHIPS), based on a divisive hierarchical clustering strategy, allowing a progressive investigation of population structure. This method takes genetic data as input to cluster individuals into homogeneous sub-populations and with the use of the gap statistic estimates the optimal number of such sub-populations. SHIPS was applied to a set of simulated discrete and admixed datasets and to real SNP datasets, that are data from the HapMap and Pan-Asian SNP consortium. The programs Structure, Admixture, AWclust and PCAclust were also investigated in a comparison study. SHIPS and the parametric approach Structure were the most accurate when applied to simulated datasets both in terms of individual assignments and estimation of the correct number of clusters. The analysis of the results on the real datasets highlighted that the clusterings of SHIPS were the more consistent with the population labels or those produced by the Admixture program. The performances of SHIPS when applied to SNP data, along with its relatively low computational cost and its ease of use make this method a promising

  11. SHIPS: Spectral Hierarchical Clustering for the Inference of Population Structure in Genetic Studies

    PubMed Central

    Bouaziz, Matthieu; Paccard, Caroline; Guedj, Mickael; Ambroise, Christophe

    2012-01-01

    Inferring the structure of populations has many applications for genetic research. In addition to providing information for evolutionary studies, it can be used to account for the bias induced by population stratification in association studies. To this end, many algorithms have been proposed to cluster individuals into genetically homogeneous sub-populations. The parametric algorithms, such as Structure, are very popular but their underlying complexity and their high computational cost led to the development of faster parametric alternatives such as Admixture. Alternatives to these methods are the non-parametric approaches. Among this category, AWclust has proven efficient but fails to properly identify population structure for complex datasets. We present in this article a new clustering algorithm called Spectral Hierarchical clustering for the Inference of Population Structure (SHIPS), based on a divisive hierarchical clustering strategy, allowing a progressive investigation of population structure. This method takes genetic data as input to cluster individuals into homogeneous sub-populations and with the use of the gap statistic estimates the optimal number of such sub-populations. SHIPS was applied to a set of simulated discrete and admixed datasets and to real SNP datasets, that are data from the HapMap and Pan-Asian SNP consortium. The programs Structure, Admixture, AWclust and PCAclust were also investigated in a comparison study. SHIPS and the parametric approach Structure were the most accurate when applied to simulated datasets both in terms of individual assignments and estimation of the correct number of clusters. The analysis of the results on the real datasets highlighted that the clusterings of SHIPS were the more consistent with the population labels or those produced by the Admixture program. The performances of SHIPS when applied to SNP data, along with its relatively low computational cost and its ease of use make this method a promising

  12. Diversity and Adaptation in Large Population Games

    NASA Astrophysics Data System (ADS)

    Wong, K. Y. Michael; Lim, S. W.; Luo, Peixun

    We consider a version of large population games whose players compete for resources using strategies with adaptable preferences. The system efficiency is measured by the variance of the decisions. In the regime where the system can be plagued by the maladaptive behavior of the players, we find that diversity among the players improves the system efficiency, though it slows the convergence to the steady state. Diversity causes a mild spread of resources at the transient state, but reduces the uneven distribution of resources in the steady state.

  13. Susceptibility of large populations of coupled oscillators

    NASA Astrophysics Data System (ADS)

    Daido, Hiroaki

    2015-01-01

    It is an important and interesting problem to elucidate how the degree of phase order in a large population of coupled oscillators responds to a synchronizing periodic force from the outside. Here this problem is studied analytically as well as numerically by introducing the concept of susceptibility for globally coupled phase oscillators with either nonrandom or random interactions. It is shown that the susceptibility diverges at the critical point in the nonrandom case with Widom's equality satisfied, while it exhibits a cusp in the most random case.

  14. Inferring the population structure and demography of Drosophila ananassae from multilocus data.

    PubMed

    Das, Aparup; Mohanty, Sujata; Stephan, Wolfgang

    2004-12-01

    Inferring the origin, population structure, and demographic history of a species is a major objective of population genetics. Although many organisms have been analyzed, the genetic structures of subdivided populations are not well understood. Here we analyze Drosophila ananassae, a highly substructured, cosmopolitan, and human-commensal species distributed in the tropical, subtropical, and mildly temperate regions of the world. We adopt a multilocus approach (with 10 neutral loci) using 16 population samples covering almost the entire species range (Asia, Australia, and America). Analyzed with our recently developed Bayesian method, 5 populations in Southeast Asia are found to be central, while the other 11 are peripheral. These 5 central populations were sampled from localities that belonged to a single landmass ("Sundaland") during the late Pleistocene (approximately 18,000 years ago), when sea level was approximately 120 m below the present level. The inferred migration routes of D. ananassae out of Sundaland seem to parallel those of humans in this region. Strong evidence for a population size expansion is seen particularly in the ancestral populations.

  15. Inferring genome-wide patterns of admixture in Qataris using fifty-five ancestral populations.

    PubMed

    Omberg, Larsson; Salit, Jacqueline; Hackett, Neil; Fuller, Jennifer; Matthew, Rebecca; Chouchane, Lotfi; Rodriguez-Flores, Juan L; Bustamante, Carlos; Crystal, Ronald G; Mezey, Jason G

    2012-06-26

    Populations of the Arabian Peninsula have a complex genetic structure that reflects waves of migrations including the earliest human migrations from Africa and eastern Asia, migrations along ancient civilization trading routes and colonization history of recent centuries. Here, we present a study of genome-wide admixture in this region, using 156 genotyped individuals from Qatar, a country located at the crossroads of these migration patterns. Since haplotypes of these individuals could have originated from many different populations across the world, we have developed a machine learning method "SupportMix" to infer loci-specific genomic ancestry when simultaneously analyzing many possible ancestral populations. Simulations show that SupportMix is not only more accurate than other popular admixture discovery tools but is the first admixture inference method that can efficiently scale for simultaneous analysis of 50-100 putative ancestral populations while being independent of prior demographic information. By simultaneously using the 55 world populations from the Human Genome Diversity Panel, SupportMix was able to extract the fine-scale ancestry of the Qatar population, providing many new observations concerning the ancestry of the region. For example, as well as recapitulating the three major sub-populations in Qatar, composed of mainly Arabic, Persian, and African ancestry, SupportMix additionally identifies the specific ancestry of the Persian group to populations sampled in Greater Persia rather than from China and the ancestry of the African group to sub-Saharan origin and not Southern African Bantu origin as previously thought.

  16. Inferring population trends for the world's largest fish from mark-recapture estimates of survival.

    PubMed

    Bradshaw, Corey J A; Mollet, Henry F; Meekan, Mark G

    2007-05-01

    1. Precise estimates of demographic rates are key components of population models used to predict the effects of stochastic environmental processes, harvest scenarios and extinction probability. 2. We used a 12-year photographic identification library of whale sharks from Ningaloo Reef, Western Australia to construct Cormack-Jolly-Seber (CJS) model estimates of survival within a capture-mark-recapture (CMR) framework. Estimated survival rates, population structure and assumptions regarding age at maturity, longevity and reproduction frequency were combined in a series of age-classified Leslie matrices to infer the potential trajectory of the population. 3. Using data from 111 individuals, there was evidence for time variation in apparent survival (phi) and recapture probability (p). The null model gave a phi of 0.825 (95% CI: 0.727-0.893) and p = 0.184 (95% CI: 0.121-0.271). The model-averaged annual phi ranged from 0.737 to 0.890. There was little evidence for a sex effect on survival. 4. Using standardized total length as a covariate in the CMR models indicated a size bias in phi. Ignoring the effects of time, a 5-m shark has a phi = 0.59 and a 9 m shark has phi = 0.81. 5. Of the 16 model combinations considered, 10 (63%) indicated a decreasing population (lambda < 1). For models based on age at first reproduction (alpha) of 13 years, the mean age of reproducing females at the stable age distribution (A) ranged from 15 to 23 years, which increased to 29-37 years when alpha was assumed to be 25. 6. All model scenarios had higher total elasticities for non-reproductive female survival [E(s(nr))] compared to those for reproductive female survival [E(s(r))]. 7. Assuming relatively slow, but biologically realistic, vital rates (alpha = 25 and biennial reproduction) and size-biased survival probabilities, our results suggest that the Ningaloo Reef population of whale sharks is declining, although more reproductive data are clearly needed to confirm this conclusion
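
    How an age-classified Leslie matrix yields a population growth rate (lambda) can be sketched as follows. This is a generic power-iteration illustration, not the authors' code; the function name and the vital rates in the example are hypothetical and are not the whale shark estimates reported above. Power iteration is assumed to converge, which requires at least two adjacent age classes with non-zero fecundity.

```python
def leslie_growth_rate(fecundity, survival, iterations=1000):
    """Dominant eigenvalue (lambda) of a Leslie matrix via power iteration.

    fecundity: per-capita offspring for each age class (first matrix row)
    survival: probability of surviving from age class i to i + 1
              (sub-diagonal), so len(survival) == len(fecundity) - 1
    """
    n = len(fecundity)
    if len(survival) != n - 1:
        raise ValueError("survival must have one entry fewer than fecundity")
    pop = [1.0] * n                       # arbitrary positive starting vector
    lam = 0.0
    for _ in range(iterations):
        newborns = sum(f * x for f, x in zip(fecundity, pop))
        nxt = [newborns] + [s * x for s, x in zip(survival, pop[:-1])]
        total = sum(nxt)
        lam = total / sum(pop)            # one-step growth of total population
        pop = [x / total for x in nxt]    # renormalise to avoid overflow
    return lam


# Hypothetical three-age-class example (illustrative rates only).
lam = leslie_growth_rate(fecundity=[0.0, 0.5, 0.5], survival=[0.6, 0.8])
print("declining" if lam < 1 else "stable or growing", lam)
```

    A value of lambda below 1 corresponds to the "decreasing population" outcome that the abstract reports for 10 of its 16 model combinations.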

  17. Microarray Data Processing Techniques for Genome-Scale Network Inference from Large Public Repositories.

    PubMed

    Chockalingam, Sriram; Aluru, Maneesha; Aluru, Srinivas

    2016-09-19

    Pre-processing of microarray data is a well-studied problem. Furthermore, all popular platforms come with their own recommended best practices for differential analysis of genes. However, for genome-scale network inference using microarray data collected from large public repositories, these methods filter out a considerable number of genes. This is primarily due to the effects of aggregating a diverse array of experiments with different technical and biological scenarios. Here we introduce a pre-processing pipeline suitable for inferring genome-scale gene networks from large microarray datasets. We show that partitioning of the available microarray datasets according to biological relevance into tissue- and process-specific categories significantly extends the limits of downstream network construction. We demonstrate the effectiveness of our pre-processing pipeline by inferring genome-scale networks for the model plant Arabidopsis thaliana using two different construction methods and a collection of 11,760 Affymetrix ATH1 microarray chips. Our pre-processing pipeline and the datasets used in this paper are made available at http://alurulab.cc.gatech.edu/microarray-pp.

  18. Improving Bayesian population dynamics inference: a coalescent-based model for multiple loci.

    PubMed

    Gill, Mandev S; Lemey, Philippe; Faria, Nuno R; Rambaut, Andrew; Shapiro, Beth; Suchard, Marc A

    2013-03-01

    Effective population size is fundamental in population genetics and characterizes genetic diversity. To infer past population dynamics from molecular sequence data, coalescent-based models have been developed for Bayesian nonparametric estimation of effective population size over time. Among the most successful is a Gaussian Markov random field (GMRF) model for a single gene locus. Here, we present a generalization of the GMRF model that allows for the analysis of multilocus sequence data. Using simulated data, we demonstrate the improved performance of our method to recover true population trajectories and the time to the most recent common ancestor (TMRCA). We analyze a multilocus alignment of HIV-1 CRF02_AG gene sequences sampled from Cameroon. Our results are consistent with HIV prevalence data and uncover some aspects of the population history that go undetected in Bayesian parametric estimation. Finally, we recover an older and more reconcilable TMRCA for a classic ancient DNA data set.

  19. Hierarchical modeling and inference in ecology: The analysis of data from populations, metapopulations and communities

    USGS Publications Warehouse

    Royle, J. Andrew; Dorazio, Robert M.

    2008-01-01

    A guide to data collection, modeling and inference strategies for biological survey data using Bayesian and classical statistical methods. This book describes a general and flexible framework for modeling and inference in ecological systems based on hierarchical models, with a strict focus on the use of probability models and parametric inference. Hierarchical models represent a paradigm shift in the application of statistics to ecological inference problems because they combine explicit models of ecological system structure or dynamics with models of how ecological systems are observed. The principles of hierarchical modeling are developed and applied to problems in population, metapopulation, community, and metacommunity systems. The book provides the first synthetic treatment of many recent methodological advances in ecological modeling and unifies disparate methods and procedures. The authors apply principles of hierarchical modeling to ecological problems, including * occurrence or occupancy models for estimating species distribution * abundance models based on many sampling protocols, including distance sampling * capture-recapture models with individual effects * spatial capture-recapture models based on camera trapping and related methods * population and metapopulation dynamic models * models of biodiversity, community structure and dynamics.

  20. Exoplanet population inference and the abundance of Earth analogs from noisy, incomplete catalogs

    SciTech Connect

    Foreman-Mackey, Daniel; Hogg, David W.; Morton, Timothy D.

    2014-11-01

    No true extrasolar Earth analog is known. Hundreds of planets have been found around Sun-like stars that are either Earth-sized but on shorter periods, or else on year-long orbits but somewhat larger. Under strong assumptions, exoplanet catalogs have been used to make an extrapolated estimate of the rate at which Sun-like stars host Earth analogs. These studies are complicated by the fact that every catalog is censored by non-trivial selection effects and detection efficiencies, and every property (period, radius, etc.) is measured noisily. Here we present a general hierarchical probabilistic framework for making justified inferences about the population of exoplanets, taking into account survey completeness and, for the first time, observational uncertainties. We are able to make fewer assumptions about the distribution than previous studies; we only require that the occurrence rate density be a smooth function of period and radius (employing a Gaussian process). By applying our method to synthetic catalogs, we demonstrate that it produces more accurate estimates of the whole population than standard procedures based on weighting by inverse detection efficiency. We apply the method to an existing catalog of small planet candidates around G dwarf stars. We confirm a previous result that the radius distribution changes slope near Earth's radius. We find that the rate density of Earth analogs is about 0.02 (per star per natural logarithmic bin in period and radius) with large uncertainty. This number is much smaller than previous estimates made with the same data but stronger assumptions.
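
    The "standard procedure" that the abstract compares against, weighting each detection by its inverse detection efficiency, can be sketched in a few lines. This illustrates only that baseline, not the hierarchical probabilistic framework the authors propose; the function name and the example numbers are hypothetical.

```python
def inverse_efficiency_rate(detection_efficiencies, n_stars):
    """Occurrence rate per star in one period-radius bin.

    detection_efficiencies: for each detected planet in the bin, the
        probability (0 < q <= 1) that the survey would have detected it
    n_stars: number of stars searched
    Each detection counts as 1 / q planets to correct for missed ones.
    """
    if n_stars <= 0:
        raise ValueError("n_stars must be positive")
    if any(q <= 0 or q > 1 for q in detection_efficiencies):
        raise ValueError("detection efficiencies must lie in (0, 1]")
    return sum(1.0 / q for q in detection_efficiencies) / n_stars


# Hypothetical bin: two detections with efficiencies 0.5 and 0.25, 100 stars.
print(inverse_efficiency_rate([0.5, 0.25], 100))
```

    The estimator treats each measured period and radius as exact, which is the limitation the abstract says its framework addresses by accounting for observational uncertainties.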

  1. mStruct: inference of population structure in light of both genetic admixing and allele mutations.

    PubMed

    Shringarpure, Suyash; Xing, Eric P

    2009-06-01

    Traditional methods for analyzing population structure, such as the Structure program, ignore the influence of the effect of allele mutations between the ancestral and current alleles of genetic markers, which can dramatically influence the accuracy of the structural estimation of current populations. Studying these effects can also reveal additional information about population evolution such as the divergence time and migration history of admixed populations. We propose mStruct, an admixture of population-specific mixtures of inheritance models that addresses the task of structure inference and mutation estimation jointly through a hierarchical Bayesian framework, and a variational algorithm for inference. We validated our method on synthetic data and used it to analyze the Human Genome Diversity Project-Centre d'Etude du Polymorphisme Humain (HGDP-CEPH) cell line panel of microsatellites and HGDP single-nucleotide polymorphism (SNP) data. A comparison of the structural maps of world populations estimated by mStruct and Structure is presented, and we also report potentially interesting mutation patterns in world populations estimated by mStruct.

  2. Population genetic inference from personal genome data: impact of ancestry and admixture on human genomic variation.

    PubMed

    Kidd, Jeffrey M; Gravel, Simon; Byrnes, Jake; Moreno-Estrada, Andres; Musharoff, Shaila; Bryc, Katarzyna; Degenhardt, Jeremiah D; Brisbin, Abra; Sheth, Vrunda; Chen, Rong; McLaughlin, Stephen F; Peckham, Heather E; Omberg, Larsson; Bormann Chung, Christina A; Stanley, Sarah; Pearlstein, Kevin; Levandowsky, Elizabeth; Acevedo-Acevedo, Suehelay; Auton, Adam; Keinan, Alon; Acuña-Alonzo, Victor; Barquera-Lozano, Rodrigo; Canizales-Quinteros, Samuel; Eng, Celeste; Burchard, Esteban G; Russell, Archie; Reynolds, Andy; Clark, Andrew G; Reese, Martin G; Lincoln, Stephen E; Butte, Atul J; De La Vega, Francisco M; Bustamante, Carlos D

    2012-10-05

    Full sequencing of individual human genomes has greatly expanded our understanding of human genetic variation and population history. Here, we present a systematic analysis of 50 human genomes from 11 diverse global populations sequenced at high coverage. Our sample includes 12 individuals who have admixed ancestry and who have varying degrees of recent (within the last 500 years) African, Native American, and European ancestry. We found over 21 million single-nucleotide variants that contribute to a 1.75-fold range in nucleotide heterozygosity across diverse human genomes. This heterozygosity ranged from a high of one heterozygous site per kilobase in west African genomes to a low of 0.57 heterozygous sites per kilobase in segments inferred to have diploid Native American ancestry from the genomes of Mexican and Puerto Rican individuals. We show evidence of all three continental ancestries in the genomes of Mexican, Puerto Rican, and African American populations, and the genome-wide statistics are highly consistent across individuals from a population once ancestry proportions have been accounted for. Using a generalized linear model, we identified subtle variations across populations in the proportion of neutral versus deleterious variation and found that genome-wide statistics vary in admixed populations even once ancestry proportions have been factored in. We further infer that multiple periods of gene flow shaped the diversity of admixed populations in the Americas: 70% of the European ancestry in today's African Americans dates back to European gene flow happening only 7-8 generations ago.

  3. Population Genetic Inference from Personal Genome Data: Impact of Ancestry and Admixture on Human Genomic Variation

    PubMed Central

    Kidd, Jeffrey M.; Gravel, Simon; Byrnes, Jake; Moreno-Estrada, Andres; Musharoff, Shaila; Bryc, Katarzyna; Degenhardt, Jeremiah D.; Brisbin, Abra; Sheth, Vrunda; Chen, Rong; McLaughlin, Stephen F.; Peckham, Heather E.; Omberg, Larsson; Bormann Chung, Christina A.; Stanley, Sarah; Pearlstein, Kevin; Levandowsky, Elizabeth; Acevedo-Acevedo, Suehelay; Auton, Adam; Keinan, Alon; Acuña-Alonzo, Victor; Barquera-Lozano, Rodrigo; Canizales-Quinteros, Samuel; Eng, Celeste; Burchard, Esteban G.; Russell, Archie; Reynolds, Andy; Clark, Andrew G.; Reese, Martin G.; Lincoln, Stephen E.; Butte, Atul J.; De La Vega, Francisco M.; Bustamante, Carlos D.

    2012-01-01

    Full sequencing of individual human genomes has greatly expanded our understanding of human genetic variation and population history. Here, we present a systematic analysis of 50 human genomes from 11 diverse global populations sequenced at high coverage. Our sample includes 12 individuals who have admixed ancestry and who have varying degrees of recent (within the last 500 years) African, Native American, and European ancestry. We found over 21 million single-nucleotide variants that contribute to a 1.75-fold range in nucleotide heterozygosity across diverse human genomes. This heterozygosity ranged from a high of one heterozygous site per kilobase in west African genomes to a low of 0.57 heterozygous sites per kilobase in segments inferred to have diploid Native American ancestry from the genomes of Mexican and Puerto Rican individuals. We show evidence of all three continental ancestries in the genomes of Mexican, Puerto Rican, and African American populations, and the genome-wide statistics are highly consistent across individuals from a population once ancestry proportions have been accounted for. Using a generalized linear model, we identified subtle variations across populations in the proportion of neutral versus deleterious variation and found that genome-wide statistics vary in admixed populations even once ancestry proportions have been factored in. We further infer that multiple periods of gene flow shaped the diversity of admixed populations in the Americas—70% of the European ancestry in today’s African Americans dates back to European gene flow happening only 7–8 generations ago. PMID:23040495

  4. Molecular hyperdiversity and evolution in very large populations

    PubMed Central

    Cutter, Asher D.; Jovelin, Richard; Dey, Alivia

    2014-01-01

    The genomic density of sequence polymorphisms critically affects the sensitivity of inferences about ongoing sequence evolution, function, and demographic history. Most animal and plant genomes have relatively low densities of polymorphisms, but some species are hyperdiverse with neutral nucleotide heterozygosity exceeding 5%. Eukaryotes with extremely large populations, mimicking bacterial and viral populations, present novel opportunities for studying molecular evolution in sexually-reproducing taxa with complex development. In particular, hyperdiverse species can help answer controversial questions about the evolution of genome complexity, the limits of natural selection, modes of adaptation, and subtleties of the mutation process. However, such systems have some inherent complications and here we identify topics in need of theoretical developments. Close relatives of the model organisms Caenorhabditis elegans and Drosophila melanogaster provide known examples of hyperdiverse eukaryotes, encouraging functional dissection of resulting molecular evolutionary patterns. We recommend how best to exploit hyperdiverse populations for analysis, for example, in quantifying the impact of non-crossover recombination in genomes and for determining the identity and micro-evolutionary selective pressures on non-coding regulatory elements. PMID:23506466

  5. A method to infer positive selection from marker dynamics in an asexual population.

    PubMed

    Illingworth, Christopher J R; Mustonen, Ville

    2012-03-15

    The observation of positive selection acting on a mutant indicates that the corresponding mutation has some form of functional relevance. Determining the fitness effects of mutations thus has relevance to many interesting biological questions. One means of identifying beneficial mutations in an asexual population is to observe changes in the frequency of marked subsets of the population. We here describe a method to estimate the establishment times and fitnesses of beneficial mutations from neutral marker frequency data. The method accurately reproduces complex marker frequency trajectories. In simulations for which positive selection is close to 5% per generation, we obtain correlations upwards of 0.91 between correct and inferred haplotype establishment times. Where mutation selection coefficients are exponentially distributed, the inferred distribution of haplotype fitnesses is close to being correct. Applied to data from a bacterial evolution experiment, our method reproduces an observed correlation between evolvability and initial fitness defect.

  6. On the importance of being structured: instantaneous coalescence rates and human evolution—lessons for ancestral population size inference?

    PubMed Central

    Mazet, O; Rodríguez, W; Grusea, S; Boitard, S; Chikhi, L

    2016-01-01

    Most species are structured and influenced by processes that either increased or reduced gene flow between populations. However, most population genetic inference methods assume panmixia and reconstruct a history characterized by population size changes. This is potentially problematic as population structure can generate spurious signals of population size change through time. Moreover, when the model assumed for demographic inference is misspecified, genomic data will likely increase the precision of misleading if not meaningless parameters. For instance, if data were generated under an n-island model (characterized by the number of islands and migrants exchanged) inference based on a model of population size change would produce precise estimates of a bottleneck that would be meaningless. In addition, archaeological or climatic events around the bottleneck's timing might provide a reasonable but potentially misleading scenario. In a context of model uncertainty (panmixia versus structure) genomic data may thus not necessarily lead to improved statistical inference. We consider two haploid genomes and develop a theory that explains why any demographic model with structure will necessarily be interpreted as a series of changes in population size by inference methods ignoring structure. We formalize a parameter, the inverse instantaneous coalescence rate, and show that it is equivalent to a population size only in panmictic models, and is mostly misleading for structured models. We argue that this issue affects all population genetics methods ignoring population structure which may thus infer population size changes that never took place. We apply our approach to human genomic data. PMID:26647653

  7. On the importance of being structured: instantaneous coalescence rates and human evolution--lessons for ancestral population size inference?

    PubMed

    Mazet, O; Rodríguez, W; Grusea, S; Boitard, S; Chikhi, L

    2016-04-01

    Most species are structured and influenced by processes that either increased or reduced gene flow between populations. However, most population genetic inference methods assume panmixia and reconstruct a history characterized by population size changes. This is potentially problematic as population structure can generate spurious signals of population size change through time. Moreover, when the model assumed for demographic inference is misspecified, genomic data will likely increase the precision of misleading if not meaningless parameters. For instance, if data were generated under an n-island model (characterized by the number of islands and migrants exchanged) inference based on a model of population size change would produce precise estimates of a bottleneck that would be meaningless. In addition, archaeological or climatic events around the bottleneck's timing might provide a reasonable but potentially misleading scenario. In a context of model uncertainty (panmixia versus structure) genomic data may thus not necessarily lead to improved statistical inference. We consider two haploid genomes and develop a theory that explains why any demographic model with structure will necessarily be interpreted as a series of changes in population size by inference methods ignoring structure. We formalize a parameter, the inverse instantaneous coalescence rate, and show that it is equivalent to a population size only in panmictic models, and is mostly misleading for structured models. We argue that this issue affects all population genetics methods ignoring population structure which may thus infer population size changes that never took place. We apply our approach to human genomic data.

  8. PyClone: statistical inference of clonal population structure in cancer.

    PubMed

    Roth, Andrew; Khattra, Jaswinder; Yap, Damian; Wan, Adrian; Laks, Emma; Biele, Justina; Ha, Gavin; Aparicio, Samuel; Bouchard-Côté, Alexandre; Shah, Sohrab P

    2014-04-01

    We introduce PyClone, a statistical model for inference of clonal population structures in cancers. PyClone is a Bayesian clustering method for grouping sets of deeply sequenced somatic mutations into putative clonal clusters while estimating their cellular prevalences and accounting for allelic imbalances introduced by segmental copy-number changes and normal-cell contamination. Single-cell sequencing validation demonstrates PyClone's accuracy.

  9. Inferring the genetic history of lactase persistence along the Italian peninsula from a large genomic interval surrounding the LCT gene.

    PubMed

    De Fanti, Sara; Sazzini, Marco; Giuliani, Cristina; Frazzoni, Federica; Sarno, Stefania; Boattini, Alessio; Marasco, Elena; Mantovani, Vilma; Franceschi, Claudio; Moral, Pedro; Garagnani, Paolo; Luiselli, Donata

    2015-12-01

    Although genetic variants related to lactase persistence in European populations were supposed to have firstly undergone positive selection in farmers from the Balkans and Central Europe, demographic and evolutionary dynamics that subsequently shaped the distribution of this adaptive trait across the continent have still to be elucidated. To deepen the knowledge about potential routes of diffusion of lactase persistence to Western Europe we investigated variation at a large genomic region surrounding the LCT gene along the Italian peninsula, a geographical area that played a key role in population movements responsible for Neolithic diffusion across Europe. By genotyping 40 highly selected SNPs in more than 400 Italian individuals we described gradients of nucleotide and haplotype variation potentially related to lactase persistence and compared them with those observed in several European and Mediterranean human groups. Multiple migratory events responsible for earlier introduction of the examined alleles in Italy than in Northern European regions could be invoked. Different demic processes occurred along the western and eastern sides of the peninsula were also inferred via linkage disequilibrium and population structure analyses. The appreciable genetic continuum observed between people from Northern or Central-Western Italy and Central European populations suggested a local arrival of lactase persistence-related variants mainly via overland routes. On the contrary, diversity of Central-Eastern and Southern Italian groups entailed also gene flow from South-Eastern Mediterranean regions, in accordance to the earlier entrance of the Neolithic in Southern Italy via maritime population movements along the Mediterranean coastlines. © 2015 Wiley Periodicals, Inc.

  10. Genetic differentiation between sandfly populations of Phlebotomus chinensis and Phlebotomus sichuanensis (Diptera: Psychodidae) in China inferred by microsatellites

    PubMed Central

    2013-01-01

    Background: Phlebotomus chinensis is a primary vector of visceral leishmaniasis; it occurs in various biotopes with a large geographical distribution, ranging from Yangtze River to northeast China. Phlebotomus sichuanensis, a species closely related to P. chinensis in high altitude regions, has a long term disputation on its taxonomic status. Both species occur in the current epidemic regions and are responsible for the transmission of leishmaniasis. Population genetic analysis will help to understand the population structure and infer the relationship for morphologically indistinguishable cryptic species. In this study, microsatellite markers were used for studying the genetic differentiation between P. chinensis and P. sichuanensis.
    Methods: Sandflies were collected in 6 representative localities in China in 2005-2009. Ten microsatellite loci were used to estimate population genetic diversity. The intra-population genetic diversity, genetic differentiation and effective population size were estimated.
    Results: All 10 microsatellite loci were highly polymorphic across populations, with high allelic richness and heterozygosity. Hardy-Weinberg disequilibrium was found in 23 out of 60 (38.33%) comparisons associated with heterozygote deficits, which was likely caused by the presence of null allele and the Wahlund effect. Bayesian clustering analysis revealed three clusters. The cluster I included almost all specimens in the sample SCD collected at high altitude habitats in Sichuan. The other two clusters were shared by the remaining 5 populations, SCJ in Sichuan, GSZ in Gansu, SXL and SXX in Shaanxi and HNS in Henan. The diversity among these 5 populations was low (FST = -0.003-0.090) and no isolation by distance was detected. AMOVA analysis suggested that the variations were largely derived from individuals within populations and among individuals. Consistently, the analysis of ribosomal DNA second internal transcribed spacer (ITS2) sequence uncovered three types of

  11. Exploring iris colour prediction and ancestry inference in admixed populations of South America.

    PubMed

    Freire-Aradas, A; Ruiz, Y; Phillips, C; Maroñas, O; Söchtig, J; Tato, A Gómez; Dios, J Álvarez; de Cal, M Casares; Silbiger, V N; Luchessi, A D; Luchessi, A D; Chiurillo, M A; Carracedo, Á; Lareu, M V

    2014-11-01

    New DNA-based predictive tests for physical characteristics and inference of ancestry are highly informative tools that are being increasingly used in forensic genetic analysis. Two eye colour prediction models: a Bayesian classifier - Snipper and a multinomial logistic regression (MLR) system for the Irisplex assay, have been described for the analysis of unadmixed European populations. Since multiple SNPs in combination contribute in varying degrees to eye colour predictability in Europeans, it is likely that these predictive tests will perform in different ways amongst admixed populations that have European co-ancestry, compared to unadmixed Europeans. In this study we examined 99 individuals from two admixed South American populations comparing eye colour versus ancestry in order to reveal a direct correlation of light eye colour phenotypes with European co-ancestry in admixed individuals. Additionally, eye colour prediction following six prediction models, using varying numbers of SNPs and based on Snipper and MLR, were applied to the study populations. Furthermore, patterns of eye colour prediction have been inferred for a set of publicly available admixed and globally distributed populations from the HGDP-CEPH panel and 1000 Genomes databases with a special emphasis on admixed American populations similar to those of the study samples.

  12. Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies.

    PubMed Central

    Falush, Daniel; Stephens, Matthew; Pritchard, Jonathan K

    2003-01-01

    We describe extensions to the method of Pritchard et al. for inferring population structure from multilocus genotype data. Most importantly, we develop methods that allow for linkage between loci. The new model accounts for the correlations between linked loci that arise in admixed populations ("admixture linkage disequilibrium"). This modification has several advantages, allowing (1) detection of admixture events farther back into the past, (2) inference of the population of origin of chromosomal regions, and (3) more accurate estimates of statistical uncertainty when linked loci are used. It is also of potential use for admixture mapping. In addition, we describe a new prior model for the allele frequencies within each population, which allows identification of subtle population subdivisions that were not detectable using the existing method. We present results applying the new methods to study admixture in African-Americans, recombination in Helicobacter pylori, and drift in populations of Drosophila melanogaster. The methods are implemented in a program, structure, version 2.0, which is available at http://pritch.bsd.uchicago.edu. PMID:12930761

  13. Inferring human population size and separation history from multiple genome sequences

    PubMed Central

    Schiffels, Stephan; Durbin, Richard

    2014-01-01

    The availability of complete human genome sequences from populations across the world has given rise to new population genetic inference methods that explicitly model their ancestral relationship under recombination and mutation. So far, application of these methods to evolutionary history more recent than 20-30 thousand years ago and to population separations has been limited. Here we present a new method that overcomes these shortcomings. The Multiple Sequentially Markovian Coalescent (MSMC) analyses the observed pattern of mutations in multiple individuals, focusing on the first coalescence between any two individuals. Results from applying MSMC to genome sequences from nine populations across the world suggest that the genetic separation of non-African ancestors from African Yoruban ancestors started long before 50,000 years ago, and give information about human population history as recently as 2,000 years ago, including the bottleneck in the peopling of the Americas, and separations within Africa, East Asia and Europe. PMID:24952747

  14. The large impact process inferred from the geology of lunar multiring basins

    NASA Technical Reports Server (NTRS)

    Spudis, Paul D.

    1992-01-01

    The nature of the impact process has been inferred through the study of the geology of a wide variety of impact crater types and sizes. Some of the largest craters known are the multiring basins found in ancient terrains of the terrestrial planets. Of these features, those found on the Moon possess the most extensive and diverse data coverage, including morphological, geochemical, geophysical, and sample data. The study of the geology of lunar basins over the past 10 years has given us a rudimentary understanding of how these large structures have formed and evolved. The topics covered include basin morphology, basin ejecta, basin excavation, and basin ring formation.

  15. Bayesian inference of the initial conditions from large-scale structure surveys

    NASA Astrophysics Data System (ADS)

    Leclercq, Florent

    2016-10-01

    Analysis of three-dimensional cosmological surveys has the potential to answer outstanding questions on the initial conditions from which structure appeared, and therefore on the very high energy physics at play in the early Universe. We report on recently proposed statistical data analysis methods designed to study the primordial large-scale structure via physical inference of the initial conditions in a fully Bayesian framework, and applications to the Sloan Digital Sky Survey data release 7. We illustrate how this approach led to a detailed characterization of the dynamic cosmic web underlying the observed galaxy distribution, based on the tidal environment.

  16. Integrative inference of population history in the Ibero-Maghrebian endemic Pleurodeles waltl (Salamandridae).

    PubMed

    Gutiérrez-Rodríguez, Jorge; Barbosa, A Márcia; Martínez-Solano, Íñigo

    2017-07-01

    Inference of population histories from the molecular signatures of past demographic processes is challenging, but recent methodological advances in species distribution models and their integration in time-calibrated phylogeographic studies allow detailed reconstruction of complex biogeographic scenarios. We apply an integrative approach to infer the evolutionary history of the Iberian ribbed newt (Pleurodeles waltl), an Ibero-Maghrebian endemic with populations north and south of the Strait of Gibraltar. We analyzed an extensive multilocus dataset (mitochondrial and nuclear DNA sequences and ten polymorphic microsatellite loci) and found a deep east-west phylogeographic break in Iberian populations dating back to the Plio-Pleistocene. This break is inferred to result from vicariance associated with the formation of the Guadalquivir river basin. In contrast with previous studies, North African populations showed exclusive mtDNA haplotypes, and formed a monophyletic clade within the Eastern Iberian lineage in the mtDNA genealogy. On the other hand, microsatellites failed to recover Moroccan populations as a differentiated genetic cluster. This is interpreted to result from post-divergence gene flow based on the results of IMA2 and Migrate analyses. Thus, Moroccan populations would have originated after overseas dispersal from the Iberian Peninsula in the Pleistocene, with subsequent gene flow in more recent times, implying at least two trans-marine dispersal events. We modeled the distribution of the species and of each lineage, and projected these models back in time to infer climatically favourable areas during the mid-Holocene, the last glacial maximum (LGM) and the last interglacial (LIG), to reconstruct more recent population dynamics. We found minor differences in climatic favourability across lineages, suggesting intraspecific niche conservatism. Genetic diversity was significantly correlated with the intersection of environmental favourability in the LIG and

  17. Inference of Population History by Coupling Exploratory and Model-Driven Phylogeographic Analyses

    PubMed Central

    Garrick, Ryan C.; Caccone, Adalgisa; Sunnucks, Paul

    2010-01-01

    Understanding the nature, timing and geographic context of historical events and population processes that shaped the spatial distribution of genetic diversity is critical for addressing questions relating to speciation, selection, and applied conservation management. Cladistic analysis of gene trees has been central to phylogeography, but when coupled with approaches that make use of different components of the information carried by DNA sequences and their frequencies, the strength and resolution of these inferences can be improved. However, assessing concordance of inferences drawn using different analytical methods or genetic datasets, and integrating their outcomes, can be challenging. Here we overview the strengths and limitations of different types of genetic data, analysis methods, and approaches to historical inference. We then turn our attention to the potentially synergistic interactions among widely-used and emerging phylogeographic analyses, and discuss some of the ways that spatial and temporal concordance among inferences can be assessed. We close this review with a brief summary and outlook on future research directions. PMID:20480016

  18. Gaussian process-based Bayesian nonparametric inference of population size trajectories from gene genealogies.

    PubMed

    Palacios, Julia A; Minin, Vladimir N

    2013-03-01

    Changes in population size influence genetic diversity of the population and, as a result, leave a signature of these changes in individual genomes in the population. We are interested in the inverse problem of reconstructing past population dynamics from genomic data. We start with a standard framework based on the coalescent, a stochastic process that generates genealogies connecting randomly sampled individuals from the population of interest. These genealogies serve as a glue between the population demographic history and genomic sequences. It turns out that only the times of genealogical lineage coalescences contain information about population size dynamics. Viewing these coalescent times as a point process, estimating population size trajectories is equivalent to estimating a conditional intensity of this point process. Therefore, our inverse problem is similar to estimating an inhomogeneous Poisson process intensity function. We demonstrate how recent advances in Gaussian process-based nonparametric inference for Poisson processes can be extended to Bayesian nonparametric estimation of population size dynamics under the coalescent. We compare our Gaussian process (GP) approach to one of the state-of-the-art Gaussian Markov random field (GMRF) methods for estimating population trajectories. Using simulated data, we demonstrate that our method has better accuracy and precision. Next, we analyze two genealogies reconstructed from real sequences of hepatitis C and human Influenza A viruses. In both cases, we recover more believed aspects of the viral demographic histories than the GMRF approach. We also find that our GP method produces more reasonable uncertainty estimates than the GMRF method.
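
    The statement above that only the coalescent times carry information about population size can be made concrete in the simplest special case, a constant population size. The sketch below is not the Gaussian process method of the abstract. It assumes the standard Kingman coalescent, where the waiting time while k lineages remain is exponential with rate C(k,2)/N, and computes the resulting maximum-likelihood estimate of N. The function names and the time scaling (each pair of lineages coalesces at rate 1/N) are assumptions made for illustration.

```python
import random
from math import comb


def simulate_intercoalescent_times(n_samples, pop_size, rng):
    """Simulate waiting times under a constant-size Kingman coalescent.

    Returns a list of (k, t_k) pairs for k = n_samples, ..., 2, where t_k
    is the time spent with k lineages. Time is scaled (an assumption) so
    that each pair of lineages coalesces at rate 1 / pop_size.
    """
    times = []
    for k in range(n_samples, 1, -1):
        rate = comb(k, 2) / pop_size
        times.append((k, rng.expovariate(rate)))
    return times


def constant_size_mle(times):
    """Maximum-likelihood estimate of a constant population size.

    The likelihood is a product of exponential densities with rates
    C(k,2)/N, so the estimate is sum_k C(k,2) * t_k divided by the
    number of coalescent events.
    """
    if not times:
        raise ValueError("need at least one coalescent interval")
    return sum(comb(k, 2) * t for k, t in times) / len(times)


# Hypothetical usage: 50 sampled lineages, true size 1000.
rng = random.Random(1)
simulated = simulate_intercoalescent_times(50, 1000.0, rng)
estimate = constant_size_mle(simulated)
```

    Nonparametric approaches such as the one in the abstract replace the single constant N with a function of time, but they use the same sufficient information: the number of lineages and the durations of the intervals between coalescences.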

  19. Inference of sex-specific expansion patterns in human populations from Y-chromosome polymorphism.

    PubMed

    Aimé, Carla; Heyer, Evelyne; Austerlitz, Frédéric

    2015-06-01

    Studying the current distribution of genetic diversity in humans has important implications for our understanding of the history of our species. We analyzed a set of linked STR and SNP loci from the paternally inherited Y chromosome to infer the past demography of 55 African and Eurasian populations, using both the parametric and nonparametric coalescent-based methods implemented in the BEAST application. We inferred expansion events in most sedentary farmer populations, while we found constant effective population sizes for both nomadic hunter-gatherers and seminomadic herders. Our results differed, on several aspects, from previous results on mtDNA and autosomal markers. First, we found more recent expansion patterns in Eurasia than in Africa. This discrepancy, substantially stronger than the ones found with the other kind of markers, may result from a lower effective population size for men, which might have made male-transmitted markers more sensitive to the out-of-Africa bottleneck. Second, we found expansion signals only for sedentary farmers but not for nomadic herders in Central Asia, while these signals were found for both kind of populations in this area when using mtDNA or autosomal markers. Expansion signals in this area may result from spatial expansion processes and may have been erased for the Y chromosome among the herders because of restricted male gene flow. © 2014 Wiley Periodicals, Inc.

  20. Foundational Principles for Large-Scale Inference: Illustrations Through Correlation Mining

    SciTech Connect

    Hero, Alfred O.; Rajaratnam, Bala

    2015-12-09

    When can reliable inference be drawn in the ‘‘Big Data’’ context? This article presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large-scale inference. In large-scale data applications like genomics, connectomics, and eco-informatics, the data set is often variable rich but sample starved: a regime where the number n of acquired samples (statistical replicates) is far fewer than the number p of observed variables (genes, neurons, voxels, or chemical constituents). Much of recent work has focused on understanding the computational complexity of proposed methods for ‘‘Big Data.’’ Sample complexity, however, has received relatively less attention, especially in the setting when the sample size n is fixed, and the dimension p grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime where both variable dimension and sample size go to infinity at comparable rates; and 3) the purely high-dimensional asymptotic regime where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche but only the latter regime applies to exa-scale data dimension. We illustrate this high-dimensional framework for the problem of correlation mining, where it is the matrix of pairwise and partial correlations among the variables that are of interest. Correlation mining arises in numerous applications and subsumes the regression context as a special case. We demonstrate various regimes of correlation mining based on the unifying perspective of high-dimensional learning rates and sample complexity for different structured covariance models and different inference
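
    The "variable rich but sample starved" regime described above can be illustrated with a small sketch. It is not the authors' framework. It only shows that, with the sample size n held fixed, the largest sample correlation among mutually independent variables cannot decrease as more variables are screened, and in practice grows, so large correlations appear by chance alone. The function names and the values n = 10 and p = 40 are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def max_abs_correlation(variables):
    """Largest absolute pairwise correlation among the given variables."""
    best = 0.0
    for i in range(len(variables)):
        for j in range(i + 1, len(variables)):
            best = max(best, abs(pearson(variables[i], variables[j])))
    return best


# n = 10 samples of p = 40 independent standard normal variables.
rng = random.Random(0)
n = 10
variables = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(40)]

# Screening more variables can only raise the largest spurious correlation.
max_corr_first_5 = max_abs_correlation(variables[:5])
max_corr_all_40 = max_abs_correlation(variables)
```

    Because the first five variables are a subset of the forty, the second maximum is at least as large as the first. Every true correlation here is zero, so any large value reflects sampling noise in the fixed-n, growing-p regime.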

  1. Foundational Principles for Large-Scale Inference: Illustrations Through Correlation Mining

    DOE PAGES

    Hero, Alfred O.; Rajaratnam, Bala

    2015-12-09

    When can reliable inference be drawn in the ‘‘Big Data’’ context? This article presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large-scale inference. In large-scale data applications like genomics, connectomics, and eco-informatics, the data set is often variable rich but sample starved: a regime where the number n of acquired samples (statistical replicates) is far fewer than the number p of observed variables (genes, neurons, voxels, or chemical constituents). Much of recent work has focused on understanding the computational complexity of proposed methods for ‘‘Big Data.’’ Sample complexity, however, has received relatively less attention, especially in the setting when the sample size n is fixed, and the dimension p grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime where both variable dimension and sample size go to infinity at comparable rates; and 3) the purely high-dimensional asymptotic regime where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche but only the latter regime applies to exa-scale data dimension. We illustrate this high-dimensional framework for the problem of correlation mining, where it is the matrix of pairwise and partial correlations among the variables that are of interest. Correlation mining arises in numerous applications and subsumes the regression context as a special case. We demonstrate various regimes of correlation mining based on the unifying perspective of high-dimensional learning rates and sample complexity for different structured covariance models and different

  2. Foundational Principles for Large-Scale Inference: Illustrations Through Correlation Mining.

    PubMed

    Hero, Alfred O; Rajaratnam, Bala

    2016-01-01

    When can reliable inference be drawn in the "Big Data" context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large scale inference. In large scale data applications like genomics, connectomics, and eco-informatics the dataset is often variable-rich but sample-starved: a regime where the number n of acquired samples (statistical replicates) is far fewer than the number p of observed variables (genes, neurons, voxels, or chemical constituents). Much of recent work has focused on understanding the computational complexity of proposed methods for "Big Data". Sample complexity however has received relatively less attention, especially in the setting when the sample size n is fixed, and the dimension p grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime where both variable dimension and sample size go to infinity at comparable rates; 3) the purely high dimensional asymptotic regime where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche but only the latter regime applies to exa-scale data dimension. We illustrate this high dimensional framework for the problem of correlation mining, where it is the matrix of pairwise and partial correlations among the variables that are of interest. Correlation mining arises in numerous applications and subsumes the regression context as a special case. We demonstrate various regimes of correlation mining based on the unifying perspective of high dimensional learning rates and sample complexity for different structured covariance models and different inference tasks.

  3. Foundational Principles for Large-Scale Inference: Illustrations Through Correlation Mining

    PubMed Central

    Hero, Alfred O.; Rajaratnam, Bala

    2015-01-01

    When can reliable inference be drawn in the “Big Data” context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large scale inference. In large scale data applications like genomics, connectomics, and eco-informatics the dataset is often variable-rich but sample-starved: a regime where the number n of acquired samples (statistical replicates) is far fewer than the number p of observed variables (genes, neurons, voxels, or chemical constituents). Much of recent work has focused on understanding the computational complexity of proposed methods for “Big Data”. Sample complexity however has received relatively less attention, especially in the setting when the sample size n is fixed, and the dimension p grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime where both variable dimension and sample size go to infinity at comparable rates; 3) the purely high dimensional asymptotic regime where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche but only the latter regime applies to exa-scale data dimension. We illustrate this high dimensional framework for the problem of correlation mining, where it is the matrix of pairwise and partial correlations among the variables that are of interest. Correlation mining arises in numerous applications and subsumes the regression context as a special case. We demonstrate various regimes of correlation mining based on the unifying perspective of high dimensional learning rates and sample complexity for different structured covariance models and different inference tasks. PMID:27087700
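
    The "variable-rich but sample-starved" regime described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' framework: the variables are independent standard normals (so every true correlation is zero), and the function names, the sample size n = 20, and the dimensions tried are all illustrative choices. With n held fixed, the largest spurious sample correlation can only grow as more variables are screened.

```python
import numpy as np


def max_abs_correlation(x):
    """Largest absolute off-diagonal entry of the sample correlation matrix of x (rows = samples)."""
    corr = np.corrcoef(x, rowvar=False)
    np.fill_diagonal(corr, 0.0)  # ignore each variable's correlation with itself
    return float(np.abs(corr).max())


def spurious_correlation_growth(n, dims, seed=0):
    """Screen the first p columns of one independent Gaussian data set for each p in dims.

    All true correlations are zero, so any large value is purely a sampling artifact.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, max(dims)))
    return {p: max_abs_correlation(x[:, :p]) for p in dims}


if __name__ == "__main__":
    for p, r in spurious_correlation_growth(n=20, dims=(10, 100, 1000)).items():
        print(f"p = {p:5d}  max |sample correlation| = {r:.3f}")
```

    Because the smaller variable sets are nested inside the larger ones, the reported maximum is non-decreasing in p by construction, which makes the sample-complexity problem visible without any distributional theory.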

  4. Joint inference of microsatellite mutation models, population history and genealogies using transdimensional Markov Chain Monte Carlo.

    PubMed

    Wu, Chieh-Hsi; Drummond, Alexei J

    2011-05-01

    We provide a framework for Bayesian coalescent inference from microsatellite data that enables inference of population history parameters averaged over microsatellite mutation models. To achieve this we first implemented a rich family of microsatellite mutation models and related components in the software package BEAST. BEAST is a powerful tool that performs Bayesian MCMC analysis on molecular data to make coalescent and evolutionary inferences. Our implementation permits the application of existing nonparametric methods to microsatellite data. The implemented microsatellite models are based on the replication slippage mechanism and focus on three properties of microsatellite mutation: length dependency of mutation rate, mutational bias toward expansion or contraction, and number of repeat units changed in a single mutation event. We develop a new model that facilitates microsatellite model averaging and Bayesian model selection by transdimensional MCMC. With Bayesian model averaging, the posterior distributions of population history parameters are integrated across a set of microsatellite models and thus account for model uncertainty. Simulated data are used to evaluate our method in terms of accuracy and precision of estimation and also identification of the true mutation model. Finally we apply our method to a red colobus monkey data set as an example.

  5. Haplotype inference from short sequence reads using a population genealogical history model.

    PubMed

    Zhang, Jin; Wu, Yufeng

    2011-01-01

    High-throughput sequencing is currently a major transforming technology in biology. In this paper, we study a population genomics problem motivated by the newly available short reads data from high-throughput sequencing. In this problem, we are given short reads collected from individuals in a population. The objective is to infer haplotypes with the given reads. We first formulate the computational problem of haplotype inference with short reads. Based on a simple probabilistic model on short reads, we present a new approach of inferring haplotypes directly from given reads (i.e. without first calling genotypes). Our method is finding the most likely haplotypes whose local genealogical history can be approximately modeled as a perfect phylogeny. We show that the optimal haplotypes under this objective can be found for many data using integer linear programming for modest sized data when there is no recombination. We then develop a related heuristic method which can work with larger data, and also allows recombination. Simulation shows that the performance of our method is competitive against alternative approaches.

  6. Inferring genome-wide patterns of admixture in Qataris using fifty-five ancestral populations

    PubMed Central

    2012-01-01

    Background Populations of the Arabian Peninsula have a complex genetic structure that reflects waves of migrations including the earliest human migrations from Africa and eastern Asia, migrations along ancient civilization trading routes and colonization history of recent centuries. Results Here, we present a study of genome-wide admixture in this region, using 156 genotyped individuals from Qatar, a country located at the crossroads of these migration patterns. Since haplotypes of these individuals could have originated from many different populations across the world, we have developed a machine learning method "SupportMix" to infer loci-specific genomic ancestry when simultaneously analyzing many possible ancestral populations. Simulations show that SupportMix is not only more accurate than other popular admixture discovery tools but is the first admixture inference method that can efficiently scale for simultaneous analysis of 50-100 putative ancestral populations while being independent of prior demographic information. Conclusions By simultaneously using the 55 world populations from the Human Genome Diversity Panel, SupportMix was able to extract the fine-scale ancestry of the Qatar population, providing many new observations concerning the ancestry of the region. For example, as well as recapitulating the three major sub-populations in Qatar, composed of mainly Arabic, Persian, and African ancestry, SupportMix additionally identifies the specific ancestry of the Persian group to populations sampled in Greater Persia rather than from China and the ancestry of the African group to sub-Saharan origin and not Southern African Bantu origin as previously thought. PMID:22734698

  7. Streamlining and Large Ancestral Genomes in Archaea Inferred with a Phylogenetic Birth-and-Death Model

    PubMed Central

    Miklós, István

    2009-01-01

    Homologous genes originate from a common ancestor through vertical inheritance, duplication, or horizontal gene transfer. Entire homolog families spawned by a single ancestral gene can be identified across multiple genomes based on protein sequence similarity. The sequences, however, do not always reveal conclusively the history of large families. To study the evolution of complete gene repertoires, we propose here a mathematical framework that does not rely on resolved gene family histories. We show that so-called phylogenetic profiles, formed by family sizes across multiple genomes, are sufficient to infer principal evolutionary trends. The main novelty in our approach is an efficient algorithm to compute the likelihood of a phylogenetic profile in a model of birth-and-death processes acting on a phylogeny. We examine known gene families in 28 archaeal genomes using a probabilistic model that involves lineage- and family-specific components of gene acquisition, duplication, and loss. The model enables us to consider all possible histories when inferring statistics about archaeal evolution. According to our reconstruction, most lineages are characterized by a net loss of gene families. Major increases in gene repertoire have occurred only a few times. Our reconstruction underlines the importance of persistent streamlining processes in shaping genome composition in Archaea. It also suggests that early archaeal genomes were as complex as typical modern ones, and even show signs, in the case of the methanogenic ancestor, of an extremely large gene repertoire. PMID:19570746

  8. Thinking too positive? Revisiting current methods of population genetic selection inference.

    PubMed

    Bank, Claudia; Ewing, Gregory B; Ferrer-Admettla, Anna; Foll, Matthieu; Jensen, Jeffrey D

    2014-12-01

    In the age of next-generation sequencing, the availability of increasing amounts and improved quality of data at decreasing cost ought to allow for a better understanding of how natural selection is shaping the genome than ever before. However, alternative forces, such as demography and background selection (BGS), obscure the footprints of positive selection that we would like to identify. In this review, we illustrate recent developments in this area, and outline a roadmap for improved selection inference. We argue (i) that the development and obligatory use of advanced simulation tools is necessary for improved identification of selected loci, (ii) that genomic information from multiple time points will enhance the power of inference, and (iii) that results from experimental evolution should be utilized to better inform population genomic studies.

  9. Alternative Model-Based and Design-Based Frameworks for Inference from Samples to Populations: From Polarization to Integration

    ERIC Educational Resources Information Center

    Sterba, Sonya K.

    2009-01-01

    A model-based framework, due originally to R. A. Fisher, and a design-based framework, due originally to J. Neyman, offer alternative mechanisms for inference from samples to populations. We show how these frameworks can utilize different types of samples (nonrandom or random vs. only random) and allow different kinds of inference (descriptive vs.…

  10. Alternative Model-Based and Design-Based Frameworks for Inference from Samples to Populations: From Polarization to Integration

    ERIC Educational Resources Information Center

    Sterba, Sonya K.

    2009-01-01

    A model-based framework, due originally to R. A. Fisher, and a design-based framework, due originally to J. Neyman, offer alternative mechanisms for inference from samples to populations. We show how these frameworks can utilize different types of samples (nonrandom or random vs. only random) and allow different kinds of inference (descriptive vs.…

  11. Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation.

    PubMed

    Cornuet, Jean-Marie; Santos, Filipe; Beaumont, Mark A; Robert, Christian P; Marin, Jean-Michel; Balding, David J; Guillemaud, Thomas; Estoup, Arnaud

    2008-12-01

    Genetic data obtained on population samples convey information about their evolutionary history. Inference methods can extract part of this information but they require sophisticated statistical techniques that have been made available to the biologist community (through computer programs) only for simple and standard situations typically involving a small number of samples. We propose here a computer program (DIY ABC) for inference based on approximate Bayesian computation (ABC), in which scenarios can be customized by the user to fit many complex situations involving any number of populations and samples. Such scenarios involve any combination of population divergences, admixtures and population size changes. DIY ABC can be used to compare competing scenarios, estimate parameters for one or more scenarios and compute bias and precision measures for a given scenario and known values of parameters (the current version applies to unlinked microsatellite data). This article describes key methods used in the program and provides its main features. The analysis of one simulated and one real dataset, both with complex evolutionary scenarios, illustrates the main possibilities of DIY ABC. The software DIY ABC is freely available at http://www.montpellier.inra.fr/CBGP/diyabc.

  12. Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation

    PubMed Central

    Cornuet, Jean-Marie; Santos, Filipe; Beaumont, Mark A.; Robert, Christian P.; Marin, Jean-Michel; Balding, David J.; Guillemaud, Thomas; Estoup, Arnaud

    2008-01-01

    Summary: Genetic data obtained on population samples convey information about their evolutionary history. Inference methods can extract part of this information but they require sophisticated statistical techniques that have been made available to the biologist community (through computer programs) only for simple and standard situations typically involving a small number of samples. We propose here a computer program (DIY ABC) for inference based on approximate Bayesian computation (ABC), in which scenarios can be customized by the user to fit many complex situations involving any number of populations and samples. Such scenarios involve any combination of population divergences, admixtures and population size changes. DIY ABC can be used to compare competing scenarios, estimate parameters for one or more scenarios and compute bias and precision measures for a given scenario and known values of parameters (the current version applies to unlinked microsatellite data). This article describes key methods used in the program and provides its main features. The analysis of one simulated and one real dataset, both with complex evolutionary scenarios, illustrates the main possibilities of DIY ABC. Availability: The software DIY ABC is freely available at http://www.montpellier.inra.fr/CBGP/diyabc. Contact: j.cornuet@imperial.ac.uk Supplementary information: Supplementary data are also available at http://www.montpellier.inra.fr/CBGP/diyabc PMID:18842597
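
    For readers unfamiliar with approximate Bayesian computation, the basic rejection form of the idea can be sketched as follows. This is a generic illustration and not the DIY ABC implementation: the toy simulator (a mean of normal draws standing in for a genetic simulation), the uniform prior, the tolerance, and every function name here are assumptions made for the example.

```python
import numpy as np


def abc_rejection(observed_stat, simulate, sample_prior, n_draws, tolerance, rng):
    """Keep prior draws whose simulated summary statistic lands near the observed one."""
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior(rng)
        if abs(simulate(theta, rng) - observed_stat) <= tolerance:
            accepted.append(theta)
    return np.array(accepted)


def sample_prior(rng):
    # Hypothetical flat prior on the unknown parameter.
    return rng.uniform(0.0, 10.0)


def simulate(theta, rng):
    # Toy stand-in for a population-genetic simulator: mean of 50 normal draws centred on theta.
    return rng.normal(theta, 1.0, size=50).mean()


def run_toy_example(seed=1):
    rng = np.random.default_rng(seed)
    return abc_rejection(3.0, simulate, sample_prior, n_draws=5000, tolerance=0.2, rng=rng)


if __name__ == "__main__":
    posterior = run_toy_example()
    print(f"accepted {len(posterior)} draws, posterior mean = {posterior.mean():.2f}")
```

    The accepted draws approximate the posterior without ever evaluating a likelihood, which is what makes the approach attractive for scenarios whose likelihoods are intractable; competing scenarios could in principle be compared by how often each one produces accepted simulations.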

  13. Phylogeographic Triangulation: Using Predator-Prey-Parasite Interactions to Infer Population History from Partial Genetic Information

    PubMed Central

    Barbosa, A. Márcia; Thode, Guillermo; Real, Raimundo; Feliu, Carlos; Vargas, J. Mario

    2012-01-01

    Phylogeographic studies, which infer population history and dispersal movements from intra-specific spatial genetic variation, require expensive and time-consuming analyses that are not always feasible, especially in the case of rare or endangered species. On the other hand, comparative phylogeography of species involved in close biotic interactions may show congruent patterns depending on the specificity of the relationship. Consequently, the phylogeography of a parasite that needs two hosts to complete its life cycle should reflect population history traits of both hosts. Population movements evidenced by the parasite’s phylogeography that are not reflected in the phylogeography of one of these hosts may thus be attributed to the other host. Using the wild rabbit (Oryctolagus cuniculus) and a parasitic tapeworm (Taenia pisiformis) as an example, we propose comparing the phylogeography of easily available organisms such as game species and their specific heteroxenous parasites to infer population movements of definitive host/predator species, independently of performing genetic analyses on the latter. This may be an interesting approach for indirectly studying the history of species whose phylogeography is difficult to analyse directly. PMID:23209834

  14. Large-scale Inference Problems in Astronomy: Building a 3D Galactic Dust Map

    NASA Astrophysics Data System (ADS)

    Finkbeiner, Douglas

    2016-03-01

    The term ''Big Data'' has become trite, as modern technology has made data sets of terabytes or even petabytes easy to store. Such data sets provide a sandbox in which to develop new statistical inference techniques that can extract interesting results from increasingly rich (and large) databases. I will give an example from my work on mapping the interstellar dust of the Milky Way. 2D emission-based maps have been used for decades to estimate the reddening and emission from interstellar dust, with applications from CMB foregrounds to surveys of large-scale structure. For studies within the Milky Way, however, the third dimension is required. I will present our work on a 3D dust map based on Pan-STARRS1 and 2MASS over 3/4 of the sky (http://arxiv.org/abs/1507.01005), assess its usefulness relative to other dust maps, and discuss future work. Supported by the NSF.

  15. Inference of 3-dimensional structure underlying large-scale coronal events observed by Yohkoh and Ulysses

    NASA Technical Reports Server (NTRS)

    Slater, G. L.; Freeland, S. L.; Hoeksema, T.; Zhao, X.; Hudson, H. S.

    1995-01-01

    The Yohkoh/SXT images provide full-disk coverage of the solar corona, usually extending before and after one of the large-scale eruptive events that occur in the polar crown. These produce large arcades of X-ray loops, often with a cusp-shaped coronal extension, and are known to be associated with coronal mass ejections. The Yohkoh prototype of such events occurred 12 Nov. 1991. This allows us to determine heights from the apparent rotation rates of these structures. In comparison with magnetic-field extrapolations from Wilcox Solar Observatory, we use this tool to infer the three-dimensional structure of the corona in particular cases: 24 Jan. 1992, 24 Feb. 1993, 14 Apr. 1994, and 13 Nov. 1994. The last event is a long-duration flare event.

  16. Inferring cetacean population densities from the absolute dynamic topography of the ocean in a hierarchical Bayesian framework.

    PubMed

    Pardo, Mario A; Gerrodette, Tim; Beier, Emilio; Gendron, Diane; Forney, Karin A; Chivers, Susan J; Barlow, Jay; Palacios, Daniel M

    2015-01-01

    We inferred the population densities of blue whales (Balaenoptera musculus) and short-beaked common dolphins (Delphinus delphis) in the Northeast Pacific Ocean as functions of the water-column's physical structure by implementing hierarchical models in a Bayesian framework. This approach allowed us to propagate the uncertainty of the field observations into the inference of species-habitat relationships and to generate spatially explicit population density predictions with reduced effects of sampling heterogeneity. Our hypothesis was that the large-scale spatial distributions of these two cetacean species respond primarily to ecological processes resulting from shoaling and outcropping of the pycnocline in regions of wind-forced upwelling and eddy-like circulation. Physically, these processes affect the thermodynamic balance of the water column, decreasing its volume and thus the height of the absolute dynamic topography (ADT). Biologically, they lead to elevated primary productivity and persistent aggregation of low-trophic-level prey. Unlike other remotely sensed variables, ADT provides information about the structure of the entire water column and it is also routinely measured at high spatial-temporal resolution by satellite altimeters with uniform global coverage. Our models provide spatially explicit population density predictions for both species, even in areas where the pycnocline shoals but does not outcrop (e.g. the Costa Rica Dome and the North Equatorial Countercurrent thermocline ridge). Interannual variations in distribution during El Niño anomalies suggest that the population density of both species decreases dramatically in the Equatorial Cold Tongue and the Costa Rica Dome, and that their distributions retract to particular areas that remain productive, such as the more oceanic waters in the central California Current System, the northern Gulf of California, the North Equatorial Countercurrent thermocline ridge, and the more southern portion of the

  17. Inferring Cetacean Population Densities from the Absolute Dynamic Topography of the Ocean in a Hierarchical Bayesian Framework

    PubMed Central

    Pardo, Mario A.; Gerrodette, Tim; Beier, Emilio; Gendron, Diane; Forney, Karin A.; Chivers, Susan J.; Barlow, Jay; Palacios, Daniel M.

    2015-01-01

    We inferred the population densities of blue whales (Balaenoptera musculus) and short-beaked common dolphins (Delphinus delphis) in the Northeast Pacific Ocean as functions of the water-column’s physical structure by implementing hierarchical models in a Bayesian framework. This approach allowed us to propagate the uncertainty of the field observations into the inference of species-habitat relationships and to generate spatially explicit population density predictions with reduced effects of sampling heterogeneity. Our hypothesis was that the large-scale spatial distributions of these two cetacean species respond primarily to ecological processes resulting from shoaling and outcropping of the pycnocline in regions of wind-forced upwelling and eddy-like circulation. Physically, these processes affect the thermodynamic balance of the water column, decreasing its volume and thus the height of the absolute dynamic topography (ADT). Biologically, they lead to elevated primary productivity and persistent aggregation of low-trophic-level prey. Unlike other remotely sensed variables, ADT provides information about the structure of the entire water column and it is also routinely measured at high spatial-temporal resolution by satellite altimeters with uniform global coverage. Our models provide spatially explicit population density predictions for both species, even in areas where the pycnocline shoals but does not outcrop (e.g. the Costa Rica Dome and the North Equatorial Countercurrent thermocline ridge). Interannual variations in distribution during El Niño anomalies suggest that the population density of both species decreases dramatically in the Equatorial Cold Tongue and the Costa Rica Dome, and that their distributions retract to particular areas that remain productive, such as the more oceanic waters in the central California Current System, the northern Gulf of California, the North Equatorial Countercurrent thermocline ridge, and the more southern portion of

  18. A framework for inferring unobserved multistrain epidemic sub-populations using synchronization dynamics

    PubMed Central

    Forgoston, Eric; Shaw, Leah B.; Schwartz, Ira B.

    2015-01-01

    A new method is proposed to infer unobserved epidemic sub-populations by exploiting the synchronization properties of multistrain epidemic models. A model for dengue fever is driven by simulated data from secondary infective populations. Primary infective populations in the driven system synchronize to the correct values from the driver system. Most hospital cases of dengue are secondary infections, so this method provides a way to deduce unobserved primary infection levels. We derive center manifold equations that relate the driven system to the driver system and thus motivate the use of synchronization to predict unobserved primary infectives. Synchronization stability between primary and secondary infections is demonstrated through numerical measurements of conditional Lyapunov exponents and through time series simulations. PMID:26251155

  19. Inferring the Clonal Structure of Viral Populations from Time Series Sequencing

    PubMed Central

    Chedom, Donatien F.; Murcia, Pablo R.; Greenman, Chris D.

    2015-01-01

    RNA virus populations will undergo processes of mutation and selection resulting in a mixed population of viral particles. High throughput sequencing of a viral population subsequently contains a mixed signal of the underlying clones. We would like to identify the underlying evolutionary structures. We utilize two sources of information to attempt this; within segment linkage information, and mutation prevalence. We demonstrate that clone haplotypes, their prevalence, and maximum parsimony reticulate evolutionary structures can be identified, although the solutions may not be unique, even for complete sets of information. This is applied to a chain of influenza infection, where we infer evolutionary structures, including reassortment, and demonstrate some of the difficulties of interpretation that arise from deep sequencing due to artifacts such as template switching during PCR amplification. PMID:26571026

  20. Inferring population and metapopulation dynamics of Liparis loeselii from single-census and inventory data

    NASA Astrophysics Data System (ADS)

    Oostermeijer, J. G. B.; Hartman, Y.

    2014-10-01

    To conserve endangered species, information is needed on (meta)population responses to habitat quality and management. As possibilities for long-term studies are generally limited, it is important to obtain as much information as possible in a single field season. We obtained such single-census data for the orchid Liparis loeselii, a European Habitat Directive species. Stage structures of 15 Dutch dune and fen populations were related to vegetation structure, environmental indicators, and management. Botanical inventory records from 1930 to 2003 were used to infer population life spans. Cluster analysis did not reveal successional stage structure types. Dense populations with high recruitment mainly occurred in open, young-successional vegetation with high soil pH. High soil humidity and acidification negatively affected orchid densities. Early mowing was preferable over late mowing in dune slacks, because the latter reduced juvenile densities. The predominant population life span was three to eight years, and similar for dune slacks and fens. Longer life spans were occasionally observed at mown sites with influx of base-rich water. Our results suggest high metapopulation dynamics. Long-term metapopulation viability requires the formation of new habitat by dune slack formation in dunes and peat removal in fens. Population persistence can be prolonged to some extent by mowing, extensive grazing, or sod removal if natural habitat formation is impossible. Our study demonstrates that useful information on (meta)population ecology and viability can be obtained in a single field season.

  1. Improving inferences in population studies of rare species that are detected imperfectly

    USGS Publications Warehouse

    MacKenzie, D.I.; Nichols, J.D.; Sutton, N.; Kawanishi, K.; Bailey, L.L.

    2005-01-01

    For the vast majority of cases, it is highly unlikely that all the individuals of a population will be encountered during a study. Furthermore, it is unlikely that a constant fraction of the population is encountered over times, locations, or species to be compared. Hence, simple counts usually will not be good indices of population size. We recommend that detection probabilities (the probability of including an individual in a count) be estimated and incorporated into inference procedures. However, most techniques for estimating detection probability require moderate sample sizes, which may not be achievable when studying rare species. In order to improve the reliability of inferences from studies of rare species, we suggest two general approaches that researchers may wish to consider that incorporate the concept of imperfect detectability: (1) borrowing information about detectability or the other quantities of interest from other times, places, or species; and (2) using state variables other than abundance (e.g., species richness and occupancy). We illustrate these suggestions with examples and discuss the relative benefits and drawbacks of each approach.
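
    The point that raw counts are poor indices when detectability varies can be made concrete with the standard count-over-detection-probability adjustment. The sketch below is an illustration only: the function name and the counts and detection probabilities for the two sites are invented for the example and are not taken from the study.

```python
def abundance_estimate(count, detection_probability):
    """Adjust a raw count by the estimated probability that an individual is detected."""
    if not 0.0 < detection_probability <= 1.0:
        raise ValueError("detection probability must be in (0, 1]")
    return count / detection_probability


if __name__ == "__main__":
    # Hypothetical sites: A yields the larger raw count, but detection is much easier there.
    site_a = abundance_estimate(count=30, detection_probability=0.60)
    site_b = abundance_estimate(count=20, detection_probability=0.25)
    print(f"site A: count 30 -> estimated abundance {site_a:.0f}")
    print(f"site B: count 20 -> estimated abundance {site_b:.0f}")
```

    In this made-up case the ranking of the two sites reverses once detectability is accounted for, which is the kind of error the abstract warns about when simple counts are compared across times, locations, or species.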

  2. A potential large and persistent black carbon forcing over Northern Pacific inferred from satellite observations

    NASA Astrophysics Data System (ADS)

    Li, Zhongshu; Liu, Junfeng; Mauzerall, Denise L.; Li, Xiaoyuan; Fan, Songmiao; Horowitz, Larry W.; He, Cenlin; Yi, Kan; Tao, Shu

    2017-03-01

    Black carbon (BC) aerosol strongly absorbs solar radiation, which warms climate. However, accurate estimation of BC’s climate effect is limited by the uncertainties of its spatiotemporal distribution, especially over remote oceanic areas. The HIAPER Pole-to-Pole Observation (HIPPO) program from 2009 to 2011 intercepted multiple snapshots of BC profiles over Pacific in various seasons, and revealed a 2 to 5 times overestimate of BC by current global models. In this study, we compared the measurements from aircraft campaigns and satellites, and found a robust association between BC concentrations and satellite-retrieved CO, tropospheric NO2, and aerosol optical depth (AOD) (R2 > 0.8). This establishes a basis to construct a satellite-based column BC approximation (sBC*) over remote oceans. The inferred sBC* shows that Asian outflows in spring bring much more BC aerosols to the mid-Pacific than those occurring in other seasons. In addition, inter-annual variability of sBC* is seen over the Northern Pacific, with abundances varying consistently with the springtime Pacific/North American (PNA) index. Our sBC* dataset infers a widespread overestimation of BC loadings and BC Direct Radiative Forcing by current models over North Pacific, which further suggests that large uncertainties exist on aerosol-climate interactions over other remote oceanic areas beyond Pacific.

  3. Unusually large earthquakes inferred from tsunami deposits along the Kuril trench

    USGS Publications Warehouse

    Nanayama, F.; Satake, K.; Furukawa, R.; Shimokawa, K.; Atwater, B.F.; Shigeno, K.; Yamaki, S.

    2003-01-01

    The Pacific plate converges with northeastern Eurasia at a rate of 8-9 m per century along the Kamchatka, Kuril and Japan trenches. Along the southern Kuril trench, which faces the Japanese island of Hokkaido, this fast subduction has recurrently generated earthquakes with magnitudes of up to ~8 over the past two centuries. These historical events, on rupture segments 100-200 km long, have been considered characteristic of Hokkaido's plate-boundary earthquakes. But here we use deposits of prehistoric tsunamis to infer the infrequent occurrence of larger earthquakes generated from longer ruptures. Many of these tsunami deposits form sheets of sand that extend kilometres inland from the deposits of historical tsunamis. Stratigraphic series of extensive sand sheets, intercalated with dated volcanic-ash layers, show that such unusually large tsunamis occurred about every 500 years on average over the past 2,000-7,000 years, most recently ~350 years ago. Numerical simulations of these tsunamis are best explained by earthquakes that individually rupture multiple segments along the southern Kuril trench. We infer that such multi-segment earthquakes persistently recur among a larger number of single-segment events.

  4. A potential large and persistent black carbon forcing over Northern Pacific inferred from satellite observations

    PubMed Central

    Li, Zhongshu; Liu, Junfeng; Mauzerall, Denise L.; Li, Xiaoyuan; Fan, Songmiao; Horowitz, Larry W.; He, Cenlin; Yi, Kan; Tao, Shu

    2017-01-01

    Black carbon (BC) aerosol strongly absorbs solar radiation, which warms climate. However, accurate estimation of BC’s climate effect is limited by the uncertainties of its spatiotemporal distribution, especially over remote oceanic areas. The HIAPER Pole-to-Pole Observation (HIPPO) program from 2009 to 2011 intercepted multiple snapshots of BC profiles over Pacific in various seasons, and revealed a 2 to 5 times overestimate of BC by current global models. In this study, we compared the measurements from aircraft campaigns and satellites, and found a robust association between BC concentrations and satellite-retrieved CO, tropospheric NO2, and aerosol optical depth (AOD) (R2 > 0.8). This establishes a basis to construct a satellite-based column BC approximation (sBC*) over remote oceans. The inferred sBC* shows that Asian outflows in spring bring much more BC aerosols to the mid-Pacific than those occurring in other seasons. In addition, inter-annual variability of sBC* is seen over the Northern Pacific, with abundances varying consistently with the springtime Pacific/North American (PNA) index. Our sBC* dataset infers a widespread overestimation of BC loadings and BC Direct Radiative Forcing by current models over North Pacific, which further suggests that large uncertainties exist on aerosol-climate interactions over other remote oceanic areas beyond Pacific. PMID:28266532

  5. Inference of higher-order relationships in the cycads from a large chloroplast data set.

    PubMed

    Rai, Hardeep S; O'Brien, Heath E; Reeves, Patrick A; Olmstead, Richard G; Graham, Sean W

    2003-11-01

    We investigated higher-order relationships in the cycads, an ancient group of seed-bearing plants, by examining a large portion of the chloroplast genome from seven species chosen to exemplify our current understanding of taxonomic diversity in the order. The regions considered span approximately 13.5 kb of unaligned data per taxon, and comprise a diverse range of coding sequences, introns and intergenic spacers dispersed throughout the plastid genome. Our results provide substantial support for most of the inferred backbone of cycad phylogeny, and weak evidence that the sister-group of the cycads among living seed plants is Ginkgo biloba. Cycas (representing Cycadaceae) is the sister-group of the remaining cycads; Dioon is part of the next most basal split. Two of the three commonly recognized families of cycads (Zamiaceae and Stangeriaceae) are not monophyletic; Stangeria is embedded within Zamiaceae, close to Zamia and Ceratozamia, and not closely allied to the other genus of Stangeriaceae, Bowenia. In contrast to the other seed plants, cycad chloroplast genomes share two features with Ginkgo: a reduced rate of evolution and an elevated transition:transversion ratio. We demonstrate that the latter aspect of their molecular evolution is unlikely to have affected inference of cycad relationships in the context of seed-plant wide analyses.

  6. A potential large and persistent black carbon forcing over Northern Pacific inferred from satellite observations.

    PubMed

    Li, Zhongshu; Liu, Junfeng; Mauzerall, Denise L; Li, Xiaoyuan; Fan, Songmiao; Horowitz, Larry W; He, Cenlin; Yi, Kan; Tao, Shu

    2017-03-07

    Black carbon (BC) aerosol strongly absorbs solar radiation, which warms climate. However, accurate estimation of BC's climate effect is limited by the uncertainties of its spatiotemporal distribution, especially over remote oceanic areas. The HIAPER Pole-to-Pole Observation (HIPPO) program from 2009 to 2011 intercepted multiple snapshots of BC profiles over Pacific in various seasons, and revealed a 2 to 5 times overestimate of BC by current global models. In this study, we compared the measurements from aircraft campaigns and satellites, and found a robust association between BC concentrations and satellite-retrieved CO, tropospheric NO2, and aerosol optical depth (AOD) (R(2) > 0.8). This establishes a basis to construct a satellite-based column BC approximation (sBC*) over remote oceans. The inferred sBC* shows that Asian outflows in spring bring much more BC aerosols to the mid-Pacific than those occurring in other seasons. In addition, inter-annual variability of sBC* is seen over the Northern Pacific, with abundances varying consistently with the springtime Pacific/North American (PNA) index. Our sBC* dataset infers a widespread overestimation of BC loadings and BC Direct Radiative Forcing by current models over North Pacific, which further suggests that large uncertainties exist on aerosol-climate interactions over other remote oceanic areas beyond Pacific.

  7. Model choice for phylogeographic inference using a large set of models.

    PubMed

    Pelletier, Tara A; Carstens, Bryan C

    2014-06-01

    Model-based analyses are common in phylogeographic inference because they parameterize processes such as population division, gene flow and expansion that are of interest to biologists. Approximate Bayesian computation is a model-based approach that can be customized to any empirical system and used to calculate the relative posterior probability of several models, provided that suitable models can be identified for comparison. The question of how to identify suitable models is explored using data from Plethodon idahoensis, a salamander that inhabits the North American inland northwest temperate rainforest. First, we conduct an ABC analysis using five models suggested by previous research, calculate the relative posterior probabilities and find that a simple model of population isolation has the best fit to the data (PP=0.70). In contrast to this subjective choice of models to include in the analysis, we also specify models in a more objective manner by simulating prior distributions for 143 models that included panmixia, population isolation, change in effective population size, migration and range expansion. We then identify a smaller subset of models for comparison by generating an expectation of the highest posterior probability that a false model is likely to achieve due to chance and calculate the relative posterior probabilities of only those models that exceed this expected level. A model that parameterized divergence with population expansion and gene flow in one direction offered the best fit to the P. idahoensis data (in contrast to an isolation-only model from the first analysis). Our investigation demonstrates that the determination of which models to include in ABC model choice experiments is a vital component of model-based phylogeographic analysis.

  8. Spatially explicit inference for open populations: estimating demographic parameters from camera-trap studies.

    PubMed

    Gardner, Beth; Reppucci, Juan; Lucherini, Mauro; Royle, J Andrew

    2010-11-01

    We develop a hierarchical capture-recapture model for demographically open populations when auxiliary spatial information about location of capture is obtained. Such spatial capture-recapture data arise from studies based on camera trapping, DNA sampling, and other situations in which a spatial array of devices records encounters of unique individuals. We integrate an individual-based formulation of a Jolly-Seber type model with recently developed spatially explicit capture-recapture models to estimate density and demographic parameters for survival and recruitment. We adopt a Bayesian framework for inference under this model using the method of data augmentation which is implemented in the software program WinBUGS. The model was motivated by a camera trapping study of Pampas cats Leopardus colocolo from Argentina, which we present as an illustration of the model in this paper. We provide estimates of density and the first quantitative assessment of vital rates for the Pampas cat in the High Andes. The precision of these estimates is poor due likely to the sparse data set. Unlike conventional inference methods which usually rely on asymptotic arguments, Bayesian inferences are valid in arbitrary sample sizes, and thus the method is ideal for the study of rare or endangered species for which small data sets are typical.

  9. A Population Genetics-Phylogenetics Approach to Inferring Natural Selection in Coding Sequences

    PubMed Central

    Wilson, Daniel J.; Hernandez, Ryan D.; Andolfatto, Peter; Przeworski, Molly

    2011-01-01

    Through an analysis of polymorphism within and divergence between species, we can hope to learn about the distribution of selective effects of mutations in the genome, changes in the fitness landscape that occur over time, and the location of sites involved in key adaptations that distinguish modern-day species. We introduce a novel method for the analysis of variation in selection pressures within and between species, spatially along the genome and temporally between lineages. We model codon evolution explicitly using a joint population genetics-phylogenetics approach that we developed for the construction of multiallelic models with mutation, selection, and drift. Our approach has the advantage of performing direct inference on coding sequences, inferring ancestral states probabilistically, utilizing allele frequency information, and generalizing to multiple species. We use a Bayesian sliding window model for intragenic variation in selection coefficients that efficiently combines information across sites and captures spatial clustering within the genome. To demonstrate the utility of the method, we infer selective pressures acting in Drosophila melanogaster and D. simulans from polymorphism and divergence data for 100 X-linked coding regions. PMID:22144911

  10. Spatially explicit inference for open populations: estimating demographic parameters from camera-trap studies

    USGS Publications Warehouse

    Gardner, Beth; Reppucci, Juan; Lucherini, Mauro; Royle, J. Andrew

    2010-01-01

    We develop a hierarchical capture–recapture model for demographically open populations when auxiliary spatial information about location of capture is obtained. Such spatial capture–recapture data arise from studies based on camera trapping, DNA sampling, and other situations in which a spatial array of devices records encounters of unique individuals. We integrate an individual-based formulation of a Jolly-Seber type model with recently developed spatially explicit capture–recapture models to estimate density and demographic parameters for survival and recruitment. We adopt a Bayesian framework for inference under this model using the method of data augmentation which is implemented in the software program WinBUGS. The model was motivated by a camera trapping study of Pampas cats Leopardus colocolo from Argentina, which we present as an illustration of the model in this paper. We provide estimates of density and the first quantitative assessment of vital rates for the Pampas cat in the High Andes. The precision of these estimates is poor due likely to the sparse data set. Unlike conventional inference methods which usually rely on asymptotic arguments, Bayesian inferences are valid in arbitrary sample sizes, and thus the method is ideal for the study of rare or endangered species for which small data sets are typical.

  11. Statistical inference for clinical trials with binary responses when there is a shift in patient population.

    PubMed

    Yang, Lan-Yan; Chi, Yunchan; Chow, Shein-Chung

    2011-05-01

    In clinical research, it is not uncommon to modify a trial procedure and/or statistical methods of ongoing clinical trials through protocol amendments. A major modification to the study protocol could result in a shift in target patient population. In addition, frequent and significant modifications could lead to a totally different study that is unable to address the medical questions that the original study intended to answer. In this article, we propose a logistic regression model for statistical inference based on a binary study endpoint for trials with protocol amendments. Under the proposed method, sample size adjustment is also derived.

  12. Multi-agent based control of large-scale complex systems employing distributed dynamic inference engine

    NASA Astrophysics Data System (ADS)

    Zhang, Daili

    Increasing societal demand for automation has led to considerable efforts to control large-scale complex systems, especially in the area of autonomous intelligent control methods. The control system of a large-scale complex system needs to satisfy four system level requirements: robustness, flexibility, reusability, and scalability. Corresponding to the four system level requirements, there arise four major challenges. First, it is difficult to get accurate and complete information. Second, the system may be physically highly distributed. Third, the system evolves very quickly. Fourth, emergent global behaviors of the system can be caused by small disturbances at the component level. The Multi-Agent Based Control (MABC) method as an implementation of distributed intelligent control has been the focus of research since the 1970s, in an effort to solve the above-mentioned problems in controlling large-scale complex systems. However, to the author's best knowledge, all MABC systems for large-scale complex systems with significant uncertainties are problem-specific and thus difficult to extend to other domains or larger systems. This situation is partly due to the control architecture of multiple agents being determined by agent to agent coupling and interaction mechanisms. Therefore, the research objective of this dissertation is to develop a comprehensive, generalized framework for the control system design of general large-scale complex systems with significant uncertainties, with the focus on distributed control architecture design and distributed inference engine design. A Hybrid Multi-Agent Based Control (HyMABC) architecture is proposed by combining hierarchical control architecture and module control architecture with logical replication rings. First, it decomposes a complex system hierarchically; second, it combines the components in the same level as a module, and then designs common interfaces for all of the components in the same module; third, replications

  13. Genetic diversity in India and the inference of Eurasian population expansion

    PubMed Central

    2010-01-01

    Background: Genetic studies of populations from the Indian subcontinent are of great interest because of India's large population size, complex demographic history, and unique social structure. Despite recent large-scale efforts in discovering human genetic variation, India's vast reservoir of genetic diversity remains largely unexplored. Results: To analyze an unbiased sample of genetic diversity in India and to investigate human migration history in Eurasia, we resequenced one 100-kb ENCODE region in 92 samples collected from three castes and one tribal group from the state of Andhra Pradesh in south India. Analyses of the four Indian populations, along with eight HapMap populations (692 samples), showed that 30% of all SNPs in the south Indian populations are not seen in HapMap populations. Several Indian populations, such as the Yadava, Mala/Madiga, and Irula, have nucleotide diversity levels as high as those of HapMap African populations. Using unbiased allele-frequency spectra, we investigated the expansion of human populations into Eurasia. The divergence time estimates among the major population groups suggest that Eurasian populations in this study diverged from Africans during the same time frame (approximately 90 to 110 thousand years ago). The divergence among different Eurasian populations occurred more than 40,000 years after their divergence with Africans. Conclusions: Our results show that Indian populations harbor large amounts of genetic variation that have not been surveyed adequately by public SNP discovery efforts. Our data also support a delayed expansion hypothesis in which an ancestral Eurasian founding population remained isolated long after the out-of-Africa diaspora, before expanding throughout Eurasia. PMID:21106085

  14. Afro-derived Amazonian populations: inferring continental ancestry and population substructure.

    PubMed

    Lopes Maciel, Luana Gomes; Ribeiro Rodrigues, Elzemar Martins; Carneiro Dos Santos, Ney Pereira; Ribeiro Dos Santos, Ândrea; Guerreiro, João Farias; Santos, Sidney

    2011-10-01

    A panel of Ancestry Informative Markers (AIMs) was used to identify population substructure and estimate individual and overall interethnic admixture in 294 individuals from seven African-derived communities of the Brazilian Amazon. A panel of 48 biallelic markers, representing the insertion (IN) or the deletion (DEL) of small DNA fragments, was employed for this purpose. Overall interethnic admixture estimates showed high miscegenation with other ethnic groups in all populations (between 46% and 64%). The proportion of ancestral genes varied significantly among individuals of the sample: the contribution of African genes varied between 12% and 75%; of European genes between 10% and 73%; and of Amerindian genes between 8% and 66%. The obtained data reveal a high contribution of Amerindian genes in these communities, unlike in other African-derived communities of the Northeast and the South of Brazil. In addition, the majority of the Amerindian contribution may result from the preferential inclusion of indigenous women in the African descent groups. High heterogeneity of the proportion of interethnic admixture among analyzed individuals was found when the proportion of ancestral genes of each individual of the sample was estimated. This heterogeneity is reflected in the fact that four populations can be considered as substructured and that the global African descent sample is possibly formed by two subpopulations.

  15. Consistent estimation of complete neuronal connectivity in large neuronal populations using sparse "shotgun" neuronal activity sampling.

    PubMed

    Mishchenko, Yuriy

    2016-10-01

    We investigate the properties of the recently proposed "shotgun" sampling approach for the common inputs problem in the functional estimation of neuronal connectivity. We study the asymptotic correctness, the speed of convergence, and the data size requirements of such an approach. We show that the shotgun approach can be expected to allow the inference of the complete connectivity matrix in large neuronal populations under some rather general conditions. However, we find that the posterior error of the shotgun connectivity estimator grows quickly with the size of unobserved neuronal populations, the square of average connectivity strength, and the square of observation sparseness. This implies that the shotgun connectivity estimation will require significantly larger amounts of neuronal activity data whenever the number of neurons in observed neuronal populations remains small. We present a numerical approach for solving the shotgun estimation problem in general settings and use it to demonstrate the shotgun connectivity inference in the examples of simulated synfire and weakly coupled cortical neuronal networks.

  16. Population structure and demographic inferences concerning the endangered onychophoran species Epiperipatus acacioi (Onychophora: Peripatidae).

    PubMed

    Lacorte, G A; Oliveira, I S; Fonseca, C G

    2011-11-09

    Epiperipatus acacioi (Onychophora: Peripatidae) is an endemic species of the Atlantic rainforest in southeastern Brazil, with a restricted known distribution, found only in two nearby areas (Tripuí and Itacolomi). Mitochondrial gene COI sequences of 93 specimens collected across the known range of E. acacioi were used to assess the extant genetic diversity and patterns of genetic structure, as well as to infer the demographic history of this species. We found considerable variability within the populations, even though there has been recent environmental disturbance in these habitats. The samples from the two areas where this species is found showed significantly different COI sequences and constitute two distinct populations [exact test of sample differentiation (P = 0.0008) and pairwise F(ST) analyses (F(ST) = 0.214, P < 0.00001)]. However, there was little genetic differentiation among samples from different sampling sites within populations, suggesting that the potential for dispersal of E. acacioi is greater than would have been expected, based on their cryptic behavior and reduced vagility. Mismatch analyses and neutrality tests revealed evidence of recent population expansion processes for both populations, possibly related to variations in the past distribution of this species.

  17. Population biology of establishment in New Zealand hedgehogs inferred from genetic and historical data: conflict or compromise?

    PubMed

    Bolfíková, Barbora; Konečný, Adam; Pfäffle, Miriam; Skuballa, Jasmin; Hulva, Pavel

    2013-07-01

    The crucial steps in biological invasions, related to the shaping of genetic architecture and the current evolution of adaptations to a novel environment, usually occur in small populations during the phases of introduction and establishment. However, these processes are difficult to track in nature due to invasion lag, large geographic and temporal scales compared with human observation capabilities, the frequent depletion of genetic variance, admixture and other phenomena. In this study, we compared genetic and historical evidence related to the invasion of the West European hedgehog to New Zealand to infer details about the introduction and establishment. Historical information indicates that the species was initially established on the South Island. A molecular assay of populations from Great Britain and New Zealand using mitochondrial sequences and nuclear microsatellite loci was performed based on a set of analyses including approximate Bayesian computation, a powerful approach for disentangling complex population demographies. According to these analyses, the population of the North Island was most similar to that of the native area and showed greatest reduction in genetic variation caused by founder demography and/or drift. This evidence indicated the location of the establishment phase. The hypothesis was corroborated by data on climate and urbanization. We discuss the contrasting results obtained by the molecular and historical approaches in the light of their different explanatory power and the possible biases influencing the description of particular aspects of invasions, and we advocate the integration of the two types of approaches in invasion biology.

  18. Inferred vs realized patterns of gene flow: an analysis of population structure in the Andros Island Rock Iguana.

    PubMed

    Colosimo, Giuliano; Knapp, Charles R; Wallace, Lisa E; Welch, Mark E

    2014-01-01

    Ecological data, the primary source of information on patterns and rates of migration, can be integrated with genetic data to more accurately describe the realized connectivity between geographically isolated demes. In this paper we implement this approach and discuss its implications for managing populations of the endangered Andros Island Rock Iguana, Cyclura cychlura cychlura. This iguana is endemic to Andros, a highly fragmented landmass of large islands and smaller cays. Field observations suggest that geographically isolated demes were panmictic due to high, inferred rates of gene flow. We expand on these observations using 16 polymorphic microsatellites to investigate the genetic structure and rates of gene flow from 188 Andros Iguanas collected across 23 island sites. Bayesian clustering of specimens assigned individuals to three distinct genotypic clusters. An analysis of molecular variance (AMOVA) indicates that allele frequency differences are responsible for a significant portion of the genetic variance across the three defined clusters (Fst =  0.117, p<0.01). These clusters are associated with larger islands and satellite cays isolated by broad water channels with strong currents. These findings imply that broad water channels present greater obstacles to gene flow than was inferred from field observation alone. Additionally, rates of gene flow were indirectly estimated using BAYESASS 3.0. The proportion of individuals originating from within each identified cluster varied from 94.5 to 98.7%, providing further support for local isolation. Our assessment reveals a major disparity between inferred and realized gene flow. We discuss our results in a conservation perspective for species inhabiting highly fragmented landscapes.

  19. Inferred vs Realized Patterns of Gene Flow: An Analysis of Population Structure in the Andros Island Rock Iguana

    PubMed Central

    Colosimo, Giuliano; Knapp, Charles R.; Wallace, Lisa E.; Welch, Mark E.

    2014-01-01

    Ecological data, the primary source of information on patterns and rates of migration, can be integrated with genetic data to more accurately describe the realized connectivity between geographically isolated demes. In this paper we implement this approach and discuss its implications for managing populations of the endangered Andros Island Rock Iguana, Cyclura cychlura cychlura. This iguana is endemic to Andros, a highly fragmented landmass of large islands and smaller cays. Field observations suggest that geographically isolated demes were panmictic due to high, inferred rates of gene flow. We expand on these observations using 16 polymorphic microsatellites to investigate the genetic structure and rates of gene flow from 188 Andros Iguanas collected across 23 island sites. Bayesian clustering of specimens assigned individuals to three distinct genotypic clusters. An analysis of molecular variance (AMOVA) indicates that allele frequency differences are responsible for a significant portion of the genetic variance across the three defined clusters (Fst = 0.117, p<0.01). These clusters are associated with larger islands and satellite cays isolated by broad water channels with strong currents. These findings imply that broad water channels present greater obstacles to gene flow than was inferred from field observation alone. Additionally, rates of gene flow were indirectly estimated using BAYESASS 3.0. The proportion of individuals originating from within each identified cluster varied from 94.5 to 98.7%, providing further support for local isolation. Our assessment reveals a major disparity between inferred and realized gene flow. We discuss our results in a conservation perspective for species inhabiting highly fragmented landscapes. PMID:25229344

  20. Boolean networks using the chi-square test for inferring large-scale gene regulatory networks.

    PubMed

    Kim, Haseong; Lee, Jae K; Park, Taesung

    2007-02-01

    Boolean network (BN) modeling is a commonly used method for constructing gene regulatory networks from time series microarray data. However, its major drawback is that its computation time is very high, often making it impractical for constructing large-scale gene networks. We propose a variable selection method that not only reduces BN computation times significantly but also obtains optimal network constructions by using chi-square statistics for testing the independence in contingency tables. Both the computation time and accuracy of the network structures estimated by the proposed method are compared with those of the original BN methods on simulated and real yeast cell cycle microarray gene expression data sets. Our results reveal that the proposed chi-square testing (CST)-based BN method significantly improves the computation time, while its ability to identify all the true network mechanisms was effectively the same as that of full-search BN methods. The proposed BN algorithm is approximately 70.8 and 7.6 times faster than the original BN algorithm when the error sizes of the Best-Fit Extension problem are 0 and 1, respectively. Further, the false positive error rate of the proposed CST-based BN algorithm tends to be less than that of the original BN. The CST-based BN method dramatically improves the computation time of the original BN algorithm. Therefore, it can efficiently infer large-scale gene regulatory network mechanisms.
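The independence test mentioned in this record can be illustrated with a minimal sketch of the Pearson chi-square statistic on a 2x2 contingency table of binarized gene states. This is not the authors' CST-based algorithm. The function name, the toy data, and the idea of screening a candidate regulator against a target with the df = 1 critical value of 3.841 (5% level) are assumptions made for illustration only.

```python
def chi_square_2x2(x, y):
    """Pearson chi-square statistic for two equal-length sequences of 0/1 states."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    # Build the 2x2 contingency table of observed co-occurrences.
    table = [[0, 0], [0, 0]]
    for a, b in zip(x, y):
        table[a][b] += 1
    row = [sum(table[0]), sum(table[1])]
    col = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            if expected == 0:
                # A constant variable carries no evidence of dependence.
                return 0.0
            stat += (table[i][j] - expected) ** 2 / expected
    return stat


# Hypothetical binarized expression states (illustration only).
regulator = [0, 0, 0, 0, 1, 1, 1, 1]
target = [0, 0, 0, 0, 1, 1, 1, 1]
CRITICAL_5PCT_DF1 = 3.841  # chi-square critical value, df = 1, alpha = 0.05
print(chi_square_2x2(regulator, target) > CRITICAL_5PCT_DF1)
```

In a screening step of this kind, candidate regulators whose statistic falls below the critical value could be dropped before the more expensive Boolean function search. How the published method applies the test may differ.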

  1. An incremental and distributed inference method for large-scale ontologies based on MapReduce paradigm.

    PubMed

    Liu, Bo; Huang, Keman; Li, Jianqiang; Zhou, MengChu

    2015-01-01

    With the upcoming data deluge of semantic data, the fast growth of ontology bases has brought significant challenges in performing efficient and scalable reasoning. Traditional centralized reasoning methods are not sufficient to process large ontologies. Distributed reasoning methods are thus required to improve the scalability and performance of inferences. This paper proposes an incremental and distributed inference method for large-scale ontologies by using MapReduce, which realizes high-performance reasoning and runtime searching, especially for incremental knowledge base. By constructing transfer inference forest and effective assertional triples, the storage is largely reduced and the reasoning process is simplified and accelerated. Finally, a prototype system is implemented on a Hadoop framework and the experimental results validate the usability and effectiveness of the proposed approach.

  2. Common garden comparisons of native and introduced plant populations: latitudinal clines can obscure evolutionary inferences

    PubMed Central

    Colautti, Robert I; Maron, John L; Barrett, Spencer C H

    2009-01-01

    Common garden studies are increasingly used to identify differences in phenotypic traits between native and introduced genotypes, often ignoring sources of among-population variation within each range. We re-analyzed data from 32 common garden studies of 28 plant species that tested for rapid evolution associated with biological invasion. Our goals were: (i) to identify patterns of phenotypic trait variation among populations within native and introduced ranges, and (ii) to explore the consequences of this variation for how differences between the ranges are interpreted. We combined life history and physiologic traits into a single principal component (PCALL) and also compared subsets of traits related to size, reproduction, and defense (PCSIZE, PCREP, and PCDEF, respectively). On average, introduced populations exhibited increased growth and reproduction compared to native conspecifics when latitude was not included in statistical models. However, significant correlations between PC-scores and latitude were detected in both the native and introduced ranges, indicating population differentiation along latitudinal gradients. When latitude was explicitly incorporated into statistical models as a covariate, it reduced the magnitude and reversed the direction of the effect for PCALL and PCSIZE. These results indicate that unrecognized geographic clines in phenotypic traits can confound inferences about the causes of evolutionary change in invasive plants. PMID:25567860

  3. Inferred Paternity and Male Reproductive Success in a Killer Whale (Orcinus orca) Population.

    PubMed

    Ford, Michael J; Hanson, M Bradley; Hempelmann, Jennifer A; Ayres, Katherine L; Emmons, Candice K; Schorr, Gregory S; Baird, Robin W; Balcomb, Kenneth C; Wasser, Samuel K; Parsons, Kim M; Balcomb-Bartok, Kelly

    2011-01-01

    We used data from 78 individuals at 26 microsatellite loci to infer parental and sibling relationships within a community of fish-eating ("resident") eastern North Pacific killer whales (Orcinus orca). Paternity analysis involving 15 mother/calf pairs and 8 potential fathers and whole-pedigree analysis of the entire sample produced consistent results. The variance in male reproductive success was greater than expected by chance and similar to that of other aquatic mammals. Although the number of confirmed paternities was small, reproductive success appeared to increase with male age and size. We found no evidence that males from outside this small population sired any of the sampled individuals. In contrast to previous results in a different population, many offspring were the result of matings within the same "pod" (long-term social group). Despite this pattern of breeding within social groups, we found no evidence of offspring produced by matings between close relatives, and the average internal relatedness of individuals was significantly less than expected if mating were random. The population's estimated effective size was <30 or about 1/3 of the current census size. Patterns of allele frequency variation were consistent with a population bottleneck.

  4. Fingerprint ridge density in the Argentinean population and its application to sex inference: A comparative study.

    PubMed

    Rivaldería, Noemí; Sánchez-Andrés, Ángeles; Alonso-Rodríguez, Concepción; Dipierri, José E; Gutiérrez-Redomero, Esperanza

    2016-02-01

    Fingerprint ridge density (RD) is known to vary according to sex and population, and such variation can be used for forensic purposes. The aim of this study was to analyze the fingerprint RD of two samples of the Argentinean population in order to assess their topological, digital, bilateral, sexual, and population differences for subsequent application in the inference of sex. Data were collected from the fingerprints of 172 individuals from the Buenos Aires province and 163 from the Chubut province. RD was assessed for three different count areas for all 10 fingers of each individual. In both sexes and both samples, significant differences among areas were obtained, so that radial-RD>ulnar-RD>proximal-RD. Females presented greater RD than males in all areas and on all fingers. Regarding population differences, no significant differences were found between the Buenos Aires and Chubut samples (except for proximal RD in males). However, both samples showed RD significantly different from that of the Jujuy province. The application of Bayes' theorem allowed for the identification of an RD threshold for discrimination of sexes in these Argentinean samples. In conclusion females consistently exhibit narrower epidermal ridges than males, which may evidence a universal pattern of sexual dimorphism in this trait that can be useful in forensics in the identification of individuals.

  5. Using large clinical data sets to infer pathogenicity for rare copy number variants in autism cohorts

    PubMed Central

    Moreno-De-Luca, D; Sanders, S J; Willsey, A J; Mulle, J G; Lowe, J K; Geschwind, D H; State, M W; Martin, C L; Ledbetter, D H

    2013-01-01

    Copy number variants (CNVs) have a major role in the etiology of autism spectrum disorders (ASD), and several of these have reached statistical significance in case–control analyses. Nevertheless, current ASD cohorts are not large enough to detect very rare CNVs that may be causative or contributory (that is, risk alleles). Here, we use a tiered approach, in which clinically significant CNVs are first identified in large clinical cohorts of neurodevelopmental disorders (including but not specific to ASD), after which these CNVs are then systematically identified within well-characterized ASD cohorts. We focused our initial analysis on 48 recurrent CNVs (segmental duplication-mediated 'hotspots') from 24 loci in 31 516 published clinical cases with neurodevelopmental disorders and 13 696 published controls, which yielded a total of 19 deletion CNVs and 11 duplication CNVs that reached statistical significance. We then investigated the overlap of these 30 CNVs in a combined sample of 3955 well-characterized ASD cases from three published studies. We identified 73 deleterious recurrent CNVs, including 36 deletions from 11 loci and 37 duplications from seven loci, for a frequency of 1 in 54; had we considered the ASD cohorts alone, only 58 CNVs from eight loci (24 deletions from three loci and 34 duplications from five loci) would have reached statistical significance. In conclusion, until there are sufficiently large ASD research cohorts with enough power to detect very rare causative or contributory CNVs, data from larger clinical cohorts can be used to infer the likely clinical significance of CNVs in ASD. PMID:23044707

  6. Subgroup mixable inference on treatment efficacy in mixture populations, with an application to time-to-event outcomes.

    PubMed

    Ding, Ying; Lin, Hui-Min; Hsu, Jason C

    2016-05-10

    In tailored drug development, the patient population is thought of as a mixture of two or more subgroups that may derive differential treatment efficacy. To find the right patient population for the drug to target, it is necessary to infer treatment efficacy in subgroups and combinations of subgroups. A fundamental consideration in this inference process is that the logical relationships between treatment efficacy in subgroups and their combinations should be respected (for otherwise the assessment of treatment efficacy may become paradoxical). We show that some commonly used efficacy measures are not suitable for a mixture population. We also show that the current practice of over-simply extending the least squares means concept when estimating the efficacy in a mixture population is inappropriate. Proposing a new principle called subgroup mixable estimation, we establish the logical relationships among parameters that represent efficacy and develop a simultaneous inference procedure to confidently infer efficacy in subgroups and their combinations. Using oncology studies with time-to-event outcomes as an example, we show that the hazard ratio is not suitable for measuring treatment efficacy in a mixture population and provide appropriate efficacy measures with a rigorous inference procedure. Copyright © 2015 John Wiley & Sons, Ltd.

  7. Velocity-Based Movement Modeling for Individual and Population Level Inference

    PubMed Central

    Hanks, Ephraim M.; Hooten, Mevin B.; Johnson, Devin S.; Sterling, Jeremy T.

    2011-01-01

    Understanding animal movement and resource selection provides important information about the ecology of the animal, but an animal's movement and behavior are not typically constant in time. We present a velocity-based approach for modeling animal movement in space and time that allows for temporal heterogeneity in an animal's response to the environment, allows for temporal irregularity in telemetry data, and accounts for the uncertainty in the location information. Population-level inference on movement patterns and resource selection can then be made through cluster analysis of the parameters related to movement and behavior. We illustrate this approach through a study of northern fur seal (Callorhinus ursinus) movement in the Bering Sea, Alaska, USA. Results show sex differentiation, with female northern fur seals exhibiting stronger response to environmental variables. PMID:21931584

  8. Velocity-based movement modeling for individual and population level inference

    USGS Publications Warehouse

    Hanks, Ephraim M.; Hooten, Mevin B.; Johnson, Devin S.; Sterling, Jeremy T.

    2011-01-01

    Understanding animal movement and resource selection provides important information about the ecology of the animal, but an animal's movement and behavior are not typically constant in time. We present a velocity-based approach for modeling animal movement in space and time that allows for temporal heterogeneity in an animal's response to the environment, allows for temporal irregularity in telemetry data, and accounts for the uncertainty in the location information. Population-level inference on movement patterns and resource selection can then be made through cluster analysis of the parameters related to movement and behavior. We illustrate this approach through a study of northern fur seal (Callorhinus ursinus) movement in the Bering Sea, Alaska, USA. Results show sex differentiation, with female northern fur seals exhibiting stronger response to environmental variables.

  9. Improved genome inference in the MHC using a population reference graph.

    PubMed

    Dilthey, Alexander; Cox, Charles; Iqbal, Zamin; Nelson, Matthew R; McVean, Gil

    2015-06-01

    Although much is known about human genetic variation, such information is typically ignored in assembling new genomes. Instead, reads are mapped to a single reference, which can lead to poor characterization of regions of high sequence or structural diversity. We introduce a population reference graph, which combines multiple reference sequences and catalogs of variation. The genomes of new samples are reconstructed as paths through the graph using an efficient hidden Markov model, allowing for recombination between different haplotypes and additional variants. By applying the method to the 4.5-Mb extended MHC region on human chromosome 6, combining 8 assembled haplotypes, the sequences of known classical HLA alleles and 87,640 SNP variants from the 1000 Genomes Project, we demonstrate using simulations, SNP genotyping, and short-read and long-read data how the method improves the accuracy of genome inference and identified regions where the current set of reference sequences is substantially incomplete.

  10. COSMOABC: Likelihood-free inference via Population Monte Carlo Approximate Bayesian Computation

    NASA Astrophysics Data System (ADS)

    Ishida, E. E. O.; Vitenti, S. D. P.; Penna-Lima, M.; Cisewski, J.; de Souza, R. S.; Trindade, A. M. M.; Cameron, E.; Busti, V. C.

    2015-11-01

    Approximate Bayesian Computation (ABC) enables parameter inference for complex physical systems in cases where the true likelihood function is unknown, unavailable, or computationally too expensive. It relies on the forward simulation of mock data and comparison between observed and synthetic catalogues. Here we present COSMOABC, a Python ABC sampler featuring a Population Monte Carlo variation of the original ABC algorithm, which uses an adaptive importance sampling scheme. The code is very flexible and can be easily coupled to an external simulator, while allowing to incorporate arbitrary distance and prior functions. As an example of practical application, we coupled COSMOABC with the NUMCOSMO library and demonstrate how it can be used to estimate posterior probability distributions over cosmological parameters based on measurements of galaxy clusters number counts without computing the likelihood function. COSMOABC is published under the GPLv3 license on PyPI and GitHub and documentation is available at http://goo.gl/SmB8EX.
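
    The forward-simulate-and-compare loop described above can be sketched as follows. This is plain rejection ABC, a simpler stand-in for the Population Monte Carlo scheme with adaptive importance sampling that COSMOABC implements, and it does not use the COSMOABC API. The Gaussian toy model, the uniform prior, the summary-statistic distance, and the tolerance are all assumptions chosen for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def abc_rejection(observed, simulate, prior_sample, distance, eps, n_draws, rng):
    """Keep prior draws whose simulated mock data lie within eps of the observed data."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)          # draw a parameter from the prior
        mock = simulate(theta, rng)        # forward-simulate a mock data set
        if distance(observed, mock) <= eps:
            accepted.append(theta)         # no likelihood evaluation is needed
    return accepted


rng = random.Random(0)

# Toy "observed catalogue": 200 draws from a unit-variance Gaussian (assumed model).
TRUE_MU = 2.0
observed = [rng.gauss(TRUE_MU, 1.0) for _ in range(200)]

posterior_draws = abc_rejection(
    observed,
    simulate=lambda mu, r: [r.gauss(mu, 1.0) for _ in range(200)],
    prior_sample=lambda r: r.uniform(-5.0, 5.0),
    distance=lambda a, b: abs(mean(a) - mean(b)),  # compare sample means only
    eps=0.1,
    n_draws=2000,
    rng=rng,
)
```

    The accepted draws approximate the posterior over the mean. A fixed tolerance with a flat prior wastes most simulations, which is the inefficiency that population-based schemes with a shrinking tolerance are designed to reduce.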

  11. Bayesian inference on the effect of density dependence and weather on a guanaco population from Chile.

    PubMed

    Zubillaga, María; Skewes, Oscar; Soto, Nicolás; Rabinovich, Jorge E; Colchero, Fernando

    2014-01-01

    Understanding the mechanisms that drive population dynamics is fundamental for management of wild populations. The guanaco (Lama guanicoe) is one of two wild camelid species in South America. We evaluated the effects of density dependence and weather variables on population regulation based on a time series of 36 years of population sampling of guanacos in Tierra del Fuego, Chile. The population density varied between 2.7 and 30.7 guanaco/km², with an apparent monotonic growth during the first 25 years; however, in the last 10 years the population has shown large fluctuations, suggesting that it might have reached its carrying capacity. We used a Bayesian state-space framework and model selection to determine the effect of density and environmental variables on guanaco population dynamics. Our results show that the population is under density dependent regulation and that it is currently fluctuating around an average carrying capacity of 45,000 guanacos. We also found a significant positive effect of previous winter temperature while sheep density has a strong negative effect on the guanaco population growth. We conclude that there are significant density dependent processes and that climate as well as competition with domestic species have important effects determining the population size of guanacos, with important implications for management and conservation.

  12. Bayesian Inference on the Effect of Density Dependence and Weather on a Guanaco Population from Chile

    PubMed Central

    Zubillaga, María; Skewes, Oscar; Soto, Nicolás; Rabinovich, Jorge E.; Colchero, Fernando

    2014-01-01

    Understanding the mechanisms that drive population dynamics is fundamental for management of wild populations. The guanaco (Lama guanicoe) is one of two wild camelid species in South America. We evaluated the effects of density dependence and weather variables on population regulation based on a time series of 36 years of population sampling of guanacos in Tierra del Fuego, Chile. The population density varied between 2.7 and 30.7 guanaco/km², with an apparent monotonic growth during the first 25 years; however, in the last 10 years the population has shown large fluctuations, suggesting that it might have reached its carrying capacity. We used a Bayesian state-space framework and model selection to determine the effect of density and environmental variables on guanaco population dynamics. Our results show that the population is under density dependent regulation and that it is currently fluctuating around an average carrying capacity of 45,000 guanacos. We also found a significant positive effect of previous winter temperature while sheep density has a strong negative effect on the guanaco population growth. We conclude that there are significant density dependent processes and that climate as well as competition with domestic species have important effects determining the population size of guanacos, with important implications for management and conservation. PMID:25514510

  13. Genetic inferences about the population dynamics of codling moth females at a local scale.

    PubMed

    Franck, P; Ricci, B; Klein, E K; Olivares, J; Simon, S; Cornuet, J-M; Lavigne, C

    2011-07-01

    Estimation of demographic parameters is important for understanding the functioning of natural populations and the underlying ecological and evolutionary processes that may impact their dynamics. Here, we used sibship assignment methods to shed light on the local dynamics of codling moth females in eight orchards in a 90-ha domain near Valence, France. Based on full-sib inference among 1,063 genotyped moths, we estimated (1) the effective number of females that had offspring, (2) their fertility and (3) the distribution of their oviposition sites within and among orchards. The average number of females in all the orchards increased between the first (~130) and the second (~235) annual generations. The average fertilities of the females were similar at each generation according to the host plant considered (apple, pear, or walnut), but differed between commercial (~10) and non-treated (~25) apple orchards. Females mainly clustered their eggs on contiguous trees along orchard borders, but they also occasionally dispersed their eggs among different orchards independently of the cultivated host plants or the inter-orchard distances (up to 698 m) during the second annual generation. The mean distance between two oviposition sites was 30 m. Sibship estimates of both the effective number of females and the inter-orchard migration rates (~5%) were in agreement with the observed genetic differentiation among the eight orchards (0.006 < Fst < 0.013). These results confirm and extend previous field and laboratory observations in Cydia pomonella, and they demonstrate that sibship assignments based on genetic data are an interesting alternative to mark-release-recapture methods for inferring insect population dynamics.

  14. Surfing among species, populations and morphotypes: Inferring boundaries between two species of new world silversides (Atherinopsidae).

    PubMed

    González-Castro, Mariano; Rosso, Juan José; Mabragaña, Ezequiel; Díaz de Astarloa, Juan Martín

    2016-01-01

    Atherinopsidae are widespread freshwater and shallow marine fish with singular economic importance. Morphological, genetic and life-cycle differences between marine and estuarine populations have already been reported in this family, suggesting ongoing speciation. Coexistence and interbreeding between closely related species have also been documented. The aim of this study was to infer boundaries among: (A) Odontesthes bonariensis and O. argentinensis at species level, and intermediate morphs; (B) the population of O. argentinensis of Mar Chiquita Lagoon and its marine conspecifics. To achieve this, we integrated meristic, geometric morphometric and DNA barcode approaches. Four groups were discriminated and subsequently characterized according to their morphological traits, shape and meristic characters. No shared haplotypes between O. bonariensis and O. argentinensis were found. Significant meristic and body shape differences between the Mar Chiquita and marine individuals of O. argentinensis were found, suggesting they behave as well differentiated populations, or even incipient ecological species. The fact that the Odontesthes morphotypes shared haplotypes with both O. argentinensis and O. bonariensis, but also possess distinctive meristic and morphometric traits, opens new questions related to the origin of this morphogroup.

  15. A large underestimate of the pyrogenic source of formic acid inferred from space-borne measurements.

    NASA Astrophysics Data System (ADS)

    Chaliyakunnel, S.; Millet, D. B.; Wells, K. C.; Cady-Pereira, K.; Shephard, M.

    2015-12-01

    Formic acid (HCOOH) is one of the most abundant carboxylic acids in the atmosphere, and a dominant source of acidity in the global troposphere. Recent work has revealed a major gap in our present understanding of the atmospheric formic acid budget, with observed concentrations much larger than can be reconciled with current estimates of its sources. In this work, we employ new space-based observations from the Tropospheric Emission Spectrometer (TES) satellite instrument with the GEOS-Chem chemical transport model to better quantify the source of atmospheric formic acid from biomass burning, and assess the degree to which this source can help close the large budget gap for this species. The space-based formic acid data reveal a severe model underestimate for HCOOH that is most prominent over tropical biomass burning regions, indicating a major missing source of organic acids from fires. Based on two independent methods for inferring the fractional contribution of fires to the measured HCOOH abundance, we find that the pyrogenic HCOOH:CO enhancement ratio measured by TES (including direct emissions plus secondary production) is 5-10 times higher than current estimates of the direct emission ratio, providing evidence of substantial secondary production of HCOOH in fire plumes. We further show that current models significantly underestimate (by a factor of 2-6) the total primary and secondary source of HCOOH from tropical fires.

  16. Long-time analytic approximation of large stochastic oscillators: Simulation, analysis and inference

    PubMed Central

    2017-01-01

    In order to analyse large complex stochastic dynamical models such as those studied in systems biology there is currently a great need for both analytical tools and also algorithms for accurate and fast simulation and estimation. We present a new stochastic approximation of biological oscillators that addresses these needs. Our method, called phase-corrected LNA (pcLNA), overcomes the main limitations of the standard Linear Noise Approximation (LNA) to remain uniformly accurate for long times, while still maintaining the speed and analytical tractability of the LNA. As part of this, we develop analytical expressions for key probability distributions and associated quantities, such as the Fisher Information Matrix and Kullback-Leibler divergence, and we introduce a new approach to system-global sensitivity analysis. We also present algorithms for statistical inference and for long-term simulation of oscillating systems that are shown to be as accurate as, but much faster than, leaping algorithms and algorithms for integration of diffusion equations. Stochastic versions of published models of the circadian clock and NF-κB system are used to illustrate our results. PMID:28742083

  17. Long-time analytic approximation of large stochastic oscillators: Simulation, analysis and inference.

    PubMed

    Minas, Giorgos; Rand, David A

    2017-07-01

    In order to analyse large complex stochastic dynamical models such as those studied in systems biology there is currently a great need for both analytical tools and also algorithms for accurate and fast simulation and estimation. We present a new stochastic approximation of biological oscillators that addresses these needs. Our method, called phase-corrected LNA (pcLNA), overcomes the main limitations of the standard Linear Noise Approximation (LNA) to remain uniformly accurate for long times, while still maintaining the speed and analytical tractability of the LNA. As part of this, we develop analytical expressions for key probability distributions and associated quantities, such as the Fisher Information Matrix and Kullback-Leibler divergence, and we introduce a new approach to system-global sensitivity analysis. We also present algorithms for statistical inference and for long-term simulation of oscillating systems that are shown to be as accurate as, but much faster than, leaping algorithms and algorithms for integration of diffusion equations. Stochastic versions of published models of the circadian clock and NF-κB system are used to illustrate our results.

  18. High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs

    PubMed Central

    Dilthey, Alexander T.; Gourraud, Pierre-Antoine; McVean, Gil

    2016-01-01

    Genetic variation at the Human Leucocyte Antigen (HLA) genes is associated with many autoimmune and infectious disease phenotypes, is an important element of the immunological distinction between self and non-self, and shapes immune epitope repertoires. Determining the allelic state of the HLA genes (HLA typing) as a by-product of standard whole-genome sequencing data would therefore be highly desirable and enable the immunogenetic characterization of samples in currently ongoing population sequencing projects. Extensive hyperpolymorphism and sequence similarity between the HLA genes, however, pose problems for accurate read mapping and make HLA type inference from whole-genome sequencing data a challenging problem. We describe how to address these challenges in a Population Reference Graph (PRG) framework. First, we construct a PRG for 46 (mostly HLA) genes and pseudogenes, their genomic context and their characterized sequence variants, integrating a database of over 10,000 known allele sequences. Second, we present a sequence-to-PRG paired-end read mapping algorithm that enables accurate read mapping for the HLA genes. Third, we infer the most likely pair of underlying alleles at G group resolution from the IMGT/HLA database at each locus, employing a simple likelihood framework. We show that HLA*PRG, our algorithm, outperforms existing methods by a wide margin. We evaluate HLA*PRG on six classical class I and class II HLA genes (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1) and on a set of 14 samples (3 samples with 2 x 100bp, 11 samples with 2 x 250bp Illumina HiSeq data). Of 158 alleles tested, we correctly infer 157 alleles (99.4%). We also identify and re-type two erroneous alleles in the original validation data. We conclude that HLA*PRG for the first time achieves accuracies comparable to gold-standard reference methods from standard whole-genome sequencing data, though high computational demands (currently ~30–250 CPU hours per sample) remain a significant

  19. High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs.

    PubMed

    Dilthey, Alexander T; Gourraud, Pierre-Antoine; Mentzer, Alexander J; Cereb, Nezih; Iqbal, Zamin; McVean, Gil

    2016-10-01

    Genetic variation at the Human Leucocyte Antigen (HLA) genes is associated with many autoimmune and infectious disease phenotypes, is an important element of the immunological distinction between self and non-self, and shapes immune epitope repertoires. Determining the allelic state of the HLA genes (HLA typing) as a by-product of standard whole-genome sequencing data would therefore be highly desirable and enable the immunogenetic characterization of samples in currently ongoing population sequencing projects. Extensive hyperpolymorphism and sequence similarity between the HLA genes, however, pose problems for accurate read mapping and make HLA type inference from whole-genome sequencing data a challenging problem. We describe how to address these challenges in a Population Reference Graph (PRG) framework. First, we construct a PRG for 46 (mostly HLA) genes and pseudogenes, their genomic context and their characterized sequence variants, integrating a database of over 10,000 known allele sequences. Second, we present a sequence-to-PRG paired-end read mapping algorithm that enables accurate read mapping for the HLA genes. Third, we infer the most likely pair of underlying alleles at G group resolution from the IMGT/HLA database at each locus, employing a simple likelihood framework. We show that HLA*PRG, our algorithm, outperforms existing methods by a wide margin. We evaluate HLA*PRG on six classical class I and class II HLA genes (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1) and on a set of 14 samples (3 samples with 2 x 100bp, 11 samples with 2 x 250bp Illumina HiSeq data). Of 158 alleles tested, we correctly infer 157 alleles (99.4%). We also identify and re-type two erroneous alleles in the original validation data. We conclude that HLA*PRG for the first time achieves accuracies comparable to gold-standard reference methods from standard whole-genome sequencing data, though high computational demands (currently ~30-250 CPU hours per sample) remain a significant

  20. Statistical inference on genetic data reveals the complex demographic history of human populations in central Asia.

    PubMed

    Palstra, Friso P; Heyer, Evelyne; Austerlitz, Frédéric

    2015-06-01

    The demographic history of modern humans constitutes a combination of expansions, colonizations, contractions, and remigrations. The advent of large scale genetic data combined with statistically refined methods facilitates inference of this complex history. Here we study the demographic history of two genetically admixed ethnic groups in Central Asia, an area characterized by high levels of genetic diversity and a history of recurrent immigration. Using Approximate Bayesian Computation, we infer that the timing of admixture markedly differs between the two groups. Admixture in the traditionally agricultural Tajiks could be dated back to the onset of the Neolithic transition in the region, whereas admixture in Kyrgyz is more recent, and may have involved the westward movement of Turkic peoples. These results are confirmed by a coalescent method that fits an isolation-with-migration model to the genetic data, with both Central Asian groups having received gene flow from the extremities of Eurasia. Interestingly, our analyses also uncover signatures of gene flow from Eastern to Western Eurasia during Paleolithic times. In conclusion, the high genetic diversity currently observed in these two Central Asian peoples most likely reflects the effects of recurrent immigration that likely started before historical times. Conversely, conquests during historical times may have had a relatively limited genetic impact. These results emphasize the need for a better understanding of the genetic consequences of transmission of culture and technological innovations, as well as those of invasions and conquests.

  1. Alternative Model-Based and Design-Based Frameworks for Inference From Samples to Populations: From Polarization to Integration

    PubMed Central

    Sterba, Sonya K.

    2010-01-01

    A model-based framework, due originally to R. A. Fisher, and a design-based framework, due originally to J. Neyman, offer alternative mechanisms for inference from samples to populations. We show how these frameworks can utilize different types of samples (nonrandom or random vs. only random) and allow different kinds of inference (descriptive vs. analytic) to different kinds of populations (finite vs. infinite). We describe the extent of each framework's implementation in observational psychology research. After clarifying some important limitations of each framework, we describe how these limitations are overcome by a newer hybrid model/design-based inferential framework. This hybrid framework allows both kinds of inference to both kinds of populations, given a random sample. We illustrate implementation of the hybrid framework using the High School and Beyond data set. PMID:20411042

  2. The first large population based twin study of coeliac disease

    PubMed Central

    Greco, L; Romino, R; Coto, I; Di Cosmo, N; Percopo, S; Maglio, M; Paparo, F; Gasperi, V; Limongelli, M G; Cotichini, R; D'Agate, C; Tinto, N; Sacchetti, L; Tosi, R; Stazi, M A

    2002-01-01

    Background and aims: The genetic load in coeliac disease has hitherto been inferred from case series or anecdotally referred twin pairs. We have evaluated the genetic component in coeliac disease by estimating the concordance rate for the disease among twin pairs in a large population based study. Methods: The Italian Twin Registry was matched with the membership lists of a patient support group. Forty seven twin pairs were recruited and screened for antiendomysial (EMA) and antihuman-tissue transglutaminase (anti-tTG) antibodies; zygosity was verified by DNA fingerprinting and twins were typed for HLA class II DRB1 and DQB1 molecules. Results: Concordance rates for coeliac disease differ significantly between monozygotic (MZ) (0.86 probandwise and 0.75 pairwise) and dizygotic (DZ) (0.20 probandwise and 0.11 pairwise) twins. This is the highest concordance so far reported for a multifactorial disease. A logistic regression model, adjusted for age, sex, number of shared HLA haplotypes, and zygosity, showed that genotypes DQA1*0501/DQB1*0201 and DQA1*0301/DQB1*0302 (encoding for heterodimers DQ2 and DQ8, respectively) conferred to the non-index twin a risk of contracting the disease of 3.3 and 1.4, respectively. The risk of being concordant for coeliac disease estimated for the non-index twin of MZ pairs was 17 (95% confidence interval 2.1–134), independent of the DQ at risk genotype. Conclusion: This study provides substantial evidence for a very strong genetic component in coeliac disease, which is only partially due to the HLA region. PMID:11950806
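
    The abstract above reports both probandwise and pairwise concordance rates without defining them. As a minimal sketch, assuming complete ascertainment (every affected twin counts as a proband, so each concordant pair contributes two probands), the two measures can be computed as follows. The function names and the pair counts are hypothetical and are not taken from the study.

```python
def pairwise_concordance(concordant_pairs, discordant_pairs):
    """Share of ascertained twin pairs in which both twins are affected."""
    return concordant_pairs / (concordant_pairs + discordant_pairs)


def probandwise_concordance(concordant_pairs, discordant_pairs):
    """Share of probands whose co-twin is also affected.

    Assumption: complete ascertainment, so each concordant pair
    contributes two probands and each discordant pair contributes one.
    """
    return 2 * concordant_pairs / (2 * concordant_pairs + discordant_pairs)


def probandwise_from_pairwise(pairwise):
    """Convert a pairwise rate p into the probandwise rate 2p / (1 + p)."""
    return 2 * pairwise / (1 + pairwise)


# Hypothetical counts chosen for illustration; they are NOT the study's data.
mz_pairwise = pairwise_concordance(3, 1)        # 3 of 4 pairs -> 0.75
mz_probandwise = probandwise_concordance(3, 1)  # 6 of 7 probands, about 0.86
```

    Under this assumption, the conversion 2p / (1 + p) maps the reported pairwise rates of 0.75 and 0.11 to about 0.86 and 0.20, which agrees with the probandwise figures quoted in the abstract.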

  3. Inferring the population structure of Myzus persicae in diverse agroecosystems using microsatellite markers.

    PubMed

    Sanchez, Juan Antonio; La-Spina, Michelangelo; Guirao, Pedro; Cánovas, Fernando

    2013-08-01

    Diverse agroecosystems offer phytophagous insects a wide choice of host plants. Myzus persicae is a polyphagous aphid common in moderate climates. During its life cycle it alternates between primary and secondary hosts. A spatial genetic population structure may arise due to environmental factors and reproduction modes. The aim of this work was to determine the spatial and temporal genetic population structure of M. persicae in relation to host plants and climatic conditions. For this, 923 individuals of M. persicae collected from six plant families between 2005 and 2008 in south-eastern Spain were genotyped for eight microsatellite loci. The population structure was inferred by neighbour-joining, analysis of molecular variance (AMOVA) and Bayesian analyses. Moderate polymorphism was observed for the eight loci in almost all the samples. No differences in the number of alleles were observed between primary and secondary hosts or between geographical areas. The proportion of unique genotypes found in the primary host was similar in the north (0.961 ± 0.036) and the south (0.987 ± 0.013), while in the secondary host it was higher in the north (0.801 ± 0.159) than in the south (0.318 ± 0.063). Heterozygosity excess and linkage disequilibrium suggest a high representation of obligate parthenogens in areas with warmer climate and in the secondary hosts. The FST values pointed to no genetic differentiation of M. persicae on the different plant families. FST values, AMOVA and Bayesian model-based cluster analyses pointed to a significant population structure that was related to primary and secondary hosts. Differences between primary and secondary hosts could be due to the overrepresentation of parthenogens on herbaceous plants.

  4. Misspecified poisson regression models for large-scale registry data: inference for 'large n and small p'.

    PubMed

    Grøn, Randi; Gerds, Thomas A; Andersen, Per K

    2016-03-30

    Poisson regression is an important tool in register-based epidemiology where it is used to study the association between exposure variables and event rates. In this paper, we will discuss the situation with 'large n and small p', where n is the sample size and p is the number of available covariates. Specifically, we are concerned with modeling options when there are time-varying covariates that can have time-varying effects. One problem is that tests of the proportional hazards assumption, of no interactions between exposure and other observed variables, or of other modeling assumptions have large power due to the large sample size and will often indicate statistical significance even for numerically small deviations that are unimportant for the subject matter. Another problem is that information on important confounders may be unavailable. In practice, this situation may lead to simple working models that are then likely misspecified. To support and improve conclusions drawn from such models, we discuss methods for sensitivity analysis, for estimation of average exposure effects using aggregated data, and a semi-parametric bootstrap method to obtain robust standard errors. The methods are illustrated using data from the Danish national registries investigating the diabetes incidence for individuals treated with antipsychotics compared with the general unexposed population.

  5. Pronounced population genetic differentiation in the rock bream Oplegnathus fasciatus inferred from mitochondrial DNA sequences.

    PubMed

    Xiao, Yongshuang; Li, Jun; Ren, Guijing; Ma, Daoyuan; Wang, Yanfeng; Xiao, ZhiZhong; Xu, Shihong

    2016-05-01

    The population genetic structure of the rock bream (Oplegnathus fasciatus) along the coastal waters of China was estimated based on three mtDNA fragments (D-loop, COI, and Cytb). A total of 112 polymorphic sites were checked, which defined 63 haplotypes. A pattern with high levels of haplotype diversity (hCOI = 0.886 ± 0.034, hCytb = 0.874 ± 0.023) and low levels of nucleotide diversity (πCOI = 0.009 ± 0.005, πCytb = 0.006 ± 0.003) was detected based on the COI and Cytb fragments, and high levels of genetic diversity (hD-loop = 0.995 ± 0.007, πD-loop = 0.021 ± 0.011) were detected from the mtDNA D-loop. The population genetic diversity of O. fasciatus in south China was significantly higher than those of north China. Three genealogical clades were checked in the O. fasciatus populations based on the NJ and MST analyses of mtDNA COI gene sequence, and the genetic distances among the clades ranged from 0.018 to 0.025. Significant population genetic differentiation was also checked based on the Fst (0.331, p = 0.000) and exact p (0.000) test analyses. No significant population differentiations were checked based on mtDNA D-loop and Cytb fragments. Using a variety of phylogenetic methods, coalescent reasoning, and molecular dating interpreted in conjunction with paleoclimatic and physiographic evidences, we inferred that the genetic make-up of extant populations of O. fasciatus was shaped by Pleistocene environmental impacts on the historical demography of this species. Coalescent analyses (neutrality tests, mismatch distribution analysis, and Bayesian skyline analyses) showed that the species along coastline of China has experienced population expansions originated in its most recent history at about 169-175 kya before present.
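
    The haplotype diversity values (h) quoted above are not defined in the abstract. A minimal sketch, assuming the commonly used Nei estimator h = n/(n - 1) * (1 - sum of squared haplotype frequencies), is given below. The abstract does not state which estimator or software the authors used, and the haplotype labels are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype (gene) diversity for a sample of haplotype labels."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two sampled sequences are required")
    # Sum of squared sample frequencies of each distinct haplotype.
    sum_sq = sum((count / n) ** 2 for count in Counter(haplotypes).values())
    # The n / (n - 1) factor corrects for the finite sample size.
    return n / (n - 1) * (1 - sum_sq)


# Hypothetical sample of four sequences carrying three distinct haplotypes.
sample = ["H1", "H1", "H2", "H3"]
h = haplotype_diversity(sample)  # 4/3 * (1 - 0.375), i.e. 5/6
```

    Values near 1, such as those reported for the D-loop, indicate that almost every sampled sequence carries a different haplotype.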

  6. Genetic variation among world populations: inferences from 100 Alu insertion polymorphisms.

    PubMed

    Watkins, W Scott; Rogers, Alan R; Ostler, Christopher T; Wooding, Steve; Bamshad, Michael J; Brassington, Anna-Marie E; Carroll, Marion L; Nguyen, Son V; Walker, Jerilyn A; Prasad, B V Ravi; Reddy, P Govinda; Das, Pradipta K; Batzer, Mark A; Jorde, Lynn B

    2003-07-01

    We examine the distribution and structure of human genetic diversity for 710 individuals representing 31 populations from Africa, East Asia, Europe, and India using 100 Alu insertion polymorphisms from all 22 autosomes. Alu diversity is highest in Africans (0.349) and lowest in Europeans (0.297). Alu insertion frequency is lowest in Africans (0.463) and higher in Indians (0.544), E. Asians (0.557), and Europeans (0.559). Large genetic distances are observed among African populations and between African and non-African populations. The root of a neighbor-joining network is located closest to the African populations. These findings are consistent with an African origin of modern humans and with a bottleneck effect in the human populations that left Africa to colonize the rest of the world. Genetic distances among all pairs of populations show a significant product-moment correlation with geographic distances (r = 0.69, P < 0.00001). F(ST), the proportion of genetic diversity attributable to population subdivision is 0.141 for Africans/E. Asians/Europeans, 0.047 for E. Asians/Indians/Europeans, and 0.090 for all 31 populations. Resampling analyses show that approximately 50 Alu polymorphisms are sufficient to obtain accurate and reliable genetic distance estimates. These analyses also demonstrate that markers with higher F(ST) values have greater resolving power and produce more consistent genetic distance estimates.
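
    The abstract describes F(ST) as the proportion of genetic diversity attributable to population subdivision. One simple way to make that definition concrete for a single biallelic marker is (H_T - H_S) / H_T, where H_S is the mean within-population expected heterozygosity and H_T is the expected heterozygosity of the pooled population. The sketch below assumes equally weighted populations and hypothetical insertion frequencies; it is not necessarily the estimator used by the authors.

```python
def fst_biallelic(frequencies):
    """F_ST = (H_T - H_S) / H_T for one biallelic locus, equal population weights."""
    if not frequencies:
        raise ValueError("at least one population frequency is required")
    k = len(frequencies)
    # Mean expected heterozygosity within populations.
    h_s = sum(2 * p * (1 - p) for p in frequencies) / k
    # Expected heterozygosity of the pooled population.
    p_bar = sum(frequencies) / k
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # locus is fixed everywhere, so there is no diversity to partition
    return (h_t - h_s) / h_t


# Hypothetical insertion frequencies in two populations (not from the study).
fst = fst_biallelic([0.2, 0.8])  # H_S = 0.32, H_T = 0.5
```

    With identical frequencies in every population the function returns 0, and the value grows as the populations diverge.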

  7. Population subdivision in Europe's great bustard inferred from mitochondrial and nuclear DNA sequence variation.

    PubMed

    Pitra, C; Lieckfeldt, D; Alonso, J C

    2000-08-01

    A continent-wide survey of sequence variation in mitochondrial (mt) and nuclear (n) DNA of the endangered great bustard (Otis tarda) was conducted to assess the extent of phylogeographic structure in a morphologically monotypic bird. DNA sequence variation in a combined 809 bp segment of the mtDNA genome from 66 individuals from the last six breeding regions showed relatively low levels of intraspecific sequence diversity (π = 0.32%) but significant differences in the regional distribution of 11 haplotypes (phiST = 0.49). Despite their exceptional potential for dispersal, a complete and long-term historical separation between the populations from the Iberian Peninsula (Spain) and mainland Europe (Hungary, Slovakia, Germany, and Russia) was demonstrated. Divergence between populations based on a 3-bp insertion-deletion polymorphism within the intron region of the nuclear CHD-Z gene was geographically concordant with the primary subdivision identified within the mtDNA sequences. Inferred aspects of phylogeography were used to formulate conservation recommendations for this endangered species.

  8. Controller certification: The generalized stability margin inference for a large number of MIMO controllers

    NASA Astrophysics Data System (ADS)

    Park, Jisang

    In this dissertation, we investigate MIMO stability margin inference of a large number of controllers using pre-established stability margins of a small number of nu-gap-wise adjacent controllers. The generalized stability margin and the nu-gap metric are inherently able to handle MIMO system analysis without the necessity of repeating multiple channel-by-channel SISO analyses. This research consists of three parts: (i) development of a decision support tool for inference of the stability margin, (ii) computational considerations for yielding the maximal stability margin with the minimal nu-gap metric in a less conservative manner, and (iii) experiment design for estimating the generalized stability margin with an assured error bound. A modern problem from aerospace control involves the certification of a large set of potential controllers with either a single plant or a fleet of potential plant systems, with both plants and controllers being MIMO and, for the moment, linear. Experiments on a limited number of controller/plant pairs should establish the stability and a certain level of margin of the complete set. We consider this certification problem for a set of controllers and provide algorithms for selecting an efficient subset for testing. This is done for a finite set of candidate controllers and, at least for SISO plants, for an infinite set. In doing this, the nu-gap metric will be the main tool. We provide a theorem restricting a radius of a ball in the parameter space so that the controller can guarantee a prescribed level of stability and performance if parameters of the controllers are contained in the ball. Computational examples are given, including one of certification of an aircraft engine controller. The overarching aim is to introduce truly MIMO margin calculations and to understand their efficacy in certifying stability over a set of controllers and in replacing legacy single-loop gain and phase margin calculations. We consider methods for the

  9. Quantitative inference of population response properties across eccentricity from motion-induced maps in macaque V1

    PubMed Central

    Chen, Ming; Wu, Si; Lu, Haidong D.; Roe, Anna W.

    2013-01-01

    Interpreting population responses in the primary visual cortex (V1) remains a challenge, especially with the advent of techniques measuring activations of large cortical areas simultaneously with high precision. For successful interpretation, a quantitatively precise model prediction is of great importance. In this study, we investigate how accurately a spatiotemporal filter (STF) model predicts average response profiles to coherently drifting random dot motion obtained by optical imaging of intrinsic signals in V1 of anesthetized macaques. We establish that orientation difference maps, obtained by subtracting orthogonal axis-of-motion, invert with increasing drift speeds, consistent with the motion streak effect. Consistent with perception, the speed at which the map inverts (the critical speed) depends on cortical eccentricity and systematically increases from foveal to parafoveal. We report that critical speeds and response maps to drifting motion are excellently reproduced by the STF model. Our study thus suggests that the STF model is quantitatively accurate enough to be used as a first model of choice for interpreting responses obtained with intrinsic imaging methods in V1. We show further that this good quantitative correspondence opens the possibility to infer otherwise not easily accessible population receptive field properties from responses to complex stimuli, such as drifting random dot motions. PMID:23197457

  10. Population genetic structure of sexual and parthenogenetic damselflies inferred from mitochondrial and nuclear markers

    PubMed Central

    Lorenzo-Carballa, M O; Hadrys, H; Cordero-Rivera, A; Andrés, J A

    2012-01-01

    It has been postulated that obligate asexual lineages may persist in the long term if they escape from negative interactions with either sexual lineages or biological enemies; and thus, parthenogenetic populations will be more likely to occur in places that are difficult for sexuals to colonize, or those in which biological interactions are rare, such as islands or island-like habitats. Ischnura hastata is the only known example of natural parthenogenesis within the insect order Odonata, and it represents also a typical example of geographic parthenogenesis, as sexual populations are widely distributed in North America, whereas parthenogenetic populations of this species have only been found at the Azores archipelago. In order to gain insight in the origin and distribution of parthenogenetic I. hastata lineages, we have used microsatellites, mitochondrial and nuclear DNA sequence data, to examine the population genetic structure of this species over a wide geographic area. Our results suggest that sexual populations of I. hastata in North America conform to a large subdivided population that has gone through a recent spatial expansion. A recent single long distance dispersal event, followed by a demographic expansion, is the most parsimonious hypothesis explaining the origin of the parthenogenetic population of this species in the Azores islands. PMID:21915148

  11. Genomic inference accurately predicts the timing and severity of a recent bottleneck in a non-model insect population

    PubMed Central

    McCoy, Rajiv C.; Garud, Nandita R.; Kelley, Joanna L.; Boggs, Carol L.; Petrov, Dmitri A.

    2015-01-01

    The analysis of molecular data from natural populations has allowed researchers to answer diverse ecological questions that were previously intractable. In particular, ecologists are often interested in the demographic history of populations, information that is rarely available from historical records. Methods have been developed to infer demographic parameters from genomic data, but it is not well understood how inferred parameters compare to true population history or depend on aspects of experimental design. Here we present and evaluate a method of SNP discovery using RNA-sequencing and demographic inference using the program δaδi, which uses a diffusion approximation to the allele frequency spectrum to fit demographic models. We test these methods in a population of the checkerspot butterfly Euphydryas gillettii. This population was intentionally introduced to Gothic, Colorado in 1977 and has since experienced extreme fluctuations including bottlenecks of fewer than 25 adults, as documented by nearly annual field surveys. Using RNA-sequencing of eight individuals from Colorado and eight individuals from a native population in Wyoming, we generate the first genomic resources for this system. While demographic inference is commonly used to examine ancient demography, our study demonstrates that our inexpensive, all-in-one approach to marker discovery and genotyping provides sufficient data to accurately infer the timing of a recent bottleneck. This demographic scenario is relevant for many species of conservation concern, few of which have sequenced genomes. Our results are remarkably insensitive to sample size or number of genomic markers, which has important implications for applying this method to other non-model systems. PMID:24237665

  12. Variational mean-field algorithm for efficient inference in large systems of stochastic differential equations.

    PubMed

    Vrettas, Michail D; Opper, Manfred; Cornford, Dan

    2015-01-01

    This work introduces a Gaussian variational mean-field approximation for inference in dynamical systems which can be modeled by ordinary stochastic differential equations. This new approach allows one to express the variational free energy as a functional of the marginal moments of the approximating Gaussian process. A restriction of the moment equations to piecewise polynomial functions, over time, dramatically reduces the complexity of approximate inference for stochastic differential equation models and makes it comparable to that of discrete time hidden Markov models. The algorithm is demonstrated on state and parameter estimation for nonlinear problems with up to 1000 dimensional state vectors and compares the results empirically with various well-known inference methodologies.

  13. Fine-scale population dynamics in a marine fish species inferred from dynamic state-space models.

    PubMed

    Rogers, Lauren A; Storvik, Geir O; Knutsen, Halvor; Olsen, Esben M; Stenseth, Nils C

    2017-07-01

    Identifying the spatial scale of population structuring is critical for the conservation of natural populations and for drawing accurate ecological inferences. However, population studies often use spatially aggregated data to draw inferences about population trends and drivers, potentially masking ecologically relevant population sub-structure and dynamics. The goals of this study were to investigate how population dynamics models with and without spatial structure affect inferences on population trends and the identification of intrinsic drivers of population dynamics (e.g. density dependence). Specifically, we developed dynamic, age-structured, state-space models to test different hypotheses regarding the spatial structure of a population complex of coastal Atlantic cod (Gadus morhua). Data were from a 93-year survey of juvenile (age 0 and 1) cod sampled along >200 km of the Norwegian Skagerrak coast. We compared two models: one which assumes all sampled cod belong to one larger population, and a second which assumes that each fjord contains a unique population with locally determined dynamics. Using the best supported model, we then reconstructed the historical spatial and temporal dynamics of Skagerrak coastal cod. Cross-validation showed that the spatially structured model with local dynamics had better predictive ability. Furthermore, posterior predictive checks showed that a model which assumes one homogeneous population failed to capture the spatial correlation pattern present in the survey data. The spatially structured model indicated that population trends differed markedly among fjords, as did estimates of population parameters including density-dependent survival. Recent biomass was estimated to be at a near-record low all along the coast, but the finer scale model indicated that the decline occurred at different times in different regions. Warm temperatures were associated with poor recruitment, but local changes in habitat and fishing pressure may

  14. The effects of inference method, population sampling, and gene sampling on species tree inferences: an empirical study in slender salamanders (Plethodontidae: Batrachoseps).

    PubMed

    Jockusch, Elizabeth L; Martínez-Solano, Iñigo; Timpe, Elizabeth K

    2015-01-01

    Species tree methods are now widely used to infer the relationships among species from multilocus data sets. Many methods have been developed, which differ in whether gene and species trees are estimated simultaneously or sequentially, and in how gene trees are used to infer the species tree. While these methods perform well on simulated data, less is known about what impacts their performance on empirical data. We used a data set including five nuclear genes and one mitochondrial gene for 22 species of Batrachoseps to compare the effects of method of analysis, within-species sampling and gene sampling on species tree inferences. For this data set, the choice of inference method had the largest effect on the species tree topology. Exclusion of individual loci had large effects in *BEAST and STEM, but not in MP-EST. Different loci carried the greatest leverage in these different methods, showing that the causes of their disproportionate effects differ. Even though substantial information was present in the nuclear loci, the mitochondrial gene dominated the *BEAST species tree. This leverage is inherent to the mtDNA locus and results from its high variation and lower assumed ploidy. This mtDNA leverage may be problematic when mtDNA has undergone introgression, as is likely in this data set. By contrast, the leverage of RAG1 in STEM analyses does not reflect properties inherent to the locus, but rather results from a gene tree that is strongly discordant with all others, and is best explained by introgression between distantly related species. Within-species sampling was also important, especially in *BEAST analyses, as shown by differences in tree topology across 100 subsampled data sets. Despite the sensitivity of the species tree methods to multiple factors, five species groups, the relationships among these, and some relationships within them, are generally consistently resolved for Batrachoseps.

  15. Mitochondrial DNA inference between European populations of Tanymastix stagnalis and their glacial survival in Scandinavia

    PubMed Central

    Arukwe, Augustine; Langeland, Arnfinn

    2013-01-01

    The early observation from 1914 of Tanymastix stagnalis in Norway was not repeated recently, showing a rare and restricted distribution of this species. All four sampled localities were concentrated in the same area of the Trollheimen Mountains with altitudes of 900–1244 m above sea level. In March 2002, a new population of T. stagnalis was observed at about 50 km north of Madrid at an altitude of 1350 m. In general, all habitats with T. stagnalis were fishless shallow ponds and varied in size from 1 to about 300 m2. Natural variability of the global temperature is well accepted, but recent climate models have predicted increases in global average temperature. Based on the new biogeographical distribution, diurnal temperature variations, and biological evidence (inference with the analysis of mitochondrial DNA), the immigration history of T. stagnalis was considered on the basis of two opposing immigration theories and in relation to the implications of global climate change. Two immigration theories, namely the Tabula rasa and Nunatak theories, have prevailed in explaining the present distribution of plants and animals in Scandinavia. It was concluded that the rare occurrence of T. stagnalis in Norway fits into the Nunatak theory and that the species probably survived, at least, the last glaciation on Nunataks or coast refuges located in central northwestern Norway at Møre mountain and coast areas. PMID:24198945

  16. Mitochondrial DNA inference between European populations of Tanymastix stagnalis and their glacial survival in Scandinavia.

    PubMed

    Arukwe, Augustine; Langeland, Arnfinn

    2013-10-01

    The early observation from 1914 of Tanymastix stagnalis in Norway was not repeated recently, showing a rare and restricted distribution of this species. All four sampled localities were concentrated in the same area of the Trollheimen Mountains with altitudes of 900-1244 m above sea level. In March 2002, a new population of T. stagnalis was observed at about 50 km north of Madrid at an altitude of 1350 m. In general, all habitats with T. stagnalis were fishless shallow ponds and varied in size from 1 to about 300 m². Natural variability of the global temperature is well accepted, but recent climate models have predicted increases in global average temperature. Based on the new biogeographical distribution, diurnal temperature variations, and biological evidence (inference with the analysis of mitochondrial DNA), the immigration history of T. stagnalis was considered on the basis of two opposing immigration theories and in relation to the implications of global climate change. Two immigration theories, namely the Tabula rasa and Nunatak theories, have prevailed in explaining the present distribution of plants and animals in Scandinavia. It was concluded that the rare occurrence of T. stagnalis in Norway fits into the Nunatak theory and that the species probably survived, at least, the last glaciation on Nunataks or coast refuges located in central northwestern Norway at Møre mountain and coast areas.

  17. Inferring sex-biased dispersal from population genetic tools: a review.

    PubMed

    Prugnolle, F; de Meeus, T

    2002-03-01

    Sex-biased dispersal, where individuals of one sex stay at or return to their natal site (or group) to breed while individuals of the other sex are prone to disperse, is a widespread pattern in vertebrate organisms. In general, mammals exhibit male-biased dispersal whereas birds exhibit female-biased dispersal. Dispersal estimates are often difficult to obtain from direct field observations. Here we describe different methods for inferring sex-specific dispersal using population genetic tools and discuss the problems they can raise. We distinguish two types of methods: those based on bi-parental markers (e.g., comparison of male/female relatedness, F(st) and assignment probabilities) and those relying on the comparison between markers with different modes of inheritance (e.g., mtDNA markers and microsatellites). Finally, we discuss statistical problems that are encountered with these different methods (e.g., pseudoreplication, problems due to the comparison of distinct markers). While the genetic methods to detect sex-biased dispersal are now relatively well developed, their interpretation can prove problematic due to the confounding effects of factors such as the mating system of the species. Moreover, the relative power of these methods is not well known and requires further investigation.

  18. Population Genetic Analysis Infers Migration Pathways of Phytophthora ramorum in US Nurseries

    PubMed Central

    Goss, Erica M.; Larsen, Meg; Chastagner, Gary A.; Givens, Donald R.; Grünwald, Niklaus J.

    2009-01-01

    Recently introduced, exotic plant pathogens may exhibit low genetic diversity and be limited to clonal reproduction. However, rapidly mutating molecular markers such as microsatellites can reveal genetic variation within these populations and be used to model putative migration patterns. Phytophthora ramorum is the exotic pathogen, discovered in the late 1990s, that is responsible for sudden oak death in California forests and ramorum blight of common ornamentals. The nursery trade has moved this pathogen from source populations on the West Coast to locations across the United States, thus risking introduction to other native forests. We examined the genetic diversity of P. ramorum in United States nurseries by microsatellite genotyping 279 isolates collected from 19 states between 2004 and 2007. Of the three known P. ramorum clonal lineages, the most common and genetically diverse lineage in the sample was NA1. Two eastward migration pathways were revealed in the clustering of NA1 isolates into two groups, one containing isolates from Connecticut, Oregon, and Washington and the other isolates from California and the remaining states. This finding is consistent with trace forward analyses conducted by the US Department of Agriculture's Animal and Plant Health Inspection Service. At the same time, genetic diversities in several states equaled those observed in California, Oregon, and Washington and two-thirds of multilocus genotypes exhibited limited geographic distributions, indicating that mutation was common during or subsequent to migration. Together, these data suggest that migration, rapid mutation, and genetic drift all play a role in structuring the genetic diversity of P. ramorum in US nurseries. This work demonstrates that fast-evolving genetic markers can be used to examine the evolutionary processes acting on recently introduced pathogens and to infer their putative migration patterns, thus showing promise for the application of forensics to plant

  19. Movements of diadromous fish in large unregulated tropical rivers inferred from geochemical tracers.

    PubMed

    Walther, Benjamin D; Dempster, Tim; Letnic, Mike; McCulloch, Malcolm T

    2011-04-06

    Patterns of migration and habitat use in diadromous fishes can be highly variable among individuals. Most investigations into diadromous movement patterns have been restricted to populations in regulated rivers, and little information exists for those in unregulated catchments. We quantified movements of migratory barramundi Lates calcarifer (Bloch) in two large unregulated rivers in northern Australia using both elemental (Sr/Ba) and isotope (87Sr/86Sr) ratios in aragonitic ear stones, or otoliths. Chemical life history profiles indicated significant individual variation in habitat use, particularly among chemically distinct freshwater habitats within a catchment. A global zoning algorithm was used to quantify distinct changes in chemical signatures across profiles. This algorithm identified between 2 and 6 distinct chemical habitats in individual profiles, indicating variable movement among habitats. Profiles of 87Sr/86Sr ratios were notably distinct among individuals, with highly radiogenic values recorded in some otoliths. This variation suggested that fish made full use of habitats across the entire catchment basin. Our results show that unrestricted movement among freshwater habitats is an important component of diadromous life histories for populations in unregulated systems.

  20. Movements of Diadromous Fish in Large Unregulated Tropical Rivers Inferred from Geochemical Tracers

    PubMed Central

    Walther, Benjamin D.; Dempster, Tim; Letnic, Mike; McCulloch, Malcolm T.

    2011-01-01

    Patterns of migration and habitat use in diadromous fishes can be highly variable among individuals. Most investigations into diadromous movement patterns have been restricted to populations in regulated rivers, and little information exists for those in unregulated catchments. We quantified movements of migratory barramundi Lates calcarifer (Bloch) in two large unregulated rivers in northern Australia using both elemental (Sr/Ba) and isotope (87Sr/86Sr) ratios in aragonitic ear stones, or otoliths. Chemical life history profiles indicated significant individual variation in habitat use, particularly among chemically distinct freshwater habitats within a catchment. A global zoning algorithm was used to quantify distinct changes in chemical signatures across profiles. This algorithm identified between 2 and 6 distinct chemical habitats in individual profiles, indicating variable movement among habitats. Profiles of 87Sr/86Sr ratios were notably distinct among individuals, with highly radiogenic values recorded in some otoliths. This variation suggested that fish made full use of habitats across the entire catchment basin. Our results show that unrestricted movement among freshwater habitats is an important component of diadromous life histories for populations in unregulated systems. PMID:21494693

  1. minet: A R/Bioconductor Package for Inferring Large Transcriptional Networks Using Mutual Information

    PubMed Central

    Meyer, Patrick E; Lafitte, Frédéric; Bontempi, Gianluca

    2008-01-01

    Results: This paper presents the R/Bioconductor package minet (version 1.1.6) which provides a set of functions to infer mutual information networks from a dataset. Once fed with a microarray dataset, the package returns a network where nodes denote genes, edges model statistical dependencies between genes and the weight of an edge quantifies the statistical evidence of a specific (e.g., transcriptional) gene-to-gene interaction. Four different entropy estimators are made available in the package minet (empirical, Miller-Madow, Schurmann-Grassberger and shrink) as well as four different inference methods, namely relevance networks, ARACNE, CLR and MRNET. Also, the package integrates accuracy assessment tools, like F-scores, PR-curves and ROC-curves in order to compare the inferred network with a reference one. Conclusion: The package minet provides a series of tools for inferring transcriptional networks from microarray data. It is freely available from the Comprehensive R Archive Network (CRAN) as well as from the Bioconductor website. PMID:18959772
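
    The idea behind a mutual information network can be illustrated with a minimal sketch. The Python code below is not the minet API (minet is an R package); all function names and the toy data are hypothetical. It assumes expression values have already been discretized, implements the empirical and Miller-Madow entropy estimators named in the abstract, and builds a simple relevance network by thresholding pairwise mutual information.

```python
import math
from collections import Counter


def empirical_entropy(values):
    """Plug-in (maximum-likelihood) entropy, in nats, of discrete samples."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def miller_madow_entropy(values):
    """Empirical entropy plus the Miller-Madow correction (m - 1) / (2n),
    where m is the number of non-empty bins and n the sample size."""
    n = len(values)
    m = len(set(values))
    return empirical_entropy(values) + (m - 1) / (2 * n)


def mutual_information(x, y, entropy=empirical_entropy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two equally long discrete vectors."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def relevance_network(profiles, threshold):
    """Keep an edge between two genes when their mutual information
    exceeds `threshold` (a hypothetical, user-chosen cutoff)."""
    genes = sorted(profiles)
    edges = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            mi = mutual_information(profiles[g], profiles[h])
            if mi > threshold:
                edges[(g, h)] = mi
    return edges


# Toy discretized expression profiles (made up for illustration).
profiles = {
    "geneA": [0, 0, 1, 1, 0, 1, 0, 1],
    "geneB": [0, 0, 1, 1, 0, 1, 0, 1],  # identical to geneA
    "geneC": [0, 1, 0, 1, 0, 0, 1, 1],  # empirically independent of geneA
}
network = relevance_network(profiles, threshold=0.1)
```

    In this toy example only the geneA/geneB pair survives the threshold. Methods such as ARACNE, CLR and MRNET start from the same pairwise mutual information matrix but prune or rescale it differently; they are not shown here.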

  2. Inferences of Recent and Ancient Human Population History Using Genetic and Non-Genetic Data

    ERIC Educational Resources Information Center

    Kitchen, Andrew

    2008-01-01

    I have adopted complementary approaches to inferring human demographic history utilizing human and non-human genetic data as well as cultural data. These complementary approaches form an interdisciplinary perspective that allows one to make inferences of human history at varying timescales, from the events that occurred tens of thousands of years…

  4. Inferring cortical function in the mouse visual system through large-scale systems neuroscience.

    PubMed

    Hawrylycz, Michael; Anastassiou, Costas; Arkhipov, Anton; Berg, Jim; Buice, Michael; Cain, Nicholas; Gouwens, Nathan W; Gratiy, Sergey; Iyer, Ramakrishnan; Lee, Jung Hoon; Mihalas, Stefan; Mitelut, Catalin; Olsen, Shawn; Reid, R Clay; Teeter, Corinne; de Vries, Saskia; Waters, Jack; Zeng, Hongkui; Koch, Christof

    2016-07-05

    The scientific mission of the Project MindScope is to understand neocortex, the part of the mammalian brain that gives rise to perception, memory, intelligence, and consciousness. We seek to quantitatively evaluate the hypothesis that neocortex is a relatively homogeneous tissue, with smaller functional modules that perform a common computational function replicated across regions. We here focus on the mouse as a mammalian model organism with genetics, physiology, and behavior that can be readily studied and manipulated in the laboratory. We seek to describe the operation of cortical circuitry at the computational level by comprehensively cataloging and characterizing its cellular building blocks along with their dynamics and their cell type-specific connectivities. The project is also building large-scale experimental platforms (i.e., brain observatories) to record the activity of large populations of cortical neurons in behaving mice subject to visual stimuli. A primary goal is to understand the series of operations from visual input in the retina to behavior by observing and modeling the physical transformations of signals in the corticothalamic system. We here focus on the contribution that computer modeling and theory make to this long-term effort.

  5. Inferring cortical function in the mouse visual system through large-scale systems neuroscience

    PubMed Central

    Hawrylycz, Michael; Anastassiou, Costas; Arkhipov, Anton; Berg, Jim; Buice, Michael; Cain, Nicholas; Gouwens, Nathan W.; Gratiy, Sergey; Iyer, Ramakrishnan; Lee, Jung Hoon; Mihalas, Stefan; Mitelut, Catalin; Olsen, Shawn; Reid, R. Clay; Teeter, Corinne; de Vries, Saskia; Waters, Jack; Zeng, Hongkui; Koch, Christof

    2016-01-01

    The scientific mission of the Project MindScope is to understand neocortex, the part of the mammalian brain that gives rise to perception, memory, intelligence, and consciousness. We seek to quantitatively evaluate the hypothesis that neocortex is a relatively homogeneous tissue, with smaller functional modules that perform a common computational function replicated across regions. We here focus on the mouse as a mammalian model organism with genetics, physiology, and behavior that can be readily studied and manipulated in the laboratory. We seek to describe the operation of cortical circuitry at the computational level by comprehensively cataloging and characterizing its cellular building blocks along with their dynamics and their cell type-specific connectivities. The project is also building large-scale experimental platforms (i.e., brain observatories) to record the activity of large populations of cortical neurons in behaving mice subject to visual stimuli. A primary goal is to understand the series of operations from visual input in the retina to behavior by observing and modeling the physical transformations of signals in the corticothalamic system. We here focus on the contribution that computer modeling and theory make to this long-term effort. PMID:27382147

  6. The Large Impact Process Inferred from the Geology of Lunar Multiring Basins

    NASA Technical Reports Server (NTRS)

    Spudis, Paul D.

    1994-01-01

    The study of the geology of multiring impact basins on the Moon over the past ten years has given us a rudimentary understanding of how these large structures have formed and evolved on the Moon and other bodies. Two-ring basins on the Moon begin to form at diameters of about 300 km; the transition diameter at which more than two rings appear is uncertain, but it appears to be between 400 and 500 km in diameter. Inner rings tend to be made up of clusters or aligned segments of massifs and are arranged into a crudely concentric pattern; scarp-like elements may or may not be present. Outer rings are much more scarp-like and massifs are rare to absent. Basins display textured deposits, interpreted as ejecta, extending roughly an apparent basin radius exterior to the main topographic rim. Ejecta may have various morphologies, ranging from wormy and hummocky deposits to knobby surfaces; the causes of these variations are not known, but may be related to the energy regime in which the ejecta are deposited. Outside the limits of the textured ejecta are found both fields of satellitic craters (secondaries) and light plains deposits. Impact melt sheets are observed on the floors of relatively unflooded basins. Samples of impact melts from lunar basins have basaltic major-element chemistry, characterized by K, rare-earth elements (REE), P, and other trace elements of varying concentration (KREEP); ages are between 3.8 and 3.9 Ga. These lithologies cannot be produced through the fusion of known pristine (plutonic) rock types, suggesting the occurrence of unknown lithologies within the Moon. These melts were probably generated at middle to lower crustal levels. Ejecta compositions, preservation of pre-basin topography, and deposit morphologies all indicate that the excavation cavity of multiring basins is between about 0.4 and 0.6 times the diameter of the apparent crater diameter. Basin depths of excavation can be inferred from the composition of basin ejecta. A variety of

  7. Robust inference of population structure for ancestry prediction and correction of stratification in the presence of relatedness.

    PubMed

    Conomos, Matthew P; Miller, Michael B; Thornton, Timothy A

    2015-05-01

    Population structure inference with genetic data has been motivated by a variety of applications in population genetics and genetic association studies. Several approaches have been proposed for the identification of genetic ancestry differences in samples where study participants are assumed to be unrelated, including principal components analysis (PCA), multidimensional scaling (MDS), and model-based methods for proportional ancestry estimation. Many genetic studies, however, include individuals with some degree of relatedness, and existing methods for inferring genetic ancestry fail in related samples. We present a method, PC-AiR, for robust population structure inference in the presence of known or cryptic relatedness. PC-AiR utilizes genome-screen data and an efficient algorithm to identify a diverse subset of unrelated individuals that is representative of all ancestries in the sample. The PC-AiR method directly performs PCA on the identified ancestry representative subset and then predicts components of variation for all remaining individuals based on genetic similarities. In simulation studies and in applications to real data from Phase III of the HapMap Project, we demonstrate that PC-AiR provides a substantial improvement over existing approaches for population structure inference in related samples. We also demonstrate significant efficiency gains, where a single axis of variation from PC-AiR provides better prediction of ancestry in a variety of structure settings than using 10 (or more) components of variation from widely used PCA and MDS approaches. Finally, we illustrate that PC-AiR can provide improved population stratification correction over existing methods in genetic association studies with population structure and relatedness.

  8. Comparison of algorithms to infer genetic population structure from unlinked molecular markers.

    PubMed

    Peña-Malavera, Andrea; Bruno, Cecilia; Fernandez, Elmer; Balzarini, Monica

    2014-08-01

    Identifying population genetic structure (PGS) is crucial for breeding and conservation. Several clustering algorithms are available to identify the underlying PGS to be used with genetic data of maize genotypes. In this work, six methods to identify PGS from unlinked molecular marker data were compared using simulated and experimental data consisting of multilocus-biallelic genotypes. Datasets were delineated under different biological scenarios characterized by three levels of genetic divergence among populations (low, medium, and high FST) and two numbers of sub-populations (K=3 and K=5). We compared the relative performance of hierarchical and non-hierarchical clustering, as well as model-based clustering (STRUCTURE) and clustering from neural networks (SOM-RP-Q). We used the clustering error rate of genotypes into discrete sub-populations as the comparison criterion. In scenarios with a high level of divergence among genotype groups, all methods performed well. With a moderate level of genetic divergence (FST=0.2), the algorithms SOM-RP-Q and STRUCTURE performed better than hierarchical and non-hierarchical clustering. In all simulated scenarios with low genetic divergence and in the experimental SNP maize panel (largely unlinked), SOM-RP-Q achieved the lowest clustering error rate. The SOM algorithm used here is more effective than other evaluated methods for sparse unlinked genetic data.
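
    A clustering error rate of the kind used as the comparison criterion can be sketched as follows. The abstract does not give the exact definition, so this Python sketch assumes one common choice: the fraction of genotypes misassigned under the best one-to-one relabeling of clusters to true sub-populations. The function name and toy labels are hypothetical, and the brute-force search over relabelings is only practical for small K (such as K=3 or K=5).

```python
from itertools import permutations


def clustering_error_rate(true_labels, cluster_labels):
    """Fraction of items misassigned under the best one-to-one mapping
    from cluster labels to true sub-population labels."""
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label vectors must have the same length")
    truth = sorted(set(true_labels))
    found = sorted(set(cluster_labels))
    # Pad the shorter side with None so every cluster can be mapped.
    k = max(len(truth), len(found))
    truth_padded = truth + [None] * (k - len(truth))
    found_padded = found + [None] * (k - len(found))
    best_correct = 0
    for perm in permutations(truth_padded):
        mapping = dict(zip(found_padded, perm))
        correct = sum(
            1 for t, c in zip(true_labels, cluster_labels) if mapping[c] == t
        )
        best_correct = max(best_correct, correct)
    return 1 - best_correct / len(true_labels)


# Toy example: 9 genotypes from K=3 sub-populations, one misassigned.
true_pops = [0, 0, 0, 1, 1, 1, 2, 2, 2]
clusters = ["b", "b", "a", "a", "a", "a", "c", "c", "c"]
error = clustering_error_rate(true_pops, clusters)
```

    Because cluster labels are arbitrary, the search over relabelings is what makes the rate comparable across algorithms; for larger K an assignment solver would replace the permutation loop.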

  9. Inferring R0 in emerging epidemics—the effect of common population structure is small

    PubMed Central

    Ball, Frank; Dhersin, Jean-Stéphane; Tran, Viet Chi; Wallinga, Jacco; Britton, Tom

    2016-01-01

    When controlling an emerging outbreak of an infectious disease, it is essential to know the key epidemiological parameters, such as the basic reproduction number R0 and the control effort required to prevent a large outbreak. These parameters are estimated from the observed incidence of new cases and information about the infectious contact structures of the population in which the disease spreads. However, the relevant infectious contact structures for new, emerging infections are often unknown or hard to obtain. Here, we show that, for many common true underlying heterogeneous contact structures, the simplification to neglect such structures and instead assume that all contacts are made homogeneously in the whole population results in conservative estimates for R0 and the required control effort. This means that robust control policies can be planned during the early stages of an outbreak, using such conservative estimates of the required control effort. PMID:27581480

  10. Consistency and inconsistency of consensus methods for inferring species trees from gene trees in the presence of ancestral population structure

    PubMed Central

    DeGiorgio, Michael; Rosenberg, Noah A.

    2016-01-01

    In the last few years, several statistically consistent consensus methods for species tree inference have been devised that are robust to the gene tree discordance caused by incomplete lineage sorting in unstructured ancestral populations. One source of gene tree discordance that has only recently been identified as a potential obstacle for phylogenetic inference is ancestral population structure. In this article, we describe a general model of ancestral population structure, and by relying on a single carefully constructed example scenario, we show that the consensus methods Democratic Vote, STEAC, STAR, R* Consensus, Rooted Triple Consensus, Minimize Deep Coalescences, and Majority-Rule Consensus are statistically inconsistent under the model. We find that among the consensus methods evaluated, the only method that is statistically consistent in the presence of ancestral population structure is GLASS/Maximum Tree. We use simulations to evaluate the behavior of the various consensus methods in a model with ancestral population structure, showing that as the number of gene trees increases, estimates on the basis of GLASS/Maximum Tree approach the true species tree topology irrespective of the level of population structure, whereas estimates based on the remaining methods only approach the true species tree topology if the level of structure is low. However, through simulations using species trees both with and without ancestral population structure, we show that GLASS/Maximum Tree performs unusually poorly on gene trees inferred from alignments with little information. This practical limitation of GLASS/Maximum Tree together with the inconsistency of other methods prompts the need for both further testing of additional existing methods and development of novel methods under conditions that incorporate ancestral population structure. PMID:27086043

  11. Length Distribution of Ancestral Tracks under a General Admixture Model and Its Applications in Population History Inference

    PubMed Central

    Ni, Xumin; Yang, Xiong; Guo, Wei; Yuan, Kai; Zhou, Ying; Ma, Zhiming; Xu, Shuhua

    2016-01-01

    The length of ancestral tracks decays with the passing of generations, which can be used to infer population admixture histories. Previous studies have shown the power in recovering the histories of admixed populations via the length distributions of ancestral tracks even under simple models. We believe that the deduction of length distributions under a general model will greatly elevate the power. Here we first deduced the length distributions under a general model and proposed general principles in parameter estimation and model selection with the deduced length distributions. Next, we focused on studying the length distributions and their applications under three typical special cases. Extensive simulations showed that the length distributions of ancestral tracks were well predicted by our theoretical framework. We further developed a new method, AdmixInfer, based on the length distributions and good performance was observed when it was applied to infer population histories under the three typical models. Notably, our method was insensitive to demographic history, sample size and threshold to discard short tracks. Finally, good performance was also observed when applied to some real datasets of African Americans, Mexicans and South Asian populations from the HapMap project and the Human Genome Diversity Project. PMID:26818889

  12. Length Distribution of Ancestral Tracks under a General Admixture Model and Its Applications in Population History Inference.

    PubMed

    Ni, Xumin; Yang, Xiong; Guo, Wei; Yuan, Kai; Zhou, Ying; Ma, Zhiming; Xu, Shuhua

    2016-01-28

    The length of ancestral tracks decays with the passing of generations, which can be used to infer population admixture histories. Previous studies have shown the power in recovering the histories of admixed populations via the length distributions of ancestral tracks even under simple models. We believe that the deduction of length distributions under a general model will greatly elevate the power. Here we first deduced the length distributions under a general model and proposed general principles in parameter estimation and model selection with the deduced length distributions. Next, we focused on studying the length distributions and their applications under three typical special cases. Extensive simulations showed that the length distributions of ancestral tracks were well predicted by our theoretical framework. We further developed a new method, AdmixInfer, based on the length distributions and good performance was observed when it was applied to infer population histories under the three typical models. Notably, our method was insensitive to demographic history, sample size and threshold to discard short tracks. Finally, good performance was also observed when applied to some real datasets of African Americans, Mexicans and South Asian populations from the HapMap project and the Human Genome Diversity Project.

  13. WKB theory of large deviations in stochastic populations

    NASA Astrophysics Data System (ADS)

    Assaf, Michael; Meerson, Baruch

    2017-06-01

    Stochasticity can play an important role in the dynamics of biologically relevant populations. These span a broad range of scales: from intra-cellular populations of molecules to populations of cells and then to groups of plants, animals and people. Large deviations in stochastic population dynamics, such as those determining population extinction, fixation or switching between different states, are currently a focus of attention for statistical physicists. We review recent progress in applying different variants of dissipative WKB approximation (after Wentzel, Kramers and Brillouin) to this class of problems. The WKB approximation allows one to evaluate the mean time and/or probability of population extinction, fixation and switches resulting from either intrinsic (demographic) noise, or a combination of the demographic noise and environmental variations, deterministic or random. We mostly cover well-mixed populations, single and multiple, but also briefly consider populations on heterogeneous networks and spatial populations. The spatial setting also allows one to study large fluctuations of the speed of biological invasions. Finally, we briefly discuss possible directions of future work.

  14. Low-Pass Genome-Wide Sequencing and Variant Inference Using Identity-by-Descent in an Isolated Human Population

    PubMed Central

    Gusev, A.; Shah, M. J.; Kenny, E. E.; Ramachandran, A.; Lowe, J. K.; Salit, J.; Lee, C. C.; Levandowsky, E. C.; Weaver, T. N.; Doan, Q. C.; Peckham, H. E.; McLaughlin, S. F.; Lyons, M. R.; Sheth, V. N.; Stoffel, M.; De La Vega, F. M.; Friedman, J. M.; Breslow, J. L.

    2012-01-01

    Whole-genome sequencing in an isolated population with few founders directly ascertains variants from the population bottleneck that may be rare elsewhere. In such populations, shared haplotypes allow imputation of variants in unsequenced samples without resorting to complex statistical methods as in studies of outbred cohorts. We focus on an isolated population cohort from the Pacific Island of Kosrae, Micronesia, where we previously collected SNP array and rich phenotype data for the majority of the population. We report identification of long regions with haplotypes co-inherited between pairs of individuals and methodology to leverage such shared genetic content for imputation. Our estimates show that sequencing as few as 40 personal genomes allows for inference in up to 60% of the 3000-person cohort at the average locus. We ascertained a pilot data set of whole-genome sequences from seven Kosraean individuals, with average 5× coverage. This assay identified 5,735,306 unique sites of which 1,212,831 were previously unknown. Additionally, these variants are unusually enriched for alleles that are rare in other populations when compared to geographic neighbors (published Korean genome SJK). We used the presence of shared haplotypes between the seven Kosraean individuals to estimate expected imputation accuracy of known and novel homozygous variants at 99.6% and 97.3%, respectively. This study presents whole-genome analysis of a homogenous isolate population with emphasis on optimal rare variant inference. PMID:22135348

  15. Southeast Asian origins of five Hill Tribe populations and correlation of genetic to linguistic relationships inferred with genome-wide SNP data.

    PubMed

    Listman, J B; Malison, R T; Sanichwankul, K; Ittiwut, C; Mutirangura, A; Gelernter, J

    2011-02-01

    In Thailand, the term Hill Tribe is used to describe populations whose members traditionally practice slash and burn agriculture and reside in the mountains. These tribes are thought to have migrated throughout Asia for up to 5,000 years, including migrations through Southern China and/or Southeast Asia. There have been continuous migrations southward from China into Thailand for approximately the past thousand years and the present geographic range of any given tribe straddles multiple political borders. As none of these populations have autochthonous scripts, written histories have, until recently, been externally produced. Northern Asian, Tibetan, and Siberian origins of Hill Tribes have been proposed. All purport endogamy and have nonmutually intelligible languages. To test hypotheses regarding the geographic origins of these populations, relatedness and migrations among them and neighboring populations, and whether their genetic relationships correspond with their linguistic relationships, we analyzed 2,445 genome-wide SNP markers in 118 individuals from five Thai Hill Tribe populations (Akha, Hmong, Karen, Lahu, and Lisu), 90 individuals from majority Thai populations, and 826 individuals from Asian and Oceanean HGDP and HapMap populations using a Bayesian clustering method. Considering these results within the context of results of recent large-scale studies of Asian geographic genetic variation allows us to infer a shared Southeast Asian origin of these five Hill Tribe populations as well as ancestry components that distinguish among them seen in successive levels of clustering. In addition, the inferred level of shared ancestry among the Hill Tribes corresponds well to relationships among their languages.

  16. Southeast Asian origins of five Hill Tribe populations and correlation of genetic to linguistic relationships inferred with genome-wide SNP data

    PubMed Central

    Listman, JB; Malison, RT; Sanichwankul, K; Ittiwut, C; Mutirangura, A; Gelernter, J

    2010-01-01

    In Thailand, the term Hill Tribe is used to describe populations whose members traditionally practice slash and burn agriculture and reside in the mountains. These tribes are thought to have migrated throughout Asia for up to 5,000 years, including migrations through Southern China and/or Southeast Asia. There have been continuous migrations southward from China into Thailand for approximately the past thousand years and the present geographic range of any given tribe straddles multiple political borders. As none of these populations have autochthonous scripts, written histories have, until recently, been externally produced. Northern Asian, Tibetan, and Siberian origins of Hill Tribes have been proposed. All purport endogamy and have non-mutually intelligible languages. In order to test hypotheses regarding the geographic origins of these populations, relatedness and migrations among them and neighboring populations, and whether their genetic relationships correspond with their linguistic relationships, we analyzed 2445 genome-wide SNP markers in 118 individuals from five Thai Hill Tribe populations (Akha, Hmong, Karen, Lahu, and Lisu), 90 individuals from majority Thai populations, and 826 individuals from Asian and Oceanean HGDP and HapMap populations using a Bayesian clustering method. Considering these results within the context of results of recent large-scale studies of Asian geographic genetic variation allows us to infer a shared Southeast Asian origin of these five Hill Tribe populations as well as ancestry components that distinguish among them seen in successive levels of clustering. In addition, the inferred level of shared ancestry among the Hill Tribes corresponds well to relationships among their languages. PMID:20979205

  17. A statistical model for brain networks inferred from large-scale electrophysiological signals.

    PubMed

    Obando, Catalina; De Vico Fallani, Fabrizio

    2017-03-01

    Network science has been extensively developed to characterize the structural properties of complex systems, including brain networks inferred from neuroimaging data. As a result of the inference process, networks estimated from experimentally obtained biological data represent one instance of a larger number of realizations with similar intrinsic topology. A modelling approach is therefore needed to support statistical inference on the bottom-up local connectivity mechanisms influencing the formation of the estimated brain networks. Here, we adopted a statistical model based on exponential random graph models (ERGMs) to reproduce brain networks, or connectomes, estimated by spectral coherence between high-density electroencephalographic (EEG) signals. ERGMs are made up by different local graph metrics, whereas the parameters weight the respective contribution in explaining the observed network. We validated this approach in a dataset of N = 108 healthy subjects during eyes-open (EO) and eyes-closed (EC) resting-state conditions. Results showed that the tendency to form triangles and stars, reflecting clustering and node centrality, better explained the global properties of the EEG connectomes than other combinations of graph metrics. In particular, the synthetic networks generated by this model configuration replicated the characteristic differences found in real brain networks, with EO eliciting significantly higher segregation in the alpha frequency band (8-13 Hz) than EC. Furthermore, the fitted ERGM parameter values provided complementary information showing that clustering connections are significantly more represented from EC to EO in the alpha range, but also in the beta band (14-29 Hz), which is known to play a crucial role in cortical processing of visual input and externally oriented attention. Taken together, these findings support the current view of the functional segregation and integration of the brain in terms of modules and hubs, and provide a

  18. Scalable Inference and Learning in Very Large Graphical Models Patterned after the Primate Visual Cortex

    DTIC Science & Technology

    2008-04-07

    entitled "A Computational Model of the Cerebral Cortex" [5]. The paper described a graphical model of the visual cortex inspired by David Mumford’s...Proceedings of the ninth IEEE International Conference on Computer Vision, volume 1, pages 432-439, 2003. [17] Tai Sing Lee and David Mumford...Hierarchical Bayesian inference in the visual cortex. Journal of the Optical Society of America, 2(7):1434-1448, July 2003. 16~] David Lowe. Distitict ivi

  19. Large-scale spatial population databases in infectious disease research

    PubMed Central

    2012-01-01

    Modelling studies on the spatial distribution and spread of infectious diseases are becoming increasingly detailed and sophisticated, with global risk mapping and epidemic modelling studies now popular. Yet, in deriving populations at risk of disease estimates, these spatial models must rely on existing global and regional datasets on population distribution, which are often based on outdated and coarse resolution data. Moreover, a variety of different methods have been used to model population distribution at large spatial scales. In this review we describe the main global gridded population datasets that are freely available for health researchers and compare their construction methods, and highlight the uncertainties inherent in these population datasets. We review their application in past studies on disease risk and dynamics, and discuss how the choice of dataset can affect results. Moreover, we highlight how the lack of contemporary, detailed and reliable data on human population distribution in low income countries is proving a barrier to obtaining accurate large-scale estimates of population at risk and constructing reliable models of disease spread, and suggest research directions required to further reduce these barriers. PMID:22433126

  20. ddClone: joint statistical inference of clonal populations from single cell and bulk tumour sequencing data.

    PubMed

    Salehi, Sohrab; Steif, Adi; Roth, Andrew; Aparicio, Samuel; Bouchard-Côté, Alexandre; Shah, Sohrab P

    2017-03-01

    Next-generation sequencing (NGS) of bulk tumour tissue can identify constituent cell populations in cancers and measure their abundance. This requires computational deconvolution of allelic counts from somatic mutations, which may be incapable of fully resolving the underlying population structure. Single cell sequencing (SCS) is a more direct method, although its replacement of NGS is impeded by technical noise and sampling limitations. We propose ddClone, which analytically integrates NGS and SCS data, leveraging their complementary attributes through joint statistical inference. We show on real and simulated datasets that ddClone produces more accurate results than can be achieved by either method alone.

  1. Large pseudocounts and L2-norm penalties are necessary for the mean-field inference of Ising and Potts models

    NASA Astrophysics Data System (ADS)

    Barton, J. P.; Cocco, S.; De Leonardis, E.; Monasson, R.

    2014-07-01

    The mean-field (MF) approximation offers a simple, fast way to infer direct interactions between elements in a network of correlated variables, a common, computationally challenging problem with practical applications in fields ranging from physics and biology to the social sciences. However, MF methods achieve their best performance with strong regularization, well beyond Bayesian expectations, an empirical fact that is poorly understood. In this work, we study the influence of pseudocount and L2-norm regularization schemes on the quality of inferred Ising or Potts interaction networks from correlation data within the MF approximation. We argue, based on the analysis of small systems, that the optimal value of the regularization strength remains finite even if the sampling noise tends to zero, in order to correct for systematic biases introduced by the MF approximation. Our claim is corroborated by extensive numerical studies of diverse model systems and by the analytical study of the m-component spin model for large but finite m. Additionally, we find that pseudocount regularization is robust against sampling noise and often outperforms L2-norm regularization, particularly when the underlying network of interactions is strongly heterogeneous. Much better performances are generally obtained for the Ising model than for the Potts model, for which only couplings incoming onto medium-frequency symbols are reliably inferred.
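
    As a rough illustration of the kind of estimator this abstract discusses, the sketch below applies the standard naive mean-field formula (couplings taken as minus the inverse of the connected correlation matrix) to +/-1 spin samples, with a pseudocount implemented as a mixture of the empirical distribution with the uniform one. The function name, the default `alpha`, and the choice of +/-1 Ising spins are assumptions made for this example; it is not the authors' code and does not cover the Potts case or the L2-norm penalty.

```python
import numpy as np


def mean_field_couplings(samples, alpha=0.1):
    """Naive mean-field estimate of Ising couplings from +/-1 spin samples.

    samples: array of shape (B, N), entries +1 or -1.
    alpha:   pseudocount weight in [0, 1); hypothetical default.
    Returns an (N, N) symmetric matrix with zero diagonal.
    """
    s = np.asarray(samples, dtype=float)
    n_samples, n_spins = s.shape

    # Empirical first and second moments.
    m = s.mean(axis=0)
    second = s.T @ s / n_samples

    # Pseudocount: mix the data with the uniform distribution, under which
    # <s_i> = 0 and <s_i s_j> = delta_ij.
    m_reg = (1.0 - alpha) * m
    second_reg = (1.0 - alpha) * second + alpha * np.eye(n_spins)

    # Connected correlations, then the naive mean-field inversion J = -C^{-1}.
    c = second_reg - np.outer(m_reg, m_reg)
    j = -np.linalg.inv(c)
    np.fill_diagonal(j, 0.0)
    return j
```

    With two perfectly correlated spins the unregularized correlation matrix is singular, so `alpha = 0` fails, whereas any positive pseudocount makes the inversion well defined. This is one simple way to see why some regularization is needed at all, independent of the paper's stronger claim about its optimal strength.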

  2. Large pseudocounts and L2-norm penalties are necessary for the mean-field inference of Ising and Potts models.

    PubMed

    Barton, J P; Cocco, S; De Leonardis, E; Monasson, R

    2014-07-01

    The mean-field (MF) approximation offers a simple, fast way to infer direct interactions between elements in a network of correlated variables, a common, computationally challenging problem with practical applications in fields ranging from physics and biology to the social sciences. However, MF methods achieve their best performance with strong regularization, well beyond Bayesian expectations, an empirical fact that is poorly understood. In this work, we study the influence of pseudocount and L2-norm regularization schemes on the quality of inferred Ising or Potts interaction networks from correlation data within the MF approximation. We argue, based on the analysis of small systems, that the optimal value of the regularization strength remains finite even if the sampling noise tends to zero, in order to correct for systematic biases introduced by the MF approximation. Our claim is corroborated by extensive numerical studies of diverse model systems and by the analytical study of the m-component spin model for large but finite m. Additionally, we find that pseudocount regularization is robust against sampling noise and often outperforms L2-norm regularization, particularly when the underlying network of interactions is strongly heterogeneous. Much better performances are generally obtained for the Ising model than for the Potts model, for which only couplings incoming onto medium-frequency symbols are reliably inferred.

  3. Fine-scale variation in meiotic recombination in Mimulus inferred from population shotgun sequencing

    SciTech Connect

    Hellsten, Uffe; Wright, Kevin M.; Jenkins, Jerry; Shu, Shengqiang; Yuan, Yao-Wu; Wessler, Susan R.; Schmutz, Jeremy; Willis, John H.; Rokhsar, Daniel S.

    2013-11-13

    Meiotic recombination rates can vary widely across genomes, with hotspots of intense activity interspersed among cold regions. In yeast, hotspots tend to occur in promoter regions of genes, whereas in humans and mice hotspots are largely defined by binding sites of the PRDM9 protein. To investigate the detailed recombination pattern in a flowering plant we use shotgun resequencing of a wild population of the monkeyflower Mimulus guttatus to precisely locate over 400,000 boundaries of historic crossovers or gene conversion tracts. Their distribution defines some 13,000 hotspots of varying strengths, interspersed with cold regions of undetectably low recombination. Average recombination rates peak near starts of genes and fall off sharply, exhibiting polarity. Within genes, recombination tracts are more likely to terminate in exons than in introns. The general pattern is similar to that observed in yeast, as well as in PRDM9-knockout mice, suggesting that recombination initiation described here in Mimulus may reflect ancient and conserved eukaryotic mechanisms.

  4. Mutation load and the extinction of large populations

    NASA Astrophysics Data System (ADS)

    Bernardes, A. T.

    1996-02-01

    In the time evolution of finite populations, the accumulation of harmful mutations over successive generations might lead to a temporal decay in the mean fitness of the whole population that, after sufficient time, would reduce population size and so lead to extinction. This joint action of mutation load and population reduction is called mutational meltdown and is usually considered to occur only in small asexual or very small sexual populations. However, as discussed at the outset of this paper, the problem of extinction cannot be treated properly if one assumes in advance that an equilibrium state exists. By performing simulations in a genetically inspired model for time-changing populations, we show that mutational meltdown also occurs in large asexual populations and that the mean time to extinction is a nonmonotonic function of the selection coefficient. The stochasticity of the extinction process is also discussed. The extinction of small sexual populations (N ∼ 700) is shown, and our results confirm the assumption that the existence of recombination might be a powerful mechanism to avoid extinction.

  5. Approximate Inference for Time-Varying Interactions and Macroscopic Dynamics of Neural Populations

    PubMed Central

    Obermayer, Klaus

    2017-01-01

    Models from statistical physics, such as the Ising model, offer a convenient way to characterize stationary activity of neural populations. Such stationary activity of neurons may be expected for recordings from in vitro slices or anesthetized animals. However, modeling the activity of cortical circuitries of awake animals has been more challenging because both spike rates and interactions can change according to sensory stimulation, behavior, or an internal state of the brain. Previous approaches to modeling the dynamics of neural interactions suffer from high computational cost; therefore, their application was limited to only a dozen neurons. Here, by introducing multiple analytic approximation methods to a state-space model of neural population activity, we make it possible to estimate dynamic pairwise interactions of up to 60 neurons. More specifically, we applied the pseudolikelihood approximation to the state-space model, and combined it with the Bethe or TAP mean-field approximation to make the sequential Bayesian estimation of the model parameters possible. The large-scale analysis allows us to investigate dynamics of macroscopic properties of neural circuitries underlying stimulus processing and behavior. We show on simulated data that the model accurately estimates dynamics of network properties such as sparseness, entropy, and heat capacity, and demonstrate the utility of these measures by analyzing activity of monkey V4 neurons as well as a simulated balanced network of spiking neurons. PMID:28095421

  6. The use of microsatellite variation to infer population structure and demographic history in a natural model system.

    PubMed

    Goldstein, D B; Roemer, G W; Smith, D A; Reich, D E; Bergman, A; Wayne, R K

    1999-02-01

    To assess the reliability of genetic markers it is important to compare inferences that are based on them to a priori expectations. In this article we present an analysis of microsatellite variation within and among populations of island foxes (Urocyon littoralis) on California's Channel Islands. We first show that microsatellite variation at a moderate number of loci (19) can provide an essentially perfect description of the boundaries between populations and an accurate representation of their historical relationships. We also show that the pattern of variation across unlinked microsatellite loci can be used to test whether population size has been constant or increasing. Application of these approaches to the island fox system indicates that microsatellite variation may carry considerably more information about population history than is currently being used.

  7. Urothelial cancer gene regulatory networks inferred from large-scale RNAseq, Bead and Oligo gene expression data.

    PubMed

    de Matos Simoes, Ricardo; Dalleau, Sabine; Williamson, Kate E; Emmert-Streib, Frank

    2015-05-14

    Urothelial pathogenesis is a complex process driven by an underlying network of interconnected genes. The identification of novel genomic target regions and gene targets that drive urothelial carcinogenesis is crucial in order to improve our current limited understanding of urothelial cancer (UC) on the molecular level. The inference of genome-wide gene regulatory networks (GRN) from large-scale gene expression data provides a promising approach for a detailed investigation of the underlying network structure associated with urothelial carcinogenesis. In our study we inferred and compared three GRNs by the application of the BC3Net inference algorithm to large-scale transitional cell carcinoma gene expression data sets from Illumina RNAseq (179 samples), Illumina Bead arrays (165 samples) and Affymetrix Oligo microarrays (188 samples). We investigated the structural and functional properties of GRNs for the identification of molecular targets associated with urothelial cancer. We found that the UC GRNs show a significant enrichment of subnetworks that are associated with known cancer hallmarks including cell cycle, immune response, signaling, differentiation and translation. Interestingly, the most prominent subnetworks of co-located genes were found on chromosome regions 5q31.3 (RNAseq), 8q24.3 (Oligo) and 1q23.3 (Bead), which all represent known genomic regions frequently deregulated or aberrant in urothelial cancer and other cancer types. Furthermore, the identified hub genes of the individual GRNs, e.g., HID1/DMC1 (tumor development), RNF17/TDRD4 (cancer antigen) and CYP4A11 (angiogenesis/metastasis) are known cancer associated markers. The GRNs were highly dataset specific on the interaction level between individual genes, but showed large similarities on the biological function level represented by subnetworks. Remarkably, the RNAseq UC GRN showed twice the proportion of significant functional subnetworks. Based on our analysis of inferential

  8. Inferring the higher-order phylogeny of mosses (Bryophyta) and relatives using a large, multigene plastid data set.

    PubMed

    Chang, Ying; Graham, Sean W

    2011-05-01

    Investigating the early diversification of major clades requires well-corroborated and accurate phylogenetic inferences. We examined the performance of a large set of plastid genes for inferring the broad phylogenetic backbone of mosses, the second largest major clade of land plants, and their nearest relatives. We surveyed 14-17 plastid genes from a broadly representative taxonomic sampling of the major bryophyte lineages, including all major lines of non-peristomate mosses. We examined how well these new data corroborated or contradicted the findings of other studies, and investigated the effect of removing rapidly evolving characters. KEY RESULT: We inferred major clades with at least as strong support as other studies that used more taxa. We corroborated current views of overall embryophyte relationships, i.e., (liverworts, (mosses, (hornworts, tracheophytes))), with strong maximum likelihood (ML) bootstrap support, and also placed Zygnematales as the sister group of embryophytes with moderate ML bootstrap support. Within mosses, we confirmed Oedipodiaceae as the sister group of the large clade of peristomate taxa. Likelihood analysis also firmly placed Takakiaceae as the sister group of all other mosses, a strong conflict with parsimony results. Parsimony converged on the Takakia-sister result when rapidly evolving characters were removed, depending on the tree used to classify the site rates. Our findings broadly support the utility of a 14-gene set from the plastome for future, more densely sampled phylogenetic studies of mosses and relatives, potentially complementing anticipated whole-plastome studies. Likelihood and parsimony conflicts flag possible instances of long-branch attraction, including one involving the earliest split in moss phylogeny.

  9. A viterbi approach to topology inference for large scale endomicroscopy video mosaicing.

    PubMed

    Vercauteren, Tom; Rosa, Benoît; Dauguet, Julien

    2013-01-01

    Endomicroscopy allows in vivo and in situ imaging with cellular resolution. One limitation of endomicroscopy is the small field of view which can however be extended using mosaicing techniques. In this paper, we describe a methodological framework aiming to reconstruct a mosaic of endomicroscopic images acquired following a noisy robotized spiral trajectory. First, we infer the topology of the frames, that is the map of neighbors for every frame in the spiral. For this, we use a Viterbi algorithm considering every new acquired frame in the current branch of the spiral as an observation and the index of the best neighboring frame from the previous branch as the underlying state. Second, the estimated transformation between each spatial pair previously found is assessed. Mosaicing is performed based only on the pairs of frames for which the registration is considered successful. We tested our method on 3 spiral endomicroscopy videos each including more than 200 frames: a printed grid, an ex vivo tissue sample and an in vivo animal trial. Results were statistically significantly improved compared to reconstruction where only registration between successive frames was used.
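
    The Viterbi step described above can be sketched generically as follows. This is a textbook log-space Viterbi decoder, not the authors' implementation: in the abstract's setting the hidden state would be the index of the best neighboring frame on the previous spiral branch and each newly acquired frame would be one observation, but the function name, argument layout, and the probabilities in the usage note are assumptions made for illustration only.

```python
import numpy as np


def viterbi(log_init, log_trans, log_emit):
    """Most likely hidden state sequence of a hidden Markov model.

    log_init:  (S,)   log prior of each state at the first step.
    log_trans: (S, S) log transition scores, indexed [from_state, to_state].
    log_emit:  (T, S) log-likelihood of observation t under each state.
    Returns a list of T state indices.
    """
    log_emit = np.asarray(log_emit, dtype=float)
    log_trans = np.asarray(log_trans, dtype=float)
    n_steps, n_states = log_emit.shape

    score = np.asarray(log_init, dtype=float) + log_emit[0]
    back = np.zeros((n_steps, n_states), dtype=int)

    for t in range(1, n_steps):
        # cand[i, j]: best score of a path that is in state i at t-1
        # and moves to state j at t.
        cand = score[:, None] + log_trans
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]

    # Trace the best path backwards from the best final state.
    path = [int(score.argmax())]
    for t in range(n_steps - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

    For example, with two states, a "sticky" transition matrix `np.log([[0.9, 0.1], [0.1, 0.9]])`, a uniform prior, and emissions that favor state 0 for the first two observations and state 1 for the last two, the decoder returns the path `[0, 0, 1, 1]`.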

  10. Phylogenetic Inferences Reveal a Large Extent of Novel Biodiversity in Chemically Rich Tropical Marine Cyanobacteria

    PubMed Central

    Gunasekera, Sarath P.; Gerwick, William H.

    2013-01-01

    Benthic marine cyanobacteria are known for their prolific biosynthetic capacities to produce structurally diverse secondary metabolites with biomedical application and their ability to form cyanobacterial harmful algal blooms. In an effort to provide taxonomic clarity to better guide future natural product drug discovery investigations and harmful algal bloom monitoring, this study investigated the taxonomy of tropical and subtropical natural product-producing marine cyanobacteria on the basis of their evolutionary relatedness. Our phylogenetic inferences of marine cyanobacterial strains responsible for over 100 bioactive secondary metabolites revealed an uneven taxonomic distribution, with a few groups being responsible for the vast majority of these molecules. Our data also suggest a high degree of novel biodiversity among natural product-producing strains that was previously overlooked by traditional morphology-based taxonomic approaches. This unrecognized biodiversity is primarily due to a lack of proper classification systems since the taxonomy of tropical and subtropical, benthic marine cyanobacteria has only recently been analyzed by phylogenetic methods. This evolutionary study provides a framework for a more robust classification system to better understand the taxonomy of tropical and subtropical marine cyanobacteria and the distribution of natural products in marine cyanobacteria. PMID:23315747

  11. Inference of Super-exponential Human Population Growth via Efficient Computation of the Site Frequency Spectrum for Generalized Models.

    PubMed

    Gao, Feng; Keinan, Alon

    2016-01-01

    The site frequency spectrum (SFS) and other genetic summary statistics are at the heart of many population genetic studies. Previous studies have shown that human populations have undergone a recent epoch of fast growth in effective population size. These studies assumed that growth is exponential, and the ensuing models leave an excess amount of extremely rare variants. This suggests that human populations might have experienced a recent growth with speed faster than exponential. Recent studies have introduced a generalized growth model where the growth speed can be faster or slower than exponential. However, only simulation approaches were available for obtaining summary statistics under such generalized models. In this study, we provide expressions to accurately and efficiently evaluate the SFS and other summary statistics under generalized models, which we further implement in a publicly available software. Investigating the power to infer deviation of growth from being exponential, we observed that adequate sample sizes facilitate accurate inference; e.g., a sample of 3000 individuals with the amount of data expected from exome sequencing allows observing and accurately estimating growth with speed deviating by ≥10% from that of exponential. Applying our inference framework to data from the NHLBI Exome Sequencing Project, we found that a model with a generalized growth epoch fits the observed SFS significantly better than the equivalent model with exponential growth (P-value [Formula: see text]). The estimated growth speed significantly deviates from exponential (P-value [Formula: see text]), with the best-fit estimate being of growth speed 12% faster than exponential.
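
    For readers unfamiliar with the summary statistic, the sketch below shows one simple way to tabulate an unfolded SFS from a 0/1 haplotype matrix, together with the classical neutral constant-size expectation (entry i proportional to 1/i) as a baseline for comparison. It does not implement the generalized growth model or the analytic expressions of the paper; the function names and the input layout (rows are sampled chromosomes, 1 marks the derived allele) are assumptions made for this example.

```python
import numpy as np


def site_frequency_spectrum(haplotypes):
    """Unfolded SFS of a sample.

    haplotypes: (n, L) array of 0/1, rows = chromosomes, columns = sites,
                1 = derived allele (assumed input layout).
    Returns an array of length n - 1 whose entry i - 1 is the number of
    sites at which exactly i of the n chromosomes carry the derived allele.
    """
    h = np.asarray(haplotypes, dtype=int)
    n = h.shape[0]
    derived_counts = h.sum(axis=0)
    full = np.bincount(derived_counts, minlength=n + 1)
    # Drop counts 0 and n: those sites are not segregating in the sample.
    return full[1:n]


def expected_sfs_constant_size(n, theta):
    """Expected SFS under the standard neutral constant-size model: theta / i."""
    return theta / np.arange(1, n)
```

    An excess of observed singletons and other very rare variants relative to a baseline of this kind is the sort of signal that the abstract attributes to recent rapid growth.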

  12. Inference of functional properties from large-scale analysis of enzyme superfamilies.

    PubMed

    Brown, Shoshana D; Babbitt, Patricia C

    2012-01-02

    As increasingly large amounts of data from genome and other sequencing projects become available, new approaches are needed to determine the functions of the proteins these genes encode. We show how large-scale computational analysis can help to address this challenge by linking functional information to sequence and structural similarities using protein similarity networks. Network analyses using three functionally diverse enzyme superfamilies illustrate the use of these approaches for facile updating and comparison of available structures for a large superfamily, for creation of functional hypotheses for metagenomic sequences, and to summarize the limits of our functional knowledge about even well studied superfamilies.

  13. Inferring Population Decline and Expansion From Microsatellite Data: A Simulation-Based Evaluation of the Msvar Method

    PubMed Central

    Girod, Christophe; Vitalis, Renaud; Leblois, Raphaël; Fréville, Hélène

    2011-01-01

    Reconstructing the demographic history of populations is a central issue in evolutionary biology. Using likelihood-based methods coupled with Monte Carlo simulations, it is now possible to reconstruct past changes in population size from genetic data. Using simulated data sets under various demographic scenarios, we evaluate the statistical performance of Msvar, a full-likelihood Bayesian method that infers past demographic change from microsatellite data. Our simulation tests show that Msvar is very efficient at detecting population declines and expansions, provided the event is neither too weak nor too recent. We further show that Msvar outperforms two moment-based methods (the M-ratio test and Bottleneck) for detecting population size changes, whatever the time and the severity of the event. The same trend emerges from a compilation of empirical studies. The latest version of Msvar provides estimates of the current and the ancestral population size and the time since the population started changing in size. We show that, in the absence of prior knowledge, Msvar provides little information on the mutation rate, which results in biased estimates and/or wide credibility intervals for each of the demographic parameters. However, scaling the population size parameters with the mutation rate and scaling the time with current population size, as coalescent theory requires, significantly improves the quality of the estimates for contraction but not for expansion scenarios. Finally, our results suggest that Msvar is robust to moderate departures from a strict stepwise mutation model. PMID:21385729

  14. Inferring Passenger Denial Behavior of Taxi Drivers from Large-Scale Taxi Traces.

    PubMed

    Zhang, Sihai; Wang, Zhiyang

    2016-01-01

    Understanding individual human actions, which drive many social, technological, racial, religious and economic phenomena, is a fundamental question for modern science. Human dynamics tries to reveal the temporal patterns and internal mechanisms of human actions in letter or electronic communications, from the perspective of continuous interactions among friends or acquaintances. For interactions between strangers, the taxi industry provides fruitful phenomena and evidence for investigating action decisions. In fact, one strikingly disturbing event commonly reported in the taxi industry is passenger refusal or denial, whose reasons vary and include skin color, blindness, being a foreigner, a destination that is too close, religion, and hostility toward a specific nationality, so complaints about taxi passenger refusal have to be handled carefully by local governments. More universal factors behind this phenomenon are of great significance, and big data research may provide novel insights into this question. In this paper, we demonstrate the application of big data analytics to massive taxi trace data, which, for the first time, validates passenger denial in the taxi industry and estimates the denial ratio in Beijing. We first quantify the income differences among taxi drivers. We then find that the choice of drop-off places also contributes to high income for taxi drivers, in contrast to the previous explanation of mobility intelligence. Moreover, we propose the concepts of pick-up, drop-off and grid diversity, and the related diversity analysis suggests that high-income taxi drivers deny passengers in some situations in order to choose the passenger destinations they prefer. Finally, we design an estimation method for the denial ratio and infer that high-income taxi drivers deny passengers with 8.52% likelihood in Beijing. Our work exhibits the power of big data analysis in

  15. Inferring Passenger Denial Behavior of Taxi Drivers from Large-Scale Taxi Traces

    PubMed Central

    Zhang, Sihai; Wang, Zhiyang

    2016-01-01

    Understanding individual human actions, which drive many social, technological, racial, religious and economic phenomena, is a fundamental question for modern science. Human dynamics tries to reveal the temporal patterns and internal mechanisms of human actions in letter or electronic communications, from the perspective of continuous interactions among friends or acquaintances. For interactions between strangers, the taxi industry provides fruitful phenomena and evidence for investigating action decisions. In fact, one strikingly disturbing event commonly reported in the taxi industry is passenger refusal or denial, whose reasons vary and include skin color, blindness, being a foreigner, a destination that is too close, religion, and hostility toward a specific nationality, so complaints about taxi passenger refusal have to be handled carefully by local governments. More universal factors behind this phenomenon are of great significance, and big data research may provide novel insights into this question. In this paper, we demonstrate the application of big data analytics to massive taxi trace data, which, for the first time, validates passenger denial in the taxi industry and estimates the denial ratio in Beijing. We first quantify the income differences among taxi drivers. We then find that the choice of drop-off places also contributes to high income for taxi drivers, in contrast to the previous explanation of mobility intelligence. Moreover, we propose the concepts of pick-up, drop-off and grid diversity, and the related diversity analysis suggests that high-income taxi drivers deny passengers in some situations in order to choose the passenger destinations they prefer. Finally, we design an estimation method for the denial ratio and infer that high-income taxi drivers deny passengers with 8.52% likelihood in Beijing. Our work exhibits the power of big data analysis in

  16. The Use of a Satellite Climatological Data Set to Infer Large Scale Three Dimensional Flow Characteristics

    NASA Technical Reports Server (NTRS)

    Lerner, Jeffrey A.; Jedlovec, Gary J.; Atkinson, Robert J.

    1998-01-01

    Ever since the first satellite image loops from the 6.3 micron water vapor channel on the METEOSAT-1 in 1978, there have been numerous efforts (many to a great degree of success) to relate the water vapor radiance patterns to familiar atmospheric dynamic quantities. The realization of these efforts is becoming evident with the merging of satellite derived winds into predictive models (Velden et al., 1997; Swadley and Goerss, 1989). Another parameter that has been quantified from satellite water vapor channel measurements is upper tropospheric relative humidity (UTH) (e.g., Soden and Bretherton, 1996; Schmetz and Turpeinen, 1988). These humidity measurements, in turn, can be used to quantify upper tropospheric water vapor and its transport to more accurately diagnose climate changes (Lerner et al., 1998; Schmetz et al. 1995a) and quantify radiative processes in the upper troposphere. Also apparent in water vapor imagery animations are regions of subsiding and ascending air flow. Indeed, a component of the translated motions we observe are due to vertical velocities. The few attempts at exploiting this information have been met with a fair degree of success. Picon and Desbois (1990) statistically related Meteosat monthly mean water vapor radiances to six standard pressure levels of the European Centre for Medium Range Weather Forecast (ECMWF) model vertical velocities and found correlation coefficients of about 0.50 or less. This paper presents some preliminary results of viewing climatological satellite water vapor data in a different fashion. Specifically, we attempt to infer the three dimensional flow characteristics of the mid- to upper troposphere as portrayed by GOES VAS during the warm ENSO event (1987) and a subsequent cold period in 1998.

  17. Models of population-based analyses for data collected from large extended families.

    PubMed

    Wang, Wenyu; Lee, Elisa T; Howard, Barbara V; Fabsitz, Richard R; Devereux, Richard B; MacCluer, Jean W; Laston, Sandra; Comuzzie, Anthony G; Shara, Nawar M; Welty, Thomas K

    2010-12-01

    Large studies of extended families usually collect valuable phenotypic data that may have scientific value for purposes other than testing genetic hypotheses if the families were not selected in a biased manner. These purposes include assessing population-based associations of diseases with risk factors/covariates and estimating population characteristics such as disease prevalence and incidence. Relatedness among participants, however, violates the traditional assumption of independent observations in these classic analyses. The commonly used adjustment method for relatedness in population-based analyses is to use marginal models, in which clusters (families) are assumed to be independent (unrelated) with a simple and identical covariance (family) structure, such as the independent, exchangeable and unstructured covariance structures. However, these simple covariance structures may not be appropriate for outcomes collected from large extended families, and may under- or over-estimate the variances of estimators and thus lead to uncertainty in inferences. Moreover, the assumption that families are unrelated with an identical family structure in a marginal model may not be satisfied for family studies with large extended families. The aim of this paper is to propose models that combine marginal model approaches with a covariance structure for assessing population-based associations of diseases with their risk factors/covariates and estimating population characteristics for epidemiological studies, while adjusting for the complicated relatedness among outcomes (continuous/categorical, normally/non-normally distributed) collected from large extended families. We also discuss theoretical issues of the proposed models and show that the proposed models and covariance structure are appropriate for and capable of achieving the aim.

  18. Cu and Zn in different stellar populations: Inferring their astrophysical origin

    NASA Astrophysics Data System (ADS)

    Bisterzo, S.; Pompeia, L.; Gallino, R.; Pignatari, M.; Cunha, K.; Heger, A.; Smith, V.

    2005-07-01

    Copper and zinc behave differently in unevolved stars of various metallicities and stellar populations. Current hypotheses on the astrophysical origin of both elements are highly debated. It has been advanced in previous works ([Matteucci, F., Raiteri, C., Busso, M., Gallino, R. and Gratton, R. A&A 272 (1993) 421; Mishenina, T.V. et al. A&A 396 (2002) 189]) that most solar Cu and Zn were synthesized in Type Ia supernovae, although present theory of SNIa explosions predicts very little contribution to both elements [Thielemann, F.-K., Nomoto, K. and Yokoi, K. A&A 158 (1986) 17]. We have collected a large sample of recent high-resolution spectroscopic observations of unevolved stars in the Galactic halo, thick disk and thin disk, in bulge-like stars, globular clusters, Omega Cen, and dwarf spheroidal systems. We then compare spectroscopic observations of Cu and Zn with present stellar nucleosynthesis theory. Cu is the best signature of a secondary-like production in massive stars by neutron captures, with a small primary contribution by explosive nucleosynthesis. Zn needs a more complex description. No extra contribution by SNIa is required.

  19. Fine-scale variation in meiotic recombination in Mimulus inferred from population shotgun sequencing.

    PubMed

    Hellsten, Uffe; Wright, Kevin M; Jenkins, Jerry; Shu, Shengqiang; Yuan, Yaowu; Wessler, Susan R; Schmutz, Jeremy; Willis, John H; Rokhsar, Daniel S

    2013-11-26

    Meiotic recombination rates can vary widely across genomes, with hotspots of intense activity interspersed among cold regions. In yeast, hotspots tend to occur in promoter regions of genes, whereas in humans and mice, hotspots are largely defined by binding sites of the positive-regulatory domain zinc finger protein 9. To investigate the detailed recombination pattern in a flowering plant, we use shotgun resequencing of a wild population of the monkeyflower Mimulus guttatus to precisely locate over 400,000 boundaries of historic crossovers or gene conversion tracts. Their distribution defines some 13,000 hotspots of varying strengths, interspersed with cold regions of undetectably low recombination. Average recombination rates peak near starts of genes and fall off sharply, exhibiting polarity. Within genes, recombination tracts are more likely to terminate in exons than in introns. The general pattern is similar to that observed in yeast, as well as in positive-regulatory domain zinc finger protein 9-knockout mice, suggesting that recombination initiation described here in Mimulus may reflect ancient and conserved eukaryotic mechanisms.

  20. Population bottleneck and effective size in Bonamia ostreae-resistant populations of Ostrea edulis as inferred by microsatellite markers.

    PubMed

    Launey, S; Barre, M; Gerard, A; Naciri-Graven, Y

    2001-12-01

    Genetic variability at five microsatellite loci was analysed in three hatchery-propagated populations of the flat oyster, Ostrea edulis. These populations were part of a selection programme for resistance to the protozoan parasite Bonamia ostreae and were produced by mass spawns, without control of the genealogy. Evidence for population bottlenecks and inbreeding was sought. A reduction in the number of alleles, mainly due to the loss of rare alleles, was observed in all selected populations, relative to the natural population from which they were derived. Heterozygote excesses were observed in two populations, and were attributed to substructuring of the population into a small number of families. Pedigree reconstruction showed that these two populations were produced by at most two spawning events involving a limited number of parents. Most individuals within these populations are half- or full-sibs, as shown by relatedness coefficients. The occurrence of population bottlenecks was supported by estimates of effective number of breeders derived by three methods: temporal variance in allelic frequencies, heterozygote excess, and a new method based on reduction in the number of alleles. The estimates from the different methods were consistent. The bottlenecks and small effective number of breeders are expected to lead to increasing inbreeding and have important consequences for the future management of the three O. edulis selected populations.

  1. Application of a time-dependent coalescence process for inferring the history of population size changes from DNA sequence data

    PubMed Central

    Polanski, Andrzej; Kimmel, Marek; Chakraborty, Ranajit

    1998-01-01

    Distribution of pairwise differences of nucleotides from data on a sample of DNA sequences from a given segment of the genome has been used in the past to draw inferences about the past history of population size changes. However, all earlier methods assume a given model of population size changes (such as sudden expansion), parameters of which (e.g., time and amplitude of expansion) are fitted to the observed distributions of nucleotide differences among pairwise comparisons of all DNA sequences in the sample. Our theory indicates that for any time-dependent population size, N(τ) (in which time τ is counted backward from present), a time-dependent coalescence process yields the distribution, p(τ), of the time of coalescence between two DNA sequences randomly drawn from the population. Prediction of p(τ) and N(τ) requires the use of a reverse Laplace transform known to be unstable. Nevertheless, simulated data obtained from three models of monotone population change (stepwise, exponential, and logistic) indicate that the pattern of a past population size change leaves its signature on the pattern of DNA polymorphism. Application of the theory to the published mtDNA sequences indicates that the current mtDNA sequence variation is not inconsistent with a logistic growth of the human population. PMID:9576903
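
    The relation between N(τ) and p(τ) described above can be sketched numerically. The abstract does not give the formula, so this sketch assumes the standard pairwise coalescent scaling, in which two lineages coalesce at rate 1/N(t) at time t before present, giving p(τ) = exp(-∫₀^τ dt/N(t)) / N(τ). The function names and the logistic parameters are hypothetical and chosen only for illustration.

```python
import math


def coalescence_density(pop_size, tau, steps=10_000):
    """Density p(tau) of the pairwise coalescence time.

    Assumption (not stated in the abstract): two lineages coalesce at
    rate 1/N(t) at time t before present, so
    p(tau) = exp(-integral_0^tau dt / N(t)) / N(tau).
    """
    if tau < 0:
        raise ValueError("tau must be non-negative")
    h = tau / steps
    # Midpoint rule for the cumulative coalescence hazard on [0, tau].
    hazard = sum(h / pop_size((i + 0.5) * h) for i in range(steps))
    return math.exp(-hazard) / pop_size(tau)


def logistic_size(tau, n_now=10_000.0, n_past=1_000.0, t_mid=2_000.0, rate=0.005):
    """Hypothetical logistic growth, viewed backward in time.

    Uses tanh so that very large tau does not overflow:
    1 / (1 + exp(x)) == 0.5 * (1 - tanh(x / 2)).
    """
    x = rate * (tau - t_mid)
    return n_past + (n_now - n_past) * 0.5 * (1.0 - math.tanh(x / 2.0))


# Density of coalescing 1,500 time units ago under the illustrative model.
p_example = coalescence_density(logistic_size, 1_500.0)
```

    For a constant population size the sketch reduces to the exponential density exp(-τ/N)/N, which is a convenient sanity check. Recovering N(τ) from an observed p(τ) is the unstable inverse step the abstract mentions and is not attempted here.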

  2. Large-scale atmospheric carbon and surface water dynamics inferred from satellite-based observations

    NASA Astrophysics Data System (ADS)

    Jensen, K.; McDonald, K. C.; Krakauer, N.; Schroeder, R.

    2013-12-01

    The sensitivity of Earth's wetlands to observed shifts in global precipitation and temperature patterns and their ability to produce large quantities of climate-active gases are key global change questions. Surface inundation is a crucial state variable that affects the rate of land-atmosphere carbon exchange and the partitioning of carbon between CO2 and CH4. Ground observation networks of large-scale inundation patterns are sparse because they require large fiscal, technological and human resources. Thus, satellite remote sensing products for global inundation dynamics, as well as total water storage and atmospheric carbon, can provide a complete synoptic view of past and current carbon - surface water dynamics over large areas that otherwise could not be assessed. We present results from a correlative analysis between spaceborne measurements of CO2 and CH4 as observed by SCIAMACHY and AIRS, water storage (derived from gravity anomalies provided by NASA's GRACE mission), and inundated water fraction derived from a combination of active and passive microwave remote sensing datasets. A general assessment is conducted globally, and further time-series analysis is focused on four regions of interest: North Amazon, Congo, Ob, and Ganges-Brahmaputra river basins. This analysis was supported by a grant from the NASA Terrestrial Ecology Program and the development of the inundation datasets was supported by the NASA MEaSUREs program.

  3. Argentine Population Genetic Structure: Large Variance in Amerindian Contribution

    PubMed Central

    Seldin, Michael F.; Tian, Chao; Shigeta, Russell; Scherbarth, Hugo R.; Silva, Gabriel; Belmont, John W.; Kittles, Rick; Gamron, Susana; Allevi, Alberto; Palatnik, Simon A.; Alvarellos, Alejandro; Paira, Sergio; Caprarulo, Cesar; Guillerón, Carolina; Catoggio, Luis J.; Prigione, Cristina; Berbotto, Guillermo A.; García, Mercedes A.; Perandones, Carlos E.; Pons-Estel, Bernardo A.; Alarcon-Riquelme, Marta E.

    2011-01-01

    Argentine population genetic structure was examined using a set of 78 ancestry informative markers (AIMs) to assess the contributions of European, Amerindian, and African ancestry in 94 individual members of this population. Using the Bayesian clustering algorithm STRUCTURE, the mean European contribution was 78%, the Amerindian contribution was 19.4%, and the African contribution was 2.5%. Similar results were found using a weighted least mean square method: European, 80.2%; Amerindian, 18.1%; and African, 1.7%. Consistent with previous studies, the current results showed very few individuals (four of 94) with greater than 10% African admixture. Notably, when individual admixture was examined, the Amerindian and European admixture showed a very large variance and individual Amerindian contribution ranged from 1.5 to 84.5% in the 94 individual Argentine subjects. These results indicate that admixture must be considered when clinical epidemiology or case control genetic analyses are studied in this population. Moreover, the current study provides a set of informative SNPs that can be used to ascertain or control for this potentially hidden stratification. In addition, the large variance in admixture proportions in individual Argentine subjects shown by this study suggests that this population is appropriate for future admixture mapping studies. PMID:17177183

  4. Final Report: Large-Scale Optimization for Bayesian Inference in Complex Systems

    SciTech Connect

    Ghattas, Omar

    2013-10-15

    The SAGUARO (Scalable Algorithms for Groundwater Uncertainty Analysis and Robust Optimization) Project focuses on the development of scalable numerical algorithms for large-scale Bayesian inversion in complex systems that capitalize on advances in large-scale simulation-based optimization and inversion methods. Our research is directed in three complementary areas: efficient approximations of the Hessian operator, reductions in complexity of forward simulations via stochastic spectral approximations and model reduction, and employing large-scale optimization concepts to accelerate sampling. Our efforts are integrated in the context of a challenging testbed problem that considers subsurface reacting flow and transport. The MIT component of the SAGUARO Project addresses the intractability of conventional sampling methods for large-scale statistical inverse problems by devising reduced-order models that are faithful to the full-order model over a wide range of parameter values; sampling then employs the reduced model rather than the full model, resulting in very large computational savings. Results indicate little effect on the computed posterior distribution. On the other hand, in the Texas-Georgia Tech component of the project, we retain the full-order model, but exploit inverse problem structure (adjoint-based gradients and partial Hessian information of the parameter-to-observation map) to implicitly extract lower dimensional information on the posterior distribution; this greatly speeds up sampling methods, so that fewer sampling points are needed. We can think of these two approaches as "reduce then sample" and "sample then reduce." In fact, these two approaches are complementary, and can be used in conjunction with each other. Moreover, they both exploit deterministic inverse problem structure, in the form of adjoint-based gradient and Hessian information of the underlying parameter-to-observation map, to achieve their speedups.

  5. Overcoming autocorrelation biases for causal inference in large nonlinear geoscientific time series datasets

    NASA Astrophysics Data System (ADS)

    Runge, Jakob; Sejdinovic, Dino; Flaxman, Seth

    2017-04-01

    Causal discovery methods for geoscientific time series datasets aim at detecting potentially causal statistical associations that cannot be explained by other variables in the dataset. A large-scale complex system like the Earth presents major challenges for methods such as Granger causality. In particular, its high dimensionality and strong autocorrelations lead to low detection power, distorting biases, and unreliable hypothesis tests. Here we introduce a reliable method that outperforms current approaches in detection power and overcomes detection biases, making it suitable to detect even weak causal signals in large-scale geoscientific datasets. We illustrate the method's capabilities on the global surface-pressure system where we unravel spurious associations and find several potentially causal links that are difficult to detect with standard methods, focusing in particular on drivers of the NAO.

  6. Very Large Graphs for Information Extraction (VLG) Detection and Inference in the Presence of Uncertainty

    DTIC Science & Technology

    2015-09-21

    on a large supercomputing cluster. The outcomes of the experiments and data characterizations lead to the recommendation that future work use models...of clustering coefficient, PageRank and eigencentrality, as these are the most prominent differences among the several real datasets analyzed. iii...commodity computing cluster. In a second one-year study, MIT LL considered the impact of data uncertainty and corruption on detection performance using

  7. Inferring single-cell behaviour from large-scale epithelial sheet migration patterns.

    PubMed

    Lee, Rachel M; Yue, Haicen; Rappel, Wouter-Jan; Losert, Wolfgang

    2017-05-01

    Cell migration plays an important role in a wide variety of biological processes and can incorporate both individual cell motion and collective behaviour. The emergent properties of collective migration are receiving increasing attention as collective motion's role in diseases such as metastatic cancer becomes clear. Yet, how individual cell behaviour influences large-scale, multi-cell collective motion remains unclear. In this study, we provide insight into the mechanisms behind collective migration by studying cell migration in a spreading monolayer of epithelial MCF10A cells. We quantify migration using particle image velocimetry and find that cell groups have features of motion that span multiple length scales. Comparing our experimental results to a model of collective cell migration, we find that cell migration within the monolayer can be affected in qualitatively different ways by cell motion at the boundary, yet it is not necessary to introduce leader cells at the boundary or specify other large-scale features to recapitulate this large-scale phenotype in simulations. Instead, in our model, collective motion can be enhanced by increasing the overall activity of the cells or by giving the cells a stronger coupling between their motion and polarity. This suggests that investigating the activity and polarity persistence of individual cells will add insight into the collective migration phenotypes observed during development and disease. © 2017 The Author(s).

  8. Genetic diversity in the Homosporous Fern Ophioglossum vulgatum (Ophioglossaceae) from South Korea: inference of mating system and population history.

    PubMed

    Chung, Mi Yoon; López-Pujol, Jordi; Chung, Jae Min; Moon, Myung-Ok; Chung, Myong Gi

    2013-03-01

    It is generally believed that the members of Ophioglossaceae have subterranean, potentially bisexual gametophytes, which favor intragametophytic selfing. In Ophioglossaceae, previous allozyme studies revealed substantial inbreeding within Botrychium species and Mankyua chejuense. However, little is known about the mating system in species of the genus Ophioglossum. Molecular marker analyses can provide insights into the relative occurrence of selfing versus cross-fertilization in the species of Ophioglossum. We investigated allozyme variation in 8 Korean populations of the homosporous fern Ophioglossum vulgatum to infer its mating system and to get some insight into the population-establishment history in South Korea. We detected homozygous genotypes for alternative alleles at several loci, which suggest the occurrence of intragametophytic self-fertilization. Populations harbor low within-population variation (%P = 7.2, A = 1.08, and He = 0.026) and a high among-population differentiation (FST = 0.733). This, together with the finding that alternative alleles were fixed at several loci, suggests that the number and size of populations of O. vulgatum might have been severely reduced during the last glaciation (i.e., due to its in situ persistence in small, isolated refugia). The combined effects of severe random genetic drift and high rates of intragametophytic selfing are likely responsible for the genetic structure displayed by this homosporous fern. Its low levels of genetic diversity in South Korea justify the implementation of some conservation measures to ensure its long-term preservation.
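
    To make the two summary statistics concrete, the sketch below computes expected heterozygosity and a fixation index from allele frequencies at one locus. The abstract does not say which estimator the authors used; this is the simple textbook form FST = (HT - HS) / HT with equally weighted populations and no sample-size correction, and the function names are hypothetical.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p_i^2) for one locus."""
    return 1.0 - sum(p * p for p in freqs)


def fixation_index(pop_freqs):
    """FST = (HT - HS) / HT for one locus.

    pop_freqs is a list of per-population allele-frequency lists, all
    in the same allele order. Populations are weighted equally, which
    is an assumption of this sketch.
    """
    n_pops = len(pop_freqs)
    n_alleles = len(pop_freqs[0])
    # HS: mean within-population expected heterozygosity.
    h_s = sum(expected_heterozygosity(f) for f in pop_freqs) / n_pops
    # HT: expected heterozygosity of the pooled (mean) allele frequencies.
    mean_freqs = [sum(f[a] for f in pop_freqs) / n_pops for a in range(n_alleles)]
    h_t = expected_heterozygosity(mean_freqs)
    if h_t == 0.0:
        return 0.0  # locus is monomorphic everywhere
    return (h_t - h_s) / h_t


# Two populations fixed for alternative alleles are fully differentiated.
fully_differentiated = fixation_index([[1.0, 0.0], [0.0, 1.0]])
```

    Populations fixed for alternative alleles, as reported at several loci here, push this ratio toward 1, while identical allele frequencies give 0.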

  9. Recurrent large earthquakes in a fault region: What can be inferred from small and intermediate events?

    NASA Astrophysics Data System (ADS)

    Zoeller, G.; Hainzl, S.; Holschneider, M.

    2008-12-01

    We present a renewal model for the recurrence of large earthquakes in a fault zone consisting of a major fault and surrounding smaller faults with Gutenberg-Richter type seismicity represented by seismic moment release drawn from a truncated power-law distribution. The recurrence times of characteristic earthquakes for the major fault are explored. It is continuously loaded (plate motion) and undergoes positive and negative fluctuations due to adjacent smaller faults, with a large number Neq of such changes between two major earthquakes. Since the distribution has a finite variance, in the limit Neq→∞ the central limit theorem implies that the recurrence times follow a Brownian passage-time (BPT) distribution. This allows the calculation of individual recurrence time distributions for specific fault zones without tuning free parameters: the mean recurrence time can be estimated from geological or paleoseismic data, and the standard deviation is determined from the frequency-size distribution, namely the Richter b value, of an earthquake catalog. The approach is demonstrated for the Parkfield segment of the San Andreas fault in California as well as for a long simulation of a numerical fault model. Assuming power-law distributed earthquake magnitudes up to the size of the recurrent Parkfield event (M=6), we find a coefficient of variation that is higher than the value obtained by a direct fit of the BPT distribution to seven large earthquakes. Finally we show that uncertainties in the earthquake magnitudes, e.g. from magnitude grouping, can cause a significant bias in the results. A method to correct for the bias as well as a Bayesian technique to account for evolving data are provided.
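
    As a rough illustration of the distribution named above, the BPT density is the inverse Gaussian density, written here in terms of the mean recurrence time and the coefficient of variation. The sketch below is not the authors' code, and the 25-year mean and coefficient of variation of 0.5 are hypothetical inputs, not values from the abstract.

```python
import math


def bpt_pdf(t, mean, cv):
    """Brownian passage-time (inverse Gaussian) density at time t.

    mean is the mean recurrence time and cv the coefficient of
    variation (standard deviation divided by mean).
    """
    if t <= 0.0:
        return 0.0
    scale = math.sqrt(mean / (2.0 * math.pi * cv ** 2 * t ** 3))
    return scale * math.exp(-((t - mean) ** 2) / (2.0 * mean * cv ** 2 * t))


def bpt_cdf(t, mean, cv, steps=20_000):
    """Probability of recurrence before time t, by midpoint integration."""
    if t <= 0.0:
        return 0.0
    h = t / steps
    return sum(bpt_pdf((i + 0.5) * h, mean, cv) for i in range(steps)) * h


# Hypothetical inputs for illustration only: 25-year mean, cv of 0.5.
prob_within_30_years = bpt_cdf(30.0, mean=25.0, cv=0.5)
```

    In the approach described in the abstract, the mean would come from geological or paleoseismic data and the coefficient of variation from the frequency-size distribution of the catalog, so neither is a fitted free parameter.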

  10. Recurrent large earthquakes in a fault region: What can be inferred from small and intermediate events?

    NASA Astrophysics Data System (ADS)

    Zöller, G.; Hainzl, S.; Holschneider, M.

    2009-04-01

    We present a renewal model for the recurrence of large earthquakes in a fault zone consisting of a major fault and surrounding smaller faults with Gutenberg-Richter type seismicity represented by seismic moment release drawn from a truncated power-law distribution. The recurrence times of characteristic earthquakes for the major fault are explored. It is continuously loaded (plate motion) and undergoes positive and negative fluctuations due to adjacent smaller faults, with a large number Neq of such changes between two major earthquakes. Since the distribution has a finite variance, in the limit Neq → ∞ the central limit theorem implies that the recurrence times follow a Brownian passage-time (BPT) distribution. This allows the calculation of individual recurrence time distributions for specific fault zones without tuning free parameters: the mean recurrence time can be estimated from geological or paleoseismic data, and the standard deviation is determined from the frequency-size distribution, namely the Richter b value, of an earthquake catalog. The approach is demonstrated for the Parkfield segment of the San Andreas fault in California as well as for a long simulation of a numerical fault model. Assuming power-law distributed earthquake magnitudes up to the size of the recurrent Parkfield event (M = 6), we find a coefficient of variation that is higher than the value obtained by a direct fit of the BPT distribution to seven large earthquakes. Finally we show that uncertainties in the earthquake magnitudes, e.g. from magnitude grouping, can cause a significant bias in the results. A method to correct for the bias as well as a Bayesian technique to account for evolving data are provided.

  11. Genetic ancestry of a Moroccan population as inferred from autosomal STRs

    PubMed Central

    Bentayebi, K.; Abada, F.; Ihzmad, H.; Amzazi, S.

    2014-01-01

    Detecting population substructure and ancestry is a critical issue for both association studies of health behaviors and forensic genetics. Determining aspects of a population's genetic history as potential sources of substructure can aid in the design of future genetic studies. Within this context, fifteen autosomal short tandem repeats (STRs) were used to examine population genetic structure and hypotheses of the origin of the modern Moroccan population from individuals belonging to three different ethnic groups from Morocco (Arab, Berber and Sahrawi), by comparing their autosomal STR variation with that of neighboring and non-neighboring populations in North Africa, Europe and the Middle East as well as proposed ancestral populations in Morocco (Berber). We report on the results that the gradient of North African ancestry accounts for previous observations of low levels of sharing with Near East and a substantially increased gene flow especially from Morocco and Spain. PMID:25606427

  12. Assessment of hydration status in a large population.

    PubMed

    Baron, Stephanie; Courbebaisse, Marie; Lepicard, Eve M; Friedlander, Gerard

    2015-01-14

    Both acute and chronic dehydration can have important implications for human behaviour and health. Young children, non-autonomous individuals and the elderly are at a greater risk of dehydration. Mild hypertonic dehydration could be related to less efficient cognitive and physical performance and has been reported to be associated with frequently occurring pathological conditions, especially nephrolithiasis. The assessment of hydration status in a large sample appears to be of interest for conducting epidemiological and large clinical studies aimed at improving preventive and curative care. Especially in large-population studies, methods that are used have to be accurate, cheap, quick and require no technical expertise. Body weight change is widely used to determine acute hydration changes, but seems to be insufficiently accurate in longitudinal studies. Bioimpedance analysis methods enable the assessment of total body water content, but their use is still under debate. Because plasma osmolality directly reflects intracellular osmolality, it constitutes a good marker to assess acute hydration changes, but not chronic hydration status because it changes constantly. Moreover, venepuncture is considered to be invasive and is not suitable for a large-sample study, especially in children. Urinary markers appear to be good alternatives for assessing hydration status in large populations. Collection of urine samples is non-invasive and cheap. High technical expertise is not required to perform urinary marker measurements and these measurements can be carried out quickly. Thus, methods based on urinary markers are very well suited for field studies. Urine colour is probably the least sensitive marker despite its high specificity. Urine osmolality and especially urine specific gravity could be easily used for determining hydration status in large-sample studies.

  13. Genetic structure of Mesoamerican populations of Big-leaf mahogany (Swietenia macrophylla) inferred from microsatellite analysis.

    PubMed

    Novick, Rachel Roth; Dick, Christopher W; Lemes, Maristerra R; Navarro, Carlos; Caccone, Adalgisa; Bermingham, Eldredge

    2003-11-01

    While microsatellites have been used to examine genetic structure in local populations of Neotropical trees, genetic studies based on such high-resolution markers have not been carried out for Mesoamerica as a whole. Here we assess the genetic structure of the Mesoamerican mahogany Swietenia macrophylla King (big-leaf mahogany), a Neotropical tree species recently listed as endangered in CITES which is commercially extinct through much of its native range. We used seven variable microsatellite loci to assess genetic diversity and population structure in eight naturally established mahogany populations from six Mesoamerican countries. Measures of genetic differentiation (FST and RST) indicated significant differences between most populations. Unrooted dendrograms based on genetic distances between populations provide evidence of strong phylogeographic structure in Mesoamerican mahogany. The two populations on the Pacific coasts of Costa Rica and Panama were genetically distant from all the others, and from one another. The remaining populations formed two clusters, one comprised of the northern populations of Mexico, Belize and Guatemala and the other containing the southern Atlantic populations of Nicaragua and Costa Rica. Significant correlation was found between geographical distance and all pairwise measures of genetic divergence, suggesting the importance of regional biogeography and isolation by distance in Mesoamerican mahogany. The results of this study demonstrate greater phylogeographic structure than has been found across Amazon basin S. macrophylla. Our findings suggest a relatively complex Mesoamerican biogeographic history and lead to the prediction that other Central American trees will show similar patterns of regional differentiation.

  14. Hydrothermal fluid flow and deformation in large calderas: Inferences from numerical simulations

    USGS Publications Warehouse

    Hurwitz, S.; Christiansen, L.B.; Hsieh, P.A.

    2007-01-01

    Inflation and deflation of large calderas is traditionally interpreted as being induced by volume change of a discrete source embedded in an elastic or viscoelastic half-space, though it has also been suggested that hydrothermal fluids may play a role. To test the latter hypothesis, we carry out numerical simulations of hydrothermal fluid flow and poroelastic deformation in calderas by coupling two numerical codes: (1) TOUGH2 [Pruess et al., 1999], which simulates flow in porous or fractured media, and (2) BIOT2 [Hsieh, 1996], which simulates fluid flow and deformation in a linearly elastic porous medium. In the simulations, high-temperature water (350 °C) is injected at variable rates into a cylinder (radius 50 km, height 3-5 km). A sensitivity analysis indicates that small differences in the values of permeability and its anisotropy, the depth and rate of hydrothermal injection, and the values of the shear modulus may lead to significant variations in the magnitude, rate, and geometry of ground surface displacement, or uplift. Some of the simulated uplift rates are similar to observed uplift rates in large calderas, suggesting that the injection of aqueous fluids into the shallow crust may explain some of the deformation observed in calderas.

  15. Characterization of Arctic Highly Magnetic Domains - the Geophysical Expression of Inferred Large Igneous Province(s)

    NASA Astrophysics Data System (ADS)

    Saltus, R. W.; Oakey, G.; Miller, E. L.; Jackson, R.

    2012-12-01

    The magnetic anomalies of the high arctic are dominated by a large domain (1000 x 1700 km; the High Arctic Magnetic High, HAMH) consisting of numerous high-amplitude magnetic high ridges with a complex set of orientations and by other smaller, but still fundamentally highly magnetic, domains. The magnetic potential anomaly field (also known as pseudogravity) of the HAMH shows a single large intensity high and underscores the crustal-scale thickness of this geophysical feature (which also forms a prominent anomaly on satellite magnetic maps). The seafloor morphology of this region includes the complex linear trends of the Alpha and Mendeleev ridges, but the magnetic expression of this domain extends beyond the complex bathymetry to include areas where Canada Basin sediments have covered the complex basement topography. The calculated magnetic effect of the bathymetric ridges matches some of the observed magnetic anomalies, but not others. We have analyzed and modeled the distinctive HAMH and other smaller magnetic high domains to generate estimates of their volume and to characterize the directionality of their component features. Complementary processing and modeling of high arctic gravity anomalies allows characterization of the density component of these geophysical features. Spatially, the HAMH encompasses the Alpha and Mendeleev "ridges," that are considered to represent a major mafic igneous province. The term "Alpha-Mendeleev Large Igneous Province" is given to a domain mapped by tracing magnetic anomalies in a recent map published by AAPG (Grantz and others, 2009). On this map the province is described as "alkali basalt with ages between 120 and 90 Ma". New seismic and bathymetric data, collected as part of on-going research efforts for definition of extended continental shelf, are revealing new details about the Alpha ridge. One interesting development is the possible identification of a supervolcano that may represent a major locus of igneous activity.

  16. From Coexpression to Coregulation: An Approach to Inferring Transcriptional Regulation Among Gene Classes from Large-Scale Expression Data

    NASA Technical Reports Server (NTRS)

    Mjolsness, Eric; Castano, Rebecca; Mann, Tobias; Wold, Barbara

    2000-01-01

    We provide preliminary evidence that existing algorithms for inferring small-scale gene regulation networks from gene expression data can be adapted to large-scale gene expression data coming from hybridization microarrays. The essential steps are (1) clustering many genes by their expression time-course data into a minimal set of clusters of co-expressed genes, (2) theoretically modeling the various conditions under which the time-courses are measured using a continuous-time analog recurrent neural network for the cluster mean time-courses, (3) fitting such a regulatory model to the cluster mean time courses by simulated annealing with weight decay, and (4) analysing several such fits for commonalities in the circuit parameter sets including the connection matrices. This procedure can be used to assess the adequacy of existing and future gene expression time-course data sets for determining transcriptional regulatory relationships such as coregulation.

  17. Large-scale controls of methanogenesis inferred from methane and gravity spaceborne data.

    PubMed

    Bloom, A Anthony; Palmer, Paul I; Fraser, Annemarie; Reay, David S; Frankenberg, Christian

    2010-01-15

    Wetlands are the largest individual source of methane (CH4), but the magnitude and distribution of this source are poorly understood on continental scales. We isolated the wetland and rice paddy contributions to spaceborne CH4 measurements over 2003-2005 using satellite observations of gravity anomalies, a proxy for water-table depth Gamma, and surface temperature analyses TS. We find that tropical and higher-latitude CH4 variations are largely described by Gamma and TS variations, respectively. Our work suggests that tropical wetlands contribute 52 to 58% of global emissions, with the remainder coming from the extra-tropics, 2% of which is from Arctic latitudes. We estimate a 7% rise in wetland CH4 emissions over 2003-2007, due to warming of mid-latitude and Arctic wetland regions, which we find is consistent with recent changes in atmospheric CH4.

  18. Estimating Small-area Populations by Age and Sex Using Spatial Interpolation and Statistical Inference Methods

    SciTech Connect

    Qai, Qiang; Rushton, Gerald; Bhaduri, Budhendra L; Bright, Eddie A; Coleman, Phil R

    2006-01-01

    The objective of this research is to compute population estimates by age and sex for small areas whose boundaries are different from those for which the population counts were made. In our approach, population surfaces and age-sex proportion surfaces are separately estimated. Age-sex population estimates for small areas and their confidence intervals are then computed using a binomial model with the two surfaces as inputs. The approach was implemented for Iowa using a 90 m resolution population grid (LandScan USA) and U.S. Census 2000 population. Three spatial interpolation methods, the areal weighting (AW) method, the ordinary kriging (OK) method, and a modification of the pycnophylactic method, were used on Census Tract populations to estimate the age-sex proportion surfaces. To verify the model, age-sex population estimates were computed for paired Block Groups that straddled Census Tracts and therefore were spatially misaligned with them. The pycnophylactic method and the OK method were more accurate than the AW method. The approach is general and can be used to estimate subgroup-count types of variables from information in existing administrative areas for custom-defined areas used as the spatial basis of support in other applications.
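
    Of the three interpolation methods compared above, areal weighting is the simplest to sketch: each source zone's value is split among target zones in proportion to the share of the source area that falls inside each target. The code below is a minimal illustration under that standard definition, not the authors' implementation; the function name, zone labels, and numbers are hypothetical, and the overlap areas are assumed to come from a separate GIS overlay step.

```python
def areal_weighting(source_values, source_areas, overlap_areas):
    """Estimate target-zone totals from source-zone totals.

    source_values: {source_id: count in that source zone}
    source_areas:  {source_id: total area of that source zone}
    overlap_areas: {target_id: {source_id: area of intersection}}

    Assumes the quantity is spread uniformly within each source zone.
    """
    estimates = {}
    for target, overlaps in overlap_areas.items():
        estimates[target] = sum(
            source_values[src] * area / source_areas[src]
            for src, area in overlaps.items()
        )
    return estimates


# Hypothetical tracts A and B re-aggregated into custom zones T1 and T2.
tract_pop = {"A": 1000.0, "B": 500.0}
tract_area = {"A": 10.0, "B": 5.0}
overlaps = {"T1": {"A": 4.0, "B": 5.0}, "T2": {"A": 6.0}}
zone_pop = areal_weighting(tract_pop, tract_area, overlaps)
```

    When the target zones fully tile the source zones, the estimates sum to the original total. The uniform-density assumption is the main weakness of the method, which is consistent with the abstract's finding that kriging and the pycnophylactic method were more accurate.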

  19. Genetic relatedness and population differentiation of Himalayan hulless barley (Hordeum vulgare L.) landraces inferred with SSRs.

    PubMed

    Pandey, Madhav; Wagner, Carola; Friedt, Wolfgang; Ordon, Frank

    2006-08-01

    A set of 107 hulless barley (Hordeum vulgare L. subsp. vulgare) landraces originally collected from the highlands of Nepal along the Annapurna and Manaslu Himalaya range were studied for genetic relatedness and population differentiation using simple sequence repeats (SSRs). The 44 genome covering barley SSRs applied in this study revealed a high level of genetic diversity among the landraces (diversity index, DI = 0.536) tested. The genetic similarity (GS) based UPGMA clustering and Bayesian Model-based (MB) structure analysis revealed a complex genetic structure of the landraces. Eight genetically distinct populations were identified, of which seven were further studied for diversity and differentiation. The genetic diversity estimated for all and each population separately revealed a hot spot of genetic diversity at Pisang (DI = 0.559). The populations are fairly differentiated (theta = 0.433, R(ST) = 0.445) accounting for > 40% of the genetic variation among the populations. The pairwise population differentiation test confirmed that many of the geographic populations significantly differ from each other but that the differentiation is independent of the geographic distance (r = 0.224, P > 0.05). The high level of genetic diversity and complex population structure detected in Himalayan hulless barley landraces and the relevance of the findings are discussed.

  20. Population genetic divergence corresponds with species-level biodiversity patterns in the large genus Begonia.

    PubMed

    Hughes, M; Hollingsworth, P M

    2008-06-01

    Begonia is one of the largest angiosperm genera, containing over 1500 species. Some aspects of the distribution of biodiversity in the genus, such as the geographical restrictions of monophyletic groups, the rarity and morphological variability of widespread species, and a preponderance of narrow endemics, suggest that restricted gene flow may have been a factor in the formation of so many species. In order to investigate whether this inference based on large-scale patterns is supported by data at the population level, we examined the distribution of genetic variation within Begonia sutherlandii in the indigenous forests of Kwazulu-Natal, South Africa, using microsatellite markers. Despite the species being predominantly outbreeding, we found high and significant levels of population structure (standardized F'ST = 0.896). Even within individual populations, there was evidence for clear differentiation of subpopulations. There is thus congruence in evolutionary patterns ranging from interspecific phylogeny, the distribution of individual species, to the levels of population differentiation. Despite this species-rich genus showing a pan-tropical distribution, these combined observations suggest that differentiation occurs over very local scales. Although strongly selected allelic variants can maintain species cohesion with only low levels of gene flow, we hypothesize that in Begonia, gene flow levels are often so low that divergence in allopatry is likely to be a frequent occurrence, and the lack of widespread species may in part be attributable to a lack of a mechanism for holding them together.

  1. Population structure of Tor tor inferred from mitochondrial gene cytochrome b.

    PubMed

    Pasi, Komal Shyamakant; Lakra, W S; Bhatt, J P; Goswami, M; Malakar, A Kr

    2013-06-01

    Tor tor, commonly called Tor mahseer, is a high-valued food and game fish endemic to the trans-Himalayan region. A 967 bp region of the mitochondrial cytochrome b (cyt b) gene was used to estimate the population structure of T. tor. Three populations of T. tor were collected from the Narmada (Hosangabad), Ken (Madla), and Parbati (Sheopur) rivers in Madhya Pradesh, India. The sequence analysis revealed that the nucleotide diversity (π) was low, ranging from 0.000 to 0.0150. Haplotype diversity (h) ranged from 0.000 to 1.000. Analysis of molecular variance indicated significant genetic divergence among the three populations of T. tor. A neighbor-joining tree also showed that all individuals from the three populations clustered into three distinct clades. The data generated by the cyt b marker provide insight into the population structure of T. tor and would serve as baseline data for conservation and management of the mahseer fishery.
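
    The reported range of haplotype diversity (0.000 to 1.000) is easier to read with the statistic written out. The sketch below uses Nei's standard sample estimator, h = n/(n - 1) * (1 - sum of squared haplotype frequencies). The abstract does not state which estimator the authors used, so this choice and the toy haplotype labels are assumptions.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's haplotype diversity for a sample of haplotype labels."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled haplotypes")
    counts = Counter(haplotypes)
    # Sum of squared sample frequencies of each distinct haplotype.
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Toy samples (hypothetical labels, not data from the study).
all_identical = haplotype_diversity(["H1", "H1", "H1"])
all_distinct = haplotype_diversity(["H1", "H2", "H3", "H4"])
```

    A sample in which every individual carries the same haplotype gives h = 0, and a sample in which every individual carries a different haplotype gives h = 1, which matches the two extremes reported.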

  2. Demographic Inference Reveals African and European Admixture in the North American Drosophila melanogaster Population

    PubMed Central

    Duchen, Pablo; Živković, Daniel; Hutter, Stephan; Stephan, Wolfgang; Laurent, Stefan

    2013-01-01

    Drosophila melanogaster spread from sub-Saharan Africa to the rest of the world colonizing new environments. Here, we modeled the joint demography of African (Zimbabwe), European (The Netherlands), and North American (North Carolina) populations using an approximate Bayesian computation (ABC) approach. By testing different models (including scenarios with continuous migration), we found that admixture between Africa and Europe most likely generated the North American population, with an estimated proportion of African ancestry of 15%. We also revisited the demography of the ancestral population (Africa) and found—in contrast to previous work—that a bottleneck fits the history of the population of Zimbabwe better than expansion. Finally, we compared the site-frequency spectrum of the ancestral population to analytical predictions under the estimated bottleneck model. PMID:23150605
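
    The ABC idea behind this kind of analysis can be sketched with a toy rejection sampler. It is not the joint demographic model of the study: it assumes a single summary statistic (mean allele frequency in the admixed sample), fixed and hypothetical source frequencies of 0.9 and 0.1, a uniform prior on the admixture proportion, and an arbitrary tolerance.

```python
import random


def simulate_mean_freq(alpha, f_source_a, f_source_b, n_loci, n_alleles, rng):
    """Simulate the mean allele frequency in a sample from an admixed population."""
    # Expected frequency after admixture with proportion alpha from source A.
    p = alpha * f_source_a + (1.0 - alpha) * f_source_b
    total = n_loci * n_alleles
    hits = sum(1 for _ in range(total) if rng.random() < p)
    return hits / total


def abc_rejection(observed, n_draws, epsilon, rng):
    """Keep prior draws whose simulated summary lies within epsilon of the data."""
    accepted = []
    for _ in range(n_draws):
        alpha = rng.random()  # uniform prior on the admixture proportion
        simulated = simulate_mean_freq(alpha, 0.9, 0.1, 10, 100, rng)
        if abs(simulated - observed) < epsilon:
            accepted.append(alpha)
    return accepted


rng = random.Random(1)            # fixed seed for reproducibility
observed = 0.1 + 0.8 * 0.15       # summary implied by a true proportion of 0.15
posterior = abc_rejection(observed, n_draws=2000, epsilon=0.02, rng=rng)
posterior_mean = sum(posterior) / len(posterior)
```

    The accepted draws approximate the posterior distribution of the admixture proportion. A smaller tolerance tightens the approximation at the cost of fewer accepted draws.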

  3. Surrogate population models for large-scale neural simulations.

    PubMed

    Tripp, Bryan P

    2015-06-01

    Because different parts of the brain have rich interconnections, it is not possible to model small parts realistically in isolation. However, it is also impractical to simulate large neural systems in detail. This article outlines a new approach to multiscale modeling of neural systems that involves constructing efficient surrogate models of populations. Given a population of neuron models with correlated activity and with specific, nonrandom connections, a surrogate model is constructed in order to approximate the aggregate outputs of the population. The surrogate model requires less computation than the neural model, but it has a clear and specific relationship with the neural model. For example, approximate spike rasters for specific neurons can be derived from a simulation of the surrogate model. This article deals specifically with neural engineering framework (NEF) circuits of leaky-integrate-and-fire point neurons. Weighted sums of spikes are modeled by interpolating over latent variables in the population activity, and linear filters operate on Gaussian random variables to approximate spike-related fluctuations. It is found that the surrogate models can often closely approximate network behavior with orders-of-magnitude reduction in computational demands, although there are certain systematic differences between the spiking and surrogate models. Since individual spikes are not modeled, some simulations can be performed with much longer step sizes (e.g., 20 ms). Possible extensions to non-NEF networks and to more complex neuron models are discussed.

  4. Inference on cancer screening exam accuracy using population-level administrative data.

    PubMed

    Jiang, H; Brown, P E; Walter, S D

    2016-01-15

    This paper develops a model for cancer screening and cancer incidence data, accommodating the partially unobserved disease status, clustered data structures, general covariate effects, and dependence between exams. The true unobserved cancer and detection status of screening participants are treated as latent variables, and a Markov Chain Monte Carlo algorithm is used to estimate the Bayesian posterior distributions of the diagnostic error rates and disease prevalence. We show how the Bayesian approach can be used to draw inferences about screening exam properties and disease prevalence while allowing for the possibility of conditional dependence between two exams. The techniques are applied to the estimation of the diagnostic accuracy of mammography and clinical breast examination using data from the Ontario Breast Screening Program in Canada.

  5. Response of human populations to large-scale emergencies

    NASA Astrophysics Data System (ADS)

    Bagrow, James; Wang, Dashun; Barabási, Albert-László

    2010-03-01

    Until recently, little quantitative data regarding collective human behavior during dangerous events such as bombings and riots have been available, despite its importance for emergency management, safety and urban planning. Understanding how populations react to danger is critical for prediction, detection and intervention strategies. Using a large telecommunications dataset, we study for the first time the spatiotemporal, social and demographic response properties of people during several disasters, including a bombing, a city-wide power outage, and an earthquake. Call activity rapidly increases after an event and we find that, when faced with a truly life-threatening emergency, information rapidly propagates through a population's social network. Other events, such as sports games, do not exhibit this propagation.

  6. Large-scale natural disturbance alters genetic population structure of the sailfin molly, Poecilia latipinna.

    PubMed

    Apodaca, Joseph J; Trexler, Joel C; Jue, Nathaniel K; Schrader, Matthew; Travis, Joseph

    2013-02-01

    Many inferences about contemporary rates of gene flow are based on the assumption that the observed genetic structure among populations is stable. Recent studies have uncovered several cases in which this assumption is tenuous. Most of those studies have focused on the effects that regular environmental fluctuations can have on genetic structure and gene flow patterns. Occasional catastrophic disturbances could also alter either the distribution of habitat or the spatial distribution of organisms in a way that affects population structure. However, evidence of such effects is sparse in the literature because it is difficult to obtain. Hurricanes, in particular, have the potential to exert dramatic effects on population structure of organisms found on islands or coral reefs or in near shore and coastal habitats. Here we draw on a historic genetic data set and new data to suggest that the genetic structure of sailfin molly (Poecilia latipinna) populations in north Florida was altered dramatically by an unusually large and uncommon type of storm surge associated with Hurricane Dennis in 2005. We compare the spatial pattern of genetic variation in these populations after Hurricane Dennis to the patterns described in an earlier study in this same area. We use comparable genetic data from another region of Florida, collected in the same two periods, to estimate the amount of change expected from typical temporal variation in population structure. The comparative natural history of sailfin mollies in these two regions indicates that the change in population structure produced by the storm surge is not the result of many local extinctions with recolonization from a few refugia but emerged from a pattern of mixing and redistribution.

  7. Inferring possible population divergence in Espeletia pycnophylla (Asteraceae) through morphometric and paleogeographic approaches.

    PubMed

    Benavides, Felipe; Burbano, Jorge; Burbano, Diana; Prieto, Rodrigo; Torres, Carlos

    2010-12-01

    The phenotypic structure within and between plant populations is generally influenced by their distribution patterns in space and time; therefore, the study of their divergence is a central issue for the understanding of their microevolutionary processes. We addressed the hypothesis that three populations of Espeletia pycnophylla show phenotypic divergence as one of the possible implications of their geographic isolation in the Southern Colombian Andes. We used Elliptic Fourier Descriptors (leaf shape) and traditional leaf morphometry (leaf size) of 347 leaves to measure inter- and intra-population variation, and compared a paleogeographic reconstruction with a current estimate of the distribution areas of E. pycnophylla in order to identify their main changes during the last 14 000 years. The three populations showed significant differences in leaf morphometry and a positive correlation between the matrices of morphometric and geographic dissimilarities, indicating that inter-population divergence increases with the distance between populations, so that the morphometric structure reflects their spatial distribution. The geographical and paleogeographical estimates showed a conspicuous process of reduction and fragmentation of the distribution area of E. pycnophylla from the Late-Glacial to the Holocene. We suggest that these results support possible scenarios of vicariance events, which allow us to approach the divergence of these populations in terms of their historic biogeographic relations. However, genetic analyses are still needed to support these results.

  8. Effect of Drought on Herbivore-Induced Plant Gene Expression: Population Comparison for Range Limit Inferences.

    PubMed

    Gill, Gunbharpur Singh; Haugen, Riston; Matzner, Steven L; Barakat, Abdelali; Siemens, David H

    2016-03-11

    Low elevation "trailing edge" range margin populations typically face increases in both abiotic and biotic stressors that may contribute to range limit development. We hypothesize that selection may act on ABA and JA signaling pathways for more stable expression needed for range expansion, but that antagonistic crosstalk prevents their simultaneous co-option. To test this hypothesis, we compared high and low elevation populations of Boechera stricta that have diverged with respect to constitutive levels of glucosinolate defenses and root:shoot ratios; neither population has high levels of both traits. If constraints imposed by antagonistic signaling underlie this divergence, one would predict that high constitutive levels of traits would coincide with lower plasticity. To test this prediction, we compared the genetically diverged populations in a double challenge drought-herbivory growth chamber experiment. Although a glucosinolate defense response to the generalist insect herbivore Spodoptera exigua was attenuated under drought conditions, the plastic defense response did not differ significantly between populations. Similarly, although several potential drought tolerance traits were measured, only stomatal aperture behavior, as measured by carbon isotope ratios, was less plastic as predicted in the high elevation population. However, RNAseq results on a small subset of plants indicated differential expression of relevant genes between populations as predicted. We suggest that the ambiguity in our results stems from a weaker link between the pathways and the functional traits compared to transcripts.

  9. Effect of Drought on Herbivore-Induced Plant Gene Expression: Population Comparison for Range Limit Inferences

    PubMed Central

    Gill, Gunbharpur Singh; Haugen, Riston; Matzner, Steven L.; Barakat, Abdelali; Siemens, David H.

    2016-01-01

    Low elevation “trailing edge” range margin populations typically face increases in both abiotic and biotic stressors that may contribute to range limit development. We hypothesize that selection may act on ABA and JA signaling pathways for more stable expression needed for range expansion, but that antagonistic crosstalk prevents their simultaneous co-option. To test this hypothesis, we compared high and low elevation populations of Boechera stricta that have diverged with respect to constitutive levels of glucosinolate defenses and root:shoot ratios; neither population has high levels of both traits. If constraints imposed by antagonistic signaling underlie this divergence, one would predict that high constitutive levels of traits would coincide with lower plasticity. To test this prediction, we compared the genetically diverged populations in a double challenge drought-herbivory growth chamber experiment. Although a glucosinolate defense response to the generalist insect herbivore Spodoptera exigua was attenuated under drought conditions, the plastic defense response did not differ significantly between populations. Similarly, although several potential drought tolerance traits were measured, only stomatal aperture behavior, as measured by carbon isotope ratios, was less plastic as predicted in the high elevation population. However, RNAseq results on a small subset of plants indicated differential expression of relevant genes between populations as predicted. We suggest that the ambiguity in our results stems from a weaker link between the pathways and the functional traits compared to transcripts. PMID:27135233

  10. Decline of recent seabirds inferred from a composite 1000-year record of population dynamics

    PubMed Central

    Xu, Liqiang; Liu, Xiaodong; Wu, Libin; Sun, Liguang; Zhao, Jinjun; Chen, Lin

    2016-01-01

    Based on three ornithogenic sediment profiles and seabird subfossils therein from the Xisha Islands, South China Sea, the relative population size of seabirds over the past 1000 years was reconstructed using reflectance spectrum. Here we present an apparent increase and subsequent decline of seabirds on these islands in the South China Sea. Seabird populations peaked during the Little Ice Age (LIA, 1400–1850 AD), implying that the cool climate during the LIA appears to have been more favorable to seabirds on the Xisha Islands in the South China Sea. Climate change partly explains the recent decrease in seabird populations over the past 150 years, but the significant decline and almost complete disappearance thereof on most of the Xisha Islands is probably attributable to human disturbance. Our study reveals the increasing impact of anthropogenic activities on seabird population in recent times. PMID:27748366

  11. Decline of recent seabirds inferred from a composite 1000-year record of population dynamics

    NASA Astrophysics Data System (ADS)

    Xu, Liqiang; Liu, Xiaodong; Wu, Libin; Sun, Liguang; Zhao, Jinjun; Chen, Lin

    2016-10-01

    Based on three ornithogenic sediment profiles and seabird subfossils therein from the Xisha Islands, South China Sea, the relative population size of seabirds over the past 1000 years was reconstructed using reflectance spectrum. Here we present an apparent increase and subsequent decline of seabirds on these islands in the South China Sea. Seabird populations peaked during the Little Ice Age (LIA, 1400–1850 AD), implying that the cool climate during the LIA appears to have been more favorable to seabirds on the Xisha Islands in the South China Sea. Climate change partly explains the recent decrease in seabird populations over the past 150 years, but the significant decline and almost complete disappearance thereof on most of the Xisha Islands is probably attributable to human disturbance. Our study reveals the increasing impact of anthropogenic activities on seabird population in recent times.

  12. Constraints on Intervening Stellar Populations toward the Large Magellanic Cloud

    NASA Astrophysics Data System (ADS)

    Zaritsky, Dennis; Shectman, Stephen A.; Thompson, Ian; Harris, Jason; Lin, D. N. C.

    1999-05-01

    The suggestion by Zaritsky & Lin (ZL) that a vertical extension of the red clump feature (the VRC) in color-magnitude diagrams (CMDs) of the Large Magellanic Cloud is consistent with a significant population of foreground stars to the LMC that could account for the observed microlensing optical depth has been challenged by various investigators. We respond by (1) examining each of the challenges presented, to determine whether any or all of those arguments invalidate the claims made by ZL, and (2) presenting new photometric and spectroscopic data obtained in an attempt to resolve this issue. We systematically discuss why the objections raised so far do not unequivocally refute ZL's claim. We conclude that although the CMD data do not mandate the existence of a foreground population, they are entirely consistent with a foreground population associated with the LMC that contributes significantly (~50%) to the observed microlensing optical depth. From our new data, we conclude that <~40% of the VRC stars are young, massive red clump stars, because (1) synthetic CMDs created using the star formation history derived independently from Hubble Space Telescope data suggest that fewer than 50% of the VRC stars are young, massive red clump stars, (2) the angular distribution of the VRC stars is more uniform than that of the young (<1 Gyr) main-sequence stars, and (3) the velocity dispersion of the VRC stars in the region of the LMC examined by ZL, 18.4 ± 2.8 km s^-1 (95% confidence limits), is inconsistent with the expectation for a young disk population. Each of these arguments is predicated on assumptions, and the conclusions are uncertain. Therefore, an exact determination of the contribution to the microlensing optical depth by the various hypothesized foreground populations, and the subsequent conclusions regarding the existence of halo MACHOs, requires a detailed knowledge of many complex astrophysical issues, such as the initial mass function, star formation history

  13. Large-scale Hydrology Inferred From GRACE Estimates of Time-variable Gravity

    NASA Astrophysics Data System (ADS)

    Swenson, S.; Wahr, J.; Milly, P. C. D.

    In recent years, a number of techniques for remotely sensing the Earth's surface have been developed, including active and passive radiometers, synthetic aperture radar, and radar and laser altimeters. These instruments have produced measurements of a wide range of phenomena at temporal and spatial scales that are not feasible by in-situ methods. Surface soil moisture, one of the principal components of terrestrial water storage, can be measured by microwave techniques; however, only the moisture in the top few centimeters of soil can be detected. Sub-surface soil moisture and groundwater are thus far undetectable by radiation-based remote sensing techniques. Measurements of changes in surface gravity are indicative of changes in the surrounding water column. U.S. Geological Survey scientists use repeat surface gravity measurements to monitor groundwater recharge. Surface in-situ measurement is not a practical approach for water storage estimation at large length scales, because of the high spatial variability of terrestrial water storage. Satellite-based gravity measurement, however, promises to provide estimates of continental groundwater change, averaged over regions of a few hundred km and larger. The GRACE satellite mission, scheduled for launch at the end of 2001, is expected to deliver global estimates of the Earth's gravity field approximately every 30 days. After removing the effects of the atmosphere and oceans, GRACE gravity solutions can be inverted to determine changes in continental water storage. We describe methods of extracting the water storage signal from simulated hydrological gravity solutions constructed using a land surface model based on monthly global precipitation records. Spatial averaging kernels were created to isolate the gravity signal of individual drainage basins without contamination from surrounding hydrological or oceanic gravity signals. We then estimated the probable accuracy of averaging kernels for basins of

  14. Accounting for imperfect detection is critical for inferring marine turtle nesting population trends.

    PubMed

    Pfaller, Joseph B; Bjorndal, Karen A; Chaloupka, Milani; Williams, Kristina L; Frick, Michael G; Bolten, Alan B

    2013-01-01

    Assessments of population trends based on time-series counts of individuals are complicated by imperfect detection, which can lead to serious misinterpretations of data. Population trends of threatened marine turtles worldwide are usually based on counts of nests or nesting females. We analyze 39 years of nest-count, female-count, and capture-mark-recapture (CMR) data for nesting loggerhead turtles (Caretta caretta) on Wassaw Island, Georgia, USA. Annual counts of nests and females, not corrected for imperfect detection, yield significant, positive trends in abundance. However, multistate open robust design modeling of CMR data that accounts for changes in imperfect detection reveals that the annual abundance of nesting females has remained essentially constant over the 39-year period. The dichotomy could result from improvements in surveys or increased within-season nest-site fidelity in females, either of which would increase detection probability. For the first time in a marine turtle population, we compare results of population trend analyses that do and do not account for imperfect detection and demonstrate the potential for erroneous conclusions. Past assessments of marine turtle population trends based exclusively on count data should be interpreted with caution and re-evaluated when possible. These concerns apply equally to population assessments of all species with imperfect detection.
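
    The core problem described in this abstract can be illustrated with expected counts alone. The sketch assumes a constant true number of nesting females and a detection probability that rises linearly over 39 years. All numbers are hypothetical, and dividing counts by a known detection probability stands in for the multistate open robust design model actually used in the study.

```python
def ols_slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


YEARS = 39
TRUE_FEMALES = 100  # hypothetical constant abundance

# Detection probability improves from 0.5 to 0.9 over the survey period.
detection = [0.5 + 0.4 * t / (YEARS - 1) for t in range(YEARS)]

# Expected raw counts: abundance times detection probability.
expected_counts = [TRUE_FEMALES * p for p in detection]

# Counts corrected for detection recover the constant abundance.
corrected = [c / p for c, p in zip(expected_counts, detection)]

raw_slope = ols_slope(expected_counts)
corrected_slope = ols_slope(corrected)
```

    The raw counts show a positive trend even though abundance never changes, while the corrected series is flat. In practice the detection probability is not known and has to be estimated, for example from capture-mark-recapture data.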

  15. Genetic structure and gene flow among Komodo dragon populations inferred by microsatellite loci analysis.

    PubMed

    Ciofi, C; Bruford, M W

    1999-12-01

    A general concern for the conservation of endangered species is the maintenance of genetic variation within populations, particularly when they become isolated and reduced in size. Estimates of gene flow and effective population size are therefore important for any conservation initiative directed to the long-term persistence of a species in its natural habitat. In the present study, 10 microsatellite loci were used to assess the level of genetic variability among populations of the Komodo dragon Varanus komodoensis. Effective population size was calculated and gene flow estimates were compared with palaeogeographic data in order to assess the degree of vulnerability of four island populations. Rinca and Flores, currently separated by an isthmus of about 200 m, retained a high level of genetic diversity and showed a high degree of genetic similarity, with gene flow values close to one migrant per generation. The island of Komodo showed by far the highest levels of genetic divergence, and its allelic distinctiveness was considered of great importance in the maintenance of genetic variability within the species. A lack of distinct alleles and low levels of gene flow and genetic variability were found for the small population of Gili Motang island, which was identified as vulnerable to stochastic threats. Our results are potentially important for both the short- and long-term management of the Komodo dragon, and are critical in view of future re-introduction or augmentation in areas where the species is now extinct or depleted.
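
    For readers unfamiliar with "migrants per generation", one common way to relate it to genetic differentiation is Wright's island-model approximation, Nm ≈ (1 - FST) / (4 FST). The abstract does not say which gene flow estimator was used, so the formula and the example value below are illustrative assumptions only.

```python
def migrants_per_generation(fst):
    """Island-model approximation: Nm = (1 - FST) / (4 * FST)."""
    if not 0.0 < fst < 1.0:
        raise ValueError("FST must lie strictly between 0 and 1")
    return (1.0 - fst) / (4.0 * fst)


# Under this approximation, FST = 0.2 corresponds to one migrant per generation.
nm_example = migrants_per_generation(0.2)
```

    The approximation rests on strong assumptions (many equally sized populations at migration-drift equilibrium), so it is better read as a rough scale than as a direct measurement of dispersal.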

  16. Genetic diversity in fragmented populations of Populus talassica inferred from microsatellites: implications for conservation.

    PubMed

    Zhu, X H; Cheng, S P; Liao, T; Kang, X Y

    2016-05-25

    Populus talassica Kom. is an ecologically important species endemic to central Asia. In China, its main distribution is restricted to the Ili region in the Xinjiang Autonomous Region. An understanding of genetic diversity and population structure is crucial for the development of a feasible conservation strategy. Twenty-six high-level simple sequence repeat (SSR) markers were screened and used to genotype 220 individuals from three native populations. A high level of genetic diversity and low population differentiation were revealed. We identified 163 alleles, with a mean of 6.269 alleles per locus. The observed and expected heterozygosities ranged from 0.472 to 0.485 (with a mean of 0.477), and from 0.548 to 0.591 (mean 0.569), respectively. Analysis of molecular variance revealed 93% variation within populations and 7% among populations. A model-based population structure analysis divided P. talassica into two groups (optimal K = 2). These genetic data provide crucial insight for conservation management.
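
    Two of the summary statistics quoted above can be reproduced or illustrated directly. The mean number of alleles per locus follows from the reported totals (163 alleles over 26 loci), and expected heterozygosity is shown in its simplest form, 1 minus the sum of squared allele frequencies. The abstract does not state whether a sample-size correction was applied, so the uncorrected form and the toy frequencies are assumptions.

```python
def expected_heterozygosity(allele_freqs):
    """Uncorrected expected heterozygosity for one locus."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


# Reported totals: 163 alleles across 26 SSR loci.
mean_alleles_per_locus = 163 / 26

# Toy locus with two equally frequent alleles (hypothetical frequencies).
he_two_alleles = expected_heterozygosity([0.5, 0.5])
```

    Rounded to three decimals, the ratio agrees with the mean of 6.269 alleles per locus given in the abstract.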

  17. Inferring the population structure and demographic history of the tick, Amblyomma americanum Linnaeus.

    PubMed

    Mixson, Tonya R; Lydy, Shari L; Dasch, Gregory A; Real, Leslie A

    2006-06-01

    A hierarchical population genetic study was conducted on 703 individual Amblyomma americanum from nine populations in Georgia, U.S.A. Populations were sampled from the Coastal Plain, midland Piedmont region, and the upper Piedmont region. Twenty-nine distinct haplotypes were found. A minimum spanning tree was constructed that indicated these haplotypes comprised two lineages, the root of which was distinctly star-like. The majority of the variation found was among ticks within each population, indicating high amounts of gene flow and little genetic differentiation between the three regions. An overall F(ST) value of 0.006 supported the lack of genetic structuring between collection sites in Georgia. Mantel regression analysis revealed no isolation by distance. Signatures of population expansion were detected in the shapes of the mismatch distribution and tests of neutrality. The absence of genetic differentiation combined with the rejection of the null model of isolation by distance may indicate recent range expansion in Georgia or insufficient time to reach an equilibrium where genetic drift may have affected allele frequencies. Alternatively, the high degree of panmixia found within A. americanum in Georgia may be due to bird-mediated dispersal of ticks increasing the genetic similarity between geographically separated populations.

  18. Inferring the Relation of Hydrometeorological Variability on the Durance Watershed (southeastern France) to Large Scale Circulation from Anatem Reconstructed Series

    NASA Astrophysics Data System (ADS)

    Fossa, M.; Mathevet, T.; Gailhard, J.; Massei, N.

    2015-12-01

    Understanding large spatio-temporal hydrometeorological variabilities is critical in the present context of climate change. Large scale information analyses require long and numerous time series as input data. This requirement is often difficult to meet because good quality time series are scarce and often not available over a large area. Reconstructions offer an interesting alternative to alleviate this problem. An original reconstruction method for rainfall and temperature called ANATEM was developed by Electricité de France in 2013 (Kuentz et al., 2015), combining both a nearby time series (TEM) and a climate field (i.e., geopotential height) (ANA) as predictors. By using large scale information, this method should improve on the TEM regression model in both spatial and temporal dimensions. ANATEM was used to reconstruct daily rainfall time series from 25 stations of the Durance watershed in the South of France, spanning 1883-2010. This study focused on extracting the large scale information contained in the reconstructed series. Wavelet analyses were used to break down the signal and extract its long-term component (out of 4 different time scales), while composite map analyses made it possible to show the links between mean rainfall over the Durance and climate fields in the Euro-Atlantic sector. The study showed that ANATEM reconstruction can indeed improve on long term/large scale reconstructions and thus that reconstructions can be used to infer climate processes. Wavelet multiresolution analysis over the Durance watershed showed a dip in long-term rainfall from 1950 to the end of the 20th century. Composite analysis revealed that rainfall variation (from low to high rainfall) over the Durance watershed is mainly associated with a transition from a positive NAO-like pattern to a negative NAO-like one. The spatial large scale information shows a strong variability with season. In summer, large scale forcings seem less apparent. Long term oscillations showed distinct spatio

  19. Phylogeography and population structure of the red stingray, Dasyatis akajei inferred by mitochondrial control region.

    PubMed

    Li, Ning; Chen, Xiao; Sun, Dianrong; Song, Na; Lin, Qin; Gao, Tianxiang

    2015-08-01

    The red stingray Dasyatis akajei is distributed in both marine and freshwater habitats, but little is known about its phylogeography and population structure. We sampled 107 individuals from one freshwater region and 6 coastal localities within the distribution range of D. akajei. Analyses of the first hypervariable region of mitochondrial DNA control region of 474 bp revealed only 17 polymorphic sites that defined 28 haplotypes, with no unique haplotype for the freshwater population. A high level of haplotype diversity and low nucleotide diversity were observed in both marine (h = 0.9393 ± 0.0104, π = 0.0069 ± 0.0040) and freshwater populations (h = 0.8333 ± 0.2224, π = 0.0084 ± 0.0063). A significant level of genetic structure was detected between four marine populations (TZ, WZ, ND and ZZ) via both hierarchical molecular variance analysis (AMOVA) and pairwise FST (with two exceptions), which is unusual among elasmobranchs studied previously over such short geographical distances. However, limited sampling suggested that the freshwater population was not particularly distinct (p > 0.05), but additional samples would be needed to confirm this. Demersal and slow-moving characters likely have contributed to the genetically heterogeneous population structure. The demographic history of D. akajei examined by mismatch distribution analyses, neutrality tests and Bayesian skyline analyses suggested a sudden population expansion dating to the upper Pleistocene. The information on genetic diversity and genetic structure will have implications for the management of fisheries and conservation efforts.

  20. Study of large and highly stratified population datasets by combining iterative pruning principal component analysis and structure.

    PubMed

    Limpiti, Tulaya; Intarapanich, Apichart; Assawamakin, Anunchai; Shaw, Philip J; Wangkumhang, Pongsakorn; Piriyapongsa, Jittima; Ngamphiw, Chumpol; Tongsima, Sissades

    2011-06-23

    The ever increasing sizes of population genetic datasets pose great challenges for population structure analysis. The Tracy-Widom (TW) statistical test is widely used for detecting structure. However, it has not been adequately investigated whether the TW statistic is susceptible to type I error, especially in large, complex datasets. Non-parametric, Principal Component Analysis (PCA) based methods for resolving structure have been developed which rely on the TW test. Although PCA-based methods can resolve structure, they cannot infer ancestry. Model-based methods are still needed for ancestry analysis, but they are not suitable for large datasets. We propose a new structure analysis framework for large datasets. This includes a new heuristic for detecting structure and incorporation of the structure patterns inferred by a PCA method to complement STRUCTURE analysis. A new heuristic called EigenDev for detecting population structure is presented. When tested on simulated data, this heuristic is robust to sample size. In contrast, the TW statistic was found to be susceptible to type I error, especially for large population samples. EigenDev is thus better-suited for analysis of large datasets containing many individuals, in which spurious patterns are likely to exist and could be incorrectly interpreted as population stratification. EigenDev was applied to the iterative pruning PCA (ipPCA) method, which resolves the underlying subpopulations. This subpopulation information was used to supervise STRUCTURE analysis to infer patterns of ancestry at an unprecedented level of resolution. To validate the new approach, a bovine and a large human genetic dataset (3945 individuals) were analyzed. We found new ancestry patterns consistent with the subpopulations resolved by ipPCA. The EigenDev heuristic is robust to sampling and is thus superior for detecting structure in large datasets. The application of EigenDev to the ipPCA algorithm improves the estimation of the number

  1. Study of large and highly stratified population datasets by combining iterative pruning principal component analysis and structure

    PubMed Central

    2011-01-01

    Background: The ever increasing sizes of population genetic datasets pose great challenges for population structure analysis. The Tracy-Widom (TW) statistical test is widely used for detecting structure. However, it has not been adequately investigated whether the TW statistic is susceptible to type I error, especially in large, complex datasets. Non-parametric, Principal Component Analysis (PCA) based methods for resolving structure have been developed which rely on the TW test. Although PCA-based methods can resolve structure, they cannot infer ancestry. Model-based methods are still needed for ancestry analysis, but they are not suitable for large datasets. We propose a new structure analysis framework for large datasets. This includes a new heuristic for detecting structure and incorporation of the structure patterns inferred by a PCA method to complement STRUCTURE analysis. Results: A new heuristic called EigenDev for detecting population structure is presented. When tested on simulated data, this heuristic is robust to sample size. In contrast, the TW statistic was found to be susceptible to type I error, especially for large population samples. EigenDev is thus better-suited for analysis of large datasets containing many individuals, in which spurious patterns are likely to exist and could be incorrectly interpreted as population stratification. EigenDev was applied to the iterative pruning PCA (ipPCA) method, which resolves the underlying subpopulations. This subpopulation information was used to supervise STRUCTURE analysis to infer patterns of ancestry at an unprecedented level of resolution. To validate the new approach, a bovine and a large human genetic dataset (3945 individuals) were analyzed. We found new ancestry patterns consistent with the subpopulations resolved by ipPCA. Conclusions: The EigenDev heuristic is robust to sampling and is thus superior for detecting structure in large datasets. The application of EigenDev to the ipPCA algorithm

  2. Genetic diversity and population structure of Kazakh horses (Equus caballus) inferred from mtDNA sequences.

    PubMed

    Gemingguli, M; Iskhan, K R; Li, Y; Qi, A; Wunirifu, W; Ding, L Y; Wumaierjiang, A

    2016-10-05

    The Kazakh horse is an important old horse breed in Xinjiang. They have contributed greatly to the breeding and improvement of other local horse breeds, yet their genetic diversity and population structure are not well understood. In the present study, we evaluated the genetic diversity of Kazakh horses and their relationship with other horse breeds using the mtDNA D-loop region, Cyt b gene, and a DNA fragment (nps 7974-9963, containing COX3, tRNA-Gly, ND3, and tRNA-Arg). A total of 130 Kazakh horses from 8 populations in China and Kazakhstan were analyzed. A total of 88 haplotypes (haplotype diversity: 0.9895) were identified, in which 3 haplotypes were shared by groups in the two countries. In a median-joining network, 6 haplogroups were found, in which most haplogroups included haplotypes from different populations. Neighbor-joining analysis revealed similar results in that haplotypes in different populations were admixed in most of the 6 clusters. In conclusion, a high level of genetic diversity was found in the Kazakh horses. However, no clear correspondence between haplogroups and geographic origin and no significant differentiation between populations in the two countries were observed. This might have resulted from the frequent contact between the two countries through the Silk Road in the past, or due to long-term outcrossing and hybridization with the introduced horses.

  3. Genetic diversity and population structure in Bactrocera correcta (Diptera: Tephritidae) inferred from mtDNA cox1 and microsatellite markers

    PubMed Central

    Qin, Yu-Jia; Buahom, Nopparat; Krosch, Matthew N.; Du, Yu; Wu, Yi; Malacrida, Anna R.; Deng, Yu-Liang; Liu, Jia-Qi; Jiang, Xiao-Long; Li, Zhi-Hong

    2016-01-01

    Bactrocera correcta is one of the most destructive pests of horticultural crops in tropical and subtropical regions. Despite the economic risk, the population genetics of this pest have remained relatively unexplored. This study explores population genetic structure and contemporary gene flow in B. correcta in Chinese Yunnan Province and attempts to place observed patterns within the broader geographical context of the species’ total range. Based on combined data from mtDNA cox1 sequences and 12 microsatellite loci obtained from 793 individuals located in 7 countries, overall genetic structuring was low. The expansion history of this species, including likely human-mediated dispersal, may have played a role in shaping the observed weak structure. The study suggested a close relationship between Yunnan Province and adjacent countries, with evidence for Western and/or Southern Yunnan as the invasive origin of B. correcta within Yunnan Province. The information gleaned from this analysis of gene flow and population structure has broad implications for quarantine, trade and management of this pest, especially in China where it is expanding northward. Future studies should concentrate effort on sampling South Asian populations, which would enable better inferences of the ancestral location of B. correcta and its invasion history into and throughout Asia. PMID:27929126

  4. Joint inference of identity by descent along multiple chromosomes from population samples.

    PubMed

    Zheng, Chaozhi; Kuhner, Mary K; Thompson, Elizabeth A

    2014-03-01

    There has been much interest in detecting genomic identity by descent (IBD) segments from modern dense genetic marker data and in using them to identify human disease susceptibility loci. Here we present a novel Bayesian framework using Markov chain Monte Carlo (MCMC) realizations to jointly infer IBD states among multiple individuals not known to be related, together with the allelic typing error rate and the IBD process parameters. The data are phased single nucleotide polymorphism (SNP) haplotypes. We model changes in latent IBD state along homologous chromosomes by a continuous time Markov model having the Ewens sampling formula as its stationary distribution. We show by simulation that this model for the IBD process fits quite well with the coalescent predictions. Using simulation data sets of 40 haplotypes over regions of 1 and 10 million base pairs (Mbp), we show that the jointly estimated IBD states are very close to the true values, although the presence of linkage disequilibrium decreases the accuracy. We also present comparisons with the ibd_haplo program, which estimates IBD among sets of four haplotypes. Our new IBD detection method focuses on the scale between genome-wide methods using simple IBD models and complex coalescent-based methods that are limited to short genome segments. At the scale of a few Mbp, our approach offers potentially more power for fine-scale IBD association mapping.

  5. Triallelic Population Genomics for Inferring Correlated Fitness Effects of Same Site Nonsynonymous Mutations.

    PubMed

    Ragsdale, Aaron P; Coffman, Alec J; Hsieh, PingHsun; Struck, Travis J; Gutenkunst, Ryan N

    2016-05-01

    The distribution of mutational effects on fitness is central to evolutionary genetics. Typical univariate distributions, however, cannot model the effects of multiple mutations at the same site, so we introduce a model in which mutations at the same site have correlated fitness effects. To infer the strength of that correlation, we developed a diffusion approximation to the triallelic frequency spectrum, which we applied to data from Drosophila melanogaster. We found a moderate positive correlation between the fitness effects of nonsynonymous mutations at the same codon, suggesting that both mutation identity and location are important for determining fitness effects in proteins. We validated our approach by comparing it to biochemical mutational scanning experiments, finding strong quantitative agreement, even between different organisms. We also found that the correlation of mutational fitness effects was not affected by protein solvent exposure or structural disorder. Together, our results suggest that the correlation of fitness effects at the same site is a previously overlooked yet fundamental property of protein evolution.

  6. Genetic Polymorphism of Aedes albopictus Population Inferred From ND5 Gene Variabilities In Subang Jaya, Malaysia.

    PubMed

    Adilah-Amrannudin, Nurul; Hamsidi, Mayamin; Ismail, Nurul-Ain; Ismail, Rodziah; Dom, Nazri Che; Ahmad, Abu Hassan; Mastuki, Mohd Fahmi; Basri, Tengku Shahrul Anuar Tengku Ahmad; Khalid, Adira; Muslim, Mohammad; Daud, Nurul Amalina Ahmad; Camalxaman, Siti Nazrina

    2016-12-01

    This study was performed to establish the genetic variability of Aedes albopictus within Subang Jaya, Selangor, Malaysia, by using the nicotinamide adenine dinucleotide dehydrogenase 5 subunit (ND5) mitochondrial DNA (mtDNA) marker. A total of 90 samples were collected from 9 localities within an area of the Subang Jaya Municipality. Genetic variability was determined through the amplification and sequencing of a fragment of the ND5 gene. Eight distinct mtDNA haplotypes were identified. The evolutionary relationship of the local haplotypes alongside 28 reference strains was used to construct a phylogram, the analysis of which revealed low genetic differentiation in terms of both nucleotide and haplotype diversity. A Bayesian method was used to infer the phylogenetic tree, revealing a unique relationship between local isolates. The study corroborates the reliability of ND5 to identify distinct lineages for polymorphism-based studies and supplements the existing body of knowledge regarding its genetic diversity. This in turn could potentially aid existing vector control strategies to help mitigate the risk and spread of the dengue virus.

  7. Working toward a synthesis of archaeological, linguistic, and genetic data for inferring African population history

    PubMed Central

    Scheinfeldt, Laura B.; Soi, Sameer; Tishkoff, Sarah A.

    2010-01-01

    Although Africa is the origin of modern humans, the pattern and distribution of genetic variation and correlations with cultural and linguistic diversity in Africa have been understudied. Recent advances in genomic technology, however, have led to genomewide studies of African samples. In this article, we discuss genetic variation in African populations contextualized with what is known about archaeological and linguistic variation. What emerges from this review is the importance of using independent lines of evidence in the interpretation of genetic and genomic data in the reconstruction of past population histories. PMID:20445100

  8. Large Magellanic Cloud Planetary Nebula Morphology: Probing Stellar Populations and Evolution.

    PubMed

    Stanghellini; Shaw; Balick; Blades

    2000-05-10

    Planetary nebulae (PNe) in the Large Magellanic Cloud (LMC) offer the unique opportunity to study both the population and evolution of low- and intermediate-mass stars, by means of the morphological type of the nebula. Using observations from our LMC PN morphological survey, and including images available in the Hubble Space Telescope Data Archive and published chemical abundances, we find that asymmetry in PNe is strongly correlated with a younger stellar population, as indicated by the abundance of elements that are unaltered by stellar evolution (Ne, Ar, and S). While similar results have been obtained for Galactic PNe, this is the first demonstration of the relationship for extragalactic PNe. We also examine the relation between morphology and abundance of the products of stellar evolution. We found that asymmetric PNe have higher nitrogen and lower carbon abundances than symmetric PNe. Our two main results are broadly consistent with the predictions of stellar evolution if the progenitors of asymmetric PNe have on average larger masses than the progenitors of symmetric PNe. The results bear on the question of formation mechanisms for asymmetric PNe: specifically, that the genesis of PNe structure should relate strongly to the population type, and by inference the mass, of the progenitor star and less strongly to whether the central star is a member of a close binary system.

  9. Inferences on pathogenic fungus population structures from microsatellite data: new insights from spatial genetics approaches.

    PubMed

    Rieux, A; Halkett, F; de Lapeyre de Bellaire, L; Zapater, M-F; Rousset, F; Ravigne, V; Carlier, J

    2011-04-01

    Landscape genetics, which combines population genetics, landscape ecology and spatial statistics, has emerged recently as a new discipline that can be used to assess how landscape features or environmental variables can influence gene flow and spatial genetic variation. We applied this approach to the invasive plant pathogenic fungus Mycosphaerella fijiensis, which causes black leaf streak disease of banana. Around 880 isolates were sampled within a 50 × 50 km area located in a fragmented banana production zone in Cameroon that includes several potential physical barriers to gene flow. Two clustering algorithms and a new FST-based procedure were applied to define the number of genetic entities and their spatial domain without a priori assumptions. Two populations were clearly delineated, and the genetic discontinuity appeared sharp but asymmetric. Interestingly, no landscape features matched this genetic discontinuity, and no isolation by distance (IBD) was found within populations. Our results suggest that the genetic structure observed in this production area reflects the recent history of M. fijiensis expansion in Cameroon rather than resulting from contemporary gene flow. Finally, we discuss the influence of the suspected high effective population size for such an organism on (i) the absence of an IBD signal, (ii) the characterization of contemporary gene-flow events through assignation methods of analysis and (iii) the evolution of the genetic discontinuity detected in this study.

  10. Population structure and genotypic variation of Crataegus pontica inferred by molecular markers.

    PubMed

    Rahmani, Mohammad-Shafie; Shabanian, Naghi; Khadivi-Khub, Abdollah; Woeste, Keith E; Badakhshan, Hedieh; Alikhani, Leila

    2015-11-01

    Information about the natural patterns of genetic variability and their evolutionary bases is of fundamental practical importance for sustainable forest management and conservation. In the present study, the genetic diversity of 164 individuals from fourteen natural populations of Crataegus pontica K.Koch was assessed for the first time using three genome-based molecular techniques: inter-retrotransposon amplified polymorphism (IRAP), inter-simple sequence repeats (ISSR) and start codon targeted (SCoT) polymorphism. IRAP, ISSR and SCoT analyses yielded 126, 254 and 199 scorable amplified bands, respectively, of which 90.48, 93.37 and 83.78% were polymorphic. ISSR was more efficient than IRAP and SCoT owing to its high effective multiplex ratio, marker index and resolving power. The dendrograms based on the markers used and combined data divided individuals into three major clusters. The correlation between the coefficient matrices for the IRAP, ISSR and SCoT data was significant. A higher level of genetic variation was observed within populations than among populations based on the markers used. The lower divergence levels depicted among the studied populations could be seen as evidence of gene flow. The promotion of gene exchange will be very beneficial to conserve and utilize the enormous genetic variability. Copyright © 2015 Elsevier B.V. All rights reserved.

  11. Population genetic analysis infers migration pathways of Phytophthora ramorum in US nurseries

    Treesearch

    Erica M. Goss; Meg Larsen; Gary A. Chastagner; Donald R. Givens; Niklaus J. Grünwald; Barbara Jane Howlett

    2009-01-01

    Recently introduced, exotic plant pathogens may exhibit low genetic diversity and be limited to clonal reproduction. However, rapidly mutating molecular markers such as microsatellites can reveal genetic variation within these populations and be used to model putative migration patterns. Phytophthora ramorum is the exotic pathogen, discovered in...

  12. Temporal pattern of africanization in a feral honeybee population from Texas inferred from mitochondrial DNA.

    PubMed

    Pinto, M Alice; Rubink, William L; Coulson, Robert N; Patton, John C; Johnston, J Spencer

    2004-05-01

    The invasion of Africanized honeybees (Apis mellifera L.) in the Americas provides a window of opportunity to study the dynamics of secondary contact of subspecies of bees that evolved in allopatry in ecologically distinctive habitats of the Old World. We report here the results of an 11-year mitochondrial DNA survey of a feral honeybee population from the southern United States (Texas). The mitochondrial haplotype (mitotype) frequencies changed radically during the 11-year study period. Prior to immigration of Africanized honeybees, the resident population was essentially of eastern and western European maternal ancestry. Three years after detection of the first Africanized swarm there was a mitotype turnover in the population from predominantly eastern European to predominantly A. m. scutellata (ancestor of Africanized honeybees). This remarkable change in the mitotype composition coincided with the arrival of the parasitic mite Varroa destructor, which was likely responsible for severe losses experienced by colonies of European ancestry. From 1997 onward the population stabilized with most colonies of A. m. scutellata maternal origin.

  13. Genome-Wide SNP Discovery, Genotyping and Their Preliminary Applications for Population Genetic Inference in Spotted Sea Bass (Lateolabrax maculatus)

    PubMed Central

    Wang, Juan; Xue, Dong-Xiu; Zhang, Bai-Dong; Li, Yu-Long; Liu, Bing-Jian; Liu, Jin-Xian

    2016-01-01

    Next-generation sequencing and the collection of genome-wide single-nucleotide polymorphisms (SNPs) make it possible to identify fine-scale population genetic structure and genomic regions under selection. The spotted sea bass (Lateolabrax maculatus) is a non-model species of ecological and commercial importance that is widely distributed in the northwestern Pacific. A total of 22 648 SNPs were discovered across the genome of L. maculatus by paired-end sequencing of restriction-site associated DNA (RAD-PE) for 30 individuals from two populations. The nucleotide diversity (π) was 0.0028±0.0001 in Dandong and 0.0018±0.0001 in Beihai. Shallow but significant genetic differentiation was detected between the two populations using both the whole data set (FST = 0.0550, P < 0.001) and the putatively neutral SNPs (FST = 0.0347, P < 0.001). However, the two populations were highly differentiated based on the putatively adaptive SNPs (FST = 0.6929, P < 0.001). Moreover, a total of 356 SNPs representing 298 unique loci were detected as outliers putatively under divergent selection by FST-based outlier tests as implemented in BAYESCAN and LOSITAN. Functional annotation of the contigs containing putatively adaptive SNPs yielded hits for 22 of 55 (40%) significant BLASTX matches. Candidate genes for local selection covered a wide array of functions, including binding, catalytic and metabolic activities. The analyses with the SNPs developed in the present study highlighted the importance of genome-wide genetic variation for inference of population structure and local adaptation in L. maculatus. PMID:27336696

  14. Statistical Inference

    NASA Astrophysics Data System (ADS)

    Khan, Shahjahan

    Often scientific information on various data generating processes is presented in the form of numerical and categorical data. Except for some very rare occasions, generally such data represent a small part of the population, or selected outcomes of any data generating process. Although valuable and useful information is lurking in the array of scientific data, generally, it is unavailable to the users. Appropriate statistical methods are essential to reveal the hidden "jewels" in the mess of the raw data. Exploratory data analysis methods are used to uncover such valuable characteristics of the observed data. Statistical inference provides techniques to make valid conclusions about the unknown characteristics or parameters of the population from which scientifically drawn sample data are selected. Usually, statistical inference includes estimation of population parameters as well as performing tests of hypotheses on the parameters. However, prediction of future responses and determining the prediction distributions are also part of statistical inference. Both Classical (or Frequentist) and Bayesian approaches are used in statistical inference. The commonly used Classical approach is based on the sample data alone. In contrast, the increasingly popular Bayesian approach uses a prior distribution on the parameters along with the sample data to make inferences. The non-parametric and robust methods are also being used in situations where commonly used model assumptions are unsupported. In this chapter, we cover the philosophical and methodological aspects of both the Classical and Bayesian approaches. Moreover, some aspects of predictive inference are also included. In the absence of any evidence to support assumptions regarding the distribution of the underlying population, or if the variable is measured only in ordinal scale, non-parametric methods are used. Robust methods are employed to avoid any significant changes in the results due to deviations from the model

  15. Statistical Inference

    NASA Astrophysics Data System (ADS)

    Khan, Shahjahan

    Often scientific information on various data generating processes is presented in the form of numerical and categorical data. Except for some very rare occasions, generally such data represent a small part of the population, or selected outcomes of any data generating process. Although valuable and useful information is lurking in the array of scientific data, generally, it is unavailable to the users. Appropriate statistical methods are essential to reveal the hidden “jewels” in the mess of the raw data. Exploratory data analysis methods are used to uncover such valuable characteristics of the observed data. Statistical inference provides techniques to make valid conclusions about the unknown characteristics or parameters of the population from which scientifically drawn sample data are selected. Usually, statistical inference includes estimation of population parameters as well as performing tests of hypotheses on the parameters. However, prediction of future responses and determining the prediction distributions are also part of statistical inference. Both Classical (or Frequentist) and Bayesian approaches are used in statistical inference. The commonly used Classical approach is based on the sample data alone. In contrast, the increasingly popular Bayesian approach uses a prior distribution on the parameters along with the sample data to make inferences. The non-parametric and robust methods are also being used in situations where commonly used model assumptions are unsupported. In this chapter, we cover the philosophical and methodological aspects of both the Classical and Bayesian approaches. Moreover, some aspects of predictive inference are also included. In the absence of any evidence to support assumptions regarding the distribution of the underlying population, or if the variable is measured only in ordinal scale, non-parametric methods are used. Robust methods are employed to avoid any significant changes in the results due to deviations from the model

  16. Limited connectivity and a phylogeographic break characterize populations of the pink anemonefish, Amphiprion perideraion, in the Indo-Malay Archipelago: inferences from a mitochondrial and microsatellite loci

    PubMed Central

    Dohna, Tina A; Timm, Janne; Hamid, Lemia; Kochzius, Marc

    2015-01-01

    To enhance the understanding of larval dispersal in marine organisms, species with a sedentary adult stage and a pelagic larval phase of known duration constitute ideal candidates, because inferences can be made about the role of larval dispersal in population connectivity. Members of the immensely diverse marine fauna of the Indo-Malay Archipelago are of particular importance in this respect, as biodiversity conservation is becoming a large concern in this region. In this study, the genetic population structure of the pink anemonefish, Amphiprion perideraion, is analyzed by applying 10 microsatellite loci as well as sequences of the mitochondrial control region to also allow for a direct comparison of marker-derived results. Both marker systems detected a strong overall genetic structure (ΦST = 0.096, P < 0.0001; mean Dest = 0.17; FST = 0.015, P < 0.0001) and best supported regional groupings (ΦCT = 0.199, P < 0.0001; FCT = 0.018, P < 0.001) that suggested a differentiation of the Java Sea population from the rest of the archipelago. Differentiation of a New Guinea group was confirmed by both markers, but the markers disagreed over the affinity of populations from west New Guinea. Mitochondrial data suggest higher connectivity among populations with fewer signals of regional substructure than microsatellite data. Considering the homogenizing effect of only a few migrants per generation on genetic differentiation between populations, marker-specific results have important implications for conservation efforts concerning this and similar species. PMID:25937914

  17. Comparative population genomics: power and principles for the inference of functionality

    PubMed Central

    Lawrie, David S.; Petrov, Dmitri A.

    2014-01-01

    The availability of sequenced genomes from multiple related organisms allows the detection and localization of functional genomic elements based on the idea that such elements evolve more slowly than neutral sequences. Although such comparative genomics methods have proven useful in discovering functional elements and ascertaining levels of functional constraint in the genome as a whole, here we outline limitations intrinsic to this approach that cannot be overcome by sequencing more species. We argue that it is essential to supplement comparative genomics with ultra-deep sampling of populations from closely related species to enable substantially more powerful genomic scans for functional elements. The convergence of sequencing technology and population genetics theory has made such projects feasible and has exciting implications for functional genomics. PMID:24656563

  18. Comparative population genomics: power and principles for the inference of functionality.

    PubMed

    Lawrie, David S; Petrov, Dmitri A

    2014-04-01

    The availability of sequenced genomes from multiple related organisms allows the detection and localization of functional genomic elements based on the idea that such elements evolve more slowly than neutral sequences. Although such comparative genomics methods have proven useful in discovering functional elements and ascertaining levels of functional constraint in the genome as a whole, here we outline limitations intrinsic to this approach that cannot be overcome by sequencing more species. We argue that it is essential to supplement comparative genomics with ultra-deep sampling of populations from closely related species to enable substantially more powerful genomic scans for functional elements. The convergence of sequencing technology and population genetics theory has made such projects feasible and has exciting implications for functional genomics. Copyright © 2014 Elsevier Ltd. All rights reserved.

  19. Population maintenance among tropical reef fishes: Inferences from small-island endemics

    PubMed Central

    Robertson, D. Ross

    2001-01-01

    To what extent do local populations of tropical reef fishes persist through the recruitment of pelagic larvae to their natal reef? Endemics from small, isolated islands can help answer that question by indicating whether special biological attributes are needed for long-term survival under enforced localization in high-risk situations. Taxonomically and biologically, the endemics from seven such islands are broadly representative of their regional faunas. As natal-site recruitment occurs among reef fishes in much less isolated situations, these characteristics of island endemics indicate that a wide range of reef fishes could have persistent self-sustaining local populations. Because small islands regularly support substantial reef fish faunas, regional systems of small reserves could preserve much of the diversity of these fishes. PMID:11331752

  20. REJECTOR: software for population history inference from genetic data via a rejection algorithm

    PubMed Central

    Jobin, Matthew J.; Mountain, Joanna L.

    2008-01-01

    Summary: We introduce REJECTOR, software for parameter estimation and comparison of alternate models of population history from genetic data via a rejection algorithm. Through coalescent simulation, REJECTOR generates numerous gene genealogies, and hence simulated data, under a model of population history specified by the user. Summary statistics derived from such simulated data are compared with observed statistics, leading to acceptance or rejection of a given set of parameter values. We performed tests of the software using known parameter values in order to assess the inferential power provided by each summary statistic. The tests demonstrate the precision and accuracy of estimation made possible using this approach. Availability: http://www.rejector.org Contact: mjobin@stanford.edu Supplementary Information: Supplementary data are available at http://www.rejector.org/guide.pdf PMID:18936052
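
    The accept/reject loop described in this abstract (draw parameters, simulate data, compare summary statistics with the observed ones) can be sketched generically as below. This is not REJECTOR's code or interface. As a stated assumption, a toy Gaussian model stands in for the coalescent simulation, and every name in the block (abc_rejection, prior_sampler, simulate, summarize, tolerance) is hypothetical.

```python
import random
import statistics


def abc_rejection(observed_stat, prior_sampler, simulate, summarize,
                  tolerance, n_draws, rng):
    """Keep parameter draws whose simulated summary statistic lies within
    `tolerance` of the observed statistic."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)          # candidate parameter value
        simulated = simulate(theta, rng)    # data generated under that value
        if abs(summarize(simulated) - observed_stat) <= tolerance:
            accepted.append(theta)
    return accepted


# Toy stand-in for a population-history model (assumption): the data are
# Gaussian with an unknown mean, and the summary statistic is the sample mean.
rng = random.Random(0)
true_mean = 3.0
observed = [rng.gauss(true_mean, 1.0) for _ in range(50)]
observed_stat = statistics.fmean(observed)

posterior = abc_rejection(
    observed_stat,
    prior_sampler=lambda r: r.uniform(0.0, 10.0),
    simulate=lambda theta, r: [r.gauss(theta, 1.0) for _ in range(50)],
    summarize=statistics.fmean,
    tolerance=0.1,
    n_draws=5000,
    rng=rng,
)
print(len(posterior), "accepted draws; mean =", statistics.fmean(posterior))
```

    The accepted draws approximate the posterior distribution of the parameter. A smaller tolerance gives a closer approximation at the cost of fewer accepted draws, which is the usual trade-off in rejection-based inference.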

  1. Inference of population structure of purebred dairy and beef cattle using high-density genotype data.

    PubMed

    Kelleher, M M; Berry, D P; Kearney, J F; McParland, S; Buckley, F; Purfield, D C

    2017-01-01

    Information on the genetic diversity and population structure of cattle breeds is useful when deciding, for example, on the optimal crossbreeding strategies to improve phenotypic performance by exploiting heterosis. The present study investigated the genetic diversity and population structure of the most prominent dairy and beef breeds used in Ireland. Illumina high-density genotypes (777 962 single nucleotide polymorphisms; SNPs) were available on 4623 purebred bulls from nine breeds; Angus (n=430), Belgian Blue (n=298), Charolais (n=893), Hereford (n=327), Holstein-Friesian (n=1261), Jersey (n=75), Limousin (n=943), Montbéliarde (n=33) and Simmental (n=363). Principal component analysis revealed that Angus, Hereford, and Jersey formed non-overlapping clusters, representing distinct populations. In contrast, overlapping clusters suggested geographical proximity of origin and genetic similarity between Limousin, Simmental and Montbéliarde and to a lesser extent between Holstein, Friesian and Belgian Blue. The observed SNP heterozygosity averaged across all loci was 0.379. The Belgian Blue had the greatest mean observed heterozygosity (HO=0.389) among individuals within breed while the Holstein-Friesian and Jersey populations had the lowest mean heterozygosity (HO=0.370 and 0.376, respectively). The correlation between the genomic-based and pedigree-based inbreeding coefficients was weak (r=0.171; P<0.001). Mean genomic inbreeding estimates were greatest for Jersey (0.173) and least for Hereford (0.051). The pair-wise breed fixation index (Fst) ranged from 0.049 (Limousin and Charolais) to 0.165 (Hereford and Jersey). In conclusion, substantial genetic variation exists among breeds commercially used in Ireland. Thus custom-mating strategies would be successful in maximising the exploitation of heterosis in crossbreeding strategies.
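
    The two summary measures reported in this record, observed heterozygosity and the pairwise fixation index, can be illustrated with a minimal sketch. The abstract does not say which estimators the authors used, so the 0/1/2 genotype coding, the function names, and the simple Wright-style single-locus formula below are assumptions for illustration only.

```python
def observed_heterozygosity(genotypes):
    """Fraction of called genotypes that are heterozygous.

    genotypes: one list per individual, coded as copies of the alternate
    allele (0, 1 or 2); None marks a missing call (assumed coding).
    """
    called = [g for row in genotypes for g in row if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    return sum(1 for g in called if g == 1) / len(called)


def wright_fst(p1, p2):
    """Single-locus Fst = (HT - HS) / HT for two equally weighted populations.

    p1, p2: frequency of one allele of a biallelic SNP in each population.
    """
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_total == 0.0:
        return 0.0  # locus is monomorphic in both populations
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    print(observed_heterozygosity([[0, 1, 2], [1, 1, 0]]))
    print(wright_fst(0.2, 0.8))
```

    Genome-wide values such as those quoted above are typically obtained by combining many loci, and sample-size corrections matter for small breeds; this sketch omits both.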

  2. Population genomics of C. melanopterus using target gene capture data: demographic inferences and conservation perspectives

    PubMed Central

    Maisano Delser, Pierpaolo; Corrigan, Shannon; Hale, Matthew; Li, Chenhong; Veuille, Michel; Planes, Serge; Naylor, Gavin; Mona, Stefano

    2016-01-01

    Population genetics studies on non-model organisms typically involve sampling few markers from multiple individuals. Next-generation sequencing approaches open up the possibility of sampling many more markers from fewer individuals to address the same questions. Here, we applied a target gene capture method to deep sequence ~1000 independent autosomal regions of a non-model organism, the blacktip reef shark (Carcharhinus melanopterus). We devised a sampling scheme based on the predictions of theoretical studies of metapopulations to show that sampling few individuals, but many loci, can be extremely informative to reconstruct the evolutionary history of species. We collected data from a single deme (SID) from Northern Australia and from a scattered sampling representing various locations throughout the Indian Ocean (SCD). We explored the genealogical signature of population dynamics detected from both sampling schemes using an ABC algorithm. We then contrasted these results with those obtained by fitting the data to a non-equilibrium finite island model. Both approaches supported an Nm value ~40, consistent with philopatry in this species. Finally, we demonstrate through simulation that metapopulations exhibit greater resilience to recent changes in effective size compared to unstructured populations. We propose an empirical approach to detect recent bottlenecks based on our sampling scheme. PMID:27651217

  3. Single nucleotide polymorphism coverage and inference of N-acetyltransferase-2 acetylator phenotypes in worldwide population groups.

    PubMed

    Suarez-Kurtz, Guilherme; Fuchshuber-Moraes, Mateus; Struchiner, Claudio J; Parra, Esteban J

    2016-08-01

    Several algorithms have been proposed to reduce the genotyping effort and cost, while retaining the accuracy of N-acetyltransferase-2 (NAT2) phenotype prediction. Data from the 1000 Genomes (1KG) project and an admixed cohort of Black Brazilians were used to assess the accuracy of NAT2 phenotype prediction using algorithms based on paired single nucleotide polymorphisms (SNPs) (rs1041983 and rs1801280) or a tag SNP (rs1495741). NAT2 haplotypes comprising SNPs rs1801279, rs1041983, rs1801280, rs1799929, rs1799930, rs1208 and rs1799931 were assigned according to the arylamine N-acetyltransferases database. Contingency tables were used to visualize the agreement between the NAT2 acetylator phenotypes on the basis of these haplotypes versus phenotypes inferred by the prediction algorithms. The paired and tag SNP algorithms provided more than 96% agreement with the 7-SNP derived phenotypes in Europeans, East Asians, South Asians and Admixed Americans, but discordance of phenotype prediction occurred in 30.2 and 24.8% of 1KG Africans and in 14.4 and 18.6% of Black Brazilians, respectively. Paired SNP panel misclassification occurs in carriers of NAT2 haplotypes *13A (282T alone), *12B (282T and 803G), *6B (590A alone) and *14A (191A alone), whereas haplotype *14, defined by the 191A allele, is the major culprit of misclassification by the tag allele. Both the paired SNP and the tag SNP algorithms may be used, with economy of scale, to infer NAT2 acetylator phenotypes, including the ultra-slow phenotype, in European, East Asian, South Asian and American populations represented in the 1KG cohort. Both algorithms, however, perform poorly in populations of predominant African descent, including admixed African-Americans, African Caribbeans and Black Brazilians.

  4. Spatially explicit models for inference about density in unmarked or partially marked populations

    USGS Publications Warehouse

    Chandler, Richard B.; Royle, J. Andrew

    2013-01-01

    Recently developed spatial capture–recapture (SCR) models represent a major advance over traditional capture–recapture (CR) models because they yield explicit estimates of animal density instead of population size within an unknown area. Furthermore, unlike nonspatial CR methods, SCR models account for heterogeneity in capture probability arising from the juxtaposition of animal activity centers and sample locations. Although the utility of SCR methods is gaining recognition, the requirement that all individuals can be uniquely identified excludes their use in many contexts. In this paper, we develop models for situations in which individual recognition is not possible, thereby allowing SCR concepts to be applied in studies of unmarked or partially marked populations. The data required for our model are spatially referenced counts made on one or more sample occasions at a collection of closely spaced sample units such that individuals can be encountered at multiple locations. Our approach includes a spatial point process for the animal activity centers and uses the spatial correlation in counts as information about the number and location of the activity centers. Camera-traps, hair snares, track plates, sound recordings, and even point counts can yield spatially correlated count data, and thus our model is widely applicable. A simulation study demonstrated that while the posterior mean exhibits frequentist bias on the order of 5–10% in small samples, the posterior mode is an accurate point estimator as long as adequate spatial correlation is present. Marking a subset of the population substantially increases posterior precision and is recommended whenever possible. We applied our model to avian point count data collected on an unmarked population of the northern parula (Parula americana) and obtained a density estimate (posterior mode) of 0.38 (95% CI: 0.19–1.64) birds/ha. Our paper challenges sampling and analytical conventions in ecology by demonstrating

  5. Interrogating multiple aspects of variation in a full resequencing data set to infer human population size changes

    PubMed Central

    Voight, Benjamin F.; Adams, Alison M.; Frisse, Linda A.; Qian, Yudong; Hudson, Richard R.; Di Rienzo, Anna

    2005-01-01

    We present an expanded data set of 50 unlinked autosomal noncoding regions, resequenced in samples of Hausa from Cameroon, Italians, and Chinese. We use these data to make inferences about human demographic history by using a technique that combines multiple aspects of genetic data, including levels of polymorphism, the allele frequency spectrum, and linkage disequilibrium. We explore an extensive range of demographic parameters and demonstrate that our method of combining multiple aspects of the data results in a significant reduction of the compatible parameter space. In agreement with previous reports, we find that the Hausa data are compatible with demographic equilibrium as well as a set of recent population expansion models. In contrast to the Hausa, when multiple aspects of the data are considered jointly, the non-Africans depart from an equilibrium model of constant population size and are compatible with a range of simple bottleneck models, including a 50–90% reduction in effective population size occurring some time after the appearance of modern humans in Africa 160,000–120,000 years ago. PMID:16352722

  6. Large scale structure of the globular cluster population in Coma

    NASA Astrophysics Data System (ADS)

    Gagliano, Alexander T.; O'Neill, Conor; Madrid, Juan P.

    2016-01-01

    A search for globular cluster candidates in the Coma Cluster was carried out using Hubble Space Telescope data taken with the Advanced Camera for Surveys. We combine different observing programs including the Coma Treasury Survey in order to obtain the large scale distribution of globular clusters in Coma. Globular cluster candidates were selected through careful morphological inspection and a detailed analysis of their magnitude and colors in the two available wavebands, F475W (Sloan g) and F814W (I). Color Magnitude Diagrams, radial density plots and density maps were then created to characterize the globular cluster population in Coma. Preliminary results show the structure of the intergalactic globular cluster system throughout Coma, among the largest globular clusters catalogues to date. The spatial distribution of globular clusters shows clear overdensities, or bridges, between Coma galaxies. It also becomes evident that galaxies of similar luminosity have vastly different numbers of associated globular clusters.

  7. Critical behavior of large maximally informative neural populations

    NASA Astrophysics Data System (ADS)

    Berkowitz, John; Sharpee, Tatyana

    We consider maximally informative encoding of scalar signals by neural populations. In a small time window, neural responses are binary, with spiking probability that follows a sigmoidal tuning curve. The width of the tuning curve represents effective noise in neural transmission. Previous analyses of this problem for relatively small numbers of neurons with identical noise parameters indicated the presence of multiple bifurcations that occurred with decreasing noise value. For very high noise values, maximal information is achieved when all neurons have the same threshold values. With decreasing noise, the threshold values split into two or more groups via a series of bifurcations, until finally each neuron has a different threshold. Analyzing this problem in the large N limit, we found instead that there is a single phase transition from redundant coding to coding based on distributed thresholds. The order parameter of this transition is the threshold standard deviation across the population; differences in noise parameter from the mean are analogous to local magnetic fields. Near the bifurcation point, information transmitted follows a Landau expansion. We use this expansion to quantify the scaling of the order parameter with noise and effective magnetic field. NSF CAREER Award IIS-1254123, NSF Ideas Lab Collaborative Research IOS 1556388.

  8. Bayesian Inference of Galaxy Morphology

    NASA Astrophysics Data System (ADS)

    Yoon, Ilsang; Weinberg, M.; Katz, N.

    2011-01-01

    Reliable inference of galaxy morphology from quantitative analysis of ensemble galaxy images is a challenging but essential ingredient in studying galaxy formation and evolution with current and forthcoming large-scale surveys. To place the galaxy image decomposition problem in the broader context of statistical inference and to derive rigorous statistical confidence levels for the inference, I developed a novel galaxy image decomposition tool, GALPHAT (GALaxy PHotometric ATtributes), that exploits recent developments in Bayesian computation to provide full posterior probability distributions and reliable confidence intervals for all parameters. I will highlight the significant improvements in galaxy image decomposition that GALPHAT offers over conventional model-fitting algorithms and introduce its potential to infer the statistical distribution of galaxy morphological structures, using ensemble posteriors of galaxy morphological parameters from the entire galaxy population under study.

  9. Phylogeographic Structure in Anastrepha ludens (Diptera: Tephritidae) Populations Inferred With mtDNA Sequencing.

    PubMed

    Ruiz-Arce, Raul; Owen, Christopher L; Thomas, Donald B; Barr, Norman B; McPheron, Bruce A

    2015-06-01

    Anastrepha ludens (Loew) (Diptera: Tephritidae), the Mexican fruit fly, is a major pest of citrus and mango. It has a wide distribution in Mexico and Central America, with infestations occurring in Texas, California, and Florida with origins believed to have been centered in northeastern Mexico. This research evaluates the utility of a sequence-based approach for two mitochondrial (COI and ND6) gene regions. We use these markers to examine genetic diversity, estimate population structure, and identify diagnostic information for A. ludens populations. We analyzed 543 individuals from 67 geographic collections and found one predominant haplotype occurring in the majority of specimens. We observed 68 haplotypes in all and see differences among haplotypes belonging to northern and southern collections. Mexico haplotypes differ by few bases possibly as a result of a recent bottleneck event. In contrast to the hypothesis suggesting northeastern Mexico as the origin of this species, we see that specimens from two southern collections show high genetic variability delineating three mitochondrial groups. These data suggest that Central America is the origin for A. ludens. We show that COI and ND6 are useful for phylogeographic studies of A. ludens. Published by Oxford University Press on behalf of Entomological Society of America 2015. This work is written by US Government employees and is in the public domain in the US.

  10. Bayesian coalescent inference reveals high evolutionary rates and diversification of Zika virus populations.

    PubMed

    Fajardo, Alvaro; Soñora, Martín; Moreno, Pilar; Moratorio, Gonzalo; Cristina, Juan

    2016-10-01

    Zika virus (ZIKV) is a member of the family Flaviviridae. In 2015, ZIKV triggered an epidemic in Brazil and spread across Latin America. By May of 2016, the World Health Organization had warned of the spread of ZIKV beyond this region. Detailed studies on the mode of evolution of ZIKV strains are extremely important for our understanding of the emergence and spread of ZIKV populations. In order to gain insight into these matters, a Bayesian coalescent Markov Chain Monte Carlo analysis of complete genome sequences of recently isolated ZIKV strains was performed. The results of these studies revealed a mean rate of evolution of 1.20 × 10^-3 nucleotide substitutions per site per year (s/s/y) for ZIKV strains enrolled in this study. Several variants isolated in China are grouped together with all strains isolated in Latin America. Another genetic group composed exclusively of Chinese strains was also observed, suggesting the co-circulation of different genetic lineages in China. These findings indicate a high level of diversification of ZIKV populations. Strains isolated from microcephaly cases do not share amino acid substitutions, suggesting that other factors besides viral genetic differences may play a role in the proposed pathogenesis caused by ZIKV infection. J. Med. Virol. 88:1672-1676, 2016. © 2016 Wiley Periodicals, Inc.

  11. Haplotype structure and population genetic inferences from nucleotide-sequence variation in human lipoprotein lipase.

    PubMed Central

    Clark, A G; Weiss, K M; Nickerson, D A; Taylor, S L; Buchanan, A; Stengård, J; Salomaa, V; Vartiainen, E; Perola, M; Boerwinkle, E; Sing, C F

    1998-01-01

    Allelic variation in 9.7 kb of genomic DNA sequence from the human lipoprotein lipase gene (LPL) was scored in 71 healthy individuals (142 chromosomes) from three populations: African Americans (24) from Jackson, MS; Finns (24) from North Karelia, Finland; and non-Hispanic Whites (23) from Rochester, MN. The sequences had a total of 88 variable sites, with a nucleotide diversity (site-specific heterozygosity) of .002+/-.001 across this 9.7-kb region. The frequency spectrum of nucleotide variation exhibited a slight excess of heterozygosity, but, in general, the data fit expectations of the infinite-sites model of mutation and genetic drift. Allele-specific PCR helped resolve linkage phases, and a total of 88 distinct haplotypes were identified. For 1,410 (64%) of the 2,211 site pairs, all four possible gametes were present in these haplotypes, reflecting a rich history of past recombination. Despite the strong evidence for recombination, extensive linkage disequilibrium was observed. The number of haplotypes generally is much greater than the number expected under the infinite-sites model, but there was sufficient multisite linkage disequilibrium to reveal two major clades, which appear to be very old. Variation in this region of LPL may depart from the variation expected under a simple, neutral model, owing to complex historical patterns of population founding, drift, selection, and recombination. These data suggest that the design and interpretation of disease-association studies may not be as straightforward as often is assumed. PMID:9683608
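
    The statement that all four possible gametes were present for 1,410 of 2,211 site pairs refers to the four-gamete test: under the infinite-sites model, seeing all four two-site haplotypes implies at least one past recombination between the sites. (As a matter of arithmetic, 2,211 is the number of unordered pairs among 67 sites; the abstract does not say which sites were paired.) A minimal sketch, assuming biallelic sites and phased haplotypes written as strings of '0' and '1'; the function name and input format are illustrative, not the authors' code.

```python
from itertools import combinations


def four_gamete_pairs(haplotypes):
    """Count site pairs at which all four two-site gametes are observed.

    haplotypes: equal-length strings over {'0', '1'}, one per chromosome.
    Returns (pairs_with_all_four_gametes, total_pairs).
    """
    n_sites = len(haplotypes[0])
    if any(len(h) != n_sites for h in haplotypes):
        raise ValueError("haplotypes must all have the same length")
    with_all_four = 0
    total = 0
    for i, j in combinations(range(n_sites), 2):
        # The set of distinct (allele at i, allele at j) combinations.
        gametes = {(h[i], h[j]) for h in haplotypes}
        total += 1
        if len(gametes) == 4:
            with_all_four += 1
    return with_all_four, total


if __name__ == "__main__":
    print(four_gamete_pairs(["00", "01", "10", "11"]))
```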

  12. ARG-walker: inference of individual specific strengths of meiotic recombination hotspots by population genomics analysis

    PubMed Central

    2015-01-01

    Background Meiotic recombination hotspots play important roles in various aspects of genomics, but the underlying mechanisms for regulating the locations and strengths of recombination hotspots are not yet fully revealed. Most existing algorithms for estimating recombination rates from sequence polymorphism data can only output average recombination rates of a population, although there is evidence for the heterogeneity in recombination rates among individuals. For genome-wide association studies (GWAS) of recombination hotspots, an efficient algorithm that estimates the individualized strengths of recombination hotspots is highly desirable. Results In this work, we propose a novel graph mining algorithm named ARG-walker, based on random walks on ancestral recombination graphs (ARG), to estimate individual-specific recombination hotspot strengths. Extensive simulations demonstrate that ARG-walker is able to distinguish the hot allele of a recombination hotspot from the cold allele. Integrated with output of ARG-walker, we performed GWAS on the phased haplotype data of the 22 autosomes of the HapMap Asian population samples of Chinese and Japanese (JPT+CHB). Significant cis-regulatory signals have been detected, which is corroborated by the enrichment of the well-known 13-mer motif CCNCCNTNNCCNC of PRDM9 protein. Moreover, two new DNA motifs have been identified in the flanking regions of the significantly associated SNPs (single nucleotide polymorphisms), which are likely to be new cis-regulatory elements of meiotic recombination hotspots of the human genome. Conclusions Our results on both simulated and real data suggest that ARG-walker is a promising new method for estimating the individual recombination variations. In the future, it could be used to uncover the mechanisms of recombination regulation and human diseases related to recombination hotspots. PMID:26679564

  13. Introgression and phenotypic assimilation in Zimmerius flycatchers (Tyrannidae): population genetic and phylogenetic inferences from genome-wide SNPs.

    PubMed

    Rheindt, Frank E; Fujita, Matthew K; Wilton, Peter R; Edwards, Scott V

    2014-03-01

    Genetic introgression is pervasive in nature and may lead to large-scale phenotypic assimilation and/or admixture of populations, but there is limited knowledge on whether large phenotypic changes are typically accompanied by high levels of introgression throughout the genome. Using bioacoustic, biometric, and spectrophotometric data from a flycatcher (Tyrannidae) system in the Neotropical genus Zimmerius, we document a mosaic pattern of phenotypic admixture in which a population of Zimmerius viridiflavus in northern Peru (henceforth "mosaic") is vocally and biometrically similar to conspecifics to the south but shares plumage characteristics with a different species (Zimmerius chrysops) to the north. To clarify the origins of the mosaic population, we used the RAD-seq approach to generate a data set of 37,361 genome-wide single nucleotide polymorphisms (SNPs). A range of population-genetic diagnostics shows that the genome of the mosaic population is largely indistinguishable from southern Z. viridiflavus and distinct from northern Z. chrysops, and the application of parsimony and species tree methods to the genome-wide SNP data set confirms the close affinity of the mosaic population with southern Z. viridiflavus. Even so, using a subset of 2710 SNPs found across all sampled lineages in configurations appropriate for a recently proposed statistical ("ABBA/BABA") test that distinguishes gene flow from incomplete lineage sorting, we detected low levels of gene flow from northern Z. chrysops into the mosaic population. Mapping the candidate loci for introgression from Z. chrysops into the mosaic population to the zebra finch genome reveals close linkage with genes significantly enriched in functions involving cell projection and plasma membranes. Introgression of key alleles may have led to phenotypic assimilation in the plumage of mosaic birds, suggesting that selection may have been a key factor facilitating introgression.
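
    The "ABBA/BABA" test mentioned in this record compares the counts of two discordant site patterns on a four-taxon tree (P1, P2, P3, outgroup); incomplete lineage sorting alone predicts equal counts, so an excess of one pattern suggests gene flow. A minimal sketch of the D statistic follows, assuming one allele per lineage per site. The function name and input layout are illustrative, and the significance assessment (commonly a block jackknife) is omitted; the abstract does not describe the authors' exact procedure.

```python
def patterson_d(sites):
    """Compute D = (ABBA - BABA) / (ABBA + BABA).

    sites: iterable of (p1, p2, p3, outgroup) allele tuples. The outgroup
    allele is treated as ancestral ('A'); a differing allele is derived ('B').
    """
    abba = 0
    baba = 0
    for p1, p2, p3, outgroup in sites:
        if p1 == outgroup and p2 == p3 and p2 != outgroup:
            abba += 1  # pattern A B B A: P2 and P3 share the derived allele
        elif p2 == outgroup and p1 == p3 and p1 != outgroup:
            baba += 1  # pattern B A B A: P1 and P3 share the derived allele
    if abba + baba == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / (abba + baba)


if __name__ == "__main__":
    example = [("A", "G", "G", "A")] * 3 + [("G", "A", "G", "A")]
    print(patterson_d(example))
```

    A positive D indicates excess allele sharing between P2 and P3, a negative D between P1 and P3; values near zero are consistent with no gene flow.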

  14. Mapping the social network: tracking lice in a wild primate (Microcebus rufus) population to infer social contacts and vector potential

    PubMed Central

    2012-01-01

    Background Studies of host-parasite interactions have the potential to provide insights into the ecology of both organisms involved. We monitored the movement of sucking lice (Lemurpediculus verruculosus), parasites that require direct host-host contact to be transferred, in their host population of wild mouse lemurs (Microcebus rufus). These lemurs live in the rainforests of Madagascar, are small (40 g), arboreal, nocturnal, solitary foraging primates for which data on population-wide interactions are difficult to obtain. We developed a simple, cost effective method exploiting the intimate relationship between louse and lemur, whereby individual lice were marked, without removal from their host, with an individualized code, and tracked throughout the lemur population. We then tested the hypotheses that 1) the frequency of louse transfers, and thus interactions, would decrease with increasing distance between paired individual lemurs; 2) due to host polygynandry, social interactions and hence louse transfers would increase during the onset of the breeding season; and 3) individual mouse lemurs would vary in their contributions to the spread of lice. Results We show that louse transfers involved 43.75% of the studied lemur population, exclusively males. Louse transfers peaked during the breeding season, perhaps due to increased social interactions between lemurs. Although trap-based individual lemur ranging patterns are restricted, louse transfer rate does not correlate with the distance between lemur trapping locales, indicating wider host ranging behavior and a greater risk of rapid population-wide pathogen transmission than predicted by standard trapping data alone. Furthermore, relatively few lemur individuals contributed disproportionately to the rapid spread of lice throughout the population. Conclusions Using a simple method, we were able to visualize exchanges of lice in a population of cryptic wild primates. This method not only provided insight into the

  15. Complex population structure in African village dogs and its implications for inferring dog domestication history.

    PubMed

    Boyko, Adam R; Boyko, Ryan H; Boyko, Corin M; Parker, Heidi G; Castelhano, Marta; Corey, Liz; Degenhardt, Jeremiah D; Auton, Adam; Hedimbi, Marius; Kityo, Robert; Ostrander, Elaine A; Schoenebeck, Jeffrey; Todhunter, Rory J; Jones, Paul; Bustamante, Carlos D

    2009-08-18

    High genetic diversity of East Asian village dogs has recently been used to argue for an East Asian origin of the domestic dog. However, global village dog genetic diversity and the extent to which semiferal village dogs represent distinct, indigenous populations instead of admixtures of various dog breeds has not been quantified. Understanding these issues is critical to properly reconstructing the timing, number, and locations of dog domestication. To address these questions, we sampled 318 village dogs from 7 regions in Egypt, Uganda, and Namibia, measuring genetic diversity >680 bp of the mitochondrial D-loop, 300 SNPs, and 89 microsatellite markers. We also analyzed breed dogs, including putatively African breeds (Afghan hounds, Basenjis, Pharaoh hounds, Rhodesian ridgebacks, and Salukis), Puerto Rican street dogs, and mixed breed dogs from the United States. Village dogs from most African regions appear genetically distinct from non-native breed and mixed-breed dogs, although some individuals cluster genetically with Puerto Rican dogs or United States breed mixes instead of with neighboring village dogs. Thus, African village dogs are a mosaic of indigenous dogs descended from early migrants to Africa, and non-native, breed-admixed individuals. Among putatively African breeds, Pharaoh hounds, and Rhodesian ridgebacks clustered with non-native rather than indigenous African dogs, suggesting they have predominantly non-African origins. Surprisingly, we find similar mtDNA haplotype diversity in African and East Asian village dogs, potentially calling into question the hypothesis of an East Asian origin for dog domestication.

  16. Complex population structure in African village dogs and its implications for inferring dog domestication history

    PubMed Central

    Boyko, Adam R.; Boyko, Ryan H.; Boyko, Corin M.; Parker, Heidi G.; Castelhano, Marta; Corey, Liz; Degenhardt, Jeremiah D.; Auton, Adam; Hedimbi, Marius; Kityo, Robert; Ostrander, Elaine A.; Schoenebeck, Jeffrey; Todhunter, Rory J.; Jones, Paul; Bustamante, Carlos D.

    2009-01-01

    High genetic diversity of East Asian village dogs has recently been used to argue for an East Asian origin of the domestic dog. However, global village dog genetic diversity and the extent to which semiferal village dogs represent distinct, indigenous populations instead of admixtures of various dog breeds has not been quantified. Understanding these issues is critical to properly reconstructing the timing, number, and locations of dog domestication. To address these questions, we sampled 318 village dogs from 7 regions in Egypt, Uganda, and Namibia, measuring genetic diversity >680 bp of the mitochondrial D-loop, 300 SNPs, and 89 microsatellite markers. We also analyzed breed dogs, including putatively African breeds (Afghan hounds, Basenjis, Pharaoh hounds, Rhodesian ridgebacks, and Salukis), Puerto Rican street dogs, and mixed breed dogs from the United States. Village dogs from most African regions appear genetically distinct from non-native breed and mixed-breed dogs, although some individuals cluster genetically with Puerto Rican dogs or United States breed mixes instead of with neighboring village dogs. Thus, African village dogs are a mosaic of indigenous dogs descended from early migrants to Africa, and non-native, breed-admixed individuals. Among putatively African breeds, Pharaoh hounds, and Rhodesian ridgebacks clustered with non-native rather than indigenous African dogs, suggesting they have predominantly non-African origins. Surprisingly, we find similar mtDNA haplotype diversity in African and East Asian village dogs, potentially calling into question the hypothesis of an East Asian origin for dog domestication. PMID:19666600

  17. Phylogeography and population structure of the biologically invasive phytopathogen Erwinia amylovora inferred using minisatellites.

    PubMed

    Bühlmann, Andreas; Dreo, Tanja; Rezzonico, Fabio; Pothier, Joël F; Smits, Theo H M; Ravnikar, Maja; Frey, Jürg E; Duffy, Brion

    2014-07-01

    Erwinia amylovora causes a major disease of pome fruit trees worldwide, and is regulated as a quarantine organism in many countries. While some diversity of isolates has been observed, molecular epidemiology of this bacterium is hindered by a lack of simple molecular typing techniques with sufficiently high resolution. We report a molecular typing system of E. amylovora based on variable number of tandem repeats (VNTR) analysis. Repeats in the E. amylovora genome were identified with comparative genomic tools, and VNTR markers were developed and validated. A Multiple-Locus VNTR Analysis (MLVA) was applied to E. amylovora isolates from bacterial collections representing global and regional distribution of the pathogen. Based on six repeats, MLVA allowed the distinction of 227 haplotypes among a collection of 833 isolates of worldwide origin. Three geographically separated groups were recognized among global isolates using Bayesian clustering methods. Analysis of regional outbreaks confirmed the presence of diverse haplotypes but also high representation of certain haplotypes during outbreaks. MLVA is a practical method for epidemiological studies of E. amylovora, identifying previously unresolved population structure within outbreaks. Knowledge of such structure can increase our understanding of how plant diseases emerge and spread over a given geographical region.

  18. A Stenotrophomonas maltophilia Multilocus Sequence Typing Scheme for Inferring Population Structure▿ †

    PubMed Central

    Kaiser, Sabine; Biehler, Klaus; Jonas, Daniel

    2009-01-01

    Stenotrophomonas maltophilia is an opportunistic, highly resistant, and ubiquitous pathogen. Strains have been assigned to genogroups using amplified fragment length polymorphism. Hence, isolates of environmental and clinical origin predominate in different groups. A multilocus sequence typing (MLST) scheme was developed using a highly diverse selection of 70 strains of various ecological origins from seven countries on all continents including strains of the 10 previously defined genogroups. Sequence data were assigned to 54 sequence types (ST) based on seven loci. Indices of association for all isolates and clinical isolates of 2.498 and 2.562 indicated a significant linkage disequilibrium, as well as high congruence of tree topologies from different loci. Potential recombination events were detected in one-sixth of all ST. Calculation of the mean divergence between and within predicted clusters confirmed previously defined groups and revealed five additional groups. Consideration of the different ecological origins showed that 18 out of 31 respiratory tract isolates, including 12 out of 19 isolates from cystic fibrosis (CF) patients, belonged to genogroup 6. In contrast, 16 invasive strains isolated from blood cultures were distributed among nine different genogroups. Three genogroups contained isolates of strictly environmental origin that also featured high sequence distances to other genogroups, including the S. maltophilia type strain. On the basis of this MLST scheme, isolates can be assigned to the genogroups of this species in order to further scrutinize the population structure of this species and to unravel the uneven distribution of environmental and clinical isolates obtained from infected, colonized, or CF patients. PMID:19251858

  19. Population genetic analysis of a parasitic mycovirus to infer the invasion history of its fungal host.

    PubMed

    Schoebel, Corine N; Botella, Leticia; Lygis, Vaidotas; Rigling, Daniel

    2017-02-04

    Hymenoscyphus fraxineus mitovirus 1 (HfMV1) occurs in the fungus Hymenoscyphus fraxineus, an introduced plant pathogen responsible for the devastating ash dieback epidemic in Europe. Here, we explored the prevalence and genetic structure of HfMV1 to elucidate the invasion history of both the virus and the fungal host. A total of 1298 H. fraxineus isolates (181 from Japan and 1117 from Europe) were screened for the presence of this RNA virus and 301 virus-positive isolates subjected to partial sequence analysis of the viral RNA polymerase gene. Our results indicate a high mean prevalence (78.7%) of HfMV1 across European H. fraxineus isolates, which is supported by the observed high transmission rate (average 83.8%) of the mitovirus into sexual spores of its host. In accordance with an expected founder effect in the introduced population in Europe, only 1.1% of the Japanese isolates tested virus positive. In Europe, HfMV1 shows low nucleotide diversity but a high number of haplotypes, which seem to be subject to strong purifying selection. Phylogenetic and clustering analysis detected two genetically distinct HfMV1 groups, both present throughout Europe. This pattern supports the hypothesis that only two (mitovirus-carrying) H. fraxineus individuals were introduced into Europe as previously suggested from the bi-allelic nature of the fungus. Moreover, our data point to reciprocal mating events between the two introduced individuals, which presumably initiated the ash dieback epidemic in Europe.

  20. Galactic globular cluster NGC 6752 and its stellar population as inferred from multicolor photometry

    SciTech Connect

    Kravtsov, Valery; Alcaíno, Gonzalo; Marconi, Gianni; Alvarado, Franklin

    2014-03-01

    This paper is devoted to photometric study of the Galactic globular cluster (GGC) NGC 6752 in UBVI, focusing on the multiplicity of its stellar population. We emphasize that our U passband is (1) narrower than the standard one due to its smaller extension blueward and (2) redshifted by ∼300 Å relative to its counterparts, such as the HST F336W filter. Accordingly, both the spectral features encompassed by it and photometric effects of the multiplicity revealed in our study are somewhat different than in recent studies of NGC 6752. Main sequence stars bluer in U – B are less centrally concentrated, as red giants are. We find a statistically significant increasing luminosity of the red giant branch (RGB) bump of ΔU ≈ 0.2 mag toward the cluster outskirts with no such obvious effect in V. The photometric results are correlated with spectroscopic data: the bluer RGB stars in U – B have lower nitrogen abundances. We draw attention to a larger width of the RGB than the blue horizontal branch (BHB) in U – B. This seems to agree with the effects predicted to be caused by molecular bands produced by nitrogen-containing molecules. We find that brighter BHB stars, especially the brightest ones, are more centrally concentrated. This implies that red giants that are redder in U – B, i.e., more nitrogen enriched and centrally concentrated, are the main progenitors of the brighter BHB stars. However, such a progenitor-progeny relationship disagrees with theoretical predictions and with the results on the elemental abundances in horizontal branch stars. We isolated the asymptotic giant branch clump and estimated the parameter $\Delta V_{\mathrm{ZAHB}}^{\mathrm{clump}} = 0.98 \pm 0.12$.

  1. Huge populations and old species of Costa Rican and Panamanian dirt frogs inferred from mitochondrial and nuclear gene sequences.

    PubMed

    Crawford, A J

    2003-10-01

    Molecular genetic data were used to investigate population sizes and ages of Eleutherodactylus (Anura: Leptodactylidae), a species-rich group of small leaf-litter frogs endemic to Central America. Population genetic structure and divergence were investigated for four closely related species surveyed across nine localities in Costa Rica and Panama. DNA sequence data were collected from a mitochondrial gene (ND2) and a nuclear gene (c-myc). Phylogenetic analyses yielded concordant results between loci, with reciprocal monophyly of mitochondrial DNA haplotypes for all species and of c-myc haplotypes for three of the four species. Estimates of genetic differentiation among populations (FST) based upon mitochondrial data were always higher than nuclear-based FST estimates, even after correcting for the expected fourfold lower effective population size (Ne) of the mitochondrial genome. Comparing within-population variation and the relative mutation rates of the two genes revealed that the Ne of the mitochondrial genome was 15-fold lower than the estimate of the nuclear genome based on c-myc. Nuclear FST estimates were approximately 0 for the most proximal pairs of populations, but ranged from 0.5 to 1.0 for all other pairs, even within the same nominal species. The nuclear locus yielded estimates of Ne within localities on the order of 10^5. This value is two to three orders of magnitude larger than any previous Ne estimate from frogs, but is nonetheless consistent with published demographic data. Applying a molecular clock model suggested that morphologically indistinguishable populations within one species may be 10^7 years old. These results demonstrate that even a geologically young and dynamic region of the tropics can support very old lineages that harbour great levels of genetic diversity within populations. The association of high nucleotide diversity within populations, large divergence between populations, and high species diversity is also discussed in light of
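
    The "fourfold" correction mentioned in this abstract can be illustrated with a short derivation. This is only a sketch under Wright's island model at migration-drift equilibrium with an equal sex ratio; the abstract does not say which formula the author actually applied, and the symbols N (diploid population size) and m (migration rate) are introduced here for illustration.

```latex
% Equilibrium F_ST under the island model (assumed, not stated in the abstract):
%   nuclear, diploid, biparental:          2N gene copies
%   mitochondrial, haploid, maternal only: N/2 gene copies (fourfold fewer)
\begin{align}
  F_{ST}^{\mathrm{nuc}} &\approx \frac{1}{1 + 4Nm}, &
  F_{ST}^{\mathrm{mt}}  &\approx \frac{1}{1 + Nm}.
\end{align}
% Solve the second expression for Nm and substitute into the first:
\begin{align}
  Nm &= \frac{1 - F_{ST}^{\mathrm{mt}}}{F_{ST}^{\mathrm{mt}}}
  \quad\Longrightarrow\quad
  F_{ST}^{\mathrm{nuc,\,expected}}
     = \frac{F_{ST}^{\mathrm{mt}}}{4 - 3\,F_{ST}^{\mathrm{mt}}}.
\end{align}
% Example: F_ST^mt = 0.8 gives an expected nuclear value of 0.8 / 1.6 = 0.5.
```

    Under these assumptions, a mitochondrial estimate that still exceeds the nuclear one after this rescaling points to something beyond the simple fourfold difference, which is in line with the 15-fold difference in Ne reported in the abstract.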

  2. Population histories of right whales (Cetacea: Eubalaena) inferred from mitochondrial sequence diversities and divergences of their whale lice (Amphipoda: Cyamus).

    PubMed

    Kaliszewska, Zofia A; Seger, Jon; Rowntree, Victoria J; Barco, Susan G; Benegas, Rafael; Best, Peter B; Brown, Moira W; Brownell, Robert L; Carribero, Alejandro; Harcourt, Robert; Knowlton, Amy R; Marshall-Tilas, Kim; Patenaude, Nathalie J; Rivarola, Mariana; Schaeff, Catherine M; Sironi, Mariano; Smith, Wendy A; Yamada, Tadasu K

    2005-10-01

    Right whales carry large populations of three 'whale lice' (Cyamus ovalis, Cyamus gracilis, Cyamus erraticus) that have no other hosts. We used sequence variation in the mitochondrial COI gene to ask (i) whether cyamid population structures might reveal associations among right whale individuals and subpopulations, (ii) whether the divergences of the three nominally conspecific cyamid species on North Atlantic, North Pacific, and southern right whales (Eubalaena glacialis, Eubalaena japonica, Eubalaena australis) might indicate their times of separation, and (iii) whether the shapes of cyamid gene trees might contain information about changes in the population sizes of right whales. We found high levels of nucleotide diversity but almost no population structure within oceans, indicating large effective population sizes and high rates of transfer between whales and subpopulations. North Atlantic and Southern Ocean populations of all three species are reciprocally monophyletic, and North Pacific C. erraticus is well separated from North Atlantic and southern C. erraticus. Mitochondrial clock calibrations suggest that these divergences occurred around 6 million years ago (Ma), and that the Eubalaena mitochondrial clock is very slow. North Pacific C. ovalis forms a clade inside the southern C. ovalis gene tree, implying that at least one right whale has crossed the equator in the Pacific Ocean within the last 1-2 million years (Myr). Low-frequency polymorphisms are more common than expected under neutrality for populations of constant size, but there is no obvious signal of rapid, interspecifically congruent expansion of the kind that would be expected if North Atlantic or southern right whales had experienced a prolonged population bottleneck within the last 0.5 Myr.

  3. Inferring Large-Scale Terrestrial Water Storage Through GRACE and GPS Data Fusion in Cloud Computing Environments

    NASA Astrophysics Data System (ADS)

    Rude, C. M.; Li, J. D.; Gowanlock, M.; Herring, T.; Pankratius, V.

    2016-12-01

    Surface subsidence due to depletion of groundwater can lead to permanent compaction of aquifers and damaged infrastructure. However, studies of such effects on a large scale are challenging and compute intensive because they involve fusing a variety of data sets beyond direct measurements from groundwater wells, such as gravity change measurements from the Gravity Recovery and Climate Experiment (GRACE) or surface displacements measured by GPS receivers. Our work therefore leverages Amazon cloud computing to enable these types of analyses spanning the entire continental US. Changes in groundwater storage are inferred from surface displacements measured by GPS receivers stationed throughout the country. Receivers located on bedrock are anti-correlated with changes in water levels from elastic deformation due to loading, while stations on aquifers correlate with groundwater changes due to poroelastic expansion and compaction. Correlating linearly detrended equivalent water thickness measurements from GRACE with linearly detrended and Kalman filtered vertical displacements of GPS stations located throughout the United States helps compensate for the spatial and temporal limitations of GRACE. Our results show that the majority of GPS stations are negatively correlated with GRACE in a statistically relevant way, as most GPS stations are located on bedrock in order to provide stable reference locations and measure geophysical processes such as tectonic deformations. Additionally, stations located on the Central Valley California aquifer show statistically significant positive correlations. Through the identification of positive and negative correlations, deformation phenomena can be classified as loading or poroelastic expansion due to changes in groundwater. This method facilitates further studies of terrestrial water storage on a global scale. This work is supported by NASA AIST-NNX15AG84G (PI: V. Pankratius) and Amazon.
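
    The detrend-then-correlate step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names and the synthetic series are hypothetical, the Kalman filtering of the GPS data and the significance testing are omitted, and classification by the sign of the correlation alone is a simplifying assumption.

```python
import math


def detrend(values):
    """Subtract the least-squares straight line from an evenly sampled series."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    slope = sxy / sxx
    return [v - (v_mean + slope * (t - t_mean)) for t, v in enumerate(values)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def classify_station(gps_vertical, water_thickness):
    """Label a station by the sign of the detrended correlation (hypothetical rule).

    Negative: ground sinks as water mass rises (elastic loading, bedrock).
    Positive: ground rises as water mass rises (poroelastic, aquifer).
    """
    r = pearson(detrend(gps_vertical), detrend(water_thickness))
    return ("poroelastic" if r > 0 else "loading"), r


# Synthetic monthly data for illustration only (not real GRACE or GPS values).
months = range(48)
water = [0.1 * t + math.sin(2 * math.pi * t / 12) for t in months]
bedrock_gps = [0.02 * t - 0.5 * math.sin(2 * math.pi * t / 12) for t in months]
aquifer_gps = [0.3 * math.sin(2 * math.pi * t / 12) for t in months]

print(classify_station(bedrock_gps, water)[0])
print(classify_station(aquifer_gps, water)[0])
```

    Detrending both series first matters because two unrelated series that each drift linearly would otherwise appear strongly correlated.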

  4. Genetic structure in Orchesella cincta (Collembola): strong subdivision of European populations inferred from mtDNA and AFLP markers.

    PubMed

    Timmermans, M J T N; Ellers, J; Mariën, J; Verhoef, S C; Ferwerda, E B; van Straalen, N M

    2005-06-01

    Population genetic structure is determined both by current processes and historical events. Current processes include gene flow, which is largely influenced by the migration capacity of a species. Historical events are, for example, glaciation periods, which have had a major impact on the distribution of many species. Species with a low capacity or tendency to move about or disperse often exhibit clear spatial genetic structures, whereas mobile species mostly show less spatial genetic differentiation. In this paper we report on the genetic structure of a small, wingless arthropod species (Orchesella cincta: Collembola) in Europe. For this purpose we used mtDNA COII sequences and AFLP markers. We show that large genetic differences exist between populations of O. cincta, as expected from O. cincta's winglessness and sedentary lifestyle. Despite the fact that most variability was observed within populations (59%), a highly significant amount of AFLP variation (25%) was observed between populations from northwestern Europe, central Europe and Italy. This suggests that gene flow among regions is extremely low, which is additionally supported by the lack of shared mtDNA alleles between regions. Based on the genetic variation and sequence differences observed we conclude that the subdivision occurred long before the last glaciation periods. Although the populations still interbreed in the lab, we assume that in the long term the genetic isolation of these regions may lead to speciation processes.

  5. Locative inferences in medical texts.

    PubMed

    Mayer, P S; Bailey, G H; Mayer, R J; Hillis, A; Dvoracek, J E

    1987-06-01

    Medical research relies on epidemiological studies conducted on a large set of clinical records that have been collected from physicians recording individual patient observations. These clinical records are recorded for the purpose of individual care of the patient with little consideration for their use by a biostatistician interested in studying a disease over a large population. Natural language processing of clinical records for epidemiological studies must deal with temporal, locative, and conceptual issues. This makes text understanding and data extraction of clinical records an excellent area for applied research. While much has been done in making temporal or conceptual inferences in medical texts, parallel work in locative inferences has not been done. This paper examines the locative inferences as well as the integration of temporal, locative, and conceptual issues in the clinical record understanding domain by presenting an application that utilizes two key concepts in its parsing strategy--a knowledge-based parsing strategy and a minimal lexicon.

  6. How large was the founding population of Darwin's finches?

    PubMed Central

    Vincek, V.; O'Huigin, C.; Satta, Y.; Takahata, Y.; Boag, P. T.; Grant, P. R.; Grant, B. R.; Klein, J.

    1997-01-01

    A key assumption of many allopatric speciation models is that evolution in peripheral or isolated populations is facilitated by drastic reductions in population size. Population bottlenecks are believed to lead to rapid changes in gene frequencies through genetic drift, to facilitate rapid emergence of novel phenotypes, and to enhance reproductive isolation via genetic revolutions. For such effects to occur, founding populations must be very small, and remain small for some time after founding. This assumption has, however, rarely been tested in nature. One approach is to exploit the polymorphism of the major histocompatibility complex (Mhc) to obtain information about the founding population. Here, we use the Mhc polymorphism to estimate the size of the founding population of Darwin's finches in the Galápagos Archipelago. The results indicate that the population could not have been smaller than 30 individuals.

  7. Seasonal rainfall forecasting by adaptive network-based fuzzy inference system (ANFIS) using large scale climate signals

    NASA Astrophysics Data System (ADS)

    Mekanik, F.; Imteaz, M. A.; Talei, A.

    2016-05-01

    Accurate seasonal rainfall forecasting is an important step in the development of reliable runoff forecast models. The large scale climate modes affecting rainfall in Australia have recently been proven useful in rainfall prediction problems. In this study, adaptive network-based fuzzy inference systems (ANFIS) models are developed for the first time for southeast Australia in order to forecast spring rainfall. The models are applied in east, center and west Victoria as case studies. Large scale climate signals comprising El Niño Southern Oscillation (ENSO), Indian Ocean Dipole (IOD) and Inter-decadal Pacific Oscillation (IPO) are selected as rainfall predictors. Eight models are developed based on single climate modes (ENSO, IOD, and IPO) and combined climate modes (ENSO-IPO and ENSO-IOD). Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Pearson correlation coefficient (r) and root mean square error in probability (RMSEP) skill score are used to evaluate the performance of the proposed models. The predictions demonstrate that ANFIS models based on the individual IOD index outperform, in terms of RMSE, MAE and r, the models based on individual ENSO indices. It is further discovered that IPO is not an effective predictor for the region and the combined ENSO-IOD and ENSO-IPO predictors did not improve the predictions. In order to evaluate the effectiveness of the proposed models, a comparison is conducted between ANFIS models and the conventional Artificial Neural Network (ANN), the Predictive Ocean Atmosphere Model for Australia (POAMA) and climatology forecasts. POAMA is the official dynamic model used by the Australian Bureau of Meteorology. The ANFIS predictions show superior performance for most of the region compared to ANN and climatology forecasts. POAMA performs better with regard to RMSE and MAE in east and part of central Victoria; however, compared to ANFIS, it shows weaker results in west Victoria in terms of prediction errors and RMSEP skill

  8. Engaging a state: Facebook comments on a large population biobank.

    PubMed

    Platt, Tevah; Platt, Jodyn; Thiel, Daniel; Kardia, Sharon L R

    2017-04-05

    Scholarship on newborn screening, dried bloodspot retention, and large population biobanking calls consistently for improved public engagement. Communication with participants likely occurs only in the context of collection, consent, or notification, if at all. We ran an 11-week advertising campaign to inform Michigan Facebook users unlikely to know that their or their children's dried bloodspots (DBSs) were stored in a state biobank. We investigated the pattern and content of comments posted during the campaign, focusing on users' questions, attitudes and concerns, and the role the moderator played in addressing them. We used Facebook data to quantitatively assess engagement and employed conventional content analysis to investigate themes, attitudes, and social dynamics among user and moderator comments. Five ad sets elicited comments during campaign weeks 4-8, reaching ∼800,000 Facebook users ($6000). Gravitating around broad, underlying ethical, legal, and social issues, 180 posts from 129 unique users related to newborn screening or biobanking. Thirty-six conveyed negative attitudes and 33 conveyed positive attitudes; 53 posed questions. The most prevalent themes identified were consent, privacy, bloodspot use, identifiability, inclusion criteria, research benefits, (mis)trust, genetics, DBS destruction, awareness, and the role of government. The moderator's 81 posts were responsive: answering questions, correcting or clarifying information, or providing information about opting out. Facebook ad campaigns can improve engagement by pushing out relevant content and creating dynamic, responsive, visible forums for discussion. Reduced control over messaging may be worth the trade-off for creating accessible, transparent, people-centered engagement on public health issues that are sensitive and complex.

  9. Adaptation in Coding by Large Populations of Neurons in the Retina

    NASA Astrophysics Data System (ADS)

    Ioffe, Mark L.

    A comprehensive theory of neural computation requires an understanding of the statistical properties of the neural population code. The focus of this work is the experimental study and theoretical analysis of the statistical properties of neural activity in the tiger salamander retina. This is an accessible yet complex system, for which we control the visual input and record from a substantial portion--greater than a half--of the ganglion cell population generating the spiking output. Our experiments probe adaptation of the retina to visual statistics: a central feature of sensory systems which have to adjust their limited dynamic range to a far larger space of possible inputs. In Chapter 1 we place our work in context with a brief overview of the relevant background. In Chapter 2 we describe the experimental methodology of recording from 100+ ganglion cells in the tiger salamander retina. In Chapter 3 we first present the measurements of adaptation of individual cells to changes in stimulation statistics and then investigate whether pairwise correlations in fluctuations of ganglion cell activity change across different stimulation conditions. We then transition to a study of the population-level probability distribution of the retinal response captured with maximum-entropy models. Convergence of the model inference is presented in Chapter 4. In Chapter 5 we first test the empirical presence of a phase transition in such models fitting the retinal response to different experimental conditions, and then proceed to develop other characterizations which are sensitive to complexity in the interaction matrix. This includes an analysis of the dynamics of sampling at finite temperature, which demonstrates a range of subtle attractor-like properties in the energy landscape. These are largely conserved when ambient illumination is varied 1000-fold, a result not necessarily apparent from the measured low-order statistics of the distribution. Our results form a consistent

  10. Inference of chromosomal inversion dynamics from Pool-Seq data in natural and laboratory populations of Drosophila melanogaster

    PubMed Central

    Kapun, Martin; van Schalkwyk, Hester; McAllister, Bryant; Flatt, Thomas; Schlötterer, Christian

    2014-01-01

    Sequencing of pools of individuals (Pool-Seq) represents a reliable and cost-effective approach for estimating genome-wide SNP and transposable element insertion frequencies. However, Pool-Seq does not provide direct information on haplotypes so that, for example, obtaining inversion frequencies has not been possible until now. Here, we have developed a new set of diagnostic marker SNPs for seven cosmopolitan inversions in Drosophila melanogaster that can be used to infer inversion frequencies from Pool-Seq data. We applied our novel marker set to Pool-Seq data from an experimental evolution study and from North American and Australian latitudinal clines. In the experimental evolution data, we find evidence that positive selection has driven the frequencies of In(3R)C and In(3R)Mo to increase over time. In the clinal data, we confirm the existence of frequency clines for In(2L)t, In(3L)P and In(3R)Payne in both North America and Australia and detect a previously unknown latitudinal cline for In(3R)Mo in North America. The inversion markers developed here provide a versatile and robust tool for characterizing inversion frequencies and their dynamics in Pool-Seq data from diverse D. melanogaster populations. PMID:24372777

  11. Inference of chromosomal inversion dynamics from Pool-Seq data in natural and laboratory populations of Drosophila melanogaster.

    PubMed

    Kapun, Martin; van Schalkwyk, Hester; McAllister, Bryant; Flatt, Thomas; Schlötterer, Christian

    2014-04-01

    Sequencing of pools of individuals (Pool-Seq) represents a reliable and cost-effective approach for estimating genome-wide SNP and transposable element insertion frequencies. However, Pool-Seq does not provide direct information on haplotypes so that, for example, obtaining inversion frequencies has not been possible until now. Here, we have developed a new set of diagnostic marker SNPs for seven cosmopolitan inversions in Drosophila melanogaster that can be used to infer inversion frequencies from Pool-Seq data. We applied our novel marker set to Pool-Seq data from an experimental evolution study and from North American and Australian latitudinal clines. In the experimental evolution data, we find evidence that positive selection has driven the frequencies of In(3R)C and In(3R)Mo to increase over time. In the clinal data, we confirm the existence of frequency clines for In(2L)t, In(3L)P and In(3R)Payne in both North America and Australia and detect a previously unknown latitudinal cline for In(3R)Mo in North America. The inversion markers developed here provide a versatile and robust tool for characterizing inversion frequencies and their dynamics in Pool-Seq data from diverse D. melanogaster populations.

  12. Population structure of the salamander Hynobius retardatus inferred from a partial sequence of the mitochondrial DNA control region.

    PubMed

    Azuma, Noriko; Hangui, Jun-ichi; Wakahara, Masami; Michimae, Hirofumi

    2013-01-01

    We investigated population structure of the salamander Hynobius retardatus in Hokkaido, Japan using partial sequences of the mitochondrial DNA control region (490 bp) from 105 individuals. The salamanders were collected from 28 localities representing the entire regional distribution of this species. Twenty different haplotypes distributed across three haplotype groups were identified. Group 1 was widely distributed in central, northern, and eastern Hokkaido, except Erimo; Groups 2 and 3 appeared exclusively in Erimo and southern Hokkaido, respectively. The genetic distance between the three groups was not very large, but the distributions of the groups never overlapped spatially, indicating a hierarchical population structure comprising three regional groups, which was also supported by analysis of molecular variance. The results suggest that the present population structure is affected by current genetic barriers, as well as by historical transitions of climate and landscape.

  13. A Method for Inferring an Individual’s Genetic Ancestry and Degree of Admixture Associated with Six Major Continental Populations

    PubMed Central

    Libiger, Ondrej; Schork, Nicholas J.

    2013-01-01

    The determination of the ancestry and genetic backgrounds of the subjects in genetic and general epidemiology studies is a crucial component in the analysis of relevant outcomes or associations. Although there are many methods for differentiating ancestral subgroups among individuals based on genetic markers, only a few of these methods provide actual estimates of the fraction of an individual’s genome that is likely to be associated with different ancestral populations. We propose a method for assigning ancestry that works in stages to refine estimates of ancestral population contributions to individual genomes. The method leverages genotype data in the public domain obtained from individuals with known ancestries. Although we showcase the method in the assessment of ancestral genome proportions leveraging largely continental populations, the strategy can be used for assessing within-continent or more subtle ancestral origins with the appropriate data. PMID:23335941

  14. Genetic diversity and structure of wild populations of Carica papaya in Northern Mesoamerica inferred by nuclear microsatellites and chloroplast markers.

    PubMed

    Chávez-Pesqueira, Mariana; Núñez-Farfán, Juan

    2016-12-01

    Few studies have evaluated the genetic structure and evolutionary history of wild varieties of important crop species. The wild papaya (Carica papaya) is a key element of early successional tropical and sub-tropical forests in Mexico, and constitutes the genetic reservoir for evolutionary potential of the species. In this study we aimed to determine how diverse and structured the genetic variability of wild populations of C. papaya in Northern Mesoamerica is. Moreover, we assessed if genetic structure and evolutionary history coincide with hypothesized (1) pre-Pleistocene events (Isthmus of Tehuantepec sinking), (2) Pleistocene refugia or (3) recent patterns. We used six nuclear and two chloroplast (cp) DNA markers to assess the genetic diversity and phylogeographical structure of 19 wild populations of C. papaya in its natural distribution in Northern Mesoamerica. We found high genetic diversity (Ho = 0·681 for nuclear markers, and h = 0·701 for cpDNA markers) and gene flow between populations of C. papaya (migration r up to 420 km). A lack of phylogeographical structure was found with the cpDNA markers (NST < GST), whereas a recent population structure was inferred with the nuclear markers. Evidence indicates that pre-Pleistocene events or refugia did not play an important role in the genetic structuring of wild papaya. Because of its life history characteristics and lack of an ancient phylogeographical structure found with the cpDNA markers, we suggest that C. papaya was dispersed throughout the lowland rain forests of Mexico (along the coastal plains and foothills of Sierras). This scenario supports the hypothesis that tropical forests in Northern Mesoamerica did not experience important climate fluctuations during the Pleistocene, and that the life history of C. papaya could have promoted long-distance dispersal and rapid colonization of lowland rainforests. Moreover, the results obtained with the nuclear markers suggest recent human disturbances. The
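
    The two diversity statistics quoted in this abstract (Ho and h) are conventionally computed as sketched below. The abstract does not state which estimators were used, so this assumes the usual definitions: observed heterozygosity as the fraction of heterozygous individuals at a locus, and Nei's sample-size-corrected haplotype diversity. The function names and toy data are hypothetical.

```python
from collections import Counter


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles at one locus.

    `genotypes` is a list of (allele_a, allele_b) pairs, one per individual.
    """
    return sum(a != b for a, b in genotypes) / len(genotypes)


def haplotype_diversity(haplotypes):
    """Nei's haplotype diversity: n / (n - 1) * (1 - sum of squared frequencies).

    `haplotypes` is a list with one haplotype label per sampled individual.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled haplotypes")
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


# Toy data for illustration only (not taken from the study).
microsatellite_locus = [(120, 120), (120, 124), (124, 128), (128, 128)]
cp_haplotypes = ["H1", "H1", "H2", "H3"]

print(observed_heterozygosity(microsatellite_locus))
print(round(haplotype_diversity(cp_haplotypes), 4))
```

    A multilocus Ho such as the one reported would typically be the average of the per-locus values across all nuclear markers.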

  15. Antarctic krill population genomics: apparent panmixia, but genome complexity and large population size muddy the water.

    PubMed

    Deagle, Bruce E; Faux, Cassandra; Kawaguchi, So; Meyer, Bettina; Jarman, Simon N

    2015-10-01

    Antarctic krill (Euphausia superba; hereafter krill) are an incredibly abundant pelagic crustacean which has a wide, but patchy, distribution in the Southern Ocean. Several studies have examined the potential for population genetic structuring in krill, but DNA-based analyses have focused on a limited number of markers and have covered only part of their circum-Antarctic range. We used mitochondrial DNA and restriction site-associated DNA sequencing (RAD-seq) to investigate genetic differences between krill from five sites, including two from East Antarctica. Our mtDNA results show no discernible genetic structuring between sites separated by thousands of kilometres, which is consistent with previous studies. Using standard RAD-seq methodology, we obtained over a billion sequences from >140 krill, and thousands of variable nucleotides were identified at hundreds of loci. However, downstream analysis found that markers with sufficient coverage were primarily from multicopy genomic regions. Careful examination of these data highlights the complexity of the RAD-seq approach in organisms with very large genomes. To characterize the multicopy markers, we recorded sequence counts from variable nucleotide sites rather than the derived genotypes; we also examined a small number of manually curated genotypes. Although these analyses effectively fingerprinted individuals, and uncovered a minor laboratory batch effect, no population structuring was observed. Overall, our results are consistent with panmixia of krill throughout their distribution. This result may indicate ongoing gene flow. However, krill's enormous population size creates substantial panmictic inertia, so genetic differentiation may not occur on an ecologically relevant timescale even if demographically separate populations exist. © 2015 John Wiley & Sons Ltd.

  16. Times to extinction for small populations of large birds.

    PubMed

    Pimm, S L; Diamond, J; Reed, T M; Russell, G J; Verner, J

    1993-11-15

    A major practical problem in conservation biology is to predict the survival times ("lifetimes") for small populations under alternative proposed management regimes. Examples in the United States include the 'Alala (Hawaiian Crow; Corvus hawaiiensis) and Northern Spotted Owl (Strix occidentalis caurina). To guide such decisions, we analyze counts of all crow, owl, and hawk species in the most complete available data set: counts of bird breeding pairs on 14 European islands censused for 29-66 consecutive years. The data set yielded 129 records for analysis. We define the population ceiling as the highest number of breeding pairs observed from colonization to extinction, within a consecutive series of counts for a given species on a given island. The resulting distributions of population lifetimes as a function of population size prove to be highly skewed: most small populations disappear quickly, but a few last for a long time. Median (i.e., 50th percentile) lifetimes are calculated as only 1-5 yr for hawk, owl, and crow populations with ceilings of one or two breeding pairs. As expected if demographic accidents are the main cause of extinction for small populations, lifetimes rise by a factor of 3-4 for each additional pair up to three pairs. They rise more slowly thereafter. These observations suggest that lifetimes of the 'Alala (now reduced to about three pairs in the wild), and of populations of Northern Spotted Owl in the smallest forest fragments, will be short unless active management is implemented.

  17. Population Explosion in the Yellow-Spined Bamboo Locust Ceracris kiangsu and Inferences for the Impact of Human Activity

    PubMed Central

    Fan, Zhou; Jiang, Guo-Fang; Liu, Yu-Xiang; He, Qi-Xin; Blanchard, Benjamin

    2014-01-01

    Geographic distance and geographical barriers likely play a considerable role in structuring genetic variation in species, although some migratory species may have less phylogeographic structure on a smaller spatial scale. Here, genetic diversity and the phylogenetic structure among geographical populations of the yellow-spined bamboo locust, Ceracris kiangsu, were examined with 16S rDNA and amplified fragment length polymorphisms (AFLPs). In this study, no conspicuous phylogeographical structure was discovered from either Maximum parsimony (MP) or Neighbor-joining (NJ) phylogenetic analyses. The effect of geographical isolation was not conspicuous on a large spatial scale. At smaller spatial scales, local diversity of some populations within mountainous areas was detected using Nei's genetic distance and AMOVA. There is a high level of genetic diversity and a low genetic differentiation among populations in the C. kiangsu of South and Southeast China. Our analyses indicate that C. kiangsu is a monophyletic group. Our results also support the hypothesis that the C. kiangsu population is in a primary differentiation stage. Given the mismatch distribution, it is likely that a population expansion in C. kiangsu occurred about 0.242 Ma during the Quaternary interglaciation. Based on historical reports, we conjecture that human activities had significant impacts on the C. kiangsu gene flow. PMID:24603526

  18. Combining phylogenetic and demographic inferences to assess the origin of the genetic diversity in an isolated wolf population

    PubMed Central

    Fabbri, Elena; Ahmed, Atidje; Bolfíková, Barbora Černá; Czarnomska, Sylwia D.; Galov, Ana; Godinho, Raquel; Hindrikson, Maris; Hulva, Pavel; Jędrzejewska, Bogumiła; Jelenčič, Maja; Kutal, Miroslav; Saarma, Urmas; Skrbinšek, Tomaž; Randi, Ettore

    2017-01-01

    The survival of isolated small populations is threatened by both demographic and genetic factors. Large carnivores declined for centuries in most of Europe due to habitat changes, overhunting of their natural prey and direct persecution. However, the current rewilding trends are driving many carnivore populations to expand again, possibly reversing the erosion of their genetic diversity. In this study we reassessed the extent and origin of the genetic variation of the Italian wolf population, which is expanding after centuries of decline and isolation. We genotyped wolves from Italy and nine other populations at four mtDNA regions (control-region, ATP6, COIII and ND4) and 39 autosomal microsatellites. Results of phylogenetic analyses and assignment procedures confirmed in the Italian wolves a second private mtDNA haplotype, which belongs to a haplogroup distributed mostly in southern Europe. Coalescent analyses showed that the unique mtDNA haplotypes in the Italian wolves likely originated during the late Pleistocene. ABC simulations concordantly showed that the extant wolf populations in Italy and in south-western Europe started to be isolated and declined right after the last glacial maximum. Thus, the standing genetic variation in the Italian wolves principally results from the historical isolation south of the Alps. PMID:28489863

  19. Population history and gene dispersal inferred from spatial genetic structure of a Central African timber tree, Distemonanthus benthamianus (Caesalpinioideae)

    PubMed Central

    Debout, G D G; Doucet, J-L; Hardy, O J

    2011-01-01

    African rainforests have undergone major distribution range shifts during the Quaternary, but few studies have investigated their impact on the genetic diversity of plant species and we lack knowledge on the extent of gene flow to predict how plant species can cope with such environmental changes. Analysis of the spatial genetic structure (SGS) of a species is an effective method to determine major directions of the demographic history of its populations and to estimate the extent of gene dispersal. This study characterises the SGS of an African tropical timber tree species, Distemonanthus benthamianus, at various spatial scales in Cameroon and Gabon. Displaying a large continuous distribution in the Lower Guinea domain, this is a model species to detect signs of past population fragmentation and recolonization, and to estimate the extent of gene dispersal. Ten microsatellite loci were used to genotype 295 adult trees sampled from eight populations. Three clearly differentiated gene pools were resolved at this regional scale and could be linked to the biogeographical history of the region, rather than to physical barriers to gene flow. A comparison with the distribution of gene pools observed for two other tree species living in the same region invalidates the basic assumption that all species share the same Quaternary refuges and recolonization pathways. In four populations, significant and similar patterns of SGS were detected. Indirect estimates of gene dispersal distances (sigma) obtained for three populations ranged from 400 to 1200 m, whereas neighbourhood size estimates ranged from 50 to 110. PMID:20389306

  20. The goat domestication process inferred from large-scale mitochondrial DNA analysis of wild and domestic individuals.

    PubMed

    Naderi, Saeid; Rezaei, Hamid-Reza; Pompanon, François; Blum, Michael G B; Negrini, Riccardo; Naghash, Hamid-Reza; Balkiz, Ozge; Mashkour, Marjan; Gaggiotti, Oscar E; Ajmone-Marsan, Paolo; Kence, Aykut; Vigne, Jean-Denis; Taberlet, Pierre

    2008-11-18

    The emergence of farming during the Neolithic transition, including the domestication of livestock, was a critical point in the evolution of human kind. The goat (Capra hircus) was one of the first domesticated ungulates. In this study, we compared the genetic diversity of domestic goats to that of the modern representatives of their wild ancestor, the bezoar, by analyzing 473 samples collected over the whole distribution range of the latter species. This partly confirms and significantly clarifies the goat domestication scenario already proposed by archaeological evidence. All of the mitochondrial DNA haplogroups found in current domestic goats have also been found in the bezoar. The geographic distribution of these haplogroups in the wild ancestor allowed the localization of the main domestication centers. We found no haplotype that could have been domesticated in the eastern half of the Iranian Plateau, nor further to the east. A signature of population expansion in bezoars of the C haplogroup suggests an early domestication center on the Central Iranian Plateau (Yazd and Kerman Provinces) and in the Southern Zagros (Fars Province), possibly corresponding to the management of wild flocks. However, the contribution of this center to the current domestic goat population is rather low (1.4%). We also found a second domestication center covering a large area in Eastern Anatolia, and possibly in Northern and Central Zagros. This last domestication center is the likely origin of almost all domestic goats today. This finding is consistent with archaeological data identifying Eastern Anatolia as an important domestication center.

  1. Large Martian regolith water content implied by rampart crater population

    NASA Astrophysics Data System (ADS)

    Stewart, S. T.; Ahrens, T. J.; O'Keefe, J. D.

    2001-12-01

    We estimate the global regolith water content using a new model for rampart crater formation (Stewart et al., LPSC 2001). The Martian surface has a high fraction (probably significantly >20%) of craters with so-called fluidized ejecta blankets, characterized by the appearance of ground-hugging flow terminating in one or more continuous distal ramparts. While rampart craters have long held the promise of revealing information about the water content of the Martian regolith, the lack of a comprehensive physical model for the formation of fluidized ejecta blankets has hindered quantitative studies. We have developed a model for rampart crater formation based on ice shock data obtained at Martian temperatures and numerical simulations of impacts onto ice-rock mixtures under Martian conditions. We find that significant quantities of liquid water may be produced by an impact event and that the excavation process is modified by the presence of interstitial ice. As a result, single or multiple rampart ejecta blankets do not require the presence of pre-existing water in the liquid phase. A few to several volume percent of shock-produced liquid water may be incorporated into the continuous ejecta blanket for average impact conditions and reasonable regolith pore space assumptions, e.g. 15 vol% ice-filled near-surface pores. For a given diameter rampart crater, we calculate the associated minimum regolith ice content. Using the Viking-based rampart crater database by Barlow and Bradley (1990), the observed rampart crater population (~20% of all craters) implies a minimum regolith ice content of order 0.1 m global layer equivalent. The Mars Orbiter Laser Altimeter (MOLA) data suggest that a much larger fraction of craters, especially in the northern plains, may have rampart ejecta features. To derive the implied global regolith ice content, we correct for the impact flux rate over the past ~3 Ga using a number density for 1-10 km diameter craters, the peak rampart crater size

  2. Inferring the impact of linguistic boundaries on population differentiation: application to the Afro-Asiatic-Indo-European case.

    PubMed

    Dupanloup de Ceuninck, I; Schneider, S; Langaney, A; Excoffier, L

    2000-10-01

    We present here a quantitative way to assess the impact of language-family boundaries on population differentiation and to evaluate the homogeneity of the genetic processes along these boundaries. Our estimator (delta a) of the impact of the boundary is based on an isolation by distance (IBD) model and measures the added genetic distance between populations located on different sides of the boundary. We compare this statistic with another estimator of group differentiation (F(CT)) computed under an analysis of variance framework that does not assume any particular spatial structure of the populations. Monte Carlo simulations are used to study the behaviour of these statistics under a two-dimensional stepping-stone model. Simulations show that F(CT) can suggest the existence of a frontier when populations only differ because of IBD. This spurious behaviour is much less frequent for the delta a statistic. However, the large variance associated with the delta a statistic, and the fact that it should only be computed in the presence of IBD, may limit the use of this statistic. Overall, the origin and the effect of the boundary is best understood by comparing different statistics and by testing for the presence of IBD on each side of the boundary as well as across the boundary. We illustrate our approach by examining the boundary between Afro-Asiatic and Indo-European populations. These populations are globally genetically differentiated, but the effect of the linguistic boundary on gene flow seems geographically very heterogeneous. This boundary appears to be the result of a secondary contact between two differentiation centres rather than an enhancer of population differentiation.

  3. Inference on population history and model checking using DNA sequence and microsatellite data with the software DIYABC (v1.0).

    PubMed

    Cornuet, Jean-Marie; Ravigné, Virgine; Estoup, Arnaud

    2010-07-28

    Approximate Bayesian computation (ABC) is a recent flexible class of Monte-Carlo algorithms increasingly used to make model-based inference on complex evolutionary scenarios that have acted on natural populations. The software DIYABC offers a user-friendly interface allowing non-expert users to consider population histories involving any combination of population divergences, admixtures and population size changes. We here describe and illustrate new developments of this software that mainly include (i) inference from DNA sequence data in addition or separately to microsatellite data, (ii) the possibility to analyze five categories of loci considering balanced or non balanced sex ratios: autosomal diploid, autosomal haploid, X-linked, Y-linked and mitochondrial, and (iii) the possibility to perform model checking computation to assess the "goodness-of-fit" of a model, a feature of ABC analysis that has been so far neglected. We used controlled simulated data sets generated under evolutionary scenarios involving various divergence and admixture events to evaluate the effect of mixing autosomal microsatellite, mtDNA and/or nuclear autosomal DNA sequence data on inferences. This evaluation included the comparison of competing scenarios and the quantification of their relative support, and the estimation of parameter posterior distributions under a given scenario. We also considered a set of scenarios often compared when making ABC inferences on the routes of introduction of invasive species to illustrate the interest of the new model checking option of DIYABC to assess model misfit. Our new developments of the integrated software DIYABC should be particularly useful to make inference on complex evolutionary scenarios involving both recent and ancient historical events and using various types of molecular markers in diploid or haploid organisms. 
They offer a handy way for non-expert users to achieve model checking computation within an ABC framework, hence filling up a

  4. Inferring Population Genetic Structure in Widely and Continuously Distributed Carnivores: The Stone Marten (Martes foina) as a Case Study

    PubMed Central

    Vergara, María; Basto, Mafalda P.; Madeira, María José; Gómez-Moliner, Benjamín J.; Santos-Reis, Margarida; Fernandes, Carlos; Ruiz-González, Aritz

    2015-01-01

    The stone marten is a widely distributed mustelid in the Palaearctic region that exhibits variable habitat preferences in different parts of its range. The species is a Holocene immigrant from southwest Asia which, according to fossil remains, followed the expansion of the Neolithic farming cultures into Europe and possibly colonized the Iberian Peninsula during the Early Neolithic (ca. 7,000 years BP). However, the population genetic structure and historical biogeography of this generalist carnivore remains essentially unknown. In this study we have combined mitochondrial DNA (mtDNA) sequencing (621 bp) and microsatellite genotyping (23 polymorphic markers) to infer the population genetic structure of the stone marten within the Iberian Peninsula. The mtDNA data revealed low haplotype and nucleotide diversities and a lack of phylogeographic structure, most likely due to a recent colonization of the Iberian Peninsula by a few mtDNA lineages during the Early Neolithic. The microsatellite data set was analysed with a) spatial and non-spatial Bayesian individual-based clustering (IBC) approaches (STRUCTURE, TESS, BAPS and GENELAND), and b) multivariate methods [discriminant analysis of principal components (DAPC) and spatial principal component analysis (sPCA)]. Additionally, because isolation by distance (IBD) is a common spatial genetic pattern in mobile and continuously distributed species and it may represent a challenge to the performance of the above methods, the microsatellite data set was tested for its presence. Overall, the genetic structure of the stone marten in the Iberian Peninsula was characterized by a NE-SW spatial pattern of IBD, and this may explain the observed disagreement between clustering solutions obtained by the different IBC methods. However, there was significant indication for contemporary genetic structuring, albeit weak, into at least three different subpopulations. The detected subdivision could be attributed to the influence of the

  5. Inferring Population Genetic Structure in Widely and Continuously Distributed Carnivores: The Stone Marten (Martes foina) as a Case Study.

    PubMed

    Vergara, María; Basto, Mafalda P; Madeira, María José; Gómez-Moliner, Benjamín J; Santos-Reis, Margarida; Fernandes, Carlos; Ruiz-González, Aritz

    2015-01-01

    The stone marten is a widely distributed mustelid in the Palaearctic region that exhibits variable habitat preferences in different parts of its range. The species is a Holocene immigrant from southwest Asia which, according to fossil remains, followed the expansion of the Neolithic farming cultures into Europe and possibly colonized the Iberian Peninsula during the Early Neolithic (ca. 7,000 years BP). However, the population genetic structure and historical biogeography of this generalist carnivore remains essentially unknown. In this study we have combined mitochondrial DNA (mtDNA) sequencing (621 bp) and microsatellite genotyping (23 polymorphic markers) to infer the population genetic structure of the stone marten within the Iberian Peninsula. The mtDNA data revealed low haplotype and nucleotide diversities and a lack of phylogeographic structure, most likely due to a recent colonization of the Iberian Peninsula by a few mtDNA lineages during the Early Neolithic. The microsatellite data set was analysed with a) spatial and non-spatial Bayesian individual-based clustering (IBC) approaches (STRUCTURE, TESS, BAPS and GENELAND), and b) multivariate methods [discriminant analysis of principal components (DAPC) and spatial principal component analysis (sPCA)]. Additionally, because isolation by distance (IBD) is a common spatial genetic pattern in mobile and continuously distributed species and it may represent a challenge to the performance of the above methods, the microsatellite data set was tested for its presence. Overall, the genetic structure of the stone marten in the Iberian Peninsula was characterized by a NE-SW spatial pattern of IBD, and this may explain the observed disagreement between clustering solutions obtained by the different IBC methods. However, there was significant indication for contemporary genetic structuring, albeit weak, into at least three different subpopulations. The detected subdivision could be attributed to the influence of the

  6. Improving Large Area Population Mapping Using Geotweet Densities

    PubMed Central

    Stevens, Forrest R.; Huang, Zhuojie; Gaughan, Andrea E.; Elyazar, Iqbal; Tatem, Andrew J.

    2016-01-01

    Many different methods are used to disaggregate census data and predict population densities to construct finer scale, gridded population data sets. These methods often involve a range of high resolution geospatial covariate datasets on aspects such as urban areas, infrastructure, land cover and topography; such covariates, however, are not directly indicative of the presence of people. Here we tested the potential of geo‐located tweets from the social media application, Twitter, as a covariate in the production of population maps. The density of geo‐located tweets in 1x1 km grid cells over a 2‐month period across Indonesia, a country with one of the highest Twitter usage rates in the world, was input as a covariate into a previously published random forests‐based census disaggregation method. Comparison of internal measures of accuracy and external assessments between models built with and without the geotweets showed that increases in population mapping accuracy could be obtained using the geotweet densities as a covariate layer. The work highlights the potential for such social media‐derived data in improving our understanding of population distributions and offers promise for more dynamic mapping with such data being continually produced and freely available. PMID:28515661

  7. Population Genetic Structure of the Cotton Bollworm Helicoverpa armigera (Hübner) (Lepidoptera: Noctuidae) in India as Inferred from EPIC-PCR DNA Markers

    PubMed Central

    Behere, Gajanan Tryambak; Tay, Wee Tek; Russell, Derek Alan; Kranthi, Keshav Raj; Batterham, Philip

    2013-01-01

    Helicoverpa armigera is an important pest of cotton and other agricultural crops in the Old World. Its wide host range, high mobility and fecundity, and the ability to adapt and develop resistance against all common groups of insecticides used for its management have exacerbated its pest status. An understanding of the population genetic structure in H. armigera under Indian agricultural conditions will help ascertain gene flow patterns across different agricultural zones. This study inferred the population genetic structure of Indian H. armigera using five Exon-Primed Intron-Crossing (EPIC)-PCR markers. Nested alternative EPIC markers detected moderate null allele frequencies (4.3% to 9.4%) in loci used to infer population genetic structure but the apparently genome-wide heterozygote deficit suggests inbreeding or a Wahlund effect rather than a null allele effect. Population genetic analysis of the 26 populations suggested significant genetic differentiation within India but especially in cotton-feeding populations in the 2006–07 cropping season. In contrast, overall pair-wise FST estimates from populations feeding on food crops indicated no significant population substructure irrespective of cropping seasons. A Bayesian cluster analysis was used to assign the genetic make-up of individuals to likely membership of population clusters. Some evidence was found for four major clusters with individuals in two populations from cotton in one year (from two populations in northern India) showing especially high homogeneity. Taken as a whole, this study found evidence of population substructure at host crop, temporal and spatial levels in Indian H. armigera, without, however, a clear biological rationale for these structures being evident. PMID:23326431

  8. Population genetic structure of the cotton bollworm Helicoverpa armigera (Hübner) (Lepidoptera: Noctuidae) in India as inferred from EPIC-PCR DNA markers.

    PubMed

    Behere, Gajanan Tryambak; Tay, Wee Tek; Russell, Derek Alan; Kranthi, Keshav Raj; Batterham, Philip

    2013-01-01

    Helicoverpa armigera is an important pest of cotton and other agricultural crops in the Old World. Its wide host range, high mobility and fecundity, and the ability to adapt and develop resistance against all common groups of insecticides used for its management have exacerbated its pest status. An understanding of the population genetic structure in H. armigera under Indian agricultural conditions will help ascertain gene flow patterns across different agricultural zones. This study inferred the population genetic structure of Indian H. armigera using five Exon-Primed Intron-Crossing (EPIC)-PCR markers. Nested alternative EPIC markers detected moderate null allele frequencies (4.3% to 9.4%) in loci used to infer population genetic structure but the apparently genome-wide heterozygote deficit suggests inbreeding or a Wahlund effect rather than a null allele effect. Population genetic analysis of the 26 populations suggested significant genetic differentiation within India but especially in cotton-feeding populations in the 2006-07 cropping season. In contrast, overall pair-wise F(ST) estimates from populations feeding on food crops indicated no significant population substructure irrespective of cropping seasons. A Bayesian cluster analysis was used to assign the genetic make-up of individuals to likely membership of population clusters. Some evidence was found for four major clusters with individuals in two populations from cotton in one year (from two populations in northern India) showing especially high homogeneity. Taken as a whole, this study found evidence of population substructure at host crop, temporal and spatial levels in Indian H. armigera, without, however, a clear biological rationale for these structures being evident.

  9. Migration behaviour of silver eels (Anguilla anguilla) in a large estuary of Western Europe inferred from acoustic telemetry

    NASA Astrophysics Data System (ADS)

    Bultel, Elise; Lasne, Emilien; Acou, Anthony; Guillaudeau, Julien; Bertier, Christine; Feunteun, Eric

    2014-01-01

    Despite intensive research on eels, the behaviour of silver eels in estuaries during their migration remains poorly documented, which creates serious gaps in planning the restoration of the European eel population. Estuaries are complex environments that can be exposed to large human pressures which could impede or delay migration, or impact fish reproductive potential. This study investigated the estuarine migration of female silver eels in the Loire River using an acoustic telemetry system. An array of 31 hydrophones was deployed in the Loire estuary and 51 female seaward migrants were tagged with acoustic transmitters and released 20 km upstream of the estuary, at 100 km from the river mouth. 94% of the silver eels could be followed down to the river mouth. Mean global estuarine speed was 4.5 km day⁻¹, i.e., 0.05 m s⁻¹, and residence times varied significantly between upstream and lower compartments. Mean directional migration speed was found to be 48.6 km day⁻¹, i.e., 0.56 m s⁻¹, and appeared correlated with total length and body weight. Also, daily escapement rate was highly influenced by river flow.
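
    The two speed conversions quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration added to this listing, not part of the study; the function name is hypothetical.

```python
# Convert the speeds reported in the abstract from km per day to m per second.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s


def km_per_day_to_m_per_s(speed_km_per_day: float) -> float:
    """Return the speed in metres per second (hypothetical helper)."""
    return speed_km_per_day * 1000.0 / SECONDS_PER_DAY


if __name__ == "__main__":
    reported = [
        ("mean global estuarine speed", 4.5),
        ("mean directional migration speed", 48.6),
    ]
    for label, km_day in reported:
        print(f"{label}: {km_day} km/day = {km_per_day_to_m_per_s(km_day):.2f} m/s")
```

    Both values round to the figures given in the abstract (0.05 and 0.56 m/s).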

  10. Inferences about population dynamics from count data using multistate models: a comparison to capture–recapture approaches

    PubMed Central

    Zipkin, Elise F; Sillett, T Scott; Grant, Evan H Campbell; Chandler, Richard B; Royle, J Andrew

    2014-01-01

    intensive data collection efforts (such as capture–recapture). Integrated population models that combine data from both intensive and extensive sources are likely to be the most efficient approach for estimating demographic rates at large spatial and temporal scales. PMID:24634726

  11. Inferences about population dynamics from count data using multi-state models: A comparison to capture-recapture approaches

    USGS Publications Warehouse

    Grant, Evan H. Campbell; Zipkin, Elise; Sillett, T. Scott; Chandler, Richard; Royle, J. Andrew

    2014-01-01

    data collection efforts (such as capture–recapture). Integrated population models that combine data from both intensive and extensive sources are likely to be the most efficient approach for estimating demographic rates at large spatial and temporal scales.

  12. Distinct human stem cell populations in small and large intestine.

    PubMed

    Cramer, Julie M; Thompson, Timothy; Geskin, Albert; LaFramboise, William; Lagasse, Eric

    2015-01-01

    The intestine is composed of an epithelial layer containing rapidly proliferating cells that mature into two regions, the small and the large intestine. Although previous studies have identified stem cells as the cell-of-origin for intestinal epithelial cells, no studies have directly compared stem cells derived from these anatomically distinct regions. Here, we examine intrinsic differences between primary epithelial cells isolated from human fetal small and large intestine, after in vitro expansion, using the Wnt agonist R-spondin 2. We utilized flow cytometry, fluorescence-activated cell sorting, gene expression analysis and a three-dimensional in vitro differentiation assay to characterize their stem cell properties. We identified stem cell markers that separate subpopulations of colony-forming cells in the small and large intestine and revealed important differences in differentiation, proliferation and disease pathways using gene expression analysis. Single cells from small and large intestine cultures formed organoids that reflect the distinct cellular hierarchy found in vivo and respond differently to identical exogenous cues. Our characterization identified numerous differences between small and large intestine epithelial stem cells suggesting possible connections to intestinal disease.

  13. Estimating population effects of vaccination using large, routinely collected data.

    PubMed

    Halloran, M Elizabeth; Hudgens, Michael G

    2017-07-19

    Vaccination in populations can have several kinds of effects. Establishing that vaccination produces population-level effects beyond the direct effects in the vaccinated individuals can have important consequences for public health policy. Formal methods have been developed for study designs and analysis that can estimate the different effects of vaccination. However, implementing field studies to evaluate the different effects of vaccination can be expensive, of limited generalizability, or unethical. It would be advantageous to use routinely collected data to estimate the different effects of vaccination. We consider how different types of data are needed to estimate different effects of vaccination. The examples include rotavirus vaccination of young children, influenza vaccination of elderly adults, and a targeted influenza vaccination campaign in schools. Directions for future research are discussed. Copyright © 2017 John Wiley & Sons, Ltd.

  14. Echo Behavior in Large Populations of Chemical Oscillators

    NASA Astrophysics Data System (ADS)

    Chen, Tianran; Tinsley, Mark R.; Ott, Edward; Showalter, Kenneth

    2016-10-01

    Experimental and theoretical studies are reported, for the first time, on the observation and characterization of echo phenomena in oscillatory chemical reactions. Populations of uncoupled and coupled oscillators are globally perturbed. The macroscopic response to this perturbation dies out with time: At some time τ after the perturbation (where τ is long enough that the response has died out), the system is again perturbed, and the initial response to this second perturbation again dies out. Echoes can potentially appear as responses that arise at 2τ, 3τ, ... after the first perturbation. The phase-resetting character of the chemical oscillators allows a detailed analysis, offering insights into the origin of the echo in terms of an intricate structure of phase relationships. Groups of oscillators experiencing different perturbations are analyzed with a geometric approach and in an analytical theory. The characterization of echo phenomena in populations of chemical oscillators reinforces recent theoretical studies of the behavior in populations of phase oscillators [E. Ott et al., Chaos 18, 037115 (2008)]. This indicates the generality of the behavior, including its likely occurrence in biological systems.

  15. A generative inference framework for analysing patterns of cultural change in sparse population data with evidence for fashion trends in LBK culture.

    PubMed

    Kandler, Anne; Shennan, Stephen

    2015-12-06

    Cultural change can be quantified by temporal changes in frequency of different cultural artefacts and it is a central question to identify what underlying cultural transmission processes could have caused the observed frequency changes. Observed changes, however, often describe the dynamics in samples of the population of artefacts, whereas transmission processes act on the whole population. Here we develop a modelling framework aimed at addressing this inference problem. To do so, we firstly generate population structures from which the observed sample could have been drawn randomly and then determine theoretical samples at a later time t2 produced under the assumption that changes in frequencies are caused by a specific transmission process. Thereby we also account for the potential effect of time-averaging processes in the generation of the observed sample. Subsequent statistical comparisons (e.g. using Bayesian inference) of the theoretical and observed samples at t2 can establish which processes could have produced the observed frequency data. In this way, we infer underlying transmission processes directly from available data without any equilibrium assumption. We apply this framework to a dataset describing pottery from settlements of some of the first farmers in Europe (the LBK culture) and conclude that the observed frequency dynamic of different types of decorated pottery is consistent with age-dependent selection, a preference for 'young' pottery types which is potentially indicative of fashion trends.
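
    The workflow outlined in this abstract (generate candidate populations from the first sample, evolve them under an assumed transmission process, then compare theoretical and observed samples at t2) can be sketched roughly as below. This is a simplified illustration, not the authors' implementation: the neutral (unbiased) copying process, the add-one weighting used to build candidate populations, the distance measure, the tolerance, and all function names are assumptions made here, and time-averaging is not modelled.

```python
import random
from collections import Counter


def draw_population(sample_t1, pop_size, rng):
    """Generate one candidate population the t1 sample could have come from.

    Assumption: type weights are the sample counts plus one.
    """
    types = list(sample_t1)
    weights = [sample_t1[t] + 1 for t in types]
    return rng.choices(types, weights=weights, k=pop_size)


def neutral_copying(population, generations, rng):
    """Assumed transmission process: each new artefact copies a random old one."""
    for _ in range(generations):
        population = rng.choices(population, k=len(population))
    return population


def frequency_distance(sample_a, sample_b):
    """Sum of absolute differences between relative type frequencies."""
    types = set(sample_a) | set(sample_b)
    n_a, n_b = sum(sample_a.values()), sum(sample_b.values())
    return sum(
        abs(sample_a.get(t, 0) / n_a - sample_b.get(t, 0) / n_b) for t in types
    )


def acceptance_rate(sample_t1, sample_t2, pop_size=500, generations=5,
                    n_sims=200, tolerance=0.3, seed=0):
    """Fraction of simulated t2 samples that lie close to the observed one."""
    rng = random.Random(seed)
    n2 = sum(sample_t2.values())
    accepted = 0
    for _ in range(n_sims):
        population = draw_population(sample_t1, pop_size, rng)
        population = neutral_copying(population, generations, rng)
        theoretical = Counter(rng.sample(population, n2))
        if frequency_distance(theoretical, sample_t2) <= tolerance:
            accepted += 1
    return accepted / n_sims


if __name__ == "__main__":
    t1 = {"A": 12, "B": 6, "C": 2}   # made-up counts, for illustration only
    t2 = {"A": 11, "B": 7, "C": 2}
    print(acceptance_rate(t1, t2))
```

    A low acceptance rate under one process relative to another would count against that process; a full analysis would replace the fixed tolerance with a formal statistical comparison.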

  16. A generative inference framework for analysing patterns of cultural change in sparse population data with evidence for fashion trends in LBK culture

    PubMed Central

    Kandler, Anne; Shennan, Stephen

    2015-01-01

    Cultural change can be quantified by temporal changes in frequency of different cultural artefacts and it is a central question to identify what underlying cultural transmission processes could have caused the observed frequency changes. Observed changes, however, often describe the dynamics in samples of the population of artefacts, whereas transmission processes act on the whole population. Here we develop a modelling framework aimed at addressing this inference problem. To do so, we firstly generate population structures from which the observed sample could have been drawn randomly and then determine theoretical samples at a later time t2 produced under the assumption that changes in frequencies are caused by a specific transmission process. Thereby we also account for the potential effect of time-averaging processes in the generation of the observed sample. Subsequent statistical comparisons (e.g. using Bayesian inference) of the theoretical and observed samples at t2 can establish which processes could have produced the observed frequency data. In this way, we infer underlying transmission processes directly from available data without any equilibrium assumption. We apply this framework to a dataset describing pottery from settlements of some of the first farmers in Europe (the LBK culture) and conclude that the observed frequency dynamic of different types of decorated pottery is consistent with age-dependent selection, a preference for ‘young’ pottery types which is potentially indicative of fashion trends. PMID:26674195

  17. Assessing Health Issues in States with Large Minority Populations.

    PubMed

    Long, Michelle; Menifield, Charles E; Fletcher, Audwin B

    2015-09-01

    Health care spending is often addressed in discussions of budgeting and deficits in the United States. It is important to many Americans that funds allocated for health care spending be allocated and spent in the most efficient and effective manner, leading to improved health outcomes, particularly for underserved populations. Many studies address health care spending, but few address the issue of spending as it relates to societal well-being, or certain health outcomes that adversely impact communities. This study seeks to expand the available literature by analyzing data from national sources at the state level.

  18. Quantifying prion disease penetrance using large population control cohorts.

    PubMed

    Minikel, Eric Vallabh; Vallabh, Sonia M; Lek, Monkol; Estrada, Karol; Samocha, Kaitlin E; Sathirapongsasuti, J Fah; McLean, Cory Y; Tung, Joyce Y; Yu, Linda P C; Gambetti, Pierluigi; Blevins, Janis; Zhang, Shulin; Cohen, Yvonne; Chen, Wei; Yamada, Masahito; Hamaguchi, Tsuyoshi; Sanjo, Nobuo; Mizusawa, Hidehiro; Nakamura, Yosikazu; Kitamoto, Tetsuyuki; Collins, Steven J; Boyd, Alison; Will, Robert G; Knight, Richard; Ponto, Claudia; Zerr, Inga; Kraus, Theo F J; Eigenbrod, Sabina; Giese, Armin; Calero, Miguel; de Pedro-Cuesta, Jesús; Haïk, Stéphane; Laplanche, Jean-Louis; Bouaziz-Amar, Elodie; Brandel, Jean-Philippe; Capellari, Sabina; Parchi, Piero; Poleggi, Anna; Ladogana, Anna; O'Donnell-Luria, Anne H; Karczewski, Konrad J; Marshall, Jamie L; Boehnke, Michael; Laakso, Markku; Mohlke, Karen L; Kähler, Anna; Chambert, Kimberly; McCarroll, Steven; Sullivan, Patrick F; Hultman, Christina M; Purcell, Shaun M; Sklar, Pamela; van der Lee, Sven J; Rozemuller, Annemieke; Jansen, Casper; Hofman, Albert; Kraaij, Robert; van Rooij, Jeroen G J; Ikram, M Arfan; Uitterlinden, André G; van Duijn, Cornelia M; Daly, Mark J; MacArthur, Daniel G

    2016-01-20

    More than 100,000 genetic variants are reported to cause Mendelian disease in humans, but the penetrance (the probability that a carrier of the purported disease-causing genotype will indeed develop the disease) is generally unknown. We assess the impact of variants in the prion protein gene (PRNP) on the risk of prion disease by analyzing 16,025 prion disease cases, 60,706 population control exomes, and 531,575 individuals genotyped by 23andMe Inc. We show that missense variants in PRNP previously reported to be pathogenic are at least 30 times more common in the population than expected on the basis of genetic prion disease prevalence. Although some of this excess can be attributed to benign variants falsely assigned as pathogenic, other variants have genuine effects on disease susceptibility but confer lifetime risks ranging from <0.1 to ~100%. We also show that truncating variants in PRNP have position-dependent effects, with true loss-of-function alleles found in healthy older individuals, a finding that supports the safety of therapeutic suppression of prion protein expression.

  19. Populations and determinants of airborne fungi in large office buildings.

    PubMed Central

    Chao, H Jasmine; Schwartz, Joel; Milton, Donald K; Burge, Harriet A

    2002-01-01

    Bioaerosol concentrations in office environments and their roles in causing building-related symptoms have drawn much attention in recent years. Most bioaerosol studies have been cross-sectional. We conducted a longitudinal study to examine the characteristics of airborne fungal populations and correlations with other environmental parameters in office environments. We investigated four office buildings in Boston, Massachusetts, during 1 year beginning May 1997, recruiting 21 offices with open workstations. We conducted intensive bioaerosol sampling every 6 weeks resulting in 10 sets of measurement events at each workstation, and recorded relative humidity, temperature, and CO2 concentrations continuously. We used principal component analysis (PCA) to identify groups of culturable fungal taxa that covaried in air. Four major groupings (PCA factors) were derived where the fungal taxa in the same groupings shared similar ecological requirements. Total airborne fungal concentrations varied significantly by season (highest in summer, lowest in winter) and were positively correlated with relative humidity and negatively related to CO2 concentrations. The first and second PCA factors had similar correlations with environmental variables compared with total fungi. The results of this study provide essential information on the variability within airborne fungal populations in office environments over time. These data also provide background against which cross-sectional data can be compared to facilitate interpretation. More studies are needed to correlate airborne fungi and occupants' health, controlling for seasonal effects and other important environmental factors. PMID:12153758

  20. Detecting differential protein expression in large-scale population proteomics

    SciTech Connect

    Ryu, Soyoung; Qian, Weijun; Camp, David G.; Smith, Richard D.; Tompkins, Ronald G.; Davis, Ronald W.; Xiao, Wenzhong

    2014-06-17

    Mass spectrometry-based high-throughput quantitative proteomics shows great potential in clinical biomarker studies, identifying and quantifying thousands of proteins in biological samples. However, methods are needed to appropriately handle issues/challenges unique to mass spectrometry data in order to detect as many biomarker proteins as possible. One issue is that different mass spectrometry experiments generate quite different total numbers of quantified peptides, which can result in more missing peptide abundances in an experiment with a smaller total number of quantified peptides. Another issue is that the quantification of peptides is sometimes absent, especially for less abundant peptides and such missing values contain the information about the peptide abundance. Here, we propose a Significance Analysis for Large-scale Proteomics Studies (SALPS) that handles missing peptide intensity values caused by the two mechanisms mentioned above. Our model has a robust performance in both simulated data and proteomics data from a large clinical study. Because varying patients’ sample qualities and deviating instrument performances are not avoidable for clinical studies performed over the course of several years, we believe that our approach will be useful to analyze large-scale clinical proteomics data.

  1. Collective Response of Human Populations to Large-Scale Emergencies

    PubMed Central

    Barabási, Albert-László

    2011-01-01

    Despite recent advances in uncovering the quantitative features of stationary human activity patterns, many applications, from pandemic prediction to emergency response, require an understanding of how these patterns change when the population encounters unfamiliar conditions. To explore societal response to external perturbations we identified real-time changes in communication and mobility patterns in the vicinity of eight emergencies, such as bomb attacks and earthquakes, comparing these with eight non-emergencies, like concerts and sporting events. We find that communication spikes accompanying emergencies are both spatially and temporally localized, but information about emergencies spreads globally, resulting in communication avalanches that engage in a significant manner the social network of eyewitnesses. These results offer a quantitative view of behavioral changes in human activity under extreme conditions, with potential long-term impact on emergency detection and response. PMID:21479206

  2. Collective response of human populations to large-scale emergencies.

    PubMed

    Bagrow, James P; Wang, Dashun; Barabási, Albert-László

    2011-03-30

    Despite recent advances in uncovering the quantitative features of stationary human activity patterns, many applications, from pandemic prediction to emergency response, require an understanding of how these patterns change when the population encounters unfamiliar conditions. To explore societal response to external perturbations we identified real-time changes in communication and mobility patterns in the vicinity of eight emergencies, such as bomb attacks and earthquakes, comparing these with eight non-emergencies, like concerts and sporting events. We find that communication spikes accompanying emergencies are both spatially and temporally localized, but information about emergencies spreads globally, resulting in communication avalanches that engage in a significant manner the social network of eyewitnesses. These results offer a quantitative view of behavioral changes in human activity under extreme conditions, with potential long-term impact on emergency detection and response.

  3. Medullary carcinoma of the large intestine: a population based analysis.

    PubMed

    Thirunavukarasu, Pragatheeshwar; Sathaiah, Magesh; Singla, Smit; Sukumar, Shyam; Karunamurthy, Arivarasan; Pragatheeshwar, Kothai Divya; Lee, Kenneth K W; Zeh, Herbert; Kane, Kevin M; Bartlett, David L

    2010-10-01

    Medullary carcinoma (MC) of the colorectum is a relatively new histological type of adenocarcinoma characterized by poor glandular differentiation and intraepithelial lymphocytic infiltrate. To date, there has been no epidemiological study of this rare tumor type, which has now been incorporated as a separate entity in the World Health Organization (WHO) classification of colorectal cancers. We used the population-based registries of the Surveillance, Epidemiology and End Results (SEER) database to identify all cases of colorectal MC between 1973 and 2006 and compared them to poorly and undifferentiated colonic adenocarcinomas (PDA and UDA, respectively). We observed that MCs were rare tumors, constituting approximately 5-8 cases for every 10,000 colon cancers diagnosed, with a mean annual incidence of 3.47 (+/-0.75) per 10 million population. Mean age at diagnosis was 69.3 (+/-12.5) years, with incidence increasing with age. MCs were twice as common in females, who presented at a later age, with a lower stage and a trend towards favorable prognosis. MCs were extremely rare among African-Americans. MCs were most common in the proximal colon (74%), where they present at a later age than the sigmoid colon. There were no cases reliably identified in the rectum or appendix. Serum carcinoembryonic antigen levels (CEA) were elevated prior to first course of treatment in 40% of the patients. MCs were more commonly poorly differentiated (72%), with 22% being undifferentiated. MCs commonly presented with Stage II disease, with 10% presenting with metastases. Only one patient presented with N2b disease (>7 positive nodes). Early outcome analyses showed that MCs have 1- and 2-year relative survival rates of 92.7 and 73.8% respectively. Although MCs showed a trend towards better early overall survival, undifferentiated MCs present more commonly with Stage III, with comparatively worse early outcomes.

  4. Phylogeography and population structure of the common warthog (Phacochoerus africanus) inferred from variation in mitochondrial DNA sequences and microsatellite loci.

    PubMed

    Muwanika, V B; Nyakaana, S; Siegismund, H R; Arctander, P

    2003-10-01

    Global climate fluctuated considerably throughout the Pliocene and Pleistocene, influencing the evolutionary history of a wide range of species. Using both mitochondrial sequences and microsatellites, we have investigated the evolutionary consequences of such environmental fluctuation for the patterns of genetic variation in the common warthog, sampled from 24 localities in Africa. In the sample of 181 individuals, 70 mitochondrial DNA haplotypes were identified and an overall nucleotide diversity of 4.0% was observed. The haplotypes cluster in three well-differentiated clades (estimated net sequence divergence of 3.1-6.6%) corresponding to the geographical origins of individuals (i.e. eastern, western and southern African clades). At the microsatellite loci, high polymorphism was observed both in the number of alleles per locus (6-21), and in the gene diversity (in each population 0.59-0.80). Analysis of population differentiation indicates greater subdivision at the mitochondrial loci (FST=0.85) than at nuclear loci (FST=0.20), but both mitochondrial and nuclear loci support the existence of the three warthog lineages. We interpret our results in terms of the large-scale climatic fluctuations of the Pleistocene.
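
    The FST values quoted above measure how much of the total genetic diversity lies between populations. As a worked illustration only, the sketch below uses the textbook heterozygosity-based definition FST = (HT - HS) / HT with equally weighted subpopulations; the abstract does not state which estimator the authors used, and the allele frequencies are hypothetical.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity (gene diversity): 1 minus the sum of squared frequencies."""
    return 1.0 - sum(f * f for f in freqs)


def fst(subpop_freqs):
    """FST = (HT - HS) / HT for equally weighted subpopulations.

    subpop_freqs holds one allele-frequency list per subpopulation,
    with all lists covering the same alleles in the same order.
    """
    k = len(subpop_freqs)
    n_alleles = len(subpop_freqs[0])
    # Allele frequencies in the pooled (total) population.
    mean_freqs = [sum(pop[a] for pop in subpop_freqs) / k for a in range(n_alleles)]
    h_t = expected_heterozygosity(mean_freqs)
    # Average within-subpopulation heterozygosity.
    h_s = sum(expected_heterozygosity(pop) for pop in subpop_freqs) / k
    if h_t == 0:
        return 0.0  # no variation at all, so no differentiation to measure
    return (h_t - h_s) / h_t


# Hypothetical biallelic locus in two strongly differentiated populations.
strong = fst([[0.9, 0.1], [0.1, 0.9]])
# The same locus with identical frequencies everywhere.
none = fst([[0.5, 0.5], [0.5, 0.5]])
```

    Values near 1 indicate that populations share few alleles, and values near 0 indicate that they are genetically similar.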

  5. Large-scale inference of protein tissue origin in gram-positive sepsis plasma using quantitative targeted proteomics

    PubMed Central

    Malmström, Erik; Kilsgård, Ola; Hauri, Simon; Smeds, Emanuel; Herwald, Heiko; Malmström, Lars; Malmström, Johan

    2016-01-01

    The plasma proteome is highly dynamic and variable, composed of proteins derived from surrounding tissues and cells. To investigate the complex processes that control the composition of the plasma proteome, we developed a mass spectrometry-based proteomics strategy to infer the origin of proteins detected in murine plasma. The strategy relies on the construction of a comprehensive protein tissue atlas from cells and highly vascularized organs using shotgun mass spectrometry. The protein tissue atlas was transformed to a spectral library for highly reproducible quantification of tissue-specific proteins directly in plasma using SWATH-like data-independent mass spectrometry analysis. We show that the method can determine drastic changes of tissue-specific protein profiles in blood plasma from mouse animal models with sepsis. The strategy can be extended to several other species advancing our understanding of the complex processes that contribute to the plasma proteome dynamics. PMID:26732734

  6. Large-scale inference of protein tissue origin in gram-positive sepsis plasma using quantitative targeted proteomics.

    PubMed

    Malmström, Erik; Kilsgård, Ola; Hauri, Simon; Smeds, Emanuel; Herwald, Heiko; Malmström, Lars; Malmström, Johan

    2016-01-06

    The plasma proteome is highly dynamic and variable, composed of proteins derived from surrounding tissues and cells. To investigate the complex processes that control the composition of the plasma proteome, we developed a mass spectrometry-based proteomics strategy to infer the origin of proteins detected in murine plasma. The strategy relies on the construction of a comprehensive protein tissue atlas from cells and highly vascularized organs using shotgun mass spectrometry. The protein tissue atlas was transformed to a spectral library for highly reproducible quantification of tissue-specific proteins directly in plasma using SWATH-like data-independent mass spectrometry analysis. We show that the method can determine drastic changes of tissue-specific protein profiles in blood plasma from mouse animal models with sepsis. The strategy can be extended to several other species advancing our understanding of the complex processes that contribute to the plasma proteome dynamics.

  7. Dynamics of airborne fungal populations in a large office building

    NASA Technical Reports Server (NTRS)

    Burge, H. A.; Pierson, D. L.; Groves, T. O.; Strawn, K. F.; Mishra, S. K.

    2000-01-01

    The increasing concern with bioaerosols in large office buildings prompted this prospective study of airborne fungal concentrations in a newly constructed building on the Gulf coast. We collected volumetric culture plate air samples on 14 occasions over the 18-month period immediately following building occupancy. On each sampling occasion, we collected duplicate samples from three sites on three floors of this six-story building, and an outdoor sample. Fungal concentrations indoors were consistently below those outdoors, and no sample clearly indicated fungal contamination in the building, although visible growth appeared in the ventilation system during the course of the study. We conclude that modern mechanically ventilated buildings prevent the intrusion of most of the outdoor fungal aerosol, and that even relatively extensive air sampling protocols may not sufficiently document the microbial status of buildings.

  8. Dynamics of airborne fungal populations in a large office building

    NASA Technical Reports Server (NTRS)

    Burge, H. A.; Pierson, D. L.; Groves, T. O.; Strawn, K. F.; Mishra, S. K.

    2000-01-01

    The increasing concern with bioaerosols in large office buildings prompted this prospective study of airborne fungal concentrations in a newly constructed building on the Gulf coast. We collected volumetric culture plate air samples on 14 occasions over the 18-month period immediately following building occupancy. On each sampling occasion, we collected duplicate samples from three sites on three floors of this six-story building, and an outdoor sample. Fungal concentrations indoors were consistently below those outdoors, and no sample clearly indicated fungal contamination in the building, although visible growth appeared in the ventilation system during the course of the study. We conclude that modern mechanically ventilated buildings prevent the intrusion of most of the outdoor fungal aerosol, and that even relatively extensive air sampling protocols may not sufficiently document the microbial status of buildings.

  9. Hydrodynamic stretching of single cells for large population mechanical phenotyping

    PubMed Central

    Gossett, Daniel R.; Tse, Henry T. K.; Lee, Serena A.; Ying, Yong; Lindgren, Anne G.; Yang, Otto O.; Rao, Jianyu; Clark, Amander T.; Di Carlo, Dino

    2012-01-01

    Cell state is often assayed through measurement of biochemical and biophysical markers. Although biochemical markers have been widely used, intrinsic biophysical markers, such as the ability to mechanically deform under a load, are advantageous in that they do not require costly labeling or sample preparation. However, current techniques that assay cell mechanical properties have had limited adoption in clinical and cell biology research applications. Here, we demonstrate an automated microfluidic technology capable of probing single-cell deformability at approximately 2,000 cells/s. The method uses inertial focusing to uniformly deliver cells to a stretching extensional flow where cells are deformed at high strain rates, imaged with a high-speed camera, and computationally analyzed to extract quantitative parameters. This approach allows us to analyze cells at throughputs orders of magnitude faster than previously reported biophysical flow cytometers and single-cell mechanics tools, while creating easily observable larger strains and limiting user time commitment and bias through automation. Using this approach we rapidly assay the deformability of native populations of leukocytes and malignant cells in pleural effusions and accurately predict disease state in patients with cancer and immune activation with a sensitivity of 91% and a specificity of 86%. As a tool for biological research, we show the deformability we measure is an early biomarker for pluripotent stem cell differentiation and is likely linked to nuclear structural changes. Microfluidic deformability cytometry brings the statistical accuracy of traditional flow cytometric techniques to label-free biophysical biomarkers, enabling applications in clinical diagnostics, stem cell characterization, and single-cell biophysics. PMID:22547795

  10. Different Evolutionary Paths to Complexity for Small and Large Populations of Digital Organisms.

    PubMed

    LaBar, Thomas; Adami, Christoph

    2016-12-01

    A major aim of evolutionary biology is to explain the respective roles of adaptive versus non-adaptive changes in the evolution of complexity. While selection is certainly responsible for the spread and maintenance of complex phenotypes, this does not automatically imply that strong selection enhances the chance for the emergence of novel traits, that is, the origination of complexity. Population size is one parameter that alters the relative importance of adaptive and non-adaptive processes: as population size decreases, selection weakens and genetic drift grows in importance. Because of this relationship, many theories invoke a role for population size in the evolution of complexity. Such theories are difficult to test empirically because of the time required for the evolution of complexity in biological populations. Here, we used digital experimental evolution to test whether large or small asexual populations tend to evolve greater complexity. We find that both small and large (but not intermediate-sized) populations are favored to evolve larger genomes, which provides the opportunity for subsequent increases in phenotypic complexity. However, small and large populations followed different evolutionary paths towards these novel traits. Small populations evolved larger genomes by fixing slightly deleterious insertions, while large populations fixed rare beneficial insertions that increased genome size. These results demonstrate that genetic drift can lead to the evolution of complexity in small populations and that purifying selection is not powerful enough to prevent the evolution of complexity in large populations.
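
    The dependence of drift and selection on population size can be made concrete with Kimura's diffusion approximation for the fixation probability of a single new mutant in a haploid Wright-Fisher population, u = (1 - exp(-2s)) / (1 - exp(-2Ns)). The sketch below is an illustration under that standard model, not the digital-evolution system used in the study; the population sizes and selection coefficients are hypothetical.

```python
import math


def fixation_probability(n, s):
    """Kimura's approximation for one new mutant in a haploid population.

    n is the population size and s the selection coefficient
    (negative for deleterious, positive for beneficial mutations).
    """
    if s == 0:
        return 1.0 / n  # neutral limit: pure drift
    # expm1 keeps precision when s is close to zero.
    return math.expm1(-2.0 * s) / math.expm1(-2.0 * n * s)


small, large = 10, 10_000          # hypothetical population sizes
deleterious, beneficial = -0.001, 0.01

# Fixation probability relative to a neutral mutation (n * u).
relative = {
    (n, s): n * fixation_probability(n, s)
    for n in (small, large)
    for s in (deleterious, beneficial)
}
```

    Under these assumptions a slightly deleterious mutation fixes almost as often as a neutral one in the small population but is effectively never fixed in the large one, while a beneficial mutation gains a much larger advantage over neutrality in the large population.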

  11. Haemangiomas and Associated Congenital Malformations in a Large Population-Based Sample of Infants

    DTIC Science & Technology

    2008-01-01

    Naval Health Research Center Haemangiomas and Associated Congenital Malformations in A Large Population-based Sample of Infants A. T...unlimited. Naval Health Research Center 140 Sylvester Road San Diego, California 92106 Haemangiomas and associated congenital malformations in a...Alcaraz JE, Smith TC. Haemangio- mas and associated congenital malformations in a large population-based sample of infants. Paediatric and Perinatal

  12. The effect of simple imputation on inferences about population means when data are missing in biomedical research due to detection limits

    PubMed Central

    WANG, Hongyue; CHEN, Guanqing; LU, Xiang; ZHANG, Hui; FENG, Changyong

    2015-01-01

    Summary The sample geometric mean has been widely used in biomedical and psychosocial research to estimate and compare population geometric means. However, due to the detection limit of measurement instruments, the actual value of the measurement is not always observable. A common practice to deal with this problem is to replace missing values by small positive constants and make inferences based on the imputed data. However, no work has been carried out to study the effect of this naïve imputation method on inference. In this report, we show that this simple imputation method may dramatically change the reported outcomes of a study and, thus, make the results uninterpretable, even if the detection limit is very small. PMID:26977131
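
    The sensitivity described above is easy to reproduce numerically. The sketch below is a minimal illustration on simulated data, not the authors' analysis: the log-normal distribution, the detection limit and the substituted constants are all hypothetical choices.

```python
import math
import random


def geometric_mean(values):
    """Geometric mean computed as exp of the mean log; values must be positive."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def impute_below_limit(values, limit, constant):
    """Replace every value below the detection limit with a fixed constant."""
    return [v if v >= limit else constant for v in values]


rng = random.Random(0)
# Hypothetical measurements: log-normal with median 1.0.
true_values = [rng.lognormvariate(0.0, 1.0) for _ in range(10_000)]
limit = 0.1  # hypothetical detection limit

true_gm = geometric_mean(true_values)
# Geometric mean after substituting different "small positive constants".
gm_by_constant = {
    c: geometric_mean(impute_below_limit(true_values, limit, c))
    for c in (limit / 2, 1e-3, 1e-6)
}
```

    Because the geometric mean averages logarithms, the estimate keeps falling as the substituted constant shrinks, so the reported result depends on an arbitrary choice rather than on the data.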

  13. The genealogical population dynamics of HIV-1 in a large transmission chain: bridging within and among host evolutionary rates.

    PubMed

    Vrancken, Bram; Rambaut, Andrew; Suchard, Marc A; Drummond, Alexei; Baele, Guy; Derdelinckx, Inge; Van Wijngaerden, Eric; Vandamme, Anne-Mieke; Van Laethem, Kristel; Lemey, Philippe

    2014-04-01

    Transmission lies at the interface of human immunodeficiency virus type 1 (HIV-1) evolution within and among hosts and separates distinct selective pressures that impose differences in both the mode of diversification and the tempo of evolution. In the absence of comprehensive direct comparative analyses of the evolutionary processes at different biological scales, our understanding of how fast within-host HIV-1 evolutionary rates translate to lower rates at the between host level remains incomplete. Here, we address this by analyzing pol and env data from a large HIV-1 subtype C transmission chain for which both the timing and the direction are known for most transmission events. To this purpose, we develop a new transmission model in a Bayesian genealogical inference framework and demonstrate how to constrain the viral evolutionary history to be compatible with the transmission history while simultaneously inferring the within-host evolutionary and population dynamics. We show that accommodating a transmission bottleneck affords the best fit to our data, but the sparse within-host HIV-1 sampling prevents accurate quantification of the concomitant loss in genetic diversity. We draw inference under the transmission model to estimate HIV-1 evolutionary rates among epidemiologically-related patients and demonstrate that they lie in between fast intra-host rates and lower rates among epidemiologically unrelated individuals infected with HIV subtype C. Using a new molecular clock approach, we quantify and find support for a lower evolutionary rate along branches that accommodate a transmission event or branches that represent the entire backbone of transmitted lineages in our transmission history. Finally, we recover the rate differences at the different biological scales for both synonymous and non-synonymous substitution rates, which is only compatible with the 'store and retrieve' hypothesis positing that viruses stored early in latently infected cells preferentially

  14. Population structure of Atlantic mackerel inferred from RAD-seq-derived SNP markers: effects of sequence clustering parameters and hierarchical SNP selection.

    PubMed

    Rodríguez-Ezpeleta, Naiara; Bradbury, Ian R; Mendibil, Iñaki; Álvarez, Paula; Cotano, Unai; Irigoien, Xabier

    2016-07-01

    Restriction-site-associated DNA sequencing (RAD-seq) and related methods are revolutionizing the field of population genomics in nonmodel organisms as they allow generating an unprecedented number of single nucleotide polymorphisms (SNPs) even when no genomic information is available. Yet, RAD-seq data analyses rely on assumptions on nature and number of nucleotide variants present in a single locus, the choice of which may lead to an under- or overestimated number of SNPs and/or to incorrectly called genotypes. Using the Atlantic mackerel (Scomber scombrus L.) and a close relative, the Atlantic chub mackerel (Scomber colias), as case study, here we explore the sensitivity of population structure inferences to two crucial aspects in RAD-seq data analysis: the maximum number of mismatches allowed to merge reads into a locus and the relatedness of the individuals used for genotype calling and SNP selection. Our study resolves the population structure of the Atlantic mackerel, but, most importantly, provides insights into the effects of alternative RAD-seq data analysis strategies on population structure inferences that are directly applicable to other species.

  15. Advanced Large Scale Cross Domain Temporal Topic Modeling Algorithms to Infer the Influence of Recent Research on IPCC Assessment Reports

    NASA Astrophysics Data System (ADS)

    Sleeman, J.; Halem, M.; Finin, T.; Cane, M. A.

    2016-12-01

    topics, we establish which chapter-citation pairs are most similar. We will perform posterior inferences based on Hastings-Metropolis simulated annealing MCMC algorithm to infer, from the evolution of topics starting from AR1 to AR4, assertions of topics for AR5 and potentially AR6.

  16. Different Evolutionary Paths to Complexity for Small and Large Populations of Digital Organisms

    PubMed Central

    2016-01-01

    A major aim of evolutionary biology is to explain the respective roles of adaptive versus non-adaptive changes in the evolution of complexity. While selection is certainly responsible for the spread and maintenance of complex phenotypes, this does not automatically imply that strong selection enhances the chance for the emergence of novel traits, that is, the origination of complexity. Population size is one parameter that alters the relative importance of adaptive and non-adaptive processes: as population size decreases, selection weakens and genetic drift grows in importance. Because of this relationship, many theories invoke a role for population size in the evolution of complexity. Such theories are difficult to test empirically because of the time required for the evolution of complexity in biological populations. Here, we used digital experimental evolution to test whether large or small asexual populations tend to evolve greater complexity. We find that both small and large—but not intermediate-sized—populations are favored to evolve larger genomes, which provides the opportunity for subsequent increases in phenotypic complexity. However, small and large populations followed different evolutionary paths towards these novel traits. Small populations evolved larger genomes by fixing slightly deleterious insertions, while large populations fixed rare beneficial insertions that increased genome size. These results demonstrate that genetic drift can lead to the evolution of complexity in small populations and that purifying selection is not powerful enough to prevent the evolution of complexity in large populations. PMID:27923053

  17. Large-scale inference of gene function through phylogenetic annotation of Gene Ontology terms: case study of the apoptosis and autophagy cellular processes

    PubMed Central

    Feuermann, Marc; Gaudet, Pascale; Mi, Huaiyu; Lewis, Suzanna E.; Thomas, Paul D.

    2016-01-01

    We previously reported a paradigm for large-scale phylogenomic analysis of gene families that takes advantage of the large corpus of experimentally supported Gene Ontology (GO) annotations. This ‘GO Phylogenetic Annotation’ approach integrates GO annotations from evolutionarily related genes across ∼100 different organisms in the context of a gene family tree, in which curators build an explicit model of the evolution of gene functions. GO Phylogenetic Annotation models the gain and loss of functions in a gene family tree, which is used to infer the functions of uncharacterized (or incompletely characterized) gene products, even for human proteins that are relatively well studied. Here, we report our results from applying this paradigm to two well-characterized cellular processes, apoptosis and autophagy. This revealed several important observations with respect to GO annotations and how they can be used for function inference. Notably, we applied only a small fraction of the experimentally supported GO annotations to infer function in other family members. The majority of other annotations describe indirect effects, phenotypes or results from high throughput experiments. In addition, we show here how feedback from phylogenetic annotation leads to significant improvements in the PANTHER trees, the GO annotations and GO itself. Thus GO phylogenetic annotation both increases the quantity and improves the accuracy of the GO annotations provided to the research community. We expect these phylogenetically based annotations to be of broad use in gene enrichment analysis as well as other applications of GO annotations. Database URL: http://amigo.geneontology.org/amigo PMID:28025345

  18. Ecological Inference

    NASA Astrophysics Data System (ADS)

    King, Gary; Rosen, Ori; Tanner, Martin A.

    2004-09-01

    This collection of essays brings together a diverse group of scholars to survey the latest strategies for solving ecological inference problems in various fields. The last half-decade has witnessed an explosion of research in ecological inference--the process of trying to infer individual behavior from aggregate data. Although uncertainties and information lost in aggregation make ecological inference one of the most problematic types of research to rely on, these inferences are required in many academic fields, as well as by legislatures and the Courts in redistricting, by business in marketing research, and by governments in policy analysis.

  19. [Genetic structure of the sable Martes zibellina L. populations from Magadan oblast as inferred from mitochondrial DNA variation].

    PubMed

    Petrovskaia, A V

    2007-04-01

    Restriction polymorphism of the mtDNA cytochrome b gene was studied in nine sable Martes zibellina L. populations from three introduction foci of Khabarovsk and Kamchatka sables in Magadan oblast: Olya, Kolyma, and Omolon. For comparison, similar studies were performed with the populations of central Kamchatka and Khabarovsk krai. In total, 444 DNA specimens were examined. Three mtDNA haplotypes (A, B, and C) proved to occur at various frequencies in the populations under study. The sable population system displayed high differentiation (FST = 22.3%). The populations of the Olya focus were most similar genetically to the populations of Kamchatka; those of the Omolon focus were similar to the Khabarovsk populations, and those of the Kolyma focus occupied an intermediate place. The observed spatial heterogeneity of the sable populations of Magadan oblast was explained in terms of the formation of the introduction foci of Kamchatka and Khabarovsk sables, starting from the 1950s.

  20. Identification and analysis of genomic regions with large between-population differentiation in humans.

    PubMed

    Myles, S; Tang, K; Somel, M; Green, R E; Kelso, J; Stoneking, M

    2008-01-01

    The primary aim of genetic association and linkage studies is to identify genetic variants that contribute to phenotypic variation within human populations. Since the overwhelming majority of human genetic variation is found within populations, these methods are expected to be effective and can likely be extrapolated from one human population to another. However, they may lack power in detecting the genetic variants that contribute to phenotypes that differ greatly between human populations. Phenotypes that show large differences between populations are expected to be associated with genomic regions exhibiting large allele frequency differences between populations. Thus, from genome-wide polymorphism data, genomic regions with large allele frequency differences between populations can be identified and evaluated as candidates for large between-population phenotypic differences. Here we use allele frequency data from approximately 1.5 million SNPs from three human populations, and present an algorithm that identifies genomic regions containing SNPs with extreme Fst. We demonstrate that our candidate regions have reduced heterozygosity in Europeans and Chinese relative to African-Americans, and are likely enriched with genes that have experienced positive natural selection. We identify genes that are likely responsible for phenotypes known to differ dramatically between human populations and present several candidates worthy of future investigation. Our list of high Fst genomic regions is a first step in identifying the genetic variants that contribute to large phenotypic differences between populations, many of which have likely experienced positive natural selection. Our approach, based on between-population differences, can complement traditional within-population linkage and association studies to uncover novel genotype-phenotype relationships.
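
    The idea of ranking SNPs by Fst computed from allele frequencies can be sketched as follows. The abstract does not state which Fst estimator was used or how SNPs were grouped into regions, so this example assumes Wright's heterozygosity-based definition for two equally weighted populations and simply returns the top-ranked SNPs; the function names, the toy frequencies, and the cutoff are hypothetical.

```python
def fst_two_pops(p1: float, p2: float) -> float:
    """Wright's Fst = (H_T - H_S) / H_T for one biallelic SNP,
    given the allele frequency in each of two equally weighted populations."""
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)          # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0                                  # SNP is monomorphic overall
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total

def extreme_snps(freqs1, freqs2, top_fraction=0.01):
    """Return indices of the SNPs with the highest Fst, plus all Fst scores."""
    scores = [fst_two_pops(a, b) for a, b in zip(freqs1, freqs2)]
    n_top = max(1, int(len(scores) * top_fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_top], scores

# Toy allele frequencies for four SNPs in two populations (made-up numbers).
pop_a = [0.5, 0.1, 0.9, 0.5]
pop_b = [0.5, 0.9, 0.1, 0.6]
top, scores = extreme_snps(pop_a, pop_b, top_fraction=0.5)
```

    With genome-wide data, a further step would be needed to merge neighbouring high-Fst SNPs into candidate regions; that step is not shown here because the abstract does not describe it.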

  1. Large-Scale Modelling of the Environmentally-Driven Population Dynamics of Temperate Aedes albopictus (Skuse).

    PubMed

    Erguler, Kamil; Smith-Unna, Stephanie E; Waldock, Joanna; Proestos, Yiannis; Christophides, George K; Lelieveld, Jos; Parham, Paul E

    2016-01-01

    The Asian tiger mosquito, Aedes albopictus, is a highly invasive vector species. It is a proven vector of dengue and chikungunya viruses, with the potential to host a further 24 arboviruses. It has recently expanded its geographical range, threatening many countries in the Middle East, Mediterranean, Europe and North America. Here, we investigate the theoretical limitations of its range expansion by developing an environmentally-driven mathematical model of its population dynamics. We focus on the temperate strain of Ae. albopictus and compile a comprehensive literature-based database of physiological parameters. As a novel approach, we link its population dynamics to globally-available environmental datasets by performing inference on all parameters. We adopt a Bayesian approach using experimental data as prior knowledge and the surveillance dataset of Emilia-Romagna, Italy, as evidence. The model accounts for temperature, precipitation, human population density and photoperiod as the main environmental drivers, and, in addition, incorporates the mechanism of diapause and a simple breeding site model. The model demonstrates high predictive skill over the reference region and beyond, confirming most of the current reports of vector presence in Europe. One of the main hypotheses derived from the model is the survival of Ae. albopictus populations through harsh winter conditions. The model, constrained by the environmental datasets, requires that either diapausing eggs or adult vectors have increased cold resistance. The model also suggests that temperature and photoperiod control diapause initiation and termination differentially. We demonstrate that it is possible to account for unobserved properties and constraints, such as differences between laboratory and field conditions, to derive reliable inferences on the environmental dependence of Ae. albopictus populations.

  2. Large-Scale Modelling of the Environmentally-Driven Population Dynamics of Temperate Aedes albopictus (Skuse)

    PubMed Central

    Erguler, Kamil; Smith-Unna, Stephanie E.; Waldock, Joanna; Proestos, Yiannis; Christophides, George K.; Lelieveld, Jos; Parham, Paul E.

    2016-01-01

    The Asian tiger mosquito, Aedes albopictus, is a highly invasive vector species. It is a proven vector of dengue and chikungunya viruses, with the potential to host a further 24 arboviruses. It has recently expanded its geographical range, threatening many countries in the Middle East, Mediterranean, Europe and North America. Here, we investigate the theoretical limitations of its range expansion by developing an environmentally-driven mathematical model of its population dynamics. We focus on the temperate strain of Ae. albopictus and compile a comprehensive literature-based database of physiological parameters. As a novel approach, we link its population dynamics to globally-available environmental datasets by performing inference on all parameters. We adopt a Bayesian approach using experimental data as prior knowledge and the surveillance dataset of Emilia-Romagna, Italy, as evidence. The model accounts for temperature, precipitation, human population density and photoperiod as the main environmental drivers, and, in addition, incorporates the mechanism of diapause and a simple breeding site model. The model demonstrates high predictive skill over the reference region and beyond, confirming most of the current reports of vector presence in Europe. One of the main hypotheses derived from the model is the survival of Ae. albopictus populations through harsh winter conditions. The model, constrained by the environmental datasets, requires that either diapausing eggs or adult vectors have increased cold resistance. The model also suggests that temperature and photoperiod control diapause initiation and termination differentially. We demonstrate that it is possible to account for unobserved properties and constraints, such as differences between laboratory and field conditions, to derive reliable inferences on the environmental dependence of Ae. albopictus populations. PMID:26871447

  3. Food web structure in the recently flooded Sep Reservoir as inferred from phytoplankton population dynamics and living microbial biomass.

    PubMed

    Tadonléké, R D; Jugnia, L B; Sime-Ngando, T; Devaux, J; Romagoux, J C

    2002-01-01

    Phytoplankton dynamics, bacterial standing stocks and living microbial biomass (derived from ATP measurements, 0.7-200 µm size class) were examined in 1996 in the newly flooded (1995) Sep Reservoir ('Massif Central,' France), for evidence of the importance of the microbial food web relative to the traditional food chain. Phosphate concentrations were low, N:P ratios were high, and phosphate losses converted into carbon accounted for <50% of phytoplankton biomass and production, indicating that P was limiting phytoplankton development during the study. The observed low availability of P contrasts with the high release of "directly" assimilable P often reported in newly flooded reservoirs, suggesting that factors determining nutrient dynamics in such ecosystems are complex. The phosphate availability, but also the water column stability, seemed to be among the major factors determining phytoplankton dynamics, as (i) large-size phytoplankton species were prominent during the period of increasing water column stability, whereas small-size species dominated phytoplankton assemblages during the period of decreasing stability, and (ii) a Dinobryon divergens bloom occurred during a period when inorganic P was undetectable, coinciding with the lowest values of bacterial standing stocks. Indication of grazing limitation of bacterial populations by the mixotrophic chrysophyte D. divergens (in late spring) and by other potential grazers (mainly rotifers in summer) seemed to be confirmed by the Model II or functional slopes of the bacterial vs phytoplankton regressions, which were always <0.63. Phytoplankton biomass was not correlated with phosphorus sources and its contribution was remarkably low relative to the living microbial biomass which, in contrast, was positively correlated with total phosphorus in summer. We conclude that planktonic microheterotrophs are strongly implicated in the phosphorus dynamics in the Sep Reservoir, and thus support the idea that an important

  4. SLUG - stochastically lighting up galaxies - III. A suite of tools for simulated photometry, spectroscopy, and Bayesian inference with stochastic stellar populations

    NASA Astrophysics Data System (ADS)

    Krumholz, Mark R.; Fumagalli, Michele; da Silva, Robert L.; Rendahl, Theodore; Parra, Jonathan

    2015-09-01

    Stellar population synthesis techniques for predicting the observable light emitted by a stellar population have extensive applications in numerous areas of astronomy. However, accurate predictions for small populations of young stars, such as those found in individual star clusters, star-forming dwarf galaxies, and small segments of spiral galaxies, require that the population be treated stochastically. Conversely, accurate deductions of the properties of such objects also require consideration of stochasticity. Here we describe a comprehensive suite of modular, open-source software tools for tackling these related problems. These include the following: a greatly-enhanced version of the SLUG code introduced by da Silva et al., which computes spectra and photometry for stochastically or deterministically sampled stellar populations with nearly arbitrary star formation histories, clustering properties, and initial mass functions; CLOUDY_SLUG, a tool that automatically couples SLUG-computed spectra with the CLOUDY radiative transfer code in order to predict stochastic nebular emission; BAYESPHOT, a general-purpose tool for performing Bayesian inference on the physical properties of stellar systems based on unresolved photometry; and CLUSTER_SLUG and SFR_SLUG, a pair of tools that use BAYESPHOT on a library of SLUG models to compute the mass, age, and extinction of mono-age star clusters, and the star formation rate of galaxies, respectively. The latter two tools make use of an extensive library of pre-computed stellar population models, which are included in the software. The complete package is available at http://www.slugsps.com.

  5. Population expansion and individual age affect endoparasite richness and diversity in a recolonising large carnivore population

    NASA Astrophysics Data System (ADS)

    Lesniak, Ines; Heckmann, Ilja; Heitlinger, Emanuel; Szentiks, Claudia A.; Nowak, Carsten; Harms, Verena; Jarausch, Anne; Reinhardt, Ilka; Kluth, Gesa; Hofer, Heribert; Krone, Oliver

    2017-01-01

    The recent recolonisation of the Central European lowland (CEL) by the grey wolf (Canis lupus) provides an excellent opportunity to study the effect of founder events on endoparasite diversity. Which role do prey and predator populations play in the re-establishment of endoparasite life cycles? Which intrinsic and extrinsic factors control individual endoparasite diversity in an expanding host population? In 53 individually known CEL wolves sampled in Germany, we revealed a community of four cestode, eight nematode, one trematode and 12 potential Sarcocystis species through molecular genetic techniques. Infections with zoonotic Echinococcus multilocularis, Trichinella britovi and T. spiralis occurred as single cases. Per capita endoparasite species richness and diversity significantly increased with population size and changed with age, whereas sex, microsatellite heterozygosity, and geographic origin had no effect. Tapeworm abundance (Taenia spp.) was significantly higher in immigrants than natives. Metacestode prevalence was slightly higher in ungulates from wolf territories than from control areas elsewhere. Even though alternative canid definitive hosts might also play a role within the investigated parasite life cycles, our findings indicate that (1) immigrated wolves increase parasite diversity in German packs, and (2) prevalence of wolf-associated parasites had declined during wolf absence and has now risen during recolonisation.

  6. Population expansion and individual age affect endoparasite richness and diversity in a recolonising large carnivore population

    PubMed Central

    Lesniak, Ines; Heckmann, Ilja; Heitlinger, Emanuel; Szentiks, Claudia A.; Nowak, Carsten; Harms, Verena; Jarausch, Anne; Reinhardt, Ilka; Kluth, Gesa; Hofer, Heribert; Krone, Oliver

    2017-01-01

    The recent recolonisation of the Central European lowland (CEL) by the grey wolf (Canis lupus) provides an excellent opportunity to study the effect of founder events on endoparasite diversity. Which role do prey and predator populations play in the re-establishment of endoparasite life cycles? Which intrinsic and extrinsic factors control individual endoparasite diversity in an expanding host population? In 53 individually known CEL wolves sampled in Germany, we revealed a community of four cestode, eight nematode, one trematode and 12 potential Sarcocystis species through molecular genetic techniques. Infections with zoonotic Echinococcus multilocularis, Trichinella britovi and T. spiralis occurred as single cases. Per capita endoparasite species richness and diversity significantly increased with population size and changed with age, whereas sex, microsatellite heterozygosity, and geographic origin had no effect. Tapeworm abundance (Taenia spp.) was significantly higher in immigrants than natives. Metacestode prevalence was slightly higher in ungulates from wolf territories than from control areas elsewhere. Even though alternative canid definitive hosts might also play a role within the investigated parasite life cycles, our findings indicate that (1) immigrated wolves increase parasite diversity in German packs, and (2) prevalence of wolf-associated parasites had declined during wolf absence and has now risen during recolonisation. PMID:28128348

  7. Population expansion and individual age affect endoparasite richness and diversity in a recolonising large carnivore population.

    PubMed

    Lesniak, Ines; Heckmann, Ilja; Heitlinger, Emanuel; Szentiks, Claudia A; Nowak, Carsten; Harms, Verena; Jarausch, Anne; Reinhardt, Ilka; Kluth, Gesa; Hofer, Heribert; Krone, Oliver

    2017-01-27

    The recent recolonisation of the Central European lowland (CEL) by the grey wolf (Canis lupus) provides an excellent opportunity to study the effect of founder events on endoparasite diversity. Which role do prey and predator populations play in the re-establishment of endoparasite life cycles? Which intrinsic and extrinsic factors control individual endoparasite diversity in an expanding host population? In 53 individually known CEL wolves sampled in Germany, we revealed a community of four cestode, eight nematode, one trematode and 12 potential Sarcocystis species through molecular genetic techniques. Infections with zoonotic Echinococcus multilocularis, Trichinella britovi and T. spiralis occurred as single cases. Per capita endoparasite species richness and diversity significantly increased with population size and changed with age, whereas sex, microsatellite heterozygosity, and geographic origin had no effect. Tapeworm abundance (Taenia spp.) was significantly higher in immigrants than natives. Metacestode prevalence was slightly higher in ungulates from wolf territories than from control areas elsewhere. Even though alternative canid definitive hosts might also play a role within the investigated parasite life cycles, our findings indicate that (1) immigrated wolves increase parasite diversity in German packs, and (2) prevalence of wolf-associated parasites had declined during wolf absence and has now risen during recolonisation.

  8. QuartetS-DB: A Large-Scale Orthology Database for Prokaryotes and Eukaryotes Inferred by Evolutionary Evidence

    DTIC Science & Technology

    2012-01-01

    address the accelerated growth in the number of available genome sequences, orthology detection methods must become more computationally efficient, even...that bypass portions of the computationally intensive, all-against-all comparative procedure widely used by methods based on bi-directional best hits...accuracy. Hence, the challenge is to develop new methods that can handle large-scale applications (e.g., thousands of genomes) while balancing often

  9. Selective pressures on MHC class II genes in the guppy (Poecilia reticulata) as inferred by hierarchical analysis of population structure.

    PubMed

    Herdegen, M; Babik, W; Radwan, J

    2014-11-01

    Genes of the major histocompatibility complex, which are the most polymorphic of all vertebrate genes, are a pre-eminent system for the study of selective pressures that arise from host-pathogen interactions. Balancing selection capable of maintaining high polymorphism should lead to the homogenization of MHC allele frequencies among populations, but there is some evidence to suggest that diversifying selection also operates on the MHC. However, the pattern of population structure observed at MHC loci is likely to depend on the spatial and/or temporal scale examined. Here, we investigated selection acting on MHC genes at different geographic scales using Venezuelan guppy populations inhabiting four regions. We found a significant correlation between MHC and microsatellite allelic richness across populations, which suggests the role of genetic drift in shaping MHC diversity. However, compared to microsatellites, more MHC variation was explained by differences between populations within larger geographic regions and less by the differences between the regions. Furthermore, among proximate populations, variation in MHC allele frequencies was significantly higher compared to microsatellites, indicating that selection acting on MHC may increase population structure at small spatial scales. However, in populations that have significantly diverged at neutral markers, the population-genetic signature of diversifying selection may be eradicated in the long term by that of balancing selection, which acts to preserve rare alleles and thus maintain a common pool of MHC alleles.

  10. An algorithm for computing the gene tree probability under the multispecies coalescent and its application in the inference of population tree.

    PubMed

    Wu, Yufeng

    2016-06-15

    A gene tree represents the evolutionary history of gene lineages that originate from multiple related populations. Under the multispecies coalescent model, lineages may coalesce outside the species (population) boundary. Given a species tree (with branch lengths), the gene tree probability is the probability of observing a specific gene tree topology under the multispecies coalescent model. There are two existing algorithms for computing the exact gene tree probability. The first algorithm is due to Degnan and Salter, who enumerate all the so-called coalescent histories for the given species tree and the gene tree topology. Their algorithm runs in exponential time in the number of gene lineages in general. The second algorithm is the STELLS algorithm (2012), which is usually faster but also runs in exponential time in almost all cases. In this article, we present a new algorithm, called CompactCH, for computing the exact gene tree probability. This new algorithm is based on the notion of compact coalescent histories: multiple coalescent histories are represented by a single compact coalescent history. The key advantage of our new algorithm is that it runs in polynomial time in the number of gene lineages if the number of populations is fixed to be a constant. The new algorithm is more efficient than the STELLS algorithm both in theory and in practice when the number of populations is small and there are multiple gene lineages from each population. As an application, we show that CompactCH can be applied in the inference of population trees (i.e. the population divergence history) from population haplotypes. Simulation results show that the CompactCH algorithm enables efficient and accurate inference of population trees with many more haplotypes than a previous approach.
    The CompactCH algorithm is implemented in the STELLS software package, which is available for download at http://www.engr.uconn.edu/ywu/STELLS.html. Contact: ywu@engr.uconn.edu. Supplementary data are
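
    What a gene tree probability is can be illustrated with the smallest non-trivial case, in which the result has a simple closed form. The sketch below is not the CompactCH or STELLS algorithm; it assumes a three-population species tree ((A,B),C) with one sampled lineage per population and an internal branch of length t in coalescent units, and the function name is hypothetical.

```python
import math

def three_taxon_gene_tree_probs(t: float) -> dict:
    """Gene tree topology probabilities under the multispecies coalescent for
    species tree ((A,B),C), one lineage per population, internal branch length t
    (coalescent units)."""
    # The A and B lineages fail to coalesce on the internal branch with
    # probability exp(-t); the three topologies are then equally likely.
    discordant = math.exp(-t) / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * discordant,  # matches the species tree
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }

probs = three_taxon_gene_tree_probs(1.0)
```

    With more populations and several lineages per population, the number of coalescent histories to account for grows rapidly, which is the cost the algorithms discussed in the abstract aim to reduce.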

  11. Linkages between large-scale climate patterns and the dynamics of Alaskan caribou populations

    Treesearch

    Kyle Joly; David R. Klein; David L. Verbyla; T. Scott Rupp; F. Stuart Chapin

    2011-01-01

    Recent research has linked climate warming to global declines in caribou and reindeer (both Rangifer tarandus) populations. We hypothesize large-scale climate patterns are a contributing factor explaining why these declines are not universal. To test our hypothesis for such relationships among Alaska caribou herds, we calculated the population growth...

  12. Significant population genetic structure detected in the rock bream Oplegnathus fasciatus (Temminck & Schlegel, 1844) inferred from fluorescent-AFLP analysis

    NASA Astrophysics Data System (ADS)

    Xiao, Yongshuang; Ma, Daoyuan; Xu, Shihong; Liu, Qinghua; Wang, Yanfeng; Xiao, Zhizhong; Li, Jun

    2016-05-01

    Oplegnathus fasciatus (rock bream) is a commercial rocky reef fish species in East Asia that has been considered for aquaculture. We estimated the population genetic diversity and population structure of the species along the coastal waters of China using fluorescent-amplified fragment length polymorphisms technology. Using 53 individuals from three populations and four pairs of selective primers, we amplified 1,264 bands, 98.73% of which were polymorphic. The Zhoushan population showed the highest Nei's genetic diversity and Shannon genetic diversity. The results of analysis of molecular variance (AMOVA) showed that 59.55% of genetic variation existed among populations and 40.45% occurred within populations, which indicated that a significant population genetic structure existed in the species. The pairwise fixation index (FST) values ranged from 0.20 to 0.63 and were significant after sequential Bonferroni correction. The topology of an unweighted pair group method with arithmetic mean tree showed two significant genealogical branches corresponding to the sampling locations of North and South China. The AMOVA and STRUCTURE analyses suggested that the O. fasciatus populations examined should comprise two stocks.

  13. Genetic diversity and population history of the red panda (Ailurus fulgens) as inferred from mitochondrial DNA sequence variations.

    PubMed

    Su, B; Fu, Y; Wang, Y; Jin, L; Chakraborty, R

    2001-06-01

    The red panda (Ailurus fulgens) is one of the flagship species in worldwide conservation and is of special interest in evolutionary studies due to its taxonomic uniqueness. We sequenced a 236-bp fragment of the mitochondrial D-loop region in a sample of 53 red pandas from two populations in southwestern China. Seventeen polymorphic sites were found, together with a total of 25 haplotypes, indicating a high level of genetic diversity in the red panda. However, no obvious genetic divergence was detected between the Sichuan and Yunnan populations. The consensus phylogenetic tree of the 25 haplotypes was starlike. The pairwise mismatch distribution fitted into a pattern of populations undergoing expansion. Furthermore, Fu's F(S) test of neutrality was significant for the total population (F(S) = -7.573), which also suggests a recent population expansion. Interestingly, the effective population size in the Sichuan population was both larger and more stable than that in the Yunnan population, implying a southward expansion from Sichuan to Yunnan.
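
    The pairwise mismatch distribution mentioned above is the distribution of the number of nucleotide differences over all pairs of sampled sequences. A minimal sketch of how it can be tabulated is given below; the sequences are made-up toy data, not the red panda D-loop haplotypes, and the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations

def mismatch_distribution(sequences):
    """Return {number of differences: proportion of sequence pairs}
    for a list of aligned, equal-length sequences."""
    counts = Counter()
    for a, b in combinations(sequences, 2):
        if len(a) != len(b):
            raise ValueError("sequences must be aligned to the same length")
        # Count the sites at which the two sequences differ.
        counts[sum(x != y for x, y in zip(a, b))] += 1
    total = sum(counts.values())
    return {k: counts[k] / total for k in sorted(counts)}

# Toy aligned sequences (hypothetical data).
toy = ["ACGT", "ACGA", "ACTA", "ACGT"]
dist = mismatch_distribution(toy)
```

    A smooth, single-peaked distribution of this kind is commonly read as a signature of past population expansion, which is the interpretation the abstract refers to.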

  14. BayFish: Bayesian inference of transcription dynamics from population snapshots of single-molecule RNA FISH in single cells.

    PubMed

    Gómez-Schiavon, Mariana; Chen, Liang-Fu; West, Anne E; Buchler, Nicolas E

    2017-09-04

    Single-molecule RNA fluorescence in situ hybridization (smFISH) provides unparalleled resolution in the measurement of the abundance and localization of nascent and mature RNA transcripts in fixed, single cells. We developed a computational pipeline (BayFish) to infer the kinetic parameters of gene expression from smFISH data at multiple time points after gene induction. Given an underlying model of gene expression, BayFish uses a Monte Carlo method to estimate the Bayesian posterior probability of the model parameters and quantify the parameter uncertainty given the observed smFISH data. We tested BayFish on synthetic data and smFISH measurements of the neuronal activity-inducible gene Npas4 in primary neurons.
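
    The general idea of estimating a Bayesian posterior with a Monte Carlo method can be sketched with a random-walk Metropolis sampler. The example below is not the BayFish model or code: it assumes a deliberately simple toy setting in which transcript counts per cell are Poisson distributed with an unknown rate and the rate has an Exponential(1) prior. The counts, function names, and tuning values are all hypothetical.

```python
import math
import random

def log_posterior(rate, counts):
    """Unnormalised log posterior: Poisson likelihood, Exponential(1) prior."""
    if rate <= 0.0:
        return -math.inf
    log_prior = -rate
    log_lik = sum(k * math.log(rate) - rate for k in counts)  # k! terms dropped
    return log_prior + log_lik

def metropolis(counts, n_steps=20000, step=0.5, seed=0):
    """Random-walk Metropolis sampler; returns samples after a 25% burn-in."""
    rng = random.Random(seed)
    rate = 1.0
    current = log_posterior(rate, counts)
    samples = []
    for _ in range(n_steps):
        proposal = rate + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, counts)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            rate, current = proposal, candidate
        samples.append(rate)
    return samples[n_steps // 4:]

counts = [3, 5, 4, 6, 2]            # toy transcript counts per cell
samples = metropolis(counts)
posterior_mean = sum(samples) / len(samples)
```

    For this toy model the posterior is known analytically to be a Gamma distribution with mean (1 + 20) / (1 + 5) = 3.5, which gives a simple check on the sampler; the spread of the samples plays the role of the parameter uncertainty mentioned in the abstract.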

  15. Effect of microsatellite selection on individual and population genetic inferences: an empirical study using cross-specific and species-specific amplifications.

    PubMed

    Queirós, J; Godinho, R; Lopes, S; Gortazar, C; de la Fuente, J; Alves, P C

    2015-07-01

    Although whole-genome sequencing is becoming more accessible and feasible for nonmodel organisms, microsatellites have remained the markers of choice for various population and conservation genetic studies. However, the criteria for choosing microsatellites are still controversial due to ascertainment bias that may be introduced into the genetic inference. An empirical study of red deer (Cervus elaphus) populations was performed using cross-specific microsatellites and species-specific microsatellites developed through pyrosequencing of enriched libraries. Two different strategies were used to select the species-specific panels: randomly vs. highly polymorphic markers. The results suggest that reliable and accurate estimations of genetic diversity can be obtained using random microsatellites distributed throughout the genome. In addition, the results reinforce previous evidence that selecting the most polymorphic markers leads to an ascertainment bias in estimates of genetic diversity, when compared with randomly selected microsatellites. Analyses of population differentiation and clustering seem less influenced by the approach of microsatellite selection, whereas assigning individuals to populations might be affected by a random selection of a small number of microsatellites. Individual multilocus heterozygosity measures produced various discordant results, which in turn had impacts on the heterozygosity-fitness correlation test. Finally, we argue that picking the appropriate microsatellite set should primarily take into account the ecological and evolutionary questions studied. Selecting the most polymorphic markers will generally overestimate genetic diversity parameters, leading to misinterpretations of the real genetic diversity, which is particularly important in managed and threatened populations.

  16. Methodological assessment of 2b-RAD genotyping technique for population structure inferences in yellowfin tuna (Thunnus albacares).

    PubMed

    Pecoraro, Carlo; Babbucci, Massimiliano; Villamor, Adriana; Franch, Rafaella; Papetti, Chiara; Leroy, Bruno; Ortega-Garcia, Sofia; Muir, Jeff; Rooker, Jay; Arocha, Freddy; Murua, Hilario; Zudaire, Iker; Chassot, Emmanuel; Bodin, Nathalie; Tinti, Fausto; Bargelloni, Luca; Cariani, Alessia

    2016-02-01

    Global population genetic structure of yellowfin tuna (Thunnus albacares) is still poorly understood despite its relevance for the tuna fishery industry. Low levels of genetic differentiation among oceans speak in favour of the existence of a single panmictic population worldwide of this highly migratory fish. However, recent studies indicated genetic structuring at much smaller geographic scales than previously considered, pointing out that yellowfin tuna population genetic structure has not been properly assessed so far. In this study, we demonstrated, for the first time, the utility of the 2b-RAD genotyping technique for investigating population genetic diversity and differentiation in high gene-flow species. Using the de novo pipeline in Stacks, a total of 6772 high-quality genome-wide SNPs were identified across Atlantic, Indian and Pacific population samples representing all major distribution areas. Preliminary analyses showed shallow but significant population structure among oceans (FST=0.0273; P-value<0.01). Discriminant Analysis of Principal Components endorsed the presence of genetically discrete yellowfin tuna populations among three oceanic pools. Although such evidence needs to be corroborated by increasing sample size, these results showed the efficiency of this genotyping technique in assessing genetic divergence in a marine fish with high dispersal potential. Copyright © 2015 Elsevier B.V. All rights reserved.

  17. Genetic population structure of the alpine species Rhododendron pseudochrysanthum sensu lato (Ericaceae) inferred from chloroplast and nuclear DNA

    PubMed Central

    2011-01-01

    Background: A complex of incipient species with different degrees of morphological or ecological differentiation provides an ideal model for studying species divergence. We examined the phylogeography and the evolutionary history of the Rhododendron pseudochrysanthum s. l. Results: Systematic inconsistency was detected between gene genealogies of the cpDNA and nrDNA. Rooted at R. hyperythrum and R. formosana, both trees lacked reciprocal monophyly for all members of the complex. For R. pseudochrysanthum s.l., the spatial distribution of the cpDNA had a noteworthy pattern showing high genetic differentiation (FST = 0.56-0.72) between populations in the Yushan Mountain Range and populations of the other mountain ranges. Conclusion: Both incomplete lineage sorting and interspecific hybridization/introgression may have contributed to the lack of monophyly among R. hyperythrum, R. formosana and R. pseudochrysanthum s.l. Independent colonizations, plus low capabilities of seed dispersal in current environments, may have resulted in the genetic differentiation between populations of different mountain ranges. At the population level, the populations of Central, and Sheishan Mountains may have undergone postglacial demographic expansion, while populations of the Yushan Mountain Range are likely to have remained stable ever since the colonization. In contrast, the single population of the Alishan Mountain Range with a fixed cpDNA haplotype may have experienced bottleneck/founder's events. PMID:21501530

  18. History of click-speaking populations of Africa inferred from mtDNA and Y chromosome genetic variation.

    PubMed

    Tishkoff, Sarah A; Gonder, Mary Katherine; Henn, Brenna M; Mortensen, Holly; Knight, Alec; Gignoux, Christopher; Fernandopulle, Neil; Lema, Godfrey; Nyambo, Thomas B; Ramakrishnan, Uma; Reed, Floyd A; Mountain, Joanna L

    2007-10-01

    Little is known about the history of click-speaking populations in Africa. Prior genetic studies revealed that the click-speaking Hadza of eastern Africa are as distantly related to click speakers of southern Africa as are most other African populations. The Sandawe, who currently live within 150 km of the Hadza, are the only other population in eastern Africa whose language has been classified as part of the Khoisan language family. Linguists disagree on whether there is any detectable relationship between the Hadza and Sandawe click languages. We characterized both mtDNA and Y chromosome variation of the Sandawe, Hadza, and neighboring Tanzanian populations. New genetic data show that the Sandawe and southern African click speakers share rare mtDNA and Y chromosome haplogroups; however, common ancestry of the 2 populations dates back >35,000 years. These data also indicate that common ancestry of the Hadza and Sandawe populations dates back >15,000 years. These findings suggest that at the time of the spread of agriculture and pastoralism, the click-speaking populations were already isolated from one another and are consistent with relatively deep linguistic divergence among the respective click languages.

  19. A preliminary phylogenetic analysis of the Capsalidae (Platyhelminthes: Monogenea: Monopisthocotylea) inferred from large subunit rDNA sequences.

    PubMed

    Whittington, I D; Deveney, M R; Morgan, J A T; Chisholm, L A; Adlard, R D

    2004-05-01

    Phylogenetic relationships within the Capsalidae (Monogenea) were examined using large subunit ribosomal DNA sequences from 17 capsalid species (representing 7 genera, 5 subfamilies), 2 outgroup taxa (Monocotylidae) plus Udonella caligorum (Udonellidae). Trees were constructed using maximum likelihood, minimum evolution and maximum parsimony algorithms. An initial tree, generated from sequences 315 bases long, suggests that Capsalinae, Encotyllabinae, Entobdellinae and Trochopodinae are monophyletic, but that Benedeniinae is paraphyletic. Analyses indicate that Neobenedenia, currently in the Benedeniinae, should perhaps be placed in a separate subfamily. An additional analysis was made which omitted 3 capsalid taxa (for which only short sequences were available) and all outgroup taxa because of alignment difficulties. Sequence length increased to 693 bases and good branch support was achieved. The Benedeniinae was again paraphyletic. Higher-level classification of the Capsalidae, evolution of the Entobdellinae and issues of species identity in Neobenedenia are discussed.

  20. A large increase in U.S. methane emissions over the past decade inferred from satellite data and surface observations

    NASA Astrophysics Data System (ADS)

    Turner, A. J.; Jacob, D. J.; Benmergui, J.; Wofsy, S. C.; Maasakkers, J. D.; Butz, A.; Hasekamp, O.; Biraud, S. C.

    2016-03-01

    The global burden of atmospheric methane has been increasing over the past decade, but the causes are not well understood. National inventory estimates from the U.S. Environmental Protection Agency indicate no significant trend in U.S. anthropogenic methane emissions from 2002 to present. Here we use satellite retrievals and surface observations of atmospheric methane to suggest that U.S. methane emissions have increased by more than 30% over the 2002-2014 period. The trend is largest in the central part of the country, but we cannot readily attribute it to any specific source type. This large increase in U.S. methane emissions could account for 30-60% of the global growth of atmospheric methane seen in the past decade.

  1. Genetic variability of Echinococcus granulosus complex in various geographical populations of Iran inferred by mitochondrial DNA sequences.

    PubMed

    Spotin, Adel; Mahami-Oskouei, Mahmoud; Harandi, Majid Fasihi; Baratchian, Mehdi; Bordbar, Ali; Ahmadpour, Ehsan; Ebrahimi, Sahar

    2017-01-01

    To investigate the genetic variability and population structure of the Echinococcus granulosus complex, 79 isolates were sequenced from different host species (human, dog, camel, goat, sheep and cattle) from various geographical sub-populations of Iran (Northwestern, Northern, and Southeastern). In addition, 36 sequences from other geographical populations (Western, Southeastern and Central Iran) were retrieved directly from the GenBank database for the mitochondrial cytochrome c oxidase subunit 1 (cox1) gene. The confirmed isolates were grouped as G1 genotype (n=92), G6 genotype (n=14), G3 genotype (n=8) and G2 genotype (n=1). Fifty unique haplotypes were identified based on the analyzed sequences of cox1. A parsimonious network of the sequence haplotypes displayed star-like features in the overall population, with IR23 (22 isolates; 19.1%) as the most common haplotype. According to the analysis of molecular variance (AMOVA) test, the high haplotype diversity of the E. granulosus complex indicated that the total genetic variability lay within populations, while nucleotide diversity was low in all populations. Neutrality indices of cox1 (Tajima's D and Fu's Fs tests) were negative in the Western-Northwestern, Northern and Southeastern populations, indicating significant divergence from neutrality, and positive but not significant in Central isolates. The pairwise fixation index (Fst), used as an indicator of gene flow, was generally low for all populations (0.00647-0.15198). The Fst values indicate that Echinococcus sensu stricto (genotype G1-G3) populations are not genetically well differentiated among the geographical regions of Iran. To appraise the hypothetical evolutionary scenario, further study is needed to analyze concatenated mitogenomes, and a panel of single-locus nuclear markers should be considered across wider areas of Iran and neighboring countries. Copyright © 2016 Elsevier B.V. All rights reserved.

  2. Genetic structure in large, continuous mammal populations: the example of brown bears in northwestern Eurasia.

    PubMed

    Tammeleht, E; Remm, J; Korsten, M; Davison, J; Tumanov, I; Saveljev, A; Männil, P; Kojola, I; Saarma, U

    2010-12-01

    Knowledge of population structure and genetic diversity and the spatio-temporal demographic processes affecting populations is crucial for effective wildlife preservation, yet these factors are still poorly understood for organisms with large continuous ranges. Available population genetic data reveal that widespread mammals have for the most part only been carefully studied at the local population scale, which is insufficient for understanding population processes at larger scales. Here, we provide data on population structure, genetic diversity and gene flow in a brown bear population inhabiting the large territory of northwestern Eurasia. Analysis of 17 microsatellite loci indicated significant population substructure, consisting of four genetic groups. While three genetic clusters were confined to small geographical areas (located in Estonia, southern Finland and Leningrad oblast, Russia), the fourth cluster spanned a very large area broadly falling between northern Finland and the Arkhangelsk and Kirov oblasts of Russia. Thus, the data indicate a complex pattern where a fraction of the population exhibits large-scale gene flow that is unparalleled by other wild mammals studied to date, while the remainder of the population appears to have been structured by a combination of demographic history and landscape barriers. These results based on nuclear data are generally in good agreement with evidence previously derived using mitochondrial markers, and taken together, these markers provide complementary information about female-specific and population-level processes. Moreover, this study conveys information about spatial processes occurring over multiple generations that cannot be readily gained using other approaches, e.g. telemetry. © 2010 Blackwell Publishing Ltd.

  3. Mitochondrial DNA variation of an isolated population of the Adriatic brook lamprey Lampetra zanandreai (Agnatha: Petromyzontidae): phylogeographic and phylogenetic inferences.

    PubMed

    Caputo, V; Giovannotti, M; Nisi Cerioni, P; Splendiani, A; Marconi, M; Tagliavini, J

    2009-12-01

    Two mitochondrial genes were examined to compare an isolated population of the Adriatic brook lamprey Lampetra zanandreai in central Italy with other populations in the species range (Po plain) and with parasitic and freshwater lampreys. A single haplotype, identical to one in a Venetian sample, was found in 10 individuals from the isolated population. The reduced variability is consistent with a history of dispersal after the Pleistocene expansion of the Po basin. The results support the hypothesis of an origin of L. zanandreai and L. fluviatilis-L. planeri from a common anadromous ancestor.

  4. Decline and recovery of a large carnivore: environmental change and long-term trends in an endangered brown bear population.

    PubMed

    Martínez Cano, Isabel; Taboada, Fernando González; Naves, Javier; Fernández-Gil, Alberto; Wiegand, Thorsten

    2016-11-30

    Understanding what factors drive fluctuations in the abundance of endangered species is a difficult ecological problem but a major requirement to attain effective management and conservation success. The ecological traits of large mammals make this task even more complicated, calling for integrative approaches. We develop a framework combining individual-based modelling and statistical inference to assess alternative hypotheses on brown bear dynamics in the Cantabrian range (Iberian Peninsula). Models including the effect of environmental factors on mortality rates were able to reproduce three decades of variation in the number of females with cubs of the year (Fcoy), including the decline that put the population close to extinction in the mid-nineties, and the following increase in brown bear numbers. This external effect prevailed over density-dependent mechanisms (sexually selected infanticide and female reproductive suppression), with a major impact of climate driven changes in resource availability and a secondary role of changes in human pressure. Predicted changes in population structure revealed a nonlinear relationship between total abundance and the number of Fcoy, highlighting the risk of simple projections based on indirect abundance indices. This study demonstrates the advantages of integrative, mechanistic approaches and provides a widely applicable framework to improve our understanding of wildlife dynamics.

  5. Diversity lost: are all Holarctic large mammal species just relict populations?

    PubMed

    Hofreiter, Michael; Barnes, Ian

    2010-04-21

    Population genetic analyses of Eurasian wolves published recently in BMC Evolutionary Biology suggest that a major genetic turnover took place in Eurasian wolves after the Pleistocene. These results add to the growing evidence that large mammal species surviving the late Pleistocene extinctions nevertheless lost a large share of their genetic diversity.

  6. INTERACTIONS BETWEEN BYTHOTREPHES CEDERSTROEMI AND LEPTODORA KINDTII INFERRED FROM SEASONAL POPULATION ABUNDANCE PATTERNS IN LAKE MICHIGAMME, MICHIGAN, USA

    EPA Science Inventory


    Bythotrephes cederstroemi is a non-indigenous predaceous zooplankter invading North American freshwater lakes in the Great Lakes region. We present seasonal population abundance values for both Bythotrephes and Leptodora kindtii from Lake Michigamme, Michigan for the years ...

  8. Comparative phylogeny and historical perspectives on population genetics of the Pacific hawksbill (Eretmochelys imbricata) and green turtles (Chelonia mydas), inferred from feeding populations in the Yaeyama Islands, Japan.

    PubMed

    Nishizawa, Hideaki; Okuyama, Junichi; Kobayashi, Masato; Abe, Osamu; Arai, Nobuaki

    2010-01-01

    Mitochondrial DNA sequence polymorphisms and patterns of genetic diversity represent the genealogy and relative impacts of historical, geographic, and demographic events on populations. In this study, historical patterns of population dynamics and differentiation in hawksbill (Eretmochelys imbricata) and green turtles (Chelonia mydas) in the Pacific were estimated from feeding populations in the Yaeyama Islands, Japan. Phylogenetic relationships of the haplotypes indicated that hawksbill and green turtles in the Pacific probably underwent very similar patterns and processes of population dynamics over the last million years, with population subdivision during the early Pleistocene and population expansion after the last glacial maximum. These significant contemporary historical events were suggested to have been caused by climatic and sea-level fluctuations. On the other hand, comparing our results to long-term population dynamics in the Atlantic, population subdivisions during the early Pleistocene were specific to Pacific hawksbill and green turtles. Therefore, regional differences in historical population dynamics are suggested. Despite limited sampling locations, these results are the first step in estimating the historical trends in Pacific sea turtles by using phylogenetics and population genetics.

  9. Population Genetics of Overwintering Monarch Butterflies, Danaus plexippus (Linnaeus), from Central Mexico Inferred from Mitochondrial DNA and Microsatellite Markers.

    PubMed

    Pfeiler, Edward; Nazario-Yepiz, Nestor O; Pérez-Gálvez, Fernan; Chávez-Mora, Cristina Alejandra; Laclette, Mariana Ramírez Loustalot; Rendón-Salinas, Eduardo; Markow, Therese Ann

    2017-03-01

    Population genetic variation and demographic history in Danaus plexippus (L.), from Mexico were assessed based on analyses of mitochondrial cytochrome c oxidase subunit I (COI; 658 bp) and subunit II (COII; 503 bp) gene segments and 7 microsatellite loci. The sample of 133 individuals included both migratory monarchs, mainly from 4 overwintering sites within the Monarch Butterfly Biosphere Reserve (MBBR) in central Mexico (states of Michoacán and México), and a nonmigratory population from Irapuato, Guanajuato. Haplotype (h) and nucleotide (π) diversities were relatively low, averaging 0.466 and 0.00073, respectively, for COI, and 0.629 and 0.00245 for COII. Analysis of molecular variance of the COI data set, which included additional GenBank sequences from a nonmigratory Costa Rican population, showed significant population structure between Mexican migratory monarchs and nonmigratory monarchs from both Mexico and Costa Rica, suggesting limited gene flow between the 2 behaviorally distinct groups. Interestingly, while the COI haplotype frequencies of the nonmigratory populations differed from the migratory, they were similar to each other, despite the great physical distance between them. Microsatellite analyses, however, suggested a lack of structure between the 2 groups, possibly owing to the number of significant deviations from Hardy-Weinberg equilibrium resulting from heterozygote deficiencies found for most of the loci. Estimates of demographic history of the combined migratory MBBR monarch population, based on the mismatch distribution and Bayesian skyline analyses of the concatenated COI and COII data set (n = 89) suggested a population expansion dating to the late Pleistocene (~35000-40000 years before present) followed by a stable effective female population size (Nef) of about 6 million over the last 10000 years. © The American Genetic Association 2016.
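
    As an illustration of the haplotype diversity (h) statistic reported above, the following is a minimal sketch assuming the standard unbiased estimator h = n/(n-1) * (1 - sum of squared haplotype frequencies). The function name and the sample labels are hypothetical and are not taken from the study, whose actual software and settings are not stated in the abstract.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Unbiased haplotype diversity: h = n/(n-1) * (1 - sum(p_i^2)).

    `haplotypes` is a sequence with one haplotype label per sampled individual.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled sequences")
    counts = Counter(haplotypes)
    # Sum of squared sample frequencies of each distinct haplotype.
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical sample: 10 sequences carrying three haplotype labels.
sample = ["A"] * 7 + ["B"] * 2 + ["C"]
print(round(haplotype_diversity(sample), 3))
```

    A value near 0 means almost every individual carries the same haplotype; a value near 1 means almost every sampled sequence is distinct.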

  10. Population Genetics of Overwintering Monarch Butterflies, Danaus plexippus (Linnaeus), from Central Mexico Inferred from Mitochondrial DNA and Microsatellite Markers

    PubMed Central

    Pfeiler, Edward; Nazario-Yepiz, Nestor O.; Pérez-Gálvez, Fernan; Chávez-Mora, Cristina Alejandra; Laclette, Mariana Ramírez Loustalot; Rendón-Salinas, Eduardo

    2017-01-01

    Population genetic variation and demographic history in Danaus plexippus (L.), from Mexico were assessed based on analyses of mitochondrial cytochrome c oxidase subunit I (COI; 658 bp) and subunit II (COII; 503 bp) gene segments and 7 microsatellite loci. The sample of 133 individuals included both migratory monarchs, mainly from 4 overwintering sites within the Monarch Butterfly Biosphere Reserve (MBBR) in central Mexico (states of Michoacán and México), and a nonmigratory population from Irapuato, Guanajuato. Haplotype (h) and nucleotide (π) diversities were relatively low, averaging 0.466 and 0.00073, respectively, for COI, and 0.629 and 0.00245 for COII. Analysis of molecular variance of the COI data set, which included additional GenBank sequences from a nonmigratory Costa Rican population, showed significant population structure between Mexican migratory monarchs and nonmigratory monarchs from both Mexico and Costa Rica, suggesting limited gene flow between the 2 behaviorally distinct groups. Interestingly, while the COI haplotype frequencies of the nonmigratory populations differed from the migratory, they were similar to each other, despite the great physical distance between them. Microsatellite analyses, however, suggested a lack of structure between the 2 groups, possibly owing to the number of significant deviations from Hardy–Weinberg equilibrium resulting from heterozygote deficiencies found for most of the loci. Estimates of demographic history of the combined migratory MBBR monarch population, based on the mismatch distribution and Bayesian skyline analyses of the concatenated COI and COII data set (n = 89) suggested a population expansion dating to the late Pleistocene (~35000–40000 years before present) followed by a stable effective female population size (Nef) of about 6 million over the last 10000 years. PMID:28003372

  11. Inferring Epidemiological Dynamics with Bayesian Coalescent Inference: The Merits of Deterministic and Stochastic Models

    PubMed Central

    Popinga, Alex; Vaughan, Tim; Stadler, Tanja; Drummond, Alexei J.

    2015-01-01

    Estimation of epidemiological and population parameters from molecular sequence data has become central to the understanding of infectious disease dynamics. Various models have been proposed to infer details of the dynamics that describe epidemic progression. These include inference approaches derived from Kingman’s coalescent theory. Here, we use recently described coalescent theory for epidemic dynamics to develop stochastic and deterministic coalescent susceptible–infected–removed (SIR) tree priors. We implement these in a Bayesian phylogenetic inference framework to permit joint estimation of SIR epidemic parameters and the sample genealogy. We assess the performance of the two coalescent models and also juxtapose results obtained with a recently published birth–death-sampling model for epidemic inference. Comparisons are made by analyzing sets of genealogies simulated under precisely known epidemiological parameters. Additionally, we analyze influenza A (H1N1) sequence data sampled in the Canterbury region of New Zealand and HIV-1 sequence data obtained from known United Kingdom infection clusters. We show that both coalescent SIR models are effective at estimating epidemiological parameters from data with large fundamental reproductive number R0 and large population size S0. Furthermore, we find that the stochastic variant generally outperforms its deterministic counterpart in terms of error, bias, and highest posterior density coverage, particularly for smaller R0 and S0. However, each of these inference models is shown to have undesirable properties in certain circumstances, especially for epidemic outbreaks with R0 close to one or with small effective susceptible populations. PMID:25527289
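
    To make the roles of R0 and S0 concrete, the following is a minimal sketch of a deterministic SIR trajectory, integrated with a simple forward-Euler step. It is not the coalescent tree prior described in the abstract. The parameterization (transmission rate beta, removal rate gamma, and R0 = beta * S0 / gamma) and all numerical values are assumptions chosen for illustration.

```python
def basic_reproduction_number(beta, gamma, s0):
    """R0 under the assumed density-dependent parameterization: beta * S0 / gamma."""
    return beta * s0 / gamma


def simulate_sir(beta, gamma, s0, i0=1.0, dt=0.01, t_max=100.0):
    """Forward-Euler integration of the deterministic SIR equations:

        dS/dt = -beta * S * I
        dI/dt =  beta * S * I - gamma * I
        dR/dt =  gamma * I

    Returns the final (S, I, R) and the peak number of infected.
    """
    s, i, r = float(s0), float(i0), 0.0
    peak_i = i
    for _ in range(int(t_max / dt)):
        new_infections = beta * s * i * dt
        new_removals = gamma * i * dt
        s, i, r = s - new_infections, i + new_infections - new_removals, r + new_removals
        peak_i = max(peak_i, i)
    return s, i, r, peak_i


# Hypothetical parameters: R0 = 2 (epidemic grows) versus R0 = 0.4 (epidemic fades).
for beta in (0.0005, 0.0001):
    r0 = basic_reproduction_number(beta, gamma=0.25, s0=1000)
    _, _, _, peak = simulate_sir(beta, gamma=0.25, s0=1000)
    print(f"R0 = {r0:.1f}, peak infected = {peak:.1f}")
```

    With R0 below one the infected count only declines from its initial value, which is one intuition for why outbreaks with R0 close to one carry little signal for inference.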

  12. Genetic relationships among populations of Aedes aegypti from Uruguay and northeastern Argentina inferred from ISSR-PCR data.

    PubMed

    Soliani, C; Rondan-Dueñas, J; Chiappero, M B; Martínez, M; Da Rosa, E García; Gardenal, C N

    2010-09-01

    Aedes aegypti (L.) (Diptera: Culicidae), the main vector of yellow fever and dengue viruses, was eradicated from Argentina between 1955 and 1963, but reinvaded the country in 1986. In Uruguay, the species was reintroduced in 1997. In this study we used highly polymorphic inter-simple sequence repeats (ISSR) markers to analyse the genetic structure of Ae. aegypti populations from Uruguay and northeastern Argentina to identify possible colonization patterns of the vector. Overall genetic differentiation among populations was high (F(ST) = 0.106) and showed no correlation with geographic distance, which is consistent with the short time since the reintroduction of the species in the area. Differentiation between pairs of Argentine populations (F(ST) 0.072 to 0.221) was on average higher than between Uruguayan populations (F(ST)-0.044 to 0.116). Bayesian estimation of population structure defined four genetic clusters and most populations were admixtures of two of them: Mercedes and Treinta y Tres (Uruguay) were mixtures of clusters 1 and 3; Salto (Uruguay) and Paraná (Argentina) of clusters 1 and 4; Fray Bentos (Uruguay) of clusters 2 and 3, and Gualeguaychú (Argentina) of clusters 2 and 3. Posadas and Buenos Aires in Argentina were fairly genetically homogeneous. Our results suggest that Ae. aegypti recolonized Uruguay from bordering cities in Argentina via bridges over the Uruguay River and also from Brazil.

  13. Inferences on the population structure and colonization process of the invasive oriental fruit fly, Bactrocera dorsalis (Hendel).

    PubMed

    Aketarawong, N; Bonizzoni, M; Thanaphum, S; Gomulski, L M; Gasperi, G; Malacrida, A R; Gugliemino, C R

    2007-09-01

    The phytophagous insects of the Tephritidae family offer different case histories of successful invasions. An example is Bactrocera dorsalis sensu stricto, the oriental fruit fly which has been recognized as a key pest of Asia and the Pacific. It is known to have the potential to establish adventive populations in various tropical and subtropical areas. Despite the economic risk associated with a putative stable presence of this fly, the genetic aspects of its invasion process have remained relatively unexplored. Using microsatellite markers we have investigated the population structure and genetic variability in 14 geographical populations across the four areas of the actual species range: Far East Asia, South Asia, Southeast Asia and the Pacific Area. Results of clustering and admixture, associated with phylogenetic and migration analyses, were used to evaluate the changes in population genetic structure that this species underwent during its invasion process and establishment in the different areas. The colonization process of this fly is associated with a relatively stable population demographic structure, especially in an unfragmented habitat, rich in intensive cultivation such as in Southeast Asia. In this area, the results suggest a lively demographic history, characterized by evolutionary recent demographic expansions and no recent bottlenecks. Cases of genetic isolation attributable to geographical factors, fragmented habitats and/or fruit trade restrictions were observed in Bangladesh, Myanmar and Hawaii. Regarding the pattern of invasion, the overall genetic profile of the considered populations suggests a western orientated migration route from China to the West.

  14. Inferred Cosmic-Ray Spectrum from Fermi Large Area Telescope γ-Ray Observations of Earth’s Limb

    SciTech Connect

    Ackermann, M.; et al.

    2014-04-17

    Recent accurate measurements of cosmic-ray (CR) species by ATIC-2, CREAM, and PAMELA reveal an unexpected hardening in the proton and He spectra above a few hundred GeV, a gradual softening of the spectra just below a few hundred GeV, and a harder spectrum of He compared to that of protons. These newly-discovered features may offer a clue to the origin of high-energy CRs. We use the Fermi Large Area Telescope observations of the γ-ray emission from the Earth's limb for an indirect measurement of the local spectrum of CR protons in the energy range ~90 GeV-6 TeV (derived from a photon energy range 15 GeV-1 TeV). Our analysis shows that single power law and broken power law spectra fit the data equally well and yield a proton spectrum with index 2.68 ± 0.04 and 2.61 ± 0.08 above ~200 GeV, respectively.

  15. Inferred Cosmic-Ray Spectrum from Fermi Large Area Telescope γ-Ray Observations of Earth's Limb

    NASA Astrophysics Data System (ADS)

    Ackermann, M.; Ajello, M.; Albert, A.; Allafort, A.; Baldini, L.; Barbiellini, G.; Bastieri, D.; Bechtol, K.; Bellazzini, R.; Blandford, R. D.; Bloom, E. D.; Bonamente, E.; Bottacini, E.; Bouvier, A.; Brandt, T. J.; Brigida, M.; Bruel, P.; Buehler, R.; Buson, S.; Caliandro, G. A.; Cameron, R. A.; Caraveo, P. A.; Cecchi, C.; Charles, E.; Chaves, R. C. G.; Chekhtman, A.; Chiang, J.; Chiaro, G.; Ciprini, S.; Claus, R.; Cohen-Tanugi, J.; Conrad, J.; Cutini, S.; Dalton, M.; D'Ammando, F.; de Angelis, A.; de Palma, F.; Dermer, C. D.; Digel, S. W.; Di Venere, L.; do Couto e Silva, E.; Drell, P. S.; Drlica-Wagner, A.; Favuzzi, C.; Fegan, S. J.; Ferrara, E. C.; Focke, W. B.; Franckowiak, A.; Fukazawa, Y.; Funk, S.; Fusco, P.; Gargano, F.; Gasparrini, D.; Germani, S.; Giglietto, N.; Giordano, F.; Giroletti, M.; Glanzman, T.; Godfrey, G.; Gomez-Vargas, G. A.; Grenier, I. A.; Grove, J. E.; Guiriec, S.; Gustafsson, M.; Hadasch, D.; Hanabata, Y.; Harding, A. K.; Hayashida, M.; Hayashi, K.; Hewitt, J. W.; Horan, D.; Hou, X.; Hughes, R. E.; Inoue, Y.; Jackson, M. S.; Jogler, T.; Jóhannesson, G.; Johnson, A. S.; Kamae, T.; Kawano, T.; Knödlseder, J.; Kuss, M.; Lande, J.; Larsson, S.; Latronico, L.; Longo, F.; Loparco, F.; Lovellette, M. N.; Lubrano, P.; Mayer, M.; Mazziotta, M. N.; McEnery, J. E.; Mehault, J.; Michelson, P. F.; Mitthumsiri, W.; Mizuno, T.; Moiseev, A. A.; Monte, C.; Monzani, M. E.; Morselli, A.; Moskalenko, I. V.; Murgia, S.; Nemmen, R.; Nuss, E.; Ohsugi, T.; Okumura, A.; Orienti, M.; Orlando, E.; Ormes, J. F.; Paneque, D.; Panetta, J. H.; Perkins, J. S.; Pesce-Rollins, M.; Piron, F.; Pivato, G.; Porter, T. A.; Rainò, S.; Rando, R.; Razzano, M.; Razzaque, S.; Reimer, A.; Reimer, O.; Ritz, S.; Roth, M.; Schaal, M.; Schulz, A.; Sgrò, C.; Siskind, E. J.; Spandre, G.; Spinelli, P.; Strong, A. W.; Takahashi, H.; Takeuchi, Y.; Thayer, J. G.; Thayer, J. B.; Thompson, D. J.; Tibaldo, L.; Tinivella, M.; Torres, D. F.; Tosti, G.; Troja, E.; Tronconi, V.; Usher, T. L.; Vandenbroucke, J.; Vasileiou, V.; Vianello, G.; Vitale, V.; Werner, M.; Winer, B. L.; Wood, K. S.; Wood, M.; Yang, Z.; Fermi LAT Collaboration

    2014-04-01

    Recent accurate measurements of cosmic-ray (CR) species by ATIC-2, CREAM, and PAMELA reveal an unexpected hardening in the proton and He spectra above a few hundred GeV, a gradual softening of the spectra just below a few hundred GeV, and a harder spectrum of He compared to that of protons. These newly discovered features may offer a clue to the origin of high-energy CRs. We use the Fermi Large Area Telescope observations of the γ-ray emission from Earth's limb for an indirect measurement of the local spectrum of CR protons in the energy range ~90 GeV-6 TeV (derived from a photon energy range 15 GeV-1 TeV). Our analysis shows that single power law and broken power law spectra fit the data equally well and yield a proton spectrum with index 2.68±0.04 and 2.61±0.08 above ~200 GeV, respectively.

  16. Inferred Cosmic-Ray Spectrum from Fermi Large Area Telescope γ -Ray Observations of Earth’s Limb

    DOE PAGES

    Ackermann, M.; Ajello, M.; Albert, A.; ...

    2014-04-17

    Accurate measurements of cosmic-ray (CR) species by ATIC-2, CREAM, and PAMELA recently revealed an unexpected hardening in the proton and He spectra above a few hundred GeV, a gradual softening of the spectra just below a few hundred GeV, and a harder spectrum of He compared to that of protons. These newly discovered features may offer a clue to the origin of high-energy CRs. Here, we use the Fermi Large Area Telescope observations of the γ-ray emission from Earth’s limb for an indirect measurement of the local spectrum of CR protons in the energy range ~90 GeV–6 TeV (derived from a photon energy range 15 GeV–1 TeV). Our analysis shows that single power law and broken power law spectra fit the data equally well and yield a proton spectrum with index 2.68 ± 0.04 and 2.61 ± 0.08 above ~200 GeV, respectively.

  17. Inferred cosmic-ray spectrum from Fermi large area telescope γ-ray observations of Earth's limb.

    PubMed

    Ackermann, M; Ajello, M; Albert, A; Allafort, A; Baldini, L; Barbiellini, G; Bastieri, D; Bechtol, K; Bellazzini, R; Blandford, R D; Bloom, E D; Bonamente, E; Bottacini, E; Bouvier, A; Brandt, T J; Brigida, M; Bruel, P; Buehler, R; Buson, S; Caliandro, G A; Cameron, R A; Caraveo, P A; Cecchi, C; Charles, E; Chaves, R C G; Chekhtman, A; Chiang, J; Chiaro, G; Ciprini, S; Claus, R; Cohen-Tanugi, J; Conrad, J; Cutini, S; Dalton, M; D'Ammando, F; de Angelis, A; de Palma, F; Dermer, C D; Digel, S W; Di Venere, L; do Couto e Silva, E; Drell, P S; Drlica-Wagner, A; Favuzzi, C; Fegan, S J; Ferrara, E C; Focke, W B; Franckowiak, A; Fukazawa, Y; Funk, S; Fusco, P; Gargano, F; Gasparrini, D; Germani, S; Giglietto, N; Giordano, F; Giroletti, M; Glanzman, T; Godfrey, G; Gomez-Vargas, G A; Grenier, I A; Grove, J E; Guiriec, S; Gustafsson, M; Hadasch, D; Hanabata, Y; Harding, A K; Hayashida, M; Hayashi, K; Hewitt, J W; Horan, D; Hou, X; Hughes, R E; Inoue, Y; Jackson, M S; Jogler, T; Jóhannesson, G; Johnson, A S; Kamae, T; Kawano, T; Knödlseder, J; Kuss, M; Lande, J; Larsson, S; Latronico, L; Longo, F; Loparco, F; Lovellette, M N; Lubrano, P; Mayer, M; Mazziotta, M N; McEnery, J E; Mehault, J; Michelson, P F; Mitthumsiri, W; Mizuno, T; Moiseev, A A; Monte, C; Monzani, M E; Morselli, A; Moskalenko, I V; Murgia, S; Nemmen, R; Nuss, E; Ohsugi, T; Okumura, A; Orienti, M; Orlando, E; Ormes, J F; Paneque, D; Panetta, J H; Perkins, J S; Pesce-Rollins, M; Piron, F; Pivato, G; Porter, T A; Rainò, S; Rando, R; Razzano, M; Razzaque, S; Reimer, A; Reimer, O; Ritz, S; Roth, M; Schaal, M; Schulz, A; Sgrò, C; Siskind, E J; Spandre, G; Spinelli, P; Strong, A W; Takahashi, H; Takeuchi, Y; Thayer, J G; Thayer, J B; Thompson, D J; Tibaldo, L; Tinivella, M; Torres, D F; Tosti, G; Troja, E; Tronconi, V; Usher, T L; Vandenbroucke, J; Vasileiou, V; Vianello, G; Vitale, V; Werner, M; Winer, B L; Wood, K S; Wood, M; Yang, Z

    2014-04-18

    Recent accurate measurements of cosmic-ray (CR) species by ATIC-2, CREAM, and PAMELA reveal an unexpected hardening in the proton and He spectra above a few hundred GeV, a gradual softening of the spectra just below a few hundred GeV, and a harder spectrum of He compared to that of protons. These newly discovered features may offer a clue to the origin of high-energy CRs. We use the Fermi Large Area Telescope observations of the γ-ray emission from Earth's limb for an indirect measurement of the local spectrum of CR protons in the energy range ∼90 GeV-6 TeV (derived from a photon energy range 15 GeV-1 TeV). Our analysis shows that single power law and broken power law spectra fit the data equally well and yield a proton spectrum with index 2.68±0.04 and 2.61±0.08 above ∼200 GeV, respectively.

  18. Genetic diversity in two Japanese flounder populations from China seas inferred using microsatellite markers and COI sequences

    NASA Astrophysics Data System (ADS)

    Xu, Dongdong; Li, Sanlei; Lou, Bao; Zhang, Yurong; Zhan, Wei; Shi, Huilai

    2012-07-01

    Japanese flounder is one of the most important commercial species in China; however, information on the genetic background of natural populations in China seas is scarce. The lack of genetic data has hampered fishery management and aquaculture development programs for this species. In the present study, we have analyzed the genetic diversity in natural populations of Japanese flounder sampled from the Yellow Sea (Qingdao population, QD) and East China Sea (Zhoushan population, ZS) using 10 polymorphic microsatellite loci and cytochrome c oxidase subunit I (COI) sequencing data. A total of 68 different alleles were observed over 10 microsatellite loci. The total number of alleles per locus ranged from 2 to 9, and the number of genotypes per locus ranged from 3 to 45. The observed heterozygosity and expected heterozygosity in QD were 0.733 and 0.779, respectively, and in ZS the heterozygosity values were 0.708 and 0.783, respectively. Significant departures from Hardy-Weinberg equilibrium were observed in 7 of the 10 microsatellite loci in each of the two populations. The COI sequencing analysis revealed 25 polymorphic sites and 15 haplotypes in the two populations. The haplotype diversity and nucleotide diversity in the QD population were 0.746±0.0728 and 0.00334±0.00103, respectively, and in the ZS population the genetic diversity values were 0.712±0.0470 and 0.00318±0.00049, respectively. The microsatellite data (FST = 0.0487, P < 0.001) and mitochondrial DNA data (FST = 0.128, P < 0.001) both revealed significant genetic differentiation between the two populations. The information on the genetic variation and differentiation in Japanese flounder obtained in this study could be used to set up suitable guidelines for the management and conservation of this species, as well as for managing artificial selection programs. In future studies, more geographically diverse stocks should be used to obtain a deeper understanding of the population structure of Japanese
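
    The observed and expected heterozygosity values quoted in the abstract above are standard per-locus summaries. The sketch below shows one common way to compute them for a single locus. The genotypes are hypothetical (they are not data from the study), the function name is an illustrative choice, and the expected value is the simple uncorrected form 1 - sum(p_i^2); the abstract does not say which estimator the authors used.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for one diploid locus.

    genotypes: list of (allele_a, allele_b) tuples, one per individual.
    Expected heterozygosity is the uncorrected form 1 - sum(p_i^2);
    small-sample corrections are deliberately left out of this sketch.
    """
    n = len(genotypes)
    # Observed: fraction of individuals carrying two different alleles.
    observed = sum(1 for a, b in genotypes if a != b) / n
    # Expected: based on allele frequencies pooled over all 2n gene copies.
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    total = 2 * n
    expected = 1.0 - sum((c / total) ** 2 for c in allele_counts.values())
    return observed, expected


# Hypothetical microsatellite genotypes (allele sizes in base pairs).
sample = [(150, 154), (150, 150), (154, 158), (150, 154)]
ho, he = heterozygosity(sample)
print(f"Ho = {ho:.3f}, He = {he:.3f}")
```

    A large gap between the two values at a locus is the kind of signal that a Hardy-Weinberg test, such as the one mentioned in the abstract, would then evaluate formally.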

  19. Overcoming the dichotomy between open and isolated populations using genomic data from a large European dataset

    PubMed Central

    Anagnostou, Paolo; Dominici, Valentina; Battaggia, Cinzia; Pagani, Luca; Vilar, Miguel; Wells, R. Spencer; Pettener, Davide; Sarno, Stefania; Boattini, Alessio; Francalacci, Paolo; Colonna, Vincenza; Vona, Giuseppe; Calò, Carla; Destro Bisol, Giovanni; Tofanelli, Sergio

    2017-01-01

    Human populations are often dichotomized into “isolated” and “open” categories using cultural and/or geographical barriers to gene flow as differential criteria. Although widespread, the use of these alternative categories could obscure further heterogeneity due to inter-population differences in effective size, growth rate, and timing or amount of gene flow. We compared intra and inter-population variation measures combining novel and literature data relative to 87,818 autosomal SNPs in 14 open populations and 10 geographic and/or linguistic European isolates. Patterns of intra-population diversity were found to vary considerably more among isolates, probably due to differential levels of drift and inbreeding. The relatively large effective size estimated for some population isolates challenges the generalized view that they originate from small founding groups. Principal component scores based on measures of intra-population variation of isolated and open populations were found to be distributed along a continuum, with an area of intersection between the two groups. Patterns of inter-population diversity were even closer, as we were able to detect some differences between population groups only for a few multidimensional scaling dimensions. Therefore, different lines of evidence suggest that dichotomizing human populations into open and isolated groups fails to capture the actual relations among their genomic features. PMID:28145502

  20. Overcoming the dichotomy between open and isolated populations using genomic data from a large European dataset.

    PubMed

    Anagnostou, Paolo; Dominici, Valentina; Battaggia, Cinzia; Pagani, Luca; Vilar, Miguel; Wells, R Spencer; Pettener, Davide; Sarno, Stefania; Boattini, Alessio; Francalacci, Paolo; Colonna, Vincenza; Vona, Giuseppe; Calò, Carla; Destro Bisol, Giovanni; Tofanelli, Sergio

    2017-02-01

    Human populations are often dichotomized into "isolated" and "open" categories using cultural and/or geographical barriers to gene flow as differential criteria. Although widespread, the use of these alternative categories could obscure further heterogeneity due to inter-population differences in effective size, growth rate, and timing or amount of gene flow. We compared intra and inter-population variation measures combining novel and literature data relative to 87,818 autosomal SNPs in 14 open populations and 10 geographic and/or linguistic European isolates. Patterns of intra-population diversity were found to vary considerably more among isolates, probably due to differential levels of drift and inbreeding. The relatively large effective size estimated for some population isolates challenges the generalized view that they originate from small founding groups. Principal component scores based on measures of intra-population variation of isolated and open populations were found to be distributed along a continuum, with an area of intersection between the two groups. Patterns of inter-population diversity were even closer, as we were able to detect some differences between population groups only for a few multidimensional scaling dimensions. Therefore, different lines of evidence suggest that dichotomizing human populations into open and isolated groups fails to capture the actual relations among their genomic features.

  1. Rapid declines of large mammal populations after the collapse of the Soviet Union.

    PubMed

    Bragina, Eugenia V; Ives, A R; Pidgeon, A M; Kuemmerle, T; Baskin, L M; Gubar, Y P; Piquer-Rodríguez, M; Keuler, N S; Petrosyan, V G; Radeloff, V C

    2015-06-01

    Anecdotal evidence suggests that socioeconomic shocks strongly affect wildlife populations, but quantitative evidence is sparse. The collapse of socialism in Russia in 1991 caused a major socioeconomic shock, including a sharp increase in poverty. We analyzed population trends of 8 large mammals in Russia from 1981 to 2010 (i.e., before and after the collapse). We hypothesized that the collapse would first cause population declines, primarily due to overexploitation, and then population increases due to adaptation of wildlife to new environments following the collapse. The long-term Database of the Russian Federal Agency of Game Mammal Monitoring, consisting of up to 50,000 transects that are monitored annually, provided an exceptional data set for investigating these population trends. Three species showed strong declines in population growth rates in the decade following the collapse, while grey wolf (Canis lupus) increased by more than 150%. After 2000 some trends reversed. For example, roe deer (Capreolus spp.) abundance in 2010 was the highest of any period in our study. Likely reasons for the population declines in the 1990s include poaching and the erosion of wildlife protection enforcement. The rapid increase of the grey wolf populations is likely due to the cessation of governmental population control. In general, the widespread declines in wildlife populations after the collapse of the Soviet Union highlight the magnitude of the effects that socioeconomic shocks can have on wildlife populations and the possible need for special conservation efforts during such times. © 2015 Society for Conservation Biology.

  2. Population genetic structure in farm and feral American mink (Neovison vison) inferred from RAD sequencing-generated single nucleotide polymorphisms.

    PubMed

    Thirstrup, J P; Ruiz-Gonzalez, A; Pujolar, J M; Larsen, P F; Jensen, J; Randi, E; Zalewski, A; Pertoldi, C

    2015-08-01

    Feral American mink populations (Neovison vison), derived from mink farms, are widespread in Europe. In this study we investigated genetic diversity and genetic differentiation between feral and farm mink using a panel of genetic markers (194 SNP) generated from RAD sequencing data. Sampling included a total of 211 individuals from 14 populations, 4 feral and 10 from farms, the latter including a total of 7 color types (Brown, Black, Mahogany, Sapphire, White, Pearl, and Silver). Our study revealed similar low levels of genetic diversity in both farm and feral mink. Results are consistent with small effective population size as a consequence of line selection in the farms and founder effects of a few escapees from the farms in feral populations. Moderately high genetic differentiation was found between farm and feral animals, suggesting a scenario in which wild populations were founded from farm escapes a few decades ago. Currently, escapes and gene flow are probably limited. Genetic differentiation was higher among farm color types than among farms, consistent with line selection using few individuals to create the lines. Finally, no indications of inbreeding were found in either farm or feral samples, with significant negative values found in most farm samples, showing farms are successful in avoiding inbreeding.

  3. Long-term landscape stability in southern Tibet inferred from the preservation of a large-scale bedrock peneplain

    NASA Astrophysics Data System (ADS)

    Strobl, M.; Hetzel, R.; Ding, L.; Zhang, L.

    2010-12-01

    Peneplains constitute a widespread and well-developed geomorphic element on the Tibetan Plateau; nevertheless, little is known about their formation and the subsequent landscape evolution. In southern Tibet, north of Nam Co (~31°20’N, 90°E), a particularly well-preserved peneplain occurs at an elevation of ~5,300 m in Cretaceous granitoids. The main planation surface has been gradually incised by small streams that formed additional small low-relief surfaces at lower elevations. Fluvial incision of the main peneplain has generated a local relief of up to ~700 m. The progressive incision has led to hillslope gradients that increase with decreasing elevation, i.e. from the main peneplain at ~5,300 m down to the current base level at ~4,600 m, as revealed by field observations and the analysis of a digital elevation model (Strobl et al. in press). In order to quantify the landscape evolution of the peneplain region we determined local and catchment-wide erosion rates from the concentration of in situ-produced cosmogenic Be-10. Local erosion rates on the main peneplain and the low-relief bedrock surfaces at lower elevation range from 6 to 12 m/Ma and indicate that the geomorphic surfaces are stable over long periods of time. Spatially integrated erosion rates of small river systems that are incising and eroding headwards into the main peneplain are only slightly higher and range from 11 to 18 m/Ma. Even if river incision has proceeded at a rate that is 2-4 times higher than the catchment-wide erosion rates, i.e. at 30 to 60 m/Ma, it would take about 10 to 20 Ma to generate the local relief of ~700 m observed today. This demonstrates that the major peneplain is a very stable geomorphic element with a minimum age of 10 to 20 Ma and that the landscape in the region has barely been modified by erosion in the last several million years. Strobl, M., Hetzel, R., Ding, L., Zhang, L., Hampel, A., (in press). Preservation of a large-scale bedrock peneplain suggests long
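
    The age argument in the abstract above is a single division: the time needed to cut a given relief at a constant incision rate. The short sketch below reproduces that arithmetic with the relief (~700 m) and the assumed incision rates (30 to 60 m/Ma) taken from the abstract. The function name is an illustrative choice, and a constant rate is a simplifying assumption.

```python
def time_to_incise_ma(relief_m, incision_rate_m_per_ma):
    """Time (in Ma) to generate a given relief at a constant incision rate."""
    return relief_m / incision_rate_m_per_ma


relief_m = 700.0  # local relief quoted in the abstract
for rate in (60.0, 30.0):  # upper and lower incision rates, m/Ma
    duration = time_to_incise_ma(relief_m, rate)
    print(f"{rate:.0f} m/Ma -> {duration:.1f} Ma")
```

    The two rates give roughly 12 and 23 Ma, which the abstract rounds to "about 10 to 20 Ma".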

  4. Genetic evidence for long-term population decline in a savannah-dwelling primate: inferences from a hierarchical bayesian model.

    PubMed

    Storz, Jay F; Beaumont, Mark A; Alberts, Susan C

    2002-11-01

    The purpose of this study was to test for evidence that savannah baboons (Papio cynocephalus) underwent a population expansion in concert with a hypothesized expansion of African human and chimpanzee populations during the late Pleistocene. The rationale is that any type of environmental event sufficient to cause simultaneous population expansions in African humans and chimpanzees would also be expected to affect other codistributed mammals. To test for genetic evidence of population expansion or contraction, we performed a coalescent analysis of multilocus microsatellite data using a hierarchical Bayesian model. Markov chain Monte Carlo (MCMC) simulations were used to estimate the posterior probability density of demographic and genealogical parameters. The model was designed to allow interlocus variation in mutational and demographic parameters, which made it possible to detect aberrant patterns of variation at individual loci that could result from heterogeneity in mutational dynamics or from the effects of selection at linked sites. Results of the MCMC simulations were consistent with zero variance in demographic parameters among loci, but there was evidence for a 10- to 20-fold difference in mutation rate between the most slowly and most rapidly evolving loci. Results of the model provided strong evidence that savannah baboons have undergone a long-term historical decline in population size. The mode of the highest posterior density for the joint distribution of current and ancestral population size indicated a roughly eightfold contraction over the past 1,000 to 250,000 years. These results indicate that savannah baboons apparently did not share a common demographic history with other codistributed primate species.

  5. Colloquium paper: working toward a synthesis of archaeological, linguistic, and genetic data for inferring African population history.

    PubMed

    Scheinfeldt, Laura B; Soi, Sameer; Tishkoff, Sarah A

    2010-05-11

    Although Africa is the origin of modern humans, the pattern and distribution of genetic variation and correlations with cultural and linguistic diversity in Africa have been understudied. Recent advances in genomic technology, however, have led to genomewide studies of African samples. In this article, we discuss genetic variation in African populations contextualized with what is known about archaeological and linguistic variation. What emerges from this review is the importance of using independent lines of evidence in the interpretation of genetic and genomic data in the reconstruction of past population histories.

  6. Genetic variation of the greenhouse whitefly, Trialeurodes vaporariorum (Hemiptera: Aleyrodidae), among populations from Serbia and neighbouring countries, as inferred from COI sequence variability.

    PubMed

    Prijović, M; Skaljac, M; Drobnjaković, T; Zanić, K; Perić, P; Marčić, D; Puizina, J

    2014-06-01

    The greenhouse whitefly Trialeurodes vaporariorum Westwood, 1856 (Hemiptera: Aleyrodidae) is an invasive and highly polyphagous phloem-feeding pest of vegetables and ornamentals. Trialeurodes vaporariorum causes serious damage due to direct feeding and transmits several important plant viruses. Excessive use of insecticides has resulted in significantly reduced levels of susceptibility of various T. vaporariorum populations. To determine the genetic variability within and among populations of T. vaporariorum from Serbia and to explore their genetic relatedness with other T. vaporariorum populations, we analysed the mitochondrial cytochrome c oxidase I (COI) sequences of 16 populations from Serbia and six neighbouring countries: Montenegro (three populations), Macedonia (one population) and Croatia (two populations), for a total of 198 analysed specimens. A low overall level of sequence divergence and only five variable nucleotides and six haplotypes were found. The most frequent haplotype, H1, was identified in all Serbian populations and in all specimens from distant localities in Croatia and Macedonia. The COI sequence data that was retrieved from GenBank and the data from our study indicated that H1 is the most globally widespread T. vaporariorum haplotype. A lack of spatial genetic structure among the studied T. vaporariorum populations, as well as two demographic tests that we performed (Tajima's D value and Fu's Fs statistics), indicate a recent colonisation event and population growth. Phylogenetic analyses of the COI haplotypes in this study and other T. vaporariorum haplotypes that were retrieved from GenBank were performed using Bayesian inference and median-joining (MJ) network analysis. Two major haplogroups with only a single unique nucleotide difference were found: haplogroup 1 (containing the five Serbian haplotypes and those previously identified in India, China, the Netherlands, the United Kingdom, Morocco, Reunion and the USA) and haplogroup 3

  7. Population-genomic variation within RNA viruses of the Western honey bee, Apis mellifera, inferred from deep sequencing

    USDA-ARS's Scientific Manuscript database

    Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RN...

  8. Inference on the Genetic Basis of Eye and Skin Color in an Admixed Population via Bayesian Linear Mixed Models.

    PubMed

    Lloyd-Jones, Luke R; Robinson, Matthew R; Moser, Gerhard; Zeng, Jian; Beleza, Sandra; Barsh, Gregory S; Tang, Hua; Visscher, Peter M

    2017-06-01

    Genetic association studies in admixed populations are underrepresented in the genomics literature, with a key concern for researchers being the adequate control of spurious associations due to population structure. Linear mixed models (LMMs) are well suited for genome-wide association studies (GWAS) because they account for both population stratification and cryptic relatedness and achieve increased statistical power by jointly modeling all genotyped markers. Additionally, Bayesian LMMs allow for more flexible assumptions about the underlying distribution of genetic effects, and can concurrently estimate the proportion of phenotypic variance explained by genetic markers. Using three recently published Bayesian LMMs, Bayes R, BSLMM, and BOLT-LMM, we investigate an existing data set on eye (n = 625) and skin (n = 684) color from Cape Verde, an island nation off West Africa that is home to individuals with a broad range of phenotypic values for eye and skin color due to the mix of West African and European ancestry. We use simulations to demonstrate the utility of Bayesian LMMs for mapping loci and studying the genetic architecture of quantitative traits in admixed populations. The Bayesian LMMs provide evidence for two new pigmentation loci: one for eye color (AHRR) and one for skin color (DDB1). Copyright © 2017 by the Genetics Society of America.

  9. Fine-scale genetic structure and inferences on population biology in the threatened Mediterranean red coral, Corallium rubrum.

    PubMed

    Ledoux, J-B; Garrabou, J; Bianchimani, O; Drap, P; Féral, J-P; Aurelle, D

    2010-10-01

    Identifying microevolutionary processes acting in populations of marine species with larval dispersal is a challenging but crucial task because of its conservation implications. In this context, recent improvements in the study of spatial genetic structure (SGS) are particularly promising because they allow accurate insights into the demographic and evolutionary processes at stake. Using an exhaustive sampling and a combination of image processing and population genetics, we highlighted significant SGS between colonies of Corallium rubrum over an area of half a square metre, which sheds light on a number of aspects of its population biology. Based on this SGS, we found the mean dispersal range within sites to be between 22.6 and 32.1 cm, suggesting that the surveyed area approximately corresponded to a breeding unit. We then conducted a kinship analysis, which revealed a complex half-sib family structure and allowed us to quantify the level of self-recruitment and to characterize aspects of the mating system of this species. Furthermore, significant temporal variations in allele frequencies were observed, suggesting low genetic drift. These results have important conservation implications for the red coral and further our understanding of the microevolutionary processes acting within populations of sessile marine species with a larval phase.

  10. Rangewide phylogeography in the greater horseshoe bat inferred from microsatellites: implications for population history, taxonomy and conservation.

    PubMed

    Rossiter, Stephen J; Benda, Petr; Dietz, Christian; Zhang, Shuyi; Jones, Gareth

    2007-11-01

    The distribution of genetic variability across a species' range can provide valuable insights into colonization history. To assess the relative importance of European and Asian refugia in shaping current levels of genetic variation in the greater horseshoe bats, we applied a microsatellite-based approach to data collected from 56 localities ranging from the UK to Japan. A decline in allelic richness from west Asia to the UK and analyses of F(ST) both imply a northwestward colonization across Europe. However, sharp discontinuities in gene frequencies within Europe and between the Balkans and west Asia (Syria/Russia) are consistent with suture zones following expansion from multiple refugia, and a lack of recent gene flow from Asia Minor. Together, these results suggest European populations originated from west Asia in the ancient past, and experienced a more recent range expansion since the Last Glacial Maximum. Current populations in central Europe appear to originate from the Balkans and those from west Europe from Iberia and/or Italy. Comparisons of R(ST) and F(ST) suggest that stepwise mutation has contributed to differentiation between island and continental populations (France/UK and China/Japan) and also among distant samples. However, pairwise R(ST) values between distant populations appear to be unreliable, probably due to size homoplasy. Our findings also highlight two priorities for conservation. First, stronger genetic subdivision within the UK than across 4000 km of continental Eurasia is most likely the result of population fragmentation and highlights the need to maintain gene flow in this species. Second, deep splits within China and between Europe and China are indicative of cryptic taxonomic divisions which need further investigation.

  11. Large-scale control site selection for population monitoring: an example assessing Sage-grouse trends

    USGS Publications Warehouse

    Fedy, Bradley C.; O'Donnell, Michael; Bowen, Zachary H.

    2015-01-01

    Human impacts on wildlife populations are widespread and prolific and understanding wildlife responses to human impacts is a fundamental component of wildlife management. The first step to understanding wildlife responses is the documentation of changes in wildlife population parameters, such as population size. Meaningful assessment of population changes in potentially impacted sites requires the establishment of monitoring at similar, nonimpacted, control sites. However, it is often difficult to identify appropriate control sites in wildlife populations. We demonstrated use of Geographic Information System (GIS) data across large spatial scales to select biologically relevant control sites for population monitoring. Greater sage-grouse (Centrocercus urophasianus; hereafter, sage-grouse) are negatively affected by energy development, and monitoring of sage-grouse populations within energy development areas is necessary to detect population-level responses. We used population data (1995–2012) from an energy development area in Wyoming, USA, the Atlantic Rim Project Area (ARPA), and GIS data to identify control sites that were not impacted by energy development for population monitoring. Control sites were surrounded by similar habitat and were within similar climate areas to the ARPA. We developed nonlinear trend models for both the ARPA and control sites and compared long-term trends from the 2 areas. We found little difference between the ARPA and control sites trends over time. This research demonstrated an approach for control site selection across large landscapes and can be used as a template for similar impact-monitoring studies. It is important to note that identification of changes in population parameters between control and treatment sites is only the first step in understanding the mechanisms that underlie those changes. Published 2015. This article is a U.S. Government work and is in the public domain in the USA.

  12. Challenges of cardiac image analysis in large-scale population-based studies.

    PubMed

    Medrano-Gracia, Pau; Cowan, Brett R; Suinesiaputra, Avan; Young, Alistair A

    2015-03-01

    Large-scale population-based imaging studies of preclinical and clinical heart disease are becoming possible due to the advent of standardized robust non-invasive imaging methods and infrastructure for big data analysis. This gives an exciting opportunity to gain new information about the development and progression of heart disease across population groups. However, the large amount of image data and prohibitive time required for image analysis present challenges for obtaining useful derived data from the images. Automated analysis tools for cardiac image analysis are only now becoming available. This paper reviews the challenges and possible solutions to the analysis of big imaging data in population studies. We also highlight the potential of recent large epidemiological studies using cardiac imaging to discover new knowledge on heart health and well-being.

  13. Genetic population structure and mobility of two nectar-feeding bats from Venezuelan deserts: inferences from mitochondrial DNA.

    PubMed

    Newton, Lyndsay R; Nassar, Jafet M; Fleming, Theodore H

    2003-11-01

    Glossophaga longirostris and Leptonycteris curasoae are nectar-feeding bats associated with arid zones in northern South America. Despite their close phylogenetic relationship, sympatric condition and niche similarities, morphological and ecological evidence suggest that these species differ in dispersal capabilities. Using mitochondrial DNA, we tested the hypothesis that these species exhibit different levels of population structure that are congruent with their particular movement capabilities. We sequenced a section of the control region of mtDNA for 41 G. longirostris and 42 L. curasoae from 11 zones in Venezuela. Population subdivision in G. longirostris (FST = 0.725) was considerably higher than in L. curasoae (FST = 0.167). L. curasoae individuals shared haplotypes at greater distances (812 km) than G. longirostris (592 km). Our results offer preliminary evidence for one of two possible scenarios, either greater mobility in L. curasoae or a higher degree of female philopatry in G. longirostris.
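
    The FST values compared in the abstract above measure how much of the total genetic diversity lies between populations rather than within them. As a rough illustration, the sketch below computes a simple two-population FST of the form (HT - HS) / HT from haplotype frequencies, with equal weight on each population. The frequencies and function names are hypothetical. This is a simplified Nei-style estimator chosen for illustration, since the abstract does not state which estimator the authors used.

```python
def gene_diversity(freqs):
    """Gene diversity H = 1 - sum(p_i^2) for one set of haplotype frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def fst_two_populations(freqs_a, freqs_b):
    """Simple FST = (HT - HS) / HT for two equally weighted populations.

    freqs_a, freqs_b: frequencies of the same haplotypes, in the same order.
    """
    # HS: mean within-population diversity.
    hs = (gene_diversity(freqs_a) + gene_diversity(freqs_b)) / 2.0
    # HT: diversity of the pooled (averaged) haplotype frequencies.
    pooled = [(pa + pb) / 2.0 for pa, pb in zip(freqs_a, freqs_b)]
    ht = gene_diversity(pooled)
    return (ht - hs) / ht if ht > 0 else 0.0


# Hypothetical haplotype frequencies, not data from the study.
strongly_structured = fst_two_populations([0.9, 0.1], [0.1, 0.9])
unstructured = fst_two_populations([0.5, 0.5], [0.5, 0.5])
print(f"{strongly_structured:.2f} {unstructured:.2f}")
```

    Populations that rarely exchange migrants drift toward different haplotype frequencies and give values near the first case. Frequent exchange keeps the frequencies similar and pushes the value toward zero, which is the logic behind reading the lower FST in L. curasoae as a sign of greater mobility.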

  14. Major histocompatibility complex variation in insular populations of the Egyptian vulture: inferences about the roles of genetic drift and selection.

    PubMed

    Agudo, Rosa; Alcaide, Miguel; Rico, Ciro; Lemus, Jesus A; Blanco, Guillermo; Hiraldo, Fernando; Donázar, Jose A

    2011-06-01

    Insular populations have attracted the attention of evolutionary biologists because of their morphological and ecological peculiarities with respect to their mainland counterparts. Founder effects and genetic drift are known to distribute neutral genetic variability in these demes. However, elucidating whether these evolutionary forces have also shaped adaptive variation is crucial to evaluate the real impact of reduced genetic variation in small populations. Genes of the major histocompatibility complex (MHC) are classical examples of evolutionarily relevant loci because of their well-known role in pathogen confrontation and clearance. In this study, we aim to disentangle the partial roles of genetic drift and natural selection in the spatial distribution of MHC variation in insular populations. To this end, we integrate the study of neutral (22 microsatellites and one mtDNA locus) and MHC class II variation in one mainland (Iberia) and two insular populations (Fuerteventura and Menorca) of the endangered Egyptian vulture (Neophron percnopterus). Overall, the distribution of the frequencies of individual MHC alleles (n=17 alleles from two class II B loci) does not significantly depart from neutral expectations, which indicates a prominent role for genetic drift over selection. However, our results point towards an interesting co-evolution of gene duplicates that maintains different pairs of divergent alleles in strong linkage disequilibrium on islands. We hypothesize that the co-evolution of genes may counteract the loss of genetic diversity in insular demes, maximize antigen recognition capabilities when gene diversity is reduced, and promote the co-segregation of the most efficient allele combinations to cope with local pathogen communities.

  15. Inferring local adaptation from QST-FST comparisons: neutral genetic and quantitative trait variation in European populations of great snipe.

    PubMed

    Saether, S A; Fiske, P; Kålås, J A; Kuresoo, A; Luigujõe, L; Piertney, S B; Sahlman, T; Höglund, J

    2007-07-01

    We applied a phenotypic QST (PST) vs. FST approach to study spatial variation in selection among great snipe (Gallinago media) populations in two regions of northern Europe. Morphological divergence between regions was high despite low differentiation in selectively neutral genetic markers, whereas populations within regions showed very little neutral divergence and trait differentiation. QST > FST was robust against altering assumptions about the additive genetic proportions of variance components. The homogenizing effect of gene flow (or a short time available for neutral divergence) has apparently been effectively counterbalanced by differential natural selection, although one trait showed some evidence of being under uniform stabilizing selection. Neutral markers can hence be misleading for identifying evolutionarily significant units, and adopting the PST-FST approach might therefore be valuable when common garden experiments are not an option. We discuss the statistical difficulties of documenting uniform selection as opposed to divergent selection, and the need for estimating measurement error. Instead of only comparing overall QST and FST values, we advocate the use of partial matrix permutation tests to analyse pairwise QST differences among populations, while statistically controlling for neutral differentiation.

  16. Contrasting patterns of population subdivision and historical demography in three western Mediterranean lizard species inferred from mitochondrial DNA variation.

    PubMed

    Pinho, C; Harris, D J; Ferrand, N

    2007-03-01

    Pleistocene climatic oscillations were a major force shaping genetic variability in many taxa. We analyse the relative effects of the ice ages across a latitudinal gradient in the Western Mediterranean region, testing two main predictions: (i) species with historical distributions in northern latitudes should have experienced greater loss of suitable habitat, resulting in higher extinction of historical lineages than species distributed in southern latitudes, where the effects of the ice ages were not as drastic. This would be reflected in the observation of lower diversity and number of differentiated lineages in northern areas. (ii) a signature of demographic expansion following the climate amelioration should be obvious in northern species, whereas in the south evidence of long-term effective population size stability should be observed. We used as models three species of wall lizards (Podarcis bocagei, Podarcis carbonelli and Podarcis vaucheri) that replace each other along the study area. We investigated the patterns of mitochondrial DNA diversity and subdivision and obtained demographic parameter estimates for each species. Our results suggest that P. bocagei, the northernmost species, bears low genetic diversity, a shallow coalescent history and marks of a demographic expansion. In contrast, P. vaucheri, the species with a southernmost distribution, shows deeper coalescence events, complex geographical substructure and no evidence for population growth. The species with an intermediate distribution, P. carbonelli, shows average levels of diversity, substructure and population growth. Taken together, these results conform to our main predictions and are explained by a differential influence of the ice ages on distinct latitudes.

  17. Forensic potential of the STR DXYS156 in Mexican populations: inference of X-linked allele null.

    PubMed

    Torres-Rodríguez, M; Martínez-Cortes, G; Páez-Riberos, L A; Sandoval, L; Muñoz-Valle, J F; Ceballos-Quintal, J M; Pinto-Escalante, D; Rangel-Villalobos, H

    2006-01-01

    The pentanucleotide STR (TAAAA)n DXYS156 offers advantages for genetic identity testing. In addition to establishing gender, DXYS156 expands the DNA profile and can indicate the possible geographic origin of the individual. We analyzed DXYS156 in 757 individuals of both sexes from Mexican populations. We studied the cosmopolitan Mestizo population and six Mexican ethnic groups: Tarahumaras, Purépechas, Nahuas, Mayas, Huicholes and Mezcala Indians. The six shorter (4-10) and the three larger alleles (11-13) were specific for the X and Y-chromosomes, respectively. A random distribution of alleles into genotypes was observed in males and females from each population. We estimated the power of exclusion for paternity testing according to the son's gender, and the power of discrimination in forensic casework. In addition, we detected a relatively high frequency of an X-linked null allele, principally in Mexican-Mestizos (3.6%), which must be considered when DXYS156 is applied for identification purposes.

  18. Complex postglacial recolonization inferred from population genetic structure of mottled sculpin Cottus bairdii in tributaries of eastern Lake Michigan, U.S.A.

    PubMed

    Homola, J J; Ruetz, C R; Kohler, S L; Thum, R A

    2016-11-01

    This study used analyses of the genetic structure of a non-game fish species, the mottled sculpin Cottus bairdii, to hypothesize probable recolonization routes used by cottids and possibly other Laurentian Great Lakes fishes following glacial recession. Based on samples from 16 small streams in five major Lake Michigan, U.S.A., tributary basins, significant interpopulation differentiation was documented (overall FST = 0·235). Differentiation was complex, however, with unexpectedly high genetic similarity among basins as well as occasionally strong differentiation within basins, despite relatively close geographic proximity of populations. Genetic dissimilarities were identified between eastern and western populations within river basins, with similarities existing between eastern and western populations across basins. Given such patterns, recolonization is hypothesized to have occurred on three occasions from more than one glacial refugium, with a secondary vicariant event resulting from reduction in the water level of ancestral Lake Michigan. By studying the phylogeography of a small, non-game fish species, this study provides insight into recolonization dynamics of the region that could be difficult to infer from game species that are often broadly dispersed by humans.

  19. Drawing the history of the Hutterite population on a genetic landscape: inference from Y-chromosome and mtDNA genotypes

    PubMed Central

    Pichler, Irene; Fuchsberger, Christian; Platzer, Christa; Çalişkan, Minal; Marroni, Fabio; Pramstaller, Peter P; Ober, Carole

    2010-01-01

    Although the North American Hutterites trace their origins to South Tyrol, no attempts have been made to examine the genetic migration history of the Hutterites before emigrating to the United States in the 1870s. To investigate this, we stu