Science.gov

Sample records for inferring genomic structural

  1. Structure-based inference of molecular functions of proteins of unknown function from Berkeley Structural Genomics Center

    SciTech Connect

    Shin, Dong Hae; Hou, Jingtong; Chandonia, John-Marc; Das, Debanu; Choi, In-Geol; Kim, Rosalind; Kim, Sung-Hou

    2007-09-02

    Advances in sequence genomics have resulted in an accumulation of a huge number of protein sequences derived from genome sequences. However, the functions of a large portion of them cannot be inferred based on the current methods of sequence homology detection to proteins of known functions. Three-dimensional structure can have an important impact in providing inference of molecular function (physical and chemical function) of a protein of unknown function. Structural genomics centers worldwide have been determining many 3-D structures of the proteins of unknown functions, and possible molecular functions of them have been inferred based on their structures. Combined with bioinformatics and enzymatic assay tools, the successful acceleration of the process of protein structure determination through high throughput pipelines enables the rapid functional annotation of a large fraction of hypothetical proteins. We present a brief summary of the process we used at the Berkeley Structural Genomics Center to infer molecular functions of proteins of unknown function.

  2. Structure-based inference of molecular functions of proteins of unknown function from Berkeley Structural Genomics Center.

    PubMed

    Shin, Dong Hae; Hou, Jingtong; Chandonia, John-Marc; Das, Debanu; Choi, In-Geol; Kim, Rosalind; Kim, Sung-Hou

    2007-09-01

    Advances in sequence genomics have resulted in an accumulation of a huge number of protein sequences derived from genome sequences. However, the functions of a large portion of them cannot be inferred based on the current methods of sequence homology detection to proteins of known functions. Three-dimensional structure can have an important impact in providing inference of molecular function (physical and chemical function) of a protein of unknown function. Structural genomics centers worldwide have been determining many 3-D structures of the proteins of unknown functions, and possible molecular functions of them have been inferred based on their structures. Combined with bioinformatics and enzymatic assay tools, the successful acceleration of the process of protein structure determination through high throughput pipelines enables the rapid functional annotation of a large fraction of hypothetical proteins. We present a brief summary of the process we used at the Berkeley Structural Genomics Center to infer molecular functions of proteins of unknown function.

  3. Gene network inference via structural equation modeling in genetical genomics experiments.

    PubMed

    Liu, Bing; de la Fuente, Alberto; Hoeschele, Ina

    2008-03-01

    Our goal is gene network inference in genetical genomics or systems genetics experiments. For species where sequence information is available, we first perform expression quantitative trait locus (eQTL) mapping by jointly utilizing cis-, cis-trans-, and trans-regulation. After using local structural models to identify regulator-target pairs for each eQTL, we construct an encompassing directed network (EDN) by assembling all retained regulator-target relationships. The EDN has nodes corresponding to expressed genes and eQTL and directed edges from eQTL to cis-regulated target genes, from cis-regulated genes to cis-trans-regulated target genes, from trans-regulator genes to target genes, and from trans-eQTL to target genes. For network inference within the strongly constrained search space defined by the EDN, we propose structural equation modeling (SEM), because it can model cyclic networks and the EDN indeed contains feedback relationships. On the basis of a factorization of the likelihood and the constrained search space, our SEM algorithm infers networks involving several hundred genes and eQTL. Structure inference is based on a penalized likelihood ratio and an adaptation of Occam's window model selection. The SEM algorithm was evaluated using data simulated with nonlinear ordinary differential equations and known cyclic network topologies and was applied to a real yeast data set.
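
    The assembly of regulator-target pairs into an encompassing directed network, and the check for feedback relationships that motivates SEM, can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the function names and the node labels (eQTL1, geneA, geneB) are hypothetical, and the eQTL mapping and local structural models that produce the pairs are assumed to have run already.

```python
def build_edn(pairs):
    """Assemble retained (regulator, target) pairs into a directed graph.

    Nodes are eQTL or gene labels; the graph maps each node to the set of
    its targets. Every node appears as a key, even if it has no targets.
    """
    graph = {}
    for regulator, target in pairs:
        graph.setdefault(regulator, set()).add(target)
        graph.setdefault(target, set())
    return graph


def has_cycle(graph):
    """Return True if the directed graph contains a cycle (a feedback loop)."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GREY
        for nxt in graph[node]:
            if color[nxt] == GREY:  # back edge: nxt is on the current path
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in graph)


# Hypothetical example: one eQTL, two genes regulating each other.
pairs = [("eQTL1", "geneA"), ("geneA", "geneB"), ("geneB", "geneA")]
edn = build_edn(pairs)
print(has_cycle(edn))  # True
```

    A cyclic result like this one is the case that acyclic models such as Bayesian networks cannot represent directly, which is the reason the abstract gives for choosing SEM.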

  4. Inferring gene structures in genomic sequences using pattern recognition and expressed sequence tags.

    PubMed

    Xu, Y; Mural, R J; Uberbacher, E C

    1997-01-01

    Computational methods for gene identification in genomic sequences typically have two phases: coding region prediction and gene parsing. While there are many effective methods for predicting coding regions (exons), parsing the predicted exons into proper gene structures, to a large extent, remains an unsolved problem. This paper presents an algorithm for inferring gene structures from predicted exon candidates, based on Expressed Sequence Tags (ESTs) and biological intuition/rules. The algorithm first finds all the related ESTs in the EST database (dbEST) for each predicted exon, and infers the boundaries of one or a series of genes based on the available EST information and biological rules. Then it constructs gene models within each pair of gene boundaries, that are most consistent with the EST information. By exploiting EST information and biological rules, the algorithm can (1) model complicated multiple gene structures, including embedded genes, (2) identify falsely-predicted exons and locate missed exons, and (3) make more accurate exon boundary predictions. The algorithm has been implemented and tested on long genomic sequences with a number of genes. Test results show that very accurate (predicted) gene models can be expected when related ESTs exist for the predicted exons.

  5. Inferring gene structures in genomic sequences using pattern recognition and expressed sequence tags

    SciTech Connect

    Xu, Y.; Mural, R.; Uberbacher, E.

    1997-02-01

    Computational methods for gene identification in genomic sequences typically have two phases: coding region prediction and gene parsing. While there are many effective methods for predicting coding regions (exons), parsing the predicted exons into proper gene structures, to a large extent, remains an unsolved problem. This paper presents an algorithm for inferring gene structures from predicted exon candidates, based on Expressed Sequence Tags (ESTs) and biological intuition/rules. The algorithm first finds all the related ESTs in the EST database (dbEST) for each predicted exon, and infers the boundaries of one or a series of genes based on the available EST information and biological rules. Then it constructs gene models within each pair of gene boundaries, that are most consistent with the EST information. By exploiting EST information and biological rules, the algorithm can (1) model complicated multiple gene structures, including embedded genes, (2) identify falsely-predicted exons and locate missed exons, and (3) make more accurate exon boundary predictions. The algorithm has been implemented and tested on long genomic sequences with a number of genes. Test results show that very accurate (predicted) gene models can be expected when related ESTs exist for the predicted exons.

  6. Adaptive evolution of chloroplast genome structure inferred using a parametric bootstrap approach

    PubMed Central

    Cui, Liying; Leebens-Mack, Jim; Wang, Li-San; Tang, Jijun; Rymarquis, Linda; Stern, David B; dePamphilis, Claude W

    2006-01-01

    Background: Genome rearrangements influence gene order and configuration of gene clusters in all genomes. Most land plant chloroplast DNAs (cpDNAs) share a highly conserved gene content and, with notable exceptions, a largely co-linear gene order. Conserved gene orders may reflect a slow intrinsic rate of neutral chromosomal rearrangements, or selective constraint. It is unknown to what extent observed changes in gene order are random or adaptive. We investigate the influence of natural selection on gene order in association with increased rate of chromosomal rearrangement. We use a novel parametric bootstrap approach to test if directional selection is responsible for the clustering of functionally related genes observed in the highly rearranged chloroplast genome of the unicellular green alga Chlamydomonas reinhardtii, relative to ancestral chloroplast genomes. Results: Ancestral gene orders were inferred and then subjected to simulated rearrangement events under the random breakage model with varying ratios of inversions and transpositions. We found that adjacent chloroplast genes in C. reinhardtii were located on the same strand much more frequently than in simulated genomes that were generated under a random rearrangement process (increased sidedness; p < 0.0001). In addition, functionally related genes were found to be more clustered than those evolved under random rearrangements (p < 0.0001). We report evidence of co-transcription of neighboring genes, which may be responsible for the observed gene clusters in C. reinhardtii cpDNA. Conclusion: Simulations and experimental evidence suggest that both selective maintenance and directional selection for gene clusters are determinants of chloroplast gene order. PMID:16469102
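
    The logic of a parametric bootstrap test for sidedness can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' procedure: genomes are treated as linear lists of signed integers (sign = strand), only inversions are simulated (the abstract also varies the ratio of transpositions), and the function names, the number of events, and the toy gene orders are all hypothetical.

```python
import random


def sidedness(genome):
    """Fraction of adjacent gene pairs that lie on the same strand."""
    same = sum(1 for a, b in zip(genome, genome[1:]) if (a > 0) == (b > 0))
    return same / (len(genome) - 1)


def random_inversion(genome, rng):
    """Reverse a random non-empty segment and flip the strand of its genes."""
    i, j = sorted(rng.sample(range(len(genome) + 1), 2))
    return genome[:i] + [-g for g in reversed(genome[i:j])] + genome[j:]


def bootstrap_p_value(ancestor, observed, n_events, n_sims=1000, seed=0):
    """Estimate P(simulated sidedness >= observed sidedness).

    Each simulation starts from the inferred ancestral order and applies
    n_events random inversions. The +1 terms keep the estimate above zero.
    """
    rng = random.Random(seed)
    obs = sidedness(observed)
    hits = 0
    for _ in range(n_sims):
        genome = list(ancestor)
        for _ in range(n_events):
            genome = random_inversion(genome, rng)
        if sidedness(genome) >= obs:
            hits += 1
    return (hits + 1) / (n_sims + 1)


# Toy data: a hypothetical 8-gene ancestor and a rearranged but fully
# same-strand descendant.
ancestor = [1, 2, 3, 4, 5, 6, 7, 8]
observed = [3, 1, 2, 6, 4, 5, 8, 7]
p = bootstrap_p_value(ancestor, observed, n_events=10)
```

    A small p here would mean that random rearrangement rarely produces as much same-strand adjacency as observed, which is the shape of the argument behind the reported p < 0.0001.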

  7. Remarkable variation in maize genome structure inferred from haplotype diversity at the bz locus

    PubMed Central

    Wang, Qinghua; Dooner, Hugo K.

    2006-01-01

    Maize is probably the most diverse of all crop species. Unexpectedly large differences among haplotypes were first revealed in a comparison of the bz genomic regions of two different inbred lines, McC and B73. Retrotransposon clusters, which comprise most of the repetitive DNA in maize, varied markedly in makeup and in location relative to the genes in the region, and genic sequences, later shown to be carried by two helitron transposons, also differed between the inbreds. Thus, the allelic bz regions of these Corn Belt inbreds shared only a minority of the total sequence. To investigate further the variation caused by retrotransposons, helitrons, and other insertions, we have analyzed the organization of the bz genomic region in five additional cultivars selected because of their geographic and genetic diversity: the inbreds A188, CML258, and I137TN, and the land races Coroico and NalTel. This vertical comparison has revealed the existence of several new helitrons, new retrotransposons, members of every superfamily of DNA transposons, numerous miniature elements, and novel insertions flanked at either end by TA repeats, which we call TAFTs (TA-flanked transposons). The extent of variation in the region is remarkable. In pairwise comparisons of eight bz haplotypes, the percentage of shared sequences ranges from 25% to 84%. Chimeric haplotypes were identified that combine retrotransposon clusters found in different haplotypes. We propose that recombination in the common gene space greatly amplifies the variability produced by the retrotransposition explosion in the maize ancestry, creating the heterogeneity in genome organization found in modern maize. PMID:17101975

  8. Remarkable variation in maize genome structure inferred from haplotype diversity at the bz locus.

    PubMed

    Wang, Qinghua; Dooner, Hugo K

    2006-11-21

    Maize is probably the most diverse of all crop species. Unexpectedly large differences among haplotypes were first revealed in a comparison of the bz genomic regions of two different inbred lines, McC and B73. Retrotransposon clusters, which comprise most of the repetitive DNA in maize, varied markedly in makeup and in location relative to the genes in the region, and genic sequences, later shown to be carried by two helitron transposons, also differed between the inbreds. Thus, the allelic bz regions of these Corn Belt inbreds shared only a minority of the total sequence. To investigate further the variation caused by retrotransposons, helitrons, and other insertions, we have analyzed the organization of the bz genomic region in five additional cultivars selected because of their geographic and genetic diversity: the inbreds A188, CML258, and I137TN, and the land races Coroico and NalTel. This vertical comparison has revealed the existence of several new helitrons, new retrotransposons, members of every superfamily of DNA transposons, numerous miniature elements, and novel insertions flanked at either end by TA repeats, which we call TAFTs (TA-flanked transposons). The extent of variation in the region is remarkable. In pairwise comparisons of eight bz haplotypes, the percentage of shared sequences ranges from 25% to 84%. Chimeric haplotypes were identified that combine retrotransposon clusters found in different haplotypes. We propose that recombination in the common gene space greatly amplifies the variability produced by the retrotransposition explosion in the maize ancestry, creating the heterogeneity in genome organization found in modern maize.

  9. Structure, expression profile and phylogenetic inference of chalcone isomerase-like genes from the narrow-leafed lupin (Lupinus angustifolius L.) genome

    PubMed Central

    Przysiecka, Łucja; Książkiewicz, Michał; Wolko, Bogdan; Naganowska, Barbara

    2015-01-01

    Lupins, like other legumes, have a unique biosynthesis scheme of 5-deoxy-type flavonoids and isoflavonoids. A key enzyme in this pathway is chalcone isomerase (CHI), a member of CHI-fold protein family, encompassing subfamilies of CHI1, CHI2, CHI-like (CHIL), and fatty acid-binding (FAP) proteins. Here, two Lupinus angustifolius (narrow-leafed lupin) CHILs, LangCHIL1 and LangCHIL2, were identified and characterized using DNA fingerprinting, cytogenetic and linkage mapping, sequencing and expression profiling. Clones carrying CHIL sequences were assembled into two contigs. Full gene sequences were obtained from these contigs, and mapped in two L. angustifolius linkage groups by gene-specific markers. Bacterial artificial chromosome fluorescence in situ hybridization approach confirmed the localization of two LangCHIL genes in distinct chromosomes. The expression profiles of both LangCHIL isoforms were very similar. The highest level of transcription was in the roots of the third week of plant growth; thereafter, expression declined. The expression of both LangCHIL genes in leaves and stems was similar and low. Comparative mapping to reference legume genome sequences revealed strong syntenic links; however, LangCHIL2 contig had a much more conserved structure than LangCHIL1. LangCHIL2 is assumed to be an ancestor gene, whereas LangCHIL1 probably appeared as a result of duplication. As both copies are transcriptionally active, questions arise concerning their hypothetical functional divergence. Screening of the narrow-leafed lupin genome and transcriptome with CHI-fold protein sequences, followed by Bayesian inference of phylogeny and cross-genera synteny survey, identified representatives of all but one (CHI1) main subfamilies. They are as follows: two copies of CHI2, FAPa2 and CHIL, and single copies of FAPb and FAPa1. Duplicated genes are remnants of whole genome duplication which is assumed to have occurred after the divergence of Lupinus, Arachis, and Glycine

  10. Inferring Heterozygosity from Ancient and Low Coverage Genomes

    PubMed Central

    Kousathanas, Athanasios; Leuenberger, Christoph; Link, Vivian; Sell, Christian; Burger, Joachim; Wegmann, Daniel

    2017-01-01

    While genetic diversity can be quantified accurately from high coverage sequencing data, it is often desirable to obtain such estimates from data with low coverage, either to save costs or because of low DNA quality, as is observed for ancient samples. Here, we introduce a method to accurately infer heterozygosity probabilistically from sequences with average coverage <1× of a single individual. The method relaxes the infinite sites assumption of previous methods, does not require a reference sequence, except for the initial alignment of the sequencing data, and takes into account both variable sequencing errors and potential postmortem damage. It is thus also applicable to nonmodel organisms and ancient genomes. Since error rates as reported by sequencing machines are generally distorted and require recalibration, we also introduce a method to accurately infer recalibration parameters in the presence of postmortem damage. This method does not require knowledge about the underlying genome sequence, but instead works with haploid data (e.g., from the X-chromosome from mammalian males) and integrates over the unknown genotypes. Using extensive simulations we show that a few megabasepairs of haploid data are sufficient for accurate recalibration, even at average coverages as low as 1×. At similar coverages, our method also produces very accurate estimates of heterozygosity down to 10^-4 within windows of about 1 Mbp. We further illustrate the usefulness of our approach by inferring genome-wide patterns of diversity for several ancient human samples, and we found that 3000–5000-year-old samples showed diversity patterns comparable to those of modern humans. In contrast, two European hunter-gatherer samples exhibited not only considerably lower levels of diversity than modern samples, but also highly distinct distributions of diversity along their genomes. Interestingly, these distributions were also very different between the two samples, supporting earlier

  11. Genome-Wide Inference of Ancestral Recombination Graphs

    PubMed Central

    Rasmussen, Matthew D.; Hubisz, Melissa J.; Gronau, Ilan; Siepel, Adam

    2014-01-01

    The complex correlation structure of a collection of orthologous DNA sequences is uniquely captured by the “ancestral recombination graph” (ARG), a complete record of coalescence and recombination events in the history of the sample. However, existing methods for ARG inference are computationally intensive, highly approximate, or limited to small numbers of sequences, and, as a consequence, explicit ARG inference is rarely used in applied population genomics. Here, we introduce a new algorithm for ARG inference that is efficient enough to apply to dozens of complete mammalian genomes. The key idea of our approach is to sample an ARG of n chromosomes conditional on an ARG of n - 1 chromosomes, an operation we call “threading.” Using techniques based on hidden Markov models, we can perform this threading operation exactly, up to the assumptions of the sequentially Markov coalescent and a discretization of time. An extension allows for threading of subtrees instead of individual sequences. Repeated application of these threading operations results in highly efficient Markov chain Monte Carlo samplers for ARGs. We have implemented these methods in a computer program called ARGweaver. Experiments with simulated data indicate that ARGweaver converges rapidly to the posterior distribution over ARGs and is effective in recovering various features of the ARG for dozens of sequences generated under realistic parameters for human populations. In applications of ARGweaver to 54 human genome sequences from Complete Genomics, we find clear signatures of natural selection, including regions of unusually ancient ancestry associated with balancing selection and reductions in allele age in sites under directional selection. The patterns we observe near protein-coding genes are consistent with a primary influence from background selection rather than hitchhiking, although we cannot rule out a contribution from recurrent selective sweeps. PMID:24831947

  12. Freshwater bacterial lifestyles inferred from comparative genomics.

    PubMed

    Livermore, Joshua A; Emrich, Scott J; Tan, John; Jones, Stuart E

    2014-03-01

    While micro-organisms actively mediate and participate in freshwater ecosystem services, we know little about freshwater microbial genetic diversity. Genome sequences are available for many bacteria from the human microbiome and the ocean (over 800 and 200, respectively), but only two freshwater genomes are currently available: the streamlined genomes of Polynucleobacter necessarius ssp. asymbioticus and the Actinobacterium AcI-B1. Here, we sequenced and analysed draft genomes of eight phylogenetically diverse freshwater bacteria exhibiting a range of lifestyle characteristics. Comparative genomics of these bacteria reveals putative freshwater bacterial lifestyles based on differences in predicted growth rate, capability to respond to environmental stimuli and diversity of useable carbon substrates. Our conceptual model based on these genomic characteristics provides a foundation on which further ecophysiological and genomic studies can be built. In addition, these genomes greatly expand the diversity of existing genomic context for future studies on the ecology and genetics of freshwater bacteria.

  13. Genetic Network Inference Using Hierarchical Structure

    PubMed Central

    Kimura, Shuhei; Tokuhisa, Masato; Okada-Hatakeyama, Mariko

    2016-01-01

    Many methods for inferring genetic networks have been proposed, but the regulations they infer often include false-positives. Several researchers have attempted to reduce these erroneous regulations by proposing the use of a priori knowledge about the properties of genetic networks such as their sparseness, scale-free structure, and so on. This study focuses on another piece of a priori knowledge, namely, that biochemical networks exhibit hierarchical structures. Based on this idea, we propose an inference approach that uses the hierarchical structure in a target genetic network. To obtain a reasonable hierarchical structure, the first step of the proposed approach is to infer multiple genetic networks from the observed gene expression data. We take this step using an existing method that combines a genetic network inference method with a bootstrap method. The next step is to extract a hierarchical structure from the inferred networks that is consistent with most of the networks. Third, we use the hierarchical structure obtained to assign confidence values to all candidate regulations. Numerical experiments are also performed to demonstrate the effectiveness of using the hierarchical structure in the genetic network inference. The improvement accomplished by the use of the hierarchical structure is small. However, the hierarchical structure could be used to improve the performances of many existing inference methods. PMID:26941653
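
    The first step described above, inferring multiple networks from bootstrap resamples, naturally yields a per-regulation confidence value. The sketch below shows only that generic bootstrap idea (edge frequency across inferred networks); it does not implement the hierarchical-structure extraction the authors propose, and the function name and gene labels are hypothetical.

```python
from collections import Counter


def edge_confidence(networks):
    """Confidence of each regulation = fraction of networks containing it.

    `networks` is a list of edge collections, one per bootstrap replicate;
    each edge is a (regulator, target) tuple.
    """
    counts = Counter(edge for net in networks for edge in set(net))
    return {edge: count / len(networks) for edge, count in counts.items()}


# Hypothetical networks inferred from two bootstrap replicates.
replicates = [
    {("geneA", "geneB"), ("geneB", "geneC")},
    {("geneA", "geneB")},
]
confidence = edge_confidence(replicates)
print(confidence[("geneA", "geneB")])  # 1.0
print(confidence[("geneB", "geneC")])  # 0.5
```

    Regulations with low confidence are the candidates for false positives; the abstract's contribution is to adjust such values using a hierarchy that is consistent with most of the inferred networks.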

  14. Inference of self-regulated transcriptional networks by comparative genomics.

    PubMed

    Cornish, Joseph P; Matthews, Fialelei; Thomas, Julien R; Erill, Ivan

    2012-01-01

    The assumption of basic properties, like self-regulation, in simple transcriptional regulatory networks can be exploited to infer regulatory motifs from the growing amounts of genomic and meta-genomic data. These motifs can in principle be used to elucidate the nature and scope of transcriptional networks through comparative genomics. Here we assess the feasibility of this approach using the SOS regulatory network of Gram-positive bacteria as a test case. Using experimentally validated data, we show that the known regulatory motif can be inferred through the assumption of self-regulation. Furthermore, the inferred motif provides a more robust search pattern for comparative genomics than the experimental motifs defined in reference organisms. We take advantage of this robustness to generate a functional map of the SOS response in Gram-positive bacteria. Our results reveal definite differences in the composition of the LexA regulon between Firmicutes and Actinobacteria, and confirm that regulation of cell-division inhibition is a widespread characteristic of this network among Gram-positive bacteria.

  15. SOP for pathway inference in Integrated Microbial Genomes (IMG).

    PubMed

    Anderson, Iain; Chen, Amy; Markowitz, Victor; Kyrpides, Nikos; Ivanova, Natalia

    2011-12-31

    One of the most important aspects of genomic analysis is the prediction of which pathways, both metabolic and non-metabolic, are present in an organism. In IMG, this is carried out by the assignment of IMG terms, which are organized into IMG pathways. Based on manual and automatic assignment of IMG terms, the presence or absence of IMG pathways is automatically inferred. The three categories of pathway assertion are asserted (likely present), not asserted (likely absent), and unknown. In the unknown category, at least one term necessary for the pathway is missing, but an ortholog in another organism has the corresponding term assigned to it. Automatic pathway inference is an important initial step in genome analysis.
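
    The three assertion categories can be written as a small decision function. This is a sketch under assumptions, not IMG's actual code: the function name, the term identifiers, and in particular the rule that every missing term must be covered by an ortholog for the result to be "unknown" are assumptions, since the abstract does not say how partially supported pathways are handled.

```python
def pathway_status(required_terms, genome_terms, ortholog_terms):
    """Classify a pathway as 'asserted', 'not asserted', or 'unknown'.

    required_terms: terms necessary for the pathway
    genome_terms:   terms assigned to genes in this genome
    ortholog_terms: terms assigned to orthologs in other organisms
    """
    missing = set(required_terms) - set(genome_terms)
    if not missing:
        return "asserted"      # likely present
    if missing <= set(ortholog_terms):
        return "unknown"       # missing terms have ortholog support (assumed rule)
    return "not asserted"      # likely absent


# Hypothetical term identifiers.
print(pathway_status({"t1", "t2"}, {"t1", "t2"}, set()))   # asserted
print(pathway_status({"t1", "t2"}, {"t1"}, {"t2"}))        # unknown
print(pathway_status({"t1", "t2"}, {"t1"}, set()))         # not asserted
```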

  16. Genomic inferences from Afrotheria and the evolution of elephants.

    PubMed

    Roca, Alfred L; O'Brien, Stephen J

    2005-12-01

    Recent genetic studies have established that African forest and savanna elephants are distinct species with dissociated cytonuclear genomic patterns, and have identified Asian elephants from Borneo and Sumatra as conservation priorities. Representative of Afrotheria, a superordinal clade encompassing six eutherian orders, the African savanna elephant was among the first mammals chosen for whole-genome sequencing to provide a comparative understanding of the human genome. Elephants have large and complex brains and display advanced levels of social structure, communication, learning and intelligence. The elephant genome sequence might prove useful for comparative genomic studies of these advanced traits, which have appeared independently in only three mammalian orders: primates, cetaceans and proboscideans.

  17. Inferring Strain Mixture within Clinical Plasmodium falciparum Isolates from Genomic Sequence Data

    PubMed Central

    O’Brien, John D.; Amenga-Etego, Lucas

    2016-01-01

    We present a rigorous statistical model that infers the structure of P. falciparum mixtures—including the number of strains present, their proportion within the samples, and the amount of unexplained mixture—using whole genome sequence (WGS) data. Applied to simulation data, artificial laboratory mixtures, and field samples, the model provides reasonable inference with as few as 10 reads or 50 SNPs and works efficiently even with much larger data sets. Source code and example data for the model are provided in an open source fashion. We discuss the possible uses of this model as a window into within-host selection for clinical and epidemiological studies. PMID:27362949

  18. Population Genetic Inference from Personal Genome Data: Impact of Ancestry and Admixture on Human Genomic Variation

    PubMed Central

    Kidd, Jeffrey M.; Gravel, Simon; Byrnes, Jake; Moreno-Estrada, Andres; Musharoff, Shaila; Bryc, Katarzyna; Degenhardt, Jeremiah D.; Brisbin, Abra; Sheth, Vrunda; Chen, Rong; McLaughlin, Stephen F.; Peckham, Heather E.; Omberg, Larsson; Bormann Chung, Christina A.; Stanley, Sarah; Pearlstein, Kevin; Levandowsky, Elizabeth; Acevedo-Acevedo, Suehelay; Auton, Adam; Keinan, Alon; Acuña-Alonzo, Victor; Barquera-Lozano, Rodrigo; Canizales-Quinteros, Samuel; Eng, Celeste; Burchard, Esteban G.; Russell, Archie; Reynolds, Andy; Clark, Andrew G.; Reese, Martin G.; Lincoln, Stephen E.; Butte, Atul J.; De La Vega, Francisco M.; Bustamante, Carlos D.

    2012-01-01

    Full sequencing of individual human genomes has greatly expanded our understanding of human genetic variation and population history. Here, we present a systematic analysis of 50 human genomes from 11 diverse global populations sequenced at high coverage. Our sample includes 12 individuals who have admixed ancestry and who have varying degrees of recent (within the last 500 years) African, Native American, and European ancestry. We found over 21 million single-nucleotide variants that contribute to a 1.75-fold range in nucleotide heterozygosity across diverse human genomes. This heterozygosity ranged from a high of one heterozygous site per kilobase in west African genomes to a low of 0.57 heterozygous sites per kilobase in segments inferred to have diploid Native American ancestry from the genomes of Mexican and Puerto Rican individuals. We show evidence of all three continental ancestries in the genomes of Mexican, Puerto Rican, and African American populations, and the genome-wide statistics are highly consistent across individuals from a population once ancestry proportions have been accounted for. Using a generalized linear model, we identified subtle variations across populations in the proportion of neutral versus deleterious variation and found that genome-wide statistics vary in admixed populations even once ancestry proportions have been factored in. We further infer that multiple periods of gene flow shaped the diversity of admixed populations in the Americas—70% of the European ancestry in today’s African Americans dates back to European gene flow happening only 7–8 generations ago. PMID:23040495
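
    The reported 1.75-fold range follows directly from the two heterozygosity extremes quoted in the abstract, as the short check below shows. The helper function and its argument names are illustrative assumptions; only the values 1 and 0.57 heterozygous sites per kilobase come from the abstract.

```python
def het_per_kb(n_het_sites, n_callable_bases):
    """Heterozygous sites per kilobase of callable sequence."""
    return 1000.0 * n_het_sites / n_callable_bases


high = 1.0   # west African genomes, het sites per kb (from the abstract)
low = 0.57   # diploid Native American ancestry segments (from the abstract)

fold_range = high / low
print(round(fold_range, 2))  # 1.75
```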

  19. Genomic inference of the metabolism of cosmopolitan subsurface Archaea, Hadesarchaea.

    PubMed

    Baker, Brett J; Saw, Jimmy H; Lind, Anders E; Lazar, Cassandre Sara; Hinrichs, Kai-Uwe; Teske, Andreas P; Ettema, Thijs J G

    2016-02-15

    The subsurface biosphere is largely unexplored and contains a broad diversity of uncultured microbes(1). Despite being one of the few prokaryotic lineages that is cosmopolitan in both the terrestrial and marine subsurface(2-4), the physiological and ecological roles of SAGMEG (South-African Gold Mine Miscellaneous Euryarchaeal Group) Archaea are unknown. Here, we report the metabolic capabilities of this enigmatic group as inferred from genomic reconstructions. Four high-quality (63-90% complete) genomes were obtained from White Oak River estuary and Yellowstone National Park hot spring sediment metagenomes. Phylogenomic analyses place SAGMEG Archaea as a deeply rooting sister clade of the Thermococci, leading us to propose the name Hadesarchaea for this new Archaeal class. With an estimated genome size of around 1.5 Mbp, the genomes of Hadesarchaea are distinctly streamlined, yet metabolically versatile. They share several physiological mechanisms with strict anaerobic Euryarchaeota. Several metabolic characteristics make them successful in the subsurface, including genes involved in CO and H2 oxidation (or H2 production), with potential coupling to nitrite reduction to ammonia (DNRA). This first glimpse into the metabolic capabilities of these cosmopolitan Archaea suggests they are mediating key geochemical processes and are specialized for survival in the subsurface biosphere.

  20. The aggregate site frequency spectrum for comparative population genomic inference.

    PubMed

    Xue, Alexander T; Hickerson, Michael J

    2015-12-01

    Understanding how assemblages of species responded to past climate change is a central goal of comparative phylogeography and comparative population genomics, an endeavour that has increasing potential to integrate with community ecology. New sequencing technology now provides the potential to perform complex demographic inference at unprecedented resolution across assemblages of nonmodel species. To this end, we introduce the aggregate site frequency spectrum (aSFS), an expansion of the site frequency spectrum to use single nucleotide polymorphism (SNP) data sets collected from multiple, co-distributed species for assemblage-level demographic inference. We describe how the aSFS is constructed over an arbitrary number of independent population samples and then demonstrate how the aSFS can differentiate various multispecies demographic histories under a wide range of sampling configurations while allowing effective population sizes and expansion magnitudes to vary independently. We subsequently couple the aSFS with a hierarchical approximate Bayesian computation (hABC) framework to estimate degree of temporal synchronicity in expansion times across taxa, including an empirical demonstration with a data set consisting of five populations of the threespine stickleback (Gasterosteus aculeatus). Corroborating what is generally understood about the recent postglacial origins of these populations, the joint aSFS/hABC analysis strongly suggests that the stickleback data are most consistent with synchronous expansion after the Last Glacial Maximum (posterior probability = 0.99). The aSFS will have general application for multilevel statistical frameworks to test models involving assemblages and/or communities, and as large-scale SNP data from nonmodel species become routine, the aSFS expands the potential for powerful next-generation comparative population genomic inference.

  1. Neural Circuit Inference from Function to Structure.

    PubMed

    Real, Esteban; Asari, Hiroki; Gollisch, Tim; Meister, Markus

    2017-01-23

    Advances in technology are opening new windows on the structural connectivity and functional dynamics of brain circuits. Quantitative frameworks are needed that integrate these data from anatomy and physiology. Here, we present a modeling approach that creates such a link. The goal is to infer the structure of a neural circuit from sparse neural recordings, using partial knowledge of its anatomy as a regularizing constraint. We recorded visual responses from the output neurons of the retina, the ganglion cells. We then generated a systematic sequence of circuit models that represents retinal neurons and connections and fitted them to the experimental data. The optimal models faithfully recapitulated the ganglion cell outputs. More importantly, they made predictions about dynamics and connectivity among unobserved neurons internal to the circuit, and these were subsequently confirmed by experiment. This circuit inference framework promises to facilitate the integration and understanding of big data in neuroscience.

  2. Improved genome inference in the MHC using a population reference graph.

    PubMed

    Dilthey, Alexander; Cox, Charles; Iqbal, Zamin; Nelson, Matthew R; McVean, Gil

    2015-06-01

    Although much is known about human genetic variation, such information is typically ignored in assembling new genomes. Instead, reads are mapped to a single reference, which can lead to poor characterization of regions of high sequence or structural diversity. We introduce a population reference graph, which combines multiple reference sequences and catalogs of variation. The genomes of new samples are reconstructed as paths through the graph using an efficient hidden Markov model, allowing for recombination between different haplotypes and additional variants. By applying the method to the 4.5-Mb extended MHC region on human chromosome 6, combining 8 assembled haplotypes, the sequences of known classical HLA alleles and 87,640 SNP variants from the 1000 Genomes Project, we demonstrate, using simulations, SNP genotyping, and short-read and long-read data, how the method improves the accuracy of genome inference, and we identify regions where the current set of reference sequences is substantially incomplete.

  3. Nonparametric inference of network structure and dynamics

    NASA Astrophysics Data System (ADS)

    Peixoto, Tiago P.

    The network structure of complex systems determines their function and serves as evidence for the evolutionary mechanisms that lie behind them. Despite considerable effort in recent years, it remains an open challenge to formulate general descriptions of the large-scale structure of network systems, and how to reliably extract such information from data. Although many approaches have been proposed, few methods attempt to gauge the statistical significance of the uncovered structures, and hence the majority cannot reliably separate actual structure from stochastic fluctuations. Due to the sheer size and high dimensionality of many networks, this represents a major limitation that prevents meaningful interpretations of the results obtained with such nonstatistical methods. In this talk, I will show how these issues can be tackled in a principled and efficient fashion by formulating appropriate generative models of network structure that can have their parameters inferred from data. By employing a Bayesian description of such models, the inference can be performed in a nonparametric fashion that does not require any a priori knowledge or ad hoc assumptions about the data. I will show how this approach can be used to perform model comparison, and how hierarchical models yield the most appropriate trade-off between model complexity and quality of fit based on the statistical evidence present in the data. I will also show how this general approach can be elegantly extended to networks with edge attributes, that are embedded in latent spaces, and that change in time. The latter is obtained via a fully dynamic generative network model, based on arbitrary-order Markov chains, that can also be inferred in a nonparametric fashion. Throughout the talk I will illustrate the application of the methods with many empirical networks such as the internet at the autonomous systems level, the global airport network, the network of actors and films, social networks, citations among

  4. How to infer relative fitness from a sample of genomic sequences.

    PubMed

    Dayarian, Adel; Shraiman, Boris I

    2014-07-01

    Mounting evidence suggests that natural populations can harbor extensive fitness diversity with numerous genomic loci under selection. It is also known that genealogical trees for populations under selection are quantifiably different from those expected under neutral evolution and described statistically by Kingman's coalescent. While differences in the statistical structure of genealogies have long been used as a test for the presence of selection, the full extent of the information that they contain has not been exploited. Here we demonstrate that the shape of the reconstructed genealogical tree for a moderately large number of random genomic samples taken from a fitness diverse, but otherwise unstructured, asexual population can be used to predict the relative fitness of individuals within the sample. To achieve this we define a heuristic algorithm, which we test in silico, using simulations of a Wright-Fisher model for a realistic range of mutation rates and selection strength. Our inferred fitness ranking is based on a linear discriminator that identifies rapidly coalescing lineages in the reconstructed tree. Inferred fitness ranking correlates strongly with actual fitness, with a genome in the top 10% ranked being in the top 20% fittest with false discovery rate of 0.1-0.3, depending on the mutation/selection parameters. The ranking also enables us to predict the genotypes that future populations inherit from the present one. While the inference accuracy increases monotonically with sample size, samples of 200 nearly saturate the performance. We propose that our approach can be used for inferring relative fitness of genomes obtained in single-cell sequencing of tumors and in monitoring viral outbreaks.

  5. How to Infer Relative Fitness from a Sample of Genomic Sequences

    PubMed Central

    Dayarian, Adel; Shraiman, Boris I.

    2014-01-01

    Mounting evidence suggests that natural populations can harbor extensive fitness diversity with numerous genomic loci under selection. It is also known that genealogical trees for populations under selection are quantifiably different from those expected under neutral evolution and described statistically by Kingman’s coalescent. While differences in the statistical structure of genealogies have long been used as a test for the presence of selection, the full extent of the information that they contain has not been exploited. Here we demonstrate that the shape of the reconstructed genealogical tree for a moderately large number of random genomic samples taken from a fitness diverse, but otherwise unstructured, asexual population can be used to predict the relative fitness of individuals within the sample. To achieve this we define a heuristic algorithm, which we test in silico, using simulations of a Wright–Fisher model for a realistic range of mutation rates and selection strength. Our inferred fitness ranking is based on a linear discriminator that identifies rapidly coalescing lineages in the reconstructed tree. Inferred fitness ranking correlates strongly with actual fitness, with a genome in the top 10% ranked being in the top 20% fittest with false discovery rate of 0.1–0.3, depending on the mutation/selection parameters. The ranking also enables us to predict the genotypes that future populations inherit from the present one. While the inference accuracy increases monotonically with sample size, samples of 200 nearly saturate the performance. We propose that our approach can be used for inferring relative fitness of genomes obtained in single-cell sequencing of tumors and in monitoring viral outbreaks. PMID:24770330

  6. Inferring causal structure: a quantum advantage

    NASA Astrophysics Data System (ADS)

    Ried, Katja; Spekkens, Robert

    2014-03-01

    The problem of inferring causal relations from observed correlations is central to science, and extensive study has yielded both important conceptual insights and widely used practical applications. Yet some of the simplest questions are impossible to answer classically: for instance, if one observes correlations between two variables (such as taking a new medical treatment and the subject's recovery), does this show a direct causal influence, or is it due to some hidden common cause? We develop a framework for quantum causal inference, and show how quantum theory provides a unique advantage in this decision problem. The key insight is that certain quantum correlations can only arise from specific causal structures, whereas pairs of classical variables can exhibit any pattern of correlation regardless of whether they have a common cause or a direct-cause relation. For example, suppose one measures the same Pauli observable on two qubits. If they share a common cause, such as being prepared in an entangled state, then one never finds perfect (positive) correlations in every basis, whereas perfect anticorrelations are possible (if one prepares the singlet state). Conversely, if a channel connects the qubits, hence a direct causal influence, perfect anticorrelations are impossible.
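
    The singlet-state example in the abstract above can be checked numerically. The sketch below, in plain Python with no external libraries, computes the expectation value of the same Pauli observable measured on both qubits of the singlet state; the helper names are our own, not taken from the paper. A value of -1 corresponds to perfect anticorrelation.

```python
# Pauli matrices as nested lists of (complex) numbers.
PAULI = {
    "X": [[0, 1], [1, 0]],
    "Y": [[0, -1j], [1j, 0]],
    "Z": [[1, 0], [0, -1]],
}


def kron(a, b):
    """Kronecker product of two 2x2 matrices, giving a 4x4 matrix."""
    return [
        [a[i][j] * b[k][l] for j in range(2) for l in range(2)]
        for i in range(2)
        for k in range(2)
    ]


def expectation(state, op):
    """Return <state| op |state> for a 4-component state vector."""
    applied = [sum(op[r][c] * state[c] for c in range(4)) for r in range(4)]
    return sum(state[r].conjugate() * applied[r] for r in range(4))


# Singlet state (|01> - |10>)/sqrt(2) in the basis |00>, |01>, |10>, |11>.
s = 2 ** -0.5
singlet = [0j, s + 0j, -s + 0j, 0j]

# Correlation of identical Pauli measurements on the two qubits.
correlations = {
    name: expectation(singlet, kron(m, m)).real for name, m in PAULI.items()
}
```

    For X, Y and Z alike, the computed correlation equals -1 up to floating-point rounding, which is the "perfect anticorrelations" case the abstract attributes to a common cause.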

  7. Phylogeny Inference of Closely Related Bacterial Genomes: Combining the Features of Both Overlapping Genes and Collinear Genomic Regions

    PubMed Central

    Zhang, Yan-Cong; Lin, Kui

    2015-01-01

    Overlapping genes (OGs) represent one type of widespread genomic feature in bacterial genomes and have been used as rare genomic markers in phylogeny inference of closely related bacterial species. However, the inference may experience a decrease in performance for phylogenomic analysis of too closely or too distantly related genomes. Another drawback of OGs as phylogenetic markers is that they usually take little account of the effects of genomic rearrangement on the similarity estimation, such as intra-chromosome/genome translocations, horizontal gene transfer, and gene losses. To explore such effects on the accuracy of phylogeny reconstruction, we combine phylogenetic signals of OGs with collinear genomic regions, here called locally collinear blocks (LCBs). By putting these together, we refine our previous metric of pairwise similarity between two closely related bacterial genomes. As a case study, we used this new method to reconstruct the phylogenies of 88 Enterobacteriale genomes of the class Gammaproteobacteria. Our results demonstrated that the topological accuracy of the inferred phylogeny was improved when both OGs and LCBs were simultaneously considered, suggesting that combining these two phylogenetic markers may reduce, to some extent, the influence of gene loss on phylogeny inference. Such phylogenomic studies, we believe, will help us to explore a more effective approach to increasing the robustness of phylogeny reconstruction of closely related bacterial organisms. PMID:26715828

  8. Hybrid Origins of Citrus Varieties Inferred from DNA Marker Analysis of Nuclear and Organelle Genomes.

    PubMed

    Shimizu, Tokurou; Kitajima, Akira; Nonaka, Keisuke; Yoshioka, Terutaka; Ohta, Satoshi; Goto, Shingo; Toyoda, Atsushi; Fujiyama, Asao; Mochizuki, Takako; Nagasaki, Hideki; Kaminuma, Eli; Nakamura, Yasukazu

    2016-01-01

    Most indigenous citrus varieties are assumed to be natural hybrids, but their parentage has so far been determined in only a few cases because of their wide genetic diversity and the low transferability of DNA markers. Here we infer the parentage of indigenous citrus varieties using simple sequence repeat and indel markers developed from various citrus genome sequence resources. Parentage tests with 122 known hybrids using the selected DNA markers certify their transferability among those hybrids. Identity tests confirm that most variant strains are selected mutants, but we find four types of kunenbo (Citrus nobilis) and three types of tachibana (Citrus tachibana) for which we suggest different origins. Structure analysis with DNA markers that are in Hardy-Weinberg equilibrium deduces three basic taxa coinciding with the current understanding of citrus ancestors. Genotyping analysis of 101 indigenous citrus varieties with 123 selected DNA markers infers the parentages of 22 indigenous citrus varieties including Satsuma, Temple, and iyo, and single parents of 45 indigenous citrus varieties, including kunenbo, C. ichangensis, and Ichang lemon by allele-sharing and parentage tests. Genotyping analysis of chloroplast and mitochondrial genomes using 11 DNA markers classifies their cytoplasmic genotypes into 18 categories and deduces the combination of seed and pollen parents. Likelihood ratio analysis verifies the inferred parentages with significant scores. The reconstructed genealogy identifies 12 types of varieties consisting of Kishu, kunenbo, yuzu, koji, sour orange, dancy, kobeni mikan, sweet orange, tachibana, Cleopatra, willowleaf mandarin, and pummelo, which have played pivotal roles in the occurrence of these indigenous varieties. The inferred parentage of the indigenous varieties confirms their hybrid origins, as found by recent studies.

  9. Hybrid Origins of Citrus Varieties Inferred from DNA Marker Analysis of Nuclear and Organelle Genomes

    PubMed Central

    Kitajima, Akira; Nonaka, Keisuke; Yoshioka, Terutaka; Ohta, Satoshi; Goto, Shingo; Toyoda, Atsushi; Fujiyama, Asao; Mochizuki, Takako; Nagasaki, Hideki; Kaminuma, Eli; Nakamura, Yasukazu

    2016-01-01

    Most indigenous citrus varieties are assumed to be natural hybrids, but their parentage has so far been determined in only a few cases because of their wide genetic diversity and the low transferability of DNA markers. Here we infer the parentage of indigenous citrus varieties using simple sequence repeat and indel markers developed from various citrus genome sequence resources. Parentage tests with 122 known hybrids using the selected DNA markers certify their transferability among those hybrids. Identity tests confirm that most variant strains are selected mutants, but we find four types of kunenbo (Citrus nobilis) and three types of tachibana (Citrus tachibana) for which we suggest different origins. Structure analysis with DNA markers that are in Hardy–Weinberg equilibrium deduces three basic taxa coinciding with the current understanding of citrus ancestors. Genotyping analysis of 101 indigenous citrus varieties with 123 selected DNA markers infers the parentages of 22 indigenous citrus varieties including Satsuma, Temple, and iyo, and single parents of 45 indigenous citrus varieties, including kunenbo, C. ichangensis, and Ichang lemon by allele-sharing and parentage tests. Genotyping analysis of chloroplast and mitochondrial genomes using 11 DNA markers classifies their cytoplasmic genotypes into 18 categories and deduces the combination of seed and pollen parents. Likelihood ratio analysis verifies the inferred parentages with significant scores. The reconstructed genealogy identifies 12 types of varieties consisting of Kishu, kunenbo, yuzu, koji, sour orange, dancy, kobeni mikan, sweet orange, tachibana, Cleopatra, willowleaf mandarin, and pummelo, which have played pivotal roles in the occurrence of these indigenous varieties. The inferred parentage of the indigenous varieties confirms their hybrid origins, as found by recent studies. PMID:27902727

  10. Inference of gene regulatory networks from genome-wide knockout fitness data

    PubMed Central

    Wang, Liming; Wang, Xiaodong; Arkin, Adam P.; Samoilov, Michael S.

    2013-01-01

    Motivation: Genome-wide fitness is an emerging type of high-throughput biological data generated for individual organisms by creating libraries of knockouts, subjecting them to broad ranges of environmental conditions, and measuring the resulting clone-specific fitnesses. Since fitness is an organism-scale measure of gene regulatory network behaviour, it may offer certain advantages when insights into such phenotypical and functional features are of primary interest over individual gene expression. Previous works have shown that genome-wide fitness data can be used to uncover novel gene regulatory interactions, when compared with results of more conventional gene expression analysis. Yet, to date, few algorithms have been proposed for systematically using genome-wide mutant fitness data for gene regulatory network inference. Results: In this article, we describe a model and propose an inference algorithm for using fitness data from knockout libraries to identify underlying gene regulatory networks. Unlike most prior methods, the presented approach captures not only the structural but also the dynamical and non-linear nature of the biomolecular systems involved. A state–space model with a non-linear basis is used for dynamically describing gene regulatory networks. Network structure is then elucidated by estimating unknown model parameters. An unscented Kalman filter is used to cope with the non-linearities introduced in the model, which also enables the algorithm to run in on-line mode for practical use. Here, we demonstrate that the algorithm provides satisfactory results for both synthetic data and empirical measurements of the GAL network in the yeast Saccharomyces cerevisiae and the TyrR–LiuR network in the bacterium Shewanella oneidensis. Availability: MATLAB code and datasets are available to download at http://www.duke.edu/~lw174/Fitness.zip and http://genomics.lbl.gov/supplemental/fitness-bioinf/ Contact: wangx@ee.columbia.edu or mssamoilov@lbl.gov Supplementary information

  11. Robust and scalable inference of population history from hundreds of unphased whole genomes.

    PubMed

    Terhorst, Jonathan; Kamm, John A; Song, Yun S

    2017-02-01

    It has recently been demonstrated that inference methods based on genealogical processes with recombination can uncover past population history in unprecedented detail. However, these methods scale poorly with sample size, limiting resolution in the recent past, and they require phased genomes, which contain switch errors that can catastrophically distort the inferred history. Here we present SMC++, a new statistical tool capable of analyzing orders of magnitude more samples than existing methods while requiring only unphased genomes (its results are independent of phasing). SMC++ can jointly infer population size histories and split times in diverged populations, and it employs a novel spline regularization scheme that greatly reduces estimation error. We apply SMC++ to analyze sequence data from over a thousand human genomes in Africa and Eurasia, hundreds of genomes from a Drosophila melanogaster population in Africa, and tens of genomes from zebra finch and long-tailed finch populations in Australia.

  12. fastSTRUCTURE: variational inference of population structure in large SNP data sets.

    PubMed

    Raj, Anil; Stephens, Matthew; Pritchard, Jonathan K

    2014-06-01

    Tools for estimating population structure from genetic data are now used in a wide variety of applications in population genetics. However, inferring population structure in large modern data sets imposes severe computational challenges. Here, we develop efficient algorithms for approximate inference of the model underlying the STRUCTURE program using a variational Bayesian framework. Variational methods pose the problem of computing relevant posterior distributions as an optimization problem, allowing us to build on recent advances in optimization theory to develop fast inference tools. In addition, we propose useful heuristic scores to identify the number of populations represented in a data set and a new hierarchical prior to detect weak population structure in the data. We test the variational algorithms on simulated data and illustrate using genotype data from the CEPH-Human Genome Diversity Panel. The variational algorithms are almost two orders of magnitude faster than STRUCTURE and achieve accuracies comparable to those of ADMIXTURE. Furthermore, our results show that the heuristic scores for choosing model complexity provide a reasonable range of values for the number of populations represented in the data, with minimal bias toward detecting structure when it is very weak. Our algorithm, fastSTRUCTURE, is freely available online at http://pritchardlab.stanford.edu/structure.html.

  13. A molecular phylogeny of Hemiptera inferred from mitochondrial genome sequences.

    PubMed

    Song, Nan; Liang, Ai-Ping; Bu, Cui-Ping

    2012-01-01

    Classically, Hemiptera is comprised of two suborders: Homoptera and Heteroptera. Homoptera includes Cicadomorpha, Fulgoromorpha and Sternorrhyncha. However, according to previous molecular phylogenetic studies based on 18S rDNA, Fulgoromorpha has a closer relationship to Heteroptera than to other hemipterans, leaving Homoptera as paraphyletic. Therefore, the position of Fulgoromorpha is important for studying phylogenetic structure of Hemiptera. We inferred the evolutionary affiliations of twenty-five superfamilies of Hemiptera using mitochondrial protein-coding genes and rRNAs. We sequenced three mitogenomes, from Pyrops candelaria, Lycorma delicatula and Ricania marginalis, representing two additional families in Fulgoromorpha. Pyrops and Lycorma are representatives of an additional major family Fulgoridae in Fulgoromorpha, whereas Ricania is a second representative of the highly derived clade Ricaniidae. The organization and size of these mitogenomes are similar to those of the sequenced fulgoroid species. Our consensus phylogeny of Hemiptera largely supported the relationships (((Fulgoromorpha,Sternorrhyncha),Cicadomorpha),Heteroptera), and thus supported the classic phylogeny of Hemiptera. Selection of optimal evolutionary models (exclusion and inclusion of two rRNA genes or of third codon positions of protein-coding genes) demonstrated that rapidly evolving and saturated sites should be removed from the analyses.

  14. [Genomic structure of the autotetraploid oat species Avena macrostachya inferred from comparative analysis of the ITS1 and ITS2 sequences: on the oat karyotype evolution during the early stages of the Avena species divergence].

    PubMed

    Rodionov, A V; Tiupa, N B; Kim, E S; Machs, E M; Loskutov, I G

    2005-05-01

    To examine the genomic structure of Avena macrostachya, internal transcribed spacers, ITS1 and ITS2, as well as nuclear 5.8S rRNA genes from three oat species with AsAs karyotype (A. wiestii, A. hirtula, and A. atlantica), and those from A. longiglumis (AlAl), A. canariensis (AcAc), A. ventricosa (CvCv), A. pilosa, and A. clauda (CpCp) were sequenced. All species of the genus Avena examined represented a monophyletic group (bootstrap index = 98), within which two branches, i.e., species with A- and C-genomes, were distinguished (bootstrap indices = 100). The subject of our study, A. macrostachya, albeit belonging to the phylogenetic branch of C-genome oat species (karyotype with submetacentric and subacrocentric chromosomes), has preserved an isobrachyal karyotype (i.e., one containing metacentric chromosomes), probably typical of the common Avena ancestor. It was suggested to classify the A. macrostachya genome as a specific form of C-genome, Cm-genome. Among the species from other genera studied, Arrhenatherum elatius was found to be the closest to Avena in ITS1 and ITS2 structure. Phylogenetic relationships between Avena and Helictotrichon remain intriguingly uncertain. The HPR389153 sequence from H. pratense genome was closest to the ITS1 sequences specific to the Avena A-genomes (p-distance = 0.0237), while the differences of this sequence from the ITS1 of A. macrostachya reached 0.1221. On the other hand, HAD389117 from H. adsurgens was close to the ITS1 specific to Avena C-genomes (p-distance = 0.0189), while its differences from the A-genome specific ITS1 sequences reached 0.1221. It seems likely that the appearance of highly polyploid (2n = 12-21x) species of H. pratense and H. adsurgens could be associated with interspecific hybridization involving Mediterranean oat species carrying A- and C-genomes. A hypothesis on the pathways of Avena chromosome evolution during the early stages of the oat species divergence is proposed.
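
    The p-distances quoted in the abstract above are proportions of differing sites between two aligned sequences. A minimal sketch of that calculation follows. The function name, the toy sequences and the decision to skip sites containing a gap or ambiguity code are all assumptions made here for illustration; the abstract does not say how such sites were handled.

```python
def p_distance(seq_a, seq_b, skip_chars="-N?"):
    """Proportion of compared alignment sites at which two sequences differ.

    Sites where either sequence has a character in skip_chars are ignored
    (pairwise deletion, an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip_chars or b in skip_chars:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Toy aligned fragments (hypothetical, not real ITS1 data).
d = p_distance("ACGTACGTAC", "ACGTTCGTAC")
```

    On this scale, a value such as 0.0237 means that roughly 24 of every 1000 compared sites differ.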

  15. Informational laws of genome structures

    PubMed Central

    Bonnici, Vincenzo; Manca, Vincenzo

    2016-01-01

    In recent years, the analysis of genomes by means of strings of length k occurring in the genomes, called k-mers, has provided important insights into the basic mechanisms and design principles of genome structures. In the present study, we focus on the proper choice of the value of k for applying information theoretic concepts that express intrinsic aspects of genomes. The value k = lg2(n), where n is the genome length, is determined to be the best choice in the definition of some genomic informational indexes that are studied and computed for seventy genomes. These indexes, which are based on information entropies and on suitable comparisons with random genomes, suggest five informational laws, which all of the considered genomes obey. Moreover, an informational genome complexity measure is proposed, which is a generalized logistic map that balances entropic and anti-entropic components of genomes and is related to their evolutionary dynamics. Finally, applications to computational synthetic biology are briefly outlined. PMID:27354155
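
    To make the choice of k concrete, the sketch below computes the empirical Shannon entropy of a sequence's k-mer distribution with k derived from the sequence length. It reads lg2(n) as the base-2 logarithm and rounds it to the nearest integer; both the rounding and the function name are assumptions of this sketch, and it does not reproduce the paper's informational indexes.

```python
import math
from collections import Counter


def kmer_entropy(genome, k=None):
    """Empirical Shannon entropy (in bits) of the k-mer distribution.

    If k is not given, it is set to round(log2(n)) for a genome of
    length n (the rounding rule is an assumption of this sketch).
    Returns the pair (k, entropy).
    """
    n = len(genome)
    if n == 0:
        raise ValueError("empty sequence")
    if k is None:
        k = max(1, round(math.log2(n)))
    # Count every overlapping window of length k.
    counts = Counter(genome[i:i + k] for i in range(n - k + 1))
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return k, entropy


# Toy sequence of length 16, so k defaults to 4.
k, h = kmer_entropy("ACGT" * 4)
```

    The entropy is bounded above by the base-2 logarithm of the number of k-mer windows, which is one reason a k near log2(n) is a natural scale for comparing a genome with a random sequence of the same length.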

  16. Informational laws of genome structures

    NASA Astrophysics Data System (ADS)

    Bonnici, Vincenzo; Manca, Vincenzo

    2016-06-01

    In recent years, the analysis of genomes by means of strings of length k occurring in the genomes, called k-mers, has provided important insights into the basic mechanisms and design principles of genome structures. In the present study, we focus on the proper choice of the value of k for applying information theoretic concepts that express intrinsic aspects of genomes. The value k = lg2(n), where n is the genome length, is determined to be the best choice in the definition of some genomic informational indexes that are studied and computed for seventy genomes. These indexes, which are based on information entropies and on suitable comparisons with random genomes, suggest five informational laws, which all of the considered genomes obey. Moreover, an informational genome complexity measure is proposed, which is a generalized logistic map that balances entropic and anti-entropic components of genomes and is related to their evolutionary dynamics. Finally, applications to computational synthetic biology are briefly outlined.

  17. Inferring Ancestral Recombination Graphs from Bacterial Genomic Data

    PubMed Central

    Vaughan, Timothy G.; Welch, David; Drummond, Alexei J.; Biggs, Patrick J.; George, Tessy; French, Nigel P.

    2017-01-01

    Homologous recombination is a central feature of bacterial evolution, yet it confounds traditional phylogenetic methods. While a number of methods specific to bacterial evolution have been developed, none of these permit joint inference of a bacterial recombination graph and associated parameters. In this article, we present a new method which addresses this shortcoming. Our method uses a novel Markov chain Monte Carlo algorithm to perform phylogenetic inference under the ClonalOrigin model. We demonstrate the utility of our method by applying it to ribosomal multilocus sequence typing data sequenced from pathogenic and nonpathogenic Escherichia coli serotype O157 and O26 isolates collected in rural New Zealand. The method is implemented as an open source BEAST 2 package, Bacter, which is available via the project web page at http://tgvaughan.github.io/bacter. PMID:28007885

  18. Efficient Exact Inference With Loss Augmented Objective in Structured Learning.

    PubMed

    Bauer, Alexander; Nakajima, Shinichi; Muller, Klaus-Robert

    2016-08-19

    Structural support vector machine (SVM) is an elegant approach for building complex and accurate models with structured outputs. However, its applicability relies on the availability of efficient inference algorithms--the state-of-the-art training algorithms repeatedly perform inference to compute a subgradient or to find the most violating configuration. In this paper, we propose an exact inference algorithm for maximizing nondecomposable objectives due to a special type of high-order potential having a decomposable internal structure. As an important application, our method covers the loss augmented inference, which enables the slack and margin scaling formulations of structural SVM with a variety of dissimilarity measures, e.g., Hamming loss, precision and recall, Fβ-loss, intersection over union, and many other functions that can be efficiently computed from the contingency table. We demonstrate the advantages of our approach in natural language parsing and sequence segmentation applications.
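
    The dissimilarity measures listed in the abstract above share one property: each is a function of the four contingency-table counts alone. The sketch below illustrates that point with the standard textbook definitions. It is not the paper's inference algorithm, and the function name and toy counts are hypothetical.

```python
def measures_from_contingency(tp, fp, fn, tn, beta=1.0):
    """Compute common measures from a binary contingency table.

    tp, fp, fn, tn: true positive, false positive, false negative and
    true negative counts. Ratios with a zero denominator are reported
    as 0.0 (a convention chosen for this sketch).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    b2 = beta * beta
    denom = b2 * precision + recall
    f_beta = (1 + b2) * precision * recall / denom if denom else 0.0
    union = tp + fp + fn
    iou = tp / union if union else 0.0
    return {
        "hamming_loss": fp + fn,      # number of mislabelled positions
        "precision": precision,
        "recall": recall,
        "f_beta": f_beta,
        "f_beta_loss": 1.0 - f_beta,  # loss form of the F-measure
        "iou": iou,
    }


# Toy counts for a hypothetical labelling of 20 positions.
m = measures_from_contingency(tp=6, fp=2, fn=3, tn=9)
```

    Because every value depends only on the counts, a search over outputs can be organized by count values instead of by individual labellings, which is the kind of structure the abstract refers to as a decomposable internal structure.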

  19. OMA 2011: orthology inference among 1000 complete genomes.

    PubMed

    Altenhoff, Adrian M; Schneider, Adrian; Gonnet, Gaston H; Dessimoz, Christophe

    2011-01-01

    OMA (Orthologous MAtrix) is a database that identifies orthologs among publicly available, complete genomes. Initiated in 2004, the project is at its 11th release. It now includes 1000 genomes, making it one of the largest resources of its kind. Here, we describe recent developments in terms of species covered; the algorithmic pipeline--in particular regarding the treatment of alternative splicing, and new features of the web (OMA Browser) and programming interface (SOAP API). In the second part, we review the various representations provided by OMA and their typical applications. The database is publicly accessible at http://omabrowser.org.

  20. Using Genetic Distance to Infer the Accuracy of Genomic Prediction

    PubMed Central

    Scutari, Marco; Mackay, Ian

    2016-01-01

    The prediction of phenotypic traits using high-density genomic data has many applications such as the selection of plants and animals of commercial interest; and it is expected to play an increasing role in medical diagnostics. Statistical models used for this task are usually tested using cross-validation, which implicitly assumes that new individuals (whose phenotypes we would like to predict) originate from the same population the genomic prediction model is trained on. In this paper we propose an approach based on clustering and resampling to investigate the effect of increasing genetic distance between training and target populations when predicting quantitative traits. This is important for plant and animal genetics, where genomic selection programs rely on the precision of predictions in future rounds of breeding. Therefore, estimating how quickly predictive accuracy decays is important in deciding which training population to use and how often the model has to be recalibrated. We find that the correlation between true and predicted values decays approximately linearly with respect to either FST or mean kinship between the training and the target populations. We illustrate this relationship using simulations and a collection of data sets from mice, wheat and human genetics. PMID:27589268

  1. Species Delimitation and Interspecific Relationships of the Genus Orychophragmus (Brassicaceae) Inferred from Whole Chloroplast Genomes

    PubMed Central

    Hu, Huan; Hu, Quanjun; Al-Shehbaz, Ihsan A.; Luo, Xin; Zeng, Tingting; Guo, Xinyi; Liu, Jianquan

    2016-01-01

    Genetic variations from few chloroplast DNA fragments show lower discriminatory power in the delimitation of closely related species and less resolution ability in discerning interspecific relationships than from nrITS. Here we use Orychophragmus (Brassicaceae) as a model system to test the hypothesis that the whole chloroplast genomes (plastomes), with accumulation of more variations despite the slow evolution, can overcome these weaknesses. We used Illumina sequencing technology via a reference-guided assembly to construct complete plastomes of 17 individuals from six putatively assumed species in the genus. All plastomes are highly conserved in genome structure, gene order, and orientation, and they are around 153 kb in length and contain 113 unique genes. However, nucleotide variations are quite substantial to support the delimitation of all sampled species and to resolve interspecific relationships with high statistical supports. As expected, the estimated divergences between major clades and species are lower than those estimated from nrITS probably due to the slow substitution rate of the plastomes. However, the plastome and nrITS phylogenies were contradictory in the placements of most species, thus suggesting that these species may have experienced complex non-bifurcating evolutions with incomplete lineage sorting and/or hybrid introgressions. Overall, our case study highlights the importance of using plastomes to examine species boundaries and establish an independent phylogeny to infer the speciation history of plants. PMID:27999584

  2. Structural Genomics of Protein Phosphatases

    SciTech Connect

    Almo, S.; Bonanno, J.; Sauder, J.; Emtage, S.; Dilorenzo, T.; Malashkevich, V.; Wasserman, S.; Swaminathan, S.; Eswaramoorthy, S.; et al

    2007-01-01

    The New York SGX Research Center for Structural Genomics (NYSGXRC) of the NIGMS Protein Structure Initiative (PSI) has applied its high-throughput X-ray crystallographic structure determination platform to systematic studies of all human protein phosphatases and protein phosphatases from biomedically-relevant pathogens. To date, the NYSGXRC has determined structures of 21 distinct protein phosphatases: 14 from human, 2 from mouse, 2 from the pathogen Toxoplasma gondii, 1 from Trypanosoma brucei, the parasite responsible for African sleeping sickness, and 2 from the principal mosquito vector of malaria in Africa, Anopheles gambiae. These structures provide insights into both normal and pathophysiologic processes, including transcriptional regulation, regulation of major signaling pathways, neural development, and type 1 diabetes. In conjunction with the contributions of other international structural genomics consortia, these efforts promise to provide an unprecedented database and materials repository for structure-guided experimental and computational discovery of inhibitors for all classes of protein phosphatases.

  3. Genome Size Variation and Species Relationships in Hieracium Sub-genus Pilosella (Asteraceae) as Inferred by Flow Cytometry

    PubMed Central

    Suda, Jan; Krahulcová, Anna; Trávníček, Pavel; Rosenbaumová, Radka; Peckert, Tomáš; Krahulec, František

    2007-01-01

    Background and Aims: Hieracium sub-genus Pilosella (hawkweeds) is a taxonomically complicated group of vascular plants, the structure of which is substantially influenced by frequent interspecific hybridization and polyploidization. Two kinds of species, ‘basic’ and ‘intermediate’ (i.e. hybridogenous), are usually recognized. In this study, genome size variation was investigated in a representative set of Central European hawkweeds in order to assess the value of such a data set for species delineation and inference of evolutionary relationships. Methods: Holoploid and monoploid genome sizes (C- and Cx-values) were determined using propidium iodide flow cytometry for 376 homogeneously cultivated individuals of Hieracium sub-genus Pilosella, including 24 species (271 individuals), five recent natural hybrids (seven individuals) and experimental F1 hybrids from four parental combinations (98 individuals). Chromosome counts were available for more than half of the plant accessions. Base composition (proportion of AT/GC bases) was cytometrically estimated in 73 individuals. Key Results: Seven different ploidy levels (2x–8x) were detected, with intraspecific ploidy polymorphism (up to four different cytotypes) occurring in 11 wild species. Mean 2C-values varied approx. 4·3-fold from 3·53 pg in diploid H. hoppeanum to 15·30 pg in octoploid H. brachiatum. 1Cx-values ranged from 1·72 pg in H. pilosella to 2·16 pg in H. echioides (1·26-fold). The DNA content of (high) polyploids was usually proportional to the DNA values of their diploid/low polyploid counterparts, indicating lack of processes altering genome size (i.e. genome down-sizing). Most species showed constant nuclear DNA amounts, exceptions being three hybridogenous taxa, in which introgressive hybridization was suggested as a presumable trigger for genome size variation. Monoploid genome sizes of hybridogenous species were always between the corresponding values of their putative parents. In addition

  4. Inferring Epidemic Contact Structure from Phylogenetic Trees

    PubMed Central

    Leventhal, Gabriel E.; Kouyos, Roger; Stadler, Tanja; von Wyl, Viktor; Yerly, Sabine; Böni, Jürg; Cellerai, Cristina; Klimkait, Thomas; Günthard, Huldrych F.; Bonhoeffer, Sebastian

    2012-01-01

    Contact structure is believed to have a large impact on epidemic spreading and consequently using networks to model such contact structure continues to gain interest in epidemiology. However, detailed knowledge of the exact contact structure underlying real epidemics is limited. Here we address the question whether the structure of the contact network leaves a detectable genetic fingerprint in the pathogen population. To this end we compare phylogenies generated by disease outbreaks in simulated populations with different types of contact networks. We find that the shape of these phylogenies strongly depends on contact structure. In particular, measures of tree imbalance allow us to quantify to what extent the contact structure underlying an epidemic deviates from a null model contact network and illustrate this in the case of random mixing. Using a phylogeny from the Swiss HIV epidemic, we show that this epidemic has a significantly more unbalanced tree than would be expected from random mixing. PMID:22412361
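    The abstract does not say which tree-imbalance measure was used. As one common choice, the sketch below computes the Sackin index (the sum of leaf depths) for a rooted tree written as nested tuples. The tree encoding and the two toy trees are assumptions made for illustration only.

```python
def sackin_index(tree, depth=0):
    """Sum of the depths of all leaves in a rooted tree.

    A tree is either a leaf (any non-tuple value) or a tuple of subtrees.
    Larger values indicate a more unbalanced tree for a fixed number of leaves.
    """
    if not isinstance(tree, tuple):
        # A leaf contributes its own depth (edges between it and the root).
        return depth
    return sum(sackin_index(child, depth + 1) for child in tree)


# Two toy four-leaf trees: fully balanced versus fully unbalanced ("caterpillar").
balanced = (("A", "B"), ("C", "D"))
caterpillar = ((("A", "B"), "C"), "D")

if __name__ == "__main__":
    print(sackin_index(balanced), sackin_index(caterpillar))
```

    For the same number of leaves, the caterpillar tree scores higher than the balanced one. This is the kind of contrast an imbalance statistic is meant to capture.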

  5. Inferring epidemic contact structure from phylogenetic trees.

    PubMed

    Leventhal, Gabriel E; Kouyos, Roger; Stadler, Tanja; Wyl, Viktor von; Yerly, Sabine; Böni, Jürg; Cellerai, Cristina; Klimkait, Thomas; Günthard, Huldrych F; Bonhoeffer, Sebastian

    2012-01-01

    Contact structure is believed to have a large impact on epidemic spreading and consequently using networks to model such contact structure continues to gain interest in epidemiology. However, detailed knowledge of the exact contact structure underlying real epidemics is limited. Here we address the question whether the structure of the contact network leaves a detectable genetic fingerprint in the pathogen population. To this end we compare phylogenies generated by disease outbreaks in simulated populations with different types of contact networks. We find that the shape of these phylogenies strongly depends on contact structure. In particular, measures of tree imbalance allow us to quantify to what extent the contact structure underlying an epidemic deviates from a null model contact network and illustrate this in the case of random mixing. Using a phylogeny from the Swiss HIV epidemic, we show that this epidemic has a significantly more unbalanced tree than would be expected from random mixing.

  6. Quantum inferring acausal structures and the Monty Hall problem

    NASA Astrophysics Data System (ADS)

    Kurzyk, Dariusz; Glos, Adam

    2016-12-01

    This paper presents a quantum version of the Monty Hall problem based upon the quantum inferring acausal structures, which can be identified with generalization of Bayesian networks. Considered structures are expressed in formalism of quantum information theory, where density operators are identified with quantum generalization of probability distributions. Conditional relations between quantum counterpart of random variables are described by quantum conditional operators. Presented quantum inferring structures are used to construct a model inspired by scenario of well-known Monty Hall game, where we show the differences between classical and quantum Bayesian reasoning.
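    For reference, the classical Bayesian reasoning that the quantum model is compared against can be sketched with exact arithmetic. The function below covers only the standard three-door game, with a host who always opens a goat door and chooses at random when two goat doors are available. It is not the paper's quantum construction, and the names are illustrative.

```python
from fractions import Fraction


def monty_hall_posterior(picked=0, opened=2):
    """Posterior probability of the car's location in the three-door game.

    Assumes the host never opens the picked door or the car door, and
    picks uniformly at random when two goat doors are available.
    """
    prior = Fraction(1, 3)
    joint = {}
    for car in range(3):
        if car == opened:
            likelihood = Fraction(0)      # host never reveals the car
        elif car == picked:
            likelihood = Fraction(1, 2)   # host had two goat doors to choose from
        else:
            likelihood = Fraction(1)      # host was forced to open this door
        joint[car] = prior * likelihood
    evidence = sum(joint.values())
    # Bayes' rule: normalise the joint probabilities by the evidence.
    return {car: p / evidence for car, p in joint.items()}


if __name__ == "__main__":
    print(monty_hall_posterior())
```

    With the default arguments, the picked door keeps probability 1/3 and the remaining closed door has 2/3, which is the familiar argument for switching.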

  7. The History of Slavs Inferred from Complete Mitochondrial Genome Sequences

    PubMed Central

    Mielnik-Sikorska, Marta; Daca, Patrycja; Malyarchuk, Boris; Derenko, Miroslava; Skonieczna, Katarzyna; Perkova, Maria; Dobosz, Tadeusz; Grzybowski, Tomasz

    2013-01-01

    To shed more light on the processes leading to crystallization of a Slavic identity, we investigated variability of complete mitochondrial genomes belonging to haplogroups H5 and H6 (63 mtDNA genomes) from the populations of Eastern and Western Slavs, including new samples of Poles, Ukrainians and Czechs presented here. Molecular dating implies formation of H5 approximately 11.5–16 thousand years ago (kya) in the areas of southern Europe. Within ancient haplogroup H6, dated at around 15–28 kya, there is a subhaplogroup H6c, which probably survived the last glaciation in Europe and has undergone expansion only 3–4 kya, together with the ancestors of some European groups, including the Slavs, because H6c has been detected in Czechs, Poles and Slovaks. Detailed analysis of complete mtDNAs allowed us to identify a number of lineages that seem specific for Central and Eastern Europe (H5a1f, H5a2, H5a1r, H5a1s, H5b4, H5e1a, H5u1, some subbranches of H5a1a and H6a1a9). Some of them could possibly be traced back to at least ∼4 kya, which indicates that some of the ancestors of today's Slavs (Poles, Czechs, Slovaks, Ukrainians and Russians) inhabited areas of Central and Eastern Europe much earlier than it was estimated on the basis of archaeological and historical data. We also sequenced entire mitochondrial genomes of several non-European lineages (A, C, D, G, L) found in contemporary populations of Poland and Ukraine. The analysis of these haplogroups confirms the presence of Siberian (C5c1, A8a1) and Ashkenazi-specific (L2a1l2a) mtDNA lineages in Slavic populations. Moreover, we were able to pinpoint some lineages which could possibly reflect the relatively recent contacts of Slavs with nomadic Altaic peoples (C4a1a, G2a, D5a2a1a1). PMID:23342138

  8. Covariance Between Genotypic Effects and its Use for Genomic Inference in Half-Sib Families

    PubMed Central

    Wittenburg, Dörte; Teuscher, Friedrich; Klosa, Jan; Reinsch, Norbert

    2016-01-01

    In livestock, current statistical approaches utilize extensive molecular data, e.g., single nucleotide polymorphisms (SNPs), to improve the genetic evaluation of individuals. The number of model parameters increases with the number of SNPs, so the multicollinearity between covariates can affect the results obtained using whole genome regression methods. In this study, dependencies between SNPs due to linkage and linkage disequilibrium among the chromosome segments were explicitly considered in methods used to estimate the effects of SNPs. The population structure affects the extent of such dependencies, so the covariance among SNP genotypes was derived for half-sib families, which are typical in livestock populations. Conditional on the SNP haplotypes of the common parent (sire), the theoretical covariance was determined using the haplotype frequencies of the population from which the individual parent (dam) was derived. The resulting covariance matrix was included in a statistical model for a trait of interest, and this covariance matrix was then used to specify prior assumptions for SNP effects in a Bayesian framework. The approach was applied to one family in simulated scenarios (few and many quantitative trait loci) and using semireal data obtained from dairy cattle to identify genome segments that affect performance traits, as well as to investigate the impact on predictive ability. Compared with a method that does not explicitly consider any of the relationship among predictor variables, the accuracy of genetic value prediction was improved by 10–22%. The results show that the inclusion of dependence is particularly important for genomic inference based on small sample sizes. PMID:27402363

  9. Covariance Between Genotypic Effects and its Use for Genomic Inference in Half-Sib Families.

    PubMed

    Wittenburg, Dörte; Teuscher, Friedrich; Klosa, Jan; Reinsch, Norbert

    2016-09-08

    In livestock, current statistical approaches utilize extensive molecular data, e.g., single nucleotide polymorphisms (SNPs), to improve the genetic evaluation of individuals. The number of model parameters increases with the number of SNPs, so the multicollinearity between covariates can affect the results obtained using whole genome regression methods. In this study, dependencies between SNPs due to linkage and linkage disequilibrium among the chromosome segments were explicitly considered in methods used to estimate the effects of SNPs. The population structure affects the extent of such dependencies, so the covariance among SNP genotypes was derived for half-sib families, which are typical in livestock populations. Conditional on the SNP haplotypes of the common parent (sire), the theoretical covariance was determined using the haplotype frequencies of the population from which the individual parent (dam) was derived. The resulting covariance matrix was included in a statistical model for a trait of interest, and this covariance matrix was then used to specify prior assumptions for SNP effects in a Bayesian framework. The approach was applied to one family in simulated scenarios (few and many quantitative trait loci) and using semireal data obtained from dairy cattle to identify genome segments that affect performance traits, as well as to investigate the impact on predictive ability. Compared with a method that does not explicitly consider any of the relationship among predictor variables, the accuracy of genetic value prediction was improved by 10-22%. The results show that the inclusion of dependence is particularly important for genomic inference based on small sample sizes.

  10. Inference of gorilla demographic and selective history from whole-genome sequence data.

    PubMed

    McManus, Kimberly F; Kelley, Joanna L; Song, Shiya; Veeramah, Krishna R; Woerner, August E; Stevison, Laurie S; Ryder, Oliver A; Great Ape Genome Project; Kidd, Jeffrey M; Wall, Jeffrey D; Bustamante, Carlos D; Hammer, Michael F

    2015-03-01

    Although population-level genomic sequence data have been gathered extensively for humans, similar data from our closest living relatives are just beginning to emerge. Examination of genomic variation within great apes offers many opportunities to increase our understanding of the forces that have differentially shaped the evolutionary history of hominid taxa. Here, we expand upon the work of the Great Ape Genome Project by analyzing medium to high coverage whole-genome sequences from 14 western lowland gorillas (Gorilla gorilla gorilla), 2 eastern lowland gorillas (G. beringei graueri), and a single Cross River individual (G. gorilla diehli). We infer that the ancestors of western and eastern lowland gorillas diverged from a common ancestor approximately 261 ka, and that the ancestors of the Cross River population diverged from the western lowland gorilla lineage approximately 68 ka. Using a diffusion approximation approach to model the genome-wide site frequency spectrum, we infer a history of western lowland gorillas that includes an ancestral population expansion of 1.4-fold around 970 ka and a recent 5.6-fold contraction in population size 23 ka. The latter may correspond to a major reduction in African equatorial forests around the Last Glacial Maximum. We also analyze patterns of variation among western lowland gorillas to identify several genomic regions with strong signatures of recent selective sweeps. We find that processes related to taste, pancreatic and saliva secretion, sodium ion transmembrane transport, and cardiac muscle function are overrepresented in genomic regions predicted to have experienced recent positive selection.
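    The site frequency spectrum that the demographic model is fitted to is a simple summary of the data. A minimal sketch of how an unfolded spectrum could be tabulated from a 0/1 haplotype matrix follows. The input layout (rows are haplotypes, columns are sites, 1 marks the derived allele) and the toy data are assumptions, not the authors' pipeline.

```python
from collections import Counter


def site_frequency_spectrum(haplotypes):
    """Unfolded SFS: entry k-1 counts sites where k haplotypes carry the derived allele.

    Only segregating classes k = 1 .. n-1 are reported, where n is the
    number of haplotypes; fixed and absent sites are ignored.
    """
    n = len(haplotypes)
    # Transpose the matrix so that each tuple holds one site across all haplotypes.
    derived_counts = [sum(site) for site in zip(*haplotypes)]
    counts = Counter(derived_counts)
    return [counts.get(k, 0) for k in range(1, n)]


# Toy data: 4 haplotypes by 4 sites.
toy = [
    [1, 0, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

if __name__ == "__main__":
    print(site_frequency_spectrum(toy))
```

    In the toy matrix, three sites are singletons and one site has the derived allele on three haplotypes.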

  11. Sigma: Strain-level inference of genomes from metagenomic analysis for biosurveillance

    SciTech Connect

    Ahn, Tae-Hyuk; Chai, Juanjuan; Pan, Chongle

    2014-09-29

    Motivation: Metagenomic sequencing of clinical samples provides a promising technique for direct pathogen detection and characterization in biosurveillance. Taxonomic analysis at the strain level can be used to resolve serotypes of a pathogen in biosurveillance. Sigma was developed for strain-level identification and quantification of pathogens using their reference genomes based on metagenomic analysis. Results: Sigma provides not only accurate strain-level inferences, but also three unique capabilities: (i) Sigma quantifies the statistical uncertainty of its inferences, which includes hypothesis testing of identified genomes and confidence interval estimation of their relative abundances; (ii) Sigma enables strain variant calling by assigning metagenomic reads to their most likely reference genomes; and (iii) Sigma supports parallel computing for fast analysis of large datasets. In conclusion, the algorithm performance was evaluated using simulated mock communities and fecal samples with spike-in pathogen strains. Availability and Implementation: Sigma was implemented in C++ with source codes and binaries freely available at http://sigma.omicsbio.org.

  12. Genealogical lineage sorting leads to significant, but incorrect Bayesian multilocus inference of population structure

    PubMed Central

    Orozco-terWengel, Pablo; Corander, Jukka; Schlötterer, Christian

    2011-01-01

    Over the past decades, the use of molecular markers has revolutionized biology and led to the foundation of a new research discipline—phylogeography. Of particular interest has been the inference of population structure and biogeography. While initial studies focused on mtDNA as a molecular marker, it has become apparent that selection and genealogical lineage sorting could lead to erroneous inferences. As it is not clear to what extent these forces affect a given marker, it has become common practice to use the combined evidence from a set of molecular markers as an attempt to recover the signals that approximate the true underlying demography. Typically, the number of markers used is determined by either budget constraints or by statistical power required to recognize significant population differentiation. Using microsatellite markers from Drosophila and humans, we show that even large numbers of loci (>50) can frequently result in statistically well-supported, but incorrect inference of population structure using the software BAPS. Most importantly, genomic features, such as chromosomal location, variability of the markers, or recombination rate, cannot explain this observation. Instead, it can be attributed to sampling variation among loci with different realizations of the stochastic lineage sorting. This phenomenon is particularly pronounced for low levels of population differentiation. Our results have important implications for ongoing studies of population differentiation, as we unambiguously demonstrate that statistical significance of population structure inferred from a random set of genetic markers cannot necessarily be taken as evidence for a reliable demographic inference. PMID:21244537

  13. Structure and inference in annotated networks

    NASA Astrophysics Data System (ADS)

    Newman, M. E. J.; Clauset, Aaron

    2016-06-01

    For many networks of scientific interest we know both the connections of the network and information about the network nodes, such as the age or gender of individuals in a social network. Here we demonstrate how this ‘metadata’ can be used to improve our understanding of network structure. We focus in particular on the problem of community detection in networks and develop a mathematically principled approach that combines a network and its metadata to detect communities more accurately than can be done with either alone. Crucially, the method does not assume that the metadata are correlated with the communities we are trying to find. Instead, the method learns whether a correlation exists and correctly uses or ignores the metadata depending on whether they contain useful information. We demonstrate our method on synthetic networks with known structure and on real-world networks, large and small, drawn from social, biological and technological domains.

  14. Structure and inference in annotated networks

    PubMed Central

    Newman, M. E. J.; Clauset, Aaron

    2016-01-01

    For many networks of scientific interest we know both the connections of the network and information about the network nodes, such as the age or gender of individuals in a social network. Here we demonstrate how this ‘metadata’ can be used to improve our understanding of network structure. We focus in particular on the problem of community detection in networks and develop a mathematically principled approach that combines a network and its metadata to detect communities more accurately than can be done with either alone. Crucially, the method does not assume that the metadata are correlated with the communities we are trying to find. Instead, the method learns whether a correlation exists and correctly uses or ignores the metadata depending on whether they contain useful information. We demonstrate our method on synthetic networks with known structure and on real-world networks, large and small, drawn from social, biological and technological domains. PMID:27306566

  15. Insights and inferences about integron evolution from genomic data

    PubMed Central

    Nemergut, Diana R; Robeson, Michael S; Kysela, Robert F; Martin, Andrew P; Schmidt, Steven K; Knight, Rob

    2008-01-01

    Background: Integrons are mechanisms that facilitate horizontal gene transfer, allowing bacteria to integrate and express foreign DNA. These are important in the exchange of antibiotic resistance determinants, but can also transfer a diverse suite of genes unrelated to pathogenicity. Here, we provide a systematic analysis of the distribution and diversity of integron intI genes and integron-containing bacteria. Results: We found integrons in 103 different pathogenic and non-pathogenic bacteria, in six major phyla. Integrons were widely scattered, and their presence was not confined to specific clades within bacterial orders. Nearly 1/3 of the intI genes that we identified were pseudogenes, containing either an internal stop codon or a frameshift mutation that would render the protein product non-functional. Additionally, 20% of bacteria contained more than one integrase gene. dN/dS ratios revealed mutational hotspots in clades of Vibrio and Shewanella intI genes. Finally, we characterized the gene cassettes associated with integrons in Methylobacillus flagellatus KT and Dechloromonas aromatica RCB, and found a heavy metal efflux gene as well as genes involved in protein folding and stability. Conclusion: Our analysis suggests that the present distribution of integrons is due to multiple losses and gene transfer events. While, in some cases, the ability to integrate and excise foreign DNA may be selectively advantageous, the gain, loss, or rearrangement of gene cassettes could also be deleterious, selecting against functional integrases. Thus, such a high fraction of pseudogenes may suggest that the selective impact of integrons on genomes is variable, oscillating between beneficial and deleterious, possibly depending on environmental conditions. PMID:18513439

  16. Untangling statistical and biological models to understand network inference: the need for a genomics network ontology.

    PubMed

    Emmert-Streib, Frank; Dehmer, Matthias; Haibe-Kains, Benjamin

    2014-01-01

    In this paper, we shed light on approaches that are currently used to infer networks from gene expression data with respect to their biological meaning. As we will show, the biological interpretation of these networks depends on the chosen theoretical perspective. For this reason, we distinguish a statistical perspective from a mathematical modeling perspective and elaborate their differences and implications. Our results indicate the imperative need for a genomic network ontology in order to avoid increasing confusion about the biological interpretation of inferred networks, which can be even enhanced by approaches that integrate multiple data sets, respectively, data types.

  17. Comparative analysis of mitochondrial genomes in Diplura (hexapoda, arthropoda): taxon sampling is crucial for phylogenetic inferences.

    PubMed

    Chen, Wan-Jun; Koch, Markus; Mallatt, Jon M; Luan, Yun-Xia

    2014-01-01

    Two-pronged bristletails (Diplura) are traditionally classified into three major superfamilies: Campodeoidea, Projapygoidea, and Japygoidea. The interrelationships of these three superfamilies and the monophyly of Diplura have been much debated. Few previous studies included Projapygoidea in their phylogenetic considerations, and its position within Diplura still is a puzzle from both morphological and molecular points of view. Until now, no mitochondrial genome has been sequenced for any projapygoid species. To fill in this gap, we determined and annotated the complete mitochondrial genome of Octostigma sinensis (Octostigmatidae, Projapygoidea), and of three more dipluran species, one each from the Campodeidae, Parajapygidae, and Japygidae. All four newly sequenced dipluran mtDNAs encode the same set of genes in the same gene order as shared by most crustaceans and hexapods. Secondary structure truncations have occurred in trnR, trnC, trnS1, and trnS2, and the reduction of transfer RNA D-arms was found to be taxonomically correlated, with Campodeoidea having experienced the most reduction. Partitioned phylogenetic analyses, based on both amino acids and nucleotides of the protein-coding genes plus the ribosomal RNA genes, retrieve significant support for a monophyletic Diplura within Pancrustacea, with Projapygoidea more closely related to Campodeoidea than to Japygoidea. Another key finding is that monophyly of Diplura cannot be recovered unless Projapygoidea is included in the phylogenetic analyses; this explains the dipluran polyphyly found by past mitogenomic studies. Including Projapygoidea increased the sampling density within Diplura and probably helped by breaking up a long-branch-attraction artifact. This finding provides an example of how proper sampling is significant for phylogenetic inference.

  18. Streamlining and Large Ancestral Genomes in Archaea Inferred with a Phylogenetic Birth-and-Death Model

    PubMed Central

    Miklós, István

    2009-01-01

    Homologous genes originate from a common ancestor through vertical inheritance, duplication, or horizontal gene transfer. Entire homolog families spawned by a single ancestral gene can be identified across multiple genomes based on protein sequence similarity. The sequences, however, do not always reveal conclusively the history of large families. To study the evolution of complete gene repertoires, we propose here a mathematical framework that does not rely on resolved gene family histories. We show that so-called phylogenetic profiles, formed by family sizes across multiple genomes, are sufficient to infer principal evolutionary trends. The main novelty in our approach is an efficient algorithm to compute the likelihood of a phylogenetic profile in a model of birth-and-death processes acting on a phylogeny. We examine known gene families in 28 archaeal genomes using a probabilistic model that involves lineage- and family-specific components of gene acquisition, duplication, and loss. The model enables us to consider all possible histories when inferring statistics about archaeal evolution. According to our reconstruction, most lineages are characterized by a net loss of gene families. Major increases in gene repertoire have occurred only a few times. Our reconstruction underlines the importance of persistent streamlining processes in shaping genome composition in Archaea. It also suggests that early archaeal genomes were as complex as typical modern ones, and even show signs, in the case of the methanogenic ancestor, of an extremely large gene repertoire. PMID:19570746

  19. Structural Genomics on the Web

    PubMed Central

    Wixon, Jo

    2001-01-01

    In this review we provide a brief guide to some of the resources and databases that can be used to locate information and aid research in the growing field of structural genomics. The review will provide examples, for less experienced users, of what can be achieved using a selection of the available sites. We hope that this will encourage you to use these sites to their full potential and whet your appetite to search for other related sites. PMID:18628900

  20. Microarray Data Processing Techniques for Genome-Scale Network Inference from Large Public Repositories.

    PubMed

    Chockalingam, Sriram; Aluru, Maneesha; Aluru, Srinivas

    2016-09-19

    Pre-processing of microarray data is a well-studied problem. Furthermore, all popular platforms come with their own recommended best practices for differential analysis of genes. However, for genome-scale network inference using microarray data collected from large public repositories, these methods filter out a considerable number of genes. This is primarily due to the effects of aggregating a diverse array of experiments with different technical and biological scenarios. Here we introduce a pre-processing pipeline suitable for inferring genome-scale gene networks from large microarray datasets. We show that partitioning of the available microarray datasets according to biological relevance into tissue- and process-specific categories significantly extends the limits of downstream network construction. We demonstrate the effectiveness of our pre-processing pipeline by inferring genome-scale networks for the model plant Arabidopsis thaliana using two different construction methods and a collection of 11,760 Affymetrix ATH1 microarray chips. Our pre-processing pipeline and the datasets used in this paper are made available at http://alurulab.cc.gatech.edu/microarray-pp.

  1. 2004 Structural, Function and Evolutionary Genomics

    SciTech Connect

    Douglas L. Brutlag; Nancy Ryan Gray

    2005-03-23

    This Gordon conference will cover the areas of structural, functional and evolutionary genomics. It will take a systematic approach to genomics, examining the evolution of proteins, protein functional sites, protein-protein interactions, regulatory networks, and metabolic networks. Emphasis will be placed on what we can learn from comparative genomics and entire genomes and proteomes.

  2. Causal inference and the hierarchical structure of experience

    PubMed Central

    Johnson, Samuel G. B.; Keil, Frank C.

    2014-01-01

    Children and adults make rich causal inferences about the physical and social world, even in novel situations where they cannot rely on prior knowledge of causal mechanisms. We propose that this capacity is supported in part by constraints provided by event structure—the cognitive organization of experience into discrete events that are hierarchically organized. These event-structured causal inferences are guided by a level-matching principle, with events conceptualized at one level of an event hierarchy causally matched to other events at that same level, and a boundary-blocking principle, with events causally matched to other events that are parts of the same superordinate event. These principles are used to constrain inferences about plausible causal candidates in unfamiliar situations, both in diagnosing causes (Experiment 1) and predicting effects (Experiment 2). The results could not be explained by construal level (Experiment 3) or similarity-matching (Experiment 4), and were robust across a variety of physical and social causal systems. Taken together, these experiments demonstrate a novel way in which non-causal information we extract from the environment can help to constrain inferences about causal structure. PMID:25347533

  3. Towards the unification of inference structures in medical diagnostic tasks.

    PubMed

    Mira, J; Rives, J; Delgado, A E; Martínez, R

    1998-01-01

    The central purpose of artificial intelligence applied to medicine is to develop models for diagnosis and therapy planning at the knowledge level, in the Newell sense, and software environments to facilitate the reduction of these models to the symbol level. The usual methodology (KADS, Common-KADS, GAMES, HELIOS, Protégé, etc) has been to develop libraries of generic tasks and reusable problem-solving methods with explicit ontologies. The principal problem which clinicians have with these methodological developments concerns the diversity and complexity of new terms whose meaning is not sufficiently clear, precise, unambiguous and consensual for them to be accessible in the daily clinical environment. As a contribution to the solution of this problem, we develop in this article the conjecture that one inference structure is enough to describe the set of analysis tasks associated with medical diagnoses. To this end, we first propose a modification of the systematic diagnostic inference scheme to obtain an analysis generic task and then compare it with the monitoring and the heuristic classification task inference schemes using as comparison criteria the compatibility of domain roles (data structures), the similarity in the inferences, and the commonality in the set of assumptions which underlie the functionally equivalent models. The equivalences proposed are illustrated with several examples. Note that though our ongoing work aims to simplify the methodology and to increase the precision of the terms used, the proposal presented here should be viewed more in the nature of a conjecture.

  4. Robust Inference of Genetic Exchange Communities from Microbial Genomes Using TF-IDF

    PubMed Central

    Cong, Yingnan; Chan, Yao-ban; Phillips, Charles A.; Langston, Michael A.; Ragan, Mark A.

    2017-01-01

    Bacteria and archaea can exchange genetic material across lineages through processes of lateral genetic transfer (LGT). Collectively, these exchange relationships can be modeled as a network and analyzed using concepts from graph theory. In particular, densely connected regions within an LGT network have been defined as genetic exchange communities (GECs). However, it has been problematic to construct networks in which edges solely represent LGT. Here we apply term frequency-inverse document frequency (TF-IDF), an alignment-free method originating from document analysis, to infer regions of lateral origin in bacterial genomes. We examine four empirical datasets of different size (number of genomes) and phyletic breadth, varying a key parameter (word length k) within bounds established in previous work. We map the inferred lateral regions to genes in recipient genomes, and construct networks in which the nodes are groups of genomes, and the edges natively represent LGT. We then extract maximum and maximal cliques (i.e., GECs) from these graphs, and identify nodes that belong to GECs across a wide range of k. Most surviving lateral transfer has happened within these GECs. Using Gene Ontology enrichment tests we demonstrate that biological processes associated with metabolism, regulation and transport are often over-represented among the genes affected by LGT within these communities. These enrichments are largely robust to change of k. PMID:28154557
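
    As a rough illustration of the scoring idea in the abstract above, the sketch below treats each genome as a "document" and each word of length k as a "term", then computes a generic TF-IDF score. It is a minimal sketch, not the authors' pipeline: the function names, the tf and idf definitions, and the toy sequences are assumptions made for illustration.

```python
import math
from collections import Counter


def kmers(seq, k):
    """Return all overlapping words of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tfidf(genomes, k):
    """Generic TF-IDF per k-mer for each genome (dict: name -> sequence).

    Assumed definitions (textbook form, not taken from the paper):
      tf  = count of the k-mer in the genome / total k-mers in that genome
      idf = log(N / number of genomes that contain the k-mer)
    """
    counts = {name: Counter(kmers(seq, k)) for name, seq in genomes.items()}
    n = len(genomes)
    doc_freq = Counter()
    for c in counts.values():
        doc_freq.update(c.keys())  # each genome contributes once per k-mer
    scores = {}
    for name, c in counts.items():
        total = sum(c.values())
        scores[name] = {
            w: (cnt / total) * math.log(n / doc_freq[w]) for w, cnt in c.items()
        }
    return scores


# Toy usage: "AC" occurs only in genome b, so it is the only word with a non-zero score.
scores = tfidf({"a": "AAAA", "b": "AAAC"}, k=2)
```

    Words shared by every genome score zero under this definition, which is why rare words stand out as candidates for regions of unusual origin.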

  5. ClonalFrameML: Efficient Inference of Recombination in Whole Bacterial Genomes

    PubMed Central

    Didelot, Xavier; Wilson, Daniel J.

    2015-01-01

    Recombination is an important evolutionary force in bacteria, but it remains challenging to reconstruct the imports that occurred in the ancestry of a genomic sample. Here we present ClonalFrameML, which uses maximum likelihood inference to simultaneously detect recombination in bacterial genomes and account for it in phylogenetic reconstruction. ClonalFrameML can analyse hundreds of genomes in a matter of hours, and we demonstrate its usefulness on simulated and real datasets. We find evidence for recombination hotspots associated with mobile elements in Clostridium difficile ST6 and a previously undescribed 310kb chromosomal replacement in Staphylococcus aureus ST582. ClonalFrameML is freely available at http://clonalframeml.googlecode.com/. PMID:25675341

  6. The feasibility of genome-scale biological network inference using Graphics Processing Units.

    PubMed

    Thiagarajan, Raghuram; Alavi, Amir; Podichetty, Jagdeep T; Bazil, Jason N; Beard, Daniel A

    2017-01-01

    Systems research spanning fields from biology to finance involves the identification of models to represent the underpinnings of complex systems. Formal approaches for data-driven identification of network interactions include statistical inference-based approaches and methods to identify dynamical systems models that are capable of fitting multivariate data. Availability of large data sets and so-called 'big data' applications in biology present great opportunities as well as major challenges for systems identification/reverse engineering applications. For example, both inverse identification and forward simulations of genome-scale gene regulatory network models pose compute-intensive problems. This issue is addressed here by combining the processing power of Graphics Processing Units (GPUs) and a parallel reverse engineering algorithm for inference of regulatory networks. It is shown that, given an appropriate data set, information on genome-scale networks (systems of 1000 or more state variables) can be inferred using a reverse-engineering algorithm in a matter of days on a small-scale modern GPU cluster.

  7. Inferring drug-disease associations from integration of chemical, genomic and phenotype data using network propagation

    PubMed Central

    2013-01-01

    Background: In recent years, knowledge about drugs, disease phenotypes and proteins has accumulated rapidly, and a growing number of researchers have turned to inferring drug-disease associations by computational methods. Developing an integrated approach for systematically discovering drug-disease associations from these data is an important issue. Methods: We combine three different networks of drug, genomic and disease phenotype data and assign weights to the edges from available experimental data and knowledge. Given a specific disease, we use our network propagation approach to infer the drug-disease associations. Results: We use prostate cancer and colorectal cancer as our test data. We use the manually curated drug-disease associations from the Comparative Toxicogenomics Database as our benchmark. The ranked results show that our proposed method obtains higher specificity and sensitivity and clearly outperforms previous methods. Our results also show that our method with off-target information achieves higher performance than that with only primary drug targets on both test data sets. Conclusions: We clearly demonstrate the feasibility and benefits of using network-based analyses of chemical, genomic and phenotype data to reveal drug-disease associations. The potential associations inferred by our method provide new perspectives for toxicogenomics and drug repositioning evaluation. PMID:24565337
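
    The abstract does not state the update rule behind its network propagation step. One common formulation is a random walk with restart, sketched below on a tiny weighted graph. The choice of random walk with restart, the function name, the restart probability, and the toy nodes are all assumptions for illustration and are not taken from the paper.

```python
def propagate(adj, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart on a weighted undirected graph.

    adj  : dict node -> dict neighbour -> weight (assumed symmetric)
    seed : dict node -> initial score (e.g. 1.0 for the query disease)
    Returns a dict node -> steady-state score.
    """
    nodes = list(adj)
    total = sum(seed.values())
    p0 = {n: seed.get(n, 0.0) / total for n in nodes}
    out_weight = {n: sum(adj[n].values()) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart term: a fixed share of the score always returns to the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                continue  # isolated node: nothing to spread
            for v, w in adj[u].items():
                nxt[v] += (1 - restart) * p[u] * w / out_weight[u]
        diff = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if diff < tol:
            break
    return p


# Toy usage: disease D is linked to gene G, which is linked to drug X; drug Y is unconnected.
graph = {"D": {"G": 1.0}, "G": {"D": 1.0, "X": 1.0}, "X": {"G": 1.0}, "Y": {}}
ranking = propagate(graph, {"D": 1.0})
```

    In this toy graph drug X receives a positive score through the shared gene while drug Y stays at zero, which is the kind of ranking a propagation approach would use to prioritise candidate associations.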

  8. Bayesian Inference for Latent Biologic Structure with Determinantal Point Processes (DPP)

    PubMed Central

    Xu, Yanxun; Müller, Peter; Telesca, Donatello

    2016-01-01

    Summary We discuss the use of the determinantal point process (DPP) as a prior for latent structure in biomedical applications, where inference often centers on the interpretation of latent features as biologically or clinically meaningful structure. Typical examples include mixture models, when the terms of the mixture are meant to represent clinically meaningful subpopulations (of patients, genes, etc.). Another class of examples are feature allocation models. We propose the DPP prior as a repulsive prior on latent mixture components in the first example, and as prior on feature-specific parameters in the second case. We argue that the DPP is in general an attractive prior model for latent structure when biologically relevant interpretation of such structure is desired. We illustrate the advantages of DPP prior in three case studies, including inference in mixture models for magnetic resonance images (MRI) and for protein expression, and a feature allocation model for gene expression using data from The Cancer Genome Atlas. An important part of our argument are efficient and straightforward posterior simulation methods. We implement a variation of reversible jump Markov chain Monte Carlo simulation for inference under the DPP prior, using a density with respect to the unit rate Poisson process. PMID:26873271

  9. Sigma: Strain-level inference of genomes from metagenomic analysis for biosurveillance

    PubMed Central

    Ahn, Tae-Hyuk; Chai, Juanjuan; Pan, Chongle

    2015-01-01

    Motivation: Metagenomic sequencing of clinical samples provides a promising technique for direct pathogen detection and characterization in biosurveillance. Taxonomic analysis at the strain level can be used to resolve serotypes of a pathogen in biosurveillance. Sigma was developed for strain-level identification and quantification of pathogens using their reference genomes based on metagenomic analysis. Results: Sigma provides not only accurate strain-level inferences, but also three unique capabilities: (i) Sigma quantifies the statistical uncertainty of its inferences, which includes hypothesis testing of identified genomes and confidence interval estimation of their relative abundances; (ii) Sigma enables strain variant calling by assigning metagenomic reads to their most likely reference genomes; and (iii) Sigma supports parallel computing for fast analysis of large datasets. The algorithm performance was evaluated using simulated mock communities and fecal samples with spike-in pathogen strains. Availability and Implementation: Sigma was implemented in C++ with source codes and binaries freely available at http://sigma.omicsbio.org. Contact: panc@ornl.gov Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25266224

  10. Sigma: Strain-level inference of genomes from metagenomic analysis for biosurveillance

    DOE PAGES

    Ahn, Tae-Hyuk; Chai, Juanjuan; Pan, Chongle

    2014-09-29

    Motivation: Metagenomic sequencing of clinical samples provides a promising technique for direct pathogen detection and characterization in biosurveillance. Taxonomic analysis at the strain level can be used to resolve serotypes of a pathogen in biosurveillance. Sigma was developed for strain-level identification and quantification of pathogens using their reference genomes based on metagenomic analysis. Results: Sigma provides not only accurate strain-level inferences, but also three unique capabilities: (i) Sigma quantifies the statistical uncertainty of its inferences, which includes hypothesis testing of identified genomes and confidence interval estimation of their relative abundances; (ii) Sigma enables strain variant calling by assigning metagenomic reads to their most likely reference genomes; and (iii) Sigma supports parallel computing for fast analysis of large datasets. In conclusion, the algorithm performance was evaluated using simulated mock communities and fecal samples with spike-in pathogen strains. Availability and Implementation: Sigma was implemented in C++ with source codes and binaries freely available at http://sigma.omicsbio.org.

  11. Inferring Bottlenecks from Genome-Wide Samples of Short Sequence Blocks

    PubMed Central

    Bunnefeld, Lynsey; Frantz, Laurent A. F.; Lohse, Konrad

    2015-01-01

    The advent of the genomic era has necessitated the development of methods capable of analyzing large volumes of genomic data efficiently. Being able to reliably identify bottlenecks—extreme population size changes of short duration—not only is interesting in the context of speciation and extinction but also matters (as a null model) when inferring selection. Bottlenecks can be detected in polymorphism data via their distorting effect on the shape of the underlying genealogy. Here, we use the generating function of genealogies to derive the probability of mutational configurations in short sequence blocks under a simple bottleneck model. Given a large number of nonrecombining blocks, we can compute maximum-likelihood estimates of the time and strength of the bottleneck. Our method relies on a simple summary of the joint distribution of polymorphic sites. We extend the site frequency spectrum by counting mutations in frequency classes in short sequence blocks. Using linkage information over short distances in this way gives greater power to detect bottlenecks than the site frequency spectrum and potentially opens up a wide range of demographic histories to blockwise inference. Finally, we apply our method to genomic data from a species of pig (Sus cebifrons) endemic to islands in the center and west of the Philippines to estimate whether a bottleneck occurred upon island colonization and compare our scheme to Li and Durbin’s pairwise sequentially Markovian coalescent (PSMC) both for the pig data and using simulations. PMID:26341659
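
    To make the summary statistic in the abstract above concrete, the sketch below computes an unfolded site frequency spectrum from a small 0/1 haplotype matrix and then repeats the count separately for consecutive blocks. It is only a minimal sketch: the function names, the fixed block length, and the toy data are assumptions, and the authors' blockwise summary may differ in detail.

```python
def site_frequency_spectrum(haps):
    """Unfolded SFS: entry i-1 counts sites where the derived allele (1)
    is carried by exactly i of the n sampled haplotypes (0 < i < n)."""
    n = len(haps)
    sfs = [0] * (n - 1)
    for site in zip(*haps):  # iterate over alignment columns
        derived = sum(site)
        if 0 < derived < n:  # skip monomorphic and fixed sites
            sfs[derived - 1] += 1
    return sfs


def blockwise_sfs(haps, block_len):
    """Split the alignment into consecutive blocks and return one SFS per block."""
    length = len(haps[0])
    return [
        site_frequency_spectrum([h[start:start + block_len] for h in haps])
        for start in range(0, length, block_len)
    ]


# Toy usage: three haplotypes, four sites, blocks of two sites each.
haps = [[0, 1, 1, 0],
        [0, 1, 0, 0],
        [1, 1, 0, 0]]
whole = site_frequency_spectrum(haps)
per_block = blockwise_sfs(haps, 2)
```

    Keeping the counts per block, rather than pooling them across the genome, is what retains the short-range linkage information the abstract refers to.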

  12. RegPredict: an integrated system for regulon inference in prokaryotes by comparative genomics approach

    SciTech Connect

    Novichkov, Pavel S.; Rodionov, Dmitry A.; Stavrovskaya, Elena D.; Novichkova, Elena S.; Kazakov, Alexey E.; Gelfand, Mikhail S.; Arkin, Adam P.; Mironov, Andrey A.; Dubchak, Inna

    2010-05-26

    RegPredict web server is designed to provide comparative genomics tools for reconstruction and analysis of microbial regulons using comparative genomics approach. The server allows the user to rapidly generate reference sets of regulons and regulatory motif profiles in a group of prokaryotic genomes. The new concept of a cluster of co-regulated orthologous operons allows the user to distribute the analysis of large regulons and to perform the comparative analysis of multiple clusters independently. Two major workflows currently implemented in RegPredict are: (i) regulon reconstruction for a known regulatory motif and (ii) ab initio inference of a novel regulon using several scenarios for the generation of starting gene sets. RegPredict provides a comprehensive collection of manually curated positional weight matrices of regulatory motifs. It is based on genomic sequences, ortholog and operon predictions from the MicrobesOnline. An interactive web interface of RegPredict integrates and presents diverse genomic and functional information about the candidate regulon members from several web resources. RegPredict is freely accessible at http://regpredict.lbl.gov.

  13. Inferring Where and When Replication Initiates from Genome-Wide Replication Timing Data

    NASA Astrophysics Data System (ADS)

    Baker, A.; Audit, B.; Yang, S. C.-H.; Bechhoefer, J.; Arneodo, A.

    2012-06-01

    Based on an analogy between DNA replication and one dimensional nucleation-and-growth processes, various attempts to infer the local initiation rate I(x,t) of DNA replication origins from replication timing data have been developed in the framework of phase transition kinetics theories. These works have all used curve-fit strategies to estimate I(x,t) from genome-wide replication timing data. Here, we show how to invert analytically the Kolmogorov-Johnson-Mehl-Avrami model and extract I(x,t) directly. Tests on both simulated and experimental budding-yeast data confirm the location and firing-time distribution of replication origins.
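
    The inversion mentioned above can be sketched under the usual one-dimensional KJMA assumptions. The symbols s(x,t) for the unreplicated fraction and v for a constant fork velocity are introduced here for illustration and do not appear in the abstract. A locus is still unreplicated at time t only if no origin has fired inside its past "light cone", and differentiating the cone integral twice in t and twice in x isolates the initiation rate at the apex:

```latex
\begin{align}
  s(x,t) &= \exp\!\left[-\int_0^{t}\! dt' \int_{x-v(t-t')}^{x+v(t-t')}\! dx'\, I(x',t')\right], \\
  I(x,t) &= -\frac{1}{2v}\left(\frac{\partial^2}{\partial t^2} - v^2\,\frac{\partial^2}{\partial x^2}\right)\ln s(x,t).
\end{align}
```

    In practice s(x,t) would have to be estimated from the replication timing data before the second derivatives are taken, so measurement noise is amplified and some smoothing is likely to be needed.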

  14. Inferring human population size and separation history from multiple genome sequences

    PubMed Central

    Schiffels, Stephan; Durbin, Richard

    2014-01-01

    The availability of complete human genome sequences from populations across the world has given rise to new population genetic inference methods that explicitly model their ancestral relationship under recombination and mutation. So far, application of these methods to evolutionary history more recent than 20-30 thousand years ago and to population separations has been limited. Here we present a new method that overcomes these shortcomings. The Multiple Sequentially Markovian Coalescent (MSMC) analyses the observed pattern of mutations in multiple individuals, focusing on the first coalescence between any two individuals. Results from applying MSMC to genome sequences from nine populations across the world suggest that the genetic separation of non-African ancestors from African Yoruban ancestors started long before 50,000 years ago, and give information about human population history as recently as 2,000 years ago, including the bottleneck in the peopling of the Americas, and separations within Africa, East Asia and Europe. PMID:24952747

  15. Mitochondrial Genome Structure of Photosynthetic Eukaryotes.

    PubMed

    Yurina, N P; Odintsova, M S

    2016-02-01

    Current ideas of plant mitochondrial genome organization are presented. Data on the size and structural organization of mtDNA, gene content, and peculiarities are summarized. Special emphasis is given to characteristic features of the mitochondrial genomes of land plants and photosynthetic algae that distinguish them from the mitochondrial genomes of other eukaryotes. The data published before the end of 2014 are reviewed.

  16. Phylogenetics and biogeography of the dung beetle genus Onthophagus inferred from mitochondrial genomes.

    PubMed

    Breeschoten, Thijmen; Doorenweerd, Camiel; Tarasov, Sergei; Vogler, Alfried P

    2016-12-01

    Phylogenetic relationships of dung beetles in the tribe Onthophagini, including the species-rich, cosmopolitan genus Onthophagus, were inferred using whole mitochondrial genomes. Data were generated by shotgun sequencing of mixed genomic DNA from >100 individuals on 50% of an Illumina MiSeq flow cell. Genome assembly of the mixed reads produced contigs of 74 (nearly) complete mitogenomes. The final dataset included representatives of Onthophagus from all biogeographic regions, closely related genera of Onthophagini, and the related tribes Onitini and Oniticellini. The analysis defined four major clades of Onthophagini, which was paraphyletic for Oniticellini, with Onitini as sister group to all others. Several (sub)genera considered as members of Onthophagus in the older literature formed separate deep lineages. All New World species of Onthophagus formed a monophyletic group, and the Australian taxa are confined to a single or two closely related clades, one of which forms the sister group of the New World species. Dating the tree by constraining the basal splits with existing calibrations of Scarabaeoidea suggests an origin of Onthophagini sensu lato in the Eocene and a rapid spread from an African ancestral stock into the Oriental region, and secondarily to Australia and the Americas at about 20-24 Mya. The successful assembly of mitogenomes and the well-supported tree obtained from these sequences demonstrates the power of shotgun sequencing from total genomic DNA of species pools as an efficient tool in genus-level phylogenetics.

  17. Proteomics-inferred genome typing (PIGT) demonstrates inter-population recombination as a strategy for environmental adaptation

    SciTech Connect

    Denef, Vincent; Verberkmoes, Nathan C; Shah, Manesh B; Abraham, Paul E; Lefsrud, Mark G; Hettich, Robert L; Banfield, Jillian F.

    2009-01-01

    Analyses of ecological and evolutionary processes that shape microbial consortia are facilitated by comprehensive studies of ecosystems with low species richness. In the current study we evaluated the role of recombination in altering the fitness of chemoautotrophic bacteria in their natural environment. Proteomics-inferred genome typing (PIGT) was used to determine the genomic make-up of Leptospirillum group II populations in 27 biofilms sampled from six locations in the Richmond Mine acid mine drainage system (Iron Mountain, CA) over a four-year period. We observed six distinct genotypes that are recombinants comprised of segments from two parental genotypes. Community genomic analyses revealed additional low abundance recombinant variants. The dominance of some genotypes despite a larger available genome pool, and patterns of spatiotemporal distribution within the ecosystem, indicate selection for distinct recombinants. Genes involved in motility, signal transduction and transport were overrepresented in the tens to hundreds of kilobase recombinant blocks, whereas core metabolic functions were significantly underrepresented. Our findings demonstrate the power of PIGT and reveal that recombination is a mechanism for fine-scale adaptation in this system.

  18. mStruct: inference of population structure in light of both genetic admixing and allele mutations.

    PubMed

    Shringarpure, Suyash; Xing, Eric P

    2009-06-01

    Traditional methods for analyzing population structure, such as the Structure program, ignore the effect of allele mutations between the ancestral and current alleles of genetic markers, which can dramatically affect the accuracy of the structural estimation of current populations. Studying these effects can also reveal additional information about population evolution such as the divergence time and migration history of admixed populations. We propose mStruct, an admixture of population-specific mixtures of inheritance models that addresses the task of structure inference and mutation estimation jointly through a hierarchical Bayesian framework, and a variational algorithm for inference. We validated our method on synthetic data and used it to analyze the Human Genome Diversity Project-Centre d'Etude du Polymorphisme Humain (HGDP-CEPH) cell line panel of microsatellites and HGDP single-nucleotide polymorphism (SNP) data. A comparison of the structural maps of world populations estimated by mStruct and Structure is presented, and we also report potentially interesting mutation patterns in world populations estimated by mStruct.

  19. The aggregate site frequency spectrum (aSFS) for comparative population genomic inference

    PubMed Central

    Xue, Alexander T.; Hickerson, Michael J.

    2015-01-01

    Understanding how assemblages of species responded to past climate change is a central goal of comparative phylogeography and comparative population genomics, an endeavor that has increasing potential to integrate with community ecology. New sequencing technology now provides the potential to perform complex demographic inference at unprecedented resolution across assemblages of non-model species. To this end, we introduce the aggregate site frequency spectrum (aSFS), an expansion of the site frequency spectrum to use single nucleotide polymorphism (SNP) datasets collected from multiple, co-distributed species for assemblage-level demographic inference. We describe how the aSFS is constructed over an arbitrary number of independent population samples and then demonstrate how the aSFS can differentiate various multi-species demographic histories under a wide range of sampling configurations while allowing effective population sizes and expansion magnitudes to vary independently. We subsequently couple the aSFS with a hierarchical approximate Bayesian computation (hABC) framework to estimate degree of temporal synchronicity in expansion times across taxa, including an empirical demonstration with a dataset consisting of five populations of the threespine stickleback (Gasterosteus aculeatus). Corroborating what is generally understood about the recent post-glacial origins of these populations, the joint aSFS/hABC analysis strongly suggests that the stickleback data are most consistent with synchronous expansion after the Last Glacial Maximum (posterior probability = 0.99). The aSFS will have general application for multi-level statistical frameworks to test models involving assemblages and/or communities and as large-scale SNP data from non-model species become routine, the aSFS expands the potential for powerful next-generation comparative population genomic inference. PMID:26769405

  20. High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs.

    PubMed

    Dilthey, Alexander T; Gourraud, Pierre-Antoine; Mentzer, Alexander J; Cereb, Nezih; Iqbal, Zamin; McVean, Gil

    2016-10-01

    Genetic variation at the Human Leucocyte Antigen (HLA) genes is associated with many autoimmune and infectious disease phenotypes, is an important element of the immunological distinction between self and non-self, and shapes immune epitope repertoires. Determining the allelic state of the HLA genes (HLA typing) as a by-product of standard whole-genome sequencing data would therefore be highly desirable and enable the immunogenetic characterization of samples in currently ongoing population sequencing projects. Extensive hyperpolymorphism and sequence similarity between the HLA genes, however, pose problems for accurate read mapping and make HLA type inference from whole-genome sequencing data a challenging problem. We describe how to address these challenges in a Population Reference Graph (PRG) framework. First, we construct a PRG for 46 (mostly HLA) genes and pseudogenes, their genomic context and their characterized sequence variants, integrating a database of over 10,000 known allele sequences. Second, we present a sequence-to-PRG paired-end read mapping algorithm that enables accurate read mapping for the HLA genes. Third, we infer the most likely pair of underlying alleles at G group resolution from the IMGT/HLA database at each locus, employing a simple likelihood framework. We show that HLA*PRG, our algorithm, outperforms existing methods by a wide margin. We evaluate HLA*PRG on six classical class I and class II HLA genes (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1) and on a set of 14 samples (3 samples with 2 x 100bp, 11 samples with 2 x 250bp Illumina HiSeq data). Of 158 alleles tested, we correctly infer 157 alleles (99.4%). We also identify and re-type two erroneous alleles in the original validation data. We conclude that HLA*PRG for the first time achieves accuracies comparable to gold-standard reference methods from standard whole-genome sequencing data, though high computational demands (currently ~30-250 CPU hours per sample) remain a significant

  1. High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs

    PubMed Central

    Dilthey, Alexander T.; Gourraud, Pierre-Antoine; McVean, Gil

    2016-01-01

    Genetic variation at the Human Leucocyte Antigen (HLA) genes is associated with many autoimmune and infectious disease phenotypes, is an important element of the immunological distinction between self and non-self, and shapes immune epitope repertoires. Determining the allelic state of the HLA genes (HLA typing) as a by-product of standard whole-genome sequencing data would therefore be highly desirable and enable the immunogenetic characterization of samples in currently ongoing population sequencing projects. Extensive hyperpolymorphism and sequence similarity between the HLA genes, however, pose problems for accurate read mapping and make HLA type inference from whole-genome sequencing data a challenging problem. We describe how to address these challenges in a Population Reference Graph (PRG) framework. First, we construct a PRG for 46 (mostly HLA) genes and pseudogenes, their genomic context and their characterized sequence variants, integrating a database of over 10,000 known allele sequences. Second, we present a sequence-to-PRG paired-end read mapping algorithm that enables accurate read mapping for the HLA genes. Third, we infer the most likely pair of underlying alleles at G group resolution from the IMGT/HLA database at each locus, employing a simple likelihood framework. We show that HLA*PRG, our algorithm, outperforms existing methods by a wide margin. We evaluate HLA*PRG on six classical class I and class II HLA genes (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1) and on a set of 14 samples (3 samples with 2 x 100bp, 11 samples with 2 x 250bp Illumina HiSeq data). Of 158 alleles tested, we correctly infer 157 alleles (99.4%). We also identify and re-type two erroneous alleles in the original validation data. We conclude that HLA*PRG for the first time achieves accuracies comparable to gold-standard reference methods from standard whole-genome sequencing data, though high computational demands (currently ~30–250 CPU hours per sample) remain a significant

  2. Genome-Wide SNP Discovery, Genotyping and Their Preliminary Applications for Population Genetic Inference in Spotted Sea Bass (Lateolabrax maculatus)

    PubMed Central

    Wang, Juan; Xue, Dong-Xiu; Zhang, Bai-Dong; Li, Yu-Long; Liu, Bing-Jian; Liu, Jin-Xian

    2016-01-01

    Next-generation sequencing and the collection of genome-wide single-nucleotide polymorphisms (SNPs) allow identifying fine-scale population genetic structure and genomic regions under selection. The spotted sea bass (Lateolabrax maculatus) is a non-model species of ecological and commercial importance and widely distributed in northwestern Pacific. A total of 22 648 SNPs was discovered across the genome of L. maculatus by paired-end sequencing of restriction-site associated DNA (RAD-PE) for 30 individuals from two populations. The nucleotide diversity (π) for each population was 0.0028±0.0001 in Dandong and 0.0018±0.0001 in Beihai, respectively. Shallow but significant genetic differentiation was detected between the two populations analyzed by using both the whole data set (FST = 0.0550, P < 0.001) and the putatively neutral SNPs (FST = 0.0347, P < 0.001). However, the two populations were highly differentiated based on the putatively adaptive SNPs (FST = 0.6929, P < 0.001). Moreover, a total of 356 SNPs representing 298 unique loci were detected as outliers putatively under divergent selection by FST-based outlier tests as implemented in BAYESCAN and LOSITAN. Functional annotation of the contigs containing putatively adaptive SNPs yielded hits for 22 of 55 (40%) significant BLASTX matches. Candidate genes for local selection constituted a wide array of functions, including binding, catalytic and metabolic activities, etc. The analyses with the SNPs developed in the present study highlighted the importance of genome-wide genetic variation for inference of population structure and local adaptation in L. maculatus. PMID:27336696
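
    The nucleotide diversity (π) values quoted above are averages of pairwise differences per site. The sketch below shows that calculation on a toy alignment. The function name and the sequences are illustrative assumptions, and the study's own estimates come from RAD-PE data processed with dedicated software rather than from code like this.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site among aligned sequences."""
    n_pairs = 0
    diffs = 0
    for a, b in combinations(seqs, 2):
        diffs += sum(x != y for x, y in zip(a, b))
        n_pairs += 1
    return diffs / (n_pairs * len(seqs[0]))


# Toy usage: three aligned sequences of four sites each.
pi = nucleotide_diversity(["ACGT", "ACGA", "ACGT"])
```

    Here two of the three pairs differ at one of four sites, giving π = 2 / (3 × 4) ≈ 0.167.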

  3. Inferring causal genomic alterations in breast cancer using gene expression data

    PubMed Central

    2011-01-01

    Background: One of the primary objectives in cancer research is to identify causal genomic alterations, such as somatic copy number variation (CNV) and somatic mutations, during tumor development. Many valuable studies lack genomic data to detect CNV; therefore, methods that are able to infer CNVs from gene expression data would help maximize the value of these studies. Results: We developed a framework for identifying recurrent regions of CNV and distinguishing the cancer driver genes from the passenger genes in the regions. By inferring CNV regions across many datasets we were able to identify 109 recurrent amplified/deleted CNV regions. Many of these regions are enriched for genes involved in important processes associated with tumorigenesis and cancer progression. Genes in these recurrent CNV regions were then examined in the context of gene regulatory networks to prioritize putative cancer driver genes. The cancer driver genes uncovered by the framework include not only well-known oncogenes but also a number of novel cancer susceptibility genes validated via siRNA experiments. Conclusions: To our knowledge, this is the first effort to systematically identify and validate drivers for expression-based CNV regions in breast cancer. The framework, in which wavelet analysis of expression-based copy number alteration is coupled with gene regulatory network analysis, provides a blueprint for leveraging genomic data to identify key regulatory components and gene targets. This integrative approach can be applied to many other large-scale gene expression studies and other novel types of cancer data such as next-generation sequencing based expression (RNA-Seq) as well as CNV data. PMID:21806811

  4. Accurate inference of subtle population structure (and other genetic discontinuities) using principal coordinates

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Accurate inference of genetic discontinuities between populations is an essential component of intraspecific biodiversity and evolution studies, as well as associative genetics. The most widely used methods to infer population structure are model based, Bayesian MCMC procedures that minimize Hardy...

  5. Comparative genome analyses of Arabidopsis spp.: Inferring chromosomal rearrangement events in the evolutionary history of A. thaliana

    PubMed Central

    Yogeeswaran, Krithika; Frary, Amy; York, Thomas L.; Amenta, Alison; Lesser, Andrew H.; Nasrallah, June B.; Tanksley, Steven D.; Nasrallah, Mikhail E.

    2005-01-01

    Comparative genome analysis is a powerful tool that can facilitate the reconstruction of the evolutionary history of the genomes of modern-day species. The model plant Arabidopsis thaliana with its n = 5 genome is thought to be derived from an ancestral n = 8 genome. Pairwise comparative genome analyses of A. thaliana with polyploid and diploid Brassicaceae species have suggested that rapid genome evolution, manifested by chromosomal rearrangements and duplications, characterizes the polyploid, but not the diploid, lineages of this family. In this study, we constructed a low-density genetic linkage map of Arabidopsis lyrata ssp. lyrata (A. l. lyrata; n = 8, diploid), the closest known relative of A. thaliana (MRCA ∼5 Mya), using A. thaliana-specific markers that resolve into the expected eight linkage groups. We then performed comparative Bayesian analyses using raw mapping data from this study and from a Capsella study to infer the number and nature of rearrangements that distinguish the n = 8 genomes of A. l. lyrata and Capsella from the n = 5 genome of A. thaliana. We conclude that there is strong statistical support in favor of the parsimony scenarios of 10 major chromosomal rearrangements separating these n = 8 genomes from A. thaliana. These chromosomal rearrangement events contribute to a rate of chromosomal evolution higher than previously reported in this lineage. We infer that at least seven of these events, common to both sets of data, are responsible for the change in karyotype and underlie genome reduction in A. thaliana. PMID:15805492

  6. Impact of Sample Type and DNA Isolation Procedure on Genomic Inference of Microbiome Composition

    PubMed Central

    Munk, Patrick; Lukjancenko, Oksana; Priemé, Anders; Aarestrup, Frank M.

    2016-01-01

    ABSTRACT Explorations of complex microbiomes using genomics greatly enhance our understanding about their diversity, biogeography, and function. The isolation of DNA from microbiome specimens is a key prerequisite for such examinations, but challenges remain in obtaining sufficient DNA quantities required for certain sequencing approaches, achieving accurate genomic inference of microbiome composition, and facilitating comparability of findings across specimen types and sequencing projects. These aspects are particularly relevant for the genomics-based global surveillance of infectious agents and antimicrobial resistance from different reservoirs. Here, we compare in a stepwise approach a total of eight commercially available DNA extraction kits and 16 procedures based on these for three specimen types (human feces, pig feces, and hospital sewage). We assess DNA extraction using spike-in controls and different types of beads for bead beating, facilitating cell lysis. We evaluate DNA concentration, purity, and stability and microbial community composition using 16S rRNA gene sequencing and for selected samples using shotgun metagenomic sequencing. Our results suggest that inferred community composition was dependent on inherent specimen properties as well as DNA extraction method. We further show that bead beating or enzymatic treatment can increase the extraction of DNA from Gram-positive bacteria. Final DNA quantities could be increased by isolating DNA from a larger volume of cell lysate than that in standard protocols. Based on this insight, we designed an improved DNA isolation procedure optimized for microbiome genomics that can be used for the three examined specimen types and potentially also for other biological specimens. A standard operating procedure is available from https://dx.doi.org/10.6084/m9.figshare.3475406. 
IMPORTANCE Sequencing-based analyses of microbiomes may lead to a breakthrough in our understanding of the microbial worlds associated with

  7. The Phylogeny and Evolutionary Timescale of Muscoidea (Diptera: Brachycera: Calyptratae) Inferred from Mitochondrial Genomes

    PubMed Central

    Wang, Ning; Cameron, Stephen L.; Mao, Meng; Wang, Yuyu; Xi, Yuqiang; Yang, Ding

    2015-01-01

    Muscoidea is a significant dipteran clade that includes house flies (Family Muscidae), latrine flies (F. Fanniidae), dung flies (F. Scathophagidae) and root maggot flies (F. Anthomyiidae). It comprises approximately 7000 described species. The monophyly of the Muscoidea and the precise relationships of muscoids to the closest superfamily, the Oestroidea (blow flies, flesh flies, etc.), are both unresolved. Until now mitochondrial (mt) genomes were available for only two of the four muscoid families, precluding a thorough test of phylogenetic relationships using this data source. Here we present the first two mt genomes for the families Fanniidae (Euryomma sp.) and Anthomyiidae (Delia platura (Meigen, 1826)). We also conducted phylogenetic analyses of these newly sequenced mt genomes plus 15 other species representative of dipteran diversity to address the internal relationship of Muscoidea and its systematic position. Both maximum-likelihood and Bayesian analyses suggested that Muscoidea was not a monophyletic group with the relationship: (Fanniidae + Muscidae) + ((Anthomyiidae + Scathophagidae) + (Calliphoridae + Sarcophagidae)), supported by the majority of analysed datasets. This also implies that Oestroidea was paraphyletic in the majority of analyses. Divergence time estimation suggested that the earliest split within the Calyptratae, separating (Tachinidae + Oestridae) from the remaining families, occurred in the Early Eocene. The main divergence within the paraphyletic Muscoidea grade was between Fanniidae + Muscidae and the lineage ((Anthomyiidae + Scathophagidae) + (Calliphoridae + Sarcophagidae)) which occurred in the Late Eocene. PMID:26225760

  8. Structure identification in fuzzy inference using reinforcement learning

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.; Khedkar, Pratap

    1993-01-01

    In our previous work on the GARIC architecture, we have shown that the system can start with the surface structure of the knowledge base (i.e., the linguistic expression of the rules) and learn the deep structure (i.e., the fuzzy membership functions of the labels used in the rules) by using reinforcement learning. Assuming the surface structure, GARIC refines the fuzzy membership functions used in the consequents of the rules using a gradient descent procedure. This hybrid fuzzy logic and reinforcement learning approach can learn to balance a cart-pole system and to back up a truck to its docking location after a few trials. In this paper, we discuss how to do structure identification using reinforcement learning in fuzzy inference systems. This involves identifying both the surface and the deep structure of the knowledge base. The term set of fuzzy linguistic labels used in describing the values of each control variable must be derived. In this process, splitting a label refers to creating new labels which are more granular than the original label, and merging two labels creates a more general label. Splitting and merging of labels directly transform the structure of the action selection network used in GARIC by increasing or decreasing the number of hidden layer nodes.

  9. The influence of genomic context on mutation patterns in the human genome inferred from rare variants.

    PubMed

    Schaibley, Valerie M; Zawistowski, Matthew; Wegmann, Daniel; Ehm, Margaret G; Nelson, Matthew R; St Jean, Pamela L; Abecasis, Gonçalo R; Novembre, John; Zöllner, Sebastian; Li, Jun Z

    2013-12-01

    Understanding patterns of spontaneous mutations is of fundamental interest in studies of human genome evolution and genetic disease. Here, we used extremely rare variants in humans to model the molecular spectrum of single-nucleotide mutations. Compared to common variants in humans and human-chimpanzee fixed differences (substitutions), rare variants, on average, arose more recently in the human lineage and are less affected by the potentially confounding effects of natural selection, population demographic history, and biased gene conversion. We analyzed variants obtained from a population-based sequencing study of 202 genes in >14,000 individuals. We observed considerable variability in the per-gene mutation rate, which was correlated with local GC content, but not recombination rate. Using >20,000 variants with a derived allele frequency ≤ 10⁻⁴, we examined the effect of local GC content and recombination rate on individual variant subtypes and performed comparisons with common variants and substitutions. The influence of local GC content on rare variants differed from that on common variants or substitutions, and the differences varied by variant subtype. Furthermore, recombination rate and recombination hotspots have little effect on rare variants of any subtype, yet both have a relatively strong impact on multiple variant subtypes in common variants and substitutions. This observation is consistent with the effect of biased gene conversion or selection-dependent processes. Our results highlight the distinct biases inherent in the initial mutation patterns and subsequent evolutionary processes that affect segregating variants.

  10. Structural genomics of pathogenic protozoa: an overview.

    PubMed

    Fan, Erkang; Baker, David; Fields, Stanley; Gelb, Michael H; Buckner, Frederick S; Van Voorhis, Wesley C; Phizicky, Eric; Dumont, Mark; Mehlin, Christopher; Grayhack, Elizabeth; Sullivan, Mark; Verlinde, Christophe; Detitta, George; Meldrum, Deirdre R; Merritt, Ethan A; Earnest, Thomas; Soltis, Michael; Zucker, Frank; Myler, Peter J; Schoenfeld, Lori; Kim, David; Worthey, Liz; Lacount, Doug; Vignali, Marissa; Li, Jizhen; Mondal, Somnath; Massey, Archna; Carroll, Brian; Gulde, Stacey; Luft, Joseph; Desoto, Larry; Holl, Mark; Caruthers, Jonathan; Bosch, Jürgen; Robien, Mark; Arakaki, Tracy; Holmes, Margaret; Le Trong, Isolde; Hol, Wim G J

    2008-01-01

    The Structural Genomics of Pathogenic Protozoa (SGPP) Consortium aimed to determine crystal structures of proteins from trypanosomatid and malaria parasites in a high throughput manner. The pipeline of target selection, protein production, crystallization, and structure determination, is sketched. Special emphasis is given to a number of technology developments including domain prediction, the use of "co-crystallants," and capillary crystallization. "Fragment cocktail crystallography" for medical structural genomics is also described.

  11. Integration of Multiple Genomic and Phenotype Data to Infer Novel miRNA-Disease Associations

    PubMed Central

    Zhou, Meng; Cheng, Liang; Yang, Haixiu; Wang, Jing; Sun, Jie; Wang, Zhenzhen

    2016-01-01

    MicroRNAs (miRNAs) play an important role in the development and progression of human diseases. The identification of disease-associated miRNAs will be helpful for understanding the molecular mechanisms of diseases at the post-transcriptional level. Based on different types of genomic data sources, computational methods for miRNA-disease association prediction have been proposed. However, individual sources of genomic data tend to be incomplete and noisy; therefore, the integration of various types of genomic data for inferring reliable miRNA-disease associations is urgently needed. In this study, we present a computational framework, CHNmiRD, for identifying miRNA-disease associations by integrating multiple genomic and phenotype data, including protein-protein interaction data, gene ontology data, experimentally verified miRNA-target relationships, disease phenotype information and known miRNA-disease connections. The performance of CHNmiRD was evaluated by experimentally verified miRNA-disease associations, which achieved an area under the ROC curve (AUC) of 0.834 for 5-fold cross-validation. In particular, CHNmiRD displayed excellent performance for diseases without any known related miRNAs. The results of case studies for three human diseases (glioblastoma, myocardial infarction and type 1 diabetes) showed that all of the top 10 ranked miRNAs having no known associations with these three diseases in existing miRNA-disease databases were directly or indirectly confirmed by our latest literature mining. All these results demonstrated the reliability and efficiency of CHNmiRD, and it is anticipated that CHNmiRD will serve as a powerful bioinformatics method for mining novel disease-related miRNAs and providing a new perspective into molecular mechanisms underlying human diseases at the post-transcriptional level. CHNmiRD is freely available at http://www.bio-bigdata.com/CHNmiRD. PMID:26849207
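
    The AUC figure quoted in this record can be read as the probability that a randomly chosen true association is scored above a randomly chosen non-association. The sketch below computes that quantity directly from pairwise comparisons, counting ties as half. It is a generic illustration rather than CHNmiRD's own evaluation code, and the function name `roc_auc` and the example scores are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    scores: predicted association scores (higher means more likely associated).
    labels: 1 for a known association, 0 otherwise.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for four candidate miRNA-disease pairs in one held-out fold.
example_auc = roc_auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

    In a 5-fold cross-validation, a score like this would be computed on each held-out fold and then averaged or pooled; the abstract does not state which of the two was done.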

  12. Integration of Multiple Genomic and Phenotype Data to Infer Novel miRNA-Disease Associations.

    PubMed

    Shi, Hongbo; Zhang, Guangde; Zhou, Meng; Cheng, Liang; Yang, Haixiu; Wang, Jing; Sun, Jie; Wang, Zhenzhen

    2016-01-01

    MicroRNAs (miRNAs) play an important role in the development and progression of human diseases. The identification of disease-associated miRNAs will be helpful for understanding the molecular mechanisms of diseases at the post-transcriptional level. Based on different types of genomic data sources, computational methods for miRNA-disease association prediction have been proposed. However, individual sources of genomic data tend to be incomplete and noisy; therefore, the integration of various types of genomic data for inferring reliable miRNA-disease associations is urgently needed. In this study, we present a computational framework, CHNmiRD, for identifying miRNA-disease associations by integrating multiple genomic and phenotype data, including protein-protein interaction data, gene ontology data, experimentally verified miRNA-target relationships, disease phenotype information and known miRNA-disease connections. The performance of CHNmiRD was evaluated by experimentally verified miRNA-disease associations, which achieved an area under the ROC curve (AUC) of 0.834 for 5-fold cross-validation. In particular, CHNmiRD displayed excellent performance for diseases without any known related miRNAs. The results of case studies for three human diseases (glioblastoma, myocardial infarction and type 1 diabetes) showed that all of the top 10 ranked miRNAs having no known associations with these three diseases in existing miRNA-disease databases were directly or indirectly confirmed by our latest literature mining. All these results demonstrated the reliability and efficiency of CHNmiRD, and it is anticipated that CHNmiRD will serve as a powerful bioinformatics method for mining novel disease-related miRNAs and providing a new perspective into molecular mechanisms underlying human diseases at the post-transcriptional level. CHNmiRD is freely available at http://www.bio-bigdata.com/CHNmiRD.

  13. Inferring Quantitative Trait Pathways Associated with Bull Fertility from a Genome-Wide Association Study

    PubMed Central

    Peñagaricano, Francisco; Weigel, Kent A.; Rosa, Guilherme J. M.; Khatib, Hasan

    2013-01-01

    Whole-genome association studies typically focus on genetic markers with the strongest evidence of association. However, single markers often explain only a small component of the genetic variance and hence offer a limited understanding of the trait under study. As such, the objective of this study was to perform a pathway-based association analysis in Holstein dairy cattle in order to identify relevant pathways involved in bull fertility. The results of a single-marker association analysis, using 1,755 bulls with sire conception rate data and genotypes for 38,650 single nucleotide polymorphisms (SNPs), were used in this study. A total of 16,819 annotated genes, including 2,767 significantly associated with bull fertility, were used to interrogate a total of 662 Gene Ontology (GO) terms and 248 InterPro (IP) entries using a test of proportions based on the cumulative hypergeometric distribution. After multiple-testing correction, 20 GO categories and one IP entry showed significant overrepresentation of genes statistically associated with bull fertility. Several of these functional categories, such as small GTPase-mediated signal transduction, neurogenesis, calcium ion binding, and cytoskeleton, are known to be involved in biological processes closely related to male fertility. These results could provide insight into the genetic architecture of this complex trait in dairy cattle. In addition, this study shows that quantitative trait pathways inferred from single-marker analyses could enhance our interpretations of the results of genome-wide association studies. PMID:23335935
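
    The overrepresentation test named in this record asks how likely it is to see at least k significant genes in a functional category of n genes, if the significant genes were scattered at random among all annotated genes. A minimal sketch of that upper-tail hypergeometric probability follows. The totals 16,819 and 2,767 come from the abstract; the category size (100) and hit count (30) are hypothetical, as is the function name `hypergeom_upper_tail`, and no multiple-testing correction is shown.

```python
from math import comb


def hypergeom_upper_tail(total_genes, significant_genes, term_size, term_hits):
    """P(X >= term_hits) for X ~ Hypergeometric(total_genes, significant_genes, term_size).

    X is the number of significant genes in a category of term_size genes
    drawn without replacement from total_genes annotated genes.
    """
    max_hits = min(significant_genes, term_size)
    denominator = comb(total_genes, term_size)
    numerator = sum(
        comb(significant_genes, i) * comb(total_genes - significant_genes, term_size - i)
        for i in range(term_hits, max_hits + 1)
    )
    # Exact integer arithmetic up to the final division.
    return numerator / denominator


# Totals from the abstract; the category size and hit count are hypothetical.
p_value = hypergeom_upper_tail(16819, 2767, 100, 30)
```

    With these inputs roughly 16 hits would be expected by chance, so 30 hits gives a small tail probability. A real analysis would repeat the test for every GO term and InterPro entry and then correct for multiple testing, as the abstract describes.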

  14. ABC inference of multi-population divergence with admixture from unphased population genomic data

    PubMed Central

    Robinson, John D; Bunnefeld, Lynsey; Hearn, Jack; Stone, Graham N; Hickerson, Michael J

    2014-01-01

    Rapidly developing sequencing technologies and declining costs have made it possible to collect genome-scale data from population-level samples in nonmodel systems. Inferential tools for historical demography given these data sets are, at present, underdeveloped. In particular, approximate Bayesian computation (ABC) has yet to be widely embraced by researchers generating these data. Here, we demonstrate the promise of ABC for analysis of the large data sets that are now attainable from nonmodel taxa through current genomic sequencing technologies. We develop and test an ABC framework for model selection and parameter estimation, given histories of three-population divergence with admixture. We then explore different sampling regimes to illustrate how sampling more loci, longer loci or more individuals affects the quality of model selection and parameter estimation in this ABC framework. Our results show that inferences improved substantially with increases in the number and/or length of sequenced loci, while less benefit was gained by sampling large numbers of individuals. Optimal sampling strategies given our inferential models included at least 2000 loci, each approximately 2 kb in length, sampled from five diploid individuals per population, although specific strategies are model and question dependent. We tested our ABC approach through simulation-based cross-validations and illustrate its application using previously analysed data from the oak gall wasp, Biorhiza pallida. PMID:25113024
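
    Approximate Bayesian computation replaces an intractable likelihood with simulation: draw parameters from the prior, simulate data, and keep the draws whose summary statistics fall close to the observed ones. The sketch below shows the rejection form of ABC on a deliberately simple toy problem, the mean of a Gaussian with a uniform prior. It is not the three-population divergence-with-admixture model from the paper; the function name `abc_rejection`, the prior bounds, the tolerance and the observed summary are all assumptions made for illustration.

```python
import random


def abc_rejection(observed_mean, n_obs, n_draws, tolerance, rng):
    """Rejection ABC for the mean of a unit-variance Gaussian (toy model).

    Returns the prior draws whose simulated sample mean lies within
    `tolerance` of `observed_mean`; these approximate the posterior.
    """
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.0, 10.0)                            # 1. draw from the prior
        simulated = [rng.gauss(theta, 1.0) for _ in range(n_obs)]  # 2. simulate a data set
        summary = sum(simulated) / n_obs                           # 3. summary statistic
        if abs(summary - observed_mean) <= tolerance:              # 4. accept if close
            accepted.append(theta)
    return accepted


# Hypothetical observed summary statistic and settings; seeded for repeatability.
posterior = abc_rejection(observed_mean=3.0, n_obs=20, n_draws=5000,
                          tolerance=0.1, rng=random.Random(42))
```

    The same loop structure carries over to demographic models. The prior draw becomes a set of divergence times, population sizes and admixture proportions, the simulator becomes a coalescent simulation of the sampled loci, and the summary becomes a vector of statistics. This is also why the number and length of loci matter: they determine how informative the summaries are.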

  15. ABC inference of multi-population divergence with admixture from unphased population genomic data.

    PubMed

    Robinson, John D; Bunnefeld, Lynsey; Hearn, Jack; Stone, Graham N; Hickerson, Michael J

    2014-09-01

    Rapidly developing sequencing technologies and declining costs have made it possible to collect genome-scale data from population-level samples in nonmodel systems. Inferential tools for historical demography given these data sets are, at present, underdeveloped. In particular, approximate Bayesian computation (ABC) has yet to be widely embraced by researchers generating these data. Here, we demonstrate the promise of ABC for analysis of the large data sets that are now attainable from nonmodel taxa through current genomic sequencing technologies. We develop and test an ABC framework for model selection and parameter estimation, given histories of three-population divergence with admixture. We then explore different sampling regimes to illustrate how sampling more loci, longer loci or more individuals affects the quality of model selection and parameter estimation in this ABC framework. Our results show that inferences improved substantially with increases in the number and/or length of sequenced loci, while less benefit was gained by sampling large numbers of individuals. Optimal sampling strategies given our inferential models included at least 2000 loci, each approximately 2 kb in length, sampled from five diploid individuals per population, although specific strategies are model and question dependent. We tested our ABC approach through simulation-based cross-validations and illustrate its application using previously analysed data from the oak gall wasp, Biorhiza pallida.

  16. Orthology Inference in Nonmodel Organisms Using Transcriptomes and Low-Coverage Genomes: Improving Accuracy and Matrix Occupancy for Phylogenomics

    PubMed Central

    Yang, Ya; Smith, Stephen A.

    2014-01-01

    Orthology inference is central to phylogenomic analyses. Phylogenomic data sets commonly include transcriptomes and low-coverage genomes that are incomplete and contain errors and isoforms. These properties can severely violate the underlying assumptions of orthology inference with existing heuristics. We present a procedure that uses phylogenies for both homology and orthology assignment. The procedure first uses similarity scores to infer putative homologs that are then aligned, constructed into phylogenies, and pruned of spurious branches caused by deep paralogs, misassembly, frameshifts, or recombination. These final homologs are then used to identify orthologs. We explore four alternative tree-based orthology inference approaches, of which two are new. These accommodate gene and genome duplications as well as gene tree discordance. We demonstrate these methods in three published data sets including the grape family, Hymenoptera, and millipedes with divergence times ranging from approximately 100 to over 400 Ma. The procedure significantly increased the completeness and accuracy of the inferred homologs and orthologs. We also found that data sets that are more recently diverged and/or include more high-coverage genomes had more complete sets of orthologs. To explicitly evaluate sources of conflicting phylogenetic signals, we applied serial jackknife analyses of gene regions keeping each locus intact. The methods described here can scale to over 100 taxa. They have been implemented in Python with independent scripts for each step, making it easy to modify or incorporate them into existing pipelines. All scripts are available from https://bitbucket.org/yangya/phylogenomic_dataset_construction. PMID:25158799

  17. Chapter 6: Structural variation and medical genomics.

    PubMed

    Raphael, Benjamin J

    2012-01-01

    Differences between individual human genomes, or between human and cancer genomes, range in scale from single nucleotide variants (SNVs) through intermediate and large-scale duplications, deletions, and rearrangements of genomic segments. The latter class, called structural variants (SVs), has received considerable attention in the past several years as a previously underappreciated source of variation in human genomes. Much of this recent attention is the result of the availability of higher-resolution technologies for measuring these variants, including both microarray-based techniques and, more recently, high-throughput DNA sequencing. We describe the genomic technologies and computational techniques currently used to measure SVs, focusing on applications in human and cancer genomics.

  18. PICARA, an analytical pipeline providing probabilistic inference about a priori candidates genes underlying genome-wide association QTL in plants

    Technology Transfer Automated Retrieval System (TEKTRAN)

    PICARA is an analytical pipeline designed to systematically summarize observed SNP/trait associations identified by genome wide association studies (GWAS) and to identify candidate genes involved in the regulation of complex trait variation. The pipeline provides probabilistic inference about a prio...

  19. Genomic analysis of circulating cell-free DNA infers breast cancer dormancy

    PubMed Central

    Shaw, Jacqueline A.; Page, Karen; Blighe, Kevin; Hava, Natasha; Guttery, David; Ward, Becky; Brown, James; Ruangpratheep, Chetana; Stebbing, Justin; Payne, Rachel; Palmieri, Carlo; Cleator, Suzy; Walker, Rosemary A.; Coombes, R. Charles

    2012-01-01

    Biomarkers in breast cancer to monitor minimal residual disease have remained elusive. We hypothesized that genomic analysis of circulating free DNA (cfDNA) isolated from plasma may form the basis for a means of detecting and monitoring breast cancer. We profiled 251 genomes using Affymetrix SNP 6.0 arrays to determine copy number variations (CNVs) and loss of heterozygosity (LOH), comparing 138 cfDNA samples with matched primary tumor and normal leukocyte DNA in 65 breast cancer patients and eight healthy female controls. Concordance of SNP genotype calls in paired cfDNA and leukocyte DNA samples distinguished between breast cancer patients and healthy female controls (P < 0.0001) and between preoperative patients and patients on follow-up who had surgery and treatment (P = 0.0016). Principal component analyses of cfDNA SNP/copy number results also separated presurgical breast cancer patients from the healthy controls, suggesting specific CNVs in cfDNA have clinical significance. We identified focal high-level DNA amplification in paired tumor and cfDNA clustered in a number of chromosome arms, some of which harbor genes with oncogenic potential, including USP17L2 (DUB3), BRF1, MTA1, and JAG2. Remarkably, in 50 patients on follow-up, specific CNVs were detected in cfDNA, mirroring the primary tumor, up to 12 yr after diagnosis despite no other evidence of disease. These data demonstrate the potential of SNP/CNV analysis of cfDNA to distinguish between patients with breast cancer and healthy controls during routine follow-up. The genomic profiles of cfDNA infer dormancy/minimal residual disease in the majority of patients on follow-up. PMID:21990379

  20. Genomic analysis of circulating cell-free DNA infers breast cancer dormancy.

    PubMed

    Shaw, Jacqueline A; Page, Karen; Blighe, Kevin; Hava, Natasha; Guttery, David; Ward, Becky; Brown, James; Ruangpratheep, Chetana; Stebbing, Justin; Payne, Rachel; Palmieri, Carlo; Cleator, Suzy; Walker, Rosemary A; Coombes, R Charles

    2012-02-01

    Biomarkers in breast cancer to monitor minimal residual disease have remained elusive. We hypothesized that genomic analysis of circulating free DNA (cfDNA) isolated from plasma may form the basis for a means of detecting and monitoring breast cancer. We profiled 251 genomes using Affymetrix SNP 6.0 arrays to determine copy number variations (CNVs) and loss of heterozygosity (LOH), comparing 138 cfDNA samples with matched primary tumor and normal leukocyte DNA in 65 breast cancer patients and eight healthy female controls. Concordance of SNP genotype calls in paired cfDNA and leukocyte DNA samples distinguished between breast cancer patients and healthy female controls (P < 0.0001) and between preoperative patients and patients on follow-up who had surgery and treatment (P = 0.0016). Principal component analyses of cfDNA SNP/copy number results also separated presurgical breast cancer patients from the healthy controls, suggesting specific CNVs in cfDNA have clinical significance. We identified focal high-level DNA amplification in paired tumor and cfDNA clustered in a number of chromosome arms, some of which harbor genes with oncogenic potential, including USP17L2 (DUB3), BRF1, MTA1, and JAG2. Remarkably, in 50 patients on follow-up, specific CNVs were detected in cfDNA, mirroring the primary tumor, up to 12 yr after diagnosis despite no other evidence of disease. These data demonstrate the potential of SNP/CNV analysis of cfDNA to distinguish between patients with breast cancer and healthy controls during routine follow-up. The genomic profiles of cfDNA infer dormancy/minimal residual disease in the majority of patients on follow-up.

  1. The evolutionary history of termites as inferred from 66 mitochondrial genomes.

    PubMed

    Bourguignon, Thomas; Lo, Nathan; Cameron, Stephen L; Šobotník, Jan; Hayashi, Yoshinobu; Shigenobu, Shuji; Watanabe, Dai; Roisin, Yves; Miura, Toru; Evans, Theodore A

    2015-02-01

    Termites have colonized many habitats and are among the most abundant animals in tropical ecosystems, which they modify considerably through their actions. The timing of their rise in abundance and of the dispersal events that gave rise to modern termite lineages is not well understood. To shed light on termite origins and diversification, we sequenced the mitochondrial genomes of 48 termite species and combined them with 18 previously sequenced termite mitochondrial genomes for phylogenetic and molecular clock analyses using multiple fossil calibrations. The 66 genomes represent most major clades of termites. Unlike previous phylogenetic studies based on fewer molecular data, our phylogenetic tree is fully resolved for the lower termites. The phylogenetic positions of Macrotermitinae and Apicotermitinae are also resolved as the basal groups in the higher termites, but in the crown termitid groups, including Termitinae + Syntermitinae + Nasutitermitinae + Cubitermitinae, the position of some nodes remains uncertain. Our molecular clock tree indicates that the lineages leading to termites and Cryptocercus roaches diverged 170 Ma (153-196 Ma 95% confidence interval [CI]), that modern Termitidae arose 54 Ma (46-66 Ma 95% CI), and that the crown termitid group arose 40 Ma (35-49 Ma 95% CI). This indicates that the distribution of basal termite clades was influenced by the final stages of the breakup of Pangaea. Our inference of ancestral geographic ranges shows that the Termitidae, which includes more than 75% of extant termite species, most likely originated in Africa or Asia, and acquired their pantropical distribution after a series of dispersal and subsequent diversification events.

  2. Genome Structure Gallery from the Mycobacterium Tuberculosis Structural Genomics Consortium

    DOE Data Explorer

    The TB Structural Genomics Consortium works with the structures of proteins from M. tuberculosis, analyzing these structures in the context of functional information that currently exists and that the Consortium generates. The database of linked structural and functional information constructed from this project will form a lasting basis for understanding M. tuberculosis pathogenesis and for structure-based drug design. The Consortium's structural and functional information is publicly available. The Structures Gallery makes more than 650 total structures available by PDB identifier. Some of these are not consortium targets, but all are viewable in 3D color and can be manipulated in various ways by Jmol, an open-source Java viewer for chemical structures in 3D from http://www.jmol.org/

  3. Structure-based function inference using protein family-specific fingerprints

    PubMed Central

    Bandyopadhyay, Deepak; Huan, Jun; Liu, Jinze; Prins, Jan; Snoeyink, Jack; Wang, Wei; Tropsha, Alexander

    2006-01-01

    We describe a method to assign a protein structure to a functional family using family-specific fingerprints. Fingerprints represent amino acid packing patterns that occur in most members of a family but are rare in the background, a nonredundant subset of PDB; their information is additional to sequence alignments, sequence patterns, structural superposition, and active-site templates. Fingerprints were derived for 120 families in SCOP using Frequent Subgraph Mining. For a new structure, all occurrences of these family-specific fingerprints may be found by a fast algorithm for subgraph isomorphism; the structure can then be assigned to a family with a confidence value derived from the number of fingerprints found and their distribution in background proteins. In validation experiments, we infer the function of new members added to SCOP families and we discriminate between structurally similar, but functionally divergent TIM barrel families. We then apply our method to predict function for several structural genomics proteins, including orphan structures. Some predictions have been corroborated by other computational methods and some validated by subsequent functional characterization. PMID:16731985

  4. Ceres' internal structure as inferred from its large craters

    NASA Astrophysics Data System (ADS)

    Marchi, Simone; Raymond, Carol; Fu, Roger; Ermakov, Anton I.; O'Brien, David P.; De Sanctis, Cristina; Ammannito, Eleonora; Russell, Christopher T.

    2016-10-01

    The Dawn spacecraft has gathered important data about the surface composition, internal structure, and geomorphology of Ceres, revealing a cratered landscape. Digital terrain models and global mosaics have been used to derive a global catalog of impact craters larger than 10 km in diameter. A surface dichotomy appears evident: a large fraction of the northern hemisphere is heavily cratered as the result of several billion years of collisions, while portions of the equatorial region and southern hemisphere are much less cratered. The latter are associated with the presence of the two largest (~270-280 km) impact craters, Kerwan and Yalode. The global crater count shows a severe depletion for diameters larger than 100-150 km with respect to collisional models and other large asteroids, like Vesta. This is a strong indication that a significant population of large cerean craters has been obliterated over geological time-scales. This observation is supported by the overall topographic power spectrum of Ceres, which shows that long wavelengths in topography are suppressed (that is, flatter surface) compared to short wavelengths. Viscous relaxation of topography may be a natural culprit for the observed paucity of large craters. Relaxation accommodated by the creep of water ice is expected to result in much more rapid and complete decay of topography than inferred. In contrast, we favor a strong crust composed of a mixture of silicates and salt species (<30% vol water ice) with viscosity decreasing by two to three orders of magnitude in the top 45-70 km of Ceres' crust. This model can account for the observed topography power spectrum and explain the lack of craters in the size range ~100-600 km. Interestingly, Ceres' surface exhibits an 800-km-wide, 4-km-deep depression, known as Vendimia Planitia. The overall topography of Vendimia Planitia is compatible with a partially relaxed mega impact structure. The presence of such a large-scale depression bears implications for

  5. Effect of sampling on the extent and accuracy of the inferred genetic history of recombining genome.

    PubMed

    Platt, Daniel E; Utro, Filippo; Parida, Laxmi

    2014-06-01

    Accessible biotechnology is enabling the cataloging of genetic variants in individuals in populations at unprecedented scales. The use of phylogeny of the individuals within populations allows a model-based approach to studying these variations, which is important in understanding relationships between and across populations. For the somatic genome, however, the phylogeny must take recombinations (and other genetic mixing events) into account. Hence the resulting topology is more complex than a tree. Unlike a tree topology, it is not as apparent which events are visible from the extant samples. An earlier work presented a mathematical model (called the minimal descriptor) for teasing apart the inherent visible information from that which any specific algorithm might see. We use this framework to study the effect of sampling sizes on the overall inferred genetic history. In this paper, we seek to understand the extent, characteristics (in terms of recent versus ancient genetic events) and reliability of what was resolvable within field samples drawn from modern populations. We observed that most of the visible ancient events are recoverable from relatively small sample sizes. However, without identification of this relatively small minority of ancient genetic events, most of the signal will appear to reflect modern events and admixtures. We also found that the more ancient events are likely to be reproduced with higher fidelity between multiple samplings, and that the identified older events are less likely to yield false positive discrimination between populations. We conclude that a recombinant phylogenetic reconstruction is necessary to identify which markers are most likely to discriminate ancient events, and to discriminate between populations with lower risk of false positives. Secondly, on a broader note, this study also provides a general methodology for a critical assessment of the inferred common genetic history of populations (say, in plant cultivars or

  6. Inferring the structure and dynamics of interactions in schooling fish

    PubMed Central

    Katz, Yael; Tunstrøm, Kolbjørn; Ioannou, Christos C.; Huepe, Cristián; Couzin, Iain D.

    2011-01-01

    Determining individual-level interactions that govern highly coordinated motion in animal groups or cellular aggregates has been a long-standing challenge, central to understanding the mechanisms and evolution of collective behavior. Numerous models have been proposed, many of which display realistic-looking dynamics, but nonetheless rely on untested assumptions about how individuals integrate information to guide movement. Here we infer behavioral rules directly from experimental data. We begin by analyzing trajectories of golden shiners (Notemigonus crysoleucas) swimming in two-fish and three-fish shoals to map the mean effective forces as a function of fish positions and velocities. Speeding and turning responses are dynamically modulated and clearly delineated. Speed regulation is a dominant component of how fish interact, and changes in speed are transmitted to those both behind and ahead. Alignment emerges from attraction and repulsion, and fish tend to copy directional changes made by those ahead. We find no evidence for explicit matching of body orientation. By comparing data from two-fish and three-fish shoals, we challenge the standard assumption, ubiquitous in physics-inspired models of collective behavior, that individual motion results from averaging responses to each neighbor considered separately; three-body interactions make a substantial contribution to fish dynamics. However, pairwise interactions qualitatively capture the correct spatial interaction structure in small groups, and this structure persists in larger groups of 10 and 30 fish. The interactions revealed here may help account for the rapid changes in speed and direction that enable real animal groups to stay cohesive and amplify important social information. PMID:21795604

  7. Genomic inference accurately predicts the timing and severity of a recent bottleneck in a non-model insect population

    PubMed Central

    McCoy, Rajiv C.; Garud, Nandita R.; Kelley, Joanna L.; Boggs, Carol L.; Petrov, Dmitri A.

    2015-01-01

    The analysis of molecular data from natural populations has allowed researchers to answer diverse ecological questions that were previously intractable. In particular, ecologists are often interested in the demographic history of populations, information that is rarely available from historical records. Methods have been developed to infer demographic parameters from genomic data, but it is not well understood how inferred parameters compare to true population history or depend on aspects of experimental design. Here we present and evaluate a method of SNP discovery using RNA-sequencing and demographic inference using the program δaδi, which uses a diffusion approximation to the allele frequency spectrum to fit demographic models. We test these methods in a population of the checkerspot butterfly Euphydryas gillettii. This population was intentionally introduced to Gothic, Colorado in 1977 and has since experienced extreme fluctuations including bottlenecks of fewer than 25 adults, as documented by nearly annual field surveys. Using RNA-sequencing of eight individuals from Colorado and eight individuals from a native population in Wyoming, we generate the first genomic resources for this system. While demographic inference is commonly used to examine ancient demography, our study demonstrates that our inexpensive, all-in-one approach to marker discovery and genotyping provides sufficient data to accurately infer the timing of a recent bottleneck. This demographic scenario is relevant for many species of conservation concern, few of which have sequenced genomes. Our results are remarkably insensitive to sample size or number of genomic markers, which has important implications for applying this method to other non-model systems. PMID:24237665
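    The abstract above describes fitting demographic models to the allele frequency spectrum. As a rough illustration of what that summary is, the following minimal sketch tabulates a folded spectrum from diploid genotype calls. It is not the δaδi interface; the function name `folded_sfs` and the toy genotype matrix are assumptions made only for this example.

```python
def folded_sfs(genotypes):
    """Tabulate a folded allele frequency spectrum.

    genotypes: list of sites; each site is a list with one entry per
    diploid individual giving the count of the alternate allele (0, 1, 2).
    Returns a list sfs where sfs[k] is the number of sites whose minor
    allele is carried by exactly k of the sampled chromosomes.
    (Hypothetical helper for illustration, not part of any library.)
    """
    n_chrom = 2 * len(genotypes[0])          # two chromosomes per individual
    sfs = [0] * (n_chrom // 2 + 1)           # minor count ranges over 0..n/2
    for site in genotypes:
        alt = sum(site)                      # alternate-allele count at this site
        minor = min(alt, n_chrom - alt)      # fold: count the rarer allele
        sfs[minor] += 1
    return sfs


# Toy data: 4 sites genotyped in 3 individuals (6 chromosomes).
toy_genotypes = [
    [0, 1, 0],   # alternate count 1 -> minor count 1
    [2, 2, 1],   # alternate count 5 -> minor count 1
    [1, 1, 1],   # alternate count 3 -> minor count 3
    [0, 0, 0],   # monomorphic       -> minor count 0
]
spectrum = folded_sfs(toy_genotypes)
print(spectrum)
```

    A demographic inference program would then compare a spectrum like this against the spectrum expected under each candidate model; that model-fitting step is not shown here.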

  8. epiG: statistical inference and profiling of DNA methylation from whole-genome bisulfite sequencing data.

    PubMed

    Vincent, Martin; Mundbjerg, Kamilla; Skou Pedersen, Jakob; Liang, Gangning; Jones, Peter A; Ørntoft, Torben Falck; Dalsgaard Sørensen, Karina; Wiuf, Carsten

    2017-02-21

    The study of epigenetic heterogeneity at the level of individual cells and in whole populations is the key to understanding cellular differentiation, organismal development, and the evolution of cancer. We develop a statistical method, epiG, to infer and differentiate between different epi-allelic haplotypes, annotated with CpG methylation status and DNA polymorphisms, from whole-genome bisulfite sequencing data, and nucleosome occupancy from NOMe-seq data. We demonstrate the capabilities of the method by inferring allele-specific methylation and nucleosome occupancy in cell lines, and colon and tumor samples, and by benchmarking the method against independent experimental data.

  9. Identification of structural variation in mouse genomes

    PubMed Central

    Keane, Thomas M.; Wong, Kim; Adams, David J.; Flint, Jonathan; Reymond, Alexandre; Yalcin, Binnaz

    2014-01-01

    Structural variation is variation in structure of DNA regions affecting DNA sequence length and/or orientation. It generally includes deletions, insertions, copy-number gains, inversions, and transposable elements. Traditionally, the identification of structural variation in genomes has been challenging. However, with the recent advances in high-throughput DNA sequencing and paired-end mapping (PEM) methods, the ability to identify structural variation and its association with human diseases has improved considerably. In this review, we describe our current knowledge of structural variation in the mouse, one of the prime model systems for studying human diseases and mammalian biology. We further present the evolutionary implications of structural variation on transposable elements. We conclude with future directions on the study of structural variation in mouse genomes that will increase our understanding of molecular architecture and functional consequences of structural variation. PMID:25071822

  10. The fractal structure of the mitochondrial genomes

    NASA Astrophysics Data System (ADS)

    Oiwa, Nestor N.; Glazier, James A.

    2002-08-01

    The mitochondrial DNA genome has a definite multifractal structure. We show that loops, hairpins and inverted palindromes are responsible for this self-similarity. We can thus establish a definite relation between the function of subsequences and their fractal dimension. Intriguingly, protein coding DNAs also exhibit palindromic structures, although they do not appear in the sequence of amino acids. These structures may reflect the stabilization and transcriptional control of DNA or the control of posttranscriptional editing of mRNA.

  11. Functional coverage of the human genome by existing structures, structural genomics targets, and homology models.

    PubMed

    Xie, Lei; Bourne, Philip E

    2005-08-01

    The bias in protein structure and function space resulting from experimental limitations and targeting of particular functional classes of proteins by structural biologists has long been recognized, but never continuously quantified. Using the Enzyme Commission and the Gene Ontology classifications as a reference frame, and integrating structure data from the Protein Data Bank (PDB), target sequences from the structural genomics projects, structure homology derived from the SUPERFAMILY database, and genome annotations from Ensembl and NCBI, we provide a quantified view, both at the domain and whole-protein levels, of the current and projected coverage of protein structure and function space relative to the human genome. Protein structures currently provide at least one domain that covers 37% of the functional classes identified in the genome; whole structure coverage exists for 25% of the genome. If all the structural genomics targets were solved (twice the current number of structures in the PDB), it is estimated that structures of one domain would cover 69% of the functional classes identified and complete structure coverage would be 44%. Homology models from existing experimental structures extend the 37% coverage to 56% of the genome as single domains and 25% to 31% for complete structures. Coverage from homology models is not evenly distributed by protein family, reflecting differing degrees of sequence and structure divergence within families. While these data provide coverage, conversely, they also systematically highlight functional classes of proteins for which structures should be determined. Current key functional families without structure representation are highlighted here; updated information on the "most wanted list" that should be solved is available on a weekly basis from http://function.rcsb.org:8080/pdb/function_distribution/index.html.

  12. A physical map for the Amborella trichopoda genome sheds light on the evolution of angiosperm genome structure

    PubMed Central

    2011-01-01

    Background Recent phylogenetic analyses have identified Amborella trichopoda, an understory tree species endemic to the forests of New Caledonia, as sister to a clade including all other known flowering plant species. The Amborella genome is a unique reference for understanding the evolution of angiosperm genomes because it can serve as an outgroup to root comparative analyses. A physical map, BAC end sequences and sample shotgun sequences provide a first view of the 870 Mbp Amborella genome. Results Analysis of Amborella BAC ends sequenced from each contig suggests that the density of long terminal repeat retrotransposons is negatively correlated with that of protein coding genes. Syntenic, presumably ancestral, gene blocks were identified in comparisons of the Amborella BAC contigs and the sequenced Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera and Oryza sativa genomes. Parsimony mapping of the loss of synteny corroborates previous analyses suggesting that the rate of structural change has been more rapid on lineages leading to Arabidopsis and Oryza compared with lineages leading to Populus and Vitis. The gamma paleohexiploidy event identified in the Arabidopsis, Populus and Vitis genomes is shown to have occurred after the divergence of all other known angiosperms from the lineage leading to Amborella. Conclusions When placed in the context of a physical map, BAC end sequences representing just 5.4% of the Amborella genome have facilitated reconstruction of gene blocks that existed in the last common ancestor of all flowering plants. The Amborella genome is an invaluable reference for inferences concerning the ancestral angiosperm and subsequent genome evolution. PMID:21619600

  13. Feature Inference and the Causal Structure of Categories

    ERIC Educational Resources Information Center

    Rehder, B.; Burnett, R.C.

    2005-01-01

    The purpose of this article was to establish how theoretical category knowledge (specifically, knowledge of the causal relations that link the features of categories) supports the ability to infer the presence of unobserved features. Our experiments were designed to test proposals that causal knowledge is represented psychologically as Bayesian…

  14. Genome Alignment Spanning Major Poaceae Lineages Reveals Heterogeneous Evolutionary Rates and Alters Inferred Dates for Key Evolutionary Events.

    PubMed

    Wang, Xiyin; Wang, Jingpeng; Jin, Dianchuan; Guo, Hui; Lee, Tae-Ho; Liu, Tao; Paterson, Andrew H

    2015-06-01

    Multiple comparisons among genomes can clarify their evolution, speciation, and functional innovations. To date, the genome sequences of eight grasses representing the most economically important Poaceae (grass) clades have been published, and their genomic-level comparison is an essential foundation for evolutionary, functional, and translational research. Using a formal and conservative approach, we aligned these genomes. Direct comparison of paralogous gene pairs all duplicated simultaneously reveals striking variation in evolutionary rates among whole genomes, with nucleotide substitution slowest in rice and up to 48% faster in other grasses, adding a new dimension to the value of rice as a grass model. We reconstructed ancestral genome contents for major evolutionary nodes, potentially contributing to understanding the divergence and speciation of grasses. Recent fossil evidence suggests revisions of the estimated dates of key evolutionary events, implying that the pan-grass polyploidization occurred ∼96 million years ago and could not be related to the Cretaceous-Tertiary mass extinction as previously inferred. Adjusted dating to reflect both updated fossil evidence and lineage-specific evolutionary rates suggested that maize subgenome divergence and maize-sorghum divergence were virtually simultaneous, a coincidence that would be explained if polyploidization directly contributed to speciation. This work lays a solid foundation for Poaceae translational genomics.

  15. An integrated approach to structural genomics.

    PubMed

    Heinemann, U; Frevert, J; Hofmann, K; Illing, G; Maurer, C; Oschkinat, H; Saenger, W

    2000-01-01

    Structural genomics aims at determining a set of protein structures that will represent all domain folds present in the biosphere. These structures can be used as the basis for the homology modelling of the majority of all remaining protein domains or, indeed, proteins. Structural genomics therefore promises to provide a comprehensive structural description of the protein universe. To achieve this, a broad scientific effort is required. The Berlin-based "Protein Structure Factory" (PSF) plans to contribute to this effort by setting up a local infrastructure for the low-cost, high-throughput analysis of soluble human proteins. In close collaboration with the German Human Genome Project (DHGP) protein-coding genes will be expressed in Escherichia coli or yeast. Affinity-tagged proteins will be purified semi-automatically for biophysical characterization and structure analysis by X-ray diffraction methods and NMR spectroscopy. In all steps of the structure analysis process, possibilities for automation, parallelization and standardization will be explored. Major new facilities that are created for the PSF include a robotic station for large-scale protein crystallization, an NMR center and an experimental station for protein crystallography at the synchrotron storage ring BESSY II in Berlin.

  16. Genome at Juncture of Early Human Migration: A Systematic Analysis of Two Whole Genomes and Thirteen Exomes from Kuwaiti Population Subgroup of Inferred Saudi Arabian Tribe Ancestry

    PubMed Central

    Alsmadi, Osama; Hebbar, Prashantha; Antony, Dinu; Behbehani, Kazem; Thanaraj, Thangavel Alphonse

    2014-01-01

    The population of the State of Kuwait is composed of three genetic subgroups of inferred Persian, Saudi Arabian tribe and Bedouin ancestry. The Saudi Arabian tribe subgroup traces its origin to the Najd region of Saudi Arabia. By sequencing two whole genomes and thirteen exomes from this subgroup at high coverage (>40X), we identify 4,950,724 Single Nucleotide Polymorphisms (SNPs), 515,802 indels and 39,762 structural variations. Of the identified variants, 10,098 (8.3%) exomic SNPs, 139,923 (2.9%) non-exomic SNPs, 5,256 (54.3%) exomic indels, and 374,959 (74.08%) non-exomic indels are ‘novel’. Up to 8,070 (79.9%) of the reported novel biallelic exomic SNPs are seen in low frequency (minor allele frequency <5%). We observe 5,462 known and 1,004 novel potentially deleterious nonsynonymous SNPs. Allele frequencies of common SNPs from the 15 exomes are significantly correlated with those from genotype data of a larger cohort of 48 individuals (Pearson correlation coefficient, 0.91; p < 2.2×10^-16). A set of 2,485 SNPs show significantly different allele frequencies when compared to populations from other continents. Two notable variants having risk alleles in high frequencies in this subgroup are: a nonsynonymous deleterious SNP (rs2108622 [19:g.15990431C>T] from CYP4F2 gene [MIM:*604426]) associated with warfarin dosage levels [MIM:#122700] required to elicit normal anticoagulant response; and a 3′ UTR SNP (rs6151429 [22:g.51063477T>C] from ARSA gene [MIM:*607574]) associated with Metachromatic Leukodystrophy [MIM:#250100]. Hemoglobin Riyadh variant (identified for the first time in a Saudi Arabian woman) is observed in the exome data. The mitochondrial haplogroup profiles of the 15 individuals are consistent with the haplogroup diversity seen in Saudi Arabian natives, who are believed to have received substantial gene flow from Africa and eastern provenance. We present the first genome resource imperative for designing future genetic studies in Saudi Arabian

  17. Using Genomics for Natural Product Structure Elucidation.

    PubMed

    Tietz, Jonathan I; Mitchell, Douglas A

    2016-01-01

    Natural products (NPs) are the most historically bountiful source of chemical matter for drug development, especially for anti-infectives. With insights gleaned from genome mining, interest in natural product discovery has been reinvigorated. An essential stage in NP discovery is structural elucidation, which sheds light not only on the chemical composition of a molecule but also its novelty, properties, and derivatization potential. The history of structure elucidation is replete with technique-based revolutions: combustion analysis, crystallography, UV, IR, MS, and NMR have each provided game-changing advances; the latest such advance is genomics. All natural products have a genetic basis, and the ability to obtain and interpret genomic information for structure elucidation is increasingly available at low cost to non-specialists. In this review, we describe the value of genomics as a structural elucidation technique, especially from the perspective of the natural product chemist approaching an unknown metabolite. Herein we first introduce the databases and programs of interest to the natural products chemist, with an emphasis on those currently most suited for general usability. We describe strategies for linking observed natural product-linked phenotypes to their corresponding gene clusters. We then discuss techniques for extracting structural information from genes, illustrated with numerous case examples. We also provide an analysis of the biases and limitations of the field with recommendations for future development. Our overview is not only aimed at biologically-oriented researchers already at ease with bioinformatic techniques, but also, in particular, at natural product, organic, and/or medicinal chemists not previously familiar with genomic techniques.

  18. A data management system for structural genomics

    PubMed Central

    Raymond, Stéphane; O'Toole, Nicholas; Cygler, Miroslaw

    2004-01-01

    Background Structural genomics (SG) projects aim to determine thousands of protein structures by the development of high-throughput techniques for all steps of the experimental structure determination pipeline. Crucial to the success of such endeavours is the careful tracking and archiving of experimental and external data on protein targets. Results We have developed a sophisticated data management system for structural genomics. Central to the system is an Oracle-based, SQL-interfaced database. The database schema deals with all facets of the structure determination process, from target selection to data deposition. Users access the database via any web browser. Experimental data is input by users with pre-defined web forms. Data can be displayed according to numerous criteria. A list of all current target proteins can be viewed, with links for each target to associated entries in external databases. To avoid unnecessary work on targets, our data management system matches protein sequences weekly using BLAST to entries in the Protein Data Bank and to targets of other SG centers worldwide. Conclusion Our system is a working, effective and user-friendly data management tool for structural genomics projects. In this report we present a detailed summary of the various capabilities of the system, using real target data as examples, and indicate our plans for future enhancements. PMID:15210054

  19. Interrogating the druggable genome with structural informatics.

    PubMed

    Hambly, Kevin; Danzer, Joseph; Muskal, Steven; Debe, Derek A

    2006-08-01

    Structural genomics projects are producing protein structure data at an unprecedented rate. In this paper, we present the Target Informatics Platform (TIP), a novel structural informatics approach for amplifying the rapidly expanding body of experimental protein structure information to enhance the discovery and optimization of small molecule protein modulators on a genomic scale. In TIP, existing experimental structure information is augmented using a homology modeling approach, and binding sites across multiple target families are compared using a clique detection algorithm. We report here a detailed analysis of the structural coverage for the set of druggable human targets, highlighting drug target families where the level of structural knowledge is currently quite high, as well as those areas where structural knowledge is sparse. Furthermore, we demonstrate the utility of TIP's intra- and inter-family binding site similarity analysis using a series of retrospective case studies. Our analysis underscores the utility of a structural informatics infrastructure for extracting drug discovery-relevant information from structural data, aiding researchers in the identification of lead discovery and optimization opportunities as well as potential "off-target" liabilities.

  20. Robust inference of population structure for ancestry prediction and correction of stratification in the presence of relatedness.

    PubMed

    Conomos, Matthew P; Miller, Michael B; Thornton, Timothy A

    2015-05-01

    Population structure inference with genetic data has been motivated by a variety of applications in population genetics and genetic association studies. Several approaches have been proposed for the identification of genetic ancestry differences in samples where study participants are assumed to be unrelated, including principal components analysis (PCA), multidimensional scaling (MDS), and model-based methods for proportional ancestry estimation. Many genetic studies, however, include individuals with some degree of relatedness, and existing methods for inferring genetic ancestry fail in related samples. We present a method, PC-AiR, for robust population structure inference in the presence of known or cryptic relatedness. PC-AiR utilizes genome-screen data and an efficient algorithm to identify a diverse subset of unrelated individuals that is representative of all ancestries in the sample. The PC-AiR method directly performs PCA on the identified ancestry representative subset and then predicts components of variation for all remaining individuals based on genetic similarities. In simulation studies and in applications to real data from Phase III of the HapMap Project, we demonstrate that PC-AiR provides a substantial improvement over existing approaches for population structure inference in related samples. We also demonstrate significant efficiency gains, where a single axis of variation from PC-AiR provides better prediction of ancestry in a variety of structure settings than using 10 (or more) components of variation from widely used PCA and MDS approaches. Finally, we illustrate that PC-AiR can provide improved population stratification correction over existing methods in genetic association studies with population structure and relatedness.
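    The two-step idea in the abstract above (run PCA on an unrelated, ancestry-representative subset, then predict components for everyone else) can be sketched as follows. This is only an illustration under strong assumptions: the unrelated subset is taken as given rather than chosen by the PC-AiR algorithm, only the first axis is computed (by power iteration), and all names and the toy genotypes are hypothetical.

```python
import random


def column_means(rows):
    """Per-SNP mean genotype over the given individuals."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def center(rows, means):
    """Subtract the supplied per-SNP means from every individual."""
    return [[x - m for x, m in zip(row, means)] for row in rows]


def first_pc(rows, n_iter=200, seed=0):
    """Leading principal axis of centered rows, via power iteration on X^T X."""
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in rows[0]]
    for _ in range(n_iter):
        scores = [sum(x * w for x, w in zip(row, v)) for row in rows]          # X v
        v = [sum(s * row[j] for s, row in zip(scores, rows)) for j in range(len(v))]  # X^T (X v)
        norm = sum(w * w for w in v) ** 0.5
        v = [w / norm for w in v]                                              # renormalize
    return v


def project(rows, axis):
    """Score each individual along the given axis."""
    return [sum(x * w for x, w in zip(row, axis)) for row in rows]


# Toy genotypes (alternate-allele counts at 4 SNPs); assumed, not real data.
unrelated = [
    [2, 2, 0, 0], [2, 1, 0, 0], [1, 2, 0, 1],   # three individuals, ancestry A
    [0, 0, 2, 2], [0, 1, 2, 2], [0, 0, 1, 2],   # three individuals, ancestry B
]
relatives = [[2, 2, 0, 0], [0, 0, 2, 1]]        # relatives of an A and a B individual

means = column_means(unrelated)                  # reference means come from the subset only
axis = first_pc(center(unrelated, means))        # step 1: PCA on the unrelated subset
unrelated_scores = project(center(unrelated, means), axis)
relative_scores = project(center(relatives, means), axis)  # step 2: project the rest
print(unrelated_scores, relative_scores)
```

    Because the axis is estimated without the relatives, family structure cannot distort it; the relatives are simply placed on an axis defined by ancestry differences among the unrelated individuals. The sign of a principal axis is arbitrary, so only relative positions are meaningful.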

  1. Genome-Wide Views of Chromatin Structure

    PubMed Central

    Rando, Oliver J.; Chang, Howard Y.

    2010-01-01

    Eukaryotic genomes are packaged into a nucleoprotein complex known as chromatin, which affects most processes that occur on DNA. Along with genetic and biochemical studies of resident chromatin proteins and their modifying enzymes, mapping of chromatin structure in vivo is one of the main pillars in our understanding of how chromatin relates to cellular processes. In this review, we discuss the use of genomic technologies to characterize chromatin structure in vivo, with a focus on data from budding yeast and humans. The picture emerging from these studies is the detailed chromatin structure of a typical gene, where the typical behavior gives insight into the mechanisms and deep rules that establish chromatin structure. Important deviation from the archetype is also observed, usually as a consequence of unique regulatory mechanisms at special genomic loci. Chromatin structure shows substantial conservation from yeast to humans, but mammalian chromatin has additional layers of complexity that likely relate to the requirements of multicellularity such as the need to establish faithful gene regulatory mechanisms for cell differentiation. PMID:19317649

  2. A Detailed History of Intron-rich Eukaryotic Ancestors Inferred from a Global Survey of 100 Complete Genomes

    PubMed Central

    Csuros, Miklos; Rogozin, Igor B.; Koonin, Eugene V.

    2011-01-01

    Protein-coding genes in eukaryotes are interrupted by introns, but intron densities widely differ between eukaryotic lineages. Vertebrates, some invertebrates and green plants have intron-rich genes, with 6–7 introns per kilobase of coding sequence, whereas most of the other eukaryotes have intron-poor genes. We reconstructed the history of intron gain and loss using a probabilistic Markov model (Markov Chain Monte Carlo, MCMC) on 245 orthologous genes from 99 genomes representing three of the five supergroups of eukaryotes for which multiple genome sequences are available. Intron-rich ancestors are confidently reconstructed for each major group, with 53 to 74% of the human intron density inferred with 95% confidence for the Last Eukaryotic Common Ancestor (LECA). The results of the MCMC reconstruction are compared with the reconstructions obtained using Maximum Likelihood (ML) and Dollo parsimony methods. An excellent agreement between the MCMC and ML inferences is demonstrated whereas Dollo parsimony introduces a noticeable bias in the estimations, typically yielding lower ancestral intron densities than MCMC and ML. Evolution of eukaryotic genes was dominated by intron loss, with substantial gain only at the bases of several major branches including plants and animals. The highest intron density, 120 to 130% of the human value, is inferred for the last common ancestor of animals. The reconstruction shows that the entire line of descent from LECA to mammals was intron-rich, a state conducive to the evolution of alternative splicing. PMID:21935348
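    For readers unfamiliar with the Dollo parsimony baseline mentioned above, here is a minimal sketch of the rule for a single intron position: the intron may be gained only once, at the most recent common ancestor of all species that carry it, and every absence below that point is explained by loss. The tree, species names and function names are illustrative assumptions, not data or code from the study.

```python
def leaves(tree):
    """Set of leaf names under a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def dollo(tree, present):
    """Dollo parsimony for one presence/absence character.

    tree:    nested tuples of species names.
    present: set of species in which the intron is observed.
    Returns (gain_node, losses): the subtree at whose root the single
    gain is placed, and the number of independent losses below it.
    """
    if not present:
        return None, 0                      # never observed: no gain, no loss

    def carries(node):
        return bool(leaves(node) & present)

    # Walk down while only one child lineage contains carriers;
    # the stopping point is the common ancestor of all carriers.
    node = tree
    while not isinstance(node, str):
        carrying = [child for child in node if carries(child)]
        if len(carrying) != 1:
            break
        node = carrying[0]

    def count_losses(sub):
        if isinstance(sub, str):
            return 0
        total = 0
        for child in sub:
            # A child clade with no carriers is a single loss on its branch.
            total += count_losses(child) if carries(child) else 1
        return total

    return node, count_losses(node)


# Hypothetical five-species tree.
species_tree = ((("human", "mouse"), "fly"), ("arabidopsis", "rice"))

root_gain, root_losses = dollo(species_tree, {"human", "arabidopsis"})
late_gain, late_losses = dollo(species_tree, {"human", "mouse"})
print(root_gain == species_tree, root_losses)   # intron placed at the root
print(late_gain, late_losses)                   # intron gained in the human-mouse ancestor
```

    Under this rule an intron is assigned to the root only when carriers are found on both sides of the root, which is one way to see why the method can behave differently from the probabilistic reconstructions.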

  3. A detailed history of intron-rich eukaryotic ancestors inferred from a global survey of 100 complete genomes.

    PubMed

    Csuros, Miklos; Rogozin, Igor B; Koonin, Eugene V

    2011-09-01

    Protein-coding genes in eukaryotes are interrupted by introns, but intron densities widely differ between eukaryotic lineages. Vertebrates, some invertebrates and green plants have intron-rich genes, with 6-7 introns per kilobase of coding sequence, whereas most of the other eukaryotes have intron-poor genes. We reconstructed the history of intron gain and loss using a probabilistic Markov model (Markov Chain Monte Carlo, MCMC) on 245 orthologous genes from 99 genomes representing three of the five supergroups of eukaryotes for which multiple genome sequences are available. Intron-rich ancestors are confidently reconstructed for each major group, with 53 to 74% of the human intron density inferred with 95% confidence for the Last Eukaryotic Common Ancestor (LECA). The results of the MCMC reconstruction are compared with the reconstructions obtained using Maximum Likelihood (ML) and Dollo parsimony methods. An excellent agreement between the MCMC and ML inferences is demonstrated whereas Dollo parsimony introduces a noticeable bias in the estimations, typically yielding lower ancestral intron densities than MCMC and ML. Evolution of eukaryotic genes was dominated by intron loss, with substantial gain only at the bases of several major branches including plants and animals. The highest intron density, 120 to 130% of the human value, is inferred for the last common ancestor of animals. The reconstruction shows that the entire line of descent from LECA to mammals was intron-rich, a state conducive to the evolution of alternative splicing.

  4. Inference of Transmission Network Structure from HIV Phylogenetic Trees.

    PubMed

    Giardina, Federica; Romero-Severson, Ethan Obie; Albert, Jan; Britton, Tom; Leitner, Thomas

    2017-01-01

    Phylogenetic inference is an attractive means to reconstruct transmission histories and epidemics. However, there is not a perfect correspondence between transmission history and virus phylogeny. Both node height and topological differences may occur, depending on the interaction between within-host evolutionary dynamics and between-host transmission patterns. To investigate these interactions, we added a within-host evolutionary model in epidemiological simulations and examined if the resulting phylogeny could recover different types of contact networks. To further improve realism, we also introduced patient-specific differences in infectivity across disease stages, and on the epidemic level we considered incomplete sampling and the age of the epidemic. Second, we implemented an inference method based on approximate Bayesian computation (ABC) to discriminate among three well-studied network models and jointly estimate both network parameters and key epidemiological quantities such as the infection rate. Our ABC framework used both topological and distance-based tree statistics for comparison between simulated and observed trees. Overall, our simulations showed that a virus time-scaled phylogeny (genealogy) may be substantially different from the between-host transmission tree. This has important implications for the interpretation of what a phylogeny reveals about the underlying epidemic contact network. In particular, we found that while the within-host evolutionary process obscures the transmission tree, the diversification process and infectivity dynamics also add discriminatory power to differentiate between different types of contact networks. We also found that the possibility to differentiate contact networks depends on how far an epidemic has progressed, where distance-based tree statistics have more power early in an epidemic. Finally, we applied our ABC inference on two different outbreaks from the Swedish HIV-1 epidemic.
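
    The approximate Bayesian computation (ABC) step mentioned in this abstract can be illustrated with a minimal rejection sampler. This is only a toy sketch, not the authors' implementation: the exponential waiting-time simulator, the uniform prior, the mean as the single summary statistic, and all function names are assumptions chosen for illustration, whereas the study compares topological and distance-based tree statistics across network models.

```python
import random
import statistics

def simulate_waiting_times(rate, n, rng):
    # Hypothetical stand-in simulator: n exponential waiting times at a given rate.
    return [rng.expovariate(rate) for _ in range(n)]

def abc_rejection(observed, prior_low, prior_high, n_draws, epsilon, rng):
    # Keep prior draws whose simulated summary statistic lies within
    # epsilon of the observed summary statistic (here, the sample mean).
    obs_stat = statistics.fmean(observed)
    accepted = []
    for _ in range(n_draws):
        rate = rng.uniform(prior_low, prior_high)  # draw from the uniform prior
        sim = simulate_waiting_times(rate, len(observed), rng)
        if abs(statistics.fmean(sim) - obs_stat) < epsilon:
            accepted.append(rate)
    return accepted

rng = random.Random(1)
# "Observed" data are themselves simulated here, with an assumed true rate of 2.0.
observed = simulate_waiting_times(rate=2.0, n=100, rng=rng)
posterior = abc_rejection(observed, 0.1, 5.0, n_draws=5000, epsilon=0.05, rng=rng)
posterior_mean = statistics.fmean(posterior)
print(len(posterior), round(posterior_mean, 2))
```

    The accepted draws approximate the posterior of the rate; a smaller epsilon tightens the approximation at the cost of fewer accepted samples.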

  5. Inference of Transmission Network Structure from HIV Phylogenetic Trees

    PubMed Central

    Britton, Tom; Leitner, Thomas

    2017-01-01

    Phylogenetic inference is an attractive means to reconstruct transmission histories and epidemics. However, there is not a perfect correspondence between transmission history and virus phylogeny. Both node height and topological differences may occur, depending on the interaction between within-host evolutionary dynamics and between-host transmission patterns. To investigate these interactions, we added a within-host evolutionary model in epidemiological simulations and examined if the resulting phylogeny could recover different types of contact networks. To further improve realism, we also introduced patient-specific differences in infectivity across disease stages, and on the epidemic level we considered incomplete sampling and the age of the epidemic. Second, we implemented an inference method based on approximate Bayesian computation (ABC) to discriminate among three well-studied network models and jointly estimate both network parameters and key epidemiological quantities such as the infection rate. Our ABC framework used both topological and distance-based tree statistics for comparison between simulated and observed trees. Overall, our simulations showed that a virus time-scaled phylogeny (genealogy) may be substantially different from the between-host transmission tree. This has important implications for the interpretation of what a phylogeny reveals about the underlying epidemic contact network. In particular, we found that while the within-host evolutionary process obscures the transmission tree, the diversification process and infectivity dynamics also add discriminatory power to differentiate between different types of contact networks. We also found that the possibility to differentiate contact networks depends on how far an epidemic has progressed, where distance-based tree statistics have more power early in an epidemic. Finally, we applied our ABC inference on two different outbreaks from the Swedish HIV-1 epidemic. PMID:28085876

  6. Triallelic Population Genomics for Inferring Correlated Fitness Effects of Same Site Nonsynonymous Mutations.

    PubMed

    Ragsdale, Aaron P; Coffman, Alec J; Hsieh, PingHsun; Struck, Travis J; Gutenkunst, Ryan N

    2016-05-01

    The distribution of mutational effects on fitness is central to evolutionary genetics. Typical univariate distributions, however, cannot model the effects of multiple mutations at the same site, so we introduce a model in which mutations at the same site have correlated fitness effects. To infer the strength of that correlation, we developed a diffusion approximation to the triallelic frequency spectrum, which we applied to data from Drosophila melanogaster. We found a moderate positive correlation between the fitness effects of nonsynonymous mutations at the same codon, suggesting that both mutation identity and location are important for determining fitness effects in proteins. We validated our approach by comparing it to biochemical mutational scanning experiments, finding strong quantitative agreement, even between different organisms. We also found that the correlation of mutational fitness effects was not affected by protein solvent exposure or structural disorder. Together, our results suggest that the correlation of fitness effects at the same site is a previously overlooked yet fundamental property of protein evolution.

  7. Inferring friendship network structure by using mobile phone data.

    PubMed

    Eagle, Nathan; Pentland, Alex Sandy; Lazer, David

    2009-09-08

    Data collected from mobile phones have the potential to provide insight into the relational dynamics of individuals. This paper compares observational data from mobile phones with standard self-report survey data. We find that the information from these two data sources is overlapping but distinct. For example, self-reports of physical proximity deviate from mobile phone records depending on the recency and salience of the interactions. We also demonstrate that it is possible to accurately infer 95% of friendships based on the observational data alone, where friend dyads demonstrate distinctive temporal and spatial patterns in their physical proximity and calling patterns. These behavioral patterns, in turn, allow the prediction of individual-level outcomes such as job satisfaction.

  8. The Plasmodium apicoplast genome: conserved structure and close relationship of P. ovale to rodent malaria parasites.

    PubMed

    Arisue, Nobuko; Hashimoto, Tetsuo; Mitsui, Hideya; Palacpac, Nirianne M Q; Kaneko, Akira; Kawai, Satoru; Hasegawa, Masami; Tanabe, Kazuyuki; Horii, Toshihiro

    2012-09-01

    Apicoplast, a nonphotosynthetic plastid derived from secondary symbiotic origin, is essential for the survival of malaria parasites of the genus Plasmodium. Elucidation of the evolution of the apicoplast genome in Plasmodium species is important to better understand the functions of the organelle. However, the complete apicoplast genome is available for only the most virulent human malaria parasite, Plasmodium falciparum. Here, we obtained the near-complete apicoplast genome sequences from eight Plasmodium species that infect a wide variety of vertebrate hosts and performed structural and phylogenetic analyses. We found that gene repertoire, gene arrangement, and other structural attributes were highly conserved. Phylogenetic reconstruction using 30 protein-coding genes of the apicoplast genome inferred, for the first time, a close relationship between P. ovale and rodent parasites. This close relatedness was robustly supported using multiple evolutionary assumptions and models. The finding suggests that an ancestral host switch occurred between rodent and human Plasmodium parasites.

  9. Demographic Divergence History of Pied Flycatcher and Collared Flycatcher Inferred from Whole-Genome Re-sequencing Data

    PubMed Central

    Nadachowska-Brzyska, Krystyna; Burri, Reto; Olason, Pall I.; Kawakami, Takeshi; Smeds, Linnéa; Ellegren, Hans

    2013-01-01

    Profound knowledge of demographic history is a prerequisite for the understanding and inference of processes involved in the evolution of population differentiation and speciation. Together with new coalescent-based methods, the recent availability of genome-wide data enables investigation of differentiation and divergence processes at unprecedented depth. We combined two powerful approaches, full Approximate Bayesian Computation analysis (ABC) and pairwise sequentially Markovian coalescent modeling (PSMC), to reconstruct the demographic history of the split between two avian speciation model species, the pied flycatcher and collared flycatcher. Using whole-genome re-sequencing data from 20 individuals, we investigated 15 demographic models including different levels and patterns of gene flow, and changes in effective population size over time. ABC provided high support for recent (mode 0.3 my, range <0.7 my) species divergence, declines in effective population size of both species since their initial divergence, and unidirectional recent gene flow from pied flycatcher into collared flycatcher. The estimated divergence time and population size changes, supported by PSMC results, suggest that the ancestral species persisted through one of the glacial periods of middle Pleistocene and then split into two large populations that first increased in size before going through severe bottlenecks and expanding into their current ranges. Secondary contact appears to have been established after the last glacial maximum. The severity of the bottlenecks at the last glacial maximum is indicated by the discrepancy between current effective population sizes (20,000–80,000) and census sizes (5–50 million birds) of the two species. The recent divergence time challenges the supposition that avian speciation is a relatively slow process with extended times for intrinsic postzygotic reproductive barriers to evolve. Our study emphasizes the importance of using genome-wide data to

  10. Inferring Selective Constraint from Population Genomic Data Suggests Recent Regulatory Turnover in the Human Brain

    PubMed Central

    Schrider, Daniel R.; Kern, Andrew D.

    2015-01-01

    The comparative genomics revolution of the past decade has enabled the discovery of functional elements in the human genome via sequence comparison. While that is so, an important class of elements, those specific to humans, is entirely missed by searching for sequence conservation across species. Here we present an analysis based on variation data among human genomes that utilizes a supervised machine learning approach for the identification of human-specific purifying selection in the genome. Using only allele frequency information from the complete low-coverage 1000 Genomes Project data set in conjunction with a support vector machine trained from known functional and nonfunctional portions of the genome, we are able to accurately identify portions of the genome constrained by purifying selection. Our method identifies previously known human-specific gains or losses of function and uncovers many novel candidates. Candidate targets for gain and loss of function along the human lineage include numerous putative regulatory regions of genes essential for normal development of the central nervous system, including a significant enrichment of gain of function events near neurotransmitter receptor genes. These results are consistent with regulatory turnover being a key mechanism in the evolution of human-specific characteristics of brain development. Finally, we show that the majority of the genome is unconstrained by natural selection currently, in agreement with what has been estimated from phylogenetic methods but in sharp contrast to estimates based on transcriptomics or other high-throughput functional methods. PMID:26590212

  11. Inferring Selective Constraint from Population Genomic Data Suggests Recent Regulatory Turnover in the Human Brain.

    PubMed

    Schrider, Daniel R; Kern, Andrew D

    2015-11-19

    The comparative genomics revolution of the past decade has enabled the discovery of functional elements in the human genome via sequence comparison. While that is so, an important class of elements, those specific to humans, is entirely missed by searching for sequence conservation across species. Here we present an analysis based on variation data among human genomes that utilizes a supervised machine learning approach for the identification of human-specific purifying selection in the genome. Using only allele frequency information from the complete low-coverage 1000 Genomes Project data set in conjunction with a support vector machine trained from known functional and nonfunctional portions of the genome, we are able to accurately identify portions of the genome constrained by purifying selection. Our method identifies previously known human-specific gains or losses of function and uncovers many novel candidates. Candidate targets for gain and loss of function along the human lineage include numerous putative regulatory regions of genes essential for normal development of the central nervous system, including a significant enrichment of gain of function events near neurotransmitter receptor genes. These results are consistent with regulatory turnover being a key mechanism in the evolution of human-specific characteristics of brain development. Finally, we show that the majority of the genome is unconstrained by natural selection currently, in agreement with what has been estimated from phylogenetic methods but in sharp contrast to estimates based on transcriptomics or other high-throughput functional methods.

  12. Reference set of regulons in Desulfovibrionales inferred by comparative genomics approach

    SciTech Connect

    Kazakov, A.E.; Rodionov, D.A.; Price, M.N.; Arkin, A.P.; Dubchak, I.; Novichkov, P.S.

    2010-11-15

    In this study, we carried out a large-scale comparative genomics analysis of regulatory interactions in Desulfovibrio vulgaris and 12 related genomes from the order Desulfovibrionales using our recently developed web server RegPredict (http://regpredict.lbl.gov). An overall reference collection of 26 Desulfovibrionales regulogs can be accessed through the RegPrecise database (http://regpredict.lbl.gov).

  13. Mechanisms underlying structural variant formation in genomic disorders

    PubMed Central

    Carvalho, Claudia M. B.; Lupski, James R.

    2016-01-01

    With the recent burst of technological developments in genomics, and the clinical implementation of genome-wide assays, our understanding of the molecular basis of genomic disorders, specifically the contribution of structural variation to disease burden, is evolving quickly. Ongoing studies have revealed a ubiquitous role for genome architecture in the formation of structural variants at a given locus, both in DNA recombination-based processes and in replication-based processes. These reports showcase the influence of repeat sequences on genomic stability and structural variant complexity and also highlight the tremendous plasticity and dynamic nature of our genome in evolution, health and disease susceptibility. PMID:26924765

  14. Unifying Inference of Meso-Scale Structures in Networks.

    PubMed

    Tunç, Birkan; Verma, Ragini

    2015-01-01

    Networks are among the most prevalent formal representations in scientific studies, employed to depict interactions between objects such as molecules, neuronal clusters, or social groups. Studies performed at meso-scale that involve grouping of objects based on their distinctive interaction patterns form one of the main lines of investigation in network science. In a social network, for instance, meso-scale structures can correspond to isolated social groupings or groups of individuals that serve as a communication core. Currently, the research on different meso-scale structures such as community and core-periphery structures has been conducted via independent approaches, which precludes the possibility of an algorithmic design that can handle multiple meso-scale structures and decide which structure explains the observed data better. In this study, we propose a unified formulation for the algorithmic detection and analysis of different meso-scale structures. This facilitates the investigation of hybrid structures that capture the interplay between multiple meso-scale structures and statistical comparison of competing structures, all of which have been hitherto unavailable. We demonstrate the applicability of the methodology in analyzing the human brain network, by determining the dominant organizational structure (communities) of the brain, as well as its auxiliary characteristics (core-periphery).

  15. Bayesian inference of protein structure from chemical shift data.

    PubMed

    Bratholm, Lars A; Christensen, Anders S; Hamelryck, Thomas; Jensen, Jan H

    2015-01-01

    Protein chemical shifts are routinely used to augment molecular mechanics force fields in protein structure simulations, with weights of the chemical shift restraints determined empirically. These weights, however, might not be an optimal descriptor of a given protein structure and predictive model, and a bias is introduced which might result in incorrect structures. In the inferential structure determination framework, both the unknown structure and the disagreement between experimental and back-calculated data are formulated as a joint probability distribution, thus utilizing the full information content of the data. Here, we present the formulation of such a probability distribution where the error in chemical shift prediction is described by either a Gaussian or Cauchy distribution. The methodology is demonstrated and compared to a set of empirically weighted potentials through Markov chain Monte Carlo simulations of three small proteins (ENHD, Protein G and the SMN Tudor Domain) using the PROFASI force field and the chemical shift predictor CamShift. Using a clustering criterion for identifying the best structure, together with the addition of a solvent exposure scoring term, the simulations suggest that sampling both the structure and the uncertainties in chemical shift prediction leads to more accurate structures compared to conventional methods using empirically determined weights. The Cauchy distribution, using either sampled uncertainties or predetermined weights, did, however, result in overall better convergence to the native fold, suggesting that both types of distribution might be useful in different aspects of the protein structure prediction.

  16. Systematic Prioritization of Druggable Mutations in ∼5000 Genomes Across 16 Cancer Types Using a Structural Genomics-based Approach

    PubMed Central

    Zhao, Junfei; Cheng, Feixiong; Wang, Yuanyuan; Arteaga, Carlos L.; Zhao, Zhongming

    2016-01-01

    A massive number of somatic mutations have been cataloged in large-scale projects such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium projects. The majority of the somatic mutations found in tumor genomes are neutral “passenger” rather than damaging “driver” mutations. Now, understanding their biological consequences and prioritizing them for druggable targets are urgently needed. Thanks to the rapid advances in structural genomics technologies (e.g. X-ray), large-scale protein structural data has now been made available, providing critical information for deciphering functional roles of mutations in cancer and prioritizing those alterations that may mediate drug binding at the atom resolution and, as such, be druggable targets. We hypothesized that mutations at protein–ligand binding-site residues are likely to be druggable targets. Thus, to prioritize druggable mutations, we developed SGDriver, a structural genomics-based method incorporating the somatic missense mutations into protein–ligand binding-site residues using a Bayes inference statistical framework. We applied SGDriver to 746,631 missense mutations observed in 4997 tumor-normal pairs across 16 cancer types from The Cancer Genome Atlas. SGDriver detected 14,471 potential druggable mutations in 2091 proteins (including 1,516 recurrently mutated proteins) across 3558 cancer genomes (71.2%), and further identified 298 proteins harboring mutations that were significantly enriched at protein–ligand binding-site residues (adjusted p value < 0.05). The identified proteins are significantly enriched in both oncoproteins and tumor suppressors. The follow-up drug-target network analysis suggested 98 known and 126 repurposed druggable anticancer targets (e.g. SPOP and NR3C1). Furthermore, our integrative analysis indicated that 13% of patients might benefit from current targeted therapy, and this proportion would increase to 31% when considering drug repositioning

  17. Genome instability mechanisms and the structure of cancer genomes.

    PubMed

    Cassidy, Liam D; Venkitaraman, Ashok R

    2012-02-01

    Genomic instability is a hallmark of cancer cells, and arises from the aberrations that these cells exhibit in the normal biological mechanisms that repair and replicate the genome, or ensure its accurate segregation during cell division. Increasingly detailed descriptions of cancer genomes have begun to emerge from next-generation sequencing (NGS), providing snapshots of their nature and heterogeneity in different cancers at different stages in their evolution. Here, we attempt to extract from these sequencing studies insights into the role of genome instability mechanisms in carcinogenesis, and to identify challenges impeding further progress.

  18. Complete Chloroplast Genome of the Wollemi Pine (Wollemia nobilis): Structure and Evolution

    PubMed Central

    Yap, Jia-Yee S.; Rohner, Thore; Greenfield, Abigail; Van Der Merwe, Marlien; McPherson, Hannah; Glenn, Wendy; Kornfeld, Geoff; Marendy, Elessa; Pan, Annie Y. H.; Wilkins, Marc R.; Rossetto, Maurizio; Delaney, Sven K.

    2015-01-01

    The Wollemi pine (Wollemia nobilis) is a rare Southern conifer with striking morphological similarity to fossil pines. A small population of W. nobilis was discovered in 1994 in a remote canyon system in the Wollemi National Park (near Sydney, Australia). This population contains fewer than 100 individuals and is critically endangered. Previous genetic studies of the Wollemi pine have investigated its evolutionary relationship with other pines in the family Araucariaceae, and have suggested that the Wollemi pine genome contains little or no variation. However, these studies were performed prior to the widespread use of genome sequencing, and their conclusions were based on a limited fraction of the Wollemi pine genome. In this study, we address this problem by determining the entire sequence of the W. nobilis chloroplast genome. A detailed analysis of the structure of the genome is presented, and the evolution of the genome is inferred by comparison with the chloroplast sequences of other members of the Araucariaceae and the related family Podocarpaceae. Pairwise alignments of whole genome sequences, and the presence of unique pseudogenes, gene duplications and insertions in W. nobilis and Araucariaceae, indicate that the W. nobilis chloroplast genome is most similar to that of its sister taxon Agathis. However, the W. nobilis genome contains an unusually high number of repetitive sequences, and these could be used in future studies to investigate and conserve any remnant genetic diversity in the Wollemi pine. PMID:26061691

  19. High-level phylogeny of the Coleoptera inferred with mitochondrial genome sequences.

    PubMed

    Yuan, Ming-Long; Zhang, Qi-Lin; Zhang, Li; Guo, Zhong-Long; Liu, Yong-Jian; Shen, Yu-Ying; Shao, Renfu

    2016-11-01

    The Coleoptera (beetles) exhibits tremendous morphological, ecological, and behavioral diversity. To better understand the phylogenetics and evolution of beetles, we sequenced three complete mitogenomes from two families (Cleridae and Meloidae), which share conserved mitogenomic features with other completely sequenced beetles. We assessed the influence of six datasets and three inference methods on topology and nodal support within the Coleoptera. We found that both Bayesian inference and maximum likelihood with homogeneous-site models were greatly affected by nucleotide compositional heterogeneity, while the heterogeneous-site mixture model in PhyloBayes could provide better phylogenetic signals for the Coleoptera. The amino acid dataset generated more reliable tree topology at the higher taxonomic levels (i.e. suborders and series), where the inclusion of rRNA genes and the third positions of protein-coding genes improved phylogenetic inference at the superfamily level, especially under a heterogeneous-site model. We recovered the suborder relationships as (Archostemata+Adephaga)+(Myxophaga+Polyphaga). The series relationships within Polyphaga were recovered as (Scirtiformia+(Elateriformia+((Bostrichiformia+Scarabaeiformia+Staphyliniformia)+Cucujiformia))). All superfamilies within Cucujiformia were recovered as monophyletic. We obtained a cucujiform phylogeny of (Cleroidea+(Coccinelloidea+((Lymexyloidea+Tenebrionoidea)+(Cucujoidea+(Chrysomeloidea+Curculionoidea))))). This study showed that although tree topologies were sensitive to data types and inference methods, mitogenomic data could provide useful information for resolving the Coleoptera phylogeny at various taxonomic levels by using suitable datasets and heterogeneous-site models.

  20. Demographic History of the Genus Pan Inferred from Whole Mitochondrial Genome Reconstructions

    PubMed Central

    Tucci, Serena; de Manuel, Marc; Ghirotto, Silvia; Benazzo, Andrea; Prado-Martinez, Javier; Lorente-Galdos, Belen; Nam, Kiwoong; Dabad, Marc; Hernandez-Rodriguez, Jessica; Comas, David; Navarro, Arcadi; Schierup, Mikkel H.; Andres, Aida M.; Barbujani, Guido; Hvilsom, Christina; Marques-Bonet, Tomas

    2016-01-01

    The genus Pan is the closest genus to our own and it includes two species, Pan paniscus (bonobos) and Pan troglodytes (chimpanzees). The latter is constituted by four subspecies, all highly endangered. The study of the genus Pan has been incessantly complicated by the intricate relationship among subspecies and the statistical limitations imposed by the reduced number of samples or genomic markers analyzed. Here, we present a new method to reconstruct complete mitochondrial genomes (mitogenomes) from whole genome shotgun (WGS) datasets, mtArchitect, showing that its reconstructions are highly accurate and consistent with long-range PCR mitogenomes. We used this approach to build the mitochondrial genomes of 20 newly sequenced samples which, together with available genomes, allowed us to analyze the hitherto most complete Pan mitochondrial genome dataset including 156 chimpanzee and 44 bonobo individuals, with a proportional contribution from all chimpanzee subspecies. We estimated the separation time between chimpanzees and bonobos around 1.15 million years ago (Mya) [0.81–1.49]. Further, we found that under the most probable genealogical model the two clades of chimpanzees, Western + Nigeria-Cameroon and Central + Eastern, separated at 0.59 Mya [0.41–0.78] with further internal separations at 0.32 Mya [0.22–0.43] and 0.16 Mya [0.17–0.34], respectively. Finally, for a subset of our samples, we compared nuclear versus mitochondrial genomes and we found that chimpanzee subspecies have different patterns of nuclear and mitochondrial diversity, which could be a result of either processes affecting the mitochondrial genome, such as hitchhiking or background selection, or a result of population dynamics. PMID:27345955

  1. Demographic History of the Genus Pan Inferred from Whole Mitochondrial Genome Reconstructions.

    PubMed

    Lobon, Irene; Tucci, Serena; de Manuel, Marc; Ghirotto, Silvia; Benazzo, Andrea; Prado-Martinez, Javier; Lorente-Galdos, Belen; Nam, Kiwoong; Dabad, Marc; Hernandez-Rodriguez, Jessica; Comas, David; Navarro, Arcadi; Schierup, Mikkel H; Andres, Aida M; Barbujani, Guido; Hvilsom, Christina; Marques-Bonet, Tomas

    2016-07-03

    The genus Pan is the closest genus to our own and it includes two species, Pan paniscus (bonobos) and Pan troglodytes (chimpanzees). The latter is constituted by four subspecies, all highly endangered. The study of the genus Pan has been incessantly complicated by the intricate relationship among subspecies and the statistical limitations imposed by the reduced number of samples or genomic markers analyzed. Here, we present a new method to reconstruct complete mitochondrial genomes (mitogenomes) from whole genome shotgun (WGS) datasets, mtArchitect, showing that its reconstructions are highly accurate and consistent with long-range PCR mitogenomes. We used this approach to build the mitochondrial genomes of 20 newly sequenced samples which, together with available genomes, allowed us to analyze the hitherto most complete Pan mitochondrial genome dataset including 156 chimpanzee and 44 bonobo individuals, with a proportional contribution from all chimpanzee subspecies. We estimated the separation time between chimpanzees and bonobos around 1.15 million years ago (Mya) [0.81-1.49]. Further, we found that under the most probable genealogical model the two clades of chimpanzees, Western + Nigeria-Cameroon and Central + Eastern, separated at 0.59 Mya [0.41-0.78] with further internal separations at 0.32 Mya [0.22-0.43] and 0.16 Mya [0.17-0.34], respectively. Finally, for a subset of our samples, we compared nuclear versus mitochondrial genomes and we found that chimpanzee subspecies have different patterns of nuclear and mitochondrial diversity, which could be a result of either processes affecting the mitochondrial genome, such as hitchhiking or background selection, or a result of population dynamics.

  2. Inferring Meaning from Syntactic Structures in Acquisition: The Case of Transitivity and Telicity

    ERIC Educational Resources Information Center

    Wagner, Laura

    2010-01-01

    This paper investigated children's ability to use syntactic structures to infer semantic information. The particular syntax-semantics link examined was the one between transitivity (transitive/intransitive structures) and telicity (telic/atelic perspectives; that is, boundedness). Although transitivity is an important syntactic reflex of telicity,…

  3. Chloroplast genome structure in Ilex (Aquifoliaceae)

    PubMed Central

    Yao, Xin; Tan, Yun-Hong; Liu, Ying-Ying; Song, Yu; Yang, Jun-Bo; Corlett, Richard T.

    2016-01-01

    Aquifoliaceae is the largest family in the campanulid order Aquifoliales. It consists of a single genus, Ilex, the hollies, which is the largest woody dioecious genus in the angiosperms. Most species are in East Asia or South America. The taxonomy and evolutionary history remain unclear due to the lack of a robust species-level phylogeny. We produced the first complete chloroplast genomes in this family, including seven Ilex species, by Illumina sequencing of long-range PCR products and subsequent reference-guided de novo assembly. These genomes have a typical bicyclic structure with a conserved genome arrangement and moderate divergence. The total length is 157,741 bp and there is one large single-copy region (LSC) with 87,109 bp, one small single-copy region (SSC) with 18,436 bp, and a pair of inverted repeat regions (IR) with 52,196 bp. A total of 144 genes were identified, including 96 protein-coding genes, 40 tRNA genes and 8 rRNA genes. Thirty-four repetitive sequences were identified in Ilex pubescens, with lengths >14 bp and identity >90%, and 11 divergence hotspot regions that could be targeted for phylogenetic markers. This study will contribute to improved resolution of deep branches of the Ilex phylogeny and facilitate identification of Ilex species. PMID:27378489
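
    The region lengths quoted in this abstract can be checked with a few lines of arithmetic. The sketch below assumes that the 52,196 bp figure is the combined length of both inverted repeat copies, an interpretation the abstract does not state explicitly but that is consistent with the reported total; the variable names are illustrative only.

```python
# Region lengths in base pairs, as quoted in the abstract.
large_single_copy = 87_109
small_single_copy = 18_436
inverted_repeat_pair = 52_196   # assumed to cover both IR copies together
reported_total = 157_741

# Sum of the three components of the circular chloroplast genome.
computed_total = large_single_copy + small_single_copy + inverted_repeat_pair

# Under the stated assumption, each IR copy is half of the pair length.
single_inverted_repeat = inverted_repeat_pair // 2

print(computed_total == reported_total)  # True
print(single_inverted_repeat)            # 26098
```

    Under that reading the components sum exactly to the reported genome length, with each inverted repeat copy spanning 26,098 bp.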

  4. Protein NMR Structure Refinement based on Bayesian Inference

    NASA Astrophysics Data System (ADS)

    Ikeya, Teppei; Ikeda, Shiro; Kigawa, Takanori; Ito, Yutaka; Güntert, Peter

    2016-03-01

    Nuclear Magnetic Resonance (NMR) spectroscopy is a tool to investigate three-dimensional (3D) structures and dynamics of biomacromolecules at atomic resolution in solution or more natural environments such as living cells. Since NMR data are principally only spectra with peak signals, it is required to properly deduce structural information from the sparse experimental data with their imperfections and uncertainty, and to visualize 3D conformations by NMR structure calculation. In order to efficiently analyse the data, Rieping et al. proposed a new structure calculation method based on Bayes’ theorem. We implemented a similar approach into the program CYANA with some modifications. It allows us to handle automatic NOE cross peak assignments in unambiguous and ambiguous usages, and to create a prior distribution based on a physical force field with the generalized Born implicit water model. The sampling scheme for obtaining the posterior is performed by a hybrid Monte Carlo algorithm combined with Markov chain Monte Carlo (MCMC) by the Gibbs sampler, and molecular dynamics simulation (MD) for obtaining a canonical ensemble of conformations. Since it is not trivial to search the entire function space particularly for exploring the conformational prior due to the extraordinarily large conformation space of proteins, the replica exchange method is performed, in which several MCMC calculations with different temperatures run in parallel as replicas. It is shown with simulated data or randomly deleted experimental peaks that the new structure calculation method can provide accurate structures even with fewer peaks, especially compared with the conventional method. In particular, it dramatically improves in-cell structures of the proteins GB1 and TTHA1718 using exclusively information obtained in living Escherichia coli (E. coli) cells.
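
    The replica exchange step mentioned in this abstract can be illustrated with the standard Metropolis swap rule between two replicas at inverse temperatures beta_i and beta_j. This is a generic sketch and not the CYANA implementation; the function names, the toy energies and the string "states" are all hypothetical, and in a Bayesian setting the energy would be something like a negative log posterior.

```python
import math
import random

def swap_probability(beta_i, beta_j, energy_i, energy_j):
    """Metropolis acceptance probability for exchanging two replicas:
    min(1, exp((beta_i - beta_j) * (energy_i - energy_j)))."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)

def attempt_swaps(betas, energies, states, rng):
    """Try to swap each neighbouring pair of replicas once, in place."""
    for k in range(len(betas) - 1):
        p = swap_probability(betas[k], betas[k + 1], energies[k], energies[k + 1])
        if rng.random() < p:
            states[k], states[k + 1] = states[k + 1], states[k]
            energies[k], energies[k + 1] = energies[k + 1], energies[k]

# Toy usage: three replicas, coldest first (largest beta).
rng = random.Random(0)
betas = [1.0, 0.5, 0.25]
energies = [3.0, 1.0, 2.0]
states = ["A", "B", "C"]
attempt_swaps(betas, energies, states, rng)
print(states, energies)
```

    In this toy run every attempted swap moves a lower-energy configuration to a colder replica, so both swaps are accepted with probability 1 and the lowest-energy state ends up at the lowest temperature.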

  5. Simplified DGS procedure for large-scale genome structural study.

    PubMed

    Jung, Yong-Chul; Xu, Jia; Chen, Jun; Kim, Yeong; Winchester, David; Wang, San Ming

    2009-11-01

    Ditag genome scanning (DGS) uses next-generation DNA sequencing to sequence the ends of ditag fragments produced by restriction enzymes. These sequences are compared to known genome sequences to determine their structure. In order to use DGS for large-scale genome structural studies, we have substantially revised the original protocol by replacing the in vivo genomic DNA cloning with in vitro adaptor ligation, eliminating the ditag concatemerization steps, and replacing the 454 sequencer with Solexa or SOLiD sequencers for ditag sequence collection. This revised protocol further increases genome coverage and resolution and allows DGS to be used to analyze multiple genomes simultaneously.

  6. Comparative genomics of four Liliales families inferred from the complete chloroplast genome sequence of Veratrum patulum O. Loes. (Melanthiaceae).

    PubMed

    Do, Hoang Dang Khoa; Kim, Jung Sung; Kim, Joo-Hwan

    2013-11-10

    The sequence of the chloroplast genome, which is inherited maternally, contains useful information for many scientific fields such as plant systematics, biogeography and biotechnology because its characteristics are highly conserved among species. There is an increase in chloroplast genomes of angiosperms that have been sequenced in recent years. In this study, the nucleotide sequence of the chloroplast genome (cpDNA) of Veratrum patulum Loes. (Melanthiaceae, Liliales) was analyzed completely. The circular double-stranded DNA of 153,699 bp consists of two inverted repeat (IR) regions of 26,360 bp each, a large single copy of 83,372 bp, and a small single copy of 17,607 bp. This plastome contains 81 protein-coding genes, 30 distinct tRNA and four genes of rRNA. In addition, there are six hypothetical coding regions (ycf1, ycf2, ycf3, ycf4, ycf15 and ycf68) and two open reading frames (ORF42 and ORF56), which are also found in the chloroplast genomes of the other species. The gene orders and gene contents of the V. patulum plastid genome are similar to those of Smilax china, Lilium longiflorum and Alstroemeria aurea, members of the Smilacaceae, Liliaceae and Alstroemeriaceae (Liliales), respectively. However, the loss of rps16 exon 2 in V. patulum results in the difference in the large single copy regions in comparison with other species. The base substitution rate is quite similar among genes of these species. Additionally, the base substitution rate of inverted repeat region was smaller than that of single copy regions in all observed species of Liliales. The IR regions were expanded to trnH_GUG in V. patulum, a part of rps19 in L. longiflorum and A. aurea, and whole sequence of rps19 in S. china. Furthermore, the IGS lengths of rbcL-accD-psaI region were variable among Liliales species, suggesting that this region might be a hotspot of indel events and the informative site for phylogenetic studies in Liliales. In general, the whole chloroplast genome of V. patulum, a

  7. The Generator of the Event Structure Lexicon (GESL): Automatic Annotation of Event Structure for Textual Inference Tasks

    ERIC Educational Resources Information Center

    Im, Seohyun

    2013-01-01

    This dissertation aims to develop the Generator of the Event Structure Lexicon (GESL) which is a tool to automate annotating the event structure of verbs in text to support textual inference tasks related to lexically entailed subevents. The output of the GESL is the Event Structure Lexicon (ESL), which is a lexicon of verbs in text which includes…

  8. Alternative Multiple Imputation Inference for Mean and Covariance Structure Modeling

    ERIC Educational Resources Information Center

    Lee, Taehun; Cai, Li

    2012-01-01

    Model-based multiple imputation has become an indispensable method in the educational and behavioral sciences. Mean and covariance structure models are often fitted to multiply imputed data sets. However, the presence of multiple random imputations complicates model fit testing, which is an important aspect of mean and covariance structure…

  9. Non-Bayesian Inference: Causal Structure Trumps Correlation

    ERIC Educational Resources Information Center

    Bes, Benedicte; Sloman, Steven; Lucas, Christopher G.; Raufaste, Eric

    2012-01-01

    The study tests the hypothesis that conditional probability judgments can be influenced by causal links between the target event and the evidence even when the statistical relations among variables are held constant. Three experiments varied the causal structure relating three variables and found that (a) the target event was perceived as more…

  10. Statistical Inference for Detecting Structures and Anomalies in Networks

    DTIC Science & Technology

    2015-08-27

    community structure in dynamic networks, along with the discovery of a detectability phase transition as a function of the rate of change and the...local information, about the known nodes and their neighbors. But when this fraction crosses a critical threshold, our knowledge becomes global

  11. Use of Bayesian Inference in Crystallographic Structure Refinement via Full Diffraction Profile Analysis

    PubMed Central

    Fancher, Chris M.; Han, Zhen; Levin, Igor; Page, Katharine; Reich, Brian J.; Smith, Ralph C.; Wilson, Alyson G.; Jones, Jacob L.

    2016-01-01

    A Bayesian inference method for refining crystallographic structures is presented. The distribution of model parameters is stochastically sampled using Markov chain Monte Carlo. Posterior probability distributions are constructed for all model parameters to properly quantify uncertainty by appropriately modeling the heteroskedasticity and correlation of the error structure. The proposed method is demonstrated by analyzing a National Institute of Standards and Technology silicon standard reference material. The results obtained by Bayesian inference are compared with those determined by Rietveld refinement. Posterior probability distributions of model parameters provide both estimates and uncertainties. The new method better estimates the true uncertainties in the model as compared to the Rietveld method. PMID:27550221

  12. Use of Bayesian Inference in Crystallographic Structure Refinement via Full Diffraction Profile Analysis.

    PubMed

    Fancher, Chris M; Han, Zhen; Levin, Igor; Page, Katharine; Reich, Brian J; Smith, Ralph C; Wilson, Alyson G; Jones, Jacob L

    2016-08-23

    A Bayesian inference method for refining crystallographic structures is presented. The distribution of model parameters is stochastically sampled using Markov chain Monte Carlo. Posterior probability distributions are constructed for all model parameters to properly quantify uncertainty by appropriately modeling the heteroskedasticity and correlation of the error structure. The proposed method is demonstrated by analyzing a National Institute of Standards and Technology silicon standard reference material. The results obtained by Bayesian inference are compared with those determined by Rietveld refinement. Posterior probability distributions of model parameters provide both estimates and uncertainties. The new method better estimates the true uncertainties in the model as compared to the Rietveld method.

  13. Stock portfolio structure of individual investors infers future trading behavior.

    PubMed

    Bohlin, Ludvig; Rosvall, Martin

    2014-01-01

    Although the understanding of and motivation behind individual trading behavior is an important puzzle in finance, little is known about the connection between an investor's portfolio structure and her trading behavior in practice. In this paper, we investigate the relation between what stocks investors hold, and what stocks they buy, and show that investors with similar portfolio structures to a great extent trade in a similar way. With data from the central register of shareholdings in Sweden, we model the market in a similarity network, by considering investors as nodes, connected with links representing portfolio similarity. From the network, we find investor groups that not only identify different investment strategies, but also represent individual investors trading in a similar way. These findings suggest that the stock portfolios of investors hold meaningful information, which could be used to earn a better understanding of stock market dynamics.

  14. Stock Portfolio Structure of Individual Investors Infers Future Trading Behavior

    PubMed Central

    Bohlin, Ludvig; Rosvall, Martin

    2014-01-01

    Although the understanding of and motivation behind individual trading behavior is an important puzzle in finance, little is known about the connection between an investor's portfolio structure and her trading behavior in practice. In this paper, we investigate the relation between what stocks investors hold, and what stocks they buy, and show that investors with similar portfolio structures to a great extent trade in a similar way. With data from the central register of shareholdings in Sweden, we model the market in a similarity network, by considering investors as nodes, connected with links representing portfolio similarity. From the network, we find investor groups that not only identify different investment strategies, but also represent individual investors trading in a similar way. These findings suggest that the stock portfolios of investors hold meaningful information, which could be used to earn a better understanding of stock market dynamics. PMID:25068302

  15. Parameter and Structure Inference for Nonlinear Dynamical Systems

    NASA Technical Reports Server (NTRS)

    Morris, Robin D.; Smelyanskiy, Vadim N.; Millonas, Mark

    2006-01-01

    A great many systems can be modeled in the non-linear dynamical systems framework, as dx/dt = f(x) + xi(t), where f() is the potential function for the system, and xi is the excitation noise. Modeling the potential using a set of basis functions, we derive the posterior for the basis coefficients. A more challenging problem is to determine the set of basis functions that are required to model a particular system. We show that, by using the Bayesian Information Criterion (BIC) to rank models together with the beam search technique, we can accurately determine the structure of simple non-linear dynamical system models, and the structure of the coupling between non-linear dynamical systems where the individual systems are known. This last case has important ecological applications.
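
    The idea of ranking basis-function sets by BIC and exploring them with a beam search can be sketched on toy data. Everything below is an illustrative assumption rather than the authors' method: the monomial basis, the synthetic samples of f(x) = x - x^3 with a small deterministic perturbation standing in for noise, the Gaussian least-squares form of the BIC, and the simplification of regressing on noisy samples of f directly instead of on a time series.

```python
import math

def fit_rss(xs, ys, powers):
    """Least-squares fit of y ~ sum_k c_k * x**powers[k]; returns the residual sum of squares."""
    m = len(powers)
    # Normal equations A c = b, with A = Phi^T Phi and b = Phi^T y.
    A = [[sum(x ** (p + q) for x in xs) for q in powers] for p in powers]
    b = [sum(y * x ** p for x, y in zip(xs, ys)) for p in powers]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for r in reversed(range(m)):
        s = sum(A[r][c] * coeffs[c] for c in range(r + 1, m))
        coeffs[r] = (b[r] - s) / A[r][r]
    return sum((y - sum(c * x ** p for c, p in zip(coeffs, powers))) ** 2
               for x, y in zip(xs, ys))

def bic(rss, n, k):
    """BIC of a Gaussian-noise least-squares fit, up to an additive constant."""
    return n * math.log(rss / n) + k * math.log(n)

def beam_search(xs, ys, candidates, beam_width=2):
    """Grow basis sets one term at a time, keeping the beam_width best by BIC."""
    n = len(xs)
    beam, scores = [()], {}
    for _ in range(len(candidates)):
        expansions = {tuple(sorted(model + (p,)))
                      for model in beam for p in candidates if p not in model}
        for model in expansions:
            if model not in scores:
                scores[model] = bic(fit_rss(xs, ys, model), n, len(model))
        beam = sorted(expansions, key=scores.get)[:beam_width]
    return min(scores, key=scores.get), scores

# Toy data: samples of f(x) = x - x^3 plus a small deterministic perturbation.
xs = [-1.5 + 3.0 * i / 199 for i in range(200)]
ys = [x - x ** 3 + 0.05 * math.sin(37 * i) for i, x in enumerate(xs)]

best, scores = beam_search(xs, ys, candidates=(0, 1, 2, 3, 4))
print("lowest-BIC basis powers:", best)
```

    The beam keeps the search cheap: only a few basis sets of each size are scored instead of all subsets, at the cost of possibly missing a model whose smaller subsets all scored poorly.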

  16. Structural influence of gene networks on their inference: analysis of C3NET

    PubMed Central

    2011-01-01

    Background The availability of large-scale high-throughput data poses considerable challenges toward their functional analysis. For this reason gene network inference methods gained considerable interest. However, our current knowledge, especially about the influence of the structure of a gene network on its inference, is limited. Results In this paper we present a comprehensive investigation of the structural influence of gene networks on the inferential characteristics of C3NET - a recently introduced gene network inference algorithm. We employ local as well as global performance metrics in combination with an ensemble approach. The results from our numerical study for various biological and synthetic network structures and simulation conditions, also comparing C3NET with other inference algorithms, lead to a multitude of theoretical and practical insights into the working behavior of C3NET. In addition, in order to facilitate the practical usage of C3NET we provide a user-friendly R package, called c3net, and describe its functionality. It is available from https://r-forge.r-project.org/projects/c3net and from the CRAN package repository. Conclusions The availability of gene network inference algorithms with known inferential properties opens a new era of large-scale screening experiments that could be equally beneficial for basic biological and biomedical research with auspicious prospects. The availability of our easy to use software package c3net may contribute to the popularization of such methods. Reviewers This article was reviewed by Lev Klebanov, Joel Bader and Yuriy Gusev. PMID:21696592

  17. Inferences of drug responses in cancer cells from cancer genomic features and compound chemical and therapeutic properties

    NASA Astrophysics Data System (ADS)

    Wang, Yongcui; Fang, Jianwen; Chen, Shilong

    2016-09-01

    Accurately predicting the response of a cancer patient to a therapeutic agent is a core goal of precision medicine. Existing approaches have relied primarily on genomic alterations in cancer cells that have been treated with different drugs. Here we focus on predicting drug response based on integration of heterogeneous pharmacogenomics data from both the cell and drug sides. Through a systematic approach named PDRCC (Predict Drug Response in Cancer Cells), the cancer genomic alterations and compound chemical and therapeutic properties were incorporated to determine the chemotherapeutic response in cancer patients. Using the Cancer Cell Line Encyclopedia (CCLE) study as the benchmark dataset, all pharmacogenomics data exhibited their roles in inferring the relationships between cancer cells and drugs. When integrating both genomic resources and compound information, the prediction coverage was significantly increased. The validity of PDRCC was also supported by its effectiveness in uncovering unknown cell-drug associations with database and literature evidence. It set the stage for clinical testing of novel therapeutic strategies, such as the sensitive association between cancer cell ‘A549_LUNG’ and compound ‘Topotecan’. In conclusion, PDRCC offers the possibility of faster, safer, and cheaper development of novel anti-cancer therapeutics in early-stage clinical trials.

  18. Inferences of drug responses in cancer cells from cancer genomic features and compound chemical and therapeutic properties

    PubMed Central

    Wang, Yongcui; Fang, Jianwen; Chen, Shilong

    2016-01-01

    Accurately predicting the response of a cancer patient to a therapeutic agent is a core goal of precision medicine. Existing approaches have relied primarily on genomic alterations in cancer cells that have been treated with different drugs. Here we focus on predicting drug response based on integration of heterogeneous pharmacogenomics data from both the cell and drug sides. Through a systematic approach named PDRCC (Predict Drug Response in Cancer Cells), the cancer genomic alterations and compound chemical and therapeutic properties were incorporated to determine the chemotherapeutic response in cancer patients. Using the Cancer Cell Line Encyclopedia (CCLE) study as the benchmark dataset, all pharmacogenomics data exhibited their roles in inferring the relationships between cancer cells and drugs. When integrating both genomic resources and compound information, the prediction coverage was significantly increased. The validity of PDRCC was also supported by its effectiveness in uncovering unknown cell-drug associations with database and literature evidence. It set the stage for clinical testing of novel therapeutic strategies, such as the sensitive association between cancer cell ‘A549_LUNG’ and compound ‘Topotecan’. In conclusion, PDRCC offers the possibility of faster, safer, and cheaper development of novel anti-cancer therapeutics in early-stage clinical trials. PMID:27645580

  19. Epigenomics and the structure of the living genome.

    PubMed

    Friedman, Nir; Rando, Oliver J

    2015-10-01

    Eukaryotic genomes are packaged into an extensively folded state known as chromatin. Analysis of the structure of eukaryotic chromosomes has been revolutionized by development of a suite of genome-wide measurement technologies, collectively termed "epigenomics." We review major advances in epigenomic analysis of eukaryotic genomes, covering aspects of genome folding at scales ranging from whole chromosome folding down to nucleotide-resolution assays that provide structural insights into protein-DNA interactions. We then briefly outline several challenges remaining and highlight new developments such as single-cell epigenomic assays that will help provide us with a high-resolution structural understanding of eukaryotic genomes.

  20. Phylogenetic position of the coral symbiont Ostreobium (Ulvophyceae) inferred from chloroplast genome data.

    PubMed

    Verbruggen, Heroen; Marcelino, Vanessa R; Guiry, Michael D; Cremen, M Chiela M; Jackson, Christopher J

    2017-04-10

    The green algal genus Ostreobium is an important symbiont of corals, playing roles in reef decalcification and providing photosynthates to the coral during bleaching events. A chloroplast genome of a cultured strain of Ostreobium was available, but low taxon sampling and Ostreobium's early-branching nature left doubt about its phylogenetic position. Here we generate and describe chloroplast genomes from four Ostreobium strains as well as Avrainvillea mazei and Neomeris sp., strategically sampled early-branching lineages in the Bryopsidales and Dasycladales, respectively. At 80,584 bp, the chloroplast genome of Ostreobium sp. HV05042 is the most compact yet found in the Ulvophyceae. The Avrainvillea chloroplast genome is ca. 94 kbp and contains introns in infA and cysT that have nearly complete sequence identity except for an ORF in infA that is not present in cysT. In line with other bryopsidalean species, it also contains regions with possibly bacteria-derived ORFs. The Neomeris data did not assemble into a canonical circular chloroplast genome but a large number of contigs containing fragments of chloroplast genes and showing evidence of long introns and intergenic regions, and the Neomeris chloroplast genome size was estimated to exceed 1.87 Mb. Chloroplast phylogenomics and 18S nrDNA data showed strong support for the Ostreobium lineage being sister to the remaining Bryopsidales. There were differences in branch support when outgroups were varied, but the overall support for the placement of Ostreobium was strong. These results permitted us to validate two suborders and introduce a third, the Ostreobineae.

  1. Crustal structure across southern Mexico inferred from gravity data

    NASA Astrophysics Data System (ADS)

    Campos-Enríquez, J. O.; Sánchez-Zamora, O.

    2000-11-01

    We present a gravity model of the crustal structure in southern Mexico based on interpretation of a detailed marine gravity profile perpendicularly across the Middle America Trench offshore from Acapulco, and a regional gravity transect extending into continental Mexico across the Sierra Madre del Sur, the central sector of the Trans-Mexican Volcanic Belt, the Sierra Madre Oriental, the Coastal Plain, and into the Gulf of Mexico. The elastic thickness of the Cocos lithospheric plate was found to be 30 km. In agreement with a previous seismic refraction study, no major differences in crustal structure were observed on both sides of the O'Gorman Fracture Zone. The gravity high seaward of the trench is interpreted as due to the incipient flexure and crustal thinning. The gravity low at the axis of the trench is explained by the increase in water depth and the existence of low-density accreted or continental-derived sediments (2.25 and 2.40 g/cm³). A gravity high of 50 mGal extending about 100 km landward is interpreted as caused by local shoaling of the Moho. The crust attains a thickness of 42 km under the Trans-Mexican Volcanic Belt but thins beneath the Coastal Plain and the continental slope of the Gulf of Mexico. Gravity highs around the Sierra de Tamaulipas are interpreted in terms of relief of the lower-upper crustal interface, implying a shallow basement.

  2. Sparse Bayesian Inference and the Temperature Structure of the Solar Corona

    NASA Astrophysics Data System (ADS)

    Warren, Harry P.; Byers, Jeff M.; Crump, Nicholas A.

    2017-02-01

    Measuring the temperature structure of the solar atmosphere is critical to understanding how it is heated to high temperatures. Unfortunately, the temperature of the upper atmosphere cannot be observed directly, but must be inferred from spectrally resolved observations of individual emission lines that span a wide range of temperatures. Such observations are “inverted” to determine the distribution of plasma temperatures along the line of sight. This inversion is ill posed and, in the absence of regularization, tends to produce wildly oscillatory solutions. We introduce the application of sparse Bayesian inference to the problem of inferring the temperature structure of the solar corona. Within a Bayesian framework a preference for solutions that utilize a minimum number of basis functions can be encoded into the prior and many ad hoc assumptions can be avoided. We demonstrate the efficacy of the Bayesian approach by considering a test library of 40 assumed temperature distributions.

  3. Inference of Candidate Germline Mutator Loci in Humans from Genome-Wide Haplotype Data

    PubMed Central

    2017-01-01

    The rate of germline mutation varies widely between species but little is known about the extent of variation in the germline mutation rate between individuals of the same species. Here we demonstrate that an allele that increases the rate of germline mutation can result in a distinctive signature in the genomic region linked to the affected locus, characterized by a number of haplotypes with a locally high proportion of derived alleles, against a background of haplotypes carrying a typical proportion of derived alleles. We searched for this signature in human haplotype data from phase 3 of the 1000 Genomes Project and report a number of candidate mutator loci, several of which are located close to or within genes involved in DNA repair or the DNA damage response. To investigate whether mutator alleles remained active at any of these loci, we used de novo mutation counts from human parent-offspring trios in the 1000 Genomes and Genome of the Netherlands cohorts, looking for an elevated number of de novo mutations in the offspring of parents carrying a candidate mutator haplotype at each of these loci. We found some support for two of the candidate loci, including one locus just upstream of the BRSK2 gene, which is expressed in the testis and has been reported to be involved in the response to DNA damage. PMID:28095480

  4. Submarine structure of Reunion Island (Indian Ocean) inferred from gravity

    NASA Astrophysics Data System (ADS)

    Gailler, L.; Lénat, J.

    2008-12-01

    La Réunion is a large (diameter: 220 km; height: 7 km), mostly submerged (97%) oceanic volcanic system. New land and marine gravity data are used to study the structure of its submarine part. The gravity models are interpreted jointly with the published geological interpretations and compared with magnetic models. This allows us to derive a new model of the shallow and internal structure of the submarine flanks. Recent cruises have collected high quality gravity, magnetic and multi-beam swath bathymetry data over the submarine flanks of La Réunion and the surrounding oceanic plate. A new Bouguer anomaly map has been computed for a reduction density of 2.67 × 10³ kg m⁻³. A magnetic anomaly map covering the same area has also been built. Studies based on bathymetric and acoustic data have previously shown the presence of different types of submarine features: a coastal shelf, huge bulges built by debris avalanches and sediment deposits, erosion canyons, volcanic constructions near the coast, isolated seamounts offshore, and elongate volcanic ridges on the Mascarene plate. On the new Bouguer anomaly map, all these features are associated with negative anomalies. They have been modeled using 2 3/4 D modeling techniques. The short wavelength anomalies over the coastal shelf area can be explained by piles of low density layers. This suggests that they are mostly built by hyaloclastites which are generally characterized by lower densities than lava flows. The voluminous debris avalanche deposits which formed the huge Submarine Bulges to the east, north, west, and south of the island have also been modeled as low density formations. Each bulge is modeled with an overall density less than 2.67 × 10³ kg m⁻³, in order to account for its long wavelength anomaly. Some shorter wavelength features are superimposed on these long wavelength negative anomalies. They probably represent heterogeneities within the bulges. Some shallow ones can be associated with observed surface geological

  5. Crustal structure beneath northeast India inferred from receiver function modeling

    NASA Astrophysics Data System (ADS)

    Borah, Kajaljyoti; Bora, Dipok K.; Goyal, Ayush; Kumar, Raju

    2016-09-01

    We estimated crustal shear velocity structure beneath ten broadband seismic stations of northeast India, by using H-Vp/Vs stacking method and a non-linear direct search approach, Neighbourhood Algorithm (NA) technique followed by joint inversion of Rayleigh wave group velocity and receiver function, calculated from teleseismic earthquake data. Results show significant variations of thickness, shear velocities (Vs) and Vp/Vs ratio in the crust of the study region. The inverted shear wave velocity models show crustal thickness variations of 32-36 km in Shillong Plateau (North), 36-40 km in Assam Valley and ∼44 km in Lesser Himalaya (South). Average Vp/Vs ratio in Shillong Plateau is less (1.73-1.77) compared to Assam Valley and Lesser Himalaya (∼1.80). Average crustal shear velocity beneath the study region varies from 3.4 to 3.5 km/s. Sediment structure beneath Shillong Plateau and Assam Valley shows 1-2 km thick sediment layer with low Vs (2.5-2.9 km/s) and high Vp/Vs ratio (1.8-2.1), while it is observed to be of greater thickness (4 km) with similar Vs and high Vp/Vs (∼2.5) in RUP (Lesser Himalaya). Both Shillong Plateau and Assam Valley show thick upper and middle crust (10-20 km), and thin (4-9 km) lower crust. Average Vp/Vs ratio in Assam Valley and Shillong Plateau suggests that the crust is felsic-to-intermediate and intermediate-to-mafic beneath Shillong Plateau and Assam Valley, respectively. Results show that lower crust rocks beneath the Shillong Plateau and Assam Valley lie between mafic granulite and mafic garnet granulite.
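
    H-Vp/Vs stacking rests on predicted arrival times of the Moho-converted Ps phase and its crustal multiples for a trial thickness H and Vp/Vs ratio. The sketch below uses the commonly used single flat-layer relations; it is not the authors' code, and the function name, the average crustal Vp of 6.4 km/s and the ray parameter of 0.06 s/km are assumptions chosen only for illustration.

```python
import math

def converted_phase_delays(thickness_km, vp_km_s, vp_vs_ratio, ray_param_s_km):
    """Delay times (s), relative to direct P, of Ps and its two main crustal
    multiples for a single flat layer over a half-space."""
    vs = vp_km_s / vp_vs_ratio
    eta_s = math.sqrt(1.0 / vs ** 2 - ray_param_s_km ** 2)       # vertical S slowness
    eta_p = math.sqrt(1.0 / vp_km_s ** 2 - ray_param_s_km ** 2)  # vertical P slowness
    return {
        "Ps": thickness_km * (eta_s - eta_p),
        "PpPs": thickness_km * (eta_s + eta_p),
        "PpSs+PsPs": 2.0 * thickness_km * eta_s,
    }

# Toy usage: a 36 km crust with Vp/Vs = 1.75 (values in the range quoted above).
delays = converted_phase_delays(36.0, 6.4, 1.75, 0.06)
for phase, t in delays.items():
    print(f"{phase}: {t:.2f} s")
```

    In a stacking scheme, receiver-function amplitudes at these three predicted times would be summed over a grid of trial thickness and Vp/Vs values, and the grid point with the largest stack taken as the estimate.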

  6. A Novel Candidate Vaccine for Cytauxzoonosis Inferred from Comparative Apicomplexan Genomics

    PubMed Central

    Tarigo, Jaime L.; Scholl, Elizabeth H.; Bird, David McK.; Brown, Corrie C.; Cohn, Leah A.; Dean, Gregg A.; Levy, Michael G.; Doolan, Denise L.; Trieu, Angela; Nordone, Shila K.; Felgner, Philip L.; Vigil, Adam; Birkenheuer, Adam J.

    2013-01-01

    Cytauxzoonosis is an emerging infectious disease of domestic cats (Felis catus) caused by the apicomplexan protozoan parasite Cytauxzoon felis. The growing epidemic, with its high morbidity and mortality points to the need for a protective vaccine against cytauxzoonosis. Unfortunately, the causative agent has yet to be cultured continuously in vitro, rendering traditional vaccine development approaches beyond reach. Here we report the use of comparative genomics to computationally and experimentally interpret the C. felis genome to identify a novel candidate vaccine antigen for cytauxzoonosis. As a starting point we sequenced, assembled, and annotated the C. felis genome and the proteins it encodes. Whole genome alignment revealed considerable conserved synteny with other apicomplexans. In particular, alignments with the bovine parasite Theileria parva revealed that a C. felis gene, cf76, is syntenic to p67 (the leading vaccine candidate for bovine theileriosis), despite a lack of significant sequence similarity. Recombinant subdomains of cf76 were challenged with survivor-cat antiserum and found to be highly seroreactive. Comparison of eleven geographically diverse samples from the south-central and southeastern USA demonstrated 91–100% amino acid sequence identity across cf76, including a high level of conservation in an immunogenic 226 amino acid (24 kDa) carboxyl terminal domain. Using in situ hybridization, transcription of cf76 was documented in the schizogenous stage of parasite replication, the life stage that is believed to be the most important for development of a protective immune response. Collectively, these data point to identification of the first potential vaccine candidate antigen for cytauxzoonosis. Further, our bioinformatic approach emphasizes the use of comparative genomics as an accelerated path to developing vaccines against experimentally intractable pathogens. PMID:23977000

  7. Chromosomal instability in Afrotheria: fragile sites, evolutionary breakpoints and phylogenetic inference from genome sequence assemblies

    PubMed Central

    Ruiz-Herrera, Aurora; Robinson, Terence J

    2007-01-01

    Background Extant placental mammals are divided into four major clades (Laurasiatheria, Supraprimates, Xenarthra and Afrotheria). Given that Afrotheria is generally thought to root the eutherian tree in phylogenetic analysis of large nuclear gene data sets, the study of the organization of the genomes of afrotherian species provides new insights into the dynamics of mammalian chromosomal evolution. Here we test if there are chromosomal bands with a high tendency to break and reorganize in Afrotheria, and by analyzing the expression of aphidicolin-induced common fragile sites in three afrotherian species, whether these are coincidental with recognized evolutionary breakpoints. Results We described 29 fragile sites in the aardvark (OAF) genome, 27 in the golden mole (CAS), and 35 in the elephant-shrew (EED) genome. We show that fragile sites are conserved among afrotherian species and these are correlated with evolutionary breakpoints when compared to the human (HSA) genome. In addition, by computationally scanning the newly released opossum (Monodelphis domestica) and chicken sequence assemblies for use as outgroups to Placentalia, we validate the HSA 3/21/5 chromosomal synteny as a rare genomic change that defines the monophyly of this ancient African clade of mammals. On the other hand, support for HSA 1/19p, which is also thought to underpin Afrotheria, is currently ambiguous. Conclusion We provide evidence that (i) the evolutionary breakpoints that characterise human syntenies detected in the basal Afrotheria correspond at the chromosomal band level with fragile sites, (ii) that HSA 3p/21 was in the amniote ancestor (i.e., common to turtles, lepidosaurs, crocodilians, birds and mammals) and was subsequently disrupted in the lineage leading to marsupials. Its expansion to include HSA 5 in Afrotheria is unique and (iii) that its fragmentation to HSA 3p/21 + HSA 5/21 in elephant and manatee was due to a fission within HSA 21 that is probably shared by all

  8. Phylogeny and physiology of candidate phylum 'Atribacteria' (OP9/JS1) inferred from cultivation-independent genomics.

    PubMed

    Nobu, Masaru K; Dodsworth, Jeremy A; Murugapiran, Senthil K; Rinke, Christian; Gies, Esther A; Webster, Gordon; Schwientek, Patrick; Kille, Peter; Parkes, R John; Sass, Henrik; Jørgensen, Bo B; Weightman, Andrew J; Liu, Wen-Tso; Hallam, Steven J; Tsiamis, George; Woyke, Tanja; Hedlund, Brian P

    2016-02-01

    The 'Atribacteria' is a candidate phylum in the Bacteria recently proposed to include members of the OP9 and JS1 lineages. OP9 and JS1 are globally distributed, and in some cases abundant, in anaerobic marine sediments, geothermal environments, anaerobic digesters and reactors and petroleum reservoirs. However, the monophyly of OP9 and JS1 has been questioned and their physiology and ecology remain largely enigmatic due to a lack of cultivated representatives. Here cultivation-independent genomic approaches were used to provide a first comprehensive view of the phylogeny, conserved genomic features and metabolic potential of members of this ubiquitous candidate phylum. Previously available and heretofore unpublished OP9 and JS1 single-cell genomic data sets were used as recruitment platforms for the reconstruction of atribacterial metagenome bins from a terephthalate-degrading reactor biofilm and from the monimolimnion of meromictic Sakinaw Lake. The single-cell genomes and metagenome bins together comprise six species- to genus-level groups that represent most major lineages within OP9 and JS1. Phylogenomic analyses of these combined data sets confirmed the monophyly of the 'Atribacteria' inclusive of OP9 and JS1. Additional conserved features within the 'Atribacteria' were identified, including a gene cluster encoding putative bacterial microcompartments that may be involved in aldehyde and sugar metabolism, energy conservation and carbon storage. Comparative analysis of the metabolic potential inferred from these data sets revealed that members of the 'Atribacteria' are likely to be heterotrophic anaerobes that lack respiratory capacity, with some lineages predicted to specialize in either primary fermentation of carbohydrates or secondary fermentation of organic acids, such as propionate.

  9. Phylogeny and physiology of candidate phylum 'Atribacteria' (OP9/JS1) inferred from cultivation-independent genomics

    PubMed Central

    Nobu, Masaru K; Dodsworth, Jeremy A; Murugapiran, Senthil K; Rinke, Christian; Gies, Esther A; Webster, Gordon; Schwientek, Patrick; Kille, Peter; Parkes, R John; Sass, Henrik; Jørgensen, Bo B; Weightman, Andrew J; Liu, Wen-Tso; Hallam, Steven J; Tsiamis, George; Woyke, Tanja; Hedlund, Brian P

    2016-01-01

    The 'Atribacteria' is a candidate phylum in the Bacteria recently proposed to include members of the OP9 and JS1 lineages. OP9 and JS1 are globally distributed, and in some cases abundant, in anaerobic marine sediments, geothermal environments, anaerobic digesters and reactors and petroleum reservoirs. However, the monophyly of OP9 and JS1 has been questioned and their physiology and ecology remain largely enigmatic due to a lack of cultivated representatives. Here cultivation-independent genomic approaches were used to provide a first comprehensive view of the phylogeny, conserved genomic features and metabolic potential of members of this ubiquitous candidate phylum. Previously available and heretofore unpublished OP9 and JS1 single-cell genomic data sets were used as recruitment platforms for the reconstruction of atribacterial metagenome bins from a terephthalate-degrading reactor biofilm and from the monimolimnion of meromictic Sakinaw Lake. The single-cell genomes and metagenome bins together comprise six species- to genus-level groups that represent most major lineages within OP9 and JS1. Phylogenomic analyses of these combined data sets confirmed the monophyly of the 'Atribacteria' inclusive of OP9 and JS1. Additional conserved features within the 'Atribacteria' were identified, including a gene cluster encoding putative bacterial microcompartments that may be involved in aldehyde and sugar metabolism, energy conservation and carbon storage. Comparative analysis of the metabolic potential inferred from these data sets revealed that members of the 'Atribacteria' are likely to be heterotrophic anaerobes that lack respiratory capacity, with some lineages predicted to specialize in either primary fermentation of carbohydrates or secondary fermentation of organic acids, such as propionate. PMID:26090992

  10. Inferring Population Size History from Large Samples of Genome-Wide Molecular Data - An Approximate Bayesian Computation Approach

    PubMed Central

    Boitard, Simon; Rodríguez, Willy; Jay, Flora; Mona, Stefano; Austerlitz, Frédéric

    2016-01-01

    Inferring the ancestral dynamics of effective population size is a long-standing question in population genetics, which can now be tackled much more accurately thanks to the massive genomic data available in many species. Several promising methods that take advantage of whole-genome sequences have been recently developed in this context. However, they can only be applied to rather small samples, which limits their ability to estimate recent population size history. Besides, they can be very sensitive to sequencing or phasing errors. Here we introduce a new approximate Bayesian computation approach named PopSizeABC that allows estimating the evolution of the effective population size through time, using a large sample of complete genomes. This sample is summarized using the folded allele frequency spectrum and the average zygotic linkage disequilibrium at different bins of physical distance, two classes of statistics that are widely used in population genetics and can be easily computed from unphased and unpolarized SNP data. Our approach provides accurate estimations of past population sizes, from the very first generations before present back to the expected time to the most recent common ancestor of the sample, as shown by simulations under a wide range of demographic scenarios. When applied to samples of 15 or 25 complete genomes in four cattle breeds (Angus, Fleckvieh, Holstein and Jersey), PopSizeABC revealed a series of population declines, related to historical events such as domestication or modern breed creation. We further highlight that our approach is robust to sequencing errors, provided summary statistics are computed from SNPs with common alleles. PMID:26943927
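
    One of the two summary statistics named in this abstract, the folded allele frequency spectrum, can be illustrated with a short sketch. This is not the PopSizeABC implementation; the function name `folded_afs` and the input layout (one list of 0/1/2 allele counts per SNP) are assumptions made for illustration only. Because the spectrum is folded, it does not matter which allele is counted, which is why unphased and unpolarized data suffice.

```python
from collections import Counter


def folded_afs(genotypes):
    """Folded allele frequency spectrum from unphased diploid genotypes.

    genotypes: list of SNPs; each SNP is a list with one entry per
    individual, giving the count (0, 1 or 2) of an arbitrarily chosen
    allele. Hypothetical input layout, chosen for this sketch.

    Returns a list whose entry k-1 is the number of SNPs with minor
    allele count k, for k = 1..n (n = number of diploid individuals).
    """
    n_ind = len(genotypes[0])
    n_chrom = 2 * n_ind
    counts = Counter()
    for snp in genotypes:
        c = sum(snp)
        # Folding: use the rarer of the two alleles, so no ancestral
        # state (polarization) is needed.
        minor = min(c, n_chrom - c)
        if minor > 0:  # skip monomorphic sites
            counts[minor] += 1
    return [counts[k] for k in range(1, n_ind + 1)]
```

    Restricting such a spectrum to SNPs with common alleles (dropping the low-count bins) would correspond to the robustness condition mentioned at the end of the abstract.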

  11. Seismic structure of the oceanic lithosphere inferred from guided wave

    NASA Astrophysics Data System (ADS)

    Shito, A.; Suetsugu, D.; Furumura, T.; Sugioka, H.; Ito, A.

    2012-12-01

    Characteristic seismic waves were observed in a seismological experiment using Broad-Band Ocean Bottom Seismometers (BBOBSs) conducted in the northwestern Pacific from 2007 to 2008 and from 2010 to 2011. The seismic waves have a low-frequency onset (< 1 Hz) followed by high-frequency later phases (2.5-10 Hz). The high-frequency later phases have large amplitude and long duration for both P and S waves. The seismic waves are commonly observed at the BBOBS array from events in the subducting Pacific plate. Investigating the generation and propagation mechanisms of these waves will help us understand the seismic structure and the origin of the oceanic lithosphere. High-frequency phases travelling efficiently through the oceanic lithosphere for more than 3000 km are a well-known phenomenon. These phases were previously called Po/So waves. Po/So waves were observed as early as 1935 and were studied actively from the 1970s to the 1990s. However, the mechanisms of generation and propagation of the phases are still controversial. Guided waves propagating in a subducting plate are also a common phenomenon in subduction zones. The waves are generally characterized by separation of low-frequency and high-frequency components. To explain the separation, Martin and Rietbrock [2003] considered the trapping of waves in the waveguide formed by the thin, low-velocity former oceanic crust at the top of the plate. However, the large amplitude and long duration of the high-frequency component cannot be reproduced by that model. From the analysis of waveforms observed at the eastern seaboard of northern Japan and numerical simulation of seismic wave propagation, Furumura and Kennett [2005] demonstrated that the guided wave travelling in the subducting plate is produced by multiple forward scattering of high-frequency seismic waves due to small-scale random heterogeneity in the plate structure. We apply the method proposed by Furumura and Kennett [2005] to reproduce the seismograms recorded by

  12. Upper Mantle Structure beneath Afar: inferences from surface waves.

    NASA Astrophysics Data System (ADS)

    Sicilia, D.; Montagner, J.; Debayle, E.; Lepine, J.; Leveque, J.; Cara, M.; Ataley, A.; Sholan, J.

    2001-12-01

    The Afar hotspot is related to one of the most important plumes from a geodynamic point of view. It has been proposed to be the surface expression of the South-West African Superswell. Below the lithosphere, the Afar plume might feed other hotspots in central Africa (Hadiouche et al., 1989; Ebinger & Sleep, 1998). The processes of interaction between crust, lithosphere and plume are not well understood. To gain insight into this issue, we have performed a surface-wave tomography covering the Horn of Africa. A data set of 1404 paths for Rayleigh waves and 473 paths for Love waves was selected in the period range 45-200 s. They were collected from the permanent IRIS and GEOSCOPE networks and from the PASSCAL experiment in Tanzania and Saudi Arabia. Other data come from the broadband stations deployed in Ethiopia and Yemen in the framework of the French INSU program "Horn of Africa". The results presented here come from path-average phase velocities obtained with a method based on least-squares minimization (Beucler et al., 2000). The local phase velocity distribution and the azimuthal anisotropy were simultaneously retrieved using the tomographic technique of Montagner (1986). A correction is applied to the data according to the crustal structure of the 3SMAC model (Nataf & Ricard, 1996). We find low velocities down to 200 km depth beneath the Red Sea, the Gulf of Aden, Afars, the Ethiopian Plateau and southern Arabia. High velocities are present in eastern Arabia and the Tanzania Craton. The anisotropy beneath Afar seems to be complex, but makes it possible to map the flow pattern at the lithosphere-asthenosphere interface. The results presented here are complementary to those obtained by Debayle et al. (2001) at upper-mantle transition zone depths using waveform inversion of higher Rayleigh modes.

  13. Linear Models for Item Scores: Reliability, Covariance Structure, and Psychometric Inference.

    ERIC Educational Resources Information Center

    Woodruff, David

    Two analyses of variance (ANOVA) models for item scores are compared. The first is an items by subject random effect ANOVA. The second is a mixed effects ANOVA with items fixed and subjects random. Comparisons regarding reliability, Cronbach's alpha coefficient, psychometric inference, and inter-item covariance structure are made between the…

  14. Karyotypic evolution of the family Sciuridae: inferences from the genome organizations of ground squirrels.

    PubMed

    Li, T; Wang, J; Su, W; Nie, W; Yang, F

    2006-01-01

    Cross-species chromosome painting has made a great contribution to our understanding of the evolution of karyotypes and genome organizations of mammals. Several recent papers on comparative painting between tree and flying squirrels have shed some light on the evolution of the family Sciuridae and the order Rodentia. In the present study we have extended the comparative painting to the Himalayan marmot (Marmota himalayana) and the African ground squirrel (Xerus cf. erythropus), i.e. representative species from another important squirrel group, the ground squirrels, and have established genome-wide comparative chromosome maps between human, eastern gray squirrel, and these two ground squirrels. The results show that 1) the squirrels so far studied all have conserved karyotypes that resemble the ancestral karyotype of the order Rodentia; 2) the African ground squirrels could have retained the ancestral karyotype of the family Sciuridae. Furthermore, we have mapped the evolutionary rearrangements onto a molecular-based consensus phylogenetic tree of the family Sciuridae.

  15. Simple Math is Enough: Two Examples of Inferring Functional Associations from Genomic Data

    NASA Technical Reports Server (NTRS)

    Liang, Shoudan

    2003-01-01

    Non-random features in the genomic data are usually biologically meaningful. The key is to choose the feature well. Having a p-value based score prioritizes the findings. If two proteins share an unusually large number of common interaction partners, they tend to be involved in the same biological process. We used this finding to predict the functions of 81 un-annotated proteins in yeast.
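
    The abstract does not state which test produces its p-value based score. As a minimal sketch, assuming a hypergeometric null model in which each protein's partners are drawn at random from a common pool, the probability of seeing at least the observed number of shared partners could be computed as follows. The function name `shared_partner_pvalue` and all parameter names are hypothetical.

```python
from math import comb


def shared_partner_pvalue(n_total, deg_a, deg_b, shared):
    """Upper-tail hypergeometric probability of shared partners.

    n_total: size of the pool of candidate partners (assumption:
             the same pool for both proteins)
    deg_a:   number of partners of protein A
    deg_b:   number of partners of protein B
    shared:  observed number of partners common to A and B

    Returns P(X >= shared), where X is the overlap obtained when B's
    deg_b partners are drawn at random without replacement from the
    pool, deg_a of whose members are partners of A.
    """
    upper = min(deg_a, deg_b)
    denom = comb(n_total, deg_b)
    tail = sum(
        comb(deg_a, k) * comb(n_total - deg_a, deg_b - k)
        for k in range(shared, upper + 1)
    )
    return tail / denom
```

    A small p-value marks an overlap that is unlikely under random partner choice; ranking protein pairs by this value is one way to prioritize candidates for shared function.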

  16. Low rate of genomic repatterning in Xenarthra inferred from chromosome painting data.

    PubMed

    Dobigny, G; Yang, F; O'Brien, P C M; Volobouev, V; Kovács, A; Pieczarka, J C; Ferguson-Smith, M A; Robinson, T J

    2005-01-01

    Comparative cytogenetic studies on Xenarthra, one of the most basal mammalian clades in the Placentalia, are virtually absent, being restricted largely to descriptions of conventional karyotypes and diploid numbers. We present a molecular cytogenetic comparison of chromosomes from the two-toed (Choloepus didactylus, 2n = 65) and three-toed sloth species (Bradypus tridactylus, 2n = 52), an anteater (Tamandua tetradactyla, 2n = 54) which, together with some data on the six-banded armadillo (Euphractus sexcinctus, 2n = 58), collectively represent all the major xenarthran lineages. Our results, based on interspecific chromosome painting using flow-sorted two-toed sloth chromosomes as painting probes, show the sloth species to be karyotypically closely related but markedly different from the anteater. We also test the synteny disruptions and segmental associations identified within Pilosa (anteaters and sloths) against the chromosomes of the six-banded armadillo as outgroup taxon. We could thus polarize the 35 non-ambiguously identified chromosomal changes characterizing the evolution of the anteater and sloth genomes and map these to a published sequence-based phylogeny for the group. These data suggest a low rate of genomic repatterning when placed in the context of divergence estimates based on molecular and fossil data. Finally, our results provide a glimpse of a likely ancestral karyotype for the extant Xenarthra, a pivotal group for understanding eutherian genome evolution.

  17. Inferring action structure and causal relationships in continuous sequences of human action.

    PubMed

    Buchsbaum, Daphna; Griffiths, Thomas L; Plunkett, Dillon; Gopnik, Alison; Baldwin, Dare

    2015-02-01

    In the real world, causal variables do not come pre-identified or occur in isolation, but instead are embedded within a continuous temporal stream of events. A challenge faced by both human learners and machine learning algorithms is identifying subsequences that correspond to the appropriate variables for causal inference. A specific instance of this problem is action segmentation: dividing a sequence of observed behavior into meaningful actions, and determining which of those actions lead to effects in the world. Here we present a Bayesian analysis of how statistical and causal cues to segmentation should optimally be combined, as well as four experiments investigating human action segmentation and causal inference. We find that both people and our model are sensitive to statistical regularities and causal structure in continuous action, and are able to combine these sources of information in order to correctly infer both causal relationships and segmentation boundaries.

  18. A Symmetry-Based Method to Infer Structural Brain Networks from Probabilistic Tractography Data

    PubMed Central

    Shadi, Kamal; Bakhshi, Saideh; Gutman, David A.; Mayberg, Helen S.; Dovrolis, Constantine

    2016-01-01

    Recent progress in diffusion MRI and tractography algorithms as well as the launch of the Human Connectome Project (HCP) have provided brain research with an abundance of structural connectivity data. In this work, we describe and evaluate a method that can infer the structural brain network that interconnects a given set of Regions of Interest (ROIs) from probabilistic tractography data. The proposed method, referred to as Minimum Asymmetry Network Inference Algorithm (MANIA), does not determine the connectivity between two ROIs based on an arbitrary connectivity threshold. Instead, we exploit a basic limitation of the tractography process: the observed streamlines from a source to a target do not provide any information about the polarity of the underlying white matter, and so if there are some fibers connecting two voxels (or two ROIs) X and Y, tractography should be able in principle to follow this connection in both directions, from X to Y and from Y to X. We leverage this limitation to formulate the network inference process as an optimization problem that minimizes the (appropriately normalized) asymmetry of the observed network. We evaluate the proposed method using both the FiberCup dataset and a noise model that randomly corrupts the observed connectivity of synthetic networks. As a case study, we apply MANIA to diffusion MRI data from 28 healthy subjects to infer the structural network between 18 corticolimbic ROIs that are associated with various neuropsychiatric conditions including depression, anxiety and addiction. PMID:27867354
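
    The idea of choosing a network by minimizing asymmetry, rather than by fixing a threshold in advance, can be sketched as below. This is not the MANIA algorithm: the abstract does not give its normalization, so the sketch assumes a deliberately naive asymmetry measure (one-directional connections divided by all connected ROI pairs). The function name `min_asymmetry_threshold` and the input layout are hypothetical.

```python
def min_asymmetry_threshold(weights, thresholds):
    """Pick the candidate threshold giving the least asymmetric network.

    weights[i][j]: connectivity from ROI i to ROI j, e.g. the fraction
                   of streamlines seeded in i that reach j (assumed
                   layout for this sketch).
    thresholds:    candidate cut-offs to compare.

    Returns (threshold, asymmetry) for the best candidate, or None if
    every candidate yields an empty network.
    """
    n = len(weights)
    best = None
    for t in thresholds:
        one_way = both = 0
        for i in range(n):
            for j in range(i + 1, n):
                fwd = weights[i][j] >= t
                back = weights[j][i] >= t
                if fwd and back:
                    both += 1      # connection seen in both directions
                elif fwd or back:
                    one_way += 1   # connection seen in one direction only
        connected = one_way + both
        if connected == 0:
            continue  # empty network: asymmetry is undefined
        asym = one_way / connected
        if best is None or asym < best[1]:
            best = (t, asym)
    return best
```

    With this naive measure a very high threshold that keeps only one strong bidirectional connection scores a perfect zero, which suggests why the abstract stresses that the asymmetry must be appropriately normalized.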

  19. Climate-induced changes in lake ecosystem structure inferred from coupled neo- and paleoecological approaches

    USGS Publications Warehouse

    Saros, Jasmine E.; Stone, Jeffery R.; Pederson, Gregory T.; Slemmons, Krista; Spanbauer, Trisha; Schliep, Anna; Cahl, Douglas; Williamson, Craig E.; Engstrom, Daniel R.

    2015-01-01

    Over the 20th century, surface water temperatures have increased in many lake ecosystems around the world, but long-term trends in the vertical thermal structure of lakes remain unclear, despite the strong control that thermal stratification exerts on the biological response of lakes to climate change. Here we used both neo- and paleoecological approaches to develop a fossil-based inference model for lake mixing depths and thereby refine understanding of lake thermal structure change. We focused on three common planktonic diatom taxa, the distributions of which previous research suggests might be affected by mixing depth. Comparative lake surveys and growth rate experiments revealed that these species respond to lake thermal structure when nitrogen is sufficient, with species optima ranging from shallower to deeper mixing depths. The diatom-based mixing depth model was applied to sedimentary diatom profiles extending back to 1750 AD in two lakes with moderate nitrate concentrations but differing climate settings. Thermal reconstructions were consistent with expected changes, with shallower mixing depths inferred for an alpine lake where treeline has advanced, and deeper mixing depths inferred for a boreal lake where wind strength has increased. The inference model developed here provides a new tool to expand and refine understanding of climate-induced changes in lake ecosystems.

  20. Models of earth structure inferred from neodymium and strontium isotopic abundances

    PubMed Central

    Wasserburg, G. J.; DePaolo, D. J.

    1979-01-01

    A simplified model of earth structure based on the Nd and Sr isotopic characteristics of oceanic and continental tholeiitic flood basalts is presented, taking into account the motion of crustal plates and a chemical balance for trace elements. The resulting structure that is inferred consists of a lower mantle that is still essentially undifferentiated, overlain by an upper mantle that is the residue of the original source from which the continents were derived. PMID:16592688

  1. ARG-walker: inference of individual specific strengths of meiotic recombination hotspots by population genomics analysis

    PubMed Central

    2015-01-01

    Background Meiotic recombination hotspots play important roles in various aspects of genomics, but the underlying mechanisms for regulating the locations and strengths of recombination hotspots are not yet fully revealed. Most existing algorithms for estimating recombination rates from sequence polymorphism data can only output average recombination rates of a population, although there is evidence for the heterogeneity in recombination rates among individuals. For genome-wide association studies (GWAS) of recombination hotspots, an efficient algorithm that estimates the individualized strengths of recombination hotspots is highly desirable. Results In this work, we propose a novel graph mining algorithm named ARG-walker, based on random walks on ancestral recombination graphs (ARG), to estimate individual-specific recombination hotspot strengths. Extensive simulations demonstrate that ARG-walker is able to distinguish the hot allele of a recombination hotspot from the cold allele. Integrated with the output of ARG-walker, we performed GWAS on the phased haplotype data of the 22 autosomal chromosomes of the HapMap Asian population samples of Chinese and Japanese (JPT+CHB). Significant cis-regulatory signals have been detected, which are corroborated by the enrichment of the well-known 13-mer motif CCNCCNTNNCCNC of PRDM9 protein. Moreover, two new DNA motifs have been identified in the flanking regions of the significantly associated SNPs (single nucleotide polymorphisms), which are likely to be new cis-regulatory elements of meiotic recombination hotspots of the human genome. Conclusions Our results on both simulated and real data suggest that ARG-walker is a promising new method for estimating the individual recombination variations. In the future, it could be used to uncover the mechanisms of recombination regulation and human diseases related to recombination hotspots. PMID:26679564

  2. Genomic hypomethylation in the human germline associates with selective structural mutability in the human genome.

    PubMed

    Li, Jian; Harris, R Alan; Cheung, Sau Wai; Coarfa, Cristian; Jeong, Mira; Goodell, Margaret A; White, Lisa D; Patel, Ankita; Kang, Sung-Hae; Shaw, Chad; Chinault, A Craig; Gambin, Tomasz; Gambin, Anna; Lupski, James R; Milosavljevic, Aleksandar

    2012-01-01

    The hotspots of structural polymorphisms and structural mutability in the human genome remain to be explained mechanistically. We examine associations of structural mutability with germline DNA methylation and with non-allelic homologous recombination (NAHR) mediated by low-copy repeats (LCRs). Combined evidence from four human sperm methylome maps, human genome evolution, structural polymorphisms in the human population, and previous genomic and disease studies consistently points to a strong association of germline hypomethylation and genomic instability. Specifically, methylation deserts, the ~1% fraction of the human genome with the lowest methylation in the germline, show a tenfold enrichment for structural rearrangements that occurred in the human genome since the branching of chimpanzee and are highly enriched for fast-evolving loci that regulate tissue-specific gene expression. Analysis of copy number variants (CNVs) from 400 human samples identified using a custom-designed array comparative genomic hybridization (aCGH) chip, combined with publicly available structural variation data, indicates that association of structural mutability with germline hypomethylation is comparable in magnitude to the association of structural mutability with LCR-mediated NAHR. Moreover, rare CNVs occurring in the genomes of individuals diagnosed with schizophrenia, bipolar disorder, and developmental delay and de novo CNVs occurring in those diagnosed with autism are significantly more concentrated within hypomethylated regions. These findings suggest a new connection between the epigenome, selective mutability, evolution, and human disease.

  3. Inferring Properties of Ancient Cyanobacteria from Biogeochemical Activity and Genomes of Siderophilic Cyanobacteria

    NASA Technical Reports Server (NTRS)

    McKay, David S.; Brown, I. I.; Tringe, S. G.; Thomas-Keprta, K. E.; Bryant, D. A.; Sarkisova, S. S.; Malley, K.; Sosa, O.; Klatt, C. G.; McKay, D. S.

    2010-01-01

    Interrelationships between life and the planetary system could have simultaneously left landmarks in genomes of microbes and physicochemical signatures in the lithosphere. Verifying the links between genomic features in living organisms and the mineralized signatures generated by these organisms will help to reveal traces of life on Earth and beyond. Among contemporary environments, iron-depositing hot springs (IDHS) may represent one of the most appropriate natural models [1] for insights into ancient life since organisms may have originated on Earth and probably Mars in association with hydrothermal activity [2,3]. IDHS also seem to be appropriate models for studying certain biogeochemical processes that could have taken place in the late Archean and/or early Paleoproterozoic eras [4, 5]. It has been suggested that inorganic polyphosphate (PPi), in chains of tens to hundreds of phosphate residues linked by high-energy bonds, is environmentally ubiquitous and abundant [6]. Cyanobacteria (CB) react to increased heavy metal concentrations and UV by enhanced generation of PPi bodies (PPB) [7], which are believed to be signatures of life [8]. However, the role of PPi in oxygenic prokaryotes for the suppression of oxidative stress induced by high Fe is poorly studied. Here we present preliminary results of a new mechanism of Fe mineralization in oxygenic prokaryotes, the effect of Fe on the generation of PPi bodies in CB, as well as preliminary analysis of the diversity and phylogeny of proteins involved in the prevention of oxidative stress in phototrophs inhabiting IDHS.

  4. Inferring the choreography of parental genomes during fertilization from ultralarge-scale whole-transcriptome analysis.

    PubMed

    Park, Sung-Joon; Komata, Makiko; Inoue, Fukashi; Yamada, Kaori; Nakai, Kenta; Ohsugi, Miho; Shirahige, Katsuhiko

    2013-12-15

    Fertilization precisely choreographs parental genomes by using gamete-derived cellular factors and activating genome regulatory programs. However, the mechanism remains elusive owing to the technical difficulties of preparing large numbers of high-quality preimplantation cells. Here, we collected >14 × 10^4 high-quality mouse metaphase II oocytes and used these to establish detailed transcriptional profiles for four early embryo stages and parthenogenetic development. By combining these profiles with other public resources, we found evidence that gene silencing appeared to be mediated in part by noncoding RNAs and that this was a prerequisite for post-fertilization development. Notably, we identified 817 genes that were differentially expressed in embryos after fertilization compared with parthenotes. The regulation of these genes was distinctly different from those expressed in parthenotes, suggesting functional specialization of particular transcription factors prior to first cell cleavage. We identified five transcription factors that were potentially necessary for developmental progression: Foxd1, Nkx2-5, Sox18, Myod1, and Runx1. Our very large-scale whole-transcriptome profile of early mouse embryos yielded a novel and valuable resource for studies in developmental biology and stem cell research. The database is available at http://dbtmee.hgc.jp.

  5. Primate phylogenetic relationships and divergence dates inferred from complete mitochondrial genomes

    PubMed Central

    Hodgson, Jason A.; Burrell, Andrew S.; Sterner, Kirstin N.; Raaum, Ryan L.; Disotell, Todd R.

    2014-01-01

    The origins and the divergence times of the most basal lineages within primates have been difficult to resolve mainly due to the incomplete sampling of early fossil taxa. The main source of contention is related to the discordance between molecular and fossil estimates: while there are no crown primate fossils older than 56 Ma, most molecule-based estimates extend the origins of crown primates into the Cretaceous. Here we present a comprehensive mitogenomic study of primates. We assembled 87 mammalian mitochondrial genomes, including 62 primate species representing all the families of the order. We newly sequenced eleven mitochondrial genomes, including eight Old World monkeys and three strepsirrhines. Phylogenetic analyses support a strong topology, confirming the monophyly for all the major primate clades. In contrast to previous mitogenomic studies, the positions of tarsiers and colugos relative to strepsirrhines and anthropoids are well resolved. In order to improve our understanding of how fossil calibrations affect age estimates within primates, we explore the effect of seventeen fossil calibrations across primates and other mammalian groups and we select a subset of calibrations to date our mitogenomic tree. The divergence date estimates of the Strepsirrhine/Haplorhine split support an origin of crown primates in the Late Cretaceous, at around 74 Ma. This result supports a short fuse model of primate origins, whereby relatively little time passed between the origin of the order and the diversification of its major clades. It also suggests that the early primate fossil record is likely poorly sampled. PMID:24583291

  6. Structure and function of the mammalian middle ear. II: Inferring function from structure.

    PubMed

    Mason, Matthew J

    2016-02-01

    Anatomists and zoologists who study middle ear morphology are often interested to know what the structure of an ear can reveal about the auditory acuity and hearing range of the animal in question. This paper represents an introduction to middle ear function targeted towards biological scientists with little experience in the field of auditory acoustics. Simple models of impedance matching are first described, based on the familiar concepts of the area and lever ratios of the middle ear. However, using the Mongolian gerbil Meriones unguiculatus as a test case, it is shown that the predictions made by such 'ideal transformer' models are generally not consistent with measurements derived from recent experimental studies. Electrical analogue models represent a better way to understand some of the complex, frequency-dependent responses of the middle ear: these have been used to model the effects of middle ear subcavities, and the possible function of the auditory ossicles as a transmission line. The concepts behind such models are explained here, again aimed at those with little background knowledge. Functional inferences based on middle ear anatomy are more likely to be valid at low frequencies. Acoustic impedance at low frequencies is dominated by compliance; expanded middle ear cavities, found in small desert mammals including gerbils, jerboas and the sengi Macroscelides, are expected to improve low-frequency sound transmission, as long as the ossicular system is not too stiff.
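
    The 'ideal transformer' calculation referred to in this abstract can be written out in a few lines. The sketch below uses the textbook form, in which the predicted pressure gain is the product of the area ratio and the lever ratio; the function name `ideal_transformer_gain`, its parameter names, and any input values are illustrative assumptions and are not taken from the paper.

```python
import math


def ideal_transformer_gain(area_tympanic, area_footplate,
                           lever_malleus, lever_incus):
    """Pressure gain predicted by an ideal-transformer middle ear model.

    area_tympanic / area_footplate: effective areas of the tympanic
        membrane and the stapes footplate (same units).
    lever_malleus / lever_incus: lever arm lengths of the malleus and
        incus (same units).

    Returns (gain, gain_db): the linear pressure gain and the same
    value expressed in decibels.
    """
    area_ratio = area_tympanic / area_footplate
    lever_ratio = lever_malleus / lever_incus
    # Force collected on the large membrane is concentrated on the
    # small footplate and further multiplied by the ossicular lever.
    gain = area_ratio * lever_ratio
    gain_db = 20.0 * math.log10(gain)
    return gain, gain_db
```

    As the abstract notes, predictions of this kind are generally not consistent with experimental measurements, so a value computed this way is better read as a frequency-independent idealization than as an estimate of real middle ear gain.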

  7. Child Development and Structural Variation in the Human Genome

    ERIC Educational Resources Information Center

    Zhang, Ying; Haraksingh, Rajini; Grubert, Fabian; Abyzov, Alexej; Gerstein, Mark; Weissman, Sherman; Urban, Alexander E.

    2013-01-01

    Structural variation of the human genome sequence is the insertion, deletion, or rearrangement of stretches of DNA sequence sized from around 1,000 to millions of base pairs. Over the past few years, structural variation has been shown to be far more common in human genomes than previously thought. Very little is currently known about the effects…

  8. King penguin demography since the last glaciation inferred from genome-wide data

    PubMed Central

    Trucchi, Emiliano; Gratton, Paolo; Whittington, Jason D.; Cristofari, Robin; Le Maho, Yvon; Stenseth, Nils Chr; Le Bohec, Céline

    2014-01-01

    How natural climate cycles, such as past glacial/interglacial patterns, have shaped species distributions at the high-latitude regions of the Southern Hemisphere is still largely unclear. Here, we show how the post-glacial warming following the Last Glacial Maximum (ca 18 000 years ago), allowed the (re)colonization of the fragmented sub-Antarctic habitat by an upper-level marine predator, the king penguin Aptenodytes patagonicus. Using restriction site-associated DNA sequencing and standard mitochondrial data, we tested the behaviour of subsets of anonymous nuclear loci in inferring past demography through coalescent-based and allele frequency spectrum analyses. Our results show that the king penguin population breeding on Crozet archipelago steeply increased in size, closely following the Holocene warming recorded in the Epica Dome C ice core. The following population growth can be explained by a threshold model in which the ecological requirements of this species (year-round ice-free habitat for breeding and access to a major source of food such as the Antarctic Polar Front) were met on Crozet soon after the Pleistocene/Holocene climatic transition. PMID:24920481

  9. King penguin demography since the last glaciation inferred from genome-wide data.

    PubMed

    Trucchi, Emiliano; Gratton, Paolo; Whittington, Jason D; Cristofari, Robin; Le Maho, Yvon; Stenseth, Nils Chr; Le Bohec, Céline

    2014-07-22

    How natural climate cycles, such as past glacial/interglacial patterns, have shaped species distributions at the high-latitude regions of the Southern Hemisphere is still largely unclear. Here, we show how the post-glacial warming following the Last Glacial Maximum (ca 18 000 years ago), allowed the (re)colonization of the fragmented sub-Antarctic habitat by an upper-level marine predator, the king penguin Aptenodytes patagonicus. Using restriction site-associated DNA sequencing and standard mitochondrial data, we tested the behaviour of subsets of anonymous nuclear loci in inferring past demography through coalescent-based and allele frequency spectrum analyses. Our results show that the king penguin population breeding on Crozet archipelago steeply increased in size, closely following the Holocene warming recorded in the Epica Dome C ice core. The following population growth can be explained by a threshold model in which the ecological requirements of this species (year-round ice-free habitat for breeding and access to a major source of food such as the Antarctic Polar Front) were met on Crozet soon after the Pleistocene/Holocene climatic transition.

  10. Evolutionary landscape of amphibians emerging from ancient freshwater fish inferred from complete mitochondrial genomes.

    PubMed

    Wang, Xiao-Tong; Zhang, Yan-Feng; Wu, Qian; Zhang, Hao

    2012-05-04

    It is very interesting that the only extant marine amphibian is the marine frog, Fejervarya cancrivora. This study investigated the reasons for this apparent rarity by conducting a phylogenetic tree analysis of the complete mitochondrial genomes from 14 amphibians, 67 freshwater fishes, four migratory fishes, 35 saltwater fishes, and one hemichordate. The results showed that amphibians, living fossil fishes, and the common ancestors of modern fishes are phylogenetically separated. In general, amphibians, living fossil fishes, saltwater fishes, and freshwater fishes are clustered in different clades. This suggests that the ancestor of living amphibians arose from a type of primordial freshwater fish, rather than the coelacanth, lungfish, or modern saltwater fish. Modern freshwater fish and modern saltwater fish were probably separated from a common ancestor by a single event, caused by crustal movement.

  11. Conflicting genomic signals affect phylogenetic inference in four species of North American pines

    PubMed Central

    Koralewski, Tomasz E.; Mateos, Mariana; Krutovsky, Konstantin V.

    2016-01-01

    Adaptive evolutionary processes in plants may be accompanied by episodes of introgression, parallel evolution and incomplete lineage sorting that pose challenges in untangling species evolutionary history. Genus Pinus (pines) is one of the most abundant and most studied groups among gymnosperms, and a good example of a lineage where these phenomena have been observed. Pines are among the most ecologically and economically important plant species. Some, such as the pines of the southeastern USA (southern pines in subsection Australes), are subjects of intensive breeding programmes. Despite numerous published studies, the evolutionary history of Australes remains ambiguous and often controversial. We studied the phylogeny of four major southern pine species: shortleaf (Pinus echinata), slash (P. elliottii), longleaf (P. palustris) and loblolly (P. taeda), using sequences from 11 nuclear loci and maximum likelihood and Bayesian methods. Our analysis encountered resolution difficulties similar to earlier published studies. Although incomplete lineage sorting and introgression are two phenomena presumptively underlying our results, the phylogenetic inferences seem to be also influenced by the genes examined, with certain topologies supported by sets of genes sharing common putative functionalities. For example, genes involved in wood formation supported the clade echinata–taeda, genes linked to plant defence supported the clade echinata–elliottii and genes linked to water management properties supported the clade echinata–palustris. The support for these clades was very high and consistent across methods. We discuss the potential factors that could underlie these observations, including incomplete lineage sorting, hybridization and parallel or adaptive evolution. Our results likely reflect the relatively short evolutionary history of the subsection that is thought to have begun during the middle Miocene and has been influenced by climate fluctuations. PMID

  12. Conflicting genomic signals affect phylogenetic inference in four species of North American pines.

    PubMed

    Koralewski, Tomasz E; Mateos, Mariana; Krutovsky, Konstantin V

    2016-01-01

    Adaptive evolutionary processes in plants may be accompanied by episodes of introgression, parallel evolution and incomplete lineage sorting that pose challenges in untangling species evolutionary history. Genus Pinus (pines) is one of the most abundant and most studied groups among gymnosperms, and a good example of a lineage where these phenomena have been observed. Pines are among the most ecologically and economically important plant species. Some, such as the pines of the southeastern USA (southern pines in subsection Australes), are subjects of intensive breeding programmes. Despite numerous published studies, the evolutionary history of Australes remains ambiguous and often controversial. We studied the phylogeny of four major southern pine species: shortleaf (Pinus echinata), slash (P. elliottii), longleaf (P. palustris) and loblolly (P. taeda), using sequences from 11 nuclear loci and maximum likelihood and Bayesian methods. Our analysis encountered resolution difficulties similar to earlier published studies. Although incomplete lineage sorting and introgression are two phenomena presumptively underlying our results, the phylogenetic inferences seem to be also influenced by the genes examined, with certain topologies supported by sets of genes sharing common putative functionalities. For example, genes involved in wood formation supported the clade echinata-taeda, genes linked to plant defence supported the clade echinata-elliottii and genes linked to water management properties supported the clade echinata-palustris. The support for these clades was very high and consistent across methods. We discuss the potential factors that could underlie these observations, including incomplete lineage sorting, hybridization and parallel or adaptive evolution. Our results likely reflect the relatively short evolutionary history of the subsection that is thought to have begun during the middle Miocene and has been influenced by climate fluctuations.

  13. Phylogeography of the fire-bellied toads Bombina: independent Pleistocene histories inferred from mitochondrial genomes.

    PubMed

    Hofman, Sebastian; Spolsky, Christina; Uzzell, Thomas; Cogălniceanu, Dan; Babik, Wiesław; Szymura, Jacek M

    2007-06-01

    The fire-bellied toads Bombina bombina and Bombina variegata interbreed in a long, narrow zone maintained by a balance between selection and dispersal. Hybridization takes place between local, genetically differentiated groups. To quantify divergence between these groups and reconstruct their history and demography, we analysed nucleotide variation at the mitochondrial cytochrome b gene (1096 bp) in 364 individuals from 156 sites representing the entire range of both species. Three distinct clades with high sequence divergence (K2P = 8-11%) were distinguished. One clade grouped B. bombina haplotypes; the two other clades grouped B. variegata haplotypes. One B. variegata clade included only Carpathian individuals; the other represented B. variegata from the southwestern parts of its distribution: Southern and Western Europe (Balkano-Western lineage), Apennines, and the Rhodope Mountains. Differentiation between the Carpathian and Balkano-Western lineages, K2P approximately 8%, approached interspecific divergence. Deep divergence among European Bombina lineages suggests their preglacial origin, and implies long and largely independent evolutionary histories of the species. Multiple glacial refugia were identified in the lowlands adjoining the Black Sea, in the Carpathians, in the Balkans, and in the Apennines. The results of the nested clade and demographic analyses suggest drastic reductions of population sizes during the last glacial period, and significant demographic growth related to postglacial colonization. Inferred history, supported by fossil evidence, demonstrates that Bombina ranges underwent repeated contractions and expansions. Geographical concordance between morphology, allozymes, and mtDNA shows that previous episodes of interspecific hybridization have left no detectable mtDNA introgression. Either the admixed populations went extinct, or selection against hybrids hindered mtDNA gene flow in ancient hybrid zones.
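
    The K2P values quoted above are Kimura two-parameter distances, which correct the observed proportion of differing sites for multiple substitutions while treating transitions and transversions separately. The following is a minimal sketch of the standard formula, not the authors' pipeline; the function name and the decision to skip gaps and ambiguity codes are assumptions made for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Hypothetical helper: sites where either sequence has a gap or an
    ambiguity code are skipped (an assumption, not taken from the abstract).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared    # proportion of transitional differences
    q = transversions / compared  # proportion of transversional differences
    # d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q)); math.log raises ValueError
    # if the sequences are too divergent for the correction to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))

# Made-up 10-site alignment with a single transition (A -> G).
print(round(k2p_distance("AAAAACCCCC", "GAAAACCCCC"), 4))
```

    A distance of 0.08 from this function would correspond to the "K2P approximately 8%" reported for the Carpathian and Balkano-Western lineages.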

  14. Module Anchored Network Inference: A Sequential Module-Based Approach to Novel Gene Network Construction from Genomic Expression Data on Human Disease Mechanism

    PubMed Central

    Keller, Susanna R.; Lee, Jae K.

    2017-01-01

    Different computational approaches have been examined and compared for inferring network relationships from time-series genomic data on human disease mechanisms under the recent Dialogue on Reverse Engineering Assessment and Methods (DREAM) challenge. Many of these approaches infer all possible relationships among all candidate genes, often resulting in extremely crowded candidate network relationships with many more False Positives than True Positives. To overcome this limitation, we introduce a novel approach, Module Anchored Network Inference (MANI), that constructs networks by sequentially analyzing small adjacent building blocks (modules). Using MANI, we inferred a 7-gene adipogenesis network based on time-series gene expression data during adipocyte differentiation. MANI was also applied to infer two 10-gene networks based on time-course perturbation datasets from DREAM3 and DREAM4 challenges. MANI accurately inferred and distinguished serial, parallel, and time-dependent gene interactions and network cascades in these applications, showing superior performance to other in silico network inference techniques for discovering and reconstructing gene network relationships. PMID:28197408

  15. Structure of the germline genome of Tetrahymena thermophila and relationship to the massively rearranged somatic genome.

    PubMed

    Hamilton, Eileen P; Kapusta, Aurélie; Huvos, Piroska E; Bidwell, Shelby L; Zafar, Nikhat; Tang, Haibao; Hadjithomas, Michalis; Krishnakumar, Vivek; Badger, Jonathan H; Caler, Elisabet V; Russ, Carsten; Zeng, Qiandong; Fan, Lin; Levin, Joshua Z; Shea, Terrance; Young, Sarah K; Hegarty, Ryan; Daza, Riza; Gujja, Sharvari; Wortman, Jennifer R; Birren, Bruce W; Nusbaum, Chad; Thomas, Jainy; Carey, Clayton M; Pritham, Ellen J; Feschotte, Cédric; Noto, Tomoko; Mochizuki, Kazufumi; Papazyan, Romeo; Taverna, Sean D; Dear, Paul H; Cassidy-Hanley, Donna M; Xiong, Jie; Miao, Wei; Orias, Eduardo; Coyne, Robert S

    2016-11-28

    The germline genome of the binucleated ciliate Tetrahymena thermophila undergoes programmed chromosome breakage and massive DNA elimination to generate the somatic genome. Here, we present a complete sequence assembly of the germline genome and analyze multiple features of its structure and its relationship to the somatic genome, shedding light on the mechanisms of genome rearrangement as well as the evolutionary history of this remarkable germline/soma differentiation. Our results strengthen the notion that a complex, dynamic, and ongoing interplay between mobile DNA elements and the host genome has shaped Tetrahymena chromosome structure, locally and globally. Non-standard outcomes of rearrangement events, including the generation of short-lived somatic chromosomes and excision of DNA interrupting protein-coding regions, may represent novel forms of developmental gene regulation. We also compare Tetrahymena's germline/soma differentiation to that of other characterized ciliates, illustrating the wide diversity of adaptations that have occurred within this phylum.

  16. Inferring regulatory elements from a whole genome. An analysis of Helicobacter pylori sigma(80) family of promoter signals.

    PubMed

    Vanet, A; Marsan, L; Labigne, A; Sagot, M F

    2000-03-24

    Helicobacter pylori is adapted to life in a unique niche, the gastric epithelium of primates. Its promoters may therefore be different from those of other bacteria. Here, we determine motifs possibly involved in the recognition of such promoter sequences by the RNA polymerase using a new motif identification method. An important feature of this method is that the motifs are sought with the least possible assumptions about what they may look like. The method starts by considering the whole genome of H. pylori and attempts to infer directly from it a description for a family of promoters. Thus, this approach differs from searching for such promoters with a previously established description. The two algorithms are based on the idea of inferring motifs by flexibly comparing words in the sequences with an external object, instead of between themselves. The first algorithm infers single motifs, the second a combination of two motifs separated from one another by strictly defined, sterically constrained distances. Besides independently finding motifs known to be present in other bacteria, such as the Shine-Dalgarno sequence and the TATA-box, this approach suggests the existence in H. pylori of a new, combined motif, TTAAGC, followed optimally 21 bp downstream by TATAAT. Between these two motifs, there is in some cases another, TTTTAA or, less frequently, a repetition of TTAAGC separated optimally from the TATA-box by 12 bp. The combined motif TTAAGCx(21+/-2)TATAAT is present with no errors immediately upstream from the only two copies of the ribosomal 23 S-5 S RNA genes in H. pylori, and with one error upstream from the only two copies of the ribosomal 16 S RNA genes. The operons of both ribosomal RNA molecules are strongly expressed, representing an encouraging sign of the pertinence of the motifs found by the algorithms. In 25 cases out of a possible 30, the combined motif is found with no more than three substitutions immediately upstream from ribosomal proteins, or
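
    The combined motif TTAAGCx(21+/-2)TATAAT can be scanned for with a regular expression. This is a hedged sketch, not the motif-inference algorithms of the paper: it only locates exact occurrences of an already known motif, and it assumes that "x(21+/-2)" denotes a spacer of 19 to 23 unspecified bases. The function name and the test sequence are hypothetical.

```python
import re

# Combined motif from the abstract: TTAAGC, a spacer, then TATAAT.
# Assumption: "x(21+/-2)" means 19 to 23 arbitrary bases between the boxes.
# The lookahead lets matches that start at overlapping positions be reported;
# at most one hit (the longest allowed spacer) is returned per start position.
COMBINED_MOTIF = re.compile(r"(?=(TTAAGC[ACGT]{19,23}TATAAT))")

def find_combined_motif(sequence):
    """Return (start, spacer_length) for each exact match of the combined motif."""
    hits = []
    for match in COMBINED_MOTIF.finditer(sequence.upper()):
        matched = match.group(1)
        spacer_length = len(matched) - len("TTAAGC") - len("TATAAT")
        hits.append((match.start(), spacer_length))
    return hits

# Made-up sequence: 3 bases of padding, the motif with a 21-base spacer, padding.
example = "GGG" + "TTAAGC" + "A" * 21 + "TATAAT" + "CCC"
print(find_combined_motif(example))
```

    The occurrences "with one error" or "no more than three substitutions" mentioned in the abstract would need approximate matching, which this exact-match sketch does not attempt.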

  17. Morphological homoplasy, life history evolution, and historical biogeography of plethodontid salamanders inferred from complete mitochondrial genomes

    PubMed Central

    Mueller, Rachel Lockridge; Macey, J. Robert; Jaekel, Martin; Wake, David B.; Boore, Jeffrey L.

    2004-01-01

    The evolutionary history of the largest salamander family (Plethodontidae) is characterized by extreme morphological homoplasy. Analysis of the mechanisms generating such homoplasy requires an independent molecular phylogeny. To this end, we sequenced 24 complete mitochondrial genomes (22 plethodontids and two outgroup taxa), added data for three species from GenBank, and performed partitioned and unpartitioned Bayesian, maximum likelihood, and maximum parsimony phylogenetic analyses. We explored four dataset partitioning strategies to account for evolutionary process heterogeneity among genes and codon positions, all of which yielded increased model likelihoods and decreased numbers of supported nodes in the topologies (Bayesian posterior probability >0.95) relative to the unpartitioned analysis. Our phylogenetic analyses yielded congruent trees that contrast with the traditional morphology-based taxonomy; the monophyly of three of four major groups is rejected. Reanalysis of current hypotheses in light of these evolutionary relationships suggests that (i) a larval life history stage reevolved from a direct-developing ancestor multiple times; (ii) there is no phylogenetic support for the “Out of Appalachia” hypothesis of plethodontid origins; and (iii) novel scenarios must be reconstructed for the convergent evolution of projectile tongues, reduction in toe number, and specialization for defensive tail loss. Some of these scenarios imply morphological transformation series that proceed in the opposite direction than was previously thought. In addition, they suggest surprising evolutionary lability in traits previously interpreted to be conservative. PMID:15365171

  18. Morphological homoplasy, life history evolution, and historical biogeography of plethodontid salamanders inferred from complete mitochondrial genomes

    SciTech Connect

    Mueller, Rachel Lockridge; Macey, J. Robert; Jaekel, Martin; Wake, David B.; Boore, Jeffrey L.

    2004-08-01

    The evolutionary history of the largest salamander family (Plethodontidae) is characterized by extreme morphological homoplasy. Analysis of the mechanisms generating such homoplasy requires an independent, molecular phylogeny. To this end, we sequenced 24 complete mitochondrial genomes (22 plethodontids and two outgroup taxa), added data for three species from GenBank, and performed partitioned and unpartitioned Bayesian, ML, and MP phylogenetic analyses. We explored four dataset partitioning strategies to account for evolutionary process heterogeneity among genes and codon positions, all of which yielded increased model likelihoods and decreased numbers of supported nodes in the topologies (PP > 0.95) relative to the unpartitioned analysis. Our phylogenetic analyses yielded congruent trees that contrast with the traditional morphology-based taxonomy; the monophyly of three out of four major groups is rejected. Reanalysis of current hypotheses in light of these new evolutionary relationships suggests that (1) a larval life history stage re-evolved from a direct-developing ancestor multiple times, (2) there is no phylogenetic support for the ''Out of Appalachia'' hypothesis of plethodontid origins, and (3) novel scenarios must be reconstructed for the convergent evolution of projectile tongues, reduction in toe number, and specialization for defensive tail loss. Some of these novel scenarios imply morphological transformation series that proceed in the opposite direction than was previously thought. In addition, they suggest surprising evolutionary lability in traits previously interpreted to be conservative.

  19. Inverse Bayesian inference as a key of consciousness featuring a macroscopic quantum logical structure.

    PubMed

    Gunji, Yukio-Pegio; Shinohara, Shuji; Haruna, Taichi; Basios, Vasileios

    2017-02-01

    To overcome the dualism between mind and matter and to implement consciousness in science, a physical entity has to be embedded with a measurement process. Although quantum mechanics have been regarded as a candidate for implementing consciousness, nature at its macroscopic level is inconsistent with quantum mechanics. We propose a measurement-oriented inference system comprising Bayesian and inverse Bayesian inferences. While Bayesian inference contracts probability space, the newly defined inverse one relaxes the space. These two inferences allow an agent to make a decision corresponding to an immediate change in their environment. They generate a particular pattern of joint probability for data and hypotheses, comprising multiple diagonal and noisy matrices. This is expressed as a nondistributive orthomodular lattice equivalent to quantum logic. We also show that an orthomodular lattice can reveal information generated by inverse syllogism as well as the solutions to the frame and symbol-grounding problems. Our model is the first to connect macroscopic cognitive processes with the mathematical structure of quantum mechanics with no additional assumptions.

  20. Gorilla genome structural variation reveals evolutionary parallelisms with chimpanzee.

    PubMed

    Ventura, Mario; Catacchio, Claudia R; Alkan, Can; Marques-Bonet, Tomas; Sajjadian, Saba; Graves, Tina A; Hormozdiari, Fereydoun; Navarro, Arcadi; Malig, Maika; Baker, Carl; Lee, Choli; Turner, Emily H; Chen, Lin; Kidd, Jeffrey M; Archidiacono, Nicoletta; Shendure, Jay; Wilson, Richard K; Eichler, Evan E

    2011-10-01

    Structural variation has played an important role in the evolutionary restructuring of human and great ape genomes. Recent analyses have suggested that the genomes of chimpanzee and human have been particularly enriched for this form of genetic variation. Here, we set out to assess the extent of structural variation in the gorilla lineage by generating 10-fold genomic sequence coverage from a western lowland gorilla and integrating these data into a physical and cytogenetic framework of structural variation. We discovered and validated over 7665 structural changes within the gorilla lineage, including sequence resolution of inversions, deletions, duplications, and mobile element insertions. A comparison with human and other ape genomes shows that the gorilla genome has been subjected to the highest rate of segmental duplication. We show that both the gorilla and chimpanzee genomes have experienced independent yet convergent patterns of structural mutation that have not occurred in humans, including the formation of subtelomeric heterochromatic caps, the hyperexpansion of segmental duplications, and bursts of retroviral integrations. Our analysis suggests that the chimpanzee and gorilla genomes are structurally more derived than either orangutan or human genomes.

  1. Gorilla genome structural variation reveals evolutionary parallelisms with chimpanzee

    PubMed Central

    Ventura, Mario; Catacchio, Claudia R.; Alkan, Can; Marques-Bonet, Tomas; Sajjadian, Saba; Graves, Tina A.; Hormozdiari, Fereydoun; Navarro, Arcadi; Malig, Maika; Baker, Carl; Lee, Choli; Turner, Emily H.; Chen, Lin; Kidd, Jeffrey M.; Archidiacono, Nicoletta; Shendure, Jay; Wilson, Richard K.; Eichler, Evan E.

    2011-01-01

    Structural variation has played an important role in the evolutionary restructuring of human and great ape genomes. Recent analyses have suggested that the genomes of chimpanzee and human have been particularly enriched for this form of genetic variation. Here, we set out to assess the extent of structural variation in the gorilla lineage by generating 10-fold genomic sequence coverage from a western lowland gorilla and integrating these data into a physical and cytogenetic framework of structural variation. We discovered and validated over 7665 structural changes within the gorilla lineage, including sequence resolution of inversions, deletions, duplications, and mobile element insertions. A comparison with human and other ape genomes shows that the gorilla genome has been subjected to the highest rate of segmental duplication. We show that both the gorilla and chimpanzee genomes have experienced independent yet convergent patterns of structural mutation that have not occurred in humans, including the formation of subtelomeric heterochromatic caps, the hyperexpansion of segmental duplications, and bursts of retroviral integrations. Our analysis suggests that the chimpanzee and gorilla genomes are structurally more derived than either orangutan or human genomes. PMID:21685127

  2. PyClone: statistical inference of clonal population structure in cancer.

    PubMed

    Roth, Andrew; Khattra, Jaswinder; Yap, Damian; Wan, Adrian; Laks, Emma; Biele, Justina; Ha, Gavin; Aparicio, Samuel; Bouchard-Côté, Alexandre; Shah, Sohrab P

    2014-04-01

    We introduce PyClone, a statistical model for inference of clonal population structures in cancers. PyClone is a Bayesian clustering method for grouping sets of deeply sequenced somatic mutations into putative clonal clusters while estimating their cellular prevalences and accounting for allelic imbalances introduced by segmental copy-number changes and normal-cell contamination. Single-cell sequencing validation demonstrates PyClone's accuracy.

  3. Phylogeny and genetic history of the Siberian salamander (Salamandrella keyserlingii, Dybowski, 1870) inferred from complete mitochondrial genomes.

    PubMed

    Malyarchuk, Boris; Derenko, Miroslava; Denisova, Galina

    2013-05-01

    We assessed phylogeny of the Siberian salamander (Salamandrella keyserlingii, Dybowski, 1870), the most northern ectothermic, terrestrial vertebrate in Eurasia, by sequence analysis of complete mitochondrial genomes in 26 specimens from different localities (China, Khabarovsk region, Sakhalin, Yakutia, Magadan region, Chukotka, Kamchatka, Ural, European part of Russia). In addition, a complete mitochondrial genome of the Schrenck salamander, Salamandrella schrenckii, was determined for the first time. Bayesian phylogenetic analysis of the entire mtDNA genomes of S. keyserlingii demonstrates that two haplotype clades, AB and C, radiated about 1.4 million years ago (Mya). Bayesian skyline plots of population size change through time show an expansion around 250 thousand years ago (kya) and then a decline around the Last Glacial Maximum (25 kya) with subsequent restoration of population size. Climatic changes during the Quaternary period have dramatically affected the population genetic structure of the Siberian salamanders. In addition, complete mtDNA sequence analysis allowed us to recognize that the vast area of Northern Eurasia was colonized only by the Siberian salamander clade C1b during the last 150 kya. Meanwhile, we were unable to find evidence of molecular adaptation in this clade by analyzing the whole mitochondrial genomes of the Siberian salamanders.

  4. Inference and Analysis of Population Structure Using Genetic Data and Network Theory.

    PubMed

    Greenbaum, Gili; Templeton, Alan R; Bar-David, Shirli

    2016-04-01

    Clustering individuals to subpopulations based on genetic data has become commonplace in many genetic studies. Inference about population structure is most often done by applying model-based approaches, aided by visualization using distance-based approaches such as multidimensional scaling. While existing distance-based approaches suffer from a lack of statistical rigor, model-based approaches entail assumptions of prior conditions such as that the subpopulations are at Hardy-Weinberg equilibria. Here we present a distance-based approach for inference about population structure using genetic data by defining population structure using network theory terminology and methods. A network is constructed from a pairwise genetic-similarity matrix of all sampled individuals. The community partition, a partition of a network to dense subgraphs, is equated with population structure, a partition of the population to genetically related groups. Community-detection algorithms are used to partition the network into communities, interpreted as a partition of the population to subpopulations. The statistical significance of the structure can be estimated by using permutation tests to evaluate the significance of the partition's modularity, a network theory measure indicating the quality of community partitions. To further characterize population structure, a new measure of the strength of association (SA) for an individual to its assigned community is presented. The strength of association distribution (SAD) of the communities is analyzed to provide additional population structure characteristics, such as the relative amount of gene flow experienced by the different subpopulations and identification of hybrid individuals. Human genetic data and simulations are used to demonstrate the applicability of the analyses. The approach presented here provides a novel, computationally efficient model-free method for inference about population structure that does not entail assumption of
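
    The modularity score used above to judge a community partition can be illustrated with a short sketch. It computes the standard Newman modularity of a weighted, undirected network; it is not the authors' software. The thresholding step, the function names, and the toy similarity matrix are assumptions made for illustration, and the community labels are supplied by hand rather than found by a community-detection algorithm.

```python
def similarity_to_network(similarity, threshold):
    """Keep pairwise similarities at or above a threshold as edge weights.

    Hypothetical construction step: the abstract only states that a network
    is built from a pairwise genetic-similarity matrix.
    """
    n = len(similarity)
    return [
        [similarity[i][j] if i != j and similarity[i][j] >= threshold else 0.0
         for j in range(n)]
        for i in range(n)
    ]

def modularity(weights, communities):
    """Newman modularity Q for a partition of a weighted, undirected network.

    weights: symmetric matrix of edge weights with a zero diagonal.
    communities: one community label per node.
    """
    n = len(weights)
    strength = [sum(row) for row in weights]  # weighted degree of each node
    total = sum(strength)                     # equals 2m, twice the total edge weight
    if total == 0:
        return 0.0
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:
                # observed weight minus the weight expected by chance
                q += weights[i][j] - strength[i] * strength[j] / total
    return q / total

# Toy data: individuals 0-1 and 2-3 are similar within pairs, dissimilar between.
similarity = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
]
network = similarity_to_network(similarity, threshold=0.5)
print(modularity(network, [0, 0, 1, 1]))  # two subpopulations
print(modularity(network, [0, 0, 0, 0]))  # a single population
```

    A permutation test of the kind the abstract mentions would compare the observed Q against values obtained from shuffled data; that step is left out here.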

  5. Phylogenetic Diversity of the Enteric Pathogen Salmonella enterica subsp. enterica Inferred from Genome-Wide Reference-Free SNP Characters

    PubMed Central

    Timme, Ruth E.; Pettengill, James B.; Allard, Marc W.; Strain, Errol; Barrangou, Rodolphe; Wehnes, Chris; Van Kessel, JoAnn S.; Karns, Jeffrey S.; Musser, Steven M.; Brown, Eric W.

    2013-01-01

    The enteric pathogen Salmonella enterica is one of the leading causes of foodborne illness in the world. The species is extremely diverse, containing more than 2,500 named serovars that are designated for their unique antigen characters and pathogenicity profiles—some are known to be virulent pathogens, while others are not. Questions regarding the evolution of pathogenicity, significance of antigen characters, diversity of clustered regularly interspaced short palindromic repeat (CRISPR) loci, among others, will remain elusive until a strong evolutionary framework is established. We present the first large-scale S. enterica subsp. enterica phylogeny inferred from a new reference-free k-mer approach of gathering single nucleotide polymorphisms (SNPs) from whole genomes. The phylogeny of 156 isolates representing 78 serovars (102 were newly sequenced) reveals two major lineages, each with many strongly supported sublineages. One of these lineages is the S. Typhi group; well nested within the phylogeny. Lineage-through-time analyses suggest there have been two instances of accelerated rates of diversification within the subspecies. We also found that antigen characters and CRISPR loci reveal different evolutionary patterns than that of the phylogeny, suggesting that a horizontal gene transfer or possibly a shared environmental acquisition might have influenced the present character distribution. Our study also shows the ability to extract reference-free SNPs from a large set of genomes and then to use these SNPs for phylogenetic reconstruction. This automated, annotation-free approach is an important step forward for bacterial disease tracking and in efficiently elucidating the evolutionary history of highly clonal organisms. PMID:24158624

  6. Phylogeny and biogeography of the family Salamandridae (Amphibia: Caudata) inferred from complete mitochondrial genomes.

    PubMed

    Zhang, Peng; Papenfuss, Theodore J; Wake, Marvalee H; Qu, Lianghu; Wake, David B

    2008-11-01

    Phylogenetic relationships of members of the salamander family Salamandridae were examined using complete mitochondrial genomes collected from 42 species representing all 20 salamandrid genera and five outgroup taxa. Weighted maximum parsimony, partitioned maximum likelihood, and partitioned Bayesian approaches all produce an identical, well-resolved phylogeny; most branches are strongly supported with greater than 90% bootstrap values and 1.0 Bayesian posterior probabilities. Our results support recent taxonomic changes in finding the traditional genera Mertensiella, Euproctus, and Triturus to be non-monophyletic species assemblages. We successfully resolved the current polytomy at the base of the salamandrid tree: the Italian newt genus Salamandrina is sister to all remaining salamandrids. Beyond Salamandrina, a clade comprising all remaining newts is separated from a clade containing the true salamanders. Among these newts, the branching orders of well-supported clades are: primitive newts (Echinotriton, Pleurodeles, and Tylototriton), New World newts (Notophthalmus-Taricha), Corsica-Sardinia newts (Euproctus), and modern European newts (Calotriton, Lissotriton, Mesotriton, Neurergus, Ommatotriton, and Triturus) plus modern Asian newts (Cynops, Pachytriton, and Paramesotriton). Two alternative sets of calibration points and two Bayesian dating methods (BEAST and MultiDivTime) were used to estimate timescales for salamandrid evolution. The estimation difference by dating methods is slight and we propose two sets of timescales based on different calibration choices. The two timescales suggest that the initial diversification of extant salamandrids took place in Europe about 97 or 69 Ma. North American salamandrids were derived from their European ancestors by dispersal through North Atlantic Land Bridges in the Late Cretaceous (approximately 69 Ma) or Middle Eocene (approximately 43 Ma). Ancestors of Asian salamandrids most probably dispersed to the eastern Asia

  7. Minimum message length inference of secondary structure from protein coordinate data

    PubMed Central

    Konagurthu, Arun S.; Lesk, Arthur M.; Allison, Lloyd

    2012-01-01

    Motivation: Secondary structure underpins the folding pattern and architecture of most proteins. Accurate assignment of the secondary structure elements is therefore an important problem. Although many approximate solutions of the secondary structure assignment problem exist, the statement of the problem has resisted a consistent and mathematically rigorous definition. A variety of comparative studies have highlighted major disagreements in the way the available methods define and assign secondary structure to coordinate data. Results: We report a new method to infer secondary structure based on the Bayesian method of minimum message length inference. It treats assignments of secondary structure as hypotheses that explain the given coordinate data. The method seeks to maximize the joint probability of a hypothesis and the data. There is a natural null hypothesis and any assignment that cannot better it is unacceptable. We developed a program SST based on this approach and compared it with popular programs, such as DSSP and STRIDE among others. Our evaluation suggests that SST gives reliable assignments even on low-resolution structures. Availability: http://www.csse.monash.edu.au/~karun/sst Contact: arun.konagurthu@monash.edu (or lloyd.allison@monash.edu) PMID:22689785

  8. SHIPS: Spectral Hierarchical clustering for the Inference of Population Structure in genetic studies.

    PubMed

    Bouaziz, Matthieu; Paccard, Caroline; Guedj, Mickael; Ambroise, Christophe

    2012-01-01

    Inferring the structure of populations has many applications for genetic research. In addition to providing information for evolutionary studies, it can be used to account for the bias induced by population stratification in association studies. To this end, many algorithms have been proposed to cluster individuals into genetically homogeneous sub-populations. The parametric algorithms, such as Structure, are very popular but their underlying complexity and their high computational cost led to the development of faster parametric alternatives such as Admixture. Alternatives to these methods are the non-parametric approaches. Within this category, AWclust has proven efficient but fails to properly identify population structure for complex datasets. We present in this article a new clustering algorithm called Spectral Hierarchical clustering for the Inference of Population Structure (SHIPS), based on a divisive hierarchical clustering strategy, allowing a progressive investigation of population structure. This method takes genetic data as input to cluster individuals into homogeneous sub-populations and, with the use of the gap statistic, estimates the optimal number of such sub-populations. SHIPS was applied to a set of simulated discrete and admixed datasets and to real SNP datasets, namely data from the HapMap and Pan-Asian SNP consortium. The programs Structure, Admixture, AWclust and PCAclust were also investigated in a comparison study. SHIPS and the parametric approach Structure were the most accurate when applied to simulated datasets both in terms of individual assignments and estimation of the correct number of clusters. The analysis of the results on the real datasets highlighted that the clusterings of SHIPS were the most consistent with the population labels or those produced by the Admixture program. The performance of SHIPS when applied to SNP data, along with its relatively low computational cost and its ease of use, makes this method a promising

  9. Multiple genome alignment for identifying the core structure among moderately related microbial genomes

    PubMed Central

    Uchiyama, Ikuo

    2008-01-01

    Background: Identifying the set of intrinsically conserved genes, or the genomic core, among related genomes is crucial for understanding prokaryotic genomes where horizontal gene transfers are common. Although core genome identification appears to be obvious among very closely related genomes, it becomes more difficult when more distantly related genomes are compared. Here, we consider the core structure as a set of sufficiently long segments in which gene orders are conserved so that they are likely to have been inherited mainly through vertical transfer, and developed a method for identifying the core structure by finding the order of pre-identified orthologous groups (OGs) that maximally retains the conserved gene orders. Results: The method was applied to genome comparisons of two well-characterized families, Bacillaceae and Enterobacteriaceae, and identified their core structures comprising 1438 and 2125 OGs, respectively. The core sets contained most of the essential genes and their related genes, which were primarily included in the intersection of the two core sets comprising around 700 OGs. The definition of the genomic core based on gene order conservation was demonstrated to be more robust than the simpler approach based only on gene conservation. We also investigated the core structures in terms of G+C content homogeneity and phylogenetic congruence, and found that the core genes primarily exhibited the expected characteristic, i.e., being indigenous and sharing the same history, more than the non-core genes. Conclusion: The results demonstrate that our strategy of genome alignment based on gene order conservation can provide an effective approach to identify the genomic core among moderately related microbial genomes. PMID:18976470

  10. Evolution of genomic structural variation and genomic architecture in the adaptive radiations of African cichlid fishes

    PubMed Central

    Fan, Shaohua; Meyer, Axel

    2014-01-01

    African cichlid fishes are an ideal system for studying explosive rates of speciation and the origin of diversity in adaptive radiation. Within the last few million years, more than 2000 species have evolved in the Great Lakes of East Africa, the largest adaptive radiation in vertebrates. These young species show spectacular diversity in their coloration, morphology and behavior. However, little is known about the genomic basis of this astonishing diversity. Recently, five African cichlid genomes were sequenced, including that of the Nile Tilapia (Oreochromis niloticus), a basal and only moderately diversified lineage, and the genomes of four representative endemic species of the adaptive radiations, Neolamprologus brichardi, Astatotilapia burtoni, Metriaclima zebra, and Pundamilia nyererei. Using the Tilapia genome as a reference genome, we generated a high-resolution genomic variation map, consisting of single nucleotide polymorphisms (SNPs), short insertions and deletions (indels), inversions and deletions. In total, around 18.8, 17.7, 17.0, and 17.0 million SNPs, 2.3, 2.2, 1.4, and 1.9 million indels, 262, 306, 162, and 154 inversions, and 3509, 2705, 2710, and 2634 deletions were inferred to have evolved in N. brichardi, A. burtoni, P. nyererei, and M. zebra, respectively. Many of these variations affected the annotated gene regions in the genome. Different patterns of genetic variation were detected during the adaptive radiation of African cichlid fishes. For SNPs, the highest rate of evolution was detected in the common ancestor of N. brichardi, A. burtoni, P. nyererei, and M. zebra. However, for the evolution of inversions and deletions, we found that the rates at the terminal taxa are substantially higher than the rates at the ancestral lineages. The high-resolution map provides an ideal opportunity to understand the genomic bases of the adaptive radiation of African cichlid fishes. PMID:24917883

  11. Phylogenetic relationships and divergence dates of softshell turtles (Testudines: Trionychidae) inferred from complete mitochondrial genomes.

    PubMed

    Li, Haifeng; Liu, Juanjuan; Xiong, Lei; Zhang, Huanhuan; Zhou, Huaxing; Yin, Huazong; Jing, Wanxing; Li, Jun; Shi, Qiong; Wang, Yuqin; Liu, Jianjun; Nie, Liuwang

    2017-03-15

    The softshell turtles (Trionychidae) are one of the most widely distributed reptile groups in the world, and fossils have been found on all continents except Antarctica. The phylogenetic relationships among members of this group have been previously studied; however, there are disagreements regarding its taxonomy, and its phylogeography and divergence times are still poorly understood. Here we present a comprehensive mitogenomic study of softshell turtles. We sequenced the complete mitochondrial genomes of 10 softshell turtles, in addition to the GenBank sequences of Dogania subplana, Lissemys punctata, and Trionyx triunguis, which cover all extant genera within Trionychidae except for Cyclanorbis and Cycloderma. These data were combined with other mitogenomes of turtles for phylogenetic analyses. Divergence time-calibration and ancestral reconstruction were calculated using BEAST and RASP software, respectively. Our phylogenetic analyses indicate that Trionychidae is the sister taxon of Carettochelyidae, and support the monophyly of Trionychinae and Cyclanorbinae, which is consistent with morphological data and molecular analysis. Our phylogenetic analyses have established a sister taxon relationship between the Asian Rafetus and the Asian Palea + Pelodiscus + Dogania + Nilssonia + Amyda, whereas a previous study grouped the Asian Rafetus with the American Apalone. The results of divergence time estimates and area ancestral reconstruction show that extant Trionychidae originated in Asia at around 108 million years ago (Ma), and radiations mainly occurred during two warm periods, namely, Late Cretaceous-Early Eocene and Oligocene. By combining the estimated divergence time and the reconstructed ancestral area of softshell turtles, we determined that the dispersal of softshell turtles out of Asia may have taken three routes. Furthermore, the times of dispersal seem to be in agreement with the time of the India-Asia collision and opening of the Bering Strait, which

  12. Higher-level salamander relationships and divergence dates inferred from complete mitochondrial genomes.

    PubMed

    Zhang, Peng; Wake, David B

    2009-11-01

    Phylogenetic relationships among the salamander families have been difficult to resolve, largely because the window of time in which major lineages diverged was very short relative to the subsequently long evolutionary history of each family. We present seven new complete mitochondrial genomes representing five salamander families that have no or few mitogenome records in GenBank in order to assess the phylogenetic relationships of all salamander families from a mitogenomic perspective. Phylogenetic analyses of two data sets-one combining the entire mitogenome sequence except for the D-loop, and the other combining the deduced amino acid sequences of all 13 mitochondrial protein-coding genes-produce nearly identical well-resolved topologies. The monophyly of each family is supported, including the controversial Proteidae. The internally fertilizing salamanders are demonstrated to be a clade, concordant with recent results using nuclear genes. The internally fertilizing salamanders include two well-supported clades: one is composed of Ambystomatidae, Dicamptodontidae, and Salamandridae, the other Proteidae, Rhyacotritonidae, Amphiumidae, and Plethodontidae. In contrast to results from nuclear loci, our results support the conventional morphological hypothesis that Sirenidae is the sister-group to all other salamanders and they statistically reject the hypothesis from nuclear genes that the suborder Cryptobranchoidea (Cryptobranchidae+Hynobiidae) branched earlier than the Sirenidae. Using recently recommended fossil calibration points and a "soft bound" calibration strategy, we recalculated evolutionary timescales for tetrapods with an emphasis on living salamanders, under a Bayesian framework with and without a rate-autocorrelation assumption. 
    Our dating results indicate: (i) the widely used rate-autocorrelation assumption in relaxed clock analyses is problematic and the accuracy of molecular dating for early lissamphibian evolution is questionable; (ii) the initial

  13. A new statistic and its power to infer membership in a genome-wide association study using genotype frequencies.

    PubMed

    Jacobs, Kevin B; Yeager, Meredith; Wacholder, Sholom; Craig, David; Kraft, Peter; Hunter, David J; Paschal, Justin; Manolio, Teri A; Tucker, Margaret; Hoover, Robert N; Thomas, Gilles D; Chanock, Stephen J; Chatterjee, Nilanjan

    2009-11-01

    Aggregate results from genome-wide association studies (GWAS), such as genotype frequencies for cases and controls, were until recently often made available on public websites because they were thought to disclose negligible information concerning an individual's participation in a study. Homer et al. recently suggested that a method for forensic detection of an individual's contribution to an admixed DNA sample could be applied to aggregate GWAS data. Using a likelihood-based statistical framework, we developed an improved statistic that uses genotype frequencies and individual genotypes to infer whether a specific individual or any close relatives participated in the GWAS and, if so, what the participant's phenotype status is. Our statistic compares the logarithm of genotype frequencies, in contrast to that of Homer et al., which is based on differences in either SNP probe intensity or allele frequencies. We derive the theoretical power of our test statistics and explore the empirical performance in scenarios with varying numbers of randomly chosen or top-associated SNPs.

  14. Graphic analysis of population structure on genome-wide rheumatoid arthritis data.

    PubMed

    Zhang, Jun; Weng, Chunhua; Niyogi, Partha

    2009-12-15

    Principal-component analysis (PCA) has been used for decades to summarize the human genetic variation across geographic regions and to infer population migration history. Reduction of spurious associations due to population structure is crucial for the success of disease association studies. Recently, PCA has also become a popular method for detecting population structure and correcting for population stratification in disease association studies. Inspired by manifold learning, we propose a novel method based on spectral graph theory. Regarding each study subject as a node with suitably defined weights for its edges to close neighbors, one can form a weighted graph. We suggest using the spectrum of the associated graph Laplacian operator, namely, Laplacian eigenfunctions, to infer population structures instead of principal components (PCs). For the whole genome-wide association data for the North American Rheumatoid Arthritis Consortium (NARAC) provided by Genetic Analysis Workshop 16, Laplacian eigenfunctions revealed more meaningful structures of the underlying population than PCA. The proposed method is connected to PCA, and it naturally includes PCA as a special case. Our simple method is computationally fast and is suitable for disease studies at the genome-wide scale.
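
    The graph construction described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the dense Gaussian edge weights, the `sigma` parameter, and the use of power iteration in place of a full eigendecomposition are all assumptions made here for a self-contained example. It recovers the Laplacian eigenvector with the smallest nonzero eigenvalue, whose signs split subjects into two groups.

```python
import math


def gaussian_weights(points, sigma=1.0):
    """Dense weighted graph (an assumed weighting): w_ij = exp(-d_ij^2 / (2 sigma^2))."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            w[i][j] = w[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w


def fiedler_vector(w, iters=500):
    """Approximate the eigenvector of L = D - W with the smallest nonzero eigenvalue.

    Power iteration on (c*I - L), projecting out the constant vector
    (the eigenvector of L for eigenvalue 0) at every step.
    """
    n = len(w)
    deg = [sum(row) for row in w]
    c = 2.0 * max(deg)  # Gershgorin bound on the largest eigenvalue of L
    v = [float(i) for i in range(n)]  # simple deterministic starting vector
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]
        # (c*I - L) v = (c - deg_i) v_i + sum_j w_ij v_j
        v = [(c - deg[i]) * v[i] + sum(w[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v


# Hypothetical toy data: two well-separated groups of four subjects each.
subjects = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
vector = fiedler_vector(gaussian_weights(subjects))
groups = [0 if x * vector[0] > 0 else 1 for x in vector]
```

    In practice the weights would be computed from genotype similarity and restricted to close neighbors, and several eigenvectors would be inspected rather than one.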

  15. Inference of hazel grouse population structure using multilocus data: a landscape genetic approach.

    PubMed

    Sahlsten, J; Thörngren, H; Höglund, J

    2008-12-01

    In conservation and management of species it is important to make inferences about gene flow, dispersal and population structure. In this study, we used 613 georeferenced tissue samples from hazel grouse (Bonasa bonasia) where each individual was genotyped at 12 microsatellite loci to make inference on population genetic structure, gene flow and dispersal in northern Sweden. Observed levels of genetic diversity suggest that Swedish hazel grouse do not suffer loss of genetic diversity compared with other grouse species. We found significant F(IS) (deviation from Hardy-Weinberg expectations) over the entire sample using jack-knifed estimators over loci, which is most likely explained by a Wahlund effect. With the use of spatial autocorrelation methods, we detected significant isolation by distance among individuals. Neighbourhood size was estimated in the order of 62-158 individuals corresponding to a dispersal distance of 950-1500 m. Using a spatial statistical model for landscape genetics to infer the number of populations and the spatial location of genetic discontinuities between these populations we found indications that Swedish hazel grouse are divided into a northern and a southern population. We could not find a sharp border between these two populations and none of the observed borders appeared to coincide with any potential geographical barriers. These results imply that gene flow appears somewhat unrestricted in the boreal taiga forests of northern Sweden and that the two populations of hazel grouse in Sweden may be explained by the post-glacial reinvasion history of the Scandinavian Peninsula.
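
    The link between a positive F(IS) and a Wahlund effect can be illustrated with a small calculation. The sketch below uses the simple textbook estimator F(IS) = 1 - Ho/He for a single locus, not the jack-knifed estimators over loci used in the study, and the genotype counts are hypothetical. Two subpopulations that are each in Hardy-Weinberg proportions show a heterozygote deficit once they are pooled.

```python
from collections import Counter


def f_is(genotypes):
    """Return (Ho, He, F_IS) for one locus.

    genotypes: list of (allele, allele) pairs. Assumes the locus is
    polymorphic in the sample, otherwise He is zero.
    """
    n = len(genotypes)
    h_obs = sum(1 for a, b in genotypes if a != b) / n
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = 2 * n
    h_exp = 1.0 - sum((c / total) ** 2 for c in counts.values())
    return h_obs, h_exp, 1.0 - h_obs / h_exp


# Hypothetical subpopulations, each in Hardy-Weinberg proportions:
# allele A at frequency 0.9 in pop_1 and 0.1 in pop_2.
pop_1 = [("A", "A")] * 81 + [("A", "a")] * 18 + [("a", "a")] * 1
pop_2 = [("A", "A")] * 1 + [("A", "a")] * 18 + [("a", "a")] * 81

within = f_is(pop_1)[2]          # close to 0: no deficit within a subpopulation
pooled = f_is(pop_1 + pop_2)[2]  # clearly positive: Wahlund effect
```

    Pooling lowers observed heterozygosity relative to the expectation from the pooled allele frequencies, which is why an unrecognized subdivision produces a positive F(IS).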

  16. Process-Driven Inference of Biological Network Structure: Feasibility, Minimality, and Multiplicity

    PubMed Central

    Wang, Guanyu; Rong, Yongwu; Chen, Hao; Pearson, Carl; Du, Chenghang; Simha, Rahul; Zeng, Chen

    2012-01-01

    A common problem in molecular biology is to use experimental data, such as microarray data, to infer knowledge about the structure of interactions between important molecules in subsystems of the cell. By approximating the state of each molecule as “on” or “off”, it becomes possible to simplify the problem, and exploit the tools of Boolean analysis for such inference. Amongst Boolean techniques, the process-driven approach has shown promise in being able to identify putative network structures, as well as stability and modularity properties. This paper examines the process-driven approach more formally, and makes four contributions about the computational complexity of the inference problem, under the “dominant inhibition” assumption of molecular interactions. The first is a proof that the feasibility problem (does there exist a network that explains the data?) can be solved in polynomial-time. Second, the minimality problem (what is the smallest network that explains the data?) is shown to be NP-hard, and therefore unlikely to result in a polynomial-time algorithm. Third, a simple polynomial-time heuristic is shown to produce near-minimal solutions, as demonstrated by simulation. Fourth, the theoretical framework explains how multiplicity (the number of network solutions to realize a given biological process), which can take exponential-time to compute, can instead be accurately estimated by a fast, polynomial-time heuristic. PMID:22815739

  17. Bayesian inference of the initial conditions from large-scale structure surveys

    NASA Astrophysics Data System (ADS)

    Leclercq, Florent

    2016-10-01

    Analysis of three-dimensional cosmological surveys has the potential to answer outstanding questions on the initial conditions from which structure appeared, and therefore on the very high energy physics at play in the early Universe. We report on recently proposed statistical data analysis methods designed to study the primordial large-scale structure via physical inference of the initial conditions in a fully Bayesian framework, and applications to the Sloan Digital Sky Survey data release 7. We illustrate how this approach led to a detailed characterization of the dynamic cosmic web underlying the observed galaxy distribution, based on the tidal environment.

  18. Population structure of Atlantic mackerel inferred from RAD-seq-derived SNP markers: effects of sequence clustering parameters and hierarchical SNP selection.

    PubMed

    Rodríguez-Ezpeleta, Naiara; Bradbury, Ian R; Mendibil, Iñaki; Álvarez, Paula; Cotano, Unai; Irigoien, Xabier

    2016-07-01

    Restriction-site-associated DNA sequencing (RAD-seq) and related methods are revolutionizing the field of population genomics in nonmodel organisms as they allow generating an unprecedented number of single nucleotide polymorphisms (SNPs) even when no genomic information is available. Yet, RAD-seq data analyses rely on assumptions on nature and number of nucleotide variants present in a single locus, the choice of which may lead to an under- or overestimated number of SNPs and/or to incorrectly called genotypes. Using the Atlantic mackerel (Scomber scombrus L.) and a close relative, the Atlantic chub mackerel (Scomber colias), as case study, here we explore the sensitivity of population structure inferences to two crucial aspects in RAD-seq data analysis: the maximum number of mismatches allowed to merge reads into a locus and the relatedness of the individuals used for genotype calling and SNP selection. Our study resolves the population structure of the Atlantic mackerel, but, most importantly, provides insights into the effects of alternative RAD-seq data analysis strategies on population structure inferences that are directly applicable to other species.
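
    The sensitivity to the mismatch threshold can be shown with a toy example. This is a deliberately simplified greedy clustering, not the algorithm of any particular RAD-seq pipeline, and the reads are invented. It shows how the number of inferred loci shrinks as more mismatches are tolerated when merging reads.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length reads."""
    return sum(x != y for x, y in zip(a, b))


def cluster_reads(reads, max_mismatches):
    """Greedy clustering: a read joins the first locus whose seed read
    (the first read placed in that locus) is within max_mismatches."""
    loci = []
    for read in reads:
        for locus in loci:
            if hamming(read, locus[0]) <= max_mismatches:
                locus.append(read)
                break
        else:
            loci.append([read])
    return loci


# Hypothetical reads: the second and third differ from the first by 1 and 2 bases.
reads = ["ACGTACGT", "ACGTACGA", "ACGTTCGA", "TTTTGGGG"]
loci_counts = {m: len(cluster_reads(reads, m)) for m in (0, 1, 2)}
```

    A strict threshold splits alleles of one locus into several loci, while a permissive one can merge paralogous sequences, which is the trade-off the abstract refers to.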

  19. Structural Genomics of Minimal Organisms: Pipeline and Results

    SciTech Connect

    Kim, Sung-Hou; Shin, Dong-Hae; Kim, Rosalind; Adams, Paul; Chandonia, John-Marc

    2007-09-14

    The initial objective of the Berkeley Structural Genomics Center was to obtain near-complete three-dimensional (3D) structural information for all soluble proteins of two minimal organisms, the closely related pathogens Mycoplasma genitalium and M. pneumoniae. The former has fewer than 500 genes and the latter has fewer than 700 genes. A semiautomated structural genomics pipeline was set up, covering target selection, cloning, expression, purification, and ultimately structure determination. At the time of this writing, structural information for more than 93 percent of all soluble proteins of M. genitalium is available. This chapter summarizes the approaches taken by the authors' center.

  20. Mediation Analysis With Intermediate Confounding: Structural Equation Modeling Viewed Through the Causal Inference Lens

    PubMed Central

    De Stavola, Bianca L.; Daniel, Rhian M.; Ploubidis, George B.; Micali, Nadia

    2015-01-01

    The study of mediation has a long tradition in the social sciences and a relatively more recent one in epidemiology. The first school is linked to path analysis and structural equation models (SEMs), while the second is related mostly to methods developed within the potential outcomes approach to causal inference. By giving model-free definitions of direct and indirect effects and clear assumptions for their identification, the latter school has formalized notions intuitively developed in the former and has greatly increased the flexibility of the models involved. However, through its predominant focus on nonparametric identification, the causal inference approach to effect decomposition via natural effects is limited to settings that exclude intermediate confounders. Such confounders are naturally dealt with (albeit with the caveats of informality and modeling inflexibility) in the SEM framework. Therefore, it seems pertinent to revisit SEMs with intermediate confounders, armed with the formal definitions and (parametric) identification assumptions from causal inference. Here we investigate: 1) how identification assumptions affect the specification of SEMs, 2) whether the more restrictive SEM assumptions can be relaxed, and 3) whether existing sensitivity analyses can be extended to this setting. Data from the Avon Longitudinal Study of Parents and Children (1990–2005) are used for illustration. PMID:25504026

  1. Mediation analysis with intermediate confounding: structural equation modeling viewed through the causal inference lens.

    PubMed

    De Stavola, Bianca L; Daniel, Rhian M; Ploubidis, George B; Micali, Nadia

    2015-01-01

    The study of mediation has a long tradition in the social sciences and a relatively more recent one in epidemiology. The first school is linked to path analysis and structural equation models (SEMs), while the second is related mostly to methods developed within the potential outcomes approach to causal inference. By giving model-free definitions of direct and indirect effects and clear assumptions for their identification, the latter school has formalized notions intuitively developed in the former and has greatly increased the flexibility of the models involved. However, through its predominant focus on nonparametric identification, the causal inference approach to effect decomposition via natural effects is limited to settings that exclude intermediate confounders. Such confounders are naturally dealt with (albeit with the caveats of informality and modeling inflexibility) in the SEM framework. Therefore, it seems pertinent to revisit SEMs with intermediate confounders, armed with the formal definitions and (parametric) identification assumptions from causal inference. Here we investigate: 1) how identification assumptions affect the specification of SEMs, 2) whether the more restrictive SEM assumptions can be relaxed, and 3) whether existing sensitivity analyses can be extended to this setting. Data from the Avon Longitudinal Study of Parents and Children (1990-2005) are used for illustration.

  2. Structural mapping in statistical word problems: A relational reasoning approach to Bayesian inference.

    PubMed

    Johnson, Eric D; Tubau, Elisabet

    2016-09-27

    Presenting natural frequencies facilitates Bayesian inferences relative to using percentages. Nevertheless, many people, including highly educated and skilled reasoners, still fail to provide Bayesian responses to these computationally simple problems. We show that the complexity of relational reasoning (e.g., the structural mapping between the presented and requested relations) can help explain the remaining difficulties. With a non-Bayesian inference that required identical arithmetic but afforded a more direct structural mapping, performance was universally high. Furthermore, reducing the relational demands of the task through questions that directed reasoners to use the presented statistics, as compared with questions that prompted the representation of a second, similar sample, also significantly improved reasoning. Distinct error patterns were also observed between these presented- and similar-sample scenarios, which suggested differences in relational-reasoning strategies. On the other hand, while higher numeracy was associated with better Bayesian reasoning, higher-numerate reasoners were not immune to the relational complexity of the task. Together, these findings validate the relational-reasoning view of Bayesian problem solving and highlight the importance of considering not only the presented task structure, but also the complexity of the structural alignment between the presented and requested relations.
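
    The arithmetic behind such problems is short. The sketch below works one hypothetical screening problem in both formats; the numbers are invented for illustration and do not come from the study. With natural frequencies the answer is a single ratio of counts, while the percentage format requires the full form of Bayes' rule.

```python
from fractions import Fraction


def posterior_from_counts(true_positives, false_positives):
    """Natural-frequency format: share of positive tests that are true positives."""
    return Fraction(true_positives, true_positives + false_positives)


def posterior_from_rates(prior, sensitivity, false_positive_rate):
    """Percentage format: Bayes' rule written out in full."""
    numerator = prior * sensitivity
    return numerator / (numerator + (1 - prior) * false_positive_rate)


# Hypothetical problem: of 1000 people, 10 have the condition and 8 of them
# test positive; of the 990 without it, 99 also test positive.
by_counts = posterior_from_counts(8, 99)
by_rates = posterior_from_rates(Fraction(10, 1000), Fraction(8, 10), Fraction(99, 990))
```

    Both routes give 8/107, roughly 7.5%, but only the second requires combining the base rate with two conditional probabilities.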

  3. 3D genome structure modeling by Lorentzian objective function.

    PubMed

    Trieu, Tuan; Cheng, Jianlin

    2016-11-29

    The 3D structure of the genome plays a vital role in biological processes such as gene interaction, gene regulation, DNA replication and genome methylation. Advanced chromosomal conformation capture techniques, such as Hi-C and tethered conformation capture, can generate chromosomal contact data that can be used to computationally reconstruct 3D structures of the genome. We developed a novel restraint-based method that is capable of reconstructing 3D genome structures utilizing both intra- and inter-chromosomal contact data. Our method was robust to noise and performed well in comparison with a panel of existing methods on a controlled simulated data set. On a real Hi-C data set of the human genome, our method produced chromosome and genome structures that are consistent with 3D FISH data and existing knowledge about the human chromosome and genome, such as chromosome territories and the clustering of small chromosomes in the nucleus center, with the exception of chromosome 18. The tool and experimental data are available at https://missouri.box.com/v/LorDG.

  4. Phylogeny and biogeography of highly diverged freshwater fish species (Leuciscinae, Cyprinidae, Teleostei) inferred from mitochondrial genome analysis.

    PubMed

    Imoto, Junichi M; Saitoh, Kenji; Sasaki, Takeshi; Yonezawa, Takahiro; Adachi, Jun; Kartavtsev, Yuri P; Miya, Masaki; Nishida, Mutsumi; Hanzawa, Naoto

    2013-02-10

    The distribution of freshwater taxa is a good biogeographic model to study pattern and process of vicariance and dispersal. The subfamily Leuciscinae (Cyprinidae, Teleostei) consists of many species distributed widely in Eurasia and North America. Leuciscinae have been divided into two phyletic groups, leuciscin and phoxinin. The phylogenetic relationships between major clades within the subfamily are poorly understood, largely because of the overwhelming diversity of the group. The origin of the Far Eastern phoxinin is an interesting question regarding the evolutionary history of Leuciscinae. Here we present phylogenetic analysis of 31 species of Leuciscinae and outgroups based on complete mitochondrial genome sequences to clarify the phylogenetic relationships and to infer the evolutionary history of the subfamily. Phylogenetic analysis suggests that the Far Eastern phoxinin species comprised the monophyletic clades Tribolodon, Pseudaspius, Oreoleuciscus and Far Eastern Phoxinus. The Far Eastern phoxinin clade was independent of other Leuciscinae lineages and was closer to North American phoxinins than European leuciscins. All of our analyses also suggested that leuciscins and phoxinins each constituted monophyletic groups. Divergence time estimation suggested that Leuciscinae species diverged from outgroups such as Tincinae 83.3 million years ago (Mya) in the Late Cretaceous and that leuciscin and phoxinin shared a common ancestor 70.7 Mya. Radiation of Leuciscinae lineages occurred during the Late Cretaceous to Paleocene. This period also witnessed the radiation of tetrapods. Reconstruction of ancestral areas indicates Leuciscinae species originated within Europe. Leuciscin species evolved in Europe and the ancestor of phoxinin was distributed in North America. The Far Eastern phoxinins would have dispersed from North America to the Far East across the Beringia land bridge. The present study suggests important roles for the continental rearrangements during the

  5. Inferring a District-Based Hierarchical Structure of Social Contacts from Census Data

    PubMed Central

    Yu, Zhiwen; Liu, Jiming; Zhu, Xianjun

    2015-01-01

    Researchers have recently paid attention to social contact patterns among individuals due to their useful applications in such areas as epidemic evaluation and control, public health decisions, chronic disease research and social network research. Although some studies have estimated social contact patterns from social networks and surveys, few have considered how to infer the hierarchical structure of social contacts directly from census data. In this paper, we focus on inferring an individual’s social contact patterns from detailed census data, and generate various types of social contact patterns such as hierarchical-district-structure-based, cross-district and age-district-based patterns. We evaluate newly generated contact patterns derived from detailed 2011 Hong Kong census data by incorporating them into a model and simulation of the 2009 Hong Kong H1N1 epidemic. We then compare the newly generated social contact patterns with the mixing patterns that are often used in the literature, and draw the following conclusions. First, the generation of social contact patterns based on a hierarchical district structure allows for simulations at different district levels. Second, the newly generated social contact patterns reflect individuals’ social contacts. Third, the newly generated social contact patterns improve the accuracy of the SEIR-based epidemic model. PMID:25679787
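
    How a district-level contact pattern enters an SEIR model can be sketched as follows. This is a generic discrete-time SEIR model with a contact matrix between districts, not the model used in the paper; the two districts, the matrix entries and the rate parameters are all hypothetical.

```python
def seir_step(state, contact, beta, sigma, gamma, dt=1.0):
    """Advance every district by one time step.

    state:   list of [S, E, I, R] per district
    contact: contact[k][j] is the relative contact rate of district k with j
    beta, sigma, gamma: transmission, incubation and recovery rates
    """
    pops = [sum(group) for group in state]
    new_state = []
    for k, (s, e, inf, r) in enumerate(state):
        # Force of infection on district k, summed over all districts.
        force = beta * sum(
            contact[k][j] * state[j][2] / pops[j] for j in range(len(state))
        )
        new_e = force * s * dt
        new_i = sigma * e * dt
        new_r = gamma * inf * dt
        new_state.append([s - new_e, e + new_e - new_i, inf + new_i - new_r, r + new_r])
    return new_state


def simulate(state, contact, beta, sigma, gamma, steps):
    """Run seir_step repeatedly and return the final state."""
    for _ in range(steps):
        state = seir_step(state, contact, beta, sigma, gamma)
    return state


# Hypothetical setup: infection seeded in district 0 only.
start = [[990.0, 0.0, 10.0, 0.0], [1000.0, 0.0, 0.0, 0.0]]
mixed = simulate(start, [[0.9, 0.1], [0.1, 0.9]], 0.5, 0.25, 0.2, steps=50)
isolated = simulate(start, [[1.0, 0.0], [0.0, 1.0]], 0.5, 0.25, 0.2, steps=50)
```

    With cross-district contact the epidemic reaches district 1, while with a diagonal matrix it stays confined to district 0, so the choice of contact pattern directly shapes the simulated spread.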

  6. PHAISTOS: a framework for Markov chain Monte Carlo simulation and inference of protein structure.

    PubMed

    Boomsma, Wouter; Frellsen, Jes; Harder, Tim; Bottaro, Sandro; Johansson, Kristoffer E; Tian, Pengfei; Stovgaard, Kasper; Andreetta, Christian; Olsson, Simon; Valentin, Jan B; Antonov, Lubomir D; Christensen, Anders S; Borg, Mikael; Jensen, Jan H; Lindorff-Larsen, Kresten; Ferkinghoff-Borg, Jesper; Hamelryck, Thomas

    2013-07-15

    We present a new software framework for Markov chain Monte Carlo sampling for simulation, prediction, and inference of protein structure. The software package contains implementations of recent advances in Monte Carlo methodology, such as efficient local updates and sampling from probabilistic models of local protein structure. These models form a probabilistic alternative to the widely used fragment and rotamer libraries. Combined with an easily extendible software architecture, this makes PHAISTOS well suited for Bayesian inference of protein structure from sequence and/or experimental data. Currently, two force-fields are available within the framework: PROFASI and OPLS-AA/L, the latter including the generalized Born surface area solvent model. A flexible command-line and configuration-file interface allows users quickly to set up simulations with the desired configuration. PHAISTOS is released under the GNU General Public License v3.0. Source code and documentation are freely available from http://phaistos.sourceforge.net. The software is implemented in C++ and has been tested on Linux and OSX platforms.
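
    For readers unfamiliar with Markov chain Monte Carlo sampling, the following Python sketch shows a generic random-walk Metropolis sampler on a one-dimensional target. It does not reproduce PHAISTOS, its move sets, or its interface; the function name `metropolis`, the Gaussian proposal, and the toy target density are all assumptions made for illustration.

```python
import math
import random


def metropolis(log_density, x0, step, n_samples, seed=0):
    """Random-walk Metropolis sampler for a 1-D unnormalized log density.

    Returns (samples, acceptance_rate). The Gaussian proposal is symmetric,
    so the acceptance ratio reduces to p(proposal) / p(current).
    """
    rng = random.Random(seed)
    x = x0
    logp = log_density(x)
    samples = []
    accepted = 0
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        logp_new = log_density(proposal)
        # 1 - random() lies in (0, 1], which keeps log() well defined.
        if math.log(1.0 - rng.random()) < logp_new - logp:
            x, logp = proposal, logp_new
            accepted += 1
        samples.append(x)
    return samples, accepted / n_samples


if __name__ == "__main__":
    # Toy target: a standard normal, given up to an additive constant.
    draws, rate = metropolis(lambda x: -0.5 * x * x, 0.0, 1.0, 20000, seed=1)
    mean = sum(draws) / len(draws)
    print("acceptance rate:", rate, "sample mean:", mean)
```

    In a structure-inference setting the state would be a molecular conformation and the log density would come from a force field or probabilistic model, but the accept/reject step has the same form.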

  7. Inferring a district-based hierarchical structure of social contacts from census data.

    PubMed

    Yu, Z; Liu, J; Zhu, X

    2015-01-01

    Researchers have recently paid attention to social contact patterns among individuals due to their useful applications in such areas as epidemic evaluation and control, public health decisions, chronic disease research and social network research. Although some studies have estimated social contact patterns from social networks and surveys, few have considered how to infer the hierarchical structure of social contacts directly from census data. In this paper, we focus on inferring an individual's social contact patterns from detailed census data, and generate various types of social contact patterns such as hierarchical-district-structure-based, cross-district and age-district-based patterns. We evaluate newly generated contact patterns derived from detailed 2011 Hong Kong census data by incorporating them into a model and simulation of the 2009 Hong Kong H1N1 epidemic. We then compare the newly generated social contact patterns with the mixing patterns that are often used in the literature, and draw the following conclusions. First, the generation of social contact patterns based on a hierarchical district structure allows for simulations at different district levels. Second, the newly generated social contact patterns reflect individuals' social contacts. Third, the newly generated social contact patterns improve the accuracy of the SEIR-based epidemic model.

  8. Hebbian Wiring Plasticity Generates Efficient Network Structures for Robust Inference with Synaptic Weight Plasticity

    PubMed Central

    Hiratani, Naoki; Fukai, Tomoki

    2016-01-01

    In the adult mammalian cortex, a small fraction of spines are created and eliminated every day, and the resultant synaptic connection structure is highly nonrandom, even in local circuits. However, it remains unknown whether a particular synaptic connection structure is functionally advantageous in local circuits, and why creation and elimination of synaptic connections is necessary in addition to rich synaptic weight plasticity. To answer these questions, we studied an inference task model through theoretical and numerical analyses. We demonstrate that a robustly beneficial network structure naturally emerges by combining Hebbian-type synaptic weight plasticity and wiring plasticity. Especially in a sparsely connected network, wiring plasticity achieves reliable computation by enabling efficient information transmission. Furthermore, the proposed rule reproduces the experimentally observed correlation between spine dynamics and task performance. PMID:27303271

  9. The mutate-and-map protocol for inferring base pairs in structured RNA.

    PubMed

    Cordero, Pablo; Kladwang, Wipapat; VanLang, Christopher C; Das, Rhiju

    2014-01-01

    Chemical mapping is a widespread technique for structural analysis of nucleic acids in which a molecule's reactivity to different probes is quantified at single nucleotide resolution and used to constrain structural modeling. This experimental framework has been extensively revisited in the past decade with new strategies for high-throughput readouts, chemical modification, and rapid data analysis. Recently, we have coupled the technique to high-throughput mutagenesis. Point mutations of a base paired nucleotide can lead to exposure of not only that nucleotide but also its interaction partner. Systematically carrying out the mutation and mapping for the entire system gives an experimental approximation of the molecule's "contact map." Here, we give our in-house protocol for this "mutate-and-map" (M2) strategy, based on 96-well capillary electrophoresis, and we provide practical tips on interpreting the data to infer nucleic acid structure.

  10. The Mutate-and-Map Protocol for Inferring Base Pairs in Structured RNA

    PubMed Central

    VanLang, Christopher C.; Das, Rhiju

    2014-01-01

    Chemical mapping is a widespread technique for structural analysis of nucleic acids in which a molecule’s reactivity to different probes is quantified at single nucleotide resolution and used to constrain structural modeling. This experimental framework has been extensively revisited in the past decade with new strategies for high-throughput readouts, chemical modification, and rapid data analysis. Recently, we have coupled the technique to high-throughput mutagenesis. Point mutations of a base paired nucleotide can lead to exposure of not only that nucleotide but also its interaction partner. Systematically carrying out the mutation and mapping for the entire system gives an experimental approximation of the molecule’s “contact map.” Here, we give our in-house protocol for this “mutate-and-map” (M2) strategy, based on 96-well capillary electrophoresis, and we provide practical tips on interpreting the data to infer nucleic acid structure. PMID:24136598

  11. Effects of vegetation canopy structure on remotely sensed canopy temperatures. [inferring plant water stress and yield]

    NASA Technical Reports Server (NTRS)

    Kimes, D. S.

    1979-01-01

    The effects of vegetation canopy structure on thermal infrared sensor response must be understood before vegetation surface temperatures of canopies with low percent ground cover can be accurately inferred. The response of a sensor is a function of vegetation geometric structure, the vertical surface temperature distribution of the canopy components, and sensor view angle. Large deviations between the nadir sensor effective radiant temperature (ERT) and vegetation ERT for a soybean canopy were observed throughout the growing season. The nadir sensor ERT of a soybean canopy with 35 percent ground cover deviated from the vegetation ERT by as much as 11 C during the mid-day. These deviations were quantitatively explained as a function of canopy structure and soil temperature. Remote sensing techniques which determine the vegetation canopy temperature(s) from the sensor response need to be studied.

  12. Inferring the Clonal Structure of Viral Populations from Time Series Sequencing

    PubMed Central

    Chedom, Donatien F.; Murcia, Pablo R.; Greenman, Chris D.

    2015-01-01

    RNA virus populations will undergo processes of mutation and selection resulting in a mixed population of viral particles. High throughput sequencing of a viral population subsequently contains a mixed signal of the underlying clones. We would like to identify the underlying evolutionary structures. We utilize two sources of information to attempt this: within-segment linkage information and mutation prevalence. We demonstrate that clone haplotypes, their prevalence, and maximum parsimony reticulate evolutionary structures can be identified, although the solutions may not be unique, even for complete sets of information. This is applied to a chain of influenza infection, where we infer evolutionary structures, including reassortment, and demonstrate some of the difficulties of interpretation that arise from deep sequencing due to artifacts such as template switching during PCR amplification. PMID:26571026

  13. Structural and Operational Complexity of the Geobacter Sulfurreducens Genome

    SciTech Connect

    Qiu, Yu; Cho, Byung-Kwan; Park, Young S.; Lovley, Derek R.; Palsson, Bernhard O.; Zengler, Karsten

    2010-06-30

    Prokaryotic genomes can be annotated based on their structural, operational, and functional properties. These annotations provide the pivotal scaffold for understanding cellular functions on a genome-scale, such as metabolism and transcriptional regulation. Here, we describe a systems approach to simultaneously determine the structural and operational annotation of the Geobacter sulfurreducens genome. Integration of proteomics, transcriptomics, RNA polymerase, and sigma factor-binding information with deep-sequencing-based analysis of primary 5′-end transcripts allowed for a most precise annotation. The structural annotation is comprised of numerous previously undetected genes, noncoding RNAs, prevalent leaderless mRNA transcripts, and antisense transcripts. When compared with other prokaryotes, we found that the number of antisense transcripts reversely correlated with genome size. The operational annotation consists of 1453 operons, 22% of which have multiple transcription start sites that use different RNA polymerase holoenzymes. Several operons with multiple transcription start sites encoded genes with essential functions, giving insight into the regulatory complexity of the genome. The experimentally determined structural and operational annotations can be combined with functional annotation, yielding a new three-level annotation that greatly expands our understanding of prokaryotic genomes.

  14. Structural and operational complexity of the Geobacter sulfurreducens genome

    PubMed Central

    Qiu, Yu; Cho, Byung-Kwan; Park, Young Seoub; Lovley, Derek; Palsson, Bernhard Ø.; Zengler, Karsten

    2010-01-01

    Prokaryotic genomes can be annotated based on their structural, operational, and functional properties. These annotations provide the pivotal scaffold for understanding cellular functions on a genome-scale, such as metabolism and transcriptional regulation. Here, we describe a systems approach to simultaneously determine the structural and operational annotation of the Geobacter sulfurreducens genome. Integration of proteomics, transcriptomics, RNA polymerase, and sigma factor-binding information with deep-sequencing-based analysis of primary 5′-end transcripts allowed for a most precise annotation. The structural annotation is comprised of numerous previously undetected genes, noncoding RNAs, prevalent leaderless mRNA transcripts, and antisense transcripts. When compared with other prokaryotes, we found that the number of antisense transcripts reversely correlated with genome size. The operational annotation consists of 1453 operons, 22% of which have multiple transcription start sites that use different RNA polymerase holoenzymes. Several operons with multiple transcription start sites encoded genes with essential functions, giving insight into the regulatory complexity of the genome. The experimentally determined structural and operational annotations can be combined with functional annotation, yielding a new three-level annotation that greatly expands our understanding of prokaryotic genomes. PMID:20592237

  15. Genome-wide patterns of population structure and admixture in West Africans and African Americans.

    PubMed

    Bryc, Katarzyna; Auton, Adam; Nelson, Matthew R; Oksenberg, Jorge R; Hauser, Stephen L; Williams, Scott; Froment, Alain; Bodo, Jean-Marie; Wambebe, Charles; Tishkoff, Sarah A; Bustamante, Carlos D

    2010-01-12

    Quantifying patterns of population structure in Africans and African Americans illuminates the history of human populations and is critical for undertaking medical genomic studies on a global scale. To obtain a fine-scale genome-wide perspective of ancestry, we analyze Affymetrix GeneChip 500K genotype data from African Americans (n = 365) and individuals with ancestry from West Africa (n = 203 from 12 populations) and Europe (n = 400 from 42 countries). We find that population structure within the West African sample reflects primarily language and secondarily geographical distance, echoing the Bantu expansion. Among African Americans, analysis of genomic admixture by a principal component-based approach indicates that the median proportion of European ancestry is 18.5% (25th-75th percentiles: 11.6-27.7%), with very large variation among individuals. In the African-American sample as a whole, few autosomal regions showed exceptionally high or low mean African ancestry, but the X chromosome showed elevated levels of African ancestry, consistent with a sex-biased pattern of gene flow with an excess of European male and African female ancestry. We also find that genomic profiles of individual African Americans afford personalized ancestry reconstructions differentiating ancient vs. recent European and African ancestry. Finally, patterns of genetic similarity among inferred African segments of African-American genomes and genomes of contemporary African populations included in this study suggest African ancestry is most similar to non-Bantu Niger-Kordofanian-speaking populations, consistent with historical documents of the African Diaspora and trans-Atlantic slave trade.

  16. Statistical inference of seabed sound-speed structure in the Gulf of Oman Basin.

    PubMed

    Sagers, Jason D; Knobles, David P

    2014-06-01

    Addressed is the statistical inference of the sound-speed depth profile of a thick soft seabed from broadband sound propagation data recorded in the Gulf of Oman Basin in 1977. The acoustic data are in the form of time series signals recorded on a sparse vertical line array and generated by explosive sources deployed along a 280 km track. The acoustic data offer a unique opportunity to study a deep-water bottom-limited thickly sedimented environment because of the large number of time series measurements, very low seabed attenuation, and auxiliary measurements. A maximum entropy method is employed to obtain a conditional posterior probability distribution (PPD) for the sound-speed ratio and the near-surface sound-speed gradient. The multiple data samples allow for a determination of the average error constraint value required to uniquely specify the PPD for each data sample. Two complicating features of the statistical inference study are addressed: (1) the need to develop an error function that can both utilize the measured multipath arrival structure and mitigate the effects of data errors and (2) the effect of small bathymetric slopes on the structure of the bottom interacting arrivals.

  17. Fully Bayesian inference for structural MRI: application to segmentation and statistical analysis of T2-hypointensities.

    PubMed

    Schmidt, Paul; Schmid, Volker J; Gaser, Christian; Buck, Dorothea; Bührlen, Susanne; Förschler, Annette; Mühlau, Mark

    2013-01-01

    Aiming at iron-related T2-hypointensity, which is related to normal aging and neurodegenerative processes, we here present two practicable approaches, based on Bayesian inference, for preprocessing and statistical analysis of a complex set of structural MRI data. In particular, Markov Chain Monte Carlo methods were used to simulate posterior distributions. First, we rendered a segmentation algorithm that uses outlier detection based on model checking techniques within a Bayesian mixture model. Second, we rendered an analytical tool comprising a Bayesian regression model with smoothness priors (in the form of Gaussian Markov random fields) mitigating the necessity to smooth data prior to statistical analysis. For validation, we used simulated data and MRI data of 27 healthy controls (age: [Formula: see text]; range, [Formula: see text]). We first observed robust segmentation of both simulated T2-hypointensities and gray-matter regions known to be T2-hypointense. Second, simulated data and images of segmented T2-hypointensity were analyzed. We found not only robust identification of simulated effects but also a biologically plausible age-related increase of T2-hypointensity primarily within the dentate nucleus but also within the globus pallidus, substantia nigra, and red nucleus. Our results indicate that fully Bayesian inference can successfully be applied for preprocessing and statistical analysis of structural MRI data.

  18. Structural Information Inference from Lanthanoid Complexing Systems: Photoluminescence Studies on Isolated Ions

    NASA Astrophysics Data System (ADS)

    Greisch, Jean Francois; Harding, Michael E.; Chmela, Jiri; Klopper, Willem M.; Schooss, Detlef; Kappes, Manfred M.

    2016-06-01

    The application of lanthanoid complexes ranges from photovoltaics and light-emitting diodes to quantum memories and biological assays. Rationalization of their design requires a thorough understanding of intramolecular processes such as energy transfer, charge transfer, and non-radiative decay involving their subunits. Characterization of the excited states of such complexes considerably benefits from mass spectrometric methods since the associated optical transitions and processes are strongly affected by stoichiometry, symmetry, and overall charge state. We report herein spectroscopic measurements on ensembles of ions trapped in the gas phase and soft-landed in neon matrices. Their interpretation is considerably facilitated by direct comparison with computations. The combination of energy- and time-resolved measurements on isolated species with density functional as well as ligand-field and Franck-Condon computations enables us to infer structural as well as dynamical information about the species studied. The approach is first illustrated for sets of model lanthanoid complexes whose structure and electronic properties are systematically varied via the substitution of one component (lanthanoid or alkali, alkali-earth ion): (i) systematic dependence of ligand-centered phosphorescence on the lanthanoid(III) promotion energy and its impact on sensitization, and (ii) structural changes induced by the substitution of alkali or alkali-earth ions in relation with structures inferred using ion mobility spectroscopy. The temperature dependence of sensitization is briefly discussed. The focus is then shifted to measurements involving europium complexes with doxycycline, an antibiotic of the tetracycline family. Besides discussing the complexes' structural and electronic features, we report on their use to monitor enzymatic processes involving hydrogen peroxide or biologically relevant molecules such as adenosine triphosphate (ATP).

  19. New Inferences of Earth's Mantle Viscosity Structure and Implications for Long-wavelength Structure in the Lower Mantle

    NASA Astrophysics Data System (ADS)

    Rudolph, M. L.; Lekic, V.; Lithgow-Bertelloni, C. R.

    2015-12-01

    The viscosity structure of Earth's deep mantle affects the thermal evolution of Earth, the ascent of mantle plumes, settling of subducted oceanic lithosphere, and the mixing of compositional heterogeneities in the mantle. Modeling the long wavelength non-hydrostatic geoid provides a constraint on the radial viscosity structure of Earth's mantle. We carried out inversions for the radial mantle viscosity structure using a transdimensional, hierarchical Bayesian technique that allows us to obtain solutions without specifying at the outset the number or locations of viscosity changes within the mantle. We obtained a posterior probability distribution of mantle viscosity structures, which allowed us to assess our confidence in our inferences of the viscosity structure. We find robust evidence for an increase in viscosity at 800-1200 km depth, significantly deeper than the mineral phase transformations which define the mantle transition zone. The viscosity increase is coincident in depth with regions where tomographic models image slab stagnation, plume deflection, and changes in large-scale structure, manifested in the mantle radial correlation function for the lowest spherical harmonic degrees. Here, we present new results from 3D, spherical-shell geometry thermal and thermochemical mantle convection simulations with prescribed plate motions based on paleogeographic reconstructions. These simulations employ a range of admissible mantle viscosity structures from our geoid inversions. We find that by including the inferred increase in viscosity at 1000 km depth, we can better reproduce the long wavelength mantle radial correlation function observed in the latest tomographic models GAP-P4 and SEMUCB-WM1. The similarity of the modeled and observed radial correlation functions is sensitive to the choice of lower mantle viscosity and the inclusion of phase changes in the transition zone and the mid-mantle. We will also discuss the effect of these viscosity structures on

  20. Spectral entropy criteria for structural segmentation in genomic DNA sequences

    NASA Astrophysics Data System (ADS)

    Chechetkin, V. R.; Lobzin, V. V.

    2004-07-01

    The spectral entropy is calculated with Fourier structure factors and characterizes the level of structural ordering in a sequence of symbols. It may efficiently be applied to the assessment and reconstruction of the modular structure in genomic DNA sequences. We present the relevant spectral entropy criteria for the local and non-local structural segmentation in DNA sequences. The results are illustrated with the model examples and analysis of intervening exon-intron segments in the protein-coding regions.
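
    To make the idea of a spectral entropy concrete, here is a minimal Python sketch. It assumes a generic definition: the Shannon entropy of the normalized Fourier power spectrum, summed over the four nucleotide indicator sequences. The normalization used in the paper may differ, and the function name `spectral_entropy` and the example sequences are hypothetical. Low values indicate power concentrated in a few harmonics (strong periodic ordering); high values indicate a flat, disordered spectrum.

```python
import cmath
import math


def spectral_entropy(seq, alphabet="ACGT"):
    """Shannon entropy of the normalized Fourier power spectrum of a sequence.

    For each harmonic k = 1 .. N//2, the squared DFT amplitudes of the
    per-symbol indicator sequences are summed, the totals are normalized to
    sum to 1, and the entropy of that distribution is returned (in nats).
    Uses a direct O(N^2) transform, so it is meant for short sequences only.
    """
    n = len(seq)
    power = []
    for k in range(1, n // 2 + 1):
        total = 0.0
        for base in alphabet:
            amplitude = sum(
                cmath.exp(-2j * math.pi * k * m / n)
                for m, symbol in enumerate(seq)
                if symbol == base
            )
            total += abs(amplitude) ** 2
        power.append(total)
    z = sum(power)
    # A spectrum that is zero up to rounding error carries no structure.
    if z < 1e-12 * n * n:
        return 0.0
    return -sum((p / z) * math.log(p / z) for p in power if p > 0.0)


if __name__ == "__main__":
    periodic = "ACGT" * 8
    irregular = "ACGTTGCAAGCTTAGGCTAACCGTATGCATCG"
    print("periodic :", spectral_entropy(periodic))
    print("irregular:", spectral_entropy(irregular))
```

    Applying such a function in a sliding window along a sequence is one plausible way to look for boundaries between segments with different degrees of ordering, though the specific criteria proposed in the paper are not reproduced here.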

  1. Population-based 3D genome structure analysis reveals driving forces in spatial genome organization

    PubMed Central

    Li, Wenyuan; Kalhor, Reza; Dai, Chao; Hao, Shengli; Gong, Ke; Zhou, Yonggang; Li, Haochen; Zhou, Xianghong Jasmine; Le Gros, Mark A.; Larabell, Carolyn A.; Chen, Lin; Alber, Frank

    2016-01-01

    Conformation capture technologies (e.g., Hi-C) chart physical interactions between chromatin regions on a genome-wide scale. However, the structural variability of the genome between cells poses a great challenge to interpreting ensemble-averaged Hi-C data, particularly for long-range and interchromosomal interactions. Here, we present a probabilistic approach for deconvoluting Hi-C data into a model population of distinct diploid 3D genome structures, which facilitates the detection of chromatin interactions likely to co-occur in individual cells. Our approach incorporates the stochastic nature of chromosome conformations and allows a detailed analysis of alternative chromatin structure states. For example, we predict and experimentally confirm the presence of large centromere clusters with distinct chromosome compositions varying between individual cells. The stability of these clusters varies greatly with their chromosome identities. We show that these chromosome-specific clusters can play a key role in the overall chromosome positioning in the nucleus and stabilizing specific chromatin interactions. By explicitly considering genome structural variability, our population-based method provides an important tool for revealing novel insights into the key factors shaping the spatial genome organization. PMID:26951677

  2. Structure of the germline genome of Tetrahymena thermophila and relationship to the massively rearranged somatic genome

    PubMed Central

    Hamilton, Eileen P; Kapusta, Aurélie; Huvos, Piroska E; Bidwell, Shelby L; Zafar, Nikhat; Tang, Haibao; Hadjithomas, Michalis; Krishnakumar, Vivek; Badger, Jonathan H; Caler, Elisabet V; Russ, Carsten; Zeng, Qiandong; Fan, Lin; Levin, Joshua Z; Shea, Terrance; Young, Sarah K; Hegarty, Ryan; Daza, Riza; Gujja, Sharvari; Wortman, Jennifer R; Birren, Bruce W; Nusbaum, Chad; Thomas, Jainy; Carey, Clayton M; Pritham, Ellen J; Feschotte, Cédric; Noto, Tomoko; Mochizuki, Kazufumi; Papazyan, Romeo; Taverna, Sean D; Dear, Paul H; Cassidy-Hanley, Donna M; Xiong, Jie; Miao, Wei; Orias, Eduardo; Coyne, Robert S

    2016-01-01

    The germline genome of the binucleated ciliate Tetrahymena thermophila undergoes programmed chromosome breakage and massive DNA elimination to generate the somatic genome. Here, we present a complete sequence assembly of the germline genome and analyze multiple features of its structure and its relationship to the somatic genome, shedding light on the mechanisms of genome rearrangement as well as the evolutionary history of this remarkable germline/soma differentiation. Our results strengthen the notion that a complex, dynamic, and ongoing interplay between mobile DNA elements and the host genome has shaped Tetrahymena chromosome structure, locally and globally. Non-standard outcomes of rearrangement events, including the generation of short-lived somatic chromosomes and excision of DNA interrupting protein-coding regions, may represent novel forms of developmental gene regulation. We also compare Tetrahymena’s germline/soma differentiation to that of other characterized ciliates, illustrating the wide diversity of adaptations that have occurred within this phylum. DOI: http://dx.doi.org/10.7554/eLife.19090.001 PMID:27892853

  3. Allelic genome structural variations in maize detected by array comparative genome hybridization.

    PubMed

    Beló, André; Beatty, Mary K; Hondred, David; Fengler, Kevin A; Li, Bailin; Rafalski, Antoni

    2010-01-01

    DNA polymorphisms such as insertion/deletions and duplications affecting genome segments larger than 1 kb are known as copy-number variations (CNVs) or structural variations (SVs). They have been recently studied in animals and humans by using array-comparative genome hybridization (aCGH), and have been associated with several human diseases. Their presence and phenotypic effects in plants have not been investigated on a genomic scale, although individual structural variations affecting traits have been described. We used aCGH to investigate the presence of CNVs in maize by comparing the genome of 13 maize inbred lines to B73. Analysis of hybridization signal ratios of 60,472 60-mer oligonucleotide probes between inbreds in relation to their location in the reference genome (B73) allowed us to identify clusters of probes that deviated from the ratio expected for equal copy-numbers. We found CNVs distributed along the maize genome in all chromosome arms. They occur with appreciable frequency in different germplasm subgroups, suggesting ancient origin. Validation of several CNV regions showed both insertion/deletions and copy-number differences. The nature of CNVs detected suggests CNVs might have a considerable impact on plant phenotypes, including disease response and heterosis.

  4. Inferring the structure of latent class models using a genetic algorithm.

    PubMed

    van der Maas, Han L J; Raijmakers, Maartje E J; Visser, Ingmar

    2005-05-01

    Present optimization techniques in latent class analysis apply the expectation maximization algorithm or the Newton-Raphson algorithm for optimizing the parameter values of a prespecified model. These techniques can be used to find maximum likelihood estimates of the parameters, given the specified structure of the model, which is defined by the number of classes and, possibly, fixation and equality constraints. The model structure is usually chosen on theoretical grounds. A large variety of structurally different latent class models can be compared using goodness-of-fit indices of the chi-square family, Akaike's information criterion, the Bayesian information criterion, and various other statistics. However, finding the optimal structure for a given goodness-of-fit index often requires a lengthy search in which all kinds of model structures are tested. Moreover, solutions may depend on the choice of initial values for the parameters. This article presents a new method by which one can simultaneously infer the model structure from the data and optimize the parameter values. The method consists of a genetic algorithm in which any goodness-of-fit index can be used as a fitness criterion. In a number of test cases in which data sets from the literature were used, it is shown that this method provides models that fit equally well as or better than the models suggested in the original articles.
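
    The goodness-of-fit indices named above are simple functions of a model's maximized log-likelihood and its number of free parameters. The Python sketch below computes AIC and BIC using their standard formulas and counts the free parameters of an unconstrained latent class model. It does not implement the genetic algorithm described in the article; the function names and the log-likelihood values in the example are hypothetical and serve only to show how candidate structures could be ranked by a chosen index.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike's information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def n_free_params_lca(n_classes, categories_per_item):
    """Free parameters of an unconstrained latent class model.

    (C - 1) class proportions plus, for each class, (r_j - 1) conditional
    response probabilities per item j with r_j response categories.
    """
    return (n_classes - 1) + n_classes * sum(r - 1 for r in categories_per_item)


if __name__ == "__main__":
    items = [2, 2, 2, 2]   # four binary items
    n_obs = 100
    # Hypothetical maximized log-likelihoods for 1-, 2- and 3-class models.
    fitted = {1: -130.0, 2: -100.0, 3: -97.0}
    scores = {
        c: bic(ll, n_free_params_lca(c, items), n_obs) for c, ll in fitted.items()
    }
    best = min(scores, key=scores.get)
    print("BIC per class count:", scores, "-> preferred:", best)
```

    In a genetic-algorithm search of the kind the abstract describes, an index like this would play the role of the fitness criterion, with lower values treated as fitter.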

  5. Electrical Structure Inferred by 3-D Lightning Mapping Observations During STEPS

    NASA Astrophysics Data System (ADS)

    Hamlin, T.; Krehbiel, P. R.; Zhang, Y.; Thomas, R. J.

    2002-12-01

    The Severe Thunderstorm Electrification and Precipitation Study (STEPS) provided numerous examples of storms which electrified anomalously, developing inverted tripole or quadrupole electrical structures. The storms were often supercells, and on several occasions the lightning activity consisted primarily of IC flashes for substantial periods of time, followed only much later (if at all) by the onset of CG activity. Radar comparisons for the tornadic storm of June 29 and the Bird City storm of June 3 during STEPS indicate that the main positive charge was localized in the precipitation core, but the electrification also had a definite horizontally extensive, multilayer structure extending away from the core. In these storms the upper positive charge region developed rapidly and produced intense lightning activity. The upper positive charge gradually evolved downward in altitude to become the dominant mid-level charge, forming an inverted tripole structure which appears to be stable for long periods of time. By assuming that a given polarity breakdown is moving into regions of opposite polarity charge (with exceptions), the total charge structure can be inferred and mapped based on information gleaned from the individual flashes; this allows use of the LMA data to detail the charge structure of storms. We take this approach to study the evolution of charge structures for storms during STEPS.

  6. Structural Genomics and Drug Discovery for Infectious Diseases

    SciTech Connect

    Anderson, W.F.

    2010-09-03

    The application of structural genomics methods and approaches to proteins from organisms causing infectious diseases is making available the three-dimensional structures of many proteins that are potential drug targets and laying the groundwork for structure-aided drug discovery efforts. There are a number of structural genomics projects with a focus on pathogens that have been initiated worldwide. The Center for Structural Genomics of Infectious Diseases (CSGID) was recently established to apply state-of-the-art high throughput structural biology technologies to the characterization of proteins from the National Institute for Allergy and Infectious Diseases (NIAID) category A-C pathogens and organisms causing emerging or re-emerging infectious diseases. The target selection process emphasizes potential biomedical benefits. Selected proteins include known drug targets and their homologs, essential enzymes, virulence factors and vaccine candidates. The Center also provides a structure determination service for the infectious disease scientific community. The ultimate goal is to generate a library of structures that are available to the scientific community and can serve as a starting point for further research and structure-aided drug discovery for infectious diseases. To achieve this goal, the CSGID will determine protein crystal structures of 400 proteins and protein-ligand complexes using proven, rapid, highly integrated, and cost-effective methods for such determination, primarily by X-ray crystallography. High throughput crystallographic structure determination is greatly aided by frequent, convenient access to high-performance beamlines at third-generation synchrotron X-ray sources.

  7. Structural Determinants and Mechanism of HIV-1 Genome Packaging

    PubMed Central

    Lu, Kun; Heng, Xiao; Summers, Michael F.

    2011-01-01

    Like all retroviruses, the Human Immunodeficiency Virus (HIV) selectively packages two copies of its unspliced RNA genome, both of which are utilized for strand-transfer mediated recombination during reverse transcription – a process that enables rapid evolution under environmental and chemotherapeutic pressures. The viral RNA appears to be selected for packaging as a dimer, and there is evidence that dimerization and packaging are mechanistically coupled. Both processes are mediated by interactions between the nucleocapsid (NC) domains of a small number of assembling viral Gag polyproteins and RNA elements within the 5′-untranslated region (5′-UTR) of the genome. A number of secondary structures have been predicted for regions of the genome that are responsible for packaging, and high-resolution structures have been determined for a few small RNA fragments and protein-RNA complexes. However, major questions remain open regarding the RNA structures, and potentially the structural changes, that are responsible for dimeric genome selection. Here we review efforts that have been made to identify the molecular determinants and mechanism of HIV-1 genome packaging. PMID:21762803

  8. Primary structure of the herpesvirus saimiri genome.

    PubMed Central

    Albrecht, J C; Nicholas, J; Biller, D; Cameron, K R; Biesinger, B; Newman, C; Wittmann, S; Craxton, M A; Coleman, H; Fleckenstein, B

    1992-01-01

    This report describes the complete nucleotide sequence of the genome of herpesvirus saimiri, the prototype of gammaherpesvirus subgroup 2 (rhadinoviruses). The unique low-G + C-content DNA region has 112,930 bp with an average base composition of 34.5% G + C and is flanked by about 35 noncoding high-G + C-content DNA repeats of 1,444 bp (70.8% G + C) in tandem orientation. We identified 76 major open reading frames and a set of seven U-RNA genes for a total of 83 potential genes. The genes are closely arranged, with only a few regions of sizable noncoding sequences. For 60 of the predicted proteins, homologous sequences are found in other herpesviruses. Genes conserved between herpesvirus saimiri and Epstein-Barr virus (gammaherpesvirus subgroup 1) show that their genomes are generally collinear, although conserved gene blocks are separated by unique genes that appear to determine the particular phenotype of these viruses. Several deduced protein sequences of herpesvirus saimiri without counterparts in most of the other sequenced herpesviruses exhibited significant homology with cellular proteins of known function. These include thymidylate synthase, dihydrofolate reductase, complement control proteins, the cell surface antigen CD59, cyclins, and G protein-coupled receptors. Searching for functional protein motifs revealed that the virus may encode a cytosine-specific methylase and a tyrosine-specific protein kinase. Several herpesvirus saimiri genes are potential candidates to cooperate with the gene for saimiri transformation-associated protein of subgroup A (STP-A) in T-lymphocyte growth stimulation. PMID:1321287

  9. Crustal stress and structure at Kīlauea Volcano inferred from seismic anisotropy

    USGS Publications Warehouse

    Johnson, Jessica H.; Swanson, Donald; Roman, Diana C.; Poland, Michael P.; Thelen, Weston A.

    2015-01-01

    Seismic anisotropy, measured through shear wave splitting (SWS) analysis, can be indicative of the state of stress in Earth's crust. Changes in SWS at Kīlauea Volcano, Hawai‘i, associated with the onset of summit eruptive activity in 2008 hint at the potential of the technique for tracking volcanic activity. To use SWS observations as a monitoring tool, however, it is important to understand the cause of seismic anisotropy at the volcano throughout the eruptive cycle. To address this need, we analyzed SWS results from across Kīlauea in combination with macroscopic surface structures (mapped fractures, faults, and fissures) and stress orientations inferred from fault plane solutions. Seismic anisotropy seems to be due to pervasive aligned structures in most regions of the volcano. The upper East and Southwest Rift Zones, however, show a bimodality in stress and SWS, suggesting a stress discontinuity with depth, perhaps related to magma conduits that trend obliquely to the dominant structure. Other areas in and around Kīlauea Caldera display principal stresses of similar magnitudes, indicating that small stress perturbations can rotate the maximum horizontal compressive stress direction by up to 90°. In these locations, static structures generally control SWS, but dynamic conditions due to magmatic activity can override the structural control. Monitoring of SWS may therefore provide important signs of impending volcanism.

  10. Inferring the mesoscale structure of layered, edge-valued, and time-varying networks

    NASA Astrophysics Data System (ADS)

    Peixoto, Tiago P.

    2015-10-01

    Many network systems are composed of interdependent but distinct types of interactions, which cannot be fully understood in isolation. These different types of interactions are often represented as layers, attributes on the edges, or as a time dependence of the network structure. Although they are crucial for a more comprehensive scientific understanding, these representations offer substantial challenges. Namely, it is an open problem how to precisely characterize the large or mesoscale structure of network systems in relation to these additional aspects. Furthermore, the direct incorporation of these features invariably increases the effective dimension of the network description, and hence aggravates the problem of overfitting, i.e., the use of overly complex characterizations that mistake purely random fluctuations for actual structure. In this work, we propose a robust and principled method to tackle these problems, by constructing generative models of modular network structure, incorporating layered, attributed and time-varying properties, as well as a nonparametric Bayesian methodology to infer the parameters from data and select the most appropriate model according to statistical evidence. We show that the method is capable of revealing hidden structure in layered, edge-valued, and time-varying networks, and that the most appropriate level of granularity with respect to the additional dimensions can be reliably identified. We illustrate our approach on a variety of empirical systems, including a social network of physicians, the voting correlations of deputies in the Brazilian national congress, the global airport network, and a proximity network of high-school students.

  11. Leveraging enzyme structure-function relationships for functional inference and experimental design: the structure-function linkage database.

    PubMed

    Pegg, Scott C-H; Brown, Shoshana D; Ojha, Sunil; Seffernick, Jennifer; Meng, Elaine C; Morris, John H; Chang, Patricia J; Huang, Conrad C; Ferrin, Thomas E; Babbitt, Patricia C

    2006-02-28

    The study of mechanistically diverse enzyme superfamilies (collections of enzymes that perform different overall reactions but share both a common fold and a distinct mechanistic step performed by key conserved residues) helps elucidate the structure-function relationships of enzymes. We have developed a resource, the structure-function linkage database (SFLD), to analyze these structure-function relationships. Unique to the SFLD is its hierarchical classification scheme based on linking the specific partial reactions (or other chemical capabilities) that are conserved at the superfamily, subgroup, and family levels with the conserved structural elements that mediate them. We present the results of analyses using the SFLD in correcting misannotations, guiding protein engineering experiments, and elucidating the function of recently solved enzyme structures from the structural genomics initiative. The SFLD is freely accessible at http://sfld.rbvi.ucsf.edu.

  12. The Impact of Structural Genomics: Expectations and Outcomes

    SciTech Connect

    Chandonia, John-Marc; Brenner, Steven E.

    2005-12-21

    Structural Genomics (SG) projects aim to expand our structural knowledge of biological macromolecules, while lowering the average costs of structure determination. We quantitatively analyzed the novelty, cost, and impact of structures solved by SG centers, and contrasted these results with traditional structural biology. The first structure from a protein family is particularly important for revealing the fold and ancient relationships to other proteins. In the last year, approximately half of such structures were solved at an SG center rather than in a traditional laboratory. Furthermore, the cost of solving a structure at the most efficient U.S. center has now dropped to one-quarter the estimated cost of solving a structure by traditional methods. However, top structural biology laboratories are much more efficient than the average, and comparable to SG centers despite working on very challenging structures. Moreover, traditional structural biology papers are cited significantly more often, suggesting greater current impact.

  13. Mitochondrial Genome of Palpitomonas bilix: Derived Genome Structure and Ancestral System for Cytochrome c Maturation

    PubMed Central

    Nishimura, Yuki; Tanifuji, Goro; Kamikawa, Ryoma; Yabuki, Akinori; Hashimoto, Tetsuo; Inagaki, Yuji

    2016-01-01

    Here we report the mitochondrial (mt) genome of Palpitomonas bilix, one of the heterotrophic microeukaryotes related to cryptophytes. The P. bilix mt genome was found to be a linear molecule composed of a “single copy region” (∼16 kb) and repeat regions (∼30 kb) arranged in an inverse manner at both ends of the genome. Linear mt genomes with large inverted repeats are known for three distantly related eukaryotes (including P. bilix), suggesting that this particular mt genome structure has emerged at least three times in the eukaryotic tree of life. The P. bilix mt genome contains 47 protein-coding genes including ccmA, ccmB, ccmC, and ccmF, which encode protein subunits involved in the system for cytochrome c maturation inherited from a bacterium (System I). We present data indicating that the phylogenetic relatives of P. bilix, namely, cryptophytes, goniomonads, and kathablepharids, utilize an alternative system for cytochrome c maturation, which has most likely emerged during the evolution of eukaryotes (System III). To explain the distribution of Systems I and III in P. bilix and its phylogenetic relatives, two scenarios are possible: (i) System I was replaced by System III on the branch leading to the common ancestor of cryptophytes, goniomonads, and kathablepharids, and (ii) the two systems co-existed in their common ancestor and were lost differentially among the four descendants. PMID:27604877

  14. Geographic population structure analysis of worldwide human populations infers their biogeographical origins.

    PubMed

    Elhaik, Eran; Tatarinova, Tatiana; Chebotarev, Dmitri; Piras, Ignazio S; Maria Calò, Carla; De Montis, Antonella; Atzori, Manuela; Marini, Monica; Tofanelli, Sergio; Francalacci, Paolo; Pagani, Luca; Tyler-Smith, Chris; Xue, Yali; Cucca, Francesco; Schurr, Theodore G; Gaieski, Jill B; Melendez, Carlalynne; Vilar, Miguel G; Owings, Amanda C; Gómez, Rocío; Fujita, Ricardo; Santos, Fabrício R; Comas, David; Balanovsky, Oleg; Balanovska, Elena; Zalloua, Pierre; Soodyall, Himla; Pitchappan, Ramasamy; Ganeshprasad, Arunkumar; Hammer, Michael; Matisoo-Smith, Lisa; Wells, R Spencer

    2014-04-29

    The search for a method that utilizes biological information to predict humans' place of origin has occupied scientists for millennia. Over the past four decades, scientists have employed genetic data in an effort to achieve this goal but with limited success. While biogeographical algorithms using next-generation sequencing data have achieved an accuracy of 700 km in Europe, they were inaccurate elsewhere. Here we describe the Geographic Population Structure (GPS) algorithm and demonstrate its accuracy with three data sets using 40,000-130,000 SNPs. GPS placed 83% of worldwide individuals in their country of origin. Applied to over 200 Sardinian villagers, GPS placed a quarter of them in their villages and most of the rest within 50 km of their villages. GPS's accuracy and power to infer the biogeography of worldwide individuals down to their country or, in some cases, village of origin underscore the promise of admixture-based methods for biogeography and have ramifications for genetic ancestry testing.

  15. Genomic Alteration in Head and Neck Squamous Cell Carcinoma (HNSCC) Cell Lines Inferred from Karyotyping, Molecular Cytogenetics, and Array Comparative Genomic Hybridization

    PubMed Central

    Rerkarmnuaychoke, Budsaba; Suntronpong, Aorarat; Fu, Beiyuan; Bodhisuwan, Winai; Peyachoknagul, Surin; Yang, Fengtang; Koontongkaew, Sittichai; Srikulnath, Kornsorn

    2016-01-01

    Genomic alteration in head and neck squamous cell carcinoma (HNSCC) was studied in two cell line pairs (HN30-HN31 and HN4-HN12) using conventional C-banding, multiplex fluorescence in situ hybridization (M-FISH), and array comparative genomic hybridization (array CGH). HN30 and HN4 were derived from primary lesions in the pharynx and base of tongue, respectively, and HN31 and HN12 were derived from lymph-node metastatic lesions belonging to the same patients. Gains of chromosomes 1, 7, and 11 were shared in almost all cell lines. Hierarchical clustering revealed that HN31 was closely related to HN4, with which it shared eight chromosome alteration cases. Large C-positive heterochromatins were found in the centromeric region of chromosome 9 in HN31 and HN4, which suggests complex structural amplification of the repetitive sequence. Array CGH revealed amplification of 7p22.3p11.2, 8q11.23q12.1, and 14q32.33, regions containing genes involved in tumorigenesis and inflammation, in all cell lines. The amplification of 2p21 (SIX3), 11p15.5 (H19), and 11q21q22.3 (MAML2, PGR, TRPC6, and MMP family) regions, and deletion of 9p23 (PTPRD) and 16q23.1 (WWOX) regions were identified in HN31 and HN12. Interestingly, partial loss of the PTPRD (9p23) and WWOX (16q23.1) genes was identified in HN31 and HN12; PTPRD expression tended to be down-regulated, and no expression of the WWOX gene was detectable. This suggests that the scarcity of PTPRD and WWOX genes might have played an important role in progression of HNSCC, and that these genes could be considered as targets for cancer therapy or as biomarkers in molecular pathology. PMID:27501229

  16. Structural Genomics of Bacterial Virulence Factors

    DTIC Science & Technology

    2004-05-01

    drug design. In this first year of funding we have focused our attention on plasmid annotation, target selection, protein expression, purification and crystallization of proteins encoded by the Bacillus anthracis pXO1 plasmid. We have cloned and expressed a total of 35 new proteins, and structural analysis of several of these is underway. Currently, 3 new crystal structures are essentially complete, and 6 crystal structures of anthrax Lethal Factor in complex with small molecule inhibitors provided by our collaborators have been determined, and lodged in the public data

  17. Phylogeny of Oedogoniales, Chaetophorales and Chaetopeltidales (Chlorophyceae): inferences from sequence-structure analysis of ITS2

    PubMed Central

    Buchheim, Mark A.; Sutherland, Danica M.; Schleicher, Tina; Förster, Frank; Wolf, Matthias

    2012-01-01

    Background and Aims: The green algal class Chlorophyceae comprises five orders (Chlamydomonadales, Sphaeropleales, Chaetophorales, Chaetopeltidales and Oedogoniales). Attempts to resolve the relationships among these groups have met with limited success. Studies of single genes (18S rRNA, 26S rRNA, rbcL or atpB) have largely failed to unambiguously resolve the relative positions of Oedogoniales, Chaetophorales and Chaetopeltidales (the OCC taxa). In contrast, recent genomics analyses of plastid data from OCC exemplars provided a robust phylogenetic analysis that supports a monophyletic OCC alliance. Methods: An ITS2 data set was assembled to independently test the OCC hypothesis and to evaluate the performance of these data in assessing green algal phylogeny at the ordinal or class level. Sequence-structure analysis designed for use with ITS2 data was employed for phylogenetic reconstruction. Key Results: Results of this study yielded trees that were, in general, topologically congruent with the results from the genomic analyses, including support for the monophyly of the OCC alliance. Conclusions: Not all nodes from the ITS2 analyses exhibited robust support, but our investigation demonstrates that sequence-structure analyses of ITS2 provide a taxon-rich means of testing phylogenetic hypotheses at high taxonomic levels. Thus, the ITS2 data, in the context of sequence-structure analysis, provide an economical supplement or alternative to the single-marker approaches used in green algal phylogeny. PMID:22028463

  18. Life-history traits of the Miocene Hipparion concudense (Spain) inferred from bone histological structure.

    PubMed

    Martinez-Maza, Cayetana; Alberdi, Maria Teresa; Nieto-Diaz, Manuel; Prado, José Luis

    2014-01-01

    Histological analyses of fossil bones have provided clues on the growth patterns and life history traits of several extinct vertebrates that would be unavailable for classical morphological studies. We analyzed the bone histology of Hipparion to infer features of its life history traits and growth pattern. Microscope analysis of thin sections of a large sample of humeri, femora, tibiae and metapodials of Hipparion concudense from the upper Miocene site of Los Valles de Fuentidueña (Segovia, Spain) has shown that the number of growth marks is similar among the different limb bones, suggesting that equivalent skeletochronological inferences for this Hipparion population might be achieved by means of any of the elements studied. Considering their abundance, we conducted a skeletochronological study based on the large sample of third metapodials from Los Valles de Fuentidueña together with another large sample from the Upper Miocene locality of Concud (Teruel, Spain). The data obtained enabled us to distinguish four age groups in both samples and to determine that Hipparion concudense tended to reach skeletal maturity during its third year of life. Integration of bone microstructure and skeletochronological data allowed us to identify ontogenetic changes in bone structure and growth rate and to distinguish three histologic ontogenetic stages corresponding to immature, subadult and adult individuals. Data on secondary osteon density revealed an increase in bone remodeling throughout the ontogenetic stages and a lesser degree thereof in the Concud population, which indicates different biomechanical stresses in the two populations, likely due to environmental differences. Several individuals showed atypical growth patterns in the Concud sample, which may also reflect environmental differences between the two localities. Finally, classification of the specimens' age within groups enabled us to characterize the age structure of both samples, which is typical of

  19. Life-History Traits of the Miocene Hipparion concudense (Spain) Inferred from Bone Histological Structure

    PubMed Central

    Martinez-Maza, Cayetana; Alberdi, Maria Teresa; Nieto-Diaz, Manuel; Prado, José Luis

    2014-01-01

    Histological analyses of fossil bones have provided clues on the growth patterns and life history traits of several extinct vertebrates that would be unavailable for classical morphological studies. We analyzed the bone histology of Hipparion to infer features of its life history traits and growth pattern. Microscope analysis of thin sections of a large sample of humeri, femora, tibiae and metapodials of Hipparion concudense from the upper Miocene site of Los Valles de Fuentidueña (Segovia, Spain) has shown that the number of growth marks is similar among the different limb bones, suggesting that equivalent skeletochronological inferences for this Hipparion population might be achieved by means of any of the elements studied. Considering their abundance, we conducted a skeletochronological study based on the large sample of third metapodials from Los Valles de Fuentidueña together with another large sample from the Upper Miocene locality of Concud (Teruel, Spain). The data obtained enabled us to distinguish four age groups in both samples and to determine that Hipparion concudense tended to reach skeletal maturity during its third year of life. Integration of bone microstructure and skeletochronological data allowed us to identify ontogenetic changes in bone structure and growth rate and to distinguish three histologic ontogenetic stages corresponding to immature, subadult and adult individuals. Data on secondary osteon density revealed an increase in bone remodeling throughout the ontogenetic stages and a lesser degree thereof in the Concud population, which indicates different biomechanical stresses in the two populations, likely due to environmental differences. Several individuals showed atypical growth patterns in the Concud sample, which may also reflect environmental differences between the two localities. Finally, classification of the specimens’ age within groups enabled us to characterize the age structure of both samples, which is typical of

  20. Benefits of Structural Genomics for Drug Discovery Research

    PubMed Central

    Grabowski, Marek; Chruszcz, Maksymilian; Zimmerman, Matthew D.; Kirillova, Olga; Minor, Wladek

    2010-01-01

    While three-dimensional structures have long been used to search for new drug targets, only a fraction of new drugs coming to the market has been developed with the use of a structure-based drug discovery approach. However, recent years have brought not only an avalanche of new macromolecular structures, but also significant advances in protein structure determination methodology that are only now making their way into structure-based drug discovery. In this paper, we review recent developments resulting from the Structural Genomics (SG) programs, focusing on the methods and results most likely to improve our understanding of the molecular foundation of human diseases. SG programs have been around for almost a decade, and in that time, have contributed a significant part of the structural coverage of both the genomes of pathogens causing infectious diseases and structurally uncharacterized biological processes in general. Perhaps most importantly, SG programs have developed new methodology at all steps of the structure determination process, not only to determine new structures highly efficiently, but also to screen protein/ligand interactions. We describe the methodologies, experience and technologies developed by SG, which range from improvements to cloning protocols to improved procedures for crystallographic structure solution that may be applied in “traditional” structural biology laboratories, particularly those performing drug discovery. We also discuss the conditions that must be met to convert the present high-throughput structure determination pipeline into a high-output structure-based drug discovery system. PMID:19594422

  1. Benefits of Structural Genomics for Drug Discovery Research

    SciTech Connect

    Grabowski, M.; Chruszcz, M; Zimmerman, M; Kirillova, O; Minor, W

    2009-01-01

    While three-dimensional structures have long been used to search for new drug targets, only a fraction of new drugs coming to the market has been developed with the use of a structure-based drug discovery approach. However, recent years have brought not only an avalanche of new macromolecular structures, but also significant advances in protein structure determination methodology that are only now making their way into structure-based drug discovery. In this paper, we review recent developments resulting from the Structural Genomics (SG) programs, focusing on the methods and results most likely to improve our understanding of the molecular foundation of human diseases. SG programs have been around for almost a decade, and in that time, have contributed a significant part of the structural coverage of both the genomes of pathogens causing infectious diseases and structurally uncharacterized biological processes in general. Perhaps most importantly, SG programs have developed new methodology at all steps of the structure determination process, not only to determine new structures highly efficiently, but also to screen protein/ligand interactions. We describe the methodologies, experience and technologies developed by SG, which range from improvements to cloning protocols to improved procedures for crystallographic structure solution that may be applied in 'traditional' structural biology laboratories, particularly those performing drug discovery. We also discuss the conditions that must be met to convert the present high-throughput structure determination pipeline into a high-output structure-based drug discovery system.

  2. The use of structural modelling to infer structure and function in biocontrol agents.

    PubMed

    Berry, Colin; Board, Jason

    2017-01-01

    Homology modelling can provide important insights into the structures of proteins when a related protein structure has already been solved. However, for many proteins, including a number of invertebrate-active toxins and accessory proteins, no such templates exist. In these cases, techniques of ab initio, template-independent modelling can be employed to generate models that may give insight into structure and function. In this overview, examples of both the problems and the potential benefits of ab initio techniques are illustrated. Consistent modelling results may indicate useful approximations to actual protein structures and can thus allow the generation of hypotheses regarding activity that can be tested experimentally.

  3. Phylogenetic inference and SSR characterization of tropical woody bamboos tribe Bambuseae (Poaceae: Bambusoideae) based on complete plastid genome sequences.

    PubMed

    Vieira, Leila do Nascimento; Dos Anjos, Karina Goulart; Faoro, Helisson; Fraga, Hugo Pacheco de Freitas; Greco, Thiago Machado; Pedrosa, Fábio de Oliveira; de Souza, Emanuel Maltempi; Rogalski, Marcelo; de Souza, Robson Francisco; Guerra, Miguel Pedro

    2016-05-01

    Complete plastome sequencing is an efficient option for increasing phylogenetic resolution and supporting evolutionary studies, and may greatly facilitate the use of plastid DNA markers in plant population genetic studies. Merostachys and Guadua stand out as the most common bamboos indigenous to Brazil and those with the highest potential for utilization. Here, we sequenced the complete plastomes of the Brazilian Guadua chacoensis and Merostachys sp. to perform full plastome phylogeny and characterize the occurrence, type, and distribution of SSRs using 20 Bambuseae species. The determined plastome sequences of Merostachys sp. and G. chacoensis are 136,334 and 135,403 bp in size, respectively, with an identical gene content and typical quadripartite structure consisting of a pair of IRs separated by the LSC and SSC regions. The Maximum Likelihood and Bayesian Inference analyses produced phylogenomic trees identical in topology. These trees supported monophyly of the Paleotropical and Neotropical bamboo clades. The Neotropical bamboos segregated into three well-supported lineages, Chusqueinae, Guaduinae, and Arthrostylidiinae, with the last two forming a well-supported sister relationship. Paleotropical bamboos segregated into two well-supported lineages, Hickeliinae and Bambusinae + Melocanninae. We identified 141.8 cpSSRs in Bambuseae plastomes and a lower value (38.15) for plastome coding sequences. Among them, we identified 16 polymorphic SSR loci, with the number of alleles varying from 3 to 10. These 16 polymorphic cpSSR loci in the Bambuseae plastome can be assessed for the intraspecific level of polymorphism, leading to innovative, highly sensitive phylogeographic and population genetics studies for this tribe.

  4. Coevolution of the Organization and Structure of Prokaryotic Genomes.

    PubMed

    Touchon, Marie; Rocha, Eduardo P C

    2016-01-04

    The cytoplasm of prokaryotes contains many molecular machines interacting directly with the chromosome. These vital interactions depend on the chromosome structure, as a molecule, and on the genome organization, as a unit of genetic information. Strong selection for the organization of the genetic elements implicated in these interactions drives replicon ploidy, gene distribution, operon conservation, and the formation of replication-associated traits. The genomes of prokaryotes are also very plastic, with high rates of horizontal gene transfer and gene loss. The evolutionary conflicts between plasticity and organization lead to the formation of regions with high genetic diversity whose impact on chromosome structure is poorly understood. Prokaryotic genomes are remarkable documents of natural history because they carry the imprint of all of these selective and mutational forces. Their study allows a better understanding of molecular mechanisms, their impact on microbial evolution, and how they can be tinkered with in synthetic biology.

  5. Evaluating the Influence of the Microsatellite Marker Set on the Genetic Structure Inferred in Pyrus communis L.

    PubMed Central

    Urrestarazu, Jorge; Royo, José B.; Santesteban, Luis G.; Miranda, Carlos

    2015-01-01

    Fingerprinting information can be used to elucidate in a robust manner the genetic structure of germplasm collections, allowing a more rational and fine assessment of genetic resources. Bayesian model-based approaches are now widely preferred to infer genetic structure, but it is still largely unresolved how marker sets should be built in order to obtain a robust inference. The objective was to evaluate, in Pyrus germplasm collections, the influence of the SSR marker set size on the genetic structure inferred, also evaluating the influence of the criterion used to select those markers. Inferences were performed considering an increasing number of SSR markers that ranged from just two up to 25, incorporated one at a time into the analysis. The influence of the number of SSR markers used was evaluated by comparing the number of populations and the strength of the signal detected, and also the similarity of the genotype assignments to populations between analyses. In order to test whether those results were influenced by the criterion used to select the SSRs, several selection scenarios based on the discrimination power or the fixation index values of the SSRs were tested. Our results indicate that population structure could be inferred accurately once a certain SSR number threshold was reached, which depended on the underlying structure within the genotypes, but the method used to select the markers included in each set appeared not to be very relevant. The minimum number of SSRs required to provide robust structure inferences and adequate measurements of the differentiation, even when low differentiation levels exist within populations, proved similar to that of the complete list of recommended markers for fingerprinting. When an SSR set size similar to the minimum marker sets recommended for fingerprinting is used, only major divisions or moderate (FST>0.05) differentiation of the germplasm are detected. PMID:26382618

  6. Hippocampal Structure Predicts Statistical Learning and Associative Inference Abilities during Development.

    PubMed

    Schlichting, Margaret L; Guarino, Katharine F; Schapiro, Anna C; Turk-Browne, Nicholas B; Preston, Alison R

    2017-01-01

    Despite the importance of learning and remembering across the lifespan, little is known about how the episodic memory system develops to support the extraction of associative structure from the environment. Here, we relate individual differences in volumes along the hippocampal long axis to performance on statistical learning and associative inference tasks-both of which require encoding associations that span multiple episodes-in a developmental sample ranging from ages 6 to 30 years. Relating age to volume, we found dissociable patterns across the hippocampal long axis, with opposite nonlinear volume changes in the head and body. These structural differences were paralleled by performance gains across the age range on both tasks, suggesting improvements in the cross-episode binding ability from childhood to adulthood. Controlling for age, we also found that smaller hippocampal heads were associated with superior behavioral performance on both tasks, consistent with this region's hypothesized role in forming generalized codes spanning events. Collectively, these results highlight the importance of examining hippocampal development as a function of position along the hippocampal axis and suggest that the hippocampal head is particularly important in encoding associative structure across development.

  7. Phylogeography and population structure of the biologically invasive phytopathogen Erwinia amylovora inferred using minisatellites.

    PubMed

    Bühlmann, Andreas; Dreo, Tanja; Rezzonico, Fabio; Pothier, Joël F; Smits, Theo H M; Ravnikar, Maja; Frey, Jürg E; Duffy, Brion

    2014-07-01

    Erwinia amylovora causes a major disease of pome fruit trees worldwide, and is regulated as a quarantine organism in many countries. While some diversity of isolates has been observed, molecular epidemiology of this bacterium is hindered by a lack of simple molecular typing techniques with sufficiently high resolution. We report a molecular typing system of E. amylovora based on variable number of tandem repeats (VNTR) analysis. Repeats in the E. amylovora genome were identified with comparative genomic tools, and VNTR markers were developed and validated. A Multiple-Locus VNTR Analysis (MLVA) was applied to E. amylovora isolates from bacterial collections representing global and regional distribution of the pathogen. Based on six repeats, MLVA allowed the distinction of 227 haplotypes among a collection of 833 isolates of worldwide origin. Three geographically separated groups were recognized among global isolates using Bayesian clustering methods. Analysis of regional outbreaks confirmed presence of diverse haplotypes but also high representation of certain haplotypes during outbreaks. MLVA analysis is a practical method for epidemiological studies of E. amylovora, identifying previously unresolved population structure within outbreaks. Knowledge of such structure can increase our understanding on how plant diseases emerge and spread over a given geographical region.

  8. Developing JSequitur to Study the Hierarchical Structure of Biological Sequences in a Grammatical Inference Framework of String Compression Algorithms.

    PubMed

    Galbadrakh, Bulgan; Lee, Kyung-Eun; Park, Hyun-Seok

    2012-12-01

    Grammatical inference methods are expected to find grammatical structures hidden in biological sequences. One hopes that studies of grammar serve as an appropriate tool for theory formation. Thus, we have developed JSequitur for automatically generating the grammatical structure of biological sequences in an inference framework of string compression algorithms. Our original motivation was to find any grammatical traits of several cancer genes that can be detected by string compression algorithms. Through this research, we could not find any meaningful unique traits of the cancer genes yet, but we could observe some interesting traits in regards to the relationship among gene length, similarity of sequences, the patterns of the generated grammar, and compression rate.
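    As a rough illustration of grammar inference by string compression, the sketch below repeatedly replaces the most frequent adjacent symbol pair with a new rule. This is a simplified, offline pair-replacement scheme in the spirit of Re-Pair, not the Sequitur algorithm and not the authors' JSequitur tool; the function names and the size measure used for the compression rate are assumptions made for this example.

```python
from collections import Counter


def infer_grammar(sequence):
    """Greedy pair replacement (illustrative only, not Sequitur itself).

    Returns the compressed start string (a list of symbols) and a dict
    mapping each hypothetical rule name to the pair of symbols it expands to.
    """
    symbols = list(sequence)
    rules = {}
    while True:
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:  # no adjacent pair repeats, so stop
            break
        name = f"R{len(rules)}"
        rules[name] = pair
        rewritten, i = [], 0
        while i < len(symbols):
            # Replace non-overlapping occurrences, scanning left to right.
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                rewritten.append(name)
                i += 2
            else:
                rewritten.append(symbols[i])
                i += 1
        symbols = rewritten
    return symbols, rules


def expand(symbols, rules):
    """Expand rule names recursively to recover the original string."""
    return "".join(
        expand(rules[s], rules) if s in rules else s for s in symbols
    )


def compression_rate(sequence, symbols, rules):
    """Grammar size (start string plus two symbols per rule) over input length."""
    return (len(symbols) + 2 * len(rules)) / len(sequence)


if __name__ == "__main__":
    seq = "ACGTACGTACGT"
    start, rules = infer_grammar(seq)
    print(start, rules, compression_rate(seq, start, rules))
```

    Highly repetitive sequences yield few, deeply nested rules and a low rate, which is the kind of relationship between sequence similarity, grammar pattern, and compression rate the abstract refers to.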

  9. Symbolic extensions applied to multiscale structure of genomes.

    PubMed

    Downarowicz, Tomasz; Travisany, Dante; Montecino, Martin; Maass, Alejandro

    2014-06-01

    A genome of a living organism consists of a long string of symbols over a finite alphabet carrying critical information for the organism. This includes its ability to control postnatal growth, homeostasis, adaptation to changes in the surrounding environment, or to biochemically respond at the cellular level to various specific regulatory signals. In this sense, a genome represents a symbolic encoding of a highly organized system of information whose functioning may be revealed as a natural multilayer structure in terms of complexity and prominence. In this paper we use the mathematical theory of symbolic extensions as a framework to shed light on how this multilayer organization is reflected in the symbolic coding of the genome. The distribution of data in an element of a standard symbolic extension of a dynamical system has a specific form: the symbolic sequence is divided into several subsequences (which we call layers) encoding the dynamics on various "scales". We propose that a similar structure resides within the genomes, building our analogy on some of the most recent findings in the field of regulation of genomic DNA functioning.

  10. The evolutionary history of Plasmodium vivax as inferred from mitochondrial genomes: parasite genetic diversity in the Americas.

    PubMed

    Taylor, Jesse E; Pacheco, M Andreína; Bacon, David J; Beg, Mohammad A; Machado, Ricardo Luiz; Fairhurst, Rick M; Herrera, Socrates; Kim, Jung-Yeon; Menard, Didier; Póvoa, Marinete Marins; Villegas, Leopoldo; Mulyanto; Snounou, Georges; Cui, Liwang; Zeyrek, Fadile Yildiz; Escalante, Ananias A

    2013-09-01

    Plasmodium vivax is the most prevalent human malaria parasite in the Americas. Previous studies have contrasted the genetic diversity of parasite populations in the Americas with those in Asia and Oceania, concluding that New World populations exhibit low genetic diversity consistent with a recent introduction. Here we used an expanded sample of complete mitochondrial genome sequences to investigate the diversity of P. vivax in the Americas as well as in other continental populations. We show that the diversity of P. vivax in the Americas is comparable to that in Asia and Oceania, and we identify several divergent clades circulating in South America that may have resulted from independent introductions. In particular, we show that several haplotypes sampled in Venezuela and northeastern Brazil belong to a clade that diverged from the other P. vivax lineages at least 30,000 years ago, albeit not necessarily in the Americas. We propose that, unlike in Asia where human migration increases local genetic diversity, the combined effects of the geographical structure and the low incidence of vivax malaria in the Americas have resulted in patterns of low local but high regional genetic diversity. This could explain previous views that P. vivax in the Americas has low genetic diversity because these were based on studies carried out in limited areas. Further elucidation of the complex geographical pattern of P. vivax variation will be important both for diversity assessments of genes encoding candidate vaccine antigens and in the formulation of control and surveillance measures aimed at malaria elimination.

  11. The Evolutionary History of Plasmodium vivax as Inferred from Mitochondrial Genomes: Parasite Genetic Diversity in the Americas

    PubMed Central

    Taylor, Jesse E.; Pacheco, M. Andreína; Bacon, David J.; Beg, Mohammad A.; Machado, Ricardo Luiz; Fairhurst, Rick M.; Herrera, Socrates; Kim, Jung-Yeon; Menard, Didier; Póvoa, Marinete Marins; Villegas, Leopoldo; Mulyanto; Snounou, Georges; Cui, Liwang; Zeyrek, Fadile Yildiz; Escalante, Ananias A.

    2013-01-01

    Plasmodium vivax is the most prevalent human malaria parasite in the Americas. Previous studies have contrasted the genetic diversity of parasite populations in the Americas with those in Asia and Oceania, concluding that New World populations exhibit low genetic diversity consistent with a recent introduction. Here we used an expanded sample of complete mitochondrial genome sequences to investigate the diversity of P. vivax in the Americas as well as in other continental populations. We show that the diversity of P. vivax in the Americas is comparable to that in Asia and Oceania, and we identify several divergent clades circulating in South America that may have resulted from independent introductions. In particular, we show that several haplotypes sampled in Venezuela and northeastern Brazil belong to a clade that diverged from the other P. vivax lineages at least 30,000 years ago, albeit not necessarily in the Americas. We propose that, unlike in Asia where human migration increases local genetic diversity, the combined effects of the geographical structure and the low incidence of vivax malaria in the Americas have resulted in patterns of low local but high regional genetic diversity. This could explain previous views that P. vivax in the Americas has low genetic diversity because these were based on studies carried out in limited areas. Further elucidation of the complex geographical pattern of P. vivax variation will be important both for diversity assessments of genes encoding candidate vaccine antigens and in the formulation of control and surveillance measures aimed at malaria elimination. PMID:23733143

  12. Structural Genomics of Bacterial Virulence Factors

    DTIC Science & Technology

    2006-05-01

    involved in these processes. The large G+C content difference between orf6, orf7 and orf8 (35%), and other Bacteroides genes (42%) suggests a...initiating assembly of the central spindle, a structure that has important roles in cytokinesis. In C. elegans embryos and other animal cells, central...D. Read, T. Popovic, and C. M. Fraser. 2004. Identification of anthrax toxin genes in a Bacillus cereus associated with an illness resembling

  13. Structure and Functional Studies on Dengue-2 Virus Genome

    DTIC Science & Technology

    1986-03-01

    STRUCTURE AND FUNCTIONAL STUDIES ON DENGUE-2 VIRUS GENOME. Final Report. Radha Krishnan Padmanabhan, Ph.D. March 1, 1986. Supported by U.S...and Functional Studies on Dengue-2 Virus Genome 12. PERSONAL AUTHOR(S) Radha Krishnan Padmanabhan 13a. TYPE OF REPORT 13b. TIME COVERED 14. DATE OF...3’-end of Dengue RNA in order to facilitate cDNA synthesis by oligo d(T) priming as proposed in the original research project. 2. We also showed that

  14. Structure and Functional Studies on Dengue-2 Virus Genome

    DTIC Science & Technology

    1986-03-01

    STRUCTURE AND FUNCTIONAL STUDIES ON DENGUE-2 VIRUS GENOME. Annual Report. Radha Krishnan Padmanabhan, Ph.D. March 1, 1986. Supported by...Studies on Dengue-2 Virus Genome 12. PERSONAL AUTHOR(S) Radha Krishnan Padmanabhan 13a. TYPE OF REPORT 13b. TIME COVERED 14. DATE OF REPORT (Year, Month, Day...analysis of these clones totalling 14,586 nucleotides: Deduced amino acid sequences of dengue virI 19 ABSTRACT (Continue on reverse of

  15. Functional characterization of somatic mutations in cancer using network-based inference of protein activity | Office of Cancer Genomics

    Cancer.gov

    Identifying the multiple dysregulated oncoproteins that contribute to tumorigenesis in a given patient is crucial for developing personalized treatment plans. However, accurate inference of aberrant protein activity in biological samples is still challenging as genetic alterations are only partially predictive and direct measurements of protein activity are generally not feasible.

  16. Arthropod Phylogenetics in Light of Three Novel Millipede (Myriapoda: Diplopoda) Mitochondrial Genomes with Comments on the Appropriateness of Mitochondrial Genome Sequence Data for Inferring Deep Level Relationships

    PubMed Central

    Brewer, Michael S.; Swafford, Lynn; Spruill, Chad L.; Bond, Jason E.

    2013-01-01

    Background: Arthropods are the most diverse group of eukaryotic organisms, but their phylogenetic relationships are poorly understood. Herein, we describe three mitochondrial genomes representing orders of millipedes for which complete genomes had not been characterized. Newly sequenced genomes are combined with existing data to characterize the protein coding regions of myriapods and to attempt to reconstruct the evolutionary relationships within the Myriapoda and Arthropoda. Results: The newly sequenced genomes are similar to previously characterized millipede sequences in terms of synteny and length. Unique translocations occurred within the newly sequenced taxa, including one half of the Appalachioria falcifera genome, which is inverted with respect to other millipede genomes. Across myriapods, amino acid conservation levels are highly dependent on the gene region. Additionally, individual loci varied in the level of amino acid conservation. Overall, most gene regions showed low levels of conservation at many sites. Attempts to reconstruct the evolutionary relationships suffered from questionable relationships and low support values. Analyses of phylogenetic informativeness show the lack of signal deep in the trees (i.e., genes evolve too quickly). As a result, the myriapod tree resembles previously published results but lacks convincing support, and, within the arthropod tree, well established groups were recovered as polyphyletic. Conclusions: The novel genome sequences described herein provide useful genomic information concerning millipede groups that had not been investigated. Taken together with existing sequences, the variety of compositions and evolution of myriapod mitochondrial genomes are shown to be more complex than previously thought. Unfortunately, the use of mitochondrial protein-coding regions in deep arthropod phylogenetics appears problematic, a result consistent with previously published studies. Lack of phylogenetic signal renders the

  17. Unleashing the power of meta-threading for evolution/structure-based function inference of proteins.

    PubMed

    Brylinski, Michal

    2013-01-01

    Protein threading is widely used in the prediction of protein structure and the subsequent functional annotation. Most threading approaches employ similar criteria for the template identification for use in both protein structure and function modeling. Using structure similarity alone might result in a high false positive rate in protein function inference, which suggests that selecting functional templates should be subject to a different set of constraints. In this study, we extend the functionality of eThread, a recently developed approach to meta-threading, focusing on the optimal selection of functional templates. We optimized the selection of template proteins to cover a broad spectrum of protein molecular function: ligand, metal, inorganic cluster, protein, and nucleic acid binding. In large-scale benchmarks, we demonstrate that the recognition rates in identifying templates that bind molecular partners in similar locations are very high, typically 70-80%, at the expense of a relatively low false positive rate. eThread also provides useful insights into the chemical properties of binding molecules and the structural features of binding. For instance, the sensitivity in recognizing similar protein-binding interfaces is 58% at only 18% false positive rate. Furthermore, in comparative analysis, we demonstrate that meta-threading supported by machine learning outperforms single-threading approaches in functional template selection. We show that meta-threading effectively detects many facets of protein molecular function, even in a low-sequence identity regime. The enhanced version of eThread is freely available as a webserver and stand-alone software at http://www.brylinski.org/ethread.

  18. Function inferences from a molecular structural model of bacterial ParE toxin

    PubMed Central

    Barbosa, Luiz Carlos Bertucci; Garrido, Saulo Santesso; Garcia, Anderson; Delfino, Davi Barbosa; Marchetto, Reinaldo

    2010-01-01

    Toxin-antitoxin (TA) systems contribute to plasmid stability by a mechanism that relies on the differential stabilities of the toxin and antitoxin proteins and leads to the killing of daughter bacteria that did not receive a plasmid copy at cell division. ParE is the toxic component of a TA system that constitutes, along with RelE, an important class of bacterial toxins called the RelE/ParE superfamily. For ParE toxin, no crystallographic structure is available so far, and the few in vitro studies have demonstrated that the target of toxin activity is E. coli DNA gyrase. Here, a 3D model of the E. coli ParE toxin was built by molecular homology modeling using MODELLER, a program for comparative modeling. The model was energy minimized with CHARMM and validated using the PROCHECK and VERIFY3D programs. Ramachandran plot analysis found that 96.8% of residues fell into the most favored and allowed regions. A structural similarity search using the DALI server returned the RelE and YoeB families as the best matches. The model also showed similarities with other microbial ribonucleases, but with low scores. A possible homologous deep cleft active site was identified in the model using the CASTp program. Additional studies are needed to investigate the nuclease activity in members of the ParE family as well as to confirm the replication inhibitory activity. The predicted model allows initial inferences about the unexplored 3D structure of the ParE toxin and may be further used in the rational design of molecules for structure-function studies. PMID:20975905

  19. Structural analysis of hepatitis C RNA genome using DNA microarrays

    PubMed Central

    Martell, María; Briones, Carlos; de Vicente, Aránzazu; Piron, María; Esteban, Juan I.; Esteban, Rafael; Guardia, Jaime; Gómez, Jordi

    2004-01-01

    Many studies have tried to identify specific nucleotide sequences in the quasispecies of hepatitis C virus (HCV) that determine resistance or sensitivity to interferon (IFN) therapy, unfortunately without conclusive results. Although viral proteins represent the most evident phenotype of the virus, genomic RNA sequences determine secondary and tertiary structures which are also part of the viral phenotype and can be involved in important biological roles. In this work, a method of RNA structure analysis has been developed based on the hybridization of labelled HCV transcripts to microarrays of complementary DNA oligonucleotides. Hybridizations were carried out at non-denaturing conditions, using appropriate temperature and buffer composition to allow binding to the immobilized probes of the RNA transcript without disturbing its secondary/tertiary structural motifs. Oligonucleotides printed onto the microarray covered the entire 5′ non-coding region (5′NCR), the first three-quarters of the core region, the E2–NS2 junction and the first 400 nt of the NS3 region. We document the use of this methodology to analyse the structural degree of a large region of HCV genomic RNA in two genotypes associated with different responses to IFN treatment. The results reported here show different structural degree along the genome regions analysed, and differential hybridization patterns for distinct genotypes in NS2 and NS3 HCV regions. PMID:15247323

  20. The mitochondrial genome of the red alga Kappaphycus striatus ("Green Sacol" variety): complete nucleotide sequence, genome structure and organization, and comparative analysis.

    PubMed

    Tablizo, Francis A; Lluisma, Arturo O

    2014-12-01

    The complete mitochondrial (mt) DNA sequence of the rhodophyte Kappaphycus striatus ("Green Sacol" variety) was determined. The mtDNA is circular, 25,242 bases long (A+T content: 69.94%), and contains 50 densely packed genes comprising 93.22% of the mitochondrial genome, with genes encoded on both strands. Through comparative analysis, the overall sequence, genome structure, and organization of K. striatus mtDNA were seen to be highly similar to those of other fully sequenced mitochondrial genomes of the class Florideophyceae. On the other hand, certain degrees of genome rearrangement and greater sequence dissimilarity were observed for the mtDNAs of other evolutionarily distant red algae, such as those from the classes Bangiophyceae and Cyanidiophyceae, compared to that of K. striatus. Furthermore, a trend was observed wherein red algal mtDNAs, albeit not necessarily shorter, tend to encode fewer protein-coding genes as the organism becomes more morphologically complex. This trend is supported by the phylogenetic tree inferred from the concatenated amino acid sequences of the deduced protein products of cytochrome c oxidase subunit genes (cox1, 2, and 3).

  1. Genome-Wide Analyses of Individual Strongyloides stercoralis (Nematoda: Rhabditoidea) Provide Insights into Population Structure and Reproductive Life Cycles

    PubMed Central

    Aung, Myo Pa Pa Thet Hnin Htwe; Afrin, Tanzila; Nagayasu, Eiji; Tanaka, Ryusei; Higashiarakawa, Miwa; Win, Kyu Kyu; Hirata, Tetsuo; Htike, Wah Win; Fujita, Jiro; Maruyama, Haruhiko

    2016-01-01

    The helminth Strongyloides stercoralis, which is transmitted through soil, infects 30–100 million people worldwide. S. stercoralis reproduces sexually outside the host as well as asexually within the host, which causes a life-long infection. To understand the population structure and transmission patterns of this parasite, we re-sequenced the genomes of 33 individual S. stercoralis nematodes collected in Myanmar (prevalent region) and Japan (non-prevalent region). We utilised a method combining whole genome amplification and next-generation sequencing techniques to detect 298,202 variant positions (0.6% of the genome) compared with the reference genome. Phylogenetic analyses of SNP data revealed an unambiguous geographical separation and sub-populations that correlated with the host geographical origin, particularly for the Myanmar samples. The relatively higher heterozygosity in the genomes of the Japanese samples can possibly be explained by the independent evolution of two haplotypes of diploid genomes through asexual reproduction during the auto-infection cycle, suggesting that analysing heterozygosity is useful and necessary to infer infection history and geographical prevalence. PMID:28033376

  2. The first complete plastid genomes of Melastomataceae are highly structurally conserved

    PubMed Central

    Neubig, Kurt M.; Majure, Lucas C.

    2016-01-01

    Background: In the past three decades, several studies have predominantly relied on a small sample of the plastome to infer deep phylogenetic relationships in the species-rich Melastomataceae. Here, we report the first full plastid sequences of this family, compare general features of the sampled plastomes to other sequenced Myrtales, and survey the plastomes for highly informative regions for phylogenetics. Methods: Genome skimming was performed for 16 species spread across the Melastomataceae. Plastomes were assembled, annotated and compared to eight sequenced plastids in the Myrtales. Phylogenetic inference was performed using Maximum Likelihood on six different data sets, where putative biases were taken into account. Summary statistics were generated for all introns and intergenic spacers with suitable size for polymerase chain reaction (PCR) amplification and used to rank the markers by phylogenetic information. Results: The majority of the plastomes sampled are conserved in gene content and order, as well as in sequence length and GC content within plastid regions and sequence classes. Departures include the putative presence of rps16 and rpl2 pseudogenes in some plastomes. Phylogenetic analyses of the majority of the schemes analyzed resulted in the same topology with high values of bootstrap support. Although there is still uncertainty in some relationships, in the highest supported topologies only two nodes received bootstrap values lower than 95%. Discussion: Melastomataceae plastomes are no exception to the general patterns observed in the genomic structure of land plant chloroplasts, being highly conserved and structurally similar to most other Myrtales. Despite the fact that the full plastome phylogeny shares most of the clades with the previously widely used and reduced data set, some changes are still observed and bootstrap support is higher. The plastome data set presented here is a step towards phylogenomic analyses in the Melastomataceae and will be

  3. Inference of population structure of purebred dairy and beef cattle using high-density genotype data.

    PubMed

    Kelleher, M M; Berry, D P; Kearney, J F; McParland, S; Buckley, F; Purfield, D C

    2017-01-01

    Information on the genetic diversity and population structure of cattle breeds is useful when deciding on, for example, the optimal crossbreeding strategies to improve phenotypic performance by exploiting heterosis. The present study investigated the genetic diversity and population structure of the most prominent dairy and beef breeds used in Ireland. Illumina high-density genotypes (777 962 single nucleotide polymorphisms; SNPs) were available on 4623 purebred bulls from nine breeds: Angus (n=430), Belgian Blue (n=298), Charolais (n=893), Hereford (n=327), Holstein-Friesian (n=1261), Jersey (n=75), Limousin (n=943), Montbéliarde (n=33) and Simmental (n=363). Principal component analysis revealed that Angus, Hereford, and Jersey formed non-overlapping clusters, representing distinct populations. In contrast, overlapping clusters suggested geographical proximity of origin and genetic similarity between Limousin, Simmental and Montbéliarde and, to a lesser extent, between Holstein-Friesian and Belgian Blue. The observed SNP heterozygosity averaged across all loci was 0.379. The Belgian Blue had the greatest mean observed heterozygosity (HO=0.389) among individuals within breed, while the Holstein-Friesian and Jersey populations had the lowest mean heterozygosity (HO=0.370 and 0.376, respectively). The correlation between the genomic-based and pedigree-based inbreeding coefficients was weak (r=0.171; P<0.001). Mean genomic inbreeding estimates were greatest for Jersey (0.173) and least for Hereford (0.051). The pair-wise breed fixation index (FST) ranged from 0.049 (Limousin and Charolais) to 0.165 (Hereford and Jersey). In conclusion, substantial genetic variation exists among breeds commercially used in Ireland. Thus custom-mating strategies would be successful in maximising the exploitation of heterosis in crossbreeding strategies.
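    The two summary statistics quoted above, observed heterozygosity (HO) and the pairwise fixation index (FST), can be sketched as follows. This is a minimal textbook formulation, assuming biallelic SNP genotypes coded as 0, 1 or 2 copies of one allele (None for a missing call) and Wright's FST = (HT - HS) / HT with equal weighting of the two breeds; the abstract does not state which estimator the authors used, and the function names are hypothetical.

```python
def observed_heterozygosity(genotypes):
    """Fraction of called genotypes that are heterozygous.

    `genotypes` is a list of per-animal lists; each call is 0, 1 or 2
    (allele count), or None when the genotype is missing.
    """
    called = [g for animal in genotypes for g in animal if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    return sum(1 for g in called if g == 1) / len(called)


def pairwise_fst(freqs_a, freqs_b):
    """Wright's FST between two populations from per-locus allele frequencies.

    Uses a ratio of sums across loci: HS is the mean within-population
    expected heterozygosity, HT the expected heterozygosity of the pooled
    (equally weighted) population.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("frequency lists must cover the same loci")
    hs_total = 0.0
    ht_total = 0.0
    for p_a, p_b in zip(freqs_a, freqs_b):
        hs_total += (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2
        p_bar = (p_a + p_b) / 2
        ht_total += 2 * p_bar * (1 - p_bar)
    if ht_total == 0:
        return 0.0  # no variation at any locus, so no differentiation
    return (ht_total - hs_total) / ht_total


if __name__ == "__main__":
    toy_genotypes = [[0, 1, 2], [1, 1, None]]  # made-up data
    print(observed_heterozygosity(toy_genotypes))
    print(pairwise_fst([0.2, 0.5], [0.4, 0.5]))
```

    Values near 0 indicate little differentiation between two breeds and values near 1 indicate fixed differences, which is how the reported range of 0.049 to 0.165 should be read.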

  4. A Genome Wide Survey of SNP Variation Reveals the Genetic Structure of Sheep Breeds

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The genetic structure of sheep reflects their domestication and subsequent formation into discrete breeds. Understanding genetic structure is essential for achieving genetic improvement through genome-wide association studies, genomic selection and the dissection of quantitative traits. After identi...

  5. Inferring Cell Differentiation Processes Based on Phylogenetic Analysis of Genome-Wide Epigenetic Information: Hematopoiesis as a Model Case

    PubMed Central

    Koyanagi, Kanako O.

    2015-01-01

    How cells divide and differentiate is a fundamental question in organismal development; however, the discovery of differentiation processes in various cell types is laborious and sometimes impossible. Phylogenetic analysis is typically used to reconstruct evolutionary processes based on inherent characters. It could also be used to reconstruct developmental processes based on the developmental changes that occur during cell proliferation and differentiation. In this study, DNA methylation information from differentiated hematopoietic cells was used to perform phylogenetic analyses. The results were assessed for their validity in inferring hierarchical differentiation processes of hematopoietic cells and DNA methylation processes of differentiating progenitor cells. Overall, phylogenetic analyses based on DNA methylation information facilitated inferences regarding hematopoiesis. PMID:25638259

  6. Population structure and demographic inferences concerning the endangered onychophoran species Epiperipatus acacioi (Onychophora: Peripatidae).

    PubMed

    Lacorte, G A; Oliveira, I S; Fonseca, C G

    2011-11-09

    Epiperipatus acacioi (Onychophora: Peripatidae) is an endemic species of the Atlantic rainforest in southeastern Brazil, with a restricted known distribution, found only in two nearby areas (Tripuí and Itacolomi). Mitochondrial gene COI sequences of 93 specimens collected across the known range of E. acacioi were used to assess the extant genetic diversity and patterns of genetic structure, as well as to infer the demographic history of this species. We found considerable variability within the populations, even though there has been recent environmental disturbance in these habitats. The samples from the two areas where this species is found showed significantly different COI sequences and constitute two distinct populations [exact test of sample differentiation (P = 0.0008) and pairwise F(ST) analyses (F(ST) = 0.214, P < 0.00001)]. However, there was little genetic differentiation among samples from different sampling sites within populations, suggesting that the potential for dispersal of E. acacioi is greater than would have been expected, based on their cryptic behavior and reduced vagility. Mismatch analyses and neutrality tests revealed evidence of recent population expansion processes for both populations, possibly related to variations in the past distribution of this species.

  7. Geographic population structure analysis of worldwide human populations infers their biogeographical origins

    PubMed Central

    Elhaik, Eran; Tatarinova, Tatiana; Chebotarev, Dmitri; Piras, Ignazio S.; Maria Calò, Carla; De Montis, Antonella; Atzori, Manuela; Marini, Monica; Tofanelli, Sergio; Francalacci, Paolo; Pagani, Luca; Tyler-Smith, Chris; Xue, Yali; Cucca, Francesco; Schurr, Theodore G.; Gaieski, Jill B.; Melendez, Carlalynne; Vilar, Miguel G.; Owings, Amanda C.; Gómez, Rocío; Fujita, Ricardo; Santos, Fabrício R.; Comas, David; Balanovsky, Oleg; Balanovska, Elena; Zalloua, Pierre; Soodyall, Himla; Pitchappan, Ramasamy; GaneshPrasad, ArunKumar; Hammer, Michael; Matisoo-Smith, Lisa; Wells, R. Spencer; Acosta, Oscar; Adhikarla, Syama; Adler, Christina J.; Bertranpetit, Jaume; Clarke, Andrew C.; Cooper, Alan; Der Sarkissian, Clio S. I.; Haak, Wolfgang; Haber, Marc; Jin, Li; Kaplan, Matthew E.; Li, Hui; Li, Shilin; Martínez-Cruz, Begoña; Merchant, Nirav C.; Mitchell, John R.; Parida, Laxmi; Platt, Daniel E.; Quintana-Murci, Lluis; Renfrew, Colin; Lacerda, Daniela R.; Royyuru, Ajay K.; Sandoval, Jose Raul; Santhakumari, Arun Varatharajan; Soria Hernanz, David F.; Swamikrishnan, Pandikumar; Ziegle, Janet S.

    2014-01-01

    The search for a method that utilizes biological information to predict humans’ place of origin has occupied scientists for millennia. Over the past four decades, scientists have employed genetic data in an effort to achieve this goal but with limited success. While biogeographical algorithms using next-generation sequencing data have achieved an accuracy of 700 km in Europe, they were inaccurate elsewhere. Here we describe the Geographic Population Structure (GPS) algorithm and demonstrate its accuracy with three data sets using 40,000–130,000 SNPs. GPS placed 83% of worldwide individuals in their country of origin. Applied to over 200 Sardinian villagers, GPS placed a quarter of them in their villages and most of the rest within 50 km of their villages. GPS’s accuracy and power to infer the biogeography of worldwide individuals down to their country or, in some cases, village of origin underscores the promise of admixture-based methods for biogeography and has ramifications for genetic ancestry testing. PMID:24781250
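    Accuracy figures such as "700 km" or "within 50 km" presuppose a distance between a predicted and a true location. The sketch below uses the haversine great-circle formula on a spherical Earth; this is an assumption made for illustration, since the abstract does not say how distances were computed, and the coordinates and function names are hypothetical. It does not implement the GPS algorithm itself.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1 to guard against rounding just above the valid asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def fraction_within(predicted, actual, threshold_km):
    """Share of individuals whose predicted location is within threshold_km."""
    hits = sum(
        1 for (p, t) in zip(predicted, actual)
        if haversine_km(p[0], p[1], t[0], t[1]) <= threshold_km
    )
    return hits / len(actual)


if __name__ == "__main__":
    # Made-up (latitude, longitude) pairs, for illustration only.
    predicted = [(40.0, 9.0), (39.2, 9.1)]
    actual = [(40.1, 9.0), (41.0, 9.5)]
    print(fraction_within(predicted, actual, 50.0))
```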

  8. Latitudinal structure of a Coronal Mass Ejection inferred from Ulysses and Geotail observations

    NASA Technical Reports Server (NTRS)

    Hammond, C. M.; Crawford, G. K.; Gosling, J. T.; Kojima, H.; Phillips, J. L.; Matsumoto, H.; Balogh, A.; Frank, L. A.; Kokubun, S.; Yamamoto, T.

    1995-01-01

    We present the first observations of a Coronal Mass Ejection (CME) by two spacecraft separated substantially in heliographic latitude. Ulysses and Geotail both see similar features in the plasma and magnetic field parameters during an interval in which Geotail is located in the deep magnetosheath (greater than 150 Earth radii) and Ulysses is located in the solar wind at 5 AU, approximately 20° S of Geotail, and approximately 51° W (in the direction of solar rotation) of Geotail. Based on the similarity in plasma and magnetic field parameters and similar inferred ejection times from the Sun for both features we argue that the same CME is observed by both spacecraft. The portion of the CME observed by Ulysses is traveling much faster than the portion observed by Geotail. Thus the CME has significant latitudinal structure since at any given time the high latitude portion of the CME extends much further out in radial distance. Furthermore, this implies that a simple calculation of the arrival time of a CME at the Earth may not be done if the observing spacecraft is located substantially away from the ecliptic plane.

  9. The new physician as unwitting quantum mechanic: is adapting Dirac's inference system best practice for personalized medicine, genomics, and proteomics?

    PubMed

    Robson, Barry

    2007-08-01

    What is the Best Practice for automated inference in Medical Decision Support for personalized medicine? A known system already exists as Dirac's inference system from quantum mechanics (QM) using bra-kets ⟨A|B⟩ and bras where A and B are states, events, or measurements representing, say, clinical and biomedical rules. Dirac's system should theoretically be the universal best practice for all inference, though QM is notorious as sometimes leading to bizarre conclusions that appear not to be applicable to the macroscopic world of everyday human experience and medical practice. It is here argued that this apparent difficulty vanishes if QM is assigned one new multiplication function @, which conserves conditionality appropriately, making QM applicable to classical inference including a quantitative form of the predicate calculus. An alternative interpretation with the same consequences is if every i = √−1 in Dirac's QM is replaced by h, an entity distinct from 1 and i and arguably a hidden root of 1 such that h² = 1. With that exception, this paper is thus primarily a review of the application of Dirac's system, by application of linear algebra in the complex domain to help manipulate information about associations and ontology in complicated data. Any combined bra-ket can be shown to be composed only of the sum of QM-like bra and ket weights c(), times an exponential function of Fano's mutual information measure I(A; B) about the association between A and B, that is, an association rule from data mining. With the weights and Fano measure re-expressed as expectations on finite data using Riemann's Incomplete (i.e., Generalized) Zeta Functions, actual counts of observations for real world sparse data can be readily utilized. Finally, the paper compares identical character, distinguishability of states events or measurements, correlation, mutual information, and orthogonal character, important issues in data mining

  10. Neutral Exosphere Densities and Structures at Titan Inferred from Pickup Ions Observed by CAPS

    NASA Astrophysics Data System (ADS)

    Hartle, R. E.; Sittler, E. C.; Neubauer, F. M.; Johnson, R. E.; Crary, F.; McComas, D. J.; Young, D. T.; Coates, A. J.; Simpson, D. J.; Bolton, S.; Reisenfeld, D.; Szego, K.; Berthelier, J.

    2005-12-01

    Measurements of pickup ions, born from neutral exospheres embedded in moving plasmas, can be used to determine the composition and structure of the parent neutral exosphere constituents [1]. Pickup ions have been observed in Saturn's rotating magnetosphere near Titan by the Cassini Plasma Spectrometer (CAPS) instrument during the Cassini orbiter's recent flybys of the moon. Pickup ions observed by CAPS include H+, H2+, N+/CH2+, CH4+, and N2+. These ions slow down Saturn's magnetospheric plasma beyond Titan's ionosphere through mass loading. Because of its relatively high mass and high concentration, CH4+ is the dominant mass loading ion. The other ions make negligible contributions to the mass loading process except for N2+ just above the ionopause, where its concentration becomes important. The phase space densities of pickup ions are sensitive functions of the spatial variations of the parent exosphere gases of the pickup ions [2]. Accounting for such variations, model phase space densities [2], derived from the Vlasov equation, are used in an algorithm to obtain ion density and velocity moments from CAPS measurements. The model implicitly maps an ion's trajectory from its observation point to its source point. The analysis shows that because the gyroradius of CH4+ is much greater than the scale height of the source gas, CH4, the ion fluxes are beamlike with velocities distributed over a narrow range. The observed pickup ion velocities are found to be in ring distributions, with the light ion H+ occupying all of its allowed velocities and CH4+ only a small portion of its ring velocities. Applying the algorithm, exosphere densities are inferred. Using CAPS time-of-flight data and empirical cracking patterns, we show that the 14 amu ion is more likely N+. We compare ratios of the inferred N and CH4 exosphere densities with existing exosphere models. 1. Hartle, R. E., K. W. Ogilvie and C. S. Wu, Planet Space Sci., 21, 2181, 1973. 2. Hartle, R. E. and E. C. Sittler

  11. Deciphering the fine-structure of tribal admixture in the Bedouin population using genomic data.

    PubMed

    Markus, B; Alshafee, I; Birk, O S

    2014-02-01

    The Bedouin Israeli population is highly inbred and structured with a very high prevalence of recessive diseases. Many studies in the past two decades focused on linkage analysis in large, multiple consanguineous pedigrees of this population. The advent of high-throughput technologies motivated researchers to search for rare variants shared between smaller pedigrees, integrating data from clinically similar yet seemingly non-related sporadic cases. However, such analyses are challenging because, without pedigree data, there is no prior knowledge regarding possible relatedness between the sporadic cases. Here, we describe models and techniques for the study of relationships between pedigrees and use them for the inference of tribal co-ancestry, delineating the complex social interactions between different tribes in the Negev Bedouins of southern Israel. Through our analysis, we differentiate between tribes that share many yet small genomic segments because of co-ancestry versus tribes that share larger segments because of recent admixture. The emergent pattern is well correlated with the prevalence of rare mutations in the different tribes. Tribes that do not intermarry, mostly because of social restrictions, hold private mutations, whereas tribes that do intermarry demonstrate a genetic flow of mutations between them. Thus, social structure within an inbred community can be delineated through genomic data, with implications to genetic counseling and genetic mapping.

  12. Deciphering the fine-structure of tribal admixture in the Bedouin population using genomic data

    PubMed Central

    Markus, B; Alshafee, I; Birk, O S

    2014-01-01

    The Bedouin Israeli population is highly inbred and structured with a very high prevalence of recessive diseases. Many studies in the past two decades focused on linkage analysis in large, multiple consanguineous pedigrees of this population. The advent of high-throughput technologies motivated researchers to search for rare variants shared between smaller pedigrees, integrating data from clinically similar yet seemingly non-related sporadic cases. However, such analyses are challenging because, without pedigree data, there is no prior knowledge regarding possible relatedness between the sporadic cases. Here, we describe models and techniques for the study of relationships between pedigrees and use them for the inference of tribal co-ancestry, delineating the complex social interactions between different tribes in the Negev Bedouins of southern Israel. Through our analysis, we differentiate between tribes that share many yet small genomic segments because of co-ancestry versus tribes that share larger segments because of recent admixture. The emergent pattern is well correlated with the prevalence of rare mutations in the different tribes. Tribes that do not intermarry, mostly because of social restrictions, hold private mutations, whereas tribes that do intermarry demonstrate a genetic flow of mutations between them. Thus, social structure within an inbred community can be delineated through genomic data, with implications to genetic counseling and genetic mapping. PMID:24084643

  13. Target selection and determination of function in structural genomics.

    PubMed

    Watson, James D; Todd, Annabel E; Bray, James; Laskowski, Roman A; Edwards, Aled; Joachimiak, Andrzej; Orengo, Christine A; Thornton, Janet M

    2003-01-01

    The first crucial step in any structural genomics project is the selection and prioritization of target proteins for structure determination. There may be a number of selection criteria to be satisfied, including that the proteins have novel folds, that they be representatives of large families for which no structure is known, and so on. The better the selection at this stage, the greater is the value of the structures obtained at the end of the experimental process. This value can be further enhanced once the protein structures have been solved if the functions of the given proteins can also be determined. Here we describe the methods used at either end of the experimental process: firstly, sensitive sequence comparison techniques for selecting a high-quality list of target proteins, and secondly the various computational methods that can be applied to the eventual 3D structures to determine the most likely biochemical function of the proteins in question.

  14. Gene3D: Structural Assignment for Whole Genes and Genomes Using the CATH Domain Structure Database

    PubMed Central

    Buchan, Daniel W.A.; Shepherd, Adrian J.; Lee, David; Pearl, Frances M.G.; Rison, Stuart C.G.; Thornton, Janet M.; Orengo, Christine A.

    2002-01-01

    We present a novel web-based resource, Gene3D, of precalculated structural assignments to gene sequences and whole genomes. This resource assigns structural domains from the CATH database to whole genes and links these to their curated functional and structural annotations within the CATH domain structure database, the functional Dictionary of Homologous Superfamilies (DHS) and PDBsum. Currently Gene3D provides annotation for 36 complete genomes (two eukaryotes, six archaea, and 28 bacteria). On average, between 30% and 40% of the genes of a given genome can be structurally annotated. Matches to structural domains are found using the profile-based method (PSI-BLAST), and a novel protocol, DRange, is used to resolve conflicts in matches involving different homologous superfamilies. PMID:11875040

  15. Sequence, structure, function, immunity: structural genomics of costimulation

    PubMed Central

    Chattopadhyay, Kausik; Lazar-Molnar, Eszter; Yan, Qingrong; Rubinstein, Rotem; Zhan, Chenyang; Vigdorovich, Vladimir; Ramagopal, Udupi A.; Bonanno, Jeffrey; Nathenson, Stanley G.; Almo, Steven C.

    2010-01-01

    Summary Costimulatory receptors and ligands trigger the signaling pathways that are responsible for modulating the strength, course and duration of an immune response. High-resolution structures have provided invaluable mechanistic insights by defining the chemical and physical features underlying costimulatory receptor/ligand specificity, affinity, oligomeric state, and valency. Furthermore, these structures revealed general architectural features that are important for the integration of these interactions and their associated signaling pathways into overall cellular physiology. Recent technological advances in structural biology promise unprecedented opportunities for furthering our understanding of the structural features and mechanisms that govern costimulation. In this review we highlight unique insights that have been revealed by structures of costimulatory molecules from the immunoglobulin and tumor necrosis factor superfamilies, and describe a vision for future structural and mechanistic analysis of costimulation. This vision includes simple strategies for the selection of candidate molecules for structure determination and highlights the critical role of structure in the design of mutant costimulatory molecules for the generation of in vivo structure-function correlations in a mammalian model system. This integrated ‘atoms-to-animals’ paradigm provides a comprehensive approach for defining atomic and molecular mechanisms. PMID:19426233

  16. Elucidation of operon structures across closely related bacterial genomes.

    PubMed

    Zhou, Chuan; Ma, Qin; Li, Guojun

    2014-01-01

    About half of the protein-coding genes in prokaryotic genomes are organized into operons to facilitate co-regulation during transcription. With the evolution of genomes, operon structures are undergoing changes which could coordinate diverse gene expression patterns in response to various stimuli during the life cycle of a bacterial cell. Here we developed a graph-based model to elucidate the diversity of operon structures across a set of closely related bacterial genomes. In the constructed graph, each node represents one orthologous gene group (OGG) and a pair of nodes will be connected if any two genes, from the corresponding two OGGs respectively, are located in the same operon as immediate neighbors in any of the considered genomes. Through identifying the connected components in the above graph, we found that genes in a connected component are likely to be functionally related and these identified components tend to form treelike topology, such as paths and stars, corresponding to different biological mechanisms in transcriptional regulation as follows. Specifically, (i) a path-structure component integrates genes encoding a protein complex, such as ribosome; and (ii) a star-structure component not only groups related genes together, but also reflects the key functional roles of the central node of this component, such as the ABC transporter with a transporter permease and substrate-binding proteins surrounding it. Most interestingly, the genes from organisms with highly diverse living environments, i.e., biomass degraders and animal pathogens of clostridia in our study, can be clearly classified into different topological groups on some connected components.
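
    The graph construction described in this abstract can be sketched as follows. This is only an illustrative reading of the abstract, not the authors' implementation: the function names, the gene and OGG identifiers, and the degree-based path/star labelling are all assumptions made for the example.

```python
from collections import defaultdict

def build_ogg_graph(operons, gene_to_ogg):
    """Nodes are orthologous gene groups (OGGs); two OGGs are linked when
    genes from them are immediate neighbours in any operon of any genome."""
    graph = defaultdict(set)
    for operon in operons:
        oggs = [gene_to_ogg[g] for g in operon]
        for ogg in oggs:
            graph[ogg]  # touching the key registers single-gene operons too
        for a, b in zip(oggs, oggs[1:]):
            if a != b:
                graph[a].add(b)
                graph[b].add(a)
    return dict(graph)

def connected_components(graph):
    """Return the connected components as a list of node sets (iterative DFS)."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            comp.add(node)
            for nb in graph[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(comp)
    return components

def classify(graph, component):
    """Hypothetical labelling by degree sequence: path, star, tree, or other."""
    n = len(component)
    degrees = sorted(len(graph[v]) for v in component)
    if n == 1:
        return "singleton"
    if sum(degrees) // 2 != n - 1:
        return "other"          # contains a cycle, so not treelike
    if degrees[-1] <= 2:
        return "path"
    if degrees[-1] == n - 1:
        return "star"
    return "tree"

# Made-up operons pooled from three hypothetical genomes (a, b, c).
gene_to_ogg = {
    "a1": "rpsA", "a2": "rpsB", "a3": "rpsC", "b1": "rpsC", "b2": "rpsD",
    "a4": "perm", "a5": "sbp1", "b3": "sbp2", "b4": "perm",
    "c1": "perm", "c2": "sbp3",
}
operons = [["a1", "a2", "a3"], ["b1", "b2"],
           ["a4", "a5"], ["b3", "b4"], ["c1", "c2"]]

graph = build_ogg_graph(operons, gene_to_ogg)
components = connected_components(graph)
labels = sorted(classify(graph, c) for c in components)
```

    In this toy input the ribosomal-style groups chain into a path, while the permease group sits at the centre of a star with the substrate-binding groups around it, mirroring the two topologies the abstract mentions.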

  17. Implications of structural genomics target selection strategies: Pfam5000, whole genome, and random approaches

    SciTech Connect

    Chandonia, John-Marc; Brenner, Steven E.

    2004-07-14

    The structural genomics project is an international effort to determine the three-dimensional shapes of all important biological macromolecules, with a primary focus on proteins. Target proteins should be selected according to a strategy which is medically and biologically relevant, of good value, and tractable. As an option to consider, we present the Pfam5000 strategy, which involves selecting the 5000 most important families from the Pfam database as sources for targets. We compare the Pfam5000 strategy to several other proposed strategies that would require similar numbers of targets. These include complete solution of several small to moderately sized bacterial proteomes, partial coverage of the human proteome, and random selection of approximately 5000 targets from sequenced genomes. We measure the impact that successful implementation of these strategies would have upon structural interpretation of the proteins in Swiss-Prot, TrEMBL, and 131 complete proteomes (including 10 of eukaryotes) from the Proteome Analysis database at EBI. Solving the structures of proteins from the 5000 largest Pfam families would allow accurate fold assignment for approximately 68 percent of all prokaryotic proteins (covering 59 percent of residues) and 61 percent of eukaryotic proteins (40 percent of residues). More fine-grained coverage which would allow accurate modeling of these proteins would require an order of magnitude more targets. The Pfam5000 strategy may be modified in several ways, for example to focus on larger families, bacterial sequences, or eukaryotic sequences; as long as secondary consideration is given to large families within Pfam, coverage results vary only slightly. In contrast, focusing structural genomics on a single tractable genome would have only a limited impact in structural knowledge of other proteomes: a significant fraction (about 30-40 percent of the proteins, and 40-60 percent of the residues) of each proteome is classified in small

  18. Diverse mechanisms of somatic structural variations in human cancer genomes

    PubMed Central

    Yang, Lixing; Luquette, Lovelace J.; Gehlenborg, Nils; Xi, Ruibin; Haseley, Psalm S.; Hsieh, Chih-Heng; Zhang, Chengsheng; Ren, Xiaojia; Protopopov, Alexei; Chin, Lynda; Kucherlapati, Raju; Lee, Charles; Park, Peter J.

    2013-01-01

    Summary Identification of somatic rearrangements in cancer genomes has accelerated through analysis of high-throughput sequencing data. However, characterization of complex structural alterations and their underlying mechanisms remains inadequate. Here, applying an algorithm to predict structural variations from short reads, we report a comprehensive catalog of somatic structural variations and the mechanisms generating them, using high-coverage whole-genome sequencing data from 140 patients across ten tumor types. We characterize the relative contributions of different types of rearrangements and their mutational mechanisms, find that ~20% of the somatic deletions are complex deletions formed by replication errors, and describe the differences between the mutational mechanisms in somatic and germline alterations. Importantly, we provide detailed reconstructions of the events responsible for loss of CDKN2A/B and gain of EGFR in glioblastoma, revealing that these alterations can result from multiple mechanisms even in a single genome and that both DNA double-strand breaks and replication errors drive somatic rearrangements. PMID:23663786

  19. Mitochondrial Genome Sequences and Structures Aid in the Resolution of Piroplasmida phylogeny

    PubMed Central

    Marr, Henry S.; Tarigo, Jaime L.; Cohn, Leah A.; Bird, David M.; Scholl, Elizabeth H.; Levy, Michael G.; Wiegmann, Brian M.; Birkenheuer, Adam J.

    2016-01-01

    The taxonomy of the order Piroplasmida, which includes a number of clinically and economically relevant organisms, is a hotly debated topic amongst parasitologists. Three genera (Babesia, Theileria, and Cytauxzoon) are recognized based on parasite life cycle characteristics, but molecular phylogenetic analyses of 18S sequences have suggested the presence of five or more distinct Piroplasmida lineages. Despite these important advancements, a few studies have been unable to define the taxonomic relationships of some organisms (e.g. C. felis and T. equi) with respect to other Piroplasmida. Additional evidence from mitochondrial genome sequences and synteny should aid in the inference of Piroplasmida phylogeny and resolution of taxonomic uncertainties. In this study, we have amplified, sequenced, and annotated seven previously uncharacterized mitochondrial genomes (Babesia canis, Babesia vogeli, Babesia rossi, Babesia sp. Coco, Babesia conradae, Babesia microti-like sp., and Cytauxzoon felis) and identified additional ribosomal fragments in ten previously characterized mitochondrial genomes. Phylogenetic analysis of concatenated mitochondrial and 18S sequences as well as cox1 amino acid sequence identified five distinct Piroplasmida groups, each of which possesses a unique mitochondrial genome structure. Specifically, our results confirm the existence of four previously identified clades (B. microti group, Babesia sensu stricto, Theileria equi, and a Babesia sensu latu group that includes B. conradae) while supporting the integration of Theileria and Cytauxzoon species into a single fifth taxon. Although known biological characteristics of Piroplasmida corroborate the proposed phylogeny, more investigation into parasite life cycles is warranted to further understand the evolution of the Piroplasmida. Our results provide an evolutionary framework for comparative biology of these important animal and human pathogens and help focus renewed efforts toward understanding the

  20. Action starring narratives and events: Structure and inference in visual narrative comprehension

    PubMed Central

    Cohn, Neil; Wittenberg, Eva

    2015-01-01

    Studies of discourse have long placed focus on the inference generated by information that is not overtly expressed, and theories of visual narrative comprehension similarly focused on the inference generated between juxtaposed panels. Within the visual language of comics, star-shaped “flashes” commonly signify impacts, but can be enlarged to the size of a whole panel that can omit all other representational information. These “action star” panels depict a narrative culmination (a “Peak”), but have content which readers must infer, thereby posing a challenge to theories of inference generation in visual narratives that focus only on the semantic changes between juxtaposed images. This paper shows that action stars demand more inference than depicted events, and that they are more coherent in narrative sequences than scrambled sequences (Experiment 1). In addition, action stars play a felicitous narrative role in the sequence (Experiment 2). Together, these results suggest that visual narratives use conventionalized depictions that demand the generation of inferences while retaining narrative coherence of a visual sequence. PMID:26709362

  1. Iterative ACORN as a high throughput tool in structural genomics.

    PubMed

    Selvanayagam, S; Velmurugan, D; Yamane, T

    2006-08-01

    High throughput macromolecular structure determination is essential in structural genomics, as the amount of available sequence information far exceeds the number of available 3D structures. ACORN, a freely available resource in the CCP4 suite of programs, is a comprehensive and efficient program for phasing in the determination of protein structures when atomic resolution data are available. ACORN, combined with the automatic model-building program ARP/wARP and the refinement program REFMAC, is a suitable combination for high throughput structural genomics. ACORN can also be run with secondary structural elements such as helices and sheets as inputs with high resolution data. In situations where ACORN phasing is not sufficient for building the protein model, the fragments (incomplete model/dummy atoms) can again be used as a starting input. Iterative ACORN is shown to work efficiently in the subsequent model building stages in congerin (PDB-ID: 1is3) and catalase (PDB-ID: 1gwe), for which models are available.

  2. Structure and sequence of the saimiriine herpesvirus 1 genome.

    PubMed

    Tyler, Shaun; Severini, Alberto; Black, Darla; Walker, Matthew; Eberle, R

    2011-02-05

    We report here the complete genome sequence of the squirrel monkey α-herpesvirus saimiriine herpesvirus 1 (HVS1). Unlike the simplexviruses of other primate species, only the unique short region of the HVS1 genome is bounded by inverted repeats. While all Old World simian simplexviruses characterized to date lack the herpes simplex virus RL1 (γ34.5) gene, HVS1 has an RL1 gene. HVS1 lacks several genes that are present in other primate simplexviruses (US8.5, US10-12, UL43/43.5 and UL49A). Although the overall genome structure appears more like that of varicelloviruses, the encoded HVS1 proteins are most closely related to homologous proteins of the primate simplexviruses. Phylogenetic analyses confirm that HVS1 is a simplexvirus. Limited comparison of two HVS1 strains revealed a very low degree of sequence variation more typical of varicelloviruses. HVS1 is thus unique among the primate α-herpesviruses in that its genome has properties of both simplexviruses and varicelloviruses.

  3. Structural divergence between the human and chimpanzee genomes.

    PubMed

    Kehrer-Sawatzki, Hildegard; Cooper, David N

    2007-02-01

    The structural microheterogeneity evident between the human and chimpanzee genomes is quite considerable and includes inversions and duplications as well as deletions, ranging in size from a few base-pairs up to several megabases (Mb). Insertions and deletions have together given rise to at least 150 Mb of genomic DNA sequence that is either present or absent in humans as compared to chimpanzees. Such regions often contain paralogous sequences and members of multigene families thereby ensuring that the human and chimpanzee genomes differ by a significant fraction of their gene content. There is as yet no evidence to suggest that the large chromosomal rearrangements which serve to distinguish the human and chimpanzee karyotypes have influenced either speciation or the evolution of lineage-specific traits. However, the myriad submicroscopic rearrangements in both genomes, particularly those involving copy number variation, are unlikely to represent exclusively neutral changes and hence promise to facilitate the identification of genes that have been important for human-specific evolution.

  4. The TB Structural Genomics Consortium: A decade of progress

    PubMed Central

    Chim, Nicholas; Habel, Jeff E.; Johnston, Jodie M.; Krieger, Inna; Miallau, Linda; Sankaranarayanan, Ramasamy; Morse, Robert P.; Bruning, John; Swanson, Stephanie; Kim, Haelee; Kim, Chang-Yub; Li, Hongye; Bulloch, Esther M.; Payne, Richard J.; Manos-Turvey, Alexandra; Hung, Li-Wei; Baker, Edward N.; Lott, J. Shaun; James, Michael N.G.; Terwilliger, Thomas C.; Eisenberg, David S.; Sacchettini, James C.; Goulding, Celia W.

    2012-01-01

    Summary The TB Structural Genomics Consortium is a worldwide organization of collaborators whose mission is the comprehensive structural determination and analyses of Mycobacterium tuberculosis proteins to ultimately aid in tuberculosis diagnosis and treatment. Congruent to the overall vision, Consortium members have additionally established an integrated facilities core to streamline M. tuberculosis structural biology and developed bioinformatics resources for data mining. This review aims to share the latest Consortium developments with the TB community, including recent structures of proteins that play significant roles within M. tuberculosis. Atomic resolution details may unravel mechanistic insights and reveal unique and novel protein features, as well as important protein-protein and protein-ligand interactions, which ultimately leads to a better understanding of M. tuberculosis biology and may be exploited for rational, structure-based therapeutics design. PMID:21247804

  5. Introgression and phenotypic assimilation in Zimmerius flycatchers (Tyrannidae): population genetic and phylogenetic inferences from genome-wide SNPs.

    PubMed

    Rheindt, Frank E; Fujita, Matthew K; Wilton, Peter R; Edwards, Scott V

    2014-03-01

    Genetic introgression is pervasive in nature and may lead to large-scale phenotypic assimilation and/or admixture of populations, but there is limited knowledge on whether large phenotypic changes are typically accompanied by high levels of introgression throughout the genome. Using bioacoustic, biometric, and spectrophotometric data from a flycatcher (Tyrannidae) system in the Neotropical genus Zimmerius, we document a mosaic pattern of phenotypic admixture in which a population of Zimmerius viridiflavus in northern Peru (henceforth "mosaic") is vocally and biometrically similar to conspecifics to the south but shares plumage characteristics with a different species (Zimmerius chrysops) to the north. To clarify the origins of the mosaic population, we used the RAD-seq approach to generate a data set of 37,361 genome-wide single nucleotide polymorphisms (SNPs). A range of population-genetic diagnostics shows that the genome of the mosaic population is largely indistinguishable from southern Z. viridiflavus and distinct from northern Z. chrysops, and the application of parsimony and species tree methods to the genome-wide SNP data set confirms the close affinity of the mosaic population with southern Z. viridiflavus. Even so, using a subset of 2710 SNPs found across all sampled lineages in configurations appropriate for a recently proposed statistical ("ABBA/BABA") test that distinguishes gene flow from incomplete lineage sorting, we detected low levels of gene flow from northern Z. chrysops into the mosaic population. Mapping the candidate loci for introgression from Z. chrysops into the mosaic population to the zebra finch genome reveals close linkage with genes significantly enriched in functions involving cell projection and plasma membranes. Introgression of key alleles may have led to phenotypic assimilation in the plumage of mosaic birds, suggesting that selection may have been a key factor facilitating introgression.
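
    The "ABBA/BABA" test mentioned in this abstract is usually summarised by Patterson's D statistic, D = (nABBA - nBABA) / (nABBA + nBABA). A minimal sketch follows, assuming one sampled allele per lineage at each biallelic site and the ordering (P1, P2, P3, outgroup); the function name and the toy data are hypothetical, and mapping P1 to southern Z. viridiflavus, P2 to the mosaic population and P3 to Z. chrysops is an assumption made for illustration, not a detail taken from the paper.

```python
def patterson_d(sites):
    """Count ABBA and BABA site patterns and return (n_abba, n_baba, D).

    Each site is a 4-tuple of alleles (P1, P2, P3, outgroup). The outgroup
    allele is treated as ancestral ("A"); the other allele is derived ("B").
    """
    n_abba = n_baba = 0
    for p1, p2, p3, out in sites:
        if p1 == p2:
            continue                      # P1 and P2 agree: uninformative
        if p1 == out and p2 == p3:
            n_abba += 1                   # derived allele shared by P2 and P3
        elif p2 == out and p1 == p3:
            n_baba += 1                   # derived allele shared by P1 and P3
    total = n_abba + n_baba
    if total == 0:
        raise ValueError("no ABBA or BABA sites in the input")
    return n_abba, n_baba, (n_abba - n_baba) / total

# Toy data: 6 ABBA sites, 2 BABA sites, 5 uninformative sites.
toy_sites = ([("A", "G", "G", "A")] * 6
             + [("G", "A", "G", "A")] * 2
             + [("A", "A", "G", "A")] * 5)
n_abba, n_baba, d = patterson_d(toy_sites)
```

    Under incomplete lineage sorting alone the two patterns are expected in equal numbers, so D is near zero; with this ordering a positive D indicates excess allele sharing between P2 and P3. Assessing significance (commonly done with a block jackknife) is not shown here.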

  6. Population genetic structure of the cotton bollworm Helicoverpa armigera (Hübner) (Lepidoptera: Noctuidae) in India as inferred from EPIC-PCR DNA markers.

    PubMed

    Behere, Gajanan Tryambak; Tay, Wee Tek; Russell, Derek Alan; Kranthi, Keshav Raj; Batterham, Philip

    2013-01-01

    Helicoverpa armigera is an important pest of cotton and other agricultural crops in the Old World. Its wide host range, high mobility and fecundity, and the ability to adapt and develop resistance against all common groups of insecticides used for its management have exacerbated its pest status. An understanding of the population genetic structure in H. armigera under Indian agricultural conditions will help ascertain gene flow patterns across different agricultural zones. This study inferred the population genetic structure of Indian H. armigera using five Exon-Primed Intron-Crossing (EPIC)-PCR markers. Nested alternative EPIC markers detected moderate null allele frequencies (4.3% to 9.4%) in loci used to infer population genetic structure but the apparently genome-wide heterozygote deficit suggests inbreeding or a Wahlund effect rather than a null allele effect. Population genetic analysis of the 26 populations suggested significant genetic differentiation within India but especially in cotton-feeding populations in the 2006-07 cropping season. In contrast, overall pair-wise F(ST) estimates from populations feeding on food crops indicated no significant population substructure irrespective of cropping seasons. A Bayesian cluster analysis was used to assign the genetic make-up of individuals to likely membership of population clusters. Some evidence was found for four major clusters with individuals in two populations from cotton in one year (from two populations in northern India) showing especially high homogeneity. Taken as a whole, this study found evidence of population substructure at host crop, temporal and spatial levels in Indian H. armigera, without, however, a clear biological rationale for these structures being evident.
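
    For readers unfamiliar with pairwise F(ST), the idea can be illustrated with the textbook heterozygosity-based definition F_ST = (H_T - H_S) / H_T for two equally weighted populations and biallelic loci. This is a simplified sketch only: the abstract does not state which estimator the study used, EPIC-PCR loci may carry more than two alleles, and the function name and frequencies below are invented for the example.

```python
def pairwise_fst(loci):
    """Heterozygosity-based F_ST for two populations.

    `loci` is a list of (p1, p2) pairs: the frequency of one allele at a
    biallelic locus in population 1 and in population 2. Populations are
    weighted equally and heterozygosities are summed over loci before
    taking the ratio.
    """
    h_s_sum = h_t_sum = 0.0
    for p1, p2 in loci:
        # Mean expected heterozygosity within the two populations.
        h_s_sum += (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
        # Expected heterozygosity of the pooled population.
        p_bar = (p1 + p2) / 2
        h_t_sum += 2 * p_bar * (1 - p_bar)
    if h_t_sum == 0:
        raise ValueError("all loci are monomorphic; F_ST is undefined")
    return (h_t_sum - h_s_sum) / h_t_sum

identical = pairwise_fst([(0.5, 0.5)])      # same frequencies in both populations
diverged = pairwise_fst([(0.2, 0.8)])       # strongly different frequencies
fixed = pairwise_fst([(1.0, 0.0)])          # alternative alleles fixed
```

    Values near 0 correspond to the "no significant population substructure" case described for food-crop populations, while larger values indicate differentiation; real analyses would add sample-size corrections and a significance test.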

  7. Northern Bobwhite (Colinus virginianus) Mitochondrial Population Genomics Reveals Structure, Divergence, and Evidence for Heteroplasmy.

    PubMed

    Halley, Yvette A; Oldeschulte, David L; Bhattarai, Eric K; Hill, Joshua; Metz, Richard P; Johnson, Charles D; Presley, Steven M; Ruzicka, Rebekah E; Rollins, Dale; Peterson, Markus J; Murphy, William J; Seabury, Christopher M

    2015-01-01

    Herein, we evaluated the concordance of population inferences and conclusions resulting from the analysis of short mitochondrial fragments (i.e., partial or complete D-Loop nucleotide sequences) versus complete mitogenome sequences for 53 bobwhites representing six ecoregions across TX and OK (USA). Median joining (MJ) haplotype networks demonstrated that analyses performed using small mitochondrial fragments were insufficient for estimating the true (i.e., complete) mitogenome haplotype structure, corresponding levels of divergence, and maternal population history of our samples. Notably, discordant demographic inferences were observed when mismatch distributions of partial (i.e., partial D-Loop) versus complete mitogenome sequences were compared, with the reduction in mitochondrial genomic information content observed to encourage spurious inferences in our samples. A probabilistic approach to variant prediction for the complete bobwhite mitogenomes revealed 344 segregating sites corresponding to 347 total mutations, including 49 putative nonsynonymous single nucleotide variants (SNVs) distributed across 12 protein coding genes. Evidence of gross heteroplasmy was observed for 13 bobwhites, with 10 of the 13 heteroplasmies involving one moderate to high frequency SNV. Haplotype network and phylogenetic analyses for the complete bobwhite mitogenome sequences revealed two divergent maternal lineages (dXY = 0.00731; FST = 0.849; P < 0.05), thereby supporting the potential for two putative subspecies. However, the diverged lineage (n = 103 variants) almost exclusively involved bobwhites geographically classified as Colinus virginianus texanus, which is discordant with the expectations of previous geographic subspecies designations. Tests of adaptive evolution for functional divergence (MKT), frequency distribution tests (D, FS) and phylogenetic analyses (RAxML) provide no evidence for positive selection or hybridization with the sympatric scaled quail (Callipepla

  8. Northern Bobwhite (Colinus virginianus) Mitochondrial Population Genomics Reveals Structure, Divergence, and Evidence for Heteroplasmy

    PubMed Central

    Halley, Yvette A.; Oldeschulte, David L.; Bhattarai, Eric K.; Hill, Joshua; Metz, Richard P.; Johnson, Charles D.; Presley, Steven M.; Ruzicka, Rebekah E.; Rollins, Dale; Peterson, Markus J.; Murphy, William J.; Seabury, Christopher M.

    2015-01-01

    Herein, we evaluated the concordance of population inferences and conclusions resulting from the analysis of short mitochondrial fragments (i.e., partial or complete D-Loop nucleotide sequences) versus complete mitogenome sequences for 53 bobwhites representing six ecoregions across TX and OK (USA). Median joining (MJ) haplotype networks demonstrated that analyses performed using small mitochondrial fragments were insufficient for estimating the true (i.e., complete) mitogenome haplotype structure, corresponding levels of divergence, and maternal population history of our samples. Notably, discordant demographic inferences were observed when mismatch distributions of partial (i.e., partial D-Loop) versus complete mitogenome sequences were compared, with the reduction in mitochondrial genomic information content observed to encourage spurious inferences in our samples. A probabilistic approach to variant prediction for the complete bobwhite mitogenomes revealed 344 segregating sites corresponding to 347 total mutations, including 49 putative nonsynonymous single nucleotide variants (SNVs) distributed across 12 protein coding genes. Evidence of gross heteroplasmy was observed for 13 bobwhites, with 10 of the 13 heteroplasmies involving one moderate to high frequency SNV. Haplotype network and phylogenetic analyses for the complete bobwhite mitogenome sequences revealed two divergent maternal lineages (dXY = 0.00731; FST = 0.849; P < 0.05), thereby supporting the potential for two putative subspecies. However, the diverged lineage (n = 103 variants) almost exclusively involved bobwhites geographically classified as Colinus virginianus texanus, which is discordant with the expectations of previous geographic subspecies designations. Tests of adaptive evolution for functional divergence (MKT), frequency distribution tests (D, FS) and phylogenetic analyses (RAxML) provide no evidence for positive selection or hybridization with the sympatric scaled quail (Callipepla

  9. Analysis of correlation structures in the Synechocystis PCC6803 genome.

    PubMed

    Wu, Zuo-Bing

    2014-12-01

    Transfer of nucleotide strings in the Synechocystis sp. PCC6803 genome is investigated to exhibit periodic and non-periodic correlation structures by using the recurrence plot method and the phase space reconstruction technique. The periodic correlation structures are generated by periodic transfer of several substrings in long periodic or non-periodic nucleotide strings embedded in the coding regions of genes. The non-periodic correlation structures are generated by non-periodic transfer of several substrings covering or overlapping with the coding regions of genes. In the periodic and non-periodic transfer, some gaps divide the long nucleotide strings into the substrings and prevent their global transfer. Most of the gaps are either the replacement of one base or the insertion/reduction of one base. In the reconstructed phase space, the points generated from two or three steps for the continuous iterative transfer via the second maximal distance can be fitted by two lines. It partly reveals an intrinsic dynamics in the transfer of nucleotide strings. Due to the comparison of the relative positions and lengths, the substrings concerned with the non-periodic correlation structures are almost identical to the mobile elements annotated in the genome. The mobile elements are thus endowed with the basic results on the correlation structures.

  10. RNA structures, genomic organization and selection of recombinant HIV.

    PubMed

    Simon-Loriere, Etienne; Rossolillo, Paola; Negroni, Matteo

    2011-01-01

    Recombination is an evolutionary mechanism intrinsic to the evolution of many RNA viruses. In retroviruses and notably in the case of HIV, recombination is so frequent that it can be considered as part of its mode of replication. This process not only plays a central role in shaping HIV genetic diversity worldwide, but has also been involved in immune escape and development of resistance to antiviral treatments. Recombination does not create new mutations in the existing genetic repertoire of the virus, but creates new combinations of pre-existing polymorphisms. The simultaneous insertion of multiple substitutions in a single replication cycle leaves little room for the progressive coevolution of regions of proteins, RNA or, more in general, genomes, to accommodate these drastic sequence changes. Therefore, recombination, while allowing the virus to rapidly explore larger sequence space than the slow accumulation of point mutations, also runs the risk of generating non functional viruses. Recombination is the consequence of a switch in the template used during reverse transcription and is promoted by the presence of structured regions in the genomic RNA template. In this review, we discuss new observations suggesting that the distribution of RNA structures along the HIV genome may enhance recombination rates in regions where the resultant progeny is less likely to be impaired, and could therefore maximize the evolutionary value of this source of genetic diversity.

  11. The phylogenetic position of eriophyoid mites (superfamily Eriophyoidea) in Acariformes inferred from the sequences of mitochondrial genomes and nuclear small subunit (18S) rRNA gene.

    PubMed

    Xue, Xiao-Feng; Dong, Yan; Deng, Wei; Hong, Xiao-Yue; Shao, Renfu

    2017-04-01

    Eriophyoid mites (superfamily Eriophyoidea) comprise >4400 species worldwide. Despite over a century of study, the phylogenetic position of these mites within Acariformes is still poorly resolved. Currently, Eriophyoidea is placed in the order Trombidiformes. We inferred the high-level phylogeny of Acari with the mitochondrial (mt) genome sequences of 110 species including four eriophyoid species, and the nuclear small subunit (18S) rRNA gene sequences of 226 species including 25 eriophyoid species. Maximum likelihood (ML), Bayesian inference (BI) and Maximum parsimony (MP) methods were used to analyze the sequence data. Divergence times were estimated for major lineages of Acari using Bayesian approaches. Our analyses consistently recovered the monophyly of Eriophyoidea but rejected the monophyly of Trombidiformes. The eriophyoid mites were grouped with the sarcoptiform mites, or were the sister group of sarcoptiform mites+non-eriophyoid trombidiform mites, depending on data partition strategies. Eriophyoid mites diverged from other mites in the Devonian (384Mya, 95% HPD, 352-410Mya). The origin of eriophyoid mites was dated to the Permian (262Mya, 95% HPD 230-307Mya), mostly prior to the radiation of gymnosperms (Triassic-Jurassic) and angiosperms (early Cretaceous). We propose that the placement of Eriophyoidea in the order Trombidiformes under the current classification system should be reviewed.

  12. Evidence of pervasive biologically functional secondary structures within the genomes of eukaryotic single-stranded DNA viruses.

    PubMed

    Muhire, Brejnev Muhizi; Golden, Michael; Murrell, Ben; Lefeuvre, Pierre; Lett, Jean-Michel; Gray, Alistair; Poon, Art Y F; Ngandu, Nobubelo Kwanele; Semegni, Yves; Tanov, Emil Pavlov; Monjane, Adérito Luis; Harkins, Gordon William; Varsani, Arvind; Shepherd, Dionne Natalie; Martin, Darren Patrick

    2014-02-01

    Single-stranded DNA (ssDNA) viruses have genomes that are potentially capable of forming complex secondary structures through Watson-Crick base pairing between their constituent nucleotides. A few of the structural elements formed by such base pairings are, in fact, known to have important functions during the replication of many ssDNA viruses. Unknown, however, are (i) whether numerous additional ssDNA virus genomic structural elements predicted to exist by computational DNA folding methods actually exist and (ii) whether those structures that do exist have any biological relevance. We therefore computationally inferred lists of the most evolutionarily conserved structures within a diverse selection of animal- and plant-infecting ssDNA viruses drawn from the families Circoviridae, Anelloviridae, Parvoviridae, Nanoviridae, and Geminiviridae and analyzed these for evidence of natural selection favoring the maintenance of these structures. While we find evidence that is consistent with purifying selection being stronger at nucleotide sites that are predicted to be base paired than at sites predicted to be unpaired, we also find strong associations between sites that are predicted to pair with one another and site pairs that are apparently coevolving in a complementary fashion. Collectively, these results indicate that natural selection actively preserves much of the pervasive secondary structure that is evident within eukaryote-infecting ssDNA virus genomes and, therefore, that much of this structure is biologically functional. Lastly, we provide examples of various highly conserved but completely uncharacterized structural elements that likely have important functions within some of the ssDNA virus genomes analyzed here.

  13. TSTMP: target selection for structural genomics of human transmembrane proteins

    PubMed Central

    Varga, Julia; Dobson, László; Reményi, István; Tusnády, Gábor E.

    2017-01-01

    The TSTMP database is designed to help the target selection of human transmembrane proteins for structural genomics projects and structure modeling studies. Currently, there are only 60 known 3D structures among the polytopic human transmembrane proteins and about a further 600 could be modeled using existing structures. Although there are a great number of human transmembrane protein structures left to be determined, surprisingly only a small fraction of these proteins have ‘selected’ (or above) status according to the current version of the TargetDB/TargetTrack database. This figure is even worse regarding those transmembrane proteins that would contribute the most to the structural coverage of the human transmembrane proteome. The database was built by sorting out proteins from the human transmembrane proteome with known structure and searching for suitable model structures for the remaining proteins by combining the results of a state-of-the-art transmembrane specific fold recognition algorithm and a sequence similarity search algorithm. Proteins were searched for homologues among the human transmembrane proteins in order to select targets whose successful structure determination would lead to the best structural coverage of the human transmembrane proteome. The pipeline constructed for creating the TSTMP database guarantees to keep the database up-to-date. The database is available at http://tstmp.enzim.ttk.mta.hu. PMID:27924015

  14. Polytene Chromosomal Maps of 11 Drosophila Species: The Order of Genomic Scaffolds Inferred From Genetic and Physical Maps

    PubMed Central

    Schaeffer, Stephen W.; Bhutkar, Arjun; McAllister, Bryant F.; Matsuda, Muneo; Matzkin, Luciano M.; O'Grady, Patrick M.; Rohde, Claudia; Valente, Vera L. S.; Aguadé, Montserrat; Anderson, Wyatt W.; Edwards, Kevin; Garcia, Ana C. L.; Goodman, Josh; Hartigan, James; Kataoka, Eiko; Lapoint, Richard T.; Lozovsky, Elena R.; Machado, Carlos A.; Noor, Mohamed A. F.; Papaceit, Montserrat; Reed, Laura K.; Richards, Stephen; Rieger, Tania T.; Russo, Susan M.; Sato, Hajime; Segarra, Carmen; Smith, Douglas R.; Smith, Temple F.; Strelets, Victor; Tobari, Yoshiko N.; Tomimura, Yoshihiko; Wasserman, Marvin; Watts, Thomas; Wilson, Robert; Yoshida, Kiyohito; Markow, Therese A.; Gelbart, William M.; Kaufman, Thomas C.

    2008-01-01

    The sequencing of the 12 genomes of members of the genus Drosophila was taken as an opportunity to reevaluate the genetic and physical maps for 11 of the species, in part to aid in the mapping of assembled scaffolds. Here, we present an overview of the importance of cytogenetic maps to Drosophila biology and to the concepts of chromosomal evolution. Physical and genetic markers were used to anchor the genome assembly scaffolds to the polytene chromosomal maps for each species. In addition, a computational approach was used to anchor smaller scaffolds on the basis of the analysis of syntenic blocks. We present the chromosomal map data from each of the 11 sequenced non-Drosophila melanogaster species as a series of sections. Each section reviews the history of the polytene chromosome maps for each species, presents the new polytene chromosome maps, and anchors the genomic scaffolds to the cytological maps using genetic and physical markers. The mapping data agree with Muller's idea that the majority of Drosophila genes are syntenic. Despite the conservation of genes within homologous chromosome arms across species, the karyotypes of these species have changed through the fusion of chromosomal arms followed by subsequent rearrangement events. PMID:18622037

  15. Refolding strategies from inclusion bodies in a structural genomics project.

    PubMed

    Trésaugues, Lionel; Collinet, Bruno; Minard, Philippe; Henckes, Gilles; Aufrère, Robert; Blondeau, Karine; Liger, Dominique; Zhou, Cong-Zhao; Janin, Joël; Van Tilbeurgh, Herman; Quevillon-Cheruel, Sophie

    2004-01-01

    The South-Paris Yeast Structural Genomics Project aims at systematically expressing, purifying and determining the structure of S. cerevisiae proteins with no detectable homology to proteins of known structure. We brought 250 yeast ORFs to expression in E. coli, but 37% of them form inclusion bodies. This important fraction of proteins that are well expressed but lost for structural studies prompted us to test methodologies to recover these proteins. Three different strategies were explored in parallel on a set of 20 proteins: (1) refolding from solubilized inclusion bodies using an original and fast 96-well plates screening test, (2) co-expression of the targets in E. coli with DnaK-DnaJ-GrpE and GroEL-GroES chaperones, and (3) use of the cell-free expression system. Most of the tested proteins (17/20) could be resolubilized at least by one approach, but the subsequent purification proved to be difficult for most of them.

  16. X-ray scattering data and structural genomics

    NASA Astrophysics Data System (ADS)

    Doniach, Sebastian

    2003-03-01

    High throughput structural genomics has the ambitious goal of determining the structure of all, or a very large number of protein folds using the high-resolution techniques of protein crystallography and NMR. However, the program is facing significant bottlenecks in reaching this goal, which include problems of protein expression and crystallization. In this talk, some preliminary results on how the low-resolution technique of small-angle X-ray solution scattering (SAXS) can help ameliorate some of these bottlenecks will be presented. One of the most significant bottlenecks arises from the difficulty of crystallizing integral membrane proteins, where only a handful of structures are available compared to thousands of structures for soluble proteins. By 3-dimensional reconstruction from SAXS data, the size and shape of detergent-solubilized integral membrane proteins can be characterized. This information can then be used to classify membrane proteins which constitute some 25% of all genomes. SAXS may also be used to study the dependence of interparticle interference scattering on solvent conditions so that regions of the protein solution phase diagram which favor crystallization can be elucidated. As a further application, SAXS may be used to provide physical constraints on computational methods for protein structure prediction based on primary sequence information. This in turn can help in identifying structural homologs of a given protein, which can then give clues to its function. D. Walther, F. Cohen and S. Doniach. "Reconstruction of low resolution three-dimensional density maps from one-dimensional small angle x-ray scattering data for biomolecules." J. Appl. Cryst. 33(2):350-363 (2000). Protein structure prediction constrained by solution X-ray scattering data and structural homology identification Zheng WJ, Doniach S JOURNAL OF MOLECULAR BIOLOGY , v. 316(#1) pp. 173-187 FEB 8, 2002

  17. Structural Genomics: From Genes to Structures With Valuable Materials And Many Questions in Between

    SciTech Connect

    Fox, B.G.; Goulding, C.; Malkowski, M.G.; Stewart, L.; Deacon, A.; /SLAC, SSRL

    2009-04-30

    The Protein Structure Initiative (PSI), funded by the US National Institutes of Health (NIH), provides a framework for the development and systematic evaluation of methods to solve protein structures. Although the PSI and other structural genomics efforts around the world have led to the solution of many new protein structures as well as the development of new methods, methodological bottlenecks still exist and are being addressed in this 'production phase' of PSI.

  18. USING CORONAL CELLS TO INFER THE MAGNETIC FIELD STRUCTURE AND CHIRALITY OF FILAMENT CHANNELS

    SciTech Connect

    Sheeley, N. R. Jr.; Warren, H. P.; Martin, S. F.; Panasenco, O.

    2013-08-01

    Coronal cells are visible at temperatures of ~1.2 MK in Fe XII coronal images obtained from the Solar Dynamics Observatory and Solar Terrestrial Relations Observatory spacecraft. We show that near a filament channel, the plumelike tails of these cells bend horizontally in opposite directions on the two sides of the channel like fibrils in the chromosphere. Because the cells are rooted in magnetic flux concentrations of majority polarity, these observations can be used with photospheric magnetograms to infer the direction of the horizontal field in filament channels and the chirality of the associated magnetic field. This method is similar to the procedure for inferring the direction of the magnetic field and the chirality of the fibril pattern in filament channels from Hα observations. However, the coronal cell observations are easier to use and provide clear inferences of the horizontal field direction for heights up to ~50 Mm into the corona.

  19. Oceanic Domains - Observed Relationship With Tomographic Features and Inferred Mantle Structure

    NASA Astrophysics Data System (ADS)

    Loubet, M.

    A persistent contradiction exists between the current views of mantle stratification derived from geochemistry and number of geophysical and simulations which suggest the existence of a significant material exchange throughout the entire mantle and favor mixing processes. In this presentation, we will show that the common interpretation of oceanic basalt heterogeneities can be contested and that a new interpretation of these heterogeneities can be done which leads to interesting relationships between geochemical and geophysical (tomographic) features. The new approach is based on (a) identification of mantle heterogeneities at the scale of oceanic domains recovering in some cases MORB and OIB basalt types and (b) use of incompatible element ratios in (Cx/Cz,Cy/Cz) representations as in particular the (Th/La,Nb/La) representation. This last representation is very interesting for identification of magmatic processes and for estimating magma sources compositions. Analysis of oceanic basalts compositions based on a large set of literature data leads to identify 4 (eventually 5) large scale oceanic domains: Atlantic East Pacific (AEP), Indian ocean (IO), South Central Pacific (SCP), Kerguelen South Atlantic (KSA) (and eventually Hawaï (H)). The two first ones which include MORB sources extend at upper mantle levels. The good geographical recovery of the SCP and KSA domains with tomographic features assigned to take place within the mantle at the D" level in the Central Pacific and South Africa (Masters et al., 2000) leads to interpret the basalts from the KSA and SCP domains as issued from D" layer source. Two different mantle structures (general ones before discussing more complex ones), both comprising a D" layer (composed of recycled oceanic crust enriched materials) at the CMB, can be inferred from these oceanic basalt source interpretations: (a) a layered mantle with an upper and a lower mantle with primitive mantle material composing a significant part of

  20. The Seattle Structural Genomics Center for Infectious Disease (SSGCID)

    PubMed Central

    Myler, P.J.; Stacy, R.; Stewart, L.; Staker, B.L.; Van Voorhis, W.C.; Varani, G.; Buchko, G.W.

    2010-01-01

    The NIAID-funded Seattle Structural Genomics Center for Infectious Disease (SSGCID) is a consortium established to apply structural genomics approaches to potential drug targets from NIAID priority organisms for biodefense and emerging and re-emerging diseases. The mission of the SSGCID is to determine ~400 protein structures over five years ending in 2012. In order to maximize biomedical impact, ligand-based drug-lead discovery campaigns will be pursued for a small number of high-impact targets. Here we review the center’s target selection processes, which include pro-active engagement of the infectious disease research and drug therapy communities to identify drug targets, essential enzymes, virulence factors and vaccine candidates of biomedical relevance to combat infectious diseases. This is followed by a brief overview of the SSGCID structure determination pipeline and ligand screening methodology. Finally, specifics of our resources available to the scientific community are presented. Physical materials and data produced by SSGCID will be made available to the scientific community, with the aim that they will provide essential groundwork benefiting future research and drug discovery. PMID:19594426

  1. The evolution of chloroplast genome structure in ferns.

    PubMed

    Wolf, Paul G; Roper, Jessie M; Duffy, Aaron M

    2010-09-01

    The plastid genome (plastome) is a rich source of phylogenetic and other comparative data in plants. Most land plants possess a plastome of similar structure. However, in a major group of plants, the ferns, a unique plastome structure has evolved. The gene order in ferns has been explained by a series of genomic inversions relative to the plastome organization of seed plants. Here, we examine for the first time the structure of the plastome across fern phylogeny. We used a PCR-based strategy to map and partially sequence plastomes. We found that a pair of partially overlapping inversions in the region of the inverted repeat occurred in the common ancestor of most ferns. However, the ancestral (seed plant) structure is still found in early diverging branches leading to the osmundoid and filmy fern lineages. We found that a second pair of overlapping inversions occurred on a branch leading to the core leptosporangiates. We also found that the unique placement of the gene matK in ferns (lacking a flanking intron) is not a result of a large-scale inversion, as previously thought. This is because the intron loss maps to an earlier point on the phylogeny than the nearby inversion. We speculate on why inversions may occur in pairs and what this may mean for the dynamics of plastome evolution.

  2. Target selection and deselection at the Berkeley Structural Genomics Center.

    PubMed

    Chandonia, John-Marc; Kim, Sung-Hou; Brenner, Steven E

    2006-02-01

    At the Berkeley Structural Genomics Center (BSGC), our goal is to obtain a near-complete structural complement of proteins in the minimal organisms Mycoplasma genitalium and M. pneumoniae, two closely related pathogens. Current targets for structure determination have been selected in six major stages, starting with those predicted to be most tractable to high throughput study and likely to yield new structural information. We report on the process used to select these proteins, as well as our target deselection procedure. Target deselection reduces experimental effort by eliminating targets similar to those recently solved by the structural biology community or other centers. We measure the impact of the 69 structures solved at the BSGC as of July 2004 on structure prediction coverage of the M. pneumoniae and M. genitalium proteomes. The number of Mycoplasma proteins for which the fold could first be reliably assigned based on structures solved at the BSGC (24 M. pneumoniae and 21 M. genitalium) is approximately 25% of the total resulting from work at all structural genomics centers and the worldwide structural biology community (94 M. pneumoniae and 86 M. genitalium) during the same period. As the number of structures contributed by the BSGC during that period is less than 1% of the total worldwide output, the benefits of a focused target selection strategy are apparent. If the structures of all current targets were solved, the percentage of M. pneumoniae proteins for which folds could be reliably assigned would increase from approximately 57% (391 of 687) at present to around 80% (550 of 687), and the percentage of the proteome that could be accurately modeled would increase from around 37% (254 of 687) to about 64% (438 of 687). In M. genitalium, the percentage of the proteome that could be structurally annotated based on structures of our remaining targets would rise from 72% (348 of 486) to around 76% (371 of 486), with the percentage of accurately modeled

  3. The Impact of Structural Genomics: the First Quindecennial

    PubMed Central

    Grabowski, Marek; Niedzialkowska, Ewa; Zimmerman, Matthew D.; Minor, Wladek

    2016-01-01

    The period 2000–2015 brought the advent of high-throughput approaches to protein structure determination. With the overall funding on the order of $2 billion (in 2010 dollars), the structural genomics (SG) consortia established worldwide have developed pipelines for target selection, protein production, sample preparation, crystallization, and structure determination by X-ray crystallography and NMR. These efforts resulted in the determination of over 13,500 protein structures, mostly from unique protein families, and increased the structural coverage of the expanding protein universe. SG programs contributed over 4,400 publications to the scientific literature. The NIH-funded Protein Structure Initiatives (PSI) alone have produced over 2,000 scientific publications, which to date have attracted more than 93,000 citations. Software and database developments that were necessary to handle high-throughput structure determination workflows have led to structures of better quality and improved integrity of the associated data. Organized and accessible data have a positive impact on the reproducibility of scientific experiments. Most of the experimental data generated by the SG centers are freely available to the community and have been utilized by scientists in various fields of research. SG projects have created, improved, streamlined, and validated many protocols for protein production and crystallization, data collection, and functional analysis, significantly benefiting biological and biomedical research. PMID:26935210

  4. The impact of structural genomics: the first quindecennial.

    PubMed

    Grabowski, Marek; Niedzialkowska, Ewa; Zimmerman, Matthew D; Minor, Wladek

    2016-03-01

    The period 2000-2015 brought the advent of high-throughput approaches to protein structure determination. With the overall funding on the order of $2 billion (in 2010 dollars), the structural genomics (SG) consortia established worldwide have developed pipelines for target selection, protein production, sample preparation, crystallization, and structure determination by X-ray crystallography and NMR. These efforts resulted in the determination of over 13,500 protein structures, mostly from unique protein families, and increased the structural coverage of the expanding protein universe. SG programs contributed over 4400 publications to the scientific literature. The NIH-funded Protein Structure Initiatives alone have produced over 2000 scientific publications, which to date have attracted more than 93,000 citations. Software and database developments that were necessary to handle high-throughput structure determination workflows have led to structures of better quality and improved integrity of the associated data. Organized and accessible data have a positive impact on the reproducibility of scientific experiments. Most of the experimental data generated by the SG centers are freely available to the community and have been utilized by scientists in various fields of research. SG projects have created, improved, streamlined, and validated many protocols for protein production and crystallization, data collection, and functional analysis, significantly benefiting biological and biomedical research.

  5. A spruce gene map infers ancient plant genome reshuffling and subsequent slow evolution in the gymnosperm lineage leading to extant conifers

    PubMed Central

    2012-01-01

    Background Seed plants are composed of angiosperms and gymnosperms, which diverged from each other around 300 million years ago. While much light has been shed on the mechanisms and rate of genome evolution in flowering plants, such knowledge remains conspicuously meagre for the gymnosperms. Conifers are key representatives of gymnosperms and the sheer size of their genomes represents a significant challenge for characterization, sequencing and assembling. Results To gain insight into the macro-organisation and long-term evolution of the conifer genome, we developed a genetic map involving 1,801 spruce genes. We designed a statistical approach based on kernel density estimation to analyse gene density and identified seven gene-rich isochors. Groups of co-localizing genes were also found that were transcriptionally co-regulated, indicative of functional clusters. Phylogenetic analyses of 157 gene families for which at least two duplicates were mapped on the spruce genome indicated that ancient gene duplicates shared by angiosperms and gymnosperms outnumbered conifer-specific duplicates by a ratio of eight to one. Ancient duplicates were much more translocated within and among spruce chromosomes than conifer-specific duplicates, which were mostly organised in tandem arrays. Both high synteny and collinearity were also observed between the genomes of spruce and pine, two conifers that diverged more than 100 million years ago. Conclusions Taken together, these results indicate that much genomic evolution has occurred in the seed plant lineage before the split between gymnosperms and angiosperms, and that the pace of evolution of the genome macro-structure has been much slower in the gymnosperm lineage leading to extant conifers than that seen for the same period of time in flowering plants. This trend is largely congruent with the contrasted rates of diversification and morphological evolution observed between these two groups of seed plants. PMID:23102090

  6. A Roadmap for Functional Structural Variants in the Soybean Genome

    PubMed Central

    Anderson, Justin E.; Kantar, Michael B.; Kono, Thomas Y.; Fu, Fengli; Stec, Adrian O.; Song, Qijian; Cregan, Perry B.; Specht, James E.; Diers, Brian W.; Cannon, Steven B.; McHale, Leah K.; Stupar, Robert M.

    2014-01-01

    Gene structural variation (SV) has recently emerged as a key genetic mechanism underlying several important phenotypic traits in crop species. We screened a panel of 41 soybean (Glycine max) accessions serving as parents in a soybean nested association mapping population for deletions and duplications in more than 53,000 gene models. Array hybridization and whole genome resequencing methods were used as complementary technologies to identify SV in 1528 genes, or approximately 2.8%, of the soybean gene models. Although SV occurs throughout the genome, SV enrichment was noted in families of biotic defense response genes. Among accessions, SV was nearly eightfold less frequent for gene models that have retained paralogs since the last whole genome duplication event, compared with genes that have not retained paralogs. Increases in gene copy number, similar to that described at the Rhg1 resistance locus, account for approximately one-fourth of the genic SV events. This assessment of soybean SV occurrence presents a target list of genes potentially responsible for rapidly evolving and/or adaptive traits. PMID:24855315

  7. Corynebacterium diphtheriae: genome diversity, population structure and genotyping perspectives.

    PubMed

    Mokrousov, Igor

    2009-01-01

    The epidemic re-emergence of diphtheria in Russia and the Newly Independent States (NIS) of the former Soviet Union in the 1990s demonstrated the continued threat of this disease, once thought to be rare. The bacteriophage-encoded toxin is a main virulence factor of Corynebacterium diphtheriae; however, an analysis of the first complete genome sequence of C. diphtheriae revealed a recent acquisition of other pathogenicity factors, including iron-uptake systems, adhesins and fimbrial proteins, as this extracellular pathogen has more possibilities for lateral gene transfer than, e.g., its close relative, the mainly intracellular Mycobacterium tuberculosis. C. diphtheriae appears to have a phylogeographical structure mainly represented by area-specific variants whose circulation is under strong influence of human host factors, including health control measures (first of all, vaccination) and socioeconomic conditions. This core population structure may be challenged by importation of endemic and eventually toxigenic strains from new areas, leading to localized or large epidemics caused directly by imported strains or by bacteriophage-lysogenized indigenous strains converted to toxin production. A feature of C. diphtheriae co-existence with humans is its periodicity: following the large epidemic in the 1990s, the present period is marked by increasing heterogeneity of the circulating populations, whereas the re-emergence of new toxigenic variants along with persistent circulation of invasive non-toxigenic strains appears alarming. To identify and rapidly monitor subtle changes in the genome structure at an infraclonal level during and between epidemics, portable and discriminatory typing methods for C. diphtheriae are still needed. In this view, CRISPRs and minisatellites are promising genomic markers for development of high-resolution typing schemes and databasing of C. diphtheriae.

  8. The complete mitochondrial genome structure of the jaguar (Panthera onca).

    PubMed

    Caragiulo, Anthony; Dougherty, Eric; Soto, Sofia; Rabinowitz, Salisa; Amato, George

    2016-01-01

    The jaguar (Panthera onca) is the largest felid in the Western hemisphere, and the only member of the Panthera genus in the New World. The jaguar inhabits most countries within Central and South America, and is considered near threatened by the International Union for the Conservation of Nature. This study represents the first sequence of the entire jaguar mitogenome, which was the only Panthera mitogenome that had not been sequenced. The jaguar mitogenome is 17,049 bases and possesses the same molecular structure as other felid mitogenomes. Bayesian inference (BI) and maximum likelihood (ML) were used to determine the phylogenetic placement of the jaguar within the Panthera genus. Both BI and ML analyses revealed the jaguar to be sister to the tiger/leopard/snow leopard clade.

  9. Population structure and minimum core genome typing of Legionella pneumophila

    PubMed Central

    Qin, Tian; Zhang, Wen; Liu, Wenbin; Zhou, Haijian; Ren, Hongyu; Shao, Zhujun; Lan, Ruiting; Xu, Jianguo

    2016-01-01

    Legionella pneumophila is an important human pathogen causing Legionnaires’ disease. In this study, whole genome sequencing (WGS) was used to study the characteristics and population structure of L. pneumophila strains. We sequenced and compared 53 isolates of L. pneumophila covering different serogroups and sequence-based typing (SBT) types (STs). We found that 1,896 single-copy orthologous genes were shared by all isolates and were defined as the minimum core genome (MCG) of L. pneumophila. A total of 323,224 single-nucleotide polymorphisms (SNPs) were identified among the 53 strains. After excluding 314,059 SNPs which were likely to be results of recombination, the remaining 9,165 SNPs were referred to as MCG SNPs. Population Structure analysis based on MCG divided the 53 L. pneumophila into nine MCG groups. The within-group distances were much smaller than the between-group distances, indicating considerable divergence between MCG groups. MCG groups were also supported by phylogenetic analysis and may be considered as robust taxonomic units within L. pneumophila. Among the nine MCG groups, eight showed high intracellular growth ability while one showed low intracellular growth ability. Furthermore, MCG typing also showed high resolution in subtyping ST1 strains. The results obtained in this study provided significant insights into the evolution, population structure and pathogenicity of L. pneumophila. PMID:26888563
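
    The SNP counts reported above can be cross-checked with a few lines of arithmetic. This is only a sketch using the three figures given in the abstract; the variable names are illustrative, and the fraction removed is a derived value that the abstract does not state.

```python
# Figures quoted in the abstract.
total_snps = 323_224        # SNPs identified among the 53 isolates
recombinant_snps = 314_059  # SNPs attributed to recombination and excluded

# SNPs retained for minimum core genome (MCG) typing.
mcg_snps = total_snps - recombinant_snps

# Derived value: share of all SNPs removed by the recombination filter.
fraction_removed = recombinant_snps / total_snps

print(mcg_snps)
print(f"{fraction_removed:.1%}")
```

    The difference matches the 9,165 MCG SNPs reported, and it shows that the large majority of the detected SNPs were set aside as likely recombination products.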

  10. Structure and Functional Studies of DEN-2 Virus Genome.

    DTIC Science & Technology

    1982-09-01

    Structure and Functional Studies on Dengue-2 Virus Genome. Progress Report, 1 Mar 82 - 1 Sep 82...Complementary DNA synthesis of Dengue-2 RNA by avian reverse transcriptase in vitro. The size of the DNA copy of Dengue RNA is in...Abstract 1. Dengue-2 RNA (DEN-2 RNA) was extracted

  11. Constraints on the structure and dynamics of the Earth's deep interior inferred from nutation observations

    NASA Astrophysics Data System (ADS)

    Koot, L.

    2012-12-01

    The gravitational torque applied on the Earth by the other celestial bodies generates periodic variations in the orientation of the Earth's rotation axis in space which are called nutations. This motion has two normal modes, the Free Core Nutation (FCN) and the Free Inner Core Nutation (FICN), of which the frequencies and dampings depend directly on the Earth's interior structure and dynamics (e.g. Mathews et al. 1991a, 1991b, Mathews & Shapiro 1992). Both normal modes are characterized by differential rotations of the inner core, the outer core, and the mantle. Their natural frequencies are thus directly affected both by the strength of the mechanical coupling at the outer core boundaries and by the way the three regions deform due to the action of centrifugal forces. Similarly, the damping of the modes reflects the energy dissipated both through the couplings at the outer core boundaries and through anelastic deformation. The mechanical coupling can be of several physical origins such as gravitational, electromagnetic, viscous, or pressure/topographic couplings. Due to the high precision of the nutation observations, obtained from the Very Long Baseline Interferometry (VLBI) technique, the frequency and damping of the normal modes can be estimated from the resonance effect they induce on the forced nutations (Mathews et al. 2002, Koot et al. 2008, 2010). Interpretation of these estimated natural frequencies and dampings allows then for insights into the deep Earth's physical properties. In this talk, we review the constraints that have been inferred from nutation observations on deep Earth's properties such as the intensity of the magnetic field at the outer core boundaries (Buffett et al. 2002, Koot et al. 2010, Buffett 2010a), the viscosity of the core fluid close to those boundaries (Mathews & Guo 2005, Deleplace & Cardin 2006, Koot et al. 2010), the chemical stratification at the top of the core (Buffett 2010b), and the viscosity of the inner core (Koot

  12. Complete Genome and Molecular Epidemiological Data Infer the Maintenance of Rabies among Kudu (Tragelaphus strepsiceros) in Namibia

    PubMed Central

    Scott, Terence P.; Fischer, Melina; Khaiseb, Siegfried; Freuling, Conrad; Höper, Dirk; Hoffmann, Bernd; Markotter, Wanda; Müller, Thomas; Nel, Louis H.

    2013-01-01

    Rabies in kudu is unique to Namibia and two major peaks in the epizootic have occurred since it was first noted in 1977. Due to the large numbers of kudu that were affected, it was suspected that horizontal transmission of rabies occurs among kudu and that rabies was being maintained independently within the Namibian kudu population – separate from canid cycles, despite geographic overlap. In this study, it was our aim to show, through phylogenetic analyses, that rabies was being maintained independently within the Namibian kudu population. We also tested, through complete genome sequencing of four rabies virus isolates from jackal and kudu, whether specific mutations occurred in the virus genome due to host adaptation. We found that all rabies isolates from kudu grouped separately from those of any canid species in Namibia, suggesting that rabies was being maintained independently in kudu. Additionally, we noted several mutations unique to isolates from kudu, suggesting that these mutations may be due to the adaptation of rabies to a new host. In conclusion, we show clear evidence that rabies is being maintained independently in the Namibian kudu population – a unique phenomenon with ecological and economic impacts. PMID:23527015

  13. A method and knowledge base for automated inference of patient problems from structured data in an electronic medical record

    PubMed Central

    Pang, Justine; Feblowitz, Joshua C; Maloney, Francine L; Wilcox, Allison R; Ramelson, Harley Z; Schneider, Louise I; Bates, David W

    2011-01-01

    Background Accurate knowledge of a patient's medical problems is critical for clinical decision making, quality measurement, research, billing and clinical decision support. Common structured sources of problem information include the patient problem list and billing data; however, these sources are often inaccurate or incomplete. Objective To develop and validate methods of automatically inferring patient problems from clinical and billing data, and to provide a knowledge base for inferring problems. Study design and methods We identified 17 target conditions and designed and validated a set of rules for identifying patient problems based on medications, laboratory results, billing codes, and vital signs. A panel of physicians provided input on a preliminary set of rules. Based on this input, we tested candidate rules on a sample of 100 000 patient records to assess their performance compared to gold standard manual chart review. The physician panel selected a final rule for each condition, which was validated on an independent sample of 100 000 records to assess its accuracy. Results Seventeen rules were developed for inferring patient problems. Analysis using a validation set of 100 000 randomly selected patients showed high sensitivity (range: 62.8–100.0%) and positive predictive value (range: 79.8–99.6%) for most rules. Overall, the inference rules performed better than using either the problem list or billing data alone. Conclusion We developed and validated a set of rules for inferring patient problems. These rules have a variety of applications, including clinical decision support, care improvement, augmentation of the problem list, and identification of patients for research cohorts. PMID:21613643
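
    The two performance measures reported above, sensitivity and positive predictive value, are computed by comparing each rule's output with the gold-standard chart review. The following is a minimal sketch of that calculation; the function name and the six example records are hypothetical and do not come from the study.

```python
def evaluate_rule(predictions, gold):
    """Return (sensitivity, positive predictive value) for one inference rule.

    predictions: booleans, True where the rule inferred the problem.
    gold: booleans, True where manual chart review confirmed the problem.
    """
    tp = sum(1 for p, g in zip(predictions, gold) if p and g)
    fp = sum(1 for p, g in zip(predictions, gold) if p and not g)
    fn = sum(1 for p, g in zip(predictions, gold) if not p and g)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv

# Hypothetical outcomes for six patient records.
rule_fired   = [True, True, True, False, False, True]
chart_review = [True, True, False, True, False, True]

sens, ppv = evaluate_rule(rule_fired, chart_review)
print(sens, ppv)
```

    Sensitivity answers how many true problems the rule finds, while positive predictive value answers how often a fired rule is correct, which is why the abstract reports a separate range for each.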

  14. The Complete Chloroplast Genome Sequence of Podocarpus lambertii: Genome Structure, Evolutionary Aspects, Gene Content and SSR Detection

    PubMed Central

    Vieira, Leila do Nascimento; Faoro, Helisson; Rogalski, Marcelo; Fraga, Hugo Pacheco de Freitas; Cardoso, Rodrigo Luis Alves; de Souza, Emanuel Maltempi; de Oliveira Pedrosa, Fábio; Nodari, Rubens Onofre; Guerra, Miguel Pedro

    2014-01-01

    Background Podocarpus lambertii (Podocarpaceae) is a native conifer from the Brazilian Atlantic Forest Biome, which is considered one of the 25 biodiversity hotspots in the world. The advancement of next-generation sequencing technologies has enabled the rapid acquisition of whole chloroplast (cp) genome sequences at low cost. Several studies have proven the potential of cp genomes as tools to understand enigmatic and basal phylogenetic relationships at different taxonomic levels, as well as further probe the structural and functional evolution of plants. In this work, we present the complete cp genome sequence of P. lambertii. Methodology/Principal Findings The P. lambertii cp genome is 133,734 bp in length, and similar to other sequenced cupressophytes, it lacks one of the large inverted repeat regions (IR). It contains 118 unique genes and one duplicated tRNA (trnN-GUU), which occurs as an inverted repeat sequence. The rps16 gene was not found, which was previously reported for the plastid genome of another Podocarpaceae (Nageia nagi) and Araucariaceae (Agathis dammara). Structurally, P. lambertii shows 4 inversions of a large DNA fragment ∼20,000 bp compared to the Podocarpus totara cp genome. These unexpected characteristics may be attributed to geographical distance and different adaptive needs. The P. lambertii cp genome presents a total of 28 tandem repeats and 156 SSRs, with homo- and dipolymers being the most common and tri-, tetra-, penta-, and hexapolymers occurring with less frequency. Conclusion The complete cp genome sequence of P. lambertii revealed significant structural changes, even in species from the same genus. These results reinforce the apparent loss of the rps16 gene in Podocarpaceae cp genomes. In addition, several SSRs in the P. lambertii cp genome are likely intraspecific polymorphism sites, which may allow highly sensitive phylogeographic and population structure studies, as well as phylogenetic studies of species of this genus. PMID

  15. Molecular phylogeny of the nettle family (Urticaceae) inferred from multiple loci of three genomes and extensive generic sampling.

    PubMed

    Wu, Zeng-Yuan; Monro, Alex K; Milne, Richard I; Wang, Hong; Yi, Ting-Shuang; Liu, Jie; Li, De-Zhu

    2013-12-01

    Urticaceae is one of the larger Angiosperm families, but relationships within it remain poorly known. This study presents the first densely sampled molecular phylogeny of Urticaceae, using maximum likelihood (ML), maximum parsimony (MP) and Bayesian inference (BI) to analyze the DNA sequence data from two nuclear (ITS and 18S), four chloroplast (matK, rbcL, rpll4-rps8-infA-rpl36, trnL-trnF) and one mitochondrial (matR) loci. We sampled 169 accessions representing 122 species, representing 47 of the 54 recognized genera within Urticaceae, including four of the six sometimes separated as Cecropiaceae. Major results included: (1) Urticaceae including Cecropiaceae was monophyletic; (2) Cecropiaceae was biphyletic, with both lineages nested within Urticaceae; (3) Urticaceae can be divided into four well-supported clades; (4) previously erected tribes or subfamilies were broadly supported, with some additions and alterations; (5) the monophyly of many genera was supported, whereas Boehmeria, Pellionia, Pouzolzia and Urera were clearly polyphyletic, while Urtica and Pilea each had a small genus nested within them; (6) relationships between genera were clarified, mostly with substantial support. These results clarify that some morphological characters have been overstated and others understated in previous classifications of the family, and provide a strong foundation for future studies on biogeography, character evolution, and circumscription of difficult genera.

  16. AFLP markers resolve intra-specific relationships and infer genetic structure among lineages of the canyon treefrog, Hyla arenicolor.

    PubMed

    Klymus, Katy E; Carl Gerhardt, H

    2012-11-01

    The canyon treefrog, Hyla arenicolor, is a wide-ranging hylid found from southwestern US into southern Mexico. Recent studies have shown this species to have a complex evolutionary history, with several phylogeographically distinct lineages, a probable cryptic species, and multiple episodes of mitochondrial introgression with the sister group, the H. eximia complex. We aimed to use genome wide AFLP markers to better resolve relationships within this group. As in other studies, our inferred phylogeny not only provides evidence for repeated mitochondrial introgression between H. arenicolor lineages and H. eximia/H. wrightorum, but it also affords more resolution within the main H. arenicolor clade than was previously achieved with sequence data. However, as with a previous study, the placement of a lineage of H. arenicolor whose distribution is centered in the Balsas Basin of Mexico remains poorly resolved, perhaps due to past hybridization with the H. eximia complex. Furthermore, the AFLP data set shows no differentiation among lineages from the Grand Canyon and Colorado Plateau despite their large mitochondrial sequence divergence. Finally, our results infer a well-supported sister relationship between this combined Colorado Plateau/Grand Canyon lineage and the Sonoran Desert lineage, a relationship that strongly contradicts conclusions drawn from the mtDNA evidence. Our study provides a basis for further behavioral and ecological speciation studies of this system and highlights the importance of multi-taxon (species) sampling in phylogenetic and phylogeographic studies.

  17. The Chloroplast Genome of Passiflora edulis (Passifloraceae) Assembled from Long Sequence Reads: Structural Organization and Phylogenomic Studies in Malpighiales

    PubMed Central

    Cauz-Santos, Luiz A.; Munhoz, Carla F.; Rodde, Nathalie; Cauet, Stephane; Santos, Anselmo A.; Penha, Helen A.; Dornelas, Marcelo C.; Varani, Alessandro M.; Oliveira, Giancarlo C. X.; Bergès, Hélène; Vieira, Maria Lucia C.

    2017-01-01

    The family Passifloraceae consists of some 700 species classified in around 16 genera. Almost all its members belong to the genus Passiflora. In Brazil, the yellow passion fruit (Passiflora edulis) is of considerable economic importance, both for juice production and consumption as fresh fruit. The availability of chloroplast genomes (cp genomes) and their sequence comparisons has led to a better understanding of the evolutionary relationships within plant taxa. In this study, we obtained the complete nucleotide sequence of the P. edulis chloroplast genome, the first entirely sequenced in the Passifloraceae family. We determined its structure and organization, and also performed phylogenomic studies on the order Malpighiales and the Fabids clade. The P. edulis chloroplast genome is characterized by the presence of two copies of an inverted repeat sequence (IRA and IRB) of 26,154 bp, each separating a small single copy region of 13,378 bp and a large single copy (LSC) region of 85,720 bp. The annotation resulted in the identification of 105 unique genes, including 30 tRNAs, 4 rRNAs, and 71 protein coding genes. Also, 36 repetitive elements and 85 SSRs (microsatellites) were identified. The structure of the complete cp genome of P. edulis differs from that of other species because of rearrangement events detected by means of a comparison based on 22 members of the Malpighiales. The rearrangements were three inversions of 46,151, 3,765 and 1,631 bp, located in the LSC region. Phylogenomic analysis resulted in strongly supported trees, but this could also be a consequence of the limited taxonomic sampling used. Our results have provided a better understanding of the evolutionary relationships in the Malpighiales and the Fabids, confirming the potential of complete chloroplast genome sequences in inferring evolutionary relationships and the utility of long sequence reads for generating very accurate biological information. PMID:28344587
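
    The region sizes and gene counts quoted above can be checked for internal consistency. The sketch below assumes the usual quadripartite layout, in which the total length is the large single copy region plus the small single copy region plus two inverted repeat copies; the total it computes is implied by the abstract's figures rather than stated in it, and the variable names are illustrative.

```python
# Region sizes (bp) quoted in the abstract.
ir_bp = 26_154    # one inverted repeat copy (IRA or IRB)
ssc_bp = 13_378   # small single copy region
lsc_bp = 85_720   # large single copy region

# Implied total length under a quadripartite layout: LSC + SSC + 2 x IR.
genome_bp = lsc_bp + ssc_bp + 2 * ir_bp

# Gene categories quoted in the abstract.
genes = {"tRNA": 30, "rRNA": 4, "protein_coding": 71}
unique_genes = sum(genes.values())

# The three reported inversions all lie in the LSC region.
inversions_bp = [46_151, 3_765, 1_631]
inverted_total_bp = sum(inversions_bp)

print(genome_bp, unique_genes, inverted_total_bp)
```

    The gene categories sum to the 105 unique genes reported, and the combined length of the three inversions is smaller than the LSC region that contains them.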

  18. The Chloroplast Genome of Passiflora edulis (Passifloraceae) Assembled from Long Sequence Reads: Structural Organization and Phylogenomic Studies in Malpighiales.

    PubMed

    Cauz-Santos, Luiz A; Munhoz, Carla F; Rodde, Nathalie; Cauet, Stephane; Santos, Anselmo A; Penha, Helen A; Dornelas, Marcelo C; Varani, Alessandro M; Oliveira, Giancarlo C X; Bergès, Hélène; Vieira, Maria Lucia C

    2017-01-01

    The family Passifloraceae consists of some 700 species classified in around 16 genera. Almost all its members belong to the genus Passiflora. In Brazil, the yellow passion fruit (Passiflora edulis) is of considerable economic importance, both for juice production and consumption as fresh fruit. The availability of chloroplast genomes (cp genomes) and their sequence comparisons has led to a better understanding of the evolutionary relationships within plant taxa. In this study, we obtained the complete nucleotide sequence of the P. edulis chloroplast genome, the first entirely sequenced in the Passifloraceae family. We determined its structure and organization, and also performed phylogenomic studies on the order Malpighiales and the Fabids clade. The P. edulis chloroplast genome is characterized by the presence of two copies of an inverted repeat sequence (IRA and IRB) of 26,154 bp, each separating a small single copy region of 13,378 bp and a large single copy (LSC) region of 85,720 bp. The annotation resulted in the identification of 105 unique genes, including 30 tRNAs, 4 rRNAs, and 71 protein coding genes. Also, 36 repetitive elements and 85 SSRs (microsatellites) were identified. The structure of the complete cp genome of P. edulis differs from that of other species because of rearrangement events detected by means of a comparison based on 22 members of the Malpighiales. The rearrangements were three inversions of 46,151, 3,765 and 1,631 bp, located in the LSC region. Phylogenomic analysis resulted in strongly supported trees, but this could also be a consequence of the limited taxonomic sampling used. Our results have provided a better understanding of the evolutionary relationships in the Malpighiales and the Fabids, confirming the potential of complete chloroplast genome sequences in inferring evolutionary relationships and the utility of long sequence reads for generating very accurate biological information.

  19. The complete mitochondrial genome structure of snow leopard Panthera uncia.

    PubMed

    Wei, Lei; Wu, Xiaobing; Jiang, Zhigang

    2009-05-01

    The complete mitochondrial genome (mtDNA) of snow leopard Panthera uncia was obtained by using the polymerase chain reaction (PCR) technique based on the PCR fragments of 30 primers we designed. The entire mtDNA sequence was 16 773 base pairs (bp) in length, and the base composition was: A-5,357 bp (31.9%); C-4,444 bp (26.5%); G-2,428 bp (14.5%); T-4,544 bp (27.1%). The structural characteristics of the P. uncia mitochondrial genome were highly similar to those of Felis catus, Acinonyx jubatus, Neofelis nebulosa and other mammals. However, we found several distinctive features of the mitochondrial genome of Panthera uncia. First, the termination codon of COIII was TAA, which differed from those of F. catus, A. jubatus and N. nebulosa. Second, tRNA(Ser) ((AGY)), which lacked the ''DHU'' arm, could not be folded into the typical cloverleaf-shaped structure. Third, in the control region, a long repetitive sequence in RS-2 (32 bp) region was found with 2 repeats while one short repetitive segment (9 bp) was found with 15 repeats in the RS-3 region. We performed phylogenetic analysis based on a 3 816 bp concatenated sequence of 12S rRNA, 16S rRNA, ND2, ND4, ND5, Cyt b and ATP8 for P. uncia and other related species; the results indicated that P. uncia and P. leo were sister species, which was different from the previous findings.
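
    The base composition reported above can be verified directly from the per-base counts. This is a short sketch using only the figures in the abstract; the variable names are illustrative, and the GC content is a derived value that the abstract does not state.

```python
# Per-base counts (bp) quoted in the abstract.
counts = {"A": 5357, "C": 4444, "G": 2428, "T": 4544}

# The counts should add up to the reported genome length.
total_bp = sum(counts.values())

# Percentage of each base, rounded to one decimal place as in the abstract.
percent = {base: round(100 * n / total_bp, 1) for base, n in counts.items()}

# Derived value: combined share of G and C.
gc_percent = round(100 * (counts["G"] + counts["C"]) / total_bp, 1)

print(total_bp)
print(percent)
print(gc_percent)
```

    The counts sum to the 16 773 bp reported, and the rounded percentages agree with the values given for A, C, G and T.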

  20. Structural genomic variation in childhood epilepsies with complex phenotypes

    PubMed Central

    Helbig, Ingo; Swinkels, Marielle E M; Aten, Emmelien; Caliebe, Almuth; van 't Slot, Ruben; Boor, Rainer; von Spiczak, Sarah; Muhle, Hiltrud; Jähn, Johanna A; van Binsbergen, Ellen; van Nieuwenhuizen, Onno; Jansen, Floor E; Braun, Kees P J; de Haan, Gerrit-Jan; Tommerup, Niels; Stephani, Ulrich; Hjalgrim, Helle; Poot, Martin; Lindhout, Dick; Brilstra, Eva H; Møller, Rikke S; Koeleman, Bobby PC

    2014-01-01

    A genetic contribution to a broad range of epilepsies has been postulated, and particularly copy number variations (CNVs) have emerged as significant genetic risk factors. However, the role of CNVs in patients with epilepsies with complex phenotypes is not known. Therefore, we investigated the role of CNVs in patients with unclassified epilepsies and complex phenotypes. A total of 222 patients from three European countries, including patients with structural lesions on magnetic resonance imaging (MRI), dysmorphic features, and multiple congenital anomalies, were clinically evaluated and screened for CNVs. MRI findings including acquired or developmental lesions and patient characteristics were subdivided and analyzed in subgroups. MRI data were available for 88.3% of patients, of whom 41.6% had abnormal MRI findings. Eighty-eight rare CNVs were discovered in 71 out of 222 patients (31.9%). Segregation of all identified variants could be assessed in 42 patients, 11 of which were de novo. The frequency of all structural variants and de novo variants was not statistically different between patients with or without MRI abnormalities or MRI subcategories. Patients with dysmorphic features were more likely to carry a rare CNV. Genome-wide screening methods for rare CNVs may provide clues for the genetic etiology in patients with a broader range of epilepsies than previously anticipated, including in patients with various brain anomalies detectable by MRI. Performing genome-wide screens for rare CNVs can be a valuable contribution to the routine diagnostic workup in patients with a broad range of childhood epilepsies. PMID:24281369

  1. Patenting nonassociated polymeric structures (NAPS): implications for structural genomic data release.

    PubMed

    Sung, Lawrence M

    2003-01-01

    The intellectual property laws that govern patent rights should provide a reasonable balance between the competing concerns of open access and exclusivity. Open access can facilitate knowledge dissemination and collaboration in furthering science. On the other hand, exclusivity can ensure interest and financial investment in scientific research and development. In recent days, the appropriate balance between open access and exclusivity has been a focus of public debate, particularly with regard to genomic inventions and their applications. In seeking to reconcile the timing of structural genomic data release with certain efforts to secure intellectual property rights, the International Structural Genomics Organisation joins others confronting this controversy. This paper seeks to inform the discussion with an overview of the U.S. standards for patenting nonassociated polymeric structures (NAPS), which include polynucleotides or polypeptides of unknown biological significance, and their corresponding structural data. In the United States, the present ability to obtain patent rights to these discoveries appears problematic given the requirement of specific, substantial and credible utility, among other things. Without demonstrable utility, NAPS and NAPS-related data likely will not be entitled to patent protection, whether the U.S. Patent & Trademark Office rejects NAPS claims as unpatentable in the first instance, or the U.S. federal courts invalidate NAPS claims in later patent litigation. As such, the improbability of obtaining enforceable patent rights to NAPS might undermine the rationale for delaying structural genomic data release to allow for the filing of patent applications in this regard.

  2. Utilizing Statistical Inference to Guide Expectations and Test Structuring During Operational Testing and Evaluation

    DTIC Science & Technology

    2011-04-30

    Commander, Naval Sea Systems Command • Army Contracting Command, U.S. Army Materiel Command • Program Manager, Airborne, Maritime and Fixed Station...are in the area of the Design and Acquisition of Military Assets. Specific domains of interests include the concept of value and its integration...inference may point to areas where the test may be modified or additional control measures may be introduced to increase the likelihood of obtaining

  3. Inferring Action Structure and Causal Relationships in Continuous Sequences of Human Action

    DTIC Science & Technology

    2014-01-01

    1997; Gopnik et al., 2004; Griffiths, Sobel, Tenenbaum, & Gopnik, 2011). In fact, though a variety of sources of information inform people’s causal...inference over causal graphical models has successfully been used to capture causal learning in both children (e.g., Gopnik et al., 2004; Sobel, Tenenbaum...emerging along roughly the same timeline as other statistical learning abilities (Sobel & Kirkham, 2006, 2007). Therefore, both our modeling and

  4. Demographic inferences using short-read genomic data in an approximate Bayesian computation framework: in silico evaluation of power, biases and proof of concept in Atlantic walrus.

    PubMed

    Shafer, Aaron B A; Gattepaille, Lucie M; Stewart, Robert E A; Wolf, Jochen B W

    2015-01-01

    Approximate Bayesian computation (ABC) is a powerful tool for model-based inference of demographic histories from large genetic data sets. For most organisms, its implementation has been hampered by the lack of sufficient genetic data. Genotyping-by-sequencing (GBS) provides cheap genome-scale data to fill this gap, but its potential has not fully been exploited. Here, we explored power, precision and biases of a coalescent-based ABC approach where GBS data were modelled with either a population mutation parameter (θ) or a fixed site (FS) approach, allowing single or several segregating sites per locus. With simulated data ranging from 500 to 50 000 loci, a variety of demographic models could be reliably inferred across a range of timescales and migration scenarios. Posterior estimates were informative with 1000 loci for migration and split time in simple population divergence models. In more complex models, posterior distributions were wide and almost reverted to the uninformative prior even with 50 000 loci. ABC parameter estimates, however, were generally more accurate than an alternative composite-likelihood method. Bottleneck scenarios proved particularly difficult, and only recent bottlenecks without recovery could be reliably detected and dated. Notably, minor-allele-frequency filters - usual practice for GBS data - negatively affected nearly all estimates. With this in mind, we used a combination of FS and θ approaches on empirical GBS data generated from the Atlantic walrus (Odobenus rosmarus rosmarus), collectively providing support for a population split before the last glacial maximum followed by asymmetrical migration and a high Arctic bottleneck. Overall, this study evaluates the potential and limitations of GBS data in an ABC-coalescence framework and proposes a best-practice approach.
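
    Approximate Bayesian computation, as described above, replaces an explicit likelihood with simulation: parameters are drawn from the prior, data are simulated, and draws whose simulated summary lies close to the observed one are kept. The following is a minimal sketch of the basic rejection form of ABC on a toy problem. It is not the coalescent-based pipeline used in the study; the toy model (counting variable loci out of 200), the uniform prior, the tolerance, and all names are assumptions made for illustration.

```python
import random

def abc_rejection(observed, simulate, prior, distance, tolerance, n_draws, rng):
    """Keep prior draws whose simulated summary lies within tolerance of observed."""
    accepted = []
    for _ in range(n_draws):
        theta = prior(rng)
        if distance(simulate(theta, rng), observed) <= tolerance:
            accepted.append(theta)
    return accepted

N_LOCI = 200  # hypothetical number of loci in the toy data set

def simulate(theta, rng):
    # Toy model: each locus is variable with probability theta.
    return sum(1 for _ in range(N_LOCI) if rng.random() < theta)

observed_variable_loci = 60    # hypothetical observed summary statistic
rng = random.Random(1)         # fixed seed so the sketch is repeatable

posterior = abc_rejection(
    observed=observed_variable_loci,
    simulate=simulate,
    prior=lambda r: r.random(),          # uniform prior on [0, 1)
    distance=lambda a, b: abs(a - b),
    tolerance=5,
    n_draws=5000,
    rng=rng,
)
posterior_mean = sum(posterior) / len(posterior)
print(len(posterior), round(posterior_mean, 2))
```

    The accepted draws approximate the posterior distribution. A smaller tolerance gives a closer approximation but accepts fewer draws, which is one reason more complex models need many more simulations.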

  5. Southeast Asian origins of five Hill Tribe populations and correlation of genetic to linguistic relationships inferred with genome-wide SNP data.

    PubMed

    Listman, J B; Malison, R T; Sanichwankul, K; Ittiwut, C; Mutirangura, A; Gelernter, J

    2011-02-01

    In Thailand, the term Hill Tribe is used to describe populations whose members traditionally practice slash and burn agriculture and reside in the mountains. These tribes are thought to have migrated throughout Asia for up to 5,000 years, including migrations through Southern China and/or Southeast Asia. There have been continuous migrations southward from China into Thailand for approximately the past thousand years and the present geographic range of any given tribe straddles multiple political borders. As none of these populations have autochthonous scripts, written histories have, until recently, been externally produced. Northern Asian, Tibetan, and Siberian origins of Hill Tribes have been proposed. All purport endogamy and have nonmutually intelligible languages. To test hypotheses regarding the geographic origins of these populations, relatedness and migrations among them and neighboring populations, and whether their genetic relationships correspond with their linguistic relationships, we analyzed 2,445 genome-wide SNP markers in 118 individuals from five Thai Hill Tribe populations (Akha, Hmong, Karen, Lahu, and Lisu), 90 individuals from majority Thai populations, and 826 individuals from Asian and Oceanian HGDP and HapMap populations using a Bayesian clustering method. Considering these results within the context of results of recent large-scale studies of Asian geographic genetic variation allows us to infer a shared Southeast Asian origin of these five Hill Tribe populations as well as ancestry components that distinguish among them seen in successive levels of clustering. In addition, the inferred level of shared ancestry among the Hill Tribes corresponds well to relationships among their languages.

  6. First all-in-one diagnostic tool for DNA intelligence: genome-wide inference of biogeographic ancestry, appearance, relatedness, and sex with the Identitas v1 Forensic Chip.

    PubMed

    Keating, Brendan; Bansal, Aruna T; Walsh, Susan; Millman, Jonathan; Newman, Jonathan; Kidd, Kenneth; Budowle, Bruce; Eisenberg, Arthur; Donfack, Joseph; Gasparini, Paolo; Budimlija, Zoran; Henders, Anjali K; Chandrupatla, Hareesh; Duffy, David L; Gordon, Scott D; Hysi, Pirro; Liu, Fan; Medland, Sarah E; Rubin, Laurence; Martin, Nicholas G; Spector, Timothy D; Kayser, Manfred

    2013-05-01

    When a forensic DNA sample cannot be associated directly with a previously genotyped reference sample by standard short tandem repeat profiling, the investigation required for identifying perpetrators, victims, or missing persons can be both costly and time consuming. Here, we describe the outcome of a collaborative study using the Identitas Version 1 (v1) Forensic Chip, the first commercially available all-in-one tool dedicated to the concept of developing intelligence leads based on DNA. The chip allows parallel interrogation of 201,173 genome-wide autosomal, X-chromosomal, Y-chromosomal, and mitochondrial single nucleotide polymorphisms for inference of biogeographic ancestry, appearance, relatedness, and sex. The first assessment of the chip's performance was carried out on 3,196 blinded DNA samples of varying quantities and qualities, covering a wide range of biogeographic origin and eye/hair coloration as well as variation in relatedness and sex. Overall, 95 % of the samples (N = 3,034) passed quality checks with an overall genotype call rate >90 % on variable numbers of available recorded trait information. Predictions of sex, direct match, and first to third degree relatedness were highly accurate. Chip-based predictions of biparental continental ancestry were on average ~94 % correct (further support provided by separately inferred patrilineal and matrilineal ancestry). Predictions of eye color were 85 % correct for brown and 70 % correct for blue eyes, and predictions of hair color were 72 % for brown, 63 % for blond, 58 % for black, and 48 % for red hair. From the 5 % of samples (N = 162) with <90 % call rate, 56 % yielded correct continental ancestry predictions while 7 % yielded sufficient genotypes to allow hair and eye color prediction. Our results demonstrate that the Identitas v1 Forensic Chip holds great promise for a wide range of applications including criminal investigations, missing person investigations, and for national security

  7. Can we continue to neglect genomic variation in introgression rates when inferring the history of speciation? A case study in a Mytilus hybrid zone.

    PubMed

    Roux, C; Fraïsse, C; Castric, V; Vekemans, X; Pogson, G H; Bierne, N

    2014-08-01

    The use of molecular data to reconstruct the history of divergence and gene flow between populations of closely related taxa represents a challenging problem. It has been proposed that the long-standing debate about the geography of speciation can be resolved by comparing the likelihoods of a model of isolation with migration and a model of secondary contact. However, data are commonly only fit to a model of isolation with migration and rarely tested against the secondary contact alternative. Furthermore, most demographic inference methods have neglected variation in introgression rates and assume that the gene flow parameter (Nm) is similar among loci. Here, we show that neglecting this source of variation can give misleading results. We analysed DNA sequences sampled from populations of the marine mussels, Mytilus edulis and M. galloprovincialis, across a well-studied mosaic hybrid zone in Europe and evaluated various scenarios of speciation, with or without variation in introgression rates, using an Approximate Bayesian Computation (ABC) approach. Models with heterogeneous gene flow across loci always outperformed models assuming equal migration rates irrespective of the history of gene flow being considered. By incorporating this heterogeneity, the best-supported scenario was a long period of allopatric isolation during the first three-quarters of the time since divergence followed by secondary contact and introgression during the last quarter. By contrast, constraining migration to be homogeneous failed to discriminate among any of the different models of gene flow tested. Our simulations thus provide statistical support for the secondary contact scenario in the European Mytilus hybrid zone that the standard coalescent approach failed to confirm. Our results demonstrate that genomic variation in introgression rates can have profound impacts on the biological conclusions drawn from inference methods and needs to be incorporated in future studies.

  8. Phylogenetic position of tetraodontiform fishes within the higher teleosts: Bayesian inferences based on 44 whole mitochondrial genome sequences.

    PubMed

    Yamanoue, Yusuke; Miya, Masaki; Matsuura, Keiichi; Yagishita, Naoki; Mabuchi, Kohji; Sakai, Harumi; Katoh, Masaya; Nishida, Mutsumi

    2007-10-01

    Tetraodontiformes includes approximately 350 species assigned to nine families, sharing several reduced morphological features of higher teleosts. The order has been accepted as a monophyletic group by many authors, although several alternative hypotheses exist regarding its phylogenetic position within the higher teleosts. To date, acanthuroids, zeiforms, and lophiiforms have been proposed as sister-groups of the tetraodontiforms. The monophyly and sister-group status was investigated using whole mitochondrial genome (mitogenome) sequences from 44 purposefully-chosen species (26 sequences newly-determined during the study) that fully represent the major tetraodontiform lineages plus all the groups that have been hypothesized as being close relatives. Partitioned Bayesian analyses were conducted with the three datasets that comprised concatenated nucleotide sequences from 13 protein-coding genes (with and without, or with RY-coding, 3rd codon positions), plus 22 transfer RNA and two ribosomal RNA genes. The resultant trees were well resolved and largely congruent, with most internal branches being supported by high posterior probabilities. Mitogenomic data strongly supported the monophyly of tetraodontiform fishes, placing them as a sister-group of either Lophiiformes plus Caproidei or Caproidei only. The sister-group relationship between Acanthuroidei and Tetraodontiformes was statistically rejected using Bayes factors. These results were confirmed by a reanalysis of the previously published nuclear RAG1 gene sequences using the Bayesian method. Within the Tetraodontiformes, however, monophylies of the three superfamilies were not recovered and further taxonomic sampling and subsequent efforts should clarify these relationships.

  9. RCK: accurate and efficient inference of sequence- and structure-based protein–RNA binding models from RNAcompete data

    PubMed Central

    Orenstein, Yaron; Wang, Yuhao; Berger, Bonnie

    2016-01-01

    Motivation: Protein–RNA interactions, which play vital roles in many processes, are mediated through both RNA sequence and structure. CLIP-based methods, which measure protein–RNA binding in vivo, suffer from experimental noise and systematic biases, whereas in vitro experiments capture a clearer signal of protein RNA-binding. Among them, RNAcompete provides binding affinities of a specific protein to more than 240 000 unstructured RNA probes in one experiment. The computational challenge is to infer RNA structure- and sequence-based binding models from these data. The state-of-the-art in sequence models, Deepbind, does not model structural preferences. RNAcontext models both sequence and structure preferences, but is outperformed by GraphProt. Unfortunately, GraphProt cannot detect structural preferences from RNAcompete data due to the unstructured nature of the data, as noted by its developers, nor can it be tractably run on the full RNACompete dataset. Results: We develop RCK, an efficient, scalable algorithm that infers both sequence and structure preferences based on a new k-mer based model. Remarkably, even though RNAcompete data is designed to be unstructured, RCK can still learn structural preferences from it. RCK significantly outperforms both RNAcontext and Deepbind in in vitro binding prediction for 244 RNAcompete experiments. Moreover, RCK is also faster and uses less memory, which enables scalability. While currently on par with existing methods in in vivo binding prediction on a small scale test, we demonstrate that RCK will increasingly benefit from experimentally measured RNA structure profiles as compared to computationally predicted ones. By running RCK on the entire RNAcompete dataset, we generate and provide as a resource a set of protein–RNA structure-based models on an unprecedented scale. Availability and Implementation: Software and models are freely available at http://rck.csail.mit.edu/ Contact: bab@mit.edu Supplementary information

  10. 3D structures of membrane proteins from genomic sequencing

    PubMed Central

    Hopf, Thomas A.; Colwell, Lucy J.; Sheridan, Robert; Rost, Burkhard; Sander, Chris; Marks, Debora S.

    2012-01-01

    Summary: We show that amino acid co-variation in proteins, extracted from the evolutionary sequence record, can be used to fold transmembrane proteins. We use this technique to predict previously unknown 3D structures for 11 transmembrane proteins (with up to 14 helices) from their sequences alone. The prediction method (EVfold_membrane) applies a maximum entropy approach to infer evolutionary co-variation in pairs of sequence positions within a protein family and then generates all-atom models with the derived pairwise distance constraints. We benchmark the approach with blinded, de novo computation of known transmembrane protein structures from 23 families, demonstrating unprecedented accuracy of the method for large transmembrane proteins. We show how the method can predict oligomerization, functional sites, and conformational changes in transmembrane proteins. With the rapid rise in large-scale sequencing, more accurate and more comprehensive information on evolutionary constraints can be decoded from genetic variation, greatly expanding the repertoire of transmembrane proteins amenable to modelling by this method. PMID:22579045

  11. Origin of avian genome size and structure in non-avian dinosaurs.

    PubMed

    Organ, Chris L; Shedlock, Andrew M; Meade, Andrew; Pagel, Mark; Edwards, Scott V

    2007-03-08

    Avian genomes are small and streamlined compared with those of other amniotes by virtue of having fewer repetitive elements and less non-coding DNA. This condition has been suggested to represent a key adaptation for flight in birds, by reducing the metabolic costs associated with having large genome and cell sizes. However, the evolution of genome architecture in birds, or any other lineage, is difficult to study because genomic information is often absent for long-extinct relatives. Here we use a novel bayesian comparative method to show that bone-cell size correlates well with genome size in extant vertebrates, and hence use this relationship to estimate the genome sizes of 31 species of extinct dinosaur, including several species of extinct birds. Our results indicate that the small genomes typically associated with avian flight evolved in the saurischian dinosaur lineage between 230 and 250 million years ago, long before this lineage gave rise to the first birds. By comparison, ornithischian dinosaurs are inferred to have had much larger genomes, which were probably typical for ancestral Dinosauria. Using comparative genomic data, we estimate that genome-wide interspersed mobile elements, a class of repetitive DNA, comprised 5-12% of the total genome size in the saurischian dinosaur lineage, but was 7-19% of total genome size in ornithischian dinosaurs, suggesting that repetitive elements became less active in the saurischian lineage. These genomic characteristics should be added to the list of attributes previously considered avian but now thought to have arisen in non-avian dinosaurs, such as feathers, pulmonary innovations, and parental care and nesting.

  12. Studies on cattle genomic structural variation provide insights into ruminant speciation and adaptation

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genomic structural variations, including segmental duplications (SD) and copy number variations (CNV), contribute significantly to individual health and disease in primates and rodents. As a part of the bovine genome annotation effort, we performed the first genome-wide analysis of SD in cattle usin...

  13. ChEA: transcription factor regulation inferred from integrating genome-wide ChIP-X experiments

    PubMed Central

    Lachmann, Alexander; Xu, Huilei; Krishnan, Jayanth; Berger, Seth I.; Mazloom, Amin R.; Ma'ayan, Avi

    2010-01-01

    Motivation: Experiments such as ChIP-chip, ChIP-seq, ChIP-PET and DamID (the four methods referred herein as ChIP-X) are used to profile the binding of transcription factors to DNA at a genome-wide scale. Such experiments provide hundreds to thousands of potential binding sites for a given transcription factor in proximity to gene coding regions. Results: In order to integrate data from such studies and utilize it for further biological discovery, we collected interactions from such experiments to construct a mammalian ChIP-X database. The database contains 189 933 interactions, manually extracted from 87 publications, describing the binding of 92 transcription factors to 31 932 target genes. We used the database to analyze mRNA expression data where we perform gene-list enrichment analysis using the ChIP-X database as the prior biological knowledge gene-list library. The system is delivered as a web-based interactive application called ChIP Enrichment Analysis (ChEA). With ChEA, users can input lists of mammalian gene symbols for which the program computes over-representation of transcription factor targets from the ChIP-X database. The ChEA database allowed us to reconstruct an initial network of transcription factors connected based on shared overlapping targets and binding site proximity. To demonstrate the utility of ChEA we present three case studies. We show how by combining the Connectivity Map (CMAP) with ChEA, we can rank pairs of compounds to be used to target specific transcription factor activity in cancer cells. Availability: The ChEA software and ChIP-X database is freely available online at: http://amp.pharm.mssm.edu/lib/chea.jsp Contact: avi.maayan@mssm.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20709693
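
    The over-representation step described in this abstract can be sketched as an upper-tail hypergeometric test on the overlap between an input gene list and one transcription factor's target set. This is a minimal illustration only: the function name, the gene sets, and the background size are hypothetical, and the abstract does not state which statistic the ChEA software actually computes.

```python
from math import comb


def overrepresentation_p(input_genes, tf_targets, background_size):
    """Upper-tail hypergeometric p-value for the overlap of two gene sets.

    Assumes both sets are drawn from the same background of
    `background_size` genes (a hypothetical parameter).
    """
    input_genes, tf_targets = set(input_genes), set(tf_targets)
    n = len(input_genes)                    # size of the user's gene list
    n_targets = len(tf_targets)             # targets of one transcription factor
    overlap = len(input_genes & tf_targets)
    total = comb(background_size, n)
    # Probability of seeing an overlap at least this large by chance.
    tail = sum(
        comb(n_targets, i) * comb(background_size - n_targets, n - i)
        for i in range(overlap, min(n, n_targets) + 1)
    )
    return tail / total
```

    Ranking transcription factors by this p-value, smallest first, gives one plausible ordering of candidate regulators for the input list. A real analysis would also correct for testing many factors at once.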

  14. Taxonomy, molecular phylogeny and evolution of plant reverse transcribing viruses (family Caulimoviridae) inferred from full-length genome and reverse transcriptase sequences.

    PubMed

    Bousalem, M; Douzery, E J P; Seal, S E

    2008-01-01

    This study constitutes the first evaluation and application of quantitative taxonomy to the family Caulimoviridae and the first in-depth phylogenetic study of the family Caulimoviridae that integrates the common origin between LTR retrotransposons and caulimoviruses. The phylogenetic trees and PASC analyses derived from the full genome and from the corresponding partial RT concurred, providing strong support for the current genus classification based mainly on genome organisation and use of partial RT sequence as a molecular marker. The PASC distributions obtained are multimodal, making it possible to distinguish between genus, species and strain. The taxonomy of badnaviruses infecting banana (Musa spp.) was clarified, and the consequence of endogenous badnaviruses on the genetic diversity and evolution of caulimoviruses is discussed. The use of LTR retrotransposons as outgroups reveals a structured bipolar topology separating the genus Badnavirus from the other genera. Badnaviruses appear to be the most recent genus, with the genus Tungrovirus in an intermediary position. This structuring intersects the one established by genomic and biological properties and allows us to make a correlation between phylogeny and biogeography. The variability shown between members of the family Caulimoviridae is in a similar range to that reported within other DNA and RNA plant virus families.

  15. Genomic structure and expression of immunoglobulins in Squamata.

    PubMed

    Olivieri, David N; Garet, Elina; Estevez, Olivia; Sánchez-Espinel, Christian; Gambón-Deza, Francisco

    2016-04-01

    The Squamata order represents a major evolutionary reptile lineage, yet the structure and expression of immunoglobulins in this order has been scarcely studied in detail. From the genome sequences of four Squamata species (Gekko japonicus, Ophisaurus gracilis, Pogona vitticeps and Ophiophagus hannah) and RNA-seq datasets from 18 other Squamata species, we identified the immunoglobulins present in these animals as well as the tissues in which they are found. All Squamata have at least three immunoglobulin classes; namely, the immunoglobulins M, D, and Y. Unlike mammals, however, we provide evidence that some Squamata lineages possess more than one Cμ gene which is located downstream from the Cδ gene. The existence of two evolutionary lineages of immunoglobulin Y is shown. Additionally, it is demonstrated that while all Squamata species possess the λ light chain, only Iguanidae species possess the κ light chain.

  16. Ecological Inference

    NASA Astrophysics Data System (ADS)

    King, Gary; Rosen, Ori; Tanner, Martin A.

    2004-09-01

    This collection of essays brings together a diverse group of scholars to survey the latest strategies for solving ecological inference problems in various fields. The last half-decade has witnessed an explosion of research in ecological inference--the process of trying to infer individual behavior from aggregate data. Although uncertainties and information lost in aggregation make ecological inference one of the most problematic types of research to rely on, these inferences are required in many academic fields, as well as by legislatures and the Courts in redistricting, by business in marketing research, and by governments in policy analysis.

  17. Integrated consensus map of cultivated peanut and wild relatives reveals structures of the A and B genomes of Arachis and divergence of the legume genomes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The complex, tetraploid genome structure of peanut (Arachis hypogaea) has obstructed advances in genetics and genomics in the species. The aim of this study is to understand the genome structure of Arachis by developing a high-density integrated consensus map. Three recombinant inbred line populatio...

  18. Inferences about nested subsets structure when not all species are detected

    USGS Publications Warehouse

    Cam, E.; Nichols, J.D.; Hines, J.E.; Sauer, J.R.

    2000-01-01

    Comparisons of species composition among ecological communities of different size have often provided evidence that the species in communities with lower species richness form nested subsets of the species in larger communities. In the vast majority of studies, the question of nested subsets has been addressed using information on presence-absence, where a '0' is interpreted as the absence of a given species from a given location. Most of the methodological discussion in earlier studies investigating nestedness concerns the approach to generation of model-based matrices. However, it is most likely that in many situations investigators cannot detect all the species present in the location sampled. The possibility that zeros in incidence matrices reflect nondetection rather than absence of species has not been considered in studies addressing nested subsets, even though the position of zeros in these matrices forms the basis of earlier inference methods. These sampling artifacts are likely to lead to erroneous conclusions about both variation over space in species richness and the degree of similarity of the various locations. Here we propose an approach to investigation of nestedness, based on statistical inference methods explicitly incorporating species detection probability, that take into account the probabilistic nature of the sampling process. We use presence-absence data collected under Pollock's robust capture-recapture design, and resort to an estimator of species richness originally developed for closed populations to assess the proportion of species shared by different locations. We develop testable predictions corresponding to the null hypothesis of a nonnested pattern, and an alternative hypothesis of perfect nestedness. We also present an index for assessing the degree of nestedness of a system of ecological communities. We illustrate our approach using avian data from the North American Breeding Bird Survey collected in Florida Keys.

  19. A high-density genetic recombination map of sequence-tagged sites for sorghum, as a framework for comparative structural and evolutionary genomics of tropical grains and grasses.

    PubMed Central

    Bowers, John E; Abbey, Colette; Anderson, Sharon; Chang, Charlene; Draye, Xavier; Hoppe, Alison H; Jessup, Russell; Lemke, Cornelia; Lennington, Jennifer; Li, Zhikang; Lin, Yann-Rong; Liu, Sin-Chieh; Luo, Lijun; Marler, Barry S; Ming, Reiguang; Mitchell, Sharon E; Qiang, Dou; Reischmann, Kim; Schulze, Stefan R; Skinner, D Neil; Wang, Yue-Wen; Kresovich, Stephen; Schertz, Keith F; Paterson, Andrew H

    2003-01-01

    We report a genetic recombination map for Sorghum of 2512 loci spaced at average 0.4 cM (approximately 300 kb) intervals based on 2050 RFLP probes, including 865 heterologous probes that foster comparative genomics of Saccharum (sugarcane), Zea (maize), Oryza (rice), Pennisetum (millet, buffelgrass), the Triticeae (wheat, barley, oat, rye), and Arabidopsis. Mapped loci identify 61.5% of the recombination events in this progeny set and reveal strong positive crossover interference acting across intervals of structural rearrangements between Sorghum bicolor and S. propinquum, but not to variation in levels of intraspecific allelic richness. While cDNA and genomic clones are similarly distributed across the genome, SSR-containing clones show different abundance patterns. Rapidly evolving hypomethylated DNA may contribute to intraspecific genomic differentiation. Nonrandom distribution patterns of multiple loci detected by 357 probes suggest ancient chromosomal duplication followed by extensive rearrangement and gene loss. Exemplifying the value of these data for comparative genomics, we support and extend prior findings regarding maize-sorghum synteny; in particular, 45% of comparative loci fall outside the inferred colinear/syntenic regions, suggesting that many small rearrangements have occurred since maize-sorghum divergence. These genetically anchored sequence-tagged sites will foster many structural, functional and evolutionary genomic studies in major food, feed, and biomass crops. PMID:14504243

  20. Structural Variation Mutagenesis of the Human Genome: Impact on Disease and Evolution

    PubMed Central

    Lupski, James R.

    2015-01-01

    Watson-Crick base-pair changes, or single-nucleotide variants (SNV), have long been known as a source of mutations. However, the extent to which DNA structural variation, including duplication and deletion copy number variants (CNV) and copy number neutral inversions and translocations, contribute to human genome variation and disease has been appreciated only recently. Moreover, the potential complexity of structural variants (SV) was not envisioned; thus, the frequency of complex genomic rearrangements (CGR) and how such events form remained a mystery. The concept of genomic disorders, diseases due to genomic rearrangements and not sequence-based changes for which genomic architecture incite genomic instability, delineated a new category of conditions distinct from chromosomal syndromes and single-gene Mendelian diseases. Nevertheless, it is the mechanistic understanding of CNV/SV formation that has promoted further understanding of human biology and disease and provided insights into human genome and gene evolution. PMID:25892534

  1. Genome3D: exploiting structure to help users understand their sequences

    PubMed Central

    Lewis, Tony E.; Sillitoe, Ian; Andreeva, Antonina; Blundell, Tom L.; Buchan, Daniel W.A.; Chothia, Cyrus; Cozzetto, Domenico; Dana, José M.; Filippis, Ioannis; Gough, Julian; Jones, David T.; Kelley, Lawrence A.; Kleywegt, Gerard J.; Minneci, Federico; Mistry, Jaina; Murzin, Alexey G.; Ochoa-Montaño, Bernardo; Oates, Matt E.; Punta, Marco; Rackham, Owen J.L.; Stahlhacke, Jonathan; Sternberg, Michael J.E.; Velankar, Sameer; Orengo, Christine

    2015-01-01

    Genome3D (http://www.genome3d.eu) is a collaborative resource that provides predicted domain annotations and structural models for key sequences. Since introducing Genome3D in a previous NAR paper, we have substantially extended and improved the resource. We have annotated representatives from Pfam families to improve coverage of diverse sequences and added a fast sequence search to the website to allow users to find Genome3D-annotated sequences similar to their own. We have improved and extended the Genome3D data, enlarging the source data set from three model organisms to 10, and adding VIVACE, a resource new to Genome3D. We have analysed and updated Genome3D's SCOP/CATH mapping. Finally, we have improved the superposition tools, which now give users a more powerful interface for investigating similarities and differences between structural models. PMID:25348407

  2. Seed plant phylogeny inferred from all three plant genomes: Monophyly of extant gymnosperms and origin of Gnetales from conifers

    PubMed Central

    Chaw, Shu-Miaw; Parkinson, Christopher L.; Cheng, Yuchang; Vincent, Thomas M.; Palmer, Jeffrey D.

    2000-01-01

    Phylogenetic relationships among the five groups of extant seed plants are presently quite unclear. For example, morphological studies consistently identify the Gnetales as the extant sister group to angiosperms (the so-called “anthophyte” hypothesis), whereas a number of molecular studies recover gymnosperm monophyly, and few agree with the morphology-based placement of Gnetales. To better resolve these and other unsettled issues, we have generated a new molecular data set of mitochondrial small subunit rRNA sequences, and have analyzed these data together with comparable data sets for the nuclear small subunit rRNA gene and the chloroplast rbcL gene. All nuclear analyses strongly ally Gnetales with a monophyletic conifers, whereas all mitochondrial analyses and those chloroplast analyses that take into account saturation of third-codon position transitions actually place Gnetales within conifers, as the sister group to the Pinaceae. Combined analyses of all three genes strongly support this latter relationship, which to our knowledge has never been suggested before. The combined analyses also strongly support monophyly of extant gymnosperms, with cycads identified as the basal-most group of gymnosperms, Ginkgo as the next basal, and all conifers except for Pinaceae as sister to the Gnetales + Pinaceae clade. According to these findings, the Gnetales may be viewed as extremely divergent conifers, and the many morphological similarities between angiosperms and Gnetales (e.g., double fertilization and flower-like reproductive structures) arose independently. PMID:10760277

  3. Inferring planar disorder in close-packed structures via ε-machine spectral reconstruction theory: examples from simulated diffraction patterns.

    PubMed

    Varn, D P; Canright, G S; Crutchfield, J P

    2013-07-01

    A previous paper detailed a novel algorithm, ε-machine spectral reconstruction theory (εMSR), that infers pattern and disorder in planar-faulted, close-packed structures directly from X-ray diffraction patterns [Varn et al. (2013). Acta Cryst. A69, 197-206]. Here εMSR is applied to simulated diffraction patterns from four close-packed crystals. It is found that, for stacking structures with a memory length of three or less, εMSR reproduces the statistics of the stacking structure; the result being in the form of a directed graph called an ε-machine. For stacking structures with a memory length larger than three, εMSR returns a model that captures many important features of the original stacking structure. These include multiple stacking faults and multiple crystal structures. Further, it is found that εMSR is able to discover stacking structure in even highly disordered crystals. In order to address issues concerning the long-range order observed in many classes of layered materials, several length parameters are defined, calculable from the ε-machine, and their relevance is discussed.

  4. Internal structure of the Moon inferred from Apollo seismic data and selenodetic data from GRAIL and LLR

    NASA Astrophysics Data System (ADS)

    Matsumoto, Koji; Yamada, Ryuhei; Kikuchi, Fuyuhiko; Kamata, Shunichi; Ishihara, Yoshiaki; Iwata, Takahiro; Hanada, Hideo; Sasaki, Sho

    2015-09-01

    The internal structure of the Moon is important for discussions on its origin and evolution. However, the deep structure of the Moon is still debated due to the absence of comprehensive seismic data. This study explores lunar interior models by complementing Apollo seismic travel time data with selenodetic data which have recently been improved by Gravity Recovery and Interior Laboratory (GRAIL) and Lunar Laser Ranging (LLR). The observed data can be explained by models including a deep-seated zone with a low velocity (S wave velocity = 2.9 ± 0.5 km/s) and a low viscosity (~3 × 10^16 Pa s). The thickness of this zone above the core-mantle boundary is larger than 170 km, showing a negative correlation with the radius of the fluid outer core. The inferred density of the lowermost mantle suggests a high TiO2 content (>11 wt.%) which prefers a mantle overturn scenario.

  5. Inferred vs Realized Patterns of Gene Flow: An Analysis of Population Structure in the Andros Island Rock Iguana

    PubMed Central

    Colosimo, Giuliano; Knapp, Charles R.; Wallace, Lisa E.; Welch, Mark E.

    2014-01-01

    Ecological data, the primary source of information on patterns and rates of migration, can be integrated with genetic data to more accurately describe the realized connectivity between geographically isolated demes. In this paper we implement this approach and discuss its implications for managing populations of the endangered Andros Island Rock Iguana, Cyclura cychlura cychlura. This iguana is endemic to Andros, a highly fragmented landmass of large islands and smaller cays. Field observations suggest that geographically isolated demes were panmictic due to high, inferred rates of gene flow. We expand on these observations using 16 polymorphic microsatellites to investigate the genetic structure and rates of gene flow from 188 Andros Iguanas collected across 23 island sites. Bayesian clustering of specimens assigned individuals to three distinct genotypic clusters. An analysis of molecular variance (AMOVA) indicates that allele frequency differences are responsible for a significant portion of the genetic variance across the three defined clusters (Fst = 0.117, p < 0.01). These clusters are associated with larger islands and satellite cays isolated by broad water channels with strong currents. These findings imply that broad water channels present greater obstacles to gene flow than was inferred from field observation alone. Additionally, rates of gene flow were indirectly estimated using BAYESASS 3.0. The proportion of individuals originating from within each identified cluster varied from 94.5 to 98.7%, providing further support for local isolation. Our assessment reveals a major disparity between inferred and realized gene flow. We discuss our results in a conservation perspective for species inhabiting highly fragmented landscapes. PMID:25229344

  6. Inferred vs realized patterns of gene flow: an analysis of population structure in the Andros Island Rock Iguana.

    PubMed

    Colosimo, Giuliano; Knapp, Charles R; Wallace, Lisa E; Welch, Mark E

    2014-01-01

    Ecological data, the primary source of information on patterns and rates of migration, can be integrated with genetic data to more accurately describe the realized connectivity between geographically isolated demes. In this paper we implement this approach and discuss its implications for managing populations of the endangered Andros Island Rock Iguana, Cyclura cychlura cychlura. This iguana is endemic to Andros, a highly fragmented landmass of large islands and smaller cays. Field observations suggest that geographically isolated demes were panmictic due to high, inferred rates of gene flow. We expand on these observations using 16 polymorphic microsatellites to investigate the genetic structure and rates of gene flow from 188 Andros Iguanas collected across 23 island sites. Bayesian clustering of specimens assigned individuals to three distinct genotypic clusters. An analysis of molecular variance (AMOVA) indicates that allele frequency differences are responsible for a significant portion of the genetic variance across the three defined clusters (Fst =  0.117, p<0.01). These clusters are associated with larger islands and satellite cays isolated by broad water channels with strong currents. These findings imply that broad water channels present greater obstacles to gene flow than was inferred from field observation alone. Additionally, rates of gene flow were indirectly estimated using BAYESASS 3.0. The proportion of individuals originating from within each identified cluster varied from 94.5 to 98.7%, providing further support for local isolation. Our assessment reveals a major disparity between inferred and realized gene flow. We discuss our results in a conservation perspective for species inhabiting highly fragmented landscapes.

  7. Integrated Consensus Map of Cultivated Peanut and Wild Relatives Reveals Structures of the A and B Genomes of Arachis and Divergence of the Legume Genomes

    PubMed Central

    Shirasawa, Kenta; Bertioli, David J.; Varshney, Rajeev K.; Moretzsohn, Marcio C.; Leal-Bertioli, Soraya C. M.; Thudi, Mahendar; Pandey, Manish K.; Rami, Jean-Francois; Foncéka, Daniel; Gowda, Makanahally V. C.; Qin, Hongde; Guo, Baozhu; Hong, Yanbin; Liang, Xuanqiang; Hirakawa, Hideki; Tabata, Satoshi; Isobe, Sachiko

    2013-01-01

    The complex, tetraploid genome structure of peanut (Arachis hypogaea) has obstructed advances in genetics and genomics in the species. The aim of this study is to understand the genome structure of Arachis by developing a high-density integrated consensus map. Three recombinant inbred line populations derived from crosses between the A genome diploid species, Arachis duranensis and Arachis stenosperma; the B genome diploid species, Arachis ipaënsis and Arachis magna; and between the AB genome tetraploids, A. hypogaea and an artificial amphidiploid (A. ipaënsis × A. duranensis)(4×), were used to construct genetic linkage maps: 10 linkage groups (LGs) of 544 cM with 597 loci for the A genome; 10 LGs of 461 cM with 798 loci for the B genome; and 20 LGs of 1442 cM with 1469 loci for the AB genome. The resultant maps plus 13 published maps were integrated into a consensus map covering 2651 cM with 3693 marker loci which was anchored to 20 consensus LGs corresponding to the A and B genomes. The comparative genomics with genome sequences of Cajanus cajan, Glycine max, Lotus japonicus, and Medicago truncatula revealed that the Arachis genome has segmented synteny relationship to the other legumes. The comparative maps in legumes, integrated tetraploid consensus maps, and genome-specific diploid maps will increase the genetic and genomic understanding of Arachis and should facilitate molecular breeding. PMID:23315685

  8. Integrated consensus map of cultivated peanut and wild relatives reveals structures of the A and B genomes of Arachis and divergence of the legume genomes.

    PubMed

    Shirasawa, Kenta; Bertioli, David J; Varshney, Rajeev K; Moretzsohn, Marcio C; Leal-Bertioli, Soraya C M; Thudi, Mahendar; Pandey, Manish K; Rami, Jean-Francois; Foncéka, Daniel; Gowda, Makanahally V C; Qin, Hongde; Guo, Baozhu; Hong, Yanbin; Liang, Xuanqiang; Hirakawa, Hideki; Tabata, Satoshi; Isobe, Sachiko

    2013-04-01

    The complex, tetraploid genome structure of peanut (Arachis hypogaea) has obstructed advances in genetics and genomics in the species. The aim of this study is to understand the genome structure of Arachis by developing a high-density integrated consensus map. Three recombinant inbred line populations derived from crosses between the A genome diploid species, Arachis duranensis and Arachis stenosperma; the B genome diploid species, Arachis ipaënsis and Arachis magna; and between the AB genome tetraploids, A. hypogaea and an artificial amphidiploid (A. ipaënsis × A. duranensis)(4×), were used to construct genetic linkage maps: 10 linkage groups (LGs) of 544 cM with 597 loci for the A genome; 10 LGs of 461 cM with 798 loci for the B genome; and 20 LGs of 1442 cM with 1469 loci for the AB genome. The resultant maps plus 13 published maps were integrated into a consensus map covering 2651 cM with 3693 marker loci which was anchored to 20 consensus LGs corresponding to the A and B genomes. The comparative genomics with genome sequences of Cajanus cajan, Glycine max, Lotus japonicus, and Medicago truncatula revealed that the Arachis genome has segmented synteny relationship to the other legumes. The comparative maps in legumes, integrated tetraploid consensus maps, and genome-specific diploid maps will increase the genetic and genomic understanding of Arachis and should facilitate molecular breeding.

  9. An improved protocol for sequencing of repetitive genomic regions and structural variations using mutagenesis and next generation sequencing.

    PubMed

    Sipos, Botond; Massingham, Tim; Stütz, Adrian M; Goldman, Nick

    2012-01-01

    The rise of Next Generation Sequencing (NGS) technologies has transformed de novo genome sequencing into an accessible research tool, but obtaining high quality eukaryotic genome assemblies remains a challenge, mostly due to the abundance of repetitive elements. These also make it difficult to study nucleotide polymorphism in repetitive regions, including certain types of structural variations. One solution proposed for resolving such regions is Sequence Assembly aided by Mutagenesis (SAM), which relies on the fact that introducing enough random mutations breaks the repetitive structure, making assembly possible. Sequencing many different mutated copies permits the sequence of the repetitive region to be inferred by consensus methods. However, this approach relies on molecular cloning in order to isolate and amplify individual mutant copies, making it hard to scale-up the approach for use in conjunction with high-throughput sequencing technologies. To address this problem, we propose NG-SAM, a modified version of the SAM protocol that relies on PCR and dilution steps only, coupled to a NGS workflow. NG-SAM therefore has the potential to be scaled-up, e.g. using emerging microfluidics technologies. We built a realistic simulation pipeline to study the feasibility of NG-SAM, and our results suggest that under appropriate experimental conditions the approach might be successfully put into practice. Moreover, our simulations suggest that NG-SAM is capable of reconstructing robustly a wide range of potential target sequences of varying lengths and repetitive structures.

  10. BreakDancer – Identification of Genomic Structural Variation from Paired-End Read Mapping

    PubMed Central

    Fan, Xian; Abbott, Travis E.; Larson, David; Chen, Ken

    2014-01-01

    The advent of the next-generation sequencing data has made it possible to cost-effectively detect and characterize genomic variation in human genomes. Structural variation, including deletion, duplication, insertion, inversion and translocation, is of great importance to human genetics due to its association with many genetic diseases. BreakDancer is a bioinformatics tool that relates paired-end read alignments from a test genome to the reference genome for the purpose of comprehensively and accurately detecting various types of structural variation. PMID:25152801

  11. Towards fully automated structure-based function prediction in structural genomics: a case study.

    PubMed

    Watson, James D; Sanderson, Steve; Ezersky, Alexandra; Savchenko, Alexei; Edwards, Aled; Orengo, Christine; Joachimiak, Andrzej; Laskowski, Roman A; Thornton, Janet M

    2007-04-13

    As the global Structural Genomics projects have picked up pace, the number of structures annotated in the Protein Data Bank as hypothetical protein or unknown function has grown significantly. A major challenge now involves the development of computational methods to assign functions to these proteins accurately and automatically. As part of the Midwest Center for Structural Genomics (MCSG) we have developed a fully automated functional analysis server, ProFunc, which performs a battery of analyses on a submitted structure. The analyses combine a number of sequence-based and structure-based methods to identify functional clues. After the first stage of the Protein Structure Initiative (PSI), we review the success of the pipeline and the importance of structure-based function prediction. As a dataset, we have chosen all structures solved by the MCSG during the 5 years of the first PSI. Our analysis suggests that two of the structure-based methods are particularly successful and provide examples of local similarity that is difficult to identify using current sequence-based methods. No one method is successful in all cases, so, through the use of a number of complementary sequence and structural approaches, the ProFunc server increases the chances that at least one method will find a significant hit that can help elucidate function. Manual assessment of the results is a time-consuming process and subject to individual interpretation and human error. We present a method based on the Gene Ontology (GO) schema using GO-slims that can allow the automated assessment of hits with a success rate approaching that of expert manual assessment.

  12. Inferences of mantle viscosity based on ice age data sets: Radial structure

    NASA Astrophysics Data System (ADS)

    Lau, Harriet C. P.; Mitrovica, Jerry X.; Austermann, Jacqueline; Crawford, Ophelia; Al-Attar, David; Latychev, Konstantin

    2016-10-01

    We perform joint nonlinear inversions of glacial isostatic adjustment (GIA) data, including the following: postglacial decay times in Canada and Scandinavia, the Fennoscandian relaxation spectrum (FRS), late-Holocene differential sea level (DSL) highstands (based on recent compilations of Australian sea level histories), and the rate of change of the degree 2 zonal harmonic of the geopotential, J2. Resolving power analyses demonstrate the following: (1) the FRS constrains mean upper mantle viscosity to be ~3 × 10^20 Pa s, (2) postglacial decay time data require the average viscosity in the top ~1500 km of the mantle to be 10^21 Pa s, and (3) the J2 datum constrains mean lower mantle viscosity to be ~5 × 10^21 Pa s. To reconcile (2) and (3), viscosity must increase to 10^22-10^23 Pa s in the deep mantle. Our analysis highlights the importance of accurately correcting the J2 observation for modern glacier melting in order to robustly infer deep mantle viscosity. We also perform a large series of forward calculations to investigate the compatibility of the GIA data sets with a viscosity jump within the lower mantle, as suggested by geodynamic and seismic studies, and conclude that the GIA data may accommodate a sharp jump of 1-2 orders of magnitude in viscosity across a boundary placed in a depth range of 1000-1700 km but does not require such a feature. Finally, we find that no 1-D viscosity profile appears capable of simultaneously reconciling the DSL highstand data and suggest that this discord is likely due to laterally heterogeneous mantle viscosity, an issue we explore in a companion study.

  13. Comparative Genomics of Sibling Fungal Pathogenic Taxa Identifies Adaptive Evolution without Divergence in Pathogenicity Genes or Genomic Structure

    PubMed Central

    Sillo, Fabiano; Garbelotto, Matteo; Friedman, Maria; Gonthier, Paolo

    2015-01-01

    It has been estimated that the sister plant pathogenic fungal species Heterobasidion irregulare and Heterobasidion annosum may have been allopatrically isolated for 34–41 Myr. They are now sympatric due to the introduction of the first species from North America into Italy, where they freely hybridize. We used a comparative genomic approach to 1) confirm that the two species are distinct at the genomic level; 2) determine which gene groups have diverged the most and the least between species; 3) show that their overall genomic structures are similar, as predicted by the viability of hybrids, and identify genomic regions that instead are incongruent; and 4) test the previously formulated hypothesis that genes involved in pathogenicity may be less divergent between the two species than genes involved in saprobic decay and sporulation. Results based on the sequencing of three genomes per species identified a high level of interspecific similarity, but clearly confirmed the status of the two as distinct taxa. Genes involved in pathogenicity were more conserved between species than genes involved in saprobic growth and sporulation, corroborating at the genomic level that invasiveness may be determined by the two latter traits, as documented by field and inoculation studies. Additionally, the majority of genes under positive selection and the majority of genes bearing interspecific structural variations were involved either in transcriptional or in mitochondrial functions. This study provides genomic-level evidence that invasiveness of pathogenic microbes can be attained without the high levels of pathogenicity presumed to exist for pathogens challenging naïve hosts. PMID:26527650

  14. Nuclear species-diagnostic SNP markers mined from 454 amplicon sequencing reveal admixture genomic structure of modern citrus varieties.

    PubMed

    Curk, Franck; Ancillo, Gema; Ollitrault, Frédérique; Perrier, Xavier; Jacquemoud-Collet, Jean-Pierre; Garcia-Lor, Andres; Navarro, Luis; Ollitrault, Patrick

    2015-01-01

    Most cultivated Citrus species originated from interspecific hybridisation between four ancestral taxa (C. reticulata, C. maxima, C. medica, and C. micrantha) with limited further interspecific recombination due to vegetative propagation. This evolution resulted in admixture genomes with frequent interspecific heterozygosity. Moreover, a major part of the phenotypic diversity of edible citrus results from the initial differentiation between these taxa. Deciphering the phylogenomic structure of citrus germplasm is therefore essential for an efficient utilization of citrus biodiversity in breeding schemes. The objective of this work was to develop a set of species-diagnostic single nucleotide polymorphism (SNP) markers for the four Citrus ancestral taxa covering the nine chromosomes, and to use these markers to infer the phylogenomic structure of secondary species and modern cultivars. Species-diagnostic SNPs were mined from 454 amplicon sequencing of 57 gene fragments from 26 genotypes of the four basic taxa. Of the 1,053 SNPs mined from 28,507 kb sequence, 273 were found to be highly diagnostic for a single basic taxon. Species-diagnostic SNP markers (105) were used to analyse the admixture structure of varieties and rootstocks. This revealed C. maxima introgressions in most of the old and in all recent selections of mandarins, and suggested that C. reticulata × C. maxima reticulation and introgression processes were important in edible mandarin domestication. The large range of phylogenomic constitutions between C. reticulata and C. maxima revealed in mandarins, tangelos, tangors, sweet oranges, sour oranges, grapefruits, and orangelos is favourable for genetic association studies based on phylogenomic structures of the germplasm. Inferred admixture structures were in agreement with previous hypotheses regarding the origin of several secondary species and also revealed the probable origin of several acid citrus varieties. The developed species-diagnostic SNP

  15. Nuclear Species-Diagnostic SNP Markers Mined from 454 Amplicon Sequencing Reveal Admixture Genomic Structure of Modern Citrus Varieties

    PubMed Central

    Curk, Franck; Ancillo, Gema; Ollitrault, Frédérique; Perrier, Xavier; Jacquemoud-Collet, Jean-Pierre; Garcia-Lor, Andres; Navarro, Luis; Ollitrault, Patrick

    2015-01-01

    Most cultivated Citrus species originated from interspecific hybridisation between four ancestral taxa (C. reticulata, C. maxima, C. medica, and C. micrantha) with limited further interspecific recombination due to vegetative propagation. This evolution resulted in admixture genomes with frequent interspecific heterozygosity. Moreover, a major part of the phenotypic diversity of edible citrus results from the initial differentiation between these taxa. Deciphering the phylogenomic structure of citrus germplasm is therefore essential for an efficient utilization of citrus biodiversity in breeding schemes. The objective of this work was to develop a set of species-diagnostic single nucleotide polymorphism (SNP) markers for the four Citrus ancestral taxa covering the nine chromosomes, and to use these markers to infer the phylogenomic structure of secondary species and modern cultivars. Species-diagnostic SNPs were mined from 454 amplicon sequencing of 57 gene fragments from 26 genotypes of the four basic taxa. Of the 1,053 SNPs mined from 28,507 kb sequence, 273 were found to be highly diagnostic for a single basic taxon. Species-diagnostic SNP markers (105) were used to analyse the admixture structure of varieties and rootstocks. This revealed C. maxima introgressions in most of the old and in all recent selections of mandarins, and suggested that C. reticulata × C. maxima reticulation and introgression processes were important in edible mandarin domestication. The large range of phylogenomic constitutions between C. reticulata and C. maxima revealed in mandarins, tangelos, tangors, sweet oranges, sour oranges, grapefruits, and orangelos is favourable for genetic association studies based on phylogenomic structures of the germplasm. Inferred admixture structures were in agreement with previous hypotheses regarding the origin of several secondary species and also revealed the probable origin of several acid citrus varieties. The developed species-diagnostic SNP

  16. Real-Time Pathogen Detection in the Era of Whole-Genome Sequencing and Big Data: Comparison of k-mer and Site-Based Methods for Inferring the Genetic Distances among Tens of Thousands of Salmonella Samples

    PubMed Central

    Pettengill, James B.; Pightling, Arthur W.; Baugher, Joseph D.; Rand, Hugh; Strain, Errol

    2016-01-01

    The adoption of whole-genome sequencing within the public health realm for molecular characterization of bacterial pathogens has been followed by an increased emphasis on real-time detection of emerging outbreaks (e.g., food-borne Salmonellosis). In turn, large databases of whole-genome sequence data are being populated. These databases currently contain tens of thousands of samples and are expected to grow to hundreds of thousands within a few years. For these databases to be of optimal use one must be able to quickly interrogate them to accurately determine the genetic distances among a set of samples. Being able to do so is challenging due to both biological (evolutionary diverse samples) and computational (petabytes of sequence data) issues. We evaluated seven measures of genetic distance, which were estimated from either k-mer profiles (Jaccard, Euclidean, Manhattan, Mash Jaccard, and Mash distances) or nucleotide sites (NUCmer and an extended multi-locus sequence typing (MLST) scheme). When analyzing empirical data (whole-genome sequence data from 18,997 Salmonella isolates) there are features (e.g., genomic, assembly, and contamination) that cause distances inferred from k-mer profiles, which treat absent data as informative, to fail to accurately capture the distance between samples when compared to distances inferred from differences in nucleotide sites. Thus, site-based distances, like NUCmer and extended MLST, are superior in performance, but accessing the computing resources necessary to perform them may be challenging when analyzing large databases. PMID:27832109

  17. Real-Time Pathogen Detection in the Era of Whole-Genome Sequencing and Big Data: Comparison of k-mer and Site-Based Methods for Inferring the Genetic Distances among Tens of Thousands of Salmonella Samples.

    PubMed

    Pettengill, James B; Pightling, Arthur W; Baugher, Joseph D; Rand, Hugh; Strain, Errol

    2016-01-01

    The adoption of whole-genome sequencing within the public health realm for molecular characterization of bacterial pathogens has been followed by an increased emphasis on real-time detection of emerging outbreaks (e.g., food-borne Salmonellosis). In turn, large databases of whole-genome sequence data are being populated. These databases currently contain tens of thousands of samples and are expected to grow to hundreds of thousands within a few years. For these databases to be of optimal use one must be able to quickly interrogate them to accurately determine the genetic distances among a set of samples. Being able to do so is challenging due to both biological (evolutionary diverse samples) and computational (petabytes of sequence data) issues. We evaluated seven measures of genetic distance, which were estimated from either k-mer profiles (Jaccard, Euclidean, Manhattan, Mash Jaccard, and Mash distances) or nucleotide sites (NUCmer and an extended multi-locus sequence typing (MLST) scheme). When analyzing empirical data (whole-genome sequence data from 18,997 Salmonella isolates) there are features (e.g., genomic, assembly, and contamination) that cause distances inferred from k-mer profiles, which treat absent data as informative, to fail to accurately capture the distance between samples when compared to distances inferred from differences in nucleotide sites. Thus, site-based distances, like NUCmer and extended MLST, are superior in performance, but accessing the computing resources necessary to perform them may be challenging when analyzing large databases.
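
    The k-mer Jaccard distance named in the abstract above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the choice of k = 4, and the toy sequences are all hypothetical, and real tools work on whole-genome k-mer profiles rather than short strings. The sketch only shows why a presence/absence k-mer measure treats missing sequence as a difference.

```python
from typing import Set


def kmer_set(sequence: str, k: int) -> Set[str]:
    """Return the distinct k-mers of a sequence (a presence/absence profile)."""
    seq = sequence.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(seq_a: str, seq_b: str, k: int = 4) -> float:
    """Jaccard distance: 1 - |A intersect B| / |A union B| over k-mer sets."""
    kmers_a = kmer_set(seq_a, k)
    kmers_b = kmer_set(seq_b, k)
    union = kmers_a | kmers_b
    if not union:
        # Both sequences are shorter than k, so there is nothing to compare.
        return 0.0
    return 1.0 - len(kmers_a & kmers_b) / len(union)


# Hypothetical toy data: the second "assembly" is the first with its end missing.
complete = "ACGTTGCAAGGCTA"
truncated = complete[:8]

print(jaccard_distance(complete, complete))   # identical sequences
print(jaccard_distance(complete, truncated))  # no base differs, yet distance > 0
```

    In this toy case the truncated sequence contains no nucleotide that differs from the complete one, but the k-mers lost with the missing region still raise the distance. That is the kind of assembly-driven effect the abstract attributes to k-mer measures, whereas a site-based comparison would count only positions present in both samples.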

  18. Population structure and comparative genome hybridization of European flor yeast reveal a unique group of Saccharomyces cerevisiae strains with few gene duplications in their genome.

    PubMed

    Legras, Jean-Luc; Erny, Claude; Charpentier, Claudine

    2014-01-01

    Wine biological aging is a wine making process used to produce specific beverages in several countries in Europe, including Spain, Italy, France, and Hungary. This process involves the formation of a velum at the surface of the wine. Here, we present the first large scale comparison of all European flor strains involved in this process. We inferred the population structure of these European flor strains from their microsatellite genotype diversity and analyzed their ploidy. We show that almost all of these flor strains belong to the same cluster and are diploid, except for a few Spanish strains. Comparison of the array hybridization profile of six flor strains originating from these four countries, with that of three wine strains did not reveal any large segmental amplification. Nonetheless, some genes, including YKL221W/MCH2 and YKL222C, were amplified in the genome of four out of six flor strains. Finally, we correlated ICR1 ncRNA and FLO11 polymorphisms with flor yeast population structure, and associate the presence of wild type ICR1 and a long Flo11p with thin velum formation in a cluster of Jura strains. These results provide new insight into the diversity of flor yeast and show that combinations of different adaptive changes can lead to an increase of hydrophobicity and affect velum formation.

  19. Population Structure and Comparative Genome Hybridization of European Flor Yeast Reveal a Unique Group of Saccharomyces cerevisiae Strains with Few Gene Duplications in Their Genome

    PubMed Central

    Legras, Jean-Luc; Erny, Claude; Charpentier, Claudine

    2014-01-01

    Wine biological aging is a wine making process used to produce specific beverages in several countries in Europe, including Spain, Italy, France, and Hungary. This process involves the formation of a velum at the surface of the wine. Here, we present the first large scale comparison of all European flor strains involved in this process. We inferred the population structure of these European flor strains from their microsatellite genotype diversity and analyzed their ploidy. We show that almost all of these flor strains belong to the same cluster and are diploid, except for a few Spanish strains. Comparison of the array hybridization profile of six flor strains originating from these four countries, with that of three wine strains did not reveal any large segmental amplification. Nonetheless, some genes, including YKL221W/MCH2 and YKL222C, were amplified in the genome of four out of six flor strains. Finally, we correlated ICR1 ncRNA and FLO11 polymorphisms with flor yeast population structure, and associate the presence of wild type ICR1 and a long Flo11p with thin velum formation in a cluster of Jura strains. These results provide new insight into the diversity of flor yeast and show that combinations of different adaptive changes can lead to an increase of hydrophobicity and affect velum formation. PMID:25272156

  20. Inferring R0 in emerging epidemics—the effect of common population structure is small

    PubMed Central

    Ball, Frank; Dhersin, Jean-Stéphane; Tran, Viet Chi; Wallinga, Jacco; Britton, Tom

    2016-01-01

    When controlling an emerging outbreak of an infectious disease, it is essential to know the key epidemiological parameters, such as the basic reproduction number R0 and the control effort required to prevent a large outbreak. These parameters are estimated from the observed incidence of new cases and information about the infectious contact structures of the population in which the disease spreads. However, the relevant infectious contact structures for new, emerging infections are often unknown or hard to obtain. Here, we show that, for many common true underlying heterogeneous contact structures, the simplification to neglect such structures and instead assume that all contacts are made homogeneously in the whole population results in conservative estimates for R0 and the required control effort. This means that robust control policies can be planned during the early stages of an outbreak, using such conservative estimates of the required control effort. PMID:27581480

  1. Population structure and genotypic variation of Crataegus pontica inferred by molecular markers.

    PubMed

    Rahmani, Mohammad-Shafie; Shabanian, Naghi; Khadivi-Khub, Abdollah; Woeste, Keith E; Badakhshan, Hedieh; Alikhani, Leila

    2015-11-01

    Information about the natural patterns of genetic variability and their evolutionary bases are of fundamental practical importance for sustainable forest management and conservation. In the present study, the genetic diversity of 164 individuals from fourteen natural populations of Crataegus pontica K.Koch was assessed for the first time using three genome-based molecular techniques; inter-retrotransposon amplified polymorphism (IRAP); inter-simple sequence repeats (ISSR) and start codon targeted (SCoT) polymorphism. IRAP, ISSR and SCoT analyses yielded 126, 254 and 199 scorable amplified bands, respectively, of which 90.48, 93.37 and 83.78% were polymorphic. ISSR revealed efficiency over IRAP and SCoT due to high effective multiplex ratio, marker index and resolving power. The dendrograms based on the markers used and combined data divided individuals into three major clusters. The correlation between the coefficient matrices for the IRAP, ISSR and SCoT data was significant. A higher level of genetic variation was observed within populations than among populations based on the markers used. The lower divergence levels depicted among the studied populations could be seen as evidence of gene flow. The promotion of gene exchange will be very beneficial to conserve and utilize the enormous genetic variability.

  2. Haplotype structure and population genetic inferences from nucleotide-sequence variation in human lipoprotein lipase.

    PubMed Central

    Clark, A G; Weiss, K M; Nickerson, D A; Taylor, S L; Buchanan, A; Stengård, J; Salomaa, V; Vartiainen, E; Perola, M; Boerwinkle, E; Sing, C F

    1998-01-01

    Allelic variation in 9.7 kb of genomic DNA sequence from the human lipoprotein lipase gene (LPL) was scored in 71 healthy individuals (142 chromosomes) from three populations: African Americans (24) from Jackson, MS; Finns (24) from North Karelia, Finland; and non-Hispanic Whites (23) from Rochester, MN. The sequences had a total of 88 variable sites, with a nucleotide diversity (site-specific heterozygosity) of 0.002 ± 0.001 across this 9.7-kb region. The frequency spectrum of nucleotide variation exhibited a slight excess of heterozygosity, but, in general, the data fit expectations of the infinite-sites model of mutation and genetic drift. Allele-specific PCR helped resolve linkage phases, and a total of 88 distinct haplotypes were identified. For 1,410 (64%) of the 2,211 site pairs, all four possible gametes were present in these haplotypes, reflecting a rich history of past recombination. Despite the strong evidence for recombination, extensive linkage disequilibrium was observed. The number of haplotypes generally is much greater than the number expected under the infinite-sites model, but there was sufficient multisite linkage disequilibrium to reveal two major clades, which appear to be very old. Variation in this region of LPL may depart from the variation expected under a simple, neutral model, owing to complex historical patterns of population founding, drift, selection, and recombination. These data suggest that the design and interpretation of disease-association studies may not be as straightforward as often is assumed. PMID:9683608
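
    The site-pair count in the abstract above (pairs of sites at which all four possible gametes occur) corresponds to what is commonly called the four-gamete test. The following is a minimal sketch under stated assumptions, not the authors' code: haplotypes are assumed to be phased, biallelic, and coded as strings of 0/1, and the function name and toy data are hypothetical.

```python
from itertools import combinations
from typing import Sequence, Tuple


def four_gamete_pairs(haplotypes: Sequence[str]) -> Tuple[int, int]:
    """Count site pairs showing all four gametes (00, 01, 10, 11).

    Returns (pairs_with_all_four, total_pairs). Assumes every haplotype has
    the same length and every site is biallelic, coded '0' or '1'.
    """
    n_sites = len(haplotypes[0])
    all_four = 0
    total = 0
    for i, j in combinations(range(n_sites), 2):
        # Collect the distinct two-site allele combinations seen in the sample.
        gametes = {(h[i], h[j]) for h in haplotypes}
        total += 1
        if len(gametes) == 4:
            all_four += 1
    return all_four, total


# Hypothetical toy sample: four haplotypes scored at three sites.
sample = ["000", "011", "101", "110"]
print(four_gamete_pairs(sample))
```

    Under the infinite-sites model without recombination, at most three of the four gametes can arise at any pair of sites, which is why a large share of four-gamete pairs is read as evidence of past recombination. The reported proportion follows directly from the counts in the abstract, 1,410 / 2,211 ≈ 0.64.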

  3. Crystal structures of Thermotoga maritima reverse gyrase: inferences for the mechanism of positive DNA supercoiling

    PubMed Central

    Rudolph, Markus G.; del Toro Duany, Yoandris; Jungblut, Stefan P.; Ganguly, Agneyo; Klostermeier, Dagmar

    2013-01-01

    Reverse gyrase is an ATP-dependent topoisomerase that is unique to hyperthermophilic archaea and eubacteria. The only reverse gyrase structure determined to date has revealed the arrangement of the N-terminal helicase domain and the C-terminal topoisomerase domain that intimately cooperate to generate the unique function of positive DNA supercoiling. Although the structure has elicited hypotheses as to how supercoiling may be achieved, it lacks structural elements important for supercoiling and the molecular mechanism of positive supercoiling is still not clear. We present five structures of authentic Thermotoga maritima reverse gyrase that reveal a first view of two interacting zinc fingers that are crucial for positive DNA supercoiling. The so-called latch domain, which connects the helicase and the topoisomerase domains is required for their functional cooperation and presents a novel fold. Structural comparison defines mobile regions in parts of the helicase domain, including a helical insert and the latch that are likely important for DNA binding during catalysis. We show that the latch, the helical insert and the zinc fingers contribute to the binding of DNA to reverse gyrase and are uniquely placed within the reverse gyrase structure to bind and guide DNA during strand passage. A possible mechanism for positive supercoiling by reverse gyrases is presented. PMID:23209025

  4. HorA web server to infer homology between proteins using sequence and structural similarity

    PubMed Central

    Kim, Bong-Hyun; Cheng, Hua; Grishin, Nick V.

    2009-01-01

    The biological properties of proteins are often gleaned through comparative analysis of evolutionary relatives. Although protein structure similarity search methods detect more distant homologs than purely sequence-based methods, structural resemblance can result from either homology (common ancestry) or analogy (similarity without common ancestry). While many existing web servers detect structural neighbors, they do not explicitly address the question of homology versus analogy. Here, we present a web server named HorA (Homology or Analogy) that identifies likely homologs for a query protein structure. Unlike other servers, HorA combines sequence information from state-of-the-art profile methods with structure information from spatial similarity measures using an advanced computational technique. HorA aims to identify biologically meaningful connections rather than purely 3D-geometric similarities. The HorA method finds ∼90% of remote homologs defined in the manually curated database SCOP. HorA will be especially useful for finding remote homologs that might be overlooked by other sequence or structural similarity search servers. The HorA server is available at http://prodata.swmed.edu/horaserver. PMID:19417074

  5. Vive la différence: naming structural variants in the human reference genome.

    PubMed

    Seal, Ruth L; Wright, Mathew W; Gray, Kristian A; Bruford, Elspeth A

    2013-05-01

    The HUGO Gene Nomenclature Committee has approved gene symbols for the majority of protein-coding genes on the human reference genome. To adequately represent regions of complex structural variation, the Genome Reference Consortium now includes alternative representations of some of these regions as part of the reference genome. Here, we describe examples of how we name novel genes in these regions and how this nomenclature is displayed on our website, http://genenames.org.

  6. Genetic diversity and population structure in Bactrocera correcta (Diptera: Tephritidae) inferred from mtDNA cox1 and microsatellite markers

    PubMed Central

    Qin, Yu-Jia; Buahom, Nopparat; Krosch, Matthew N.; Du, Yu; Wu, Yi; Malacrida, Anna R.; Deng, Yu-Liang; Liu, Jia-Qi; Jiang, Xiao-Long; Li, Zhi-Hong

    2016-01-01

    Bactrocera correcta is one of the most destructive pests of horticultural crops in tropical and subtropical regions. Despite the economic risk, the population genetics of this pest have remained relatively unexplored. This study explores population genetic structure and contemporary gene flow in B. correcta in Chinese Yunnan Province and attempts to place observed patterns within the broader geographical context of the species’ total range. Based on combined data from mtDNA cox1 sequences and 12 microsatellite loci obtained from 793 individuals located in 7 countries, overall genetic structuring was low. The expansion history of this species, including likely human-mediated dispersal, may have played a role in shaping the observed weak structure. The study suggested a close relationship between Yunnan Province and adjacent countries, with evidence for Western and/or Southern Yunnan as the invasive origin of B. correcta within Yunnan Province. The information gleaned from this analysis of gene flow and population structure has broad implications for quarantine, trade and management of this pest, especially in China where it is expanding northward. Future studies should concentrate effort on sampling South Asian populations, which would enable better inferences of the ancestral location of B. correcta and its invasion history into and throughout Asia. PMID:27929126

  7. Inferring Horizontal Gene Transfer

    PubMed Central

    Lassalle, Florent; Dessimoz, Christophe

    2015-01-01

    Horizontal or Lateral Gene Transfer (HGT or LGT) is the transmission of portions of genomic DNA between organisms through a process decoupled from vertical inheritance. In the presence of HGT events, different fragments of the genome are the result of different evolutionary histories. This can therefore complicate the investigations of evolutionary relatedness of lineages and species. Also, as HGT can bring into genomes radically different genotypes from distant lineages, or even new genes bearing new functions, it is a major source of phenotypic innovation and a mechanism of niche adaptation. For example, of particular relevance to human health is the lateral transfer of antibiotic resistance and pathogenicity determinants, leading to the emergence of pathogenic lineages [1]. Computational identification of HGT events relies upon the investigation of sequence composition or evolutionary history of genes. Sequence composition-based ("parametric") methods search for deviations from the genomic average, whereas evolutionary history-based ("phylogenetic") approaches identify genes whose evolutionary history significantly differs from that of the host species. The evaluation and benchmarking of HGT inference methods typically rely upon simulated genomes, for which the true history is known. On real data, different methods tend to infer different HGT events, and as a result it can be difficult to ascertain all but simple and clear-cut HGT events. PMID:26020646
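
    As a rough illustration of the parametric idea described above (searching for deviations from the genomic average), the sketch below flags genes whose GC content is an outlier relative to the other genes. It is a deliberately simplified assumption for illustration: the function names, the z-score threshold and the toy sequences are hypothetical, and the methods covered by the review use richer composition signatures than GC content alone.

```python
from statistics import mean, pstdev


def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def flag_atypical_genes(genes, z_threshold=2.0):
    """Return names of genes whose GC content deviates strongly from the mean.

    genes: dict mapping gene name -> nucleotide sequence.
    z_threshold: hypothetical cutoff on the absolute z-score.
    """
    gcs = {name: gc_content(seq) for name, seq in genes.items()}
    mu = mean(gcs.values())
    sigma = pstdev(gcs.values())
    if sigma == 0:
        return []  # all genes have identical composition
    return [name for name, gc in gcs.items() if abs(gc - mu) / sigma > z_threshold]


# Hypothetical toy genome: nine genes at 50% GC and one at 100% GC.
genes = {f"g{i}": "ATGC" * 5 for i in range(9)}
genes["outlier"] = "GGCC" * 5
print(flag_atypical_genes(genes))
```

    A compositional outlier is only a candidate: as the abstract notes, different methods tend to infer different events on real data.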

  8. Crustal structure beneath the Japanese Islands inferred from receiver function analysis using similar earthquakes

    NASA Astrophysics Data System (ADS)

    Igarashi, Toshihiro

    2016-04-01

    The stress concentration and strain accumulation caused by inter-plate coupling of the subducting plate should have a large effect on inland shallow earthquakes that occur in the overriding plate. Information on the crustal structure and the crustal thickness is important for understanding this process. In this study, I applied receiver function analysis using similar earthquakes to estimate the crustal velocity structures beneath the Japanese Islands. Because similar earthquakes occur repeatedly at almost the same place, they are useful for extracting information on the spatial distribution and temporal changes of seismic velocity structures beneath the seismic stations. I used data from telemetric seismographic networks covering the Japanese Islands and moderate-sized similar earthquakes that occurred in the Southern Hemisphere at epicentral distances between 30 and 90 degrees over about 26 years from October 1989. Data analysis was performed separately for the periods before and after the 2011 Tohoku-Oki earthquake. To identify the spatial distribution of crustal structure, I used a grid search to find the model whose synthetic receiver function correlated best with the observed receiver function at each station. As a result, I clarified the spatial distribution of the crustal velocity structures. The spatial patterns of velocities from the ground surface to 5 km depth correspond to basement depth models, although the velocities are slower than those of tomography models. They indicate thick sediment layers in several plain and basin areas. The crustal velocity perturbations are consistent with existing tomography models. The active volcanoes correspond to low-velocity zones from the upper crust to the crust-mantle transition. A comparison of the crustal structure before and after the 2011 Tohoku-Oki earthquake suggests that the northeastern Japan arc changed to lower velocities in some areas. This kind of velocity change might be due to other effects such as changes of

  9. Definitions of enzyme function for the structural genomics era.

    PubMed

    Babbitt, Patricia C

    2003-04-01

    Questions are being asked about how enzyme function is described at the molecular level and the strengths and weaknesses of the EC system for this purpose. A new approach to describing enzyme function has been proposed that might improve our capabilities for functional inference for members of enzyme superfamilies.

  10. Genetic structure of Mesoamerican populations of Big-leaf mahogany (Swietenia macrophylla) inferred from microsatellite analysis.

    PubMed

    Novick, Rachel Roth; Dick, Christopher W; Lemes, Maristerra R; Navarro, Carlos; Caccone, Adalgisa; Bermingham, Eldredge

    2003-11-01

    While microsatellites have been used to examine genetic structure in local populations of Neotropical trees, genetic studies based on such high-resolution markers have not been carried out for Mesoamerica as a whole. Here we assess the genetic structure of the Mesoamerican mahogany Swietenia macrophylla King (big-leaf mahogany), a Neotropical tree species recently listed as endangered in CITES which is commercially extinct through much of its native range. We used seven variable microsatellite loci to assess genetic diversity and population structure in eight naturally established mahogany populations from six Mesoamerican countries. Measures of genetic differentiation (FST and RST) indicated significant differences between most populations. Unrooted dendrograms based on genetic distances between populations provide evidence of strong phylogeographic structure in Mesoamerican mahogany. The two populations on the Pacific coasts of Costa Rica and Panama were genetically distant from all the others, and from one another. The remaining populations formed two clusters, one comprised of the northern populations of Mexico, Belize and Guatemala and the other containing the southern Atlantic populations of Nicaragua and Costa Rica. Significant correlation was found between geographical distance and all pairwise measures of genetic divergence, suggesting the importance of regional biogeography and isolation by distance in Mesoamerican mahogany. The results of this study demonstrate greater phylogeographic structure than has been found across Amazon basin S. macrophylla. Our findings suggest a relatively complex Mesoamerican biogeographic history and lead to the prediction that other Central American trees will show similar patterns of regional differentiation.
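
    A correlation between geographic and genetic distance matrices is commonly assessed with a Mantel permutation test. The abstract does not name the test that was used, so the sketch below is an assumption offered only to show the general technique; the function names and the toy distance matrices are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Pairwise entries above the diagonal after relabelling rows/columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_test(geo, gen, permutations=999, seed=0):
    """Return (observed r, one-sided permutation p-value) for two distance matrices."""
    identity = list(range(len(geo)))
    geo_vec = upper_triangle(geo, identity)
    observed = pearson(geo_vec, upper_triangle(gen, identity))
    rng = random.Random(seed)
    at_least = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # relabel populations in the genetic matrix only
        if pearson(geo_vec, upper_triangle(gen, order)) >= observed:
            at_least += 1
    return observed, at_least / (permutations + 1)


# Hypothetical toy data: four populations along a line, genetic distance
# exactly proportional to geographic distance.
positions = [0.0, 1.0, 2.0, 4.0]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[0.1 * d for d in row] for row in geo]
r, p = mantel_test(geo, gen)
print(f"r = {r:.3f}, p = {p:.3f}")
```

    Permuting whole populations rather than individual matrix entries respects the fact that pairwise distances sharing a population are not independent.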

  11. Recent Developments in Parameter Estimation and Structure Identification of Biochemical and Genomic Systems

    PubMed Central

    Chou, I-Chun; Voit, Eberhard O.

    2009-01-01

    The organization, regulation and dynamical responses of biological systems are in many cases too complex to allow intuitive predictions and require the support of mathematical modeling for quantitative assessments and a reliable understanding of system functioning. All steps of constructing mathematical models for biological systems are challenging, but arguably the most difficult task among them is the estimation of model parameters and the identification of the structure and regulation of the underlying biological networks. Recent advancements in modern high-throughput techniques have been allowing the generation of time series data that characterize the dynamics of genomic, proteomic, metabolic, and physiological responses and enable us, at least in principle, to tackle estimation and identification tasks using “top-down” or “inverse” approaches. While the rewards of a successful inverse estimation or identification are great, the process of extracting structural and regulatory information is technically difficult. The challenges can generally be categorized into four areas, namely, issues related to the data, the model, the mathematical structure of the system, and the optimization and support algorithms. Many recent articles have addressed inverse problems within the modeling framework of Biochemical Systems Theory (BST). BST was chosen for these tasks because of its unique structural flexibility and the fact that the structure and regulation of a biological system are mapped essentially one-to-one onto the parameters of the describing model. The proposed methods mainly focused on various optimization algorithms, but also on support techniques, including methods for circumventing the time consuming numerical integration of systems of differential equations, smoothing overly noisy data, estimating slopes of time series, reducing the complexity of the inference task, and constraining the parameter search space. Other methods targeted issues of data

  12. The thermal structure of Saturn: Inferences from ground-based and airborne infrared observations

    NASA Technical Reports Server (NTRS)

    Tokunaga, A.

    1978-01-01

    Spectroscopic and photometric infrared observations of Saturn are reviewed and compared to the expected flux from thermal structure models. Large uncertainties exist in the far-infrared measurements, but the available data indicate that the effective temperature of the disk of Saturn is 90 ± 5 K. The thermal structure models proposed by Tokunaga and Cess and by Gautier et al. (model 'N') agree best with the observations. North-South limb scans of Saturn at 10 and 20 micrometers show that the temperature inversion is much stronger at the South polar region than at the equator.

  13. Micro and nanofluidic structures for cell sorting and genomic analysis

    NASA Astrophysics Data System (ADS)

    Morton, Keith J.

    Microfluidic systems promise rapid analysis of small samples in a compact and inexpensive format. But direct scaling of lab bench protocols on-chip is challenging because laminar flows in typical microfluidic devices are characterized by non-mixing streamlines. Common microfluidic mixers and sorters work by diffusion, limiting application to objects that diffuse slowly such as cells and DNA. Recently Huang et al. developed a passive microfluidic element to continuously separate bio-particles deterministically. In Deterministic Lateral Displacement (DLD), objects are sorted by size as they transit an asymmetric array of microfabricated posts. This thesis further develops DLD arrays with applications in three broad new areas. First the arrays are used, not simply to sort particles, but to move streams of cells through functional flows for chemical treatment, such as on-chip immunofluorescent labeling of blood cells with washing, and on-chip E. coli cell lysis with simultaneous chromosome extraction. Secondly, modular tiling of the basic DLD element is used to construct complex particle handling modes that include beam steering for jets of cells and beads. Thirdly, nanostructured DLD arrays are built using Nanoimprint Lithography (NIL) and continuous-flow separation of 100 nm and 200 nm size particles is demonstrated. Finally a number of ancillary nanofabrication techniques were developed in support of these overall goals, including methods to interface nanofluidic structures with standard microfluidic components such as inlet channels and reservoirs, precision etching of ultra-high aspect ratio (>50:1) silicon nanostructures, and fabrication of narrow (~35 nm) channels used to stretch genomic length DNA.

  14. Genome structure and dynamics of the yeast pathogen Candida glabrata

    PubMed Central

    Ahmad, Khadija M; Kokošar, Janez; Guo, Xiaoxian; Gu, Zhenglong; Ishchuk, Olena P; Piškur, Jure

    2014-01-01

    The yeast pathogen Candida glabrata is the second most frequent cause of Candida infections. However, from the phylogenetic point of view, C. glabrata is much closer to Saccharomyces cerevisiae than to Candida albicans. Apparently, this yeast has relatively recently changed its lifestyle and become a successful opportunistic pathogen. Recently, several C. glabrata sister species, among them clinical and environmental isolates, have had their genomes characterized. The genomes of hundreds of C. glabrata clinical isolates have also been characterized. These isolates display enormous genomic plasticity. The number and size of chromosomes vary drastically, and intra- and interchromosomal segmental duplications occur frequently. The observed genome alterations could affect phenotypic properties and thus help the yeast adapt to the highly variable and harsh habitats it finds in different human patients and their tissues. Further genome sequencing of pathogenic isolates will provide a valuable tool to understand the mechanisms behind genome dynamics and help to elucidate the genes contributing to the virulence potential. PMID:24528571

  15. Full-length RNA structure prediction of the HIV-1 genome reveals a conserved core domain

    PubMed Central

    Sükösd, Zsuzsanna; Andersen, Ebbe S.; Seemann, Stefan E.; Jensen, Mads Krogh; Hansen, Mathias; Gorodkin, Jan; Kjems, Jørgen

    2015-01-01

    A distance-constrained secondary structural model of the ≈10 kb RNA genome of HIV-1 has been predicted, but higher-order structures involving long-distance interactions are currently unknown. We present the first global RNA secondary structure model for the HIV-1 genome, which integrates both comparative structure analysis and information from experimental data in a full-length prediction without distance constraints. Besides recovering known structural elements, we predict several novel structural elements that are conserved in HIV-1 evolution. Our results also indicate that the structure of the HIV-1 genome is highly variable in most regions, with a limited number of stable and conserved RNA secondary structures. Most interestingly, a set of long-distance interactions forms a core organizing structure (COS) that organizes the genome into three major structural domains. Despite overlapping protein-coding regions, the COS is supported by a particularly high frequency of compensatory base changes, suggesting functional importance for this element. This new structural element potentially organizes the whole genome into three major domains protruding from a conserved core structure with potential roles in replication and evolution for the virus. PMID:26476446

  16. Full-length RNA structure prediction of the HIV-1 genome reveals a conserved core domain.

    PubMed

    Sükösd, Zsuzsanna; Andersen, Ebbe S; Seemann, Stefan E; Jensen, Mads Krogh; Hansen, Mathias; Gorodkin, Jan; Kjems, Jørgen

    2015-12-02

    A distance-constrained secondary structural model of the ≈10 kb RNA genome of HIV-1 has been predicted, but higher-order structures involving long-distance interactions are currently unknown. We present the first global RNA secondary structure model for the HIV-1 genome, which integrates both comparative structure analysis and information from experimental data in a full-length prediction without distance constraints. Besides recovering known structural elements, we predict several novel structural elements that are conserved in HIV-1 evolution. Our results also indicate that the structure of the HIV-1 genome is highly variable in most regions, with a limited number of stable and conserved RNA secondary structures. Most interestingly, a set of long-distance interactions forms a core organizing structure (COS) that organizes the genome into three major structural domains. Despite overlapping protein-coding regions, the COS is supported by a particularly high frequency of compensatory base changes, suggesting functional importance for this element. This new structural element potentially organizes the whole genome into three major domains protruding from a conserved core structure with potential roles in replication and evolution for the virus.

  17. Population structure of Tor tor inferred from mitochondrial gene cytochrome b.

    PubMed

    Pasi, Komal Shyamakant; Lakra, W S; Bhatt, J P; Goswami, M; Malakar, A Kr

    2013-06-01

    Tor tor, commonly called Tor mahseer, is a high-valued food and game fish endemic to the trans-Himalayan region. A 967 bp region of the mitochondrial cytochrome b (cyt b) gene was used to estimate the population structure of T. tor. Three populations of T. tor were collected from the Narmada (Hosangabad), Ken (Madla), and Parbati (Sheopur) rivers in Madhya Pradesh, India. The sequence analysis revealed that the nucleotide diversity (π) was low, ranging from 0.000 to 0.0150. Haplotype diversity (h) ranged from 0.000 to 1.000. The analysis of molecular variance indicated significant genetic divergence among the three populations of T. tor. The neighbor-joining tree also showed that all individuals from the three populations clustered into three distinct clades. The data generated by the cyt b marker revealed interesting insights into the population structure of T. tor, which would serve as baseline data for conservation and management of the mahseer fishery.
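
    For readers unfamiliar with the two summary statistics quoted above, the sketch below computes haplotype diversity (h) and nucleotide diversity (π) from a set of aligned sequences using the standard Nei definitions. It is an illustration under stated assumptions, not the authors' pipeline: the function names and toy sequences are hypothetical, and gaps or missing data are not handled.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Nei's haplotype diversity: h = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site across all sequence pairs."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)


# Hypothetical toy alignment: four sequences of four sites.
sample = ["AAAA", "AAAA", "AAAT", "AATT"]
print(haplotype_diversity(sample), nucleotide_diversity(sample))
```

    With these definitions h is 0 when every sampled sequence is identical and approaches 1 when every sequence is a distinct haplotype, which is why small samples can produce the extreme values of the reported range.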

  18. Geophysical inferences of thermal-chemical structures in the lower mantle

    NASA Technical Reports Server (NTRS)

    Yuen, D. A.; Cadek, O.; Chopelas, A.; Matyska, C.

    1993-01-01

    Lateral variations of the temperature field in the lower mantle have been reconstructed using new results in mineral physics and seismic tomographic data. We show that, with the application of high-pressure experimental values of thermal expansivity and of sound velocities, the slow seismic anomalies in the lower mantle under the Pacific and Africa can be converted into realistic-looking plume structures with large dimensions of O(1000 km). The outer fringes of the plumes have an excess temperature of around 400 K. In the core of the plumes are found tonguelike structures with extremely high thermal anomalies. These values can exceed 1200 K and are too high to be explained on the basis of thermal anomalies alone. We suggest that these major plumes in the deep mantle may be driven by both thermal and chemical buoyancies or that enhanced conductive heat-transfer may be important there.

  19. Annotation inconsistencies beyond sequence similarity-based function prediction - phylogeny and genome structure.

    PubMed

    Promponas, Vasilis J; Iliopoulos, Ioannis; Ouzounis, Christos A

    2015-01-01

    The function annotation process in computational biology has increasingly shifted from the traditional characterization of individual biochemical roles of protein molecules to the system-wide detection of entire metabolic pathways and genomic structures. The so-called genome-aware methods broaden misannotation inconsistencies in genome sequences beyond protein function assignments, encompassing phylogenetic anomalies and artifactual genomic regions. We outline three categories of error propagation in databases by providing striking examples - at various levels of appreciation by the community from traditional to emerging, thus raising awareness for future solutions.

  20. Structural inferences for the native skeletal muscle sodium channel as derived from patterns of endogenous proteolysis

    SciTech Connect

    Kraner, S.; Yang, J.; Barchi, R.

    1989-08-05

    The alpha subunit (Mr approximately 260,000) of the rat skeletal muscle sodium channel is sensitive to cleavage by endogenous proteases during the isolation of muscle surface membrane. Antisera against synthetic oligopeptides were used to map the resultant fragments in order to identify protease-sensitive regions of the channel's structure in its native membrane environment. Antibodies to the amino terminus labeled major fragments of Mr approximately 130,000 and 90,000 and lesser amounts of other peptides as small as Mr approximately 12,000. Antisera to epitopes within the carboxyl-terminal half of the primary sequence recognized two fragments of Mr approximately 110,000 and 78,000. Individual antisera also selectively labeled smaller polypeptides in the most extensively cleaved preparations. The immunoreactivity patterns of monoclonal antibodies previously raised against the purified channel were then surveyed. The binding sites for one group of monoclonals, including several that recognize subtype-specific epitopes in the channel structure, were localized within a 12-kDa fragment near the amino terminus. The distribution of carbohydrate along the primary structure of the channel was also assessed by quantitating 125I-wheat germ agglutinin and 125I-concanavalin A binding to the proteolytic peptides. Most of the carbohydrate detected by these lectins was located between 22 and 90 kDa from the amino terminus of the protein. No lectin binding was detected to fragments arising from the carboxyl-terminal half of the protein. These results were analyzed in terms of current models of sodium channel tertiary structure. In its normal membrane environment, the skeletal muscle sodium channel appears sensitive to cleavage by endogenous proteases in regions predicted to link the four repeat domains on the cytoplasmic side of the membrane while the repeat domains themselves are resistant to proteolysis.

  1. Phylogeography and population structure of the red stingray, Dasyatis akajei inferred by mitochondrial control region.

    PubMed

    Li, Ning; Chen, Xiao; Sun, Dianrong; Song, Na; Lin, Qin; Gao, Tianxiang

    2015-08-01

    The red stingray Dasyatis akajei occurs in both marine and freshwater habitats, but little is known about its phylogeography and population structure. We sampled 107 individuals from one freshwater region and 6 coastal localities within the distribution range of D. akajei. Analyses of a 474 bp fragment of the first hypervariable region of the mitochondrial DNA control region revealed only 17 polymorphic sites that defined 28 haplotypes, with no unique haplotype for the freshwater population. A high level of haplotype diversity and low nucleotide diversity were observed in both marine (h = 0.9393 ± 0.0104, π = 0.0069 ± 0.0040) and freshwater populations (h = 0.8333 ± 0.2224, π = 0.0084 ± 0.0063). A significant level of genetic structure was detected between four marine populations (TZ, WZ, ND and ZZ) via both hierarchical analysis of molecular variance (AMOVA) and pairwise FST (with two exceptions), which is unusual compared with previous findings for elasmobranchs over such short geographical distances. However, limited sampling suggested that the freshwater population was not particularly distinct (p > 0.05), and additional samples would be needed to confirm this. Demersal and slow-moving habits have likely contributed to the genetically heterogeneous population structure. The demographic history of D. akajei examined by mismatch distribution analyses, neutrality tests and Bayesian skyline analyses suggested a sudden population expansion dating to the upper Pleistocene. The information on genetic diversity and genetic structure will have implications for the management of fisheries and conservation efforts.

  2. Comparison of algorithms to infer genetic population structure from unlinked molecular markers.

    PubMed

    Peña-Malavera, Andrea; Bruno, Cecilia; Fernandez, Elmer; Balzarini, Monica

    2014-08-01

    Identifying population genetic structure (PGS) is crucial for breeding and conservation. Several clustering algorithms are available to identify the underlying PGS to be used with genetic data of maize genotypes. In this work, six methods to identify PGS from unlinked molecular marker data were compared using simulated and experimental data consisting of multilocus-biallelic genotypes. Datasets were delineated under different biological scenarios characterized by three levels of genetic divergence among populations (low, medium, and high FST) and two numbers of sub-populations (K=3 and K=5). The relative performance of hierarchical and non-hierarchical clustering, as well as model-based clustering (STRUCTURE) and clustering from neural networks (SOM-RP-Q), was evaluated. We use the clustering error rate of genotypes into discrete sub-populations as the comparison criterion. In scenarios with a high level of divergence among genotype groups, all methods performed well. With a moderate level of genetic divergence (FST=0.2), the algorithms SOM-RP-Q and STRUCTURE performed better than hierarchical and non-hierarchical clustering. In all simulated scenarios with low genetic divergence and in the experimental SNP maize panel (largely unlinked), SOM-RP-Q achieved the lowest clustering error rate. The SOM algorithm used here is more effective than other evaluated methods for sparse unlinked genetic data.
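
    The comparison criterion above, the clustering error rate, requires some way of matching arbitrary cluster labels to the true sub-populations. The abstract does not say how this was done, so the sketch below is one plausible reading rather than the authors' procedure: it tries every one-to-one assignment of predicted clusters to true groups and reports the smallest resulting misclassification rate. The function name and toy labels are hypothetical, and exhaustive matching is only practical for small K such as the K=3 and K=5 scenarios mentioned.

```python
from itertools import permutations


def clustering_error_rate(true_labels, predicted_labels):
    """Smallest misclassification rate over all label matchings.

    Assumes the clustering produced no more clusters than there are true groups.
    """
    true_ids = sorted(set(true_labels))
    pred_ids = sorted(set(predicted_labels))
    if len(pred_ids) > len(true_ids):
        raise ValueError("more predicted clusters than true groups")
    best_correct = 0
    for assignment in permutations(true_ids, len(pred_ids)):
        mapping = dict(zip(pred_ids, assignment))  # predicted id -> true id
        correct = sum(mapping[p] == t for t, p in zip(true_labels, predicted_labels))
        best_correct = max(best_correct, correct)
    return 1 - best_correct / len(true_labels)


# Hypothetical toy example: nine genotypes, three true sub-populations,
# one genotype placed in the wrong cluster.
truth = [0, 0, 0, 1, 1, 1, 2, 2, 2]
clusters = ["b", "b", "a", "a", "a", "a", "c", "c", "c"]
print(clustering_error_rate(truth, clusters))
```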

  3. Internal structure of a cold dark molecular cloud inferred from the extinction of background starlight.

    PubMed

    Alves, J F; Lada, C J; Lada, E A

    2001-01-11

    Stars and planets form within dark molecular clouds, but little is understood about the internal structure of these clouds, and consequently about the initial conditions that give rise to star and planet formation. The clouds are primarily composed of molecular hydrogen, which is virtually inaccessible to direct observation. But the clouds also contain dust, which is well mixed with the gas and which has well understood effects on the transmission of light. Here we use sensitive near-infrared measurements of the light from background stars as it is absorbed and scattered by trace amounts of dust to probe the internal structure of the dark cloud Barnard 68 with unprecedented detail. We find the cloud's density structure to be very well described by the equations for a pressure-confined, self-gravitating isothermal sphere that is critically stable according to the Bonnor-Ebert criteria. As a result we can precisely specify the physical conditions inside a dark cloud on the verge of collapse to form a star.
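
    The pressure-confined isothermal sphere mentioned above is described by the isothermal Lane-Emden equation, ψ'' + (2/ξ)ψ' = exp(-ψ), where ξ is a dimensionless radius and the density falls off as ρ = ρ_c exp(-ψ). The sketch below is not taken from the paper: it integrates this equation with a fixed-step fourth-order Runge-Kutta scheme to obtain the center-to-edge density contrast at a chosen outer radius. The function name, step size and starting radius are hypothetical choices, and the critical radius ξ ≈ 6.45 used in the example is the textbook Bonnor-Ebert value, supplied here as an assumption.

```python
import math


def isothermal_density_contrast(xi_max, step=1e-3):
    """Integrate psi'' + (2/xi) psi' = exp(-psi) outward; return rho_c / rho(xi_max)."""

    def deriv(xi, psi, dpsi):
        # First-order system: (psi', psi'').
        return dpsi, math.exp(-psi) - 2.0 * dpsi / xi

    # Start slightly off-center using the series psi ~ xi^2 / 6 to avoid the 1/xi singularity.
    xi = 1e-4
    psi = xi * xi / 6.0
    dpsi = xi / 3.0
    while xi < xi_max:
        h = min(step, xi_max - xi)
        k1p, k1d = deriv(xi, psi, dpsi)
        k2p, k2d = deriv(xi + h / 2, psi + h / 2 * k1p, dpsi + h / 2 * k1d)
        k3p, k3d = deriv(xi + h / 2, psi + h / 2 * k2p, dpsi + h / 2 * k2d)
        k4p, k4d = deriv(xi + h, psi + h * k3p, dpsi + h * k3d)
        psi += h / 6 * (k1p + 2 * k2p + 2 * k3p + k4p)
        dpsi += h / 6 * (k1d + 2 * k2d + 2 * k3d + k4d)
        xi += h
    return math.exp(psi)


# Density contrast at the textbook critical radius (assumed value, not from the abstract).
contrast = isothermal_density_contrast(6.451)
print(f"center-to-edge density contrast: {contrast:.2f}")
```

    At the critical radius the contrast comes out at roughly 14; spheres truncated at larger ξ have a higher contrast and are unstable to collapse, which is the sense in which a cloud near this value is on the verge of forming a star.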

  4. The First Complete Chloroplast Genome Sequences in Actinidiaceae: Genome Structure and Comparative Analysis

    PubMed Central

    Yao, Xiaohong; Tang, Ping; Li, Zuozhou; Li, Dawei; Liu, Yifei; Huang, Hongwen

    2015-01-01

    Actinidia chinensis is an important economic plant belonging to the basal lineage of the asterids. Availability of a complete Actinidia chloroplast genome sequence is crucial to understanding phylogenetic relationships among major lineages of angiosperms and facilitates kiwifruit genetic improvement. We report here the complete nucleotide sequences of the chloroplast genomes for Actinidia chinensis and A. chinensis var. deliciosa obtained through de novo assembly of Illumina paired-end reads produced by total DNA sequencing. The total genome size ranges from 155,446 to 157,557 bp, with an inverted repeat (IR) of 24,013 to 24,391 bp, a large single copy region (LSC) of 87,984 to 88,337 bp and a small single copy region (SSC) of 20,332 to 20,336 bp. The genome encodes 113 different genes, including 79 unique protein-coding genes, 30 tRNA genes and 4 ribosomal RNA genes, with 16 duplicated in the inverted repeats, and a tRNA gene (trnfM-CAU) duplicated once in the LSC region. Comparisons of IR boundaries among four asterid species showed that IR/LSC borders were extended into the 5’ portion of the psbA gene and IR contraction occurred in Actinidia. The clpP gene has been lost from the chloroplast genome in Actinidia, and may have been transferred to the nucleus during chloroplast evolution. Twenty-seven polymorphic simple sequence repeat (SSR) loci were identified in the Actinidia chloroplast genome. Maximum parsimony analyses of a 72-gene, 16-taxon angiosperm dataset strongly support the placement of Actinidiaceae in Ericales within the basal asterids. PMID:26046631

  5. Inferring coarse-grain histone-DNA interaction potentials from high-resolution structures of the nucleosome

    NASA Astrophysics Data System (ADS)

    Meyer, Sam; Everaers, Ralf

    2015-02-01

    The histone-DNA interaction in the nucleosome is a fundamental mechanism of genomic compaction and regulation, which remains largely unknown despite increasing structural knowledge of the complex. In this paper, we propose a framework for the extraction of a nanoscale histone-DNA force-field from a collection of high-resolution structures, which may be adapted to a larger class of protein-DNA complexes. We applied the procedure to a large crystallographic database extended by snapshots from molecular dynamics simulations. The comparison of the structural models first shows that, at histone-DNA contact sites, the DNA base-pairs are shifted outwards locally, consistent with locally repulsive forces exerted by the histones. The second step shows that the various force profiles of the structures under analysis derive locally from a unique, sequence-independent, quadratic repulsive force-field, while the sequence preferences are entirely due to internal DNA mechanics. We have thus obtained the first knowledge-derived nanoscale interaction potential for histone-DNA in the nucleosome. The conformations obtained by relaxation of nucleosomal DNA with high-affinity sequences in this potential accurately reproduce the experimental values of binding preferences. Finally we address the more generic binding mechanisms relevant to the 80% genomic sequences incorporated in nucleosomes, by computing the conformation of nucleosomal DNA with sequence-averaged properties. This conformation differs from those found in crystals, and the analysis suggests that repulsive histone forces are related to local stretch tension in nucleosomal DNA, mostly between adjacent contact points. This tension could play a role in the stability of the complex.

  6. A sequence-based survey of the complex structural organization of tumor genomes

    SciTech Connect

    Collins, Colin; Raphael, Benjamin J.; Volik, Stanislav; Yu, Peng; Wu, Chunxiao; Huang, Guiqing; Linardopoulou, Elena V.; Trask, Barbara J.; Waldman, Frederic; Costello, Joseph; Pienta, Kenneth J.; Mills, Gordon B.; Bajsarowicz, Krystyna; Kobayashi, Yasuko; Sridharan, Shivaranjani; Paris, Pamela; Tao, Quanzhou; Aerni, Sarah J.; Brown, Raymond P.; Bashir, Ali; Gray, Joe W.; Cheng, Jan-Fang; de Jong, Pieter; Nefedov, Mikhail; Ried, Thomas; Padilla-Nash, Hesed M.; Collins, Colin C.

    2008-04-03

    The genomes of many epithelial tumors exhibit extensive chromosomal rearrangements. All classes of genome rearrangements can be identified using End Sequencing Profiling (ESP), which relies on paired-end sequencing of cloned tumor genomes. In this study, brain, breast, ovary and prostate tumors along with three breast cancer cell lines were surveyed with ESP yielding the largest available collection of sequence-ready tumor genome breakpoints and providing evidence that some rearrangements may be recurrent. Sequencing and fluorescence in situ hybridization (FISH) confirmed translocations and complex tumor genome structures that include coamplification and packaging of disparate genomic loci with associated molecular heterogeneity. Comparison of the tumor genomes suggests recurrent rearrangements. Some are likely to be novel structural polymorphisms, whereas others may be bona fide somatic rearrangements. A recurrent fusion transcript in breast tumors and a constitutional fusion transcript resulting from a segmental duplication were identified. Analysis of end sequences for single nucleotide polymorphisms (SNPs) revealed candidate somatic mutations and an elevated rate of novel SNPs in an ovarian tumor. These results suggest that the genomes of many epithelial tumors may be far more dynamic and complex than previously appreciated and that genomic fusions including fusion transcripts and proteins may be common, possibly yielding tumor-specific biomarkers and therapeutic targets.

  7. Phylogenetic Gaussian process model for the inference of functionally important regions in protein tertiary structures.

    PubMed

    Huang, Yi-Fei; Golding, G Brian

    2014-01-01

    A critical question in biology is the identification of functionally important amino acid sites in proteins. Because functionally important sites are under stronger purifying selection, site-specific substitution rates tend to be lower than usual at these sites. A large number of phylogenetic models have been developed to estimate site-specific substitution rates in proteins and the extraordinarily low substitution rates have been used as evidence of function. Most of the existing tools, e.g. Rate4Site, assume that site-specific substitution rates are independent across sites. However, site-specific substitution rates may be strongly correlated in the protein tertiary structure, since functionally important sites tend to be clustered together to form functional patches. We have developed a new model, GP4Rate, which incorporates the Gaussian process model with the standard phylogenetic model to identify slowly evolved regions in protein tertiary structures. GP4Rate uses the Gaussian process to define a nonparametric prior distribution of site-specific substitution rates, which naturally captures the spatial correlation of substitution rates. Simulations suggest that GP4Rate can potentially estimate site-specific substitution rates with a much higher accuracy than Rate4Site and tends to report slowly evolved regions rather than individual sites. In addition, GP4Rate can estimate the strength of the spatial correlation of substitution rates from the data. By applying GP4Rate to a set of mammalian B7-1 genes, we found a highly conserved region which coincides with experimental evidence. GP4Rate may be a useful tool for the in silico prediction of functionally important regions in the proteins with known structures.
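    The idea of a Gaussian process prior that ties together the rates of sites lying close in the tertiary structure can be sketched as a covariance matrix built from residue coordinates. This is only an illustration: the squared-exponential kernel, the function name `spatial_covariance` and the default `length_scale` are assumptions made here, since the abstract does not state which kernel or parameters GP4Rate uses.

```python
import numpy as np

def spatial_covariance(coords, length_scale=8.0, variance=1.0):
    """Squared-exponential covariance between sites (hypothetical kernel choice).

    coords: (n_sites, 3) array of residue coordinates, e.g. C-alpha atoms in angstroms.
    Returns an (n_sites, n_sites) matrix; nearby sites get covariance close to
    `variance`, distant sites get covariance close to zero.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise squared Euclidean distances via broadcasting.
    diff = coords[:, None, :] - coords[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * sq_dist / length_scale ** 2)
```

    Under such a prior, sites a few angstroms apart are expected to have similar substitution rates, which is why a model of this kind tends to report conserved patches rather than isolated sites.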

  8. Functional and structural analyses of mouse genomic regions screened by the morphological specific-locus test.

    PubMed

    Russell, L B

    1989-05-01

    Genetic analyses of certain classes of mutations recovered in the mouse specific-locus test (SLT) have characterized arrays of deletions, overlapping at the marked loci. Complementation maps, generated for several of the regions, have identified a number of functional units surrounding each marked locus and have ordered the mutations into complementation groups. Molecular entry to all but one of the marked regions has been achieved by (1) identifying proviral integrations in, or close to, the specific loci (d, se, a, c); (2) mapping random anonymous clones from appropriately enriched libraries to the longest deleted segments, then submapping to more limited segments on the basis of complementation and deletion-breakpoint maps (c, p); (3) similarly mapping known clones thought to be located in pertinent chromosomal regions (p, c, d); and (4) cloning specific genes that reside in regions corresponding to the deletions (b, c, p). The molecular analyses have confirmed that genetically-inferred deletions are structural deletions of DNA. The emerging physical maps are concordant with the complementation maps, and in several cases have discriminated among members of a complementation group with respect to breakpoint positions. Deletion-breakpoint-fusion fragments have proved to be highly useful for making large chromosomal jumps to facilitate physical mapping. The recent advances toward correlating physical and functional maps of specific regions of the mouse genome owe much to the existence of arrays of mutations involving loci marked in the SLT. In turn, the characterizations of these regions have made it possible to demonstrate qualitative differences among mutations resulting from different treatments. This new capability for qualitative analysis, which will increase as the molecular studies proceed, further enhances the value of the SLT, which has been extensively used for quantitative studies in germ-cell mutagenesis.

  9. Insight into asphaltene nanoaggregate structure inferred by small angle neutron and X-ray scattering.

    PubMed

    Eyssautier, Joëlle; Levitz, Pierre; Espinat, Didier; Jestin, Jacques; Gummel, Jérémie; Grillo, Isabelle; Barré, Loïc

    2011-06-02

    Complementary neutron and X-ray small angle scattering results give prominent information on the asphaltene nanostructure. Precise SANS and SAXS measurements on a large q-scale were performed on the same dilute asphaltene-toluene solution, and absolute intensity scaling was carried out. Direct comparison of neutron and X-ray spectra enables description of a fractal organization made from the aggregation of small entities of 16 kDa, exhibiting an internal fine structure. Neutron contrast variation experiments enhance the description of this nanoaggregate in terms of core-shell disk organization, giving insight into core and shell dimensions and chemical compositions. The nanoaggregates are best described by a disk of total radius 32 Å with 30% polydispersity and a height of 6.7 Å. Composition and density calculations show that the core is a dense and aromatic structure, contrary to the shell, which is highly aliphatic. These results show good agreement with the general view of the Yen model (Yen, T. F.; et al. Anal. Chem. 1961, 33, 1587-1594) and, as for the modified Yen model (Mullins, O. C. Energy Fuels 2010, 24, 2179-2207), provide characteristic dimensions of the asphaltene nanoaggregate in good solvent.

  10. Inferring the interplay between network structure and market effects in Bitcoin

    NASA Astrophysics Data System (ADS)

    Kondor, Dániel; Csabai, István; Szüle, János; Pósfai, Márton; Vattay, Gábor

    2014-12-01

    A main focus in economics research is understanding the time series of prices of goods and assets. While statistical models using only the properties of the time series itself have been successful in many aspects, we expect to gain a better understanding of the phenomena involved if we can model the underlying system of interacting agents. In this article, we consider the history of Bitcoin, a novel digital currency system, for which the complete list of transactions is available for analysis. Using this dataset, we reconstruct the transaction network between users and analyze changes in the structure of the subgraph induced by the most active users. Our approach is based on the unsupervised identification of important features of the time variation of the network. Applying the widely used method of Principal Component Analysis to the matrix constructed from snapshots of the network at different times, we are able to show how structural changes in the network accompany significant changes in the exchange price of bitcoins.
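    Applying Principal Component Analysis to a matrix built from network snapshots can be sketched as follows. This is a minimal illustration under stated assumptions: each snapshot is taken to be a dense adjacency matrix flattened into one row, and the function name `snapshot_pca` is hypothetical. The abstract does not specify how the authors construct their matrix, so this should not be read as their procedure.

```python
import numpy as np

def snapshot_pca(snapshots, n_components=2):
    """PCA over a time series of network snapshots (illustrative layout).

    snapshots: array of shape (T, N, N), one adjacency matrix per time step.
    Returns (components, scores, explained_ratio):
      components      - (n_components, N*N) edge-weight patterns
      scores          - (T, n_components) time course of each pattern
      explained_ratio - fraction of total variance carried by each pattern
    """
    snapshots = np.asarray(snapshots, dtype=float)
    n_steps = snapshots.shape[0]
    X = snapshots.reshape(n_steps, -1)   # one row per snapshot
    X = X - X.mean(axis=0)               # center each edge's time series
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    var = s ** 2
    total = var.sum()
    explained = var / total if total > 0 else np.zeros_like(var)
    scores = U * s                       # projection of each snapshot
    return Vt[:n_components], scores[:, :n_components], explained[:n_components]
```

    The `scores` columns are the quantities one would compare against an external time series such as the exchange price, since they trace how strongly each structural pattern is expressed at each time step.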

  11. Genetic structure and diversity among maize inbred lines as inferred from DNA microsatellites.

    PubMed Central

    Liu, Kejun; Goodman, Major; Muse, Spencer; Smith, J Stephen; Buckler, Ed; Doebley, John

    2003-01-01

    Two hundred and sixty maize inbred lines, representative of the genetic diversity among essentially all public lines of importance to temperate breeding and many important tropical and subtropical lines, were assayed for polymorphism at 94 microsatellite loci. The 2039 alleles identified served as raw data for estimating genetic structure and diversity. A model-based clustering analysis placed the inbred lines in five clusters that correspond to major breeding groups plus a set of lines showing evidence of mixed origins. A "phylogenetic" tree was constructed to further assess the genetic structure of maize inbreds, showing good agreement with the pedigree information and the cluster analysis. Tropical and subtropical inbreds possess a greater number of alleles and greater gene diversity than their temperate counterparts. The temperate Stiff Stalk lines are on average the most divergent from all other inbred groups. Comparison of diversity in equivalent samples of inbreds and open-pollinated landraces revealed that maize inbreds capture <80% of the alleles in the landraces, suggesting that landraces can provide additional genetic diversity for maize breeding. The contributions of four different segments of the landrace gene pool to each inbred group's gene pool were estimated using a novel likelihood-based model. The estimates are largely consistent with known histories of the inbreds and indicate that tropical highland germplasm is poorly represented in maize inbreds. Core sets of inbreds that capture maximal allelic richness were defined. These or similar core sets can be used for a variety of genetic applications in maize. PMID:14704191

  12. A new chicken genome assembly provides insight into avian genome structure.

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The importance of the Gallus gallus (chicken) as a model organism and agricultural animal merits a continuation of sequence assembly improvement efforts. We present a new version of the chicken genome assembly (Gallus_gallus-5.0; GCA_000002315.3) built from combined long single molecule sequencing t...

  13. Towards inferring elastic structural variations from Earth's response to surface mass loading

    NASA Astrophysics Data System (ADS)

    Martens, H. R.; Simons, M.; Rivera, L. A.; Owen, S. E.

    2015-12-01

    We explore the sensitivity of surface mass loading displacement response to perturbations in elastic structure, with the goal to refine profiles of elastic moduli and density through the crust and upper mantle. Examples of surface mass loads include tidal and non-tidal ocean loads, atmospheric loads and hydrological loads. Using software developed in-house (LoadDef), we derive sensitivity kernels for Love numbers and load Green's functions (LGFs) using calculus of variations and finite difference methods. Perturbations to the two elastic moduli and density exhibit unique LGF sensitivity patterns, retaining the possibility that the material parameters may be independently constrained given a spatially distributed set of sufficiently accurate loading response observations. To further elucidate the ability to invert for structure in a particular region, a thorough investigation into model resolution must also be performed. We garner a more palpable sense for the effects of structural variations on the response to surface mass loading by calculating and comparing sets of predicted ocean tidal loading (OTL) displacement responses across a global network of land-based locations, generated from convolutions of an ocean tide model with LGFs derived from a variety of reference Earth models. We find that discrepancies between predictions for the M2 harmonic differ by less than 0.2 mm at over 95% of the locations considered, a value generally exceeded, albeit not substantially, by current observational and forward modeling errors. Although predicted discrepancies can reach 2 mm or more at some coastal locations, errors in the ocean tide models and convolution algorithms are also largest near the coasts. As a case study, we examine the residuals between Global Positioning System (GPS) observations and modeled predictions of OTL response across the South American continent. A comparison of ocean models suggests that a common mode (mean displacement) accounts for a dominant

  14. From structure prediction to genomic screens for novel non-coding RNAs.

    PubMed

    Gorodkin, Jan; Hofacker, Ivo L

    2011-08-01

    Non-coding RNAs (ncRNAs) are receiving more and more attention not only as an abundant class of genes, but also as regulatory structural elements (some located in mRNAs). A key feature of RNA function is its structure. Computational methods were developed early for folding and prediction of RNA structure with the aim of assisting in functional analysis. With the discovery of more and more ncRNAs, it has become clear that a large fraction of these are highly structured. Interestingly, a large part of the structure is comprised of regular Watson-Crick and GU wobble base pairs. This and the increased amount of available genomes have made it possible to employ structure-based methods for genomic screens. The field has moved from folding prediction of single sequences to computational screens for ncRNAs in genomic sequence using the RNA structure as the main characteristic feature. Whereas early methods focused on energy-directed folding of single sequences, comparative analysis based on structure preserving changes of base pairs has been efficient in improving accuracy, and today this constitutes a key component in genomic screens. Here, we cover the basic principles of RNA folding and touch upon some of the concepts in current methods that have been applied in genomic screens for de novo RNA structures in searches for novel ncRNA genes and regulatory RNA structure on mRNAs. We discuss the strengths and weaknesses of the different strategies and how they can complement each other.

  15. Local chromatin structure of heterochromatin regulates repeated DNA stability, nucleolus structure, and genome integrity

    SciTech Connect

    Peng, Jamy C.

    2007-01-01

    Heterochromatin constitutes a significant portion of the genome in higher eukaryotes; approximately 30% in Drosophila and human. Heterochromatin contains a high repeat DNA content and a low density of protein-encoding genes. In contrast, euchromatin is composed mostly of unique sequences and contains the majority of single-copy genes. Genetic and cytological studies demonstrated that heterochromatin exhibits regulatory roles in chromosome organization, centromere function and telomere protection. As an epigenetically regulated structure, heterochromatin formation is not defined by any DNA sequence consensus. Heterochromatin is characterized by its association with nucleosomes containing methylated-lysine 9 of histone H3 (H3K9me), heterochromatin protein 1 (HP1) that binds H3K9me, and Su(var)3-9, which methylates H3K9 and binds HP1. Heterochromatin formation and functions are influenced by HP1, Su(var)3-9, and the RNA interference (RNAi) pathway. My thesis project investigates how heterochromatin formation and function impact nuclear architecture, repeated DNA organization, and genome stability in Drosophila melanogaster. H3K9me-based chromatin reduces extrachromosomal DNA formation; most likely by restricting the access of repair machineries to repeated DNAs. Reducing extrachromosomal ribosomal DNA stabilizes rDNA repeats and the nucleolus structure. H3K9me-based chromatin also inhibits DNA damage in heterochromatin. Cells with compromised heterochromatin structure, due to Su(var)3-9 or dcr-2 (a component of the RNAi pathway) mutations, display severe DNA damage in heterochromatin compared to wild type. In these mutant cells, accumulated DNA damage leads to chromosomal defects such as translocations, defective DNA repair response, and activation of the G2-M DNA repair and mitotic checkpoints that ensure cellular and animal viability. My thesis research suggests that DNA replication, repair, and recombination mechanisms in heterochromatin differ from those in

  16. Causal Inference in Occupational Epidemiology: Accounting for the Healthy Worker Effect by Using Structural Nested Models

    PubMed Central

    Naimi, Ashley I.; Richardson, David B.; Cole, Stephen R.

    2013-01-01

    In a recent issue of the Journal, Kirkeleit et al. (Am J Epidemiol. 2013;177(11):1218–1224) provided empirical evidence for the potential of the healthy worker effect in a large cohort of Norwegian workers across a range of occupations. In this commentary, we provide some historical context, define the healthy worker effect by using causal diagrams, and use simulated data to illustrate how structural nested models can be used to estimate exposure effects while accounting for the healthy worker survivor effect in 4 simple steps. We provide technical details and annotated SAS software (SAS Institute, Inc., Cary, North Carolina) code corresponding to the example analysis in the Web Appendices, available at http://aje.oxfordjournals.org/. PMID:24077092

  17. Causal inference in occupational epidemiology: accounting for the healthy worker effect by using structural nested models.

    PubMed

    Naimi, Ashley I; Richardson, David B; Cole, Stephen R

    2013-12-15

    In a recent issue of the Journal, Kirkeleit et al. (Am J Epidemiol. 2013;177(11):1218-1224) provided empirical evidence for the potential of the healthy worker effect in a large cohort of Norwegian workers across a range of occupations. In this commentary, we provide some historical context, define the healthy worker effect by using causal diagrams, and use simulated data to illustrate how structural nested models can be used to estimate exposure effects while accounting for the healthy worker survivor effect in 4 simple steps. We provide technical details and annotated SAS software (SAS Institute, Inc., Cary, North Carolina) code corresponding to the example analysis in the Web Appendices, available at http://aje.oxfordjournals.org/.

  18. The solar corona structures as inferred from July 11, 1991 total solar eclipse observation

    NASA Astrophysics Data System (ADS)

    Sykora, Julius; Badalyan, O. G.

    1992-11-01

    During the 11 July 1991 total solar eclipse (La Paz, Mexico), very complete photographic records of the linearly polarized light of the solar corona were obtained in the continuum and in the Fe XIV 530.3 nm spectral line. The degree of polarization differs considerably between these two spectral regions, chiefly because the physical origin of the polarization in the continuum differs from that in emission lines. The distinct variation of the white-light degree of polarization with the type of coronal structure is discussed. For example, a high degree of polarization is characteristic of the coronal streamers, which during this eclipse were located at unusually high solar latitudes. The position of the streamers is certainly reflected in an exceptional position of the heliospheric current sheet and, beyond that, casts serious doubt on the general validity of Ludendorff's definition of the solar corona flattening.

  19. Inferring demographic structure with moccasin size data from the Promontory Caves, Utah.

    PubMed

    Billinger, Michael; Ives, John W

    2015-01-01

    The moccasin assemblage Julian Steward recovered from the Promontory caves in 1930-31 provides a novel example in which material culture can be used to understand the structure of an AD thirteenth century population. Several studies shed light on the relationship between shoe size, foot size, and stature. We develop an anthropometric model for understanding the composition of the Promontory Cave population by using moccasin size as a proxy for foot size. We then predict the stature of the individual who would have worn a moccasin. Stature is closely related to age for children, subadults and adult males. Although there are predictable sex and age factors biasing moccasin discard practices, moccasin dimensions suggest a relatively large proportion of children and subadults occupied the Promontory caves. This bison and antelope hunting population appears to have thrived during its stay on Promontory Point.

  20. Genetic structure of the Korean black scraper Thamnaconus modestus inferred from microsatellite marker analysis.

    PubMed

    An, Hye Suck; Lee, Jang Wook; Park, Jung Yeon; Jung, Hyung Taek

    2013-05-01

    The Korean black scraper, Thamnaconus modestus, is one of the most economically important maricultural fish species in Korea. However, the annual catch of this fish has been continuously declining over the past several decades. In this study, the genetic diversity and relationships among four wild populations and two hatchery stocks of Korean black scraper were assessed based on 16 microsatellite (MS) markers. A total of 319 different alleles were detected over all loci with an average of 19.94 alleles per locus. The hatchery stocks [mean number of alleles (N(A)) = 12, allelic richness (A(R)) = 12, expected heterozygosity (He) = 0.834] showed a slight reduction (P > 0.05) in genetic variability in comparison with wild populations (mean N(A) = 13.86, A(R) = 12.35, He = 0.844), suggesting a sufficient level of genetic variation in the hatchery populations. Similarly low levels of inbreeding and significant Hardy-Weinberg equilibrium deviations were detected in both wild and hatchery populations. The genetic subdivision among all six populations was low but significant (overall F(ST) = 0.008, P < 0.01). Pairwise F(ST), a phylogenetic tree, and multidimensional scaling analysis suggested the existence of three geographically structured populations based on different sea basin origins, although the isolation-by-distance model was rejected. This result was corroborated by an analysis of molecular variance. This genetic differentiation may result from the co-effects of various factors, such as historical dispersal, local environment and ocean currents. These three geographical groups can be considered as independent management units. Our results show that MS markers may be suitable not only for the genetic monitoring of hatchery stocks but also for revealing the population structure of Korean black scraper populations. These results will provide critical information for breeding programs, the management of cultured stocks and the conservation of this species.
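    The two summary statistics quoted in this abstract, expected heterozygosity (He) and F(ST), can be illustrated with their textbook definitions. The sketch below uses Nei's heterozygosity-based form of F(ST) for a single locus with equally weighted populations. The function names are hypothetical, and the abstract does not say which estimator the authors used, so the numbers it produces are not comparable to the published values.

```python
def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2) for the allele frequencies at one locus."""
    return 1.0 - sum(p * p for p in freqs)

def fst(pop_freqs):
    """Nei-style F_ST = (H_T - H_S) / H_T for one locus.

    pop_freqs: list of allele-frequency lists, one per population,
    all in the same allele order. Populations are weighted equally
    (an assumption made for simplicity).
    """
    n_pops = len(pop_freqs)
    n_alleles = len(pop_freqs[0])
    # H_S: mean within-population expected heterozygosity.
    h_s = sum(expected_heterozygosity(f) for f in pop_freqs) / n_pops
    # H_T: expected heterozygosity of the pooled (mean) allele frequencies.
    mean_freqs = [sum(f[a] for f in pop_freqs) / n_pops for a in range(n_alleles)]
    h_t = expected_heterozygosity(mean_freqs)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t
```

    Identical allele frequencies in every population give a value of 0, and populations fixed for different alleles give 1, so an overall value such as 0.008 indicates weak subdivision.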

  1. [Genomic structure and sex determination in squamate reptiles].

    PubMed

    Kichigin, I G; Trifonov, V A

    2013-01-01

    Squamata, the largest reptilian order, includes snakes and lizards and occupies a key position in the phylogeny of amniotes. The variety of sex determination modes in lizards is one of the most interesting aspects of the biology of this order. These mechanisms are genomic sex determination (both XY and ZW systems) and temperature-dependent sex determination. Studies of squamate sex chromosomes are pivotal for understanding the evolution of other vertebrate sex chromosomes. Unfortunately, this clade has long been neglected by molecular geneticists. In this paper, we describe recent data on the molecular cytogenetics and genomics of squamates, the evolution of their sex chromosomes and their sex determination mechanisms.

  2. The discrepancies in the results of bioinformatics tools for genomic structural annotation

    NASA Astrophysics Data System (ADS)

    Pawełkowicz, Magdalena; Nowak, Robert; Osipowski, Paweł; Rymuszka, Jacek; Świerkula, Katarzyna; Wojcieszek, Michał; Przybecki, Zbigniew

    2014-11-01

    A major focus of sequencing projects is to identify genes in genomes. However, it is necessary to define the variety of genes and the criteria for identifying them. In this work we present discrepancies and dependencies arising from the application of different bioinformatic programs for structural annotation to the cucumber data set from the Polish Consortium of Cucumber Genome Sequencing. We used Fgenesh, GenScan and GeneMark for automated structural annotation, and the results were compared with the reference annotation.

  3. Genetic Structure and Inferences on Potential Source Areas for Bactrocera dorsalis (Hendel) Based on Mitochondrial and Microsatellite Markers

    PubMed Central

    Shi, Wei; Kerdelhué, Carole; Ye, Hui

    2012-01-01

    Bactrocera dorsalis (Diptera: Tephritidae) is mainly distributed in tropical and subtropical Asia and in the Pacific region. Despite its economic importance, very few studies have addressed the question of the wide genetic structure and potential source area of this species. This pilot study attempts to infer the native region of this pest and its colonization pathways in Asia. Combining mitochondrial and microsatellite markers, we evaluated the level of genetic diversity, genetic structure, and the gene flow among fly populations collected across Southeast Asia and China. A complex and significant genetic structure corresponding to the geographic pattern was found with both types of molecular markers. However, the genetic structure found was rather weak in both cases, and no pattern of isolation by distance was identified. Multiple long-distance dispersal events and miscellaneous host selection by this species may explain the results. These complex patterns may have been influenced by human-mediated transportation of the pest from one area to another and the complex topography of the study region. For both mitochondrial and microsatellite data, no signs of bottleneck or founder events could be identified. Nonetheless, maximal genetic diversity was observed in Myanmar, Vietnam and Guangdong (China) and asymmetric migration patterns were found. These results provide indirect evidence that the tropical regions of Southeast Asia and southern coast of China may be considered as the native range of the species and the population expansion is northward. Yunnan (China) is a contact zone that has been colonized from different sources. Regions along the southern coast of Vietnam and China probably served to colonize mainly the southern region of China. Southern coastal regions of China may also have colonized central parts of China and of central Yunnan. PMID:22615898

  4. Computational inference of the structure and regulation of the lignin pathway in Panicum virgatum

    SciTech Connect

    Faraji, Mojdeh; Fonseca, Luis L.; Escamilla-Treviño, Luis; Dixon, Richard A.; Voit, Eberhard O.

    2015-09-17

    Switchgrass is a prime target for biofuel production from inedible plant parts and has been the subject of numerous investigations in recent years. Yet, one of the main obstacles to effective biofuel production remains the major problem of recalcitrance. Recalcitrance emerges in part from the 3-D structure of lignin as a polymer in the secondary cell wall. Lignin limits accessibility of the sugars in the cellulose and hemicellulose polymers to enzymes and ultimately decreases ethanol yield. Monolignols, the building blocks of lignin polymers, are synthesized in the cytosol and translocated to the plant cell wall, where they undergo polymerization. The biosynthetic pathway leading to monolignols in switchgrass is not completely known, and difficulties associated with in vivo measurements of these intermediates pose a challenge for a true understanding of the functioning of the pathway. In this study, a systems biological modeling approach is used to address this challenge and to elucidate the structure and regulation of the lignin pathway through a computational characterization of alternate candidate topologies. The analysis is based on experimental data characterizing stem and tiller tissue of four transgenic lines (knock-downs of genes coding for key enzymes in the pathway) as well as wild-type switchgrass plants. These data consist of the observed content and composition of monolignols. The possibility of a G-lignin specific metabolic channel associated with the production and degradation of coniferaldehyde is examined, and the results support previous findings from another plant species. The computational analysis suggests regulatory mechanisms of product inhibition and enzyme competition, which are well known in biochemistry, but so far had not been reported in switchgrass. By including these mechanisms, the pathway model is able to represent all observations. In conclusion, the results show that the presence of the coniferaldehyde channel is necessary

  5. Computational inference of the structure and regulation of the lignin pathway in Panicum virgatum

    DOE PAGES

    Faraji, Mojdeh; Fonseca, Luis L.; Escamilla-Treviño, Luis; ...

    2015-09-17

    Switchgrass is a prime target for biofuel production from inedible plant parts and has been the subject of numerous investigations in recent years. Yet, one of the main obstacles to effective biofuel production remains the major problem of recalcitrance. Recalcitrance emerges in part from the 3-D structure of lignin as a polymer in the secondary cell wall. Lignin limits accessibility of the sugars in the cellulose and hemicellulose polymers to enzymes and ultimately decreases ethanol yield. Monolignols, the building blocks of lignin polymers, are synthesized in the cytosol and translocated to the plant cell wall, where they undergo polymerization. The biosynthetic pathway leading to monolignols in switchgrass is not completely known, and difficulties associated with in vivo measurements of these intermediates pose a challenge for a true understanding of the functioning of the pathway. In this study, a systems biological modeling approach is used to address this challenge and to elucidate the structure and regulation of the lignin pathway through a computational characterization of alternate candidate topologies. The analysis is based on experimental data characterizing stem and tiller tissue of four transgenic lines (knock-downs of genes coding for key enzymes in the pathway) as well as wild-type switchgrass plants. These data consist of the observed content and composition of monolignols. The possibility of a G-lignin specific metabolic channel associated with the production and degradation of coniferaldehyde is examined, and the results support previous findings from another plant species. The computational analysis suggests regulatory mechanisms of product inhibition and enzyme competition, which are well known in biochemistry, but so far had not been reported in switchgrass. By including these mechanisms, the pathway model is able to represent all observations. In conclusion, the results show that the presence of the coniferaldehyde channel is

  6. Lithosphere structure underneath the North China Craton inferred from elevation, gravity and geoid anomalies

    NASA Astrophysics Data System (ADS)

    Wang, K.

    2015-12-01

    The North China Craton (NCC) is a classical example of ancient destroyed cratons. The NCC experienced widespread thermotectonic reactivations in the Phanerozoic. Recent work suggested that the old craton has been significantly modified or destroyed during this process. However, most studies were confined to the Eastern NCC; the nature and evolution of the lithosphere beneath the Central and Western NCC were less constrained owing to the lack of data. Recent geodetic data, with the advantages of high resolution and coverage, offer an opportunity to study the deep structure underneath the whole NCC. Here we construct a lithospheric-scale 3D model based on the integration of regional elevation, gravity, geoid and thermal data together with available seismic data. The combined interpretation of these data provides information on the density and temperature distribution at different depth ranges. In the Eastern NCC, a rapid thickness decrease of both crust and lithosphere is reflected, concordant with abrupt changes in surface topography and Bouguer gravity anomaly. Our results together with the widespread magmatic rocks suggest that the Eastern NCC has experienced significant destruction of the lithospheric mantle with substantial modifications and thinning of the crust. In the Central and Western NCC, the generally thick and 'cold' lithosphere suggests that the cratonic mantle root is preserved, in agreement with the relatively low heat flow, rare magmatic activity and long-term tectonic stability observed at the surface, with some areas mildly modified as indicated by thin lithosphere.

  7. The structure of the Kohistan-Arc terrane in northern Pakistan as inferred from gravity data

    NASA Astrophysics Data System (ADS)

    Malinconico, Lawrence L.

    1986-04-01

    Modelling of gravity data taken across the Kohistan Island-Arc terrane in northern Pakistan can be used to constrain the shape and thickness of the Arc. Over 600 new gravity measurements were made across the Kohistan Island-Arc terrane in northern Pakistan. These data were taken along traverses normal to the structures bounding the Arc and were reduced to terrain-corrected Bouguer values. The reduced data were then modelled using standard two-dimensional modelling techniques. The southern margin of the Arc, the Main Mantle Thrust (MMT), dips to the north at approximately 45° and gradually flattens out at a depth of 7-9 km. The northern margin of the Arc, the Main Karakoram Thrust (MKT), also dips towards the north, but at a shallower initial angle (15°). From the models, the Arc terrane now appears to be around 7-9 km thick with the thicker sections occurring closer to the southern margin. The proposed model, in particular the angle of the MMT and the MKT, may have been significantly affected by the recent and rapid uplift that is occurring along the Nanga Parbat-Haramosh Massif.

  8. Inferring the population structure and demographic history of the tick, Amblyomma americanum Linnaeus.

    PubMed

    Mixson, Tonya R; Lydy, Shari L; Dasch, Gregory A; Real, Leslie A

    2006-06-01

    A hierarchical population genetic study was conducted on 703 individual Amblyomma americanum from nine populations in Georgia, U.S.A. Populations were sampled from the Coastal Plain, midland Piedmont region, and the upper Piedmont region. Twenty-nine distinct haplotypes were found. A minimum spanning tree was constructed that indicated these haplotypes comprised two lineages, the root of which was distinctly star-like. The majority of the variation found was among ticks within each population, indicating high amounts of gene flow and little genetic differentiation between the three regions. An overall F(ST) value of 0.006 supported the lack of genetic structuring between collection sites in Georgia. Mantel regression analysis revealed no isolation by distance. Signatures of population expansion were detected in the shapes of the mismatch distribution and tests of neutrality. The absence of genetic differentiation combined with the rejection of the null model of isolation by distance may indicate recent range expansion in Georgia or insufficient time to reach an equilibrium where genetic drift may have affected allele frequencies. Alternatively, the high degree of panmixia found within A. americanum in Georgia may be due to bird-mediated dispersal of ticks increasing the genetic similarity between geographically separated populations.

  9. Phylogeographic Structure in Anastrepha ludens (Diptera: Tephritidae) Populations Inferred With mtDNA Sequencing.

    PubMed

    Ruiz-Arce, Raul; Owen, Christopher L; Thomas, Donald B; Barr, Norman B; McPheron, Bruce A

    2015-06-01

    Anastrepha ludens (Loew) (Diptera: Tephritidae), the Mexican fruit fly, is a major pest of citrus and mango. It has a wide distribution in Mexico and Central America, with infestations occurring in Texas, California, and Florida with origins believed to have been centered in northeastern Mexico. This research evaluates the utility of a sequence-based approach for two mitochondrial (COI and ND6) gene regions. We use these markers to examine genetic diversity, estimate population structure, and identify diagnostic information for A. ludens populations. We analyzed 543 individuals from 67 geographic collections and found one predominant haplotype occurring in the majority of specimens. We observed 68 haplotypes in all and see differences among haplotypes belonging to northern and southern collections. Mexico haplotypes differ by few bases possibly as a result of a recent bottleneck event. In contrast to the hypothesis suggesting northeastern Mexico as the origin of this species, we see that specimens from two southern collections show high genetic variability delineating three mitochondrial groups. These data suggest that Central America is the origin for A. ludens. We show that COI and ND6 are useful for phylogeographic studies of A. ludens.

  10. Inferences on pathogenic fungus population structures from microsatellite data: new insights from spatial genetics approaches.

    PubMed

    Rieux, A; Halkett, F; de Lapeyre de Bellaire, L; Zapater, M-F; Rousset, F; Ravigne, V; Carlier, J

    2011-04-01

    Landscape genetics, which combines population genetics, landscape ecology and spatial statistics, has emerged recently as a new discipline that can be used to assess how landscape features or environmental variables can influence gene flow and spatial genetic variation. We applied this approach to the invasive plant pathogenic fungus Mycosphaerella fijiensis, which causes black leaf streak disease of banana. Around 880 isolates were sampled within a 50 × 50 km area located in a fragmented banana production zone in Cameroon that includes several potential physical barriers to gene flow. Two clustering algorithms and a new F(ST)-based procedure were applied to define the number of genetic entities and their spatial domain without a priori assumptions. Two populations were clearly delineated, and the genetic discontinuity appeared sharp but asymmetric. Interestingly, no landscape features matched this genetic discontinuity, and no isolation by distance (IBD) was found within populations. Our results suggest that the genetic structure observed in this production area reflects the recent history of M. fijiensis expansion in Cameroon rather than resulting from contemporary gene flow. Finally, we discuss the influence of the suspected high effective population size for such an organism on (i) the absence of an IBD signal, (ii) the characterization of contemporary gene-flow events through assignation methods of analysis and (iii) the evolution of the genetic discontinuity detected in this study.

  11. Population genetic structure of sexual and parthenogenetic damselflies inferred from mitochondrial and nuclear markers

    PubMed Central

    Lorenzo-Carballa, M O; Hadrys, H; Cordero-Rivera, A; Andrés, J A

    2012-01-01

    It has been postulated that obligate asexual lineages may persist in the long term if they escape from negative interactions with either sexual lineages or biological enemies; and thus, parthenogenetic populations will be more likely to occur in places that are difficult for sexuals to colonize, or those in which biological interactions are rare, such as islands or island-like habitats. Ischnura hastata is the only known example of natural parthenogenesis within the insect order Odonata, and it also represents a typical example of geographic parthenogenesis, as sexual populations are widely distributed in North America, whereas parthenogenetic populations of this species have only been found in the Azores archipelago. In order to gain insight into the origin and distribution of parthenogenetic I. hastata lineages, we have used microsatellites and mitochondrial and nuclear DNA sequence data to examine the population genetic structure of this species over a wide geographic area. Our results suggest that sexual populations of I. hastata in North America conform to a large subdivided population that has gone through a recent spatial expansion. A recent single long distance dispersal event, followed by a demographic expansion, is the most parsimonious hypothesis explaining the origin of the parthenogenetic population of this species in the Azores islands. PMID:21915148

  12. Comparative Chloroplast Genome Analyses of Streptophyte Green Algae Uncover Major Structural Alterations in the Klebsormidiophyceae, Coleochaetophyceae and Zygnematophyceae

    PubMed Central

    Lemieux, Claude; Otis, Christian; Turmel, Monique

    2016-01-01

    The Streptophyta comprises all land plants and six main lineages of freshwater green algae: Mesostigmatophyceae, Chlorokybophyceae, Klebsormidiophyceae, Charophyceae, Coleochaetophyceae and Zygnematophyceae. Previous comparisons of the chloroplast genome from nine streptophyte algae (including four zygnematophyceans) revealed that, although land plant chloroplast DNAs (cpDNAs) inherited most of their highly conserved structural features from green algal ancestors, considerable cpDNA changes took place during the evolution of the Zygnematophyceae, the sister group of land plants. To gain deeper insights into the evolutionary dynamics of the chloroplast genome in streptophyte algae, we sequenced the cpDNAs of nine additional taxa: two klebsormidiophyceans (Entransia fimbriata and Klebsormidium sp. SAG 51.86), one coleochaetophycean (Coleochaete scutata) and six zygnematophyceans (Cylindrocystis brebissonii, Netrium digitus, Roya obtusa, Spirogyra maxima, Cosmarium botrytis and Closterium baillyanum). Our comparative analyses of these genomes with their streptophyte algal counterparts indicate that the large inverted repeat (IR) encoding the rDNA operon experienced loss or expansion/contraction in all three sampled classes and that genes were extensively shuffled in both the Klebsormidiophyceae and Zygnematophyceae. The klebsormidiophycean genomes boast greatly expanded IRs, with the Entransia 60,590-bp IR being the largest known among green algae. The 206,025-bp Entransia cpDNA, which is one of the largest genomes among streptophytes, encodes 118 standard genes, i.e., four additional genes compared to its Klebsormidium flaccidum homolog. We inferred that seven of the 21 group II introns usually found in land plants were already present in the common ancestor of the Klebsormidiophyceae and its sister lineages. At 107,236 bp and with 117 standard genes, the Coleochaete IR-less genome is both the smallest and most compact among the streptophyte algal cpDNAs analyzed thus

  13. Comparative Chloroplast Genome Analyses of Streptophyte Green Algae Uncover Major Structural Alterations in the Klebsormidiophyceae, Coleochaetophyceae and Zygnematophyceae.

    PubMed

    Lemieux, Claude; Otis, Christian; Turmel, Monique

    2016-01-01

    The Streptophyta comprises all land plants and six main lineages of freshwater green algae: Mesostigmatophyceae, Chlorokybophyceae, Klebsormidiophyceae, Charophyceae, Coleochaetophyceae and Zygnematophyceae. Previous comparisons of the chloroplast genome from nine streptophyte algae (including four zygnematophyceans) revealed that, although land plant chloroplast DNAs (cpDNAs) inherited most of their highly conserved structural features from green algal ancestors, considerable cpDNA changes took place during the evolution of the Zygnematophyceae, the sister group of land plants. To gain deeper insights into the evolutionary dynamics of the chloroplast genome in streptophyte algae, we sequenced the cpDNAs of nine additional taxa: two klebsormidiophyceans (Entransia fimbriata and Klebsormidium sp. SAG 51.86), one coleochaetophycean (Coleochaete scutata) and six zygnematophyceans (Cylindrocystis brebissonii, Netrium digitus, Roya obtusa, Spirogyra maxima, Cosmarium botrytis and Closterium baillyanum). Our comparative analyses of these genomes with their streptophyte algal counterparts indicate that the large inverted repeat (IR) encoding the rDNA operon experienced loss or expansion/contraction in all three sampled classes and that genes were extensively shuffled in both the Klebsormidiophyceae and Zygnematophyceae. The klebsormidiophycean genomes boast greatly expanded IRs, with the Entransia 60,590-bp IR being the largest known among green algae. The 206,025-bp Entransia cpDNA, which is one of the largest genomes among streptophytes, encodes 118 standard genes, i.e., four additional genes compared to its Klebsormidium flaccidum homolog. We inferred that seven of the 21 group II introns usually found in land plants were already present in the common ancestor of the Klebsormidiophyceae and its sister lineages. At 107,236 bp and with 117 standard genes, the Coleochaete IR-less genome is both the smallest and most compact among the streptophyte algal cpDNAs analyzed thus

  14. Perceptual inference.

    PubMed

    Aggelopoulos, Nikolaos C

    2015-08-01

    Perceptual inference refers to the ability to infer sensory stimuli from predictions that result from internal neural representations built through prior experience. Methods of Bayesian statistical inference and decision theory model cognition adequately by using error sensing either in guiding action or in "generative" models that predict the sensory information. In this framework, perception can be seen as a process qualitatively distinct from sensation, a process of information evaluation using previously acquired and stored representations (memories) that is guided by sensory feedback. The stored representations can be utilised as internal models of sensory stimuli enabling long term associations, for example in operant conditioning. Evidence for perceptual inference is contributed by such phenomena as the cortical co-localisation of object perception with object memory, the response invariance in the responses of some neurons to variations in the stimulus, as well as from situations in which perception can be dissociated from sensation. In the context of perceptual inference, sensory areas of the cerebral cortex that have been facilitated by a priming signal may be regarded as comparators in a closed feedback loop, similar to the better known motor reflexes in the sensorimotor system. The adult cerebral cortex can be regarded as similar to a servomechanism, in using sensory feedback to correct internal models, producing predictions of the outside world on the basis of past experience.

  15. Inferring proximity to the reconnection site via structural changes to the magnetopause caused by asymmetric reconnection.

    NASA Astrophysics Data System (ADS)

    Argall, M. R.; Chen, L. J.; Torbert, R. B.; Daughton, W. S.; Yoo, J.; Yamada, M.

    2014-12-01

    The mechanisms of field line breaking and magnetic energy dissipation that result in magnetic reconnection have yet to be determined by spacecraft observations. Many parameters have been proposed to locate the reconnection site, but they either fail to identify uniquely the reconnection site or have not been tested for asymmetric reconnection. We demonstrate that the change in magnetopause structure caused by reconnection can be used to locate and estimate proximity to the site of reconnection. Cluster observations of quiet magnetopause crossings, for which no evidence of reconnection is found, show no obvious spatial dependence of the DC electric field, while the plasma density and velocity make the transition from magnetosheath to magnetosphere values simultaneously with the tangential magnetic field (BL) reversal. Conversely, in-situ observations of several active crossings, for which signs of reconnection are evident, show that the density transition and BL reversal can occur simultaneously or be offset from one another by over 100 ion skin depths (λi) (assuming a constant magnetopause velocity), the outflow jet can occur anywhere from the BL reversal to several λi earthward of the density gradient, and the DC electric field changes sign on either side of the density gradient. Laboratory experiments and 2D and 3D particle-in-cell simulations of asymmetric reconnection reveal that the relative transition offsets are due to exhaust crossings at different proximities to the X-line. Only within the thin electron current layer surrounding the X-line do the transitions remain concurrent. We present one reconnection event during which the transitions in plasma density, DC electric field, and BL are simultaneous in two of the four Cluster spacecraft and offset in the other two spacecraft. The multiple satellite encounter allows us to examine spatial features in the region surrounding the X-line.

  16. Crustal and uppermost mantle structures of Atlas Mountains of Morocco inferred from electromagnetic imaging

    NASA Astrophysics Data System (ADS)

    Kiyan, D.; Jones, A. G.; Fullea, J.; Ledo, J.; Siniscalchi, A.; Romano, G.

    2012-12-01

    The second phase of the PICASSO (Program to Investigate Convective Alboran Sea System Overturn) project and the concomitant TopoMed (Plate re-organization in the western Mediterranean: Lithospheric causes and topographic consequences - an ESF EUROCORES TOPO-EUROPE Collaborative Research Project) is designed to determine the internal structure of the crust and lithosphere of the Atlas Mountains of Morocco. A multi-institutional magnetotelluric (MT) experiment across the Atlas Mountains region comprises the acquisition of broadband and long period MT data along two profiles: a N-S oriented profile through Middle Atlas to the east and a NE-SW profile through Marrakech to the west. The preliminary results of interpretation of the MT data collected over the first profile were presented in the paper by Ledo et al. (2011). In this study, we present the results from 3D MT inversion using the codes WSINV3DMT (Siripunvaraporn et al., 2005) and Modular system for Electromagnetic Inversion (ModEM; Egbert and Kelbert, 2012). There is generally good agreement between the main features obtained from the 2D models and the new results of the 3D modelling. Models inverting for only off-diagonal tensor components showed a distinct conductivity contrast between Middle-High Atlas and Anti Atlas that correlates with the South Atlas Front fault, the depth extent of which appears to be limited to the uppermost mantle (approximately 55 km). The resistivity of the lithosphere gradually increases towards Anti Atlas. In addition, a prominent conducting anomaly at the lower crust/uppermost mantle is imaged west of the profile at the junction between the High and Middle Atlas (Moulouya plain). The conductive body, which extends from the southern boundary of Middle Atlas to the northern boundary of High Atlas, is interpreted as due to the presence of partial melt and/or migrated fluids.

  17. Deep structure of the Argentine margin inferred from 3D gravity and temperature modelling, Colorado Basin

    NASA Astrophysics Data System (ADS)

    Autin, J.; Scheck-Wenderoth, M.; Götze, H.-J.; Reichert, C.; Marchal, D.

    2016-04-01

    Following previous work on the Colorado Basin using a 3D crustal structural model, we now investigate the presence of lower crustal bodies at the base of the crust using 3D lithospheric gravity modelling and calculations of the conductive thermal field. Our first study highlighted two fault directions and depocentres associated with thinned crust (NW-SE in the West and NE-SW at the distal margin). Fault relative chronology argues for two periods of extension: (1) NW-SE faulting and thinning in the western Colorado Basin and (2) NE-SW faulting and thinning related to the continental breakup and formation of the NE-SW-striking volcanic margins of the Atlantic Ocean. In this study, the geometry of modelled high-density Lower Crustal Bodies (LCBs) enables the reproduction of the gravimetric field as well as of the temperature measured in wells down to 4500 m. The modelled LCBs correlate with geological observations: (1) NW-SE LCBs below the deepest depocentres in the West, (2) NE-SW LCBs below the distal margin faults and the seaward dipping reflectors. Thus the proposed poly-phased evolution of the margin could as well correspond to two emplacement phases of the LCBs. The calculated conductive thermal field fits the measured temperatures best if the thermal properties (thermal conductivity and radiogenic heat production) assigned to the LCBs correspond to either high-grade metamorphic rocks or to mafic magmatic intrusions. To explain the possible lithology of the LCBs, we propose that the two successive phases of extension are accompanied by magma supply, emplaced (1) in the thinnest crust below the older NW-SE depocentres, then (2) along the NE-SW continentward boundary of the distal margin and below the volcanic seaward dipping reflectors. The South African conjugate margin records only the second NE-SW event and we discuss hypotheses which could explain these differences between the conjugate margins.

  18. Geologic structure of the Yucaipa area inferred from gravity data, San Bernardino and Riverside Counties, California

    USGS Publications Warehouse

    Mendez, Gregory O.; Langenheim, V.E.; Morita, Andrew; Danskin, Wesley R.

    2016-09-30

    In the spring of 2009, the U.S. Geological Survey, in cooperation with the San Bernardino Valley Municipal Water District, began working on a gravity survey in the Yucaipa area to explore the three-dimensional shape of the sedimentary fill (alluvial deposits) and the surface of the underlying crystalline basement rocks. As water use has increased in pace with rapid urbanization, water managers have need for better information about the subsurface geometry and the boundaries of groundwater subbasins in the Yucaipa area. The large density contrast between alluvial deposits and the crystalline basement complex permits using modeling of gravity data to estimate the thickness of alluvial deposits. The bottom of the alluvial deposits is considered to be the top of crystalline basement rocks. The gravity data, integrated with geologic information from surface outcrops and 51 subsurface borings (15 of which penetrated basement rock), indicated a complex basin configuration where steep slopes coincide with mapped faults (such as the Crafton Hills Fault and the eastern section of the Banning Fault) and concealed ridges separate hydrologically defined subbasins. Gravity measurements and well logs were the primary data sets used to define the thickness and structure of the groundwater basin. Gravity measurements were collected at 256 new locations along profiles that totaled approximately 104.6 km (65 mi) in length; these data supplemented previously collected gravity measurements. Gravity data were reduced to isostatic anomalies and separated into an anomaly field representing the valley fill. The ‘valley-fill-deposits gravity anomaly’ was converted to thickness by using an assumed, depth-varying density contrast between the alluvial deposits and the underlying bedrock. To help visualize the basin geometry, an animation of the elevation of the top of the basement-rocks was prepared. The animation “flies over” the Yucaipa groundwater basin, viewing the land surface

  19. Head structures of Priacma serrata Leconte (Coleptera, Archostemata) inferred</