Science.gov

Sample records for inferring genomic structural

  1. Structure-based inference of molecular functions of proteins of unknown function from Berkeley Structural Genomics Center

    SciTech Connect

    Kim, Sung-Hou; Shin, Dong Hae; Hou, Jingtong; Chandonia, John-Marc; Das, Debanu; Choi, In-Geol; Kim, Rosalind; Kim, Sung-Hou

    2007-09-02

    Advances in sequence genomics have resulted in an accumulation of a huge number of protein sequences derived from genome sequences. However, the functions of a large portion of them cannot be inferred based on the current methods of sequence homology detection to proteins of known functions. Three-dimensional structure can have an important impact in providing inference of molecular function (physical and chemical function) of a protein of unknown function. Structural genomics centers worldwide have been determining many 3-D structures of the proteins of unknown functions, and possible molecular functions of them have been inferred based on their structures. Combined with bioinformatics and enzymatic assay tools, the successful acceleration of the process of protein structure determination through high throughput pipelines enables the rapid functional annotation of a large fraction of hypothetical proteins. We present a brief summary of the process we used at the Berkeley Structural Genomics Center to infer molecular functions of proteins of unknown function.

  2. Joint modeling of RNase footprint sequencing profiles for genome-wide inference of RNA structure.

    PubMed

    Zou, Chenchen; Ouyang, Zhengqing

    2015-10-30

    Recent studies have revealed significant roles of RNA structure in almost every step of RNA processing, including transcription, splicing, transport and translation. RNase footprint sequencing (RNase-seq) has emerged to dissect RNA structures at the genome scale. However, it remains challenging to analyze RNase-seq data because of the issues of signal sparsity, variability and correlations among various RNases. We present a probabilistic framework, joint Poisson-gamma mixture (JPGM), for integrative modeling of multiple RNase-seq profiles. Combining JPGM with hidden Markov model allows genome-wide inference of RNA structures. We apply the joint modeling approach for inferring base pairing states on simulated data sets and RNase-seq profiles of the double-strand specific RNase V1 and single-strand specific RNase S1 in yeast. We demonstrate that joint analysis of V1 and S1 profiles outputs interpretable RNA structure states, while approaches that analyze each profile separately do not. The joint modeling approach predicts the structure states of all nucleotides in 3196 transcripts of yeast without compromising accuracy, while the simple thresholding approach misses 43% of the nucleotides. Furthermore, the posterior probabilities outputted by our model are able to resolve the structural ambiguity of ≈300 000 nucleotides with overlapping V1 and S1 cleavage sites. Our model also generates RNA accessibilities, which are associated with three-dimensional conformations. PMID:26400167
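    As an illustration of how two RNase profiles can feed a hidden Markov model, the sketch below computes posterior paired/unpaired probabilities from V1 and S1 cleavage counts. It is a minimal sketch, not JPGM itself. It assumes that each count is negative binomial (the marginal of a Poisson whose rate is gamma-distributed) and that V1 and S1 counts are independent given the state. The means, the dispersion `r` and the self-transition probability `stay` are hypothetical fixed values, not fitted parameters.

```python
import math
import numpy as np

def nb_logpmf(k, r, mu):
    """Log-pmf of a negative binomial with dispersion r and mean mu.

    This is the marginal of k ~ Poisson(lam) with lam ~ Gamma(shape=r, scale=mu/r).
    """
    p = mu / (r + mu)
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(1 - p) + k * math.log(p))

def posterior_states(v1, s1, means_v1, means_s1, r=2.0, stay=0.9):
    """Posterior P(state | all counts) for a 2-state HMM (0 = paired, 1 = unpaired)."""
    n = len(v1)
    log_e = np.array([[nb_logpmf(v1[t], r, means_v1[s]) + nb_logpmf(s1[t], r, means_s1[s])
                       for s in range(2)] for t in range(n)])
    # Rescaling each row by a constant does not change the posterior.
    e = np.exp(log_e - log_e.max(axis=1, keepdims=True))
    a = np.array([[stay, 1 - stay], [1 - stay, stay]])
    alpha = np.zeros((n, 2))
    beta = np.ones((n, 2))
    alpha[0] = 0.5 * e[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, n):                      # scaled forward pass
        alpha[t] = (alpha[t - 1] @ a) * e[t]
        alpha[t] /= alpha[t].sum()
    for t in range(n - 2, -1, -1):             # scaled backward pass
        beta[t] = a @ (e[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)

v1 = [10, 12, 9, 0, 1, 0]   # hypothetical double-strand (V1) cleavage counts
s1 = [0, 1, 0, 11, 9, 10]   # hypothetical single-strand (S1) cleavage counts
post = posterior_states(v1, s1, means_v1=(10.0, 0.5), means_s1=(0.5, 10.0))
print(post.round(3))
```

    Combining both counts in one emission lets a site with low V1 and high S1 signal be called unpaired even where either profile alone would be ambiguous.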

  3. Structural features of conopeptide genes inferred from partial sequences of the Conus tribblei genome.

    PubMed

    Barghi, Neda; Concepcion, Gisela P; Olivera, Baldomero M; Lluisma, Arturo O

    2016-02-01

    The evolvability of venom components (in particular, the gene-encoded peptide toxins) in venomous species serves as an adaptive strategy allowing them to target new prey types or respond to changes in the prey field. The structure, organization, and expression of the venom peptide genes may provide insights into the molecular mechanisms that drive the evolution of such genes. Conus is a particularly interesting group given the high chemical diversity of their venom peptides, and the rapid evolution of the conopeptide-encoding genes. Conus genomes, however, are large and characterized by a high proportion of repetitive sequences. As a result, the structure and organization of conopeptide genes have remained poorly known. In this study, a survey of the genome of Conus tribblei was undertaken to address this gap. A partial assembly of C. tribblei genome was generated; the assembly, though consisting of a large number of fragments, accounted for 2160.5 Mb of sequence. A large number of repetitive genomic elements consisting of 642.6 Mb of retrotransposable elements, simple repeats, and novel interspersed repeats were observed. We characterized the structural organization and distribution of conotoxin genes in the genome. A significant number of conopeptide genes (estimated to be between 148 and 193) belonging to different superfamilies with complete or nearly complete exon regions were observed, ~60 % of which were expressed. The unexpressed conopeptide genes represent hidden but significant conotoxin diversity. The conotoxin genes also differed in the frequency and length of the introns. The interruption of exons by long introns in the conopeptide genes and the presence of repeats in the introns may indicate the importance of introns in facilitating recombination, evolution and diversification of conotoxins. These findings advance our understanding of the structural framework that promotes the gene-level molecular evolution of venom peptides. PMID:26423067

  4. Infer Metagenomic Abundance and Reveal Homologous Genomes Based on the Structure of Taxonomy Tree.

    PubMed

    Qiu, Yu-Qing; Tian, Xue; Zhang, Shihua

    2015-01-01

    Metagenomic research uses sequencing technologies to investigate the genetic biodiversity of microbiomes present in various ecosystems or animal tissues. The composition of a microbial community is highly associated with the environment in which the organisms exist. As ever larger numbers of short sequencing reads from microbial genomes are obtained, accurately estimating the abundance of microorganisms within a metagenomic sample is becoming an increasing challenge in bioinformatics. In this paper, we describe a hierarchical taxonomy tree-based mixture model (HTTMM) for estimating the abundance of taxa within a microbial community by incorporating the structure of the taxonomy tree. In this model, genome-specific short reads and short reads shared among homologous genomes can be distinguished, and they are represented by leaf and intermediate nodes of the taxonomy tree, respectively. We adopt an expectation-maximization algorithm to solve this model. Using simulated and real-world data, we demonstrate that the proposed method is superior to both the flat mixture model and lowest common ancestor (LCA)-based methods. Moreover, this model can reveal previously unaddressed homologous genomes. PMID:26451823
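    The expectation-maximization step can be illustrated on the simpler flat mixture model that the abstract uses as a baseline, not on HTTMM itself. In this minimal sketch, each read is described by a hypothetical 0/1 compatibility row listing the genomes it maps to. The E-step splits each read among compatible genomes in proportion to the current abundances, and the M-step averages those responsibilities. Genome-length normalization and mapping-quality weights are deliberately omitted.

```python
import numpy as np

def em_abundance(compat, n_iter=200, tol=1e-10):
    """Flat-mixture EM for relative abundances.

    compat: (n_reads, n_genomes) 0/1 matrix; every read must map to at least one genome.
    """
    compat = np.asarray(compat, dtype=float)
    if (compat.sum(axis=1) == 0).any():
        raise ValueError("every read must be compatible with at least one genome")
    pi = np.full(compat.shape[1], 1.0 / compat.shape[1])
    for _ in range(n_iter):
        resp = compat * pi                       # E-step: unnormalized responsibilities
        resp /= resp.sum(axis=1, keepdims=True)
        new_pi = resp.mean(axis=0)               # M-step: average responsibility per genome
        if np.abs(new_pi - pi).max() < tol:
            return new_pi
        pi = new_pi
    return pi

# 6 reads unique to genome A, 2 unique to genome B, 4 shared (homologous) reads.
reads = [[1, 0]] * 6 + [[0, 1]] * 2 + [[1, 1]] * 4
print(em_abundance(reads))
```

    The shared reads end up split 3:1, matching the 6:2 ratio of unique reads, so the estimate converges to roughly [0.75, 0.25]. A flat model like this cannot say anything about which genomes the shared reads are homologous among. That is the information HTTMM places on intermediate nodes of the taxonomy tree.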

  5. Pseudoscorpion mitochondria show rearranged genes and genome-wide reductions of RNA gene sizes and inferred structures, yet typical nucleotide composition bias

    PubMed Central

    2012-01-01

    Background Pseudoscorpions are chelicerates and have historically been viewed as being most closely related to solifuges, harvestmen, and scorpions. No mitochondrial genomes of pseudoscorpions have been published, but the mitochondrial genomes of some lineages of Chelicerata possess unusual features, including short rRNA genes and tRNA genes that lack sequence to encode arms of the canonical cloverleaf-shaped tRNA. Additionally, some chelicerates possess an atypical guanine-thymine nucleotide bias on the major coding strand of their mitochondrial genomes. Results We sequenced the mitochondrial genomes of two divergent taxa from the chelicerate order Pseudoscorpiones. We find that these genomes possess unusually short tRNA genes that do not encode cloverleaf-shaped tRNA structures. Indeed, in one genome, all 22 tRNA genes lack sequence to encode canonical cloverleaf structures. We also find that the large ribosomal RNA genes are substantially shorter than those of most arthropods. We inferred secondary structures of the LSU rRNAs from both pseudoscorpions, and find that they have lost multiple helices. Based on comparisons with the crystal structure of the bacterial ribosome, two of these helices were likely contact points with tRNA T-arms or D-arms as they pass through the ribosome during protein synthesis. The mitochondrial gene arrangements of both pseudoscorpions differ from the ancestral chelicerate gene arrangement. One genome is rearranged with respect to the location of protein-coding genes, the small rRNA gene, and at least 8 tRNA genes. The other genome contains 6 tRNA genes in novel locations. Most chelicerates with rearranged mitochondrial genes show a genome-wide reversal of the CA nucleotide bias typical for arthropods on their major coding strand, and instead possess a GT bias. Yet despite their extensive rearrangement, these pseudoscorpion mitochondrial genomes possess a CA bias on the major coding strand. Phylogenetic analyses of all 13

  6. Haplo-block structure of Southern African village chicken populations inferred using genome-wide SNP data.

    PubMed

    Khanyile, K S; Dzomba, E F; Muchadeyi, F C

    2015-01-01

    This study investigated the haplo-block structure, haplotype sharing, and diversity in extensively raised chicken populations of Southern Africa. Two hundred ninety village chickens from Malawi (N = 30), South Africa (N = 132), and Zimbabwe (N = 128) were included in the study, from which 649, 2104, and 2442 haplo-blocks were observed, respectively. The majority of haplo-blocks were smaller than 25 kb in size and only five blocks were more than 2000 kb in size. The low chromosomal coverage of haplo-blocks observed across the genome suggests that multiple recombination events fragmented the ancestral haplo-blocks into smaller sizes. Haplo-block sharing was observed between populations with 2325 haplo-blocks common between Zimbabwe and Malawi and 2689 between South Africa and Zimbabwe. Haplotype sharing allows transferability of genomic tools between these extensively raised chicken populations of Southern Africa. The unique haplo-blocks could have originated from isolated evolution taking place in specific agro-ecological zones. Quantitative trait loci analysis revealed that genes related to body composition were spanned by these haplo-blocks. Body composition traits are important for village chicken populations, which have to harness poor quality feed obtained from the environment to meet their maintenance and production needs. PMID:26505376

  7. Structure, expression profile and phylogenetic inference of chalcone isomerase-like genes from the narrow-leafed lupin (Lupinus angustifolius L.) genome

    PubMed Central

    Przysiecka, Łucja; Książkiewicz, Michał; Wolko, Bogdan; Naganowska, Barbara

    2015-01-01

    Lupins, like other legumes, have a unique biosynthesis scheme of 5-deoxy-type flavonoids and isoflavonoids. A key enzyme in this pathway is chalcone isomerase (CHI), a member of CHI-fold protein family, encompassing subfamilies of CHI1, CHI2, CHI-like (CHIL), and fatty acid-binding (FAP) proteins. Here, two Lupinus angustifolius (narrow-leafed lupin) CHILs, LangCHIL1 and LangCHIL2, were identified and characterized using DNA fingerprinting, cytogenetic and linkage mapping, sequencing and expression profiling. Clones carrying CHIL sequences were assembled into two contigs. Full gene sequences were obtained from these contigs, and mapped in two L. angustifolius linkage groups by gene-specific markers. Bacterial artificial chromosome fluorescence in situ hybridization approach confirmed the localization of two LangCHIL genes in distinct chromosomes. The expression profiles of both LangCHIL isoforms were very similar. The highest level of transcription was in the roots of the third week of plant growth; thereafter, expression declined. The expression of both LangCHIL genes in leaves and stems was similar and low. Comparative mapping to reference legume genome sequences revealed strong syntenic links; however, LangCHIL2 contig had a much more conserved structure than LangCHIL1. LangCHIL2 is assumed to be an ancestor gene, whereas LangCHIL1 probably appeared as a result of duplication. As both copies are transcriptionally active, questions arise concerning their hypothetical functional divergence. Screening of the narrow-leafed lupin genome and transcriptome with CHI-fold protein sequences, followed by Bayesian inference of phylogeny and cross-genera synteny survey, identified representatives of all but one (CHI1) main subfamilies. They are as follows: two copies of CHI2, FAPa2 and CHIL, and single copies of FAPb and FAPa1. Duplicated genes are remnants of whole genome duplication which is assumed to have occurred after the divergence of Lupinus, Arachis, and Glycine

  8. Genome-Wide Inference of Ancestral Recombination Graphs

    PubMed Central

    Rasmussen, Matthew D.; Hubisz, Melissa J.; Gronau, Ilan; Siepel, Adam

    2014-01-01

    The complex correlation structure of a collection of orthologous DNA sequences is uniquely captured by the “ancestral recombination graph” (ARG), a complete record of coalescence and recombination events in the history of the sample. However, existing methods for ARG inference are computationally intensive, highly approximate, or limited to small numbers of sequences, and, as a consequence, explicit ARG inference is rarely used in applied population genomics. Here, we introduce a new algorithm for ARG inference that is efficient enough to apply to dozens of complete mammalian genomes. The key idea of our approach is to sample an ARG of n chromosomes conditional on an ARG of n − 1 chromosomes, an operation we call “threading.” Using techniques based on hidden Markov models, we can perform this threading operation exactly, up to the assumptions of the sequentially Markov coalescent and a discretization of time. An extension allows for threading of subtrees instead of individual sequences. Repeated application of these threading operations results in highly efficient Markov chain Monte Carlo samplers for ARGs. We have implemented these methods in a computer program called ARGweaver. Experiments with simulated data indicate that ARGweaver converges rapidly to the posterior distribution over ARGs and is effective in recovering various features of the ARG for dozens of sequences generated under realistic parameters for human populations. In applications of ARGweaver to 54 human genome sequences from Complete Genomics, we find clear signatures of natural selection, including regions of unusually ancient ancestry associated with balancing selection and reductions in allele age in sites under directional selection. The patterns we observe near protein-coding genes are consistent with a primary influence from background selection rather than hitchhiking, although we cannot rule out a contribution from recurrent selective sweeps. PMID:24831947

  9. Genome-wide inference of ancestral recombination graphs.

    PubMed

    Rasmussen, Matthew D; Hubisz, Melissa J; Gronau, Ilan; Siepel, Adam

    2014-01-01

    The complex correlation structure of a collection of orthologous DNA sequences is uniquely captured by the "ancestral recombination graph" (ARG), a complete record of coalescence and recombination events in the history of the sample. However, existing methods for ARG inference are computationally intensive, highly approximate, or limited to small numbers of sequences, and, as a consequence, explicit ARG inference is rarely used in applied population genomics. Here, we introduce a new algorithm for ARG inference that is efficient enough to apply to dozens of complete mammalian genomes. The key idea of our approach is to sample an ARG of n chromosomes conditional on an ARG of n − 1 chromosomes, an operation we call "threading." Using techniques based on hidden Markov models, we can perform this threading operation exactly, up to the assumptions of the sequentially Markov coalescent and a discretization of time. An extension allows for threading of subtrees instead of individual sequences. Repeated application of these threading operations results in highly efficient Markov chain Monte Carlo samplers for ARGs. We have implemented these methods in a computer program called ARGweaver. Experiments with simulated data indicate that ARGweaver converges rapidly to the posterior distribution over ARGs and is effective in recovering various features of the ARG for dozens of sequences generated under realistic parameters for human populations. In applications of ARGweaver to 54 human genome sequences from Complete Genomics, we find clear signatures of natural selection, including regions of unusually ancient ancestry associated with balancing selection and reductions in allele age in sites under directional selection. The patterns we observe near protein-coding genes are consistent with a primary influence from background selection rather than hitchhiking, although we cannot rule out a contribution from recurrent selective sweeps. PMID:24831947

  10. Inferring ancestry from population genomic data and its applications

    PubMed Central

    Padhukasahasram, Badri

    2014-01-01

    Ancestry inference is a frequently encountered problem and has many applications such as forensic analyses, genetic association studies, and personal genomics. The main goal of ancestry inference is to identify an individual’s population of origin based on our knowledge of natural populations. Because self-reported ancestry in humans and the sampling location of an organism can both be inaccurate for this purpose, the use of genetic markers can facilitate accurate and reliable inference of an individual’s ancestral origins. At a higher level, there are two different paradigms in ancestry inference: global ancestry inference, which tries to compute the genome-wide average of the population contributions, and local ancestry inference, which tries to identify the regional ancestry of a genomic segment. In this mini review, I describe the numerous approaches that are currently available for both kinds of ancestry inference from population genomic datasets. I first describe the general ideas underlying such inference methods and their relationship to one another. Then, I describe practical applications in which inference of ancestry has proven useful. Lastly, I discuss challenges and directions for future research work in this area. PMID:25071832
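    Global ancestry inference can be sketched as a maximum-likelihood estimate of one individual's admixture proportions when reference allele frequencies are treated as known (a supervised setting). The minimal sketch below uses the standard EM update for the proportions `q` under a binomial genotype model. The allele-frequency matrix and the genotypes are hypothetical, and linkage between markers is ignored, so this illustrates global rather than local ancestry.

```python
import numpy as np

def global_ancestry(genotypes, freqs, n_iter=500, tol=1e-10):
    """EM estimate of admixture proportions q for one individual.

    genotypes: length-L array of allele-1 counts (0, 1 or 2).
    freqs:     (K, L) array of allele-1 frequencies in K reference populations, in (0, 1).
    """
    g = np.asarray(genotypes, dtype=float)
    f = np.asarray(freqs, dtype=float)
    k, n_loci = f.shape
    q = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        p = q @ f                                         # per-locus allele-1 probability
        a = g * (q[:, None] * f) / p                      # expected allele-1 copies per population
        b = (2 - g) * (q[:, None] * (1 - f)) / (1 - p)    # expected allele-0 copies per population
        new_q = (a + b).sum(axis=1) / (2 * n_loci)        # each locus contributes 2 copies
        if np.abs(new_q - q).max() < tol:
            return new_q
        q = new_q
    return q

freqs = np.array([[0.9] * 50, [0.1] * 50])        # two hypothetical reference populations
print(global_ancestry([2] * 50, freqs))           # homozygous everywhere: close to [1, 0]
print(global_ancestry([1] * 50, freqs))           # heterozygous everywhere: [0.5, 0.5]
```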

  11. Structural inference for uncertain networks

    NASA Astrophysics Data System (ADS)

    Martin, Travis; Ball, Brian; Newman, M. E. J.

    2016-01-01

    In the study of networked systems such as biological, technological, and social networks the available data are often uncertain. Rather than knowing the structure of a network exactly, we know the connections between nodes only with a certain probability. In this paper we develop methods for the analysis of such uncertain data, focusing particularly on the problem of community detection. We give a principled maximum-likelihood method for inferring community structure and demonstrate how the results can be used to make improved estimates of the true structure of the network. Using computer-generated benchmark networks we demonstrate that our methods are able to reconstruct known communities more accurately than previous approaches based on data thresholding. We also give an example application to the detection of communities in a protein-protein interaction network.
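    To see why edge probabilities matter for community detection, one can score a candidate partition against the probability-weighted (expected) adjacency matrix instead of a thresholded 0/1 matrix. The sketch below computes Newman-Girvan modularity of that expected network. It is a simple illustration under that assumption, not the maximum-likelihood method described in the abstract, and the example probabilities are hypothetical.

```python
import numpy as np

def modularity_of_expected_network(prob, labels):
    """Modularity Q of the weighted network whose weights are edge probabilities.

    prob:   symmetric (n, n) matrix of edge probabilities with zero diagonal.
    labels: length-n community assignment.
    """
    p = np.asarray(prob, dtype=float)
    labels = np.asarray(labels)
    k = p.sum(axis=1)                      # expected degrees
    two_m = k.sum()                        # twice the expected number of edges
    same = labels[:, None] == labels[None, :]
    b = p - np.outer(k, k) / two_m         # modularity matrix
    return float(b[same].sum() / two_m)

# Two certain edges (0-1, 2-3) plus one uncertain cross edge (1-2, probability 0.2).
prob = np.array([[0, 1, 0, 0],
                 [1, 0, 0.2, 0],
                 [0, 0.2, 0, 1],
                 [0, 0, 1, 0]], dtype=float)
print(modularity_of_expected_network(prob, [0, 0, 1, 1]))   # about 0.409
```

    Thresholding the 0.2 edge away would report Q = 0.5. Keeping it as an expected weight lowers the score and reflects the uncertainty instead of discarding it.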

  12. Inferring parental genomic ancestries using pooled semi-Markov processes

    PubMed Central

    Zou, James Y.; Halperin, Eran; Burchard, Esteban; Sankararaman, Sriram

    2015-01-01

    Motivation: A basic problem of broad public and scientific interest is to use the DNA of an individual to infer the genomic ancestries of the parents. In particular, we are often interested in the fraction of each parent’s genome that comes from specific ancestries (e.g. European, African, Native American, etc.). This has many applications ranging from understanding the inheritance of ancestry-related risks and traits to quantifying human assortative mating patterns. Results: We model the problem of parental genomic ancestry inference as a pooled semi-Markov process. We develop a general mathematical framework for pooled semi-Markov processes and construct efficient inference algorithms for these models. Applying our inference algorithm to genotype data from 231 Mexican trios and 258 Puerto Rican trios where we have the true genomic ancestry of each parent, we demonstrate that our method accurately infers parameters of the semi-Markov processes and parents’ genomic ancestries. We additionally validated the method on simulations. Our model of pooled semi-Markov process and inference algorithms may be of independent interest in other settings in genomics and machine learning. Contact: jazo@microsoft.com PMID:26072482
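    The pooled structure of the problem can be made concrete with a simplified forward simulation. In this sketch, each parent transmits one haplotype whose local ancestry alternates between two labels, with tract lengths drawn from a non-exponential (gamma) distribution; that non-geometric dwell time is what makes the process semi-Markov rather than Markov. The child reveals only the pooled dosage. The gamma distribution, its parameters and the discretized genome are illustrative assumptions. Inference would run in the reverse direction, recovering each parent's ancestry from `dosage`.

```python
import numpy as np

def simulate_track(length, mean_tract, shape, rng):
    """Binary local-ancestry track (1 = ancestry A, 0 = ancestry B).

    Tract lengths are Gamma(shape, mean_tract / shape) draws, so the time spent in a
    state is not geometric: a two-state semi-Markov process on a discrete grid.
    """
    track = np.empty(length, dtype=int)
    pos, state = 0, int(rng.random() < 0.5)
    while pos < length:
        tract = max(1, int(round(rng.gamma(shape, mean_tract / shape))))
        track[pos:pos + tract] = state       # slice is truncated at the end of the genome
        pos += tract
        state = 1 - state
    return track

rng = np.random.default_rng(1)
from_parent1 = simulate_track(10_000, mean_tract=400, shape=3.0, rng=rng)
from_parent2 = simulate_track(10_000, mean_tract=150, shape=3.0, rng=rng)
dosage = from_parent1 + from_parent2          # what the child's genotype reveals: 0, 1 or 2
print(from_parent1.mean(), from_parent2.mean(), dosage.mean() / 2)
```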

  13. Genetic Network Inference Using Hierarchical Structure.

    PubMed

    Kimura, Shuhei; Tokuhisa, Masato; Okada-Hatakeyama, Mariko

    2016-01-01

    Many methods for inferring genetic networks have been proposed, but the regulations they infer often include false positives. Several researchers have attempted to reduce these erroneous regulations by proposing the use of a priori knowledge about the properties of genetic networks, such as their sparseness and scale-free structure. This study focuses on another piece of a priori knowledge, namely, that biochemical networks exhibit hierarchical structures. Based on this idea, we propose an inference approach that uses the hierarchical structure in a target genetic network. To obtain a reasonable hierarchical structure, the first step of the proposed approach is to infer multiple genetic networks from the observed gene expression data. We take this step using an existing method that combines a genetic network inference method with a bootstrap method. The next step is to extract a hierarchical structure from the inferred networks that is consistent with most of the networks. Third, we use the hierarchical structure obtained to assign confidence values to all candidate regulations. Numerical experiments are also performed to demonstrate the effectiveness of using the hierarchical structure in the genetic network inference. The improvement accomplished by the use of the hierarchical structure is small. However, the hierarchical structure could be used to improve the performances of many existing inference methods. PMID:26941653
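    The first two steps (bootstrap networks, then a hierarchy consistent with most of them) can be sketched as follows. This is a hypothetical simplification, not the authors' procedure. Regulations supported by at least `min_support` of the bootstrap networks form a consensus graph. Each gene's level in the hierarchy is then its longest-path depth in that graph, computed with Kahn's topological sort. A cycle in the consensus is reported as an error rather than resolved.

```python
from collections import Counter, defaultdict

def consensus_edges(networks, min_support=0.5):
    """Regulations (u, v) present in at least `min_support` of the bootstrap networks."""
    counts = Counter(edge for net in networks for edge in set(net))
    return {edge for edge, c in counts.items() if c / len(networks) >= min_support}

def hierarchy_levels(edges):
    """Longest-path depth of each gene in a DAG of regulations (Kahn's algorithm)."""
    nodes = {g for edge in edges for g in edge}
    indegree = {g: 0 for g in nodes}
    children = defaultdict(list)
    for u, v in edges:
        children[u].append(v)
        indegree[v] += 1
    level = {g: 0 for g in nodes}
    queue = [g for g in nodes if indegree[g] == 0]
    visited = 0
    while queue:
        u = queue.pop()
        visited += 1
        for v in children[u]:
            level[v] = max(level[v], level[u] + 1)   # all parents of v are final before v is queued
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if visited != len(nodes):
        raise ValueError("consensus network contains a cycle; no strict hierarchy")
    return level

boot = [{("a", "b"), ("b", "c")},                          # hypothetical bootstrap networks
        {("a", "b"), ("b", "c"), ("a", "c"), ("c", "a")},
        {("a", "b"), ("b", "c"), ("a", "c")}]
edges = consensus_edges(boot)
print(sorted(edges), hierarchy_levels(edges))
```

    The weakly supported regulation c → a (1 of 3 networks) is dropped. Gene c sits at level 2 because the longest path a → b → c wins over the direct edge a → c.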

  14. Inferring Correlation Networks from Genomic Survey Data

    PubMed Central

    Friedman, Jonathan; Alm, Eric J.

    2012-01-01

    High-throughput sequencing based techniques, such as 16S rRNA gene profiling, have the potential to elucidate the complex inner workings of natural microbial communities - be they from the world's oceans or the human gut. A key step in exploring such data is the identification of dependencies between members of these communities, which is commonly achieved by correlation analysis. However, it has been known since the days of Karl Pearson that the analysis of the type of data generated by such techniques (referred to as compositional data) can produce unreliable results since the observed data take the form of relative fractions of genes or species, rather than their absolute abundances. Using simulated and real data from the Human Microbiome Project, we show that such compositional effects can be widespread and severe: in some real data sets many of the correlations among taxa can be artifactual, and true correlations may even appear with opposite sign. Additionally, we show that community diversity is the key factor that modulates the acuteness of such compositional effects, and develop a new approach, called SparCC (available at https://bitbucket.org/yonatanf/sparcc), which is capable of estimating correlation values from compositional data. To illustrate a potential application of SparCC, we infer a rich ecological network connecting hundreds of interacting species across 18 sites on the human body. Using the SparCC network as a reference, we estimated that the standard approach yields 3 spurious species-species interactions for each true interaction and misses 60% of the true interactions in the human microbiome data, and, as predicted, most of the erroneous links are found in the samples with the lowest diversity. PMID:23028285
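    The idea behind SparCC can be sketched from the log-ratio variances t_ij = Var[log(x_i/x_j)], which are unaffected by compositional closure. Writing t_ij = ω_i² + ω_j² − 2ρ_ij ω_i ω_j and assuming correlations are sparse, so that the ρ terms roughly cancel when summed over j, the basis log-variances ω_i² can be solved for and substituted back to obtain ρ_ij. The block below is a minimal one-pass version of that estimator on simulated data. The published tool additionally iterates, excluding strongly correlated pairs, and that refinement is omitted here.

```python
import numpy as np

def basic_sparcc(fractions):
    """One-pass SparCC-style correlation estimate from compositional data.

    fractions: (n_samples, D) array of strictly positive relative abundances, D >= 3.
    """
    logx = np.log(np.asarray(fractions, dtype=float))
    d = logx.shape[1]
    # Variation matrix t_ij = Var(log x_i - log x_j); invariant to dividing by sample totals.
    t = np.var(logx[:, :, None] - logx[:, None, :], axis=0)
    t_i = t.sum(axis=1)                        # approx. (D - 2) * omega_i^2 + sum_j omega_j^2
    total = t_i.sum() / (2 * (d - 1))          # estimate of sum_j omega_j^2
    omega2 = np.clip((t_i - total) / (d - 2), 1e-12, None)
    omega = np.sqrt(omega2)
    rho = (omega2[:, None] + omega2[None, :] - t) / (2 * np.outer(omega, omega))
    np.fill_diagonal(rho, 1.0)
    return np.clip(rho, -1.0, 1.0)

rng = np.random.default_rng(0)
basis = rng.lognormal(size=(20_000, 10))        # independent absolute abundances
fractions = basis / basis.sum(axis=1, keepdims=True)
off = ~np.eye(10, dtype=bool)
print("SparCC-style mean |rho|:", np.abs(basic_sparcc(fractions)[off]).mean())
print("Pearson on fractions, mean r:", np.corrcoef(fractions, rowvar=False)[off].mean())
```

    With independent basis abundances, the log-ratio estimate stays near zero. Plain Pearson correlations of the fractions are pulled negative by the constraint that each sample's fractions sum to one.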

  15. Inferring correlation networks from genomic survey data.

    PubMed

    Friedman, Jonathan; Alm, Eric J

    2012-01-01

    High-throughput sequencing based techniques, such as 16S rRNA gene profiling, have the potential to elucidate the complex inner workings of natural microbial communities - be they from the world's oceans or the human gut. A key step in exploring such data is the identification of dependencies between members of these communities, which is commonly achieved by correlation analysis. However, it has been known since the days of Karl Pearson that the analysis of the type of data generated by such techniques (referred to as compositional data) can produce unreliable results since the observed data take the form of relative fractions of genes or species, rather than their absolute abundances. Using simulated and real data from the Human Microbiome Project, we show that such compositional effects can be widespread and severe: in some real data sets many of the correlations among taxa can be artifactual, and true correlations may even appear with opposite sign. Additionally, we show that community diversity is the key factor that modulates the acuteness of such compositional effects, and develop a new approach, called SparCC (available at https://bitbucket.org/yonatanf/sparcc), which is capable of estimating correlation values from compositional data. To illustrate a potential application of SparCC, we infer a rich ecological network connecting hundreds of interacting species across 18 sites on the human body. Using the SparCC network as a reference, we estimated that the standard approach yields 3 spurious species-species interactions for each true interaction and misses 60% of the true interactions in the human microbiome data, and, as predicted, most of the erroneous links are found in the samples with the lowest diversity. PMID:23028285

  16. Use of Whole Genome Sequence Data To Infer Baculovirus Phylogeny

    PubMed Central

    Herniou, Elisabeth A.; Luque, Teresa; Chen, Xinwen; Vlak, Just M.; Winstanley, Doreen; Cory, Jennifer S.; O'Reilly, David R.

    2001-01-01

    Several phylogenetic methods based on whole genome sequence data were evaluated using data from nine complete baculovirus genomes. The utility of three independent character sets was assessed. The first data set comprised the sequences of the 63 genes common to these viruses. The second set of characters was based on gene order, and phylogenies were inferred using both breakpoint distance analysis and a novel method developed here, termed neighbor pair analysis. The third set recorded gene content by scoring gene presence or absence in each genome. All three data sets yielded phylogenies supporting the separation of the Nucleopolyhedrovirus (NPV) and Granulovirus (GV) genera, the division of the NPVs into groups I and II, and species relationships within group I NPVs. Generation of phylogenies based on the combined sequences of all 63 shared genes proved to be the most effective approach to resolving the relationships among the group II NPVs and the GVs. The history of gene acquisitions and losses that have accompanied baculovirus diversification was visualized by mapping the gene content data onto the phylogenetic tree. This analysis highlighted the fluid nature of baculovirus genomes, with evidence of frequent genome rearrangements and multiple gene content changes during their evolution. Of more than 416 genes identified in the genomes analyzed, only 63 are present in all nine genomes, and 200 genes are found only in a single genome. Despite this fluidity, the whole genome-based methods we describe are sufficiently powerful to recover the underlying phylogeny of the viruses. PMID:11483757
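    Two of the character sets above can be turned into pairwise distances with very little code. Breakpoint distance counts adjacencies present in one gene order but not the other. A simple gene-content distance compares presence/absence sets. This sketch assumes single linear chromosomes with identical gene content for the breakpoint case and ignores gene orientation. The Jaccard-based content distance is an illustrative substitute, not necessarily the scoring used in the study, and the authors' neighbor pair analysis is not reproduced.

```python
def adjacencies(order):
    """Unordered neighbouring gene pairs in a linear gene order."""
    return {frozenset(pair) for pair in zip(order, order[1:])}

def breakpoint_distance(order_a, order_b):
    """Number of adjacencies of order_a that are broken in order_b."""
    if set(order_a) != set(order_b):
        raise ValueError("this sketch assumes identical gene content")
    return len(adjacencies(order_a) - adjacencies(order_b))

def gene_content_distance(genes_a, genes_b):
    """1 - Jaccard similarity of two gene sets (presence/absence characters)."""
    a, b = set(genes_a), set(genes_b)
    return 1 - len(a & b) / len(a | b)

print(breakpoint_distance([1, 2, 3, 4, 5], [1, 4, 3, 2, 5]))   # 2
print(gene_content_distance({1, 2, 3}, {2, 3, 4}))             # 0.5
```

    Reversing the segment 2..4 breaks only its two boundary adjacencies (1-2 and 4-5); the internal adjacencies 2-3 and 3-4 survive because orientation is ignored.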

  17. Robust Demographic Inference from Genomic and SNP Data

    PubMed Central

    Excoffier, Laurent; Dupanloup, Isabelle; Huerta-Sánchez, Emilia; Sousa, Vitor C.; Foll, Matthieu

    2013-01-01

    We introduce a flexible and robust simulation-based framework to infer demographic parameters from the site frequency spectrum (SFS) computed on large genomic datasets. We show that our composite-likelihood approach allows one to study evolutionary models of arbitrary complexity, which cannot be tackled by other current likelihood-based methods. For simple scenarios, our approach compares favorably in terms of accuracy and speed with , the current reference in the field, while showing better convergence properties for complex models. We first apply our methodology to non-coding genomic SNP data from four human populations. To infer their demographic history, we compare neutral evolutionary models of increasing complexity, including unsampled populations. We further show the versatility of our framework by extending it to the inference of demographic parameters from SNP chips with known ascertainment, such as that recently released by Affymetrix to study human origins. Whereas previous ways of handling ascertained SNPs were either restricted to a single population or only allowed the inference of divergence time between a pair of populations, our framework can correctly infer parameters of more complex models including the divergence of several populations, bottlenecks and migration. We apply this approach to the reconstruction of African demography using two distinct ascertained human SNP panels studied under two evolutionary models. The two SNP panels lead to globally very similar estimates and confidence intervals, and suggest an ancient divergence (>110 Ky) between Yoruba and San populations. Our methodology appears well suited to the study of complex scenarios from large genomic data sets. PMID:24204310
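    The site frequency spectrum that such a framework fits can be computed directly from per-site derived-allele counts. This sketch builds the unfolded and folded SFS for n sampled chromosomes and the standard neutral expectation E[ξ_i] = θ/i for a constant-size population. It also includes a multinomial composite log-likelihood of the kind used to compare observed and model-expected spectra. The simulation step that produces expected spectra under complex demographic models is not attempted, and the counts are hypothetical.

```python
import numpy as np

def unfolded_sfs(derived_counts, n):
    """Entry i = number of sites where i of n chromosomes carry the derived allele."""
    return np.bincount(np.asarray(derived_counts), minlength=n + 1)

def folded_sfs(derived_counts, n):
    """Minor-allele spectrum, for when the ancestral state is unknown."""
    k = np.asarray(derived_counts)
    return np.bincount(np.minimum(k, n - k), minlength=n // 2 + 1)

def expected_neutral_sfs(theta, n):
    """Standard neutral model, constant size: E[xi_i] = theta / i for i = 1..n-1."""
    return theta / np.arange(1, n)

def composite_loglik(observed, expected):
    """Multinomial composite log-likelihood of SFS classes, up to an additive constant."""
    obs = np.asarray(observed, dtype=float)
    p = np.asarray(expected, dtype=float)
    p = p / p.sum()
    return float((obs * np.log(p)).sum())

counts = [1, 1, 2, 0, 3]                # hypothetical derived-allele counts, n = 4
print(unfolded_sfs(counts, 4))          # [1 2 1 1 0]
print(folded_sfs(counts, 4))            # [1 3 1]
print(expected_neutral_sfs(6.0, 4))     # [6. 3. 2.]
```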

  18. Ancestral genome inference using a genetic algorithm approach.

    PubMed

    Gao, Nan; Yang, Ning; Tang, Jijun

    2013-01-01

    Recent technological advances have made it routine to obtain and compare gene orders within genomes. Rearrangements of gene orders by operations such as reversal and transposition are rare events that enable researchers to reconstruct deep evolutionary histories. An important application of genome rearrangement analysis is to infer gene orders of ancestral genomes, which is valuable for identifying patterns of evolution and for modeling the evolutionary processes. Among various available methods, parsimony-based methods (including GRAPPA and MGR) are the most widely used. Since the core algorithms of these methods are solvers for the so-called median problem, providing an efficient and accurate median solver has attracted much attention in this field. The "double-cut-and-join" (DCJ) model uses the single DCJ operation to account for all genome rearrangement events. Because this is mathematically much simpler than handling each type of event directly, parsimony methods using DCJ median solvers have better speed and accuracy. However, the DCJ median problem is NP-hard, and although several exact algorithms are available, they all have great difficulty when the given genomes are distant. In this paper, we present a new algorithm that combines a genetic algorithm (GA) with genomic sorting to solve the DCJ median problem in limited time and space, especially for large and distant datasets. Our experimental results show that this new GA-based method can find optimal or near-optimal results for problems ranging from easy to very difficult. Compared to existing parsimony methods, which may severely underestimate the true number of evolutionary events, the sorting-based approach can infer ancestral genomes that are much closer to their true ancestors. The code is available at http://phylo.cse.sc.edu. PMID:23658708
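    The DCJ distance that a median solver works with has a simple closed form for circular genomes: d = N − C, where N is the number of genes and C is the number of cycles in the adjacency graph. The sketch below assumes genomes made only of circular chromosomes with identical gene content and signed genes. Linear chromosomes would also need the odd-path term. `median_score` is the quantity a DCJ median solver minimizes over candidate genomes; the GA search itself is not shown.

```python
def adjacency_partner(genome):
    """Map each gene extremity to its neighbour; genome = list of circular chromosomes."""
    partner = {}
    for chrom in genome:
        for a, b in zip(chrom, chrom[1:] + chrom[:1]):
            right = (abs(a), "h" if a > 0 else "t")   # exit end of gene a (head if forward)
            left = (abs(b), "t" if b > 0 else "h")    # entry end of gene b (tail if forward)
            partner[right] = left
            partner[left] = right
    return partner

def dcj_distance_circular(genome_a, genome_b):
    """DCJ distance N - C for genomes made of circular chromosomes with equal gene content."""
    pa, pb = adjacency_partner(genome_a), adjacency_partner(genome_b)
    n_genes = len({abs(g) for chrom in genome_a for g in chrom})
    seen, cycles = set(), 0
    for start in pa:
        if start in seen:
            continue
        cycles += 1
        x = start
        while x not in seen:              # walk alternating A- and B-adjacencies
            y = pa[x]
            seen.update((x, y))
            x = pb[y]
    return n_genes - cycles

def median_score(candidate, genomes):
    """Quantity a DCJ median solver minimizes: total distance to the input genomes."""
    return sum(dcj_distance_circular(candidate, g) for g in genomes)

a = [[1, 2, 3, 4]]
b = [[1, -3, -2, 4]]                      # inversion of the segment 2..3
print(dcj_distance_circular(a, a), dcj_distance_circular(a, b))   # 0 1
```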

  19. A Cooperative Co-Evolutionary Genetic Algorithm for Tree Scoring and Ancestral Genome Inference.

    PubMed

    Gao, Nan; Zhang, Yan; Feng, Bing; Tang, Jijun

    2015-01-01

    Recent advances in technology have made it easy to obtain and compare whole genomes. Rearrangements of genomes through operations such as reversals and transpositions are rare events that enable researchers to reconstruct deep evolutionary history among species. Some popular methods need to search a large tree space for the best-scoring tree, so it is desirable to have a fast and accurate method that can score a given tree efficiently. The tree scoring procedure also yields the genomic structures of internal tree nodes, which provide important information for inferring ancestral genomes and for modeling the evolutionary processes. However, computing tree scores and ancestral genomes is very difficult, and many researchers have to rely on heuristic methods, which have various disadvantages. In this paper, we describe the first genetic algorithm for tree scoring and ancestor inference, which uses a fitness function considering co-evolution, adopts different initial seeding methods to initialize the first population pool, and utilizes a sorting-based approach to realize evolution. Our extensive experiments show that, compared with other existing algorithms, this new method is more accurate and can infer ancestral genomes that are much closer to the true ancestors. PMID:26671797

  20. Inferring Strain Mixture within Clinical Plasmodium falciparum Isolates from Genomic Sequence Data.

    PubMed

    O'Brien, John D; Iqbal, Zamin; Wendler, Jason; Amenga-Etego, Lucas

    2016-06-01

    We present a rigorous statistical model that infers the structure of P. falciparum mixtures (including the number of strains present, their proportion within the samples, and the amount of unexplained mixture) using whole genome sequence (WGS) data. Applied to simulation data, artificial laboratory mixtures, and field samples, the model provides reasonable inference with as few as 10 reads or 50 SNPs and works efficiently even with much larger data sets. Source code and example data for the model are provided in an open source fashion. We discuss the possible uses of this model as a window into within-host selection for clinical and epidemiological studies. PMID:27362949

  1. Inferring Strain Mixture within Clinical Plasmodium falciparum Isolates from Genomic Sequence Data

    PubMed Central

    O’Brien, John D.; Amenga-Etego, Lucas

    2016-01-01

    We present a rigorous statistical model that infers the structure of P. falciparum mixtures—including the number of strains present, their proportion within the samples, and the amount of unexplained mixture—using whole genome sequence (WGS) data. Applied to simulation data, artificial laboratory mixtures, and field samples, the model provides reasonable inference with as few as 10 reads or 50 SNPs and works efficiently even with much larger data sets. Source code and example data for the model are provided in an open source fashion. We discuss the possible uses of this model as a window into within-host selection for clinical and epidemiological studies. PMID:27362949

  2. Population genetic inference from personal genome data: impact of ancestry and admixture on human genomic variation.

    PubMed

    Kidd, Jeffrey M; Gravel, Simon; Byrnes, Jake; Moreno-Estrada, Andres; Musharoff, Shaila; Bryc, Katarzyna; Degenhardt, Jeremiah D; Brisbin, Abra; Sheth, Vrunda; Chen, Rong; McLaughlin, Stephen F; Peckham, Heather E; Omberg, Larsson; Bormann Chung, Christina A; Stanley, Sarah; Pearlstein, Kevin; Levandowsky, Elizabeth; Acevedo-Acevedo, Suehelay; Auton, Adam; Keinan, Alon; Acuña-Alonzo, Victor; Barquera-Lozano, Rodrigo; Canizales-Quinteros, Samuel; Eng, Celeste; Burchard, Esteban G; Russell, Archie; Reynolds, Andy; Clark, Andrew G; Reese, Martin G; Lincoln, Stephen E; Butte, Atul J; De La Vega, Francisco M; Bustamante, Carlos D

    2012-10-01

    Full sequencing of individual human genomes has greatly expanded our understanding of human genetic variation and population history. Here, we present a systematic analysis of 50 human genomes from 11 diverse global populations sequenced at high coverage. Our sample includes 12 individuals who have admixed ancestry and who have varying degrees of recent (within the last 500 years) African, Native American, and European ancestry. We found over 21 million single-nucleotide variants that contribute to a 1.75-fold range in nucleotide heterozygosity across diverse human genomes. This heterozygosity ranged from a high of one heterozygous site per kilobase in West African genomes to a low of 0.57 heterozygous sites per kilobase in segments inferred to have diploid Native American ancestry from the genomes of Mexican and Puerto Rican individuals. We show evidence of all three continental ancestries in the genomes of Mexican, Puerto Rican, and African American populations, and the genome-wide statistics are highly consistent across individuals from a population once ancestry proportions have been accounted for. Using a generalized linear model, we identified subtle variations across populations in the proportion of neutral versus deleterious variation and found that genome-wide statistics vary in admixed populations even once ancestry proportions have been factored in. We further infer that multiple periods of gene flow shaped the diversity of admixed populations in the Americas: 70% of the European ancestry in today's African Americans dates back to European gene flow happening only 7-8 generations ago. PMID:23040495

  3. Population Genetic Inference from Personal Genome Data: Impact of Ancestry and Admixture on Human Genomic Variation

    PubMed Central

    Kidd, Jeffrey M.; Gravel, Simon; Byrnes, Jake; Moreno-Estrada, Andres; Musharoff, Shaila; Bryc, Katarzyna; Degenhardt, Jeremiah D.; Brisbin, Abra; Sheth, Vrunda; Chen, Rong; McLaughlin, Stephen F.; Peckham, Heather E.; Omberg, Larsson; Bormann Chung, Christina A.; Stanley, Sarah; Pearlstein, Kevin; Levandowsky, Elizabeth; Acevedo-Acevedo, Suehelay; Auton, Adam; Keinan, Alon; Acuña-Alonzo, Victor; Barquera-Lozano, Rodrigo; Canizales-Quinteros, Samuel; Eng, Celeste; Burchard, Esteban G.; Russell, Archie; Reynolds, Andy; Clark, Andrew G.; Reese, Martin G.; Lincoln, Stephen E.; Butte, Atul J.; De La Vega, Francisco M.; Bustamante, Carlos D.

    2012-01-01

    Full sequencing of individual human genomes has greatly expanded our understanding of human genetic variation and population history. Here, we present a systematic analysis of 50 human genomes from 11 diverse global populations sequenced at high coverage. Our sample includes 12 individuals who have admixed ancestry and who have varying degrees of recent (within the last 500 years) African, Native American, and European ancestry. We found over 21 million single-nucleotide variants that contribute to a 1.75-fold range in nucleotide heterozygosity across diverse human genomes. This heterozygosity ranged from a high of one heterozygous site per kilobase in west African genomes to a low of 0.57 heterozygous sites per kilobase in segments inferred to have diploid Native American ancestry from the genomes of Mexican and Puerto Rican individuals. We show evidence of all three continental ancestries in the genomes of Mexican, Puerto Rican, and African American populations, and the genome-wide statistics are highly consistent across individuals from a population once ancestry proportions have been accounted for. Using a generalized linear model, we identified subtle variations across populations in the proportion of neutral versus deleterious variation and found that genome-wide statistics vary in admixed populations even once ancestry proportions have been factored in. We further infer that multiple periods of gene flow shaped the diversity of admixed populations in the Americas—70% of the European ancestry in today’s African Americans dates back to European gene flow happening only 7–8 generations ago. PMID:23040495

  4. Genomic inference of the metabolism of cosmopolitan subsurface Archaea, Hadesarchaea.

    PubMed

    Baker, Brett J; Saw, Jimmy H; Lind, Anders E; Lazar, Cassandre Sara; Hinrichs, Kai-Uwe; Teske, Andreas P; Ettema, Thijs J G

    2016-01-01

    The subsurface biosphere is largely unexplored and contains a broad diversity of uncultured microbes(1). Despite being one of the few prokaryotic lineages that is cosmopolitan in both the terrestrial and marine subsurface(2-4), the physiological and ecological roles of SAGMEG (South-African Gold Mine Miscellaneous Euryarchaeal Group) Archaea are unknown. Here, we report the metabolic capabilities of this enigmatic group as inferred from genomic reconstructions. Four high-quality (63-90% complete) genomes were obtained from White Oak River estuary and Yellowstone National Park hot spring sediment metagenomes. Phylogenomic analyses place SAGMEG Archaea as a deeply rooting sister clade of the Thermococci, leading us to propose the name Hadesarchaea for this new Archaeal class. With an estimated genome size of around 1.5 Mbp, the genomes of Hadesarchaea are distinctly streamlined, yet metabolically versatile. They share several physiological mechanisms with strict anaerobic Euryarchaeota. Several metabolic characteristics make them successful in the subsurface, including genes involved in CO and H2 oxidation (or H2 production), with potential coupling to nitrite reduction to ammonia (DNRA). This first glimpse into the metabolic capabilities of these cosmopolitan Archaea suggests they are mediating key geochemical processes and are specialized for survival in the subsurface biosphere. PMID:27572167

  5. Improved genome inference in the MHC using a population reference graph

    PubMed Central

    Dilthey, Alexander; Cox, Charles; Iqbal, Zamin; Nelson, Matthew R.; McVean, Gil

    2015-01-01

    While much is known about human genetic variation, such information is typically ignored in assembling novel genomes. Instead, reads are mapped to a single reference, which can lead to poor characterization of regions of high sequence or structural diversity. We introduce a population reference graph, which combines multiple reference sequences and catalogues of variation. The genomes of novel samples are reconstructed as paths through the graph using an efficient hidden Markov model, allowing for recombination between different haplotypes and additional variants. By applying the method to the 4.5Mb extended MHC region on human chromosome 6, combining eight assembled haplotypes, sequences of known classical HLA alleles and 87,640 SNP variants from the 1000 Genomes Project, we demonstrate, using simulations, SNP genotyping, short-read and long-read data, how the method improves the accuracy of genome inference and reveals regions where the current set of reference sequences is substantially incomplete. PMID:25915597
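
    The idea of reconstructing a sample as a path that can recombine between reference haplotypes can be illustrated with a much smaller model. The sketch below is not the population reference graph. It swaps in a simplified Li-Stephens-style copying HMM decoded with the Viterbi algorithm. The hidden state is which reference haplotype is being copied at each site, a switch models recombination, and a mismatch models an additional variant. The function name and the default `switch_prob` and `mismatch_prob` values are assumptions for illustration.

```python
import numpy as np

def viterbi_mosaic(refs, query, switch_prob=0.01, mismatch_prob=0.001):
    """Most likely sequence of copied reference haplotypes for `query`.

    refs: (K, L) array of alleles for K reference haplotypes at L sites.
    query: length-L array of alleles for the new sample.
    Returns a length-L array of reference indices (the decoded path).
    """
    refs, query = np.asarray(refs), np.asarray(query)
    n_refs, n_sites = refs.shape
    if query.shape != (n_sites,):
        raise ValueError("query must have one allele per site")
    # Emission log-probabilities: match the copied haplotype, or carry a novel variant.
    log_emit = np.where(refs == query, np.log1p(-mismatch_prob), np.log(mismatch_prob))
    # Transition log-probabilities: keep copying the same haplotype, or switch (recombine).
    trans = np.full((n_refs, n_refs), -np.inf)
    if n_refs > 1:
        trans[:] = np.log(switch_prob / (n_refs - 1))
    np.fill_diagonal(trans, np.log1p(-switch_prob))
    score = -np.log(n_refs) + log_emit[:, 0]          # uniform start over haplotypes
    back = np.zeros((n_refs, n_sites), dtype=int)
    for j in range(1, n_sites):
        cand = score[:, None] + trans                 # cand[i, k]: from state i to state k
        back[:, j] = np.argmax(cand, axis=0)
        score = cand[back[:, j], np.arange(n_refs)] + log_emit[:, j]
    # Trace back the best path of copied haplotype indices.
    path = np.zeros(n_sites, dtype=int)
    path[-1] = int(np.argmax(score))
    for j in range(n_sites - 1, 0, -1):
        path[j - 1] = back[path[j], j]
    return path

refs = [[0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1]]
print(viterbi_mosaic(refs, [0, 0, 0, 1, 1, 1]).tolist())  # [0, 0, 0, 1, 1, 1]
```

    With these settings, one switch costs less than three mismatches, so the decoded path recombines between the two references. A single isolated mismatch is instead explained as a novel variant on the same haplotype.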

  6. Inferring Genome-Wide Recombination Landscapes from Advanced Intercross Lines: Application to Yeast Crosses

    PubMed Central

    Illingworth, Christopher J. R.; Parts, Leopold; Bergström, Anders; Liti, Gianni; Mustonen, Ville

    2013-01-01

    Accurate estimates of recombination rates are of great importance for understanding evolution. In an experimental genetic cross, recombination breaks apart and rejoins genetic material, so the genomes of the resulting isolates are composed of distinct blocks of differing parental origin. Here we describe a method that exploits this fact to infer genome-wide recombination profiles from sequenced isolates from an advanced intercross line (AIL). We verified the accuracy of the method against simulated data. Next, we sequenced 192 isolates from a twelve-generation cross between West African and North American yeast Saccharomyces cerevisiae strains and inferred the underlying recombination landscape at a fine genomic resolution (mean segregating site distance 0.22 kb). We compared this landscape with landscapes inferred for a similar cross between four yeast strains and with a previous single-generation, intra-strain cross (Mancera et al., Nature 2008). Moderate congruence was identified between landscapes (correlation 0.58–0.77 at 5 kb resolution), albeit with differences in mean genome-wide recombination rates. The multiple generations of mating undergone in the AILs gave more precise inference of recombination rates than could be achieved from a single-generation cross, in particular in identifying recombination cold-spots. The recombination landscapes we describe have particular utility; both AILs are part of a resource to study complex yeast traits (see e.g. Parts et al., Genome Res 2011). Our results will enable future applications of this resource to take better account of local linkage structure heterogeneities. Our method has general applicability to other crossing experiments, including a variety of experimental designs. PMID:23658715
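
    Because each isolate's genome is a mosaic of parental blocks, the raw signal is the set of places where parental origin switches along a chromosome. The sketch below tallies those switches per inter-marker interval. It assumes that markers have already been assigned a parental label, and the function names are hypothetical. This naive count ignores genotyping error and does not attempt the multi-generation rate inference that the described method performs, so it is only a starting point.

```python
def crossover_breakpoints(positions, origins):
    """Locate switches in parental origin along one chromosome of one isolate.

    positions: sorted marker positions (bp); origins: parental label at each marker.
    Each switch is reported as the interval (left, right) between flanking markers,
    because the exact crossover point between them is not observed.
    """
    if len(positions) != len(origins):
        raise ValueError("positions and origins must have equal length")
    return [
        (positions[i - 1], positions[i])
        for i in range(1, len(origins))
        if origins[i] != origins[i - 1]
    ]

def switch_counts_per_interval(isolates):
    """Count, across isolates, how often each inter-marker interval contains a switch.

    isolates: list of (positions, origins) pairs sharing the same marker positions.
    """
    counts = {}
    for positions, origins in isolates:
        for interval in crossover_breakpoints(positions, origins):
            counts[interval] = counts.get(interval, 0) + 1
    return counts

markers = [100, 200, 300, 400]
print(crossover_breakpoints(markers, list("AABB")))  # [(200, 300)]
print(switch_counts_per_interval([(markers, list("AABB")), (markers, list("AABA"))]))
# {(200, 300): 2, (300, 400): 1}
```

    Intervals that rarely contain a switch across many isolates are candidate cold-spots. In an AIL, the accumulated generations of mating add more switches per isolate, which is consistent with the abstract's point that AILs resolve such regions better than a single-generation cross.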

  7. Inference of Population Splits and Mixtures from Genome-Wide Allele Frequency Data

    PubMed Central

    Pickrell, Joseph K.; Pritchard, Jonathan K.

    2012-01-01

    Many aspects of the historical relationships between populations in a species are reflected in genetic data. Inferring these relationships from genetic data, however, remains a challenging task. In this paper, we present a statistical model for inferring the patterns of population splits and mixtures in multiple populations. In our model, the sampled populations in a species are related to their common ancestor through a graph of ancestral populations. Using genome-wide allele frequency data and a Gaussian approximation to genetic drift, we infer the structure of this graph. We applied this method to a set of 55 human populations and a set of 82 dog breeds and wild canids. In both species, we show that a simple bifurcating tree does not fully describe the data; in contrast, we infer many migration events. While some of the migration events that we find have been detected previously, many have not. For example, in the human data, we infer that Cambodians trace approximately 16% of their ancestry to a population ancestral to other extant East Asian populations. In the dog data, we infer that both the boxer and basenji trace a considerable fraction of their ancestry (9% and 25%, respectively) to wolves subsequent to domestication and that East Asian toy breeds (the Shih Tzu and the Pekingese) result from admixture between modern toy breeds and “ancient” Asian breeds. Software implementing the model described here, called TreeMix, is available at http://treemix.googlecode.com. PMID:23166502
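
    The Gaussian approximation to drift works on how allele frequencies covary among populations. The sketch below computes a simplified version of that summary. It assumes a complete SNP-by-population frequency matrix and uses a hypothetical function name. It omits any finite-sample correction, standard errors, and all of the graph fitting, so it is not TreeMix itself.

```python
import numpy as np

def allele_frequency_covariance(freqs):
    """Covariance of population allele frequencies across SNPs.

    freqs: (M, P) array giving the frequency of one allele at M SNPs in P populations.
    Each SNP is centred on its mean across populations, then outer products are
    averaged over SNPs, giving a P x P matrix.
    """
    freqs = np.asarray(freqs, dtype=float)
    centred = freqs - freqs.mean(axis=1, keepdims=True)
    return centred.T @ centred / freqs.shape[0]

W = allele_frequency_covariance([[0.2, 0.4],
                                 [0.5, 0.5]])
print(W)  # approximately [[0.005, -0.005], [-0.005, 0.005]]
```

    Populations that share drift on a common branch show positive off-diagonal covariance. A graph model then tries to reproduce this matrix, adding migration edges where a bifurcating tree leaves large residuals.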

  8. Solar structure: Models and inferences from helioseismology

    SciTech Connect

    Guzik, J.A.

    1998-12-31

    In this review the author summarizes results published during approximately the last three years concerning the state of one-dimensional solar interior modeling. She discusses the effects of refinements to the input physics, motivated by improving the agreement between calculated and observed solar oscillation frequencies, or between calculated and inferred solar structure. She omits two- and three-dimensional aspects of the solar structure, such as the rotation profile, detailed modeling of turbulent convection, and magnetic fields, although further progress in refining solar interior models may require including such two- and three-dimensional dynamical effects.

  9. Inferring the Dynamics of Effective Population Size Using Autosomal Genomes.

    PubMed

    Hou, Zheng; Luo, Yin; Wang, Zhisheng; Zheng, Hong-Xiang; Wang, Yi; Zhou, Hang; Wu, Leqin; Jin, Li

    2016-01-01

    Next-generation sequencing technology has provided a great opportunity for inferring human demographic history by investigating changes in the effective population size (Ne). In this report, we introduce a strategy for estimating Ne dynamics, allowing the exploration of large multi-locus SNP datasets. We applied this strategy to the Phase 1 Han Chinese samples from the 1000 Genomes Project. The Han Chinese population has undergone a continuous expansion since 25,000 years ago, at first slowly from about 7,300 to 9,800 (at the end of the last glacial maximum about 15,000 YBP), then more quickly to about 46,000 (at the beginning of the Neolithic about 8,000 YBP), and then even more quickly to reach a population size of about 140,000 (recently). PMID:26832887

  10. Inferring the Dynamics of Effective Population Size Using Autosomal Genomes

    PubMed Central

    Hou, Zheng; Luo, Yin; Wang, Zhisheng; Zheng, Hong-Xiang; Wang, Yi; Zhou, Hang; Wu, Leqin; Jin, Li

    2016-01-01

    Next-generation sequencing technology has provided a great opportunity for inferring human demographic history by investigating changes in the effective population size (Ne). In this report, we introduce a strategy for estimating Ne dynamics, allowing the exploration of large multi-locus SNP datasets. We applied this strategy to the Phase 1 Han Chinese samples from the 1000 Genomes Project. The Han Chinese population has undergone a continuous expansion since 25,000 years ago, at first slowly from about 7,300 to 9,800 (at the end of the last glacial maximum about 15,000 YBP), then more quickly to about 46,000 (at the beginning of the Neolithic about 8,000 YBP), and then even more quickly to reach a population size of about 140,000 (recently). PMID:26832887

  11. Nonparametric inference of network structure and dynamics

    NASA Astrophysics Data System (ADS)

    Peixoto, Tiago P.

    The network structure of complex systems determines their function and serves as evidence for the evolutionary mechanisms that lie behind them. Despite considerable effort in recent years, it remains an open challenge to formulate general descriptions of the large-scale structure of network systems, and to reliably extract such information from data. Although many approaches have been proposed, few methods attempt to gauge the statistical significance of the uncovered structures, and hence the majority cannot reliably separate actual structure from stochastic fluctuations. Due to the sheer size and high dimensionality of many networks, this represents a major limitation that prevents meaningful interpretation of the results obtained with such nonstatistical methods. In this talk, I will show how these issues can be tackled in a principled and efficient fashion by formulating appropriate generative models of network structure whose parameters can be inferred from data. By employing a Bayesian description of such models, the inference can be performed in a nonparametric fashion, which does not require any a priori knowledge or ad hoc assumptions about the data. I will show how this approach can be used to perform model comparison, and how hierarchical models yield the most appropriate trade-off between model complexity and quality of fit based on the statistical evidence present in the data. I will also show how this general approach can be elegantly extended to networks that have edge attributes, that are embedded in latent spaces, and that change in time. The latter is obtained via a fully dynamic generative network model, based on arbitrary-order Markov chains, that can also be inferred in a nonparametric fashion. Throughout the talk I will illustrate the application of the methods with many empirical networks such as the internet at the autonomous systems level, the global airport network, the network of actors and films, social networks, citations among

  12. How to Infer Relative Fitness from a Sample of Genomic Sequences

    PubMed Central

    Dayarian, Adel; Shraiman, Boris I.

    2014-01-01

    Mounting evidence suggests that natural populations can harbor extensive fitness diversity with numerous genomic loci under selection. It is also known that genealogical trees for populations under selection are quantifiably different from those expected under neutral evolution and described statistically by Kingman’s coalescent. While differences in the statistical structure of genealogies have long been used as a test for the presence of selection, the full extent of the information that they contain has not been exploited. Here we demonstrate that the shape of the reconstructed genealogical tree for a moderately large number of random genomic samples taken from a fitness diverse, but otherwise unstructured, asexual population can be used to predict the relative fitness of individuals within the sample. To achieve this we define a heuristic algorithm, which we test in silico, using simulations of a Wright–Fisher model for a realistic range of mutation rates and selection strength. Our inferred fitness ranking is based on a linear discriminator that identifies rapidly coalescing lineages in the reconstructed tree. Inferred fitness ranking correlates strongly with actual fitness, with a genome in the top 10% ranked being in the top 20% fittest with false discovery rate of 0.1–0.3, depending on the mutation/selection parameters. The ranking also enables us to predict the genotypes that future populations inherit from the present one. While the inference accuracy increases monotonically with sample size, samples of 200 nearly saturate the performance. We propose that our approach can be used for inferring relative fitness of genomes obtained in single-cell sequencing of tumors and in monitoring viral outbreaks. PMID:24770330

  13. How to infer relative fitness from a sample of genomic sequences.

    PubMed

    Dayarian, Adel; Shraiman, Boris I

    2014-07-01

    Mounting evidence suggests that natural populations can harbor extensive fitness diversity with numerous genomic loci under selection. It is also known that genealogical trees for populations under selection are quantifiably different from those expected under neutral evolution and described statistically by Kingman's coalescent. While differences in the statistical structure of genealogies have long been used as a test for the presence of selection, the full extent of the information that they contain has not been exploited. Here we demonstrate that the shape of the reconstructed genealogical tree for a moderately large number of random genomic samples taken from a fitness diverse, but otherwise unstructured, asexual population can be used to predict the relative fitness of individuals within the sample. To achieve this we define a heuristic algorithm, which we test in silico, using simulations of a Wright-Fisher model for a realistic range of mutation rates and selection strength. Our inferred fitness ranking is based on a linear discriminator that identifies rapidly coalescing lineages in the reconstructed tree. Inferred fitness ranking correlates strongly with actual fitness, with a genome in the top 10% ranked being in the top 20% fittest with false discovery rate of 0.1-0.3, depending on the mutation/selection parameters. The ranking also enables us to predict the genotypes that future populations inherit from the present one. While the inference accuracy increases monotonically with sample size, samples of 200 nearly saturate the performance. We propose that our approach can be used for inferring relative fitness of genomes obtained in single-cell sequencing of tumors and in monitoring viral outbreaks. PMID:24770330

  14. Genomic and evolutionary inferences between American and global strains of porcine epidemic diarrhea virus.

    PubMed

    Jarvis, Matthew C; Lam, Ham Ching; Zhang, Yan; Wang, Leyi; Hesse, Richard A; Hause, Ben M; Vlasova, Anastasia; Wang, Qiuhong; Zhang, Jianqiang; Nelson, Martha I; Murtaugh, Michael P; Marthaler, Douglas

    2016-01-01

    Porcine epidemic diarrhea virus (PEDV) has caused severe economic losses both recently in the United States (US) and historically throughout Europe and Asia. Traditionally, analysis of the spike gene has been used to determine phylogenetic relationships between PEDV strains. We determined the complete genomes of 93 PEDV field samples from US swine and analyzed the data in conjunction with complete genome sequences available from GenBank (n=126) to determine the most variable genomic areas. Our results indicate high levels of variation within the ORF1 and spike regions, while the C-terminal domains of structural genes were highly conserved. Analysis of the Receptor Binding Domains in the spike gene revealed a limited number of amino acid substitutions in US strains compared to Asian strains. Phylogenetic analysis of the complete genome sequence data revealed high rates of recombination, resulting in differing evolutionary patterns in phylogenies inferred for the spike region versus whole genomes. These findings suggest that significant genetic events outside of the spike region have contributed to the evolution of PEDV. PMID:26611651

  15. Phylogeny Inference of Closely Related Bacterial Genomes: Combining the Features of Both Overlapping Genes and Collinear Genomic Regions.

    PubMed

    Zhang, Yan-Cong; Lin, Kui

    2015-01-01

    Overlapping genes (OGs) represent one type of widespread genomic feature in bacterial genomes and have been used as rare genomic markers in phylogeny inference of closely related bacterial species. However, inference performance may decrease when the genomes analyzed are too closely or too distantly related. Another drawback of OGs as phylogenetic markers is that they usually take little account of the effects of genomic rearrangement on the similarity estimation, such as intra-chromosome/genome translocations, horizontal gene transfer, and gene losses. To explore such effects on the accuracy of phylogeny reconstruction, we combine phylogenetic signals of OGs with collinear genomic regions, here called locally collinear blocks (LCBs). By putting these together, we refine our previous metric of pairwise similarity between two closely related bacterial genomes. As a case study, we used this new method to reconstruct the phylogenies of 88 Enterobacteriales genomes of the class Gammaproteobacteria. Our results demonstrated that the topological accuracy of the inferred phylogeny was improved when both OGs and LCBs were simultaneously considered, suggesting that combining these two phylogenetic markers may reduce, to some extent, the influence of gene loss on phylogeny inference. Such phylogenomic studies, we believe, will help us to explore a more effective approach to increasing the robustness of phylogeny reconstruction of closely related bacterial organisms. PMID:26715828

  16. Phylogeny Inference of Closely Related Bacterial Genomes: Combining the Features of Both Overlapping Genes and Collinear Genomic Regions

    PubMed Central

    Zhang, Yan-Cong; Lin, Kui

    2015-01-01

    Overlapping genes (OGs) represent one type of widespread genomic feature in bacterial genomes and have been used as rare genomic markers in phylogeny inference of closely related bacterial species. However, inference performance may decrease when the genomes analyzed are too closely or too distantly related. Another drawback of OGs as phylogenetic markers is that they usually take little account of the effects of genomic rearrangement on the similarity estimation, such as intra-chromosome/genome translocations, horizontal gene transfer, and gene losses. To explore such effects on the accuracy of phylogeny reconstruction, we combine phylogenetic signals of OGs with collinear genomic regions, here called locally collinear blocks (LCBs). By putting these together, we refine our previous metric of pairwise similarity between two closely related bacterial genomes. As a case study, we used this new method to reconstruct the phylogenies of 88 Enterobacteriales genomes of the class Gammaproteobacteria. Our results demonstrated that the topological accuracy of the inferred phylogeny was improved when both OGs and LCBs were simultaneously considered, suggesting that combining these two phylogenetic markers may reduce, to some extent, the influence of gene loss on phylogeny inference. Such phylogenomic studies, we believe, will help us to explore a more effective approach to increasing the robustness of phylogeny reconstruction of closely related bacterial organisms. PMID:26715828

  17. Boolean Models of Genomic Regulatory Networks: Reduction Mappings, Inference, and External Control

    PubMed Central

    Ivanov, Ivan

    2009-01-01

    Computational modeling of genomic regulation has become an important focus of systems biology and genomic signal processing over the past several years. It holds the promise of uncovering both the structure and the dynamical properties of the complex gene, protein, or metabolic networks responsible for cell functioning in various contexts and regimes. This, in turn, will lead to the development of optimal intervention strategies for prevention and control of disease. At the same time, constructing such computational models faces several challenges. High complexity is one of the major impediments to the practical application of the models. Thus, reducing the size/complexity of a model becomes a critical issue in problems such as model selection, construction of tractable subnetwork models, and control of its dynamical behavior. We focus on the reduction problem in the context of two specific models of genomic regulation: Boolean networks with perturbation (BNP) and probabilistic Boolean networks (PBN). We also compare and draw a parallel between the reduction problem and two other important problems of computational modeling of genomic networks: the problem of network inference and the problem of designing external control policies for intervention/altering the dynamics of the model. PMID:20190953
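
    A Boolean network with perturbation can be simulated in a few lines, which makes concrete what a reduction mapping must preserve. The sketch below is minimal. It assumes the common BNP convention that each gene flips independently with probability p at every step and that, when no gene flips, all genes update synchronously through their Boolean functions. The three-gene network in the usage example and all function names are hypothetical. The sketch shows only forward simulation, not the reduction, inference, or control methods discussed above.

```python
import random

def bnp_step(state, update_fns, p, rng):
    """One step of a Boolean network with perturbation (BNP).

    state: tuple of 0/1 gene values; update_fns: one function per gene mapping the
    full state to that gene's next value; p: per-gene perturbation probability.
    Assumed convention: each gene flips independently with probability p; if no
    gene flips, every gene is updated synchronously by its Boolean function.
    """
    flips = [rng.random() < p for _ in state]
    if any(flips):
        return tuple(v ^ int(f) for v, f in zip(state, flips))
    return tuple(fn(state) for fn in update_fns)

def simulate_bnp(initial, update_fns, p, steps, seed=0):
    """Return the trajectory of states, starting from `initial`."""
    rng = random.Random(seed)
    trajectory = [tuple(initial)]
    for _ in range(steps):
        trajectory.append(bnp_step(trajectory[-1], update_fns, p, rng))
    return trajectory

# Hypothetical 3-gene network: x0' = x1 AND x2, x1' = x0 OR x2, x2' = NOT x0.
fns = [lambda s: s[1] & s[2], lambda s: s[0] | s[2], lambda s: 1 - s[0]]
print(simulate_bnp((0, 0, 0), fns, p=0.0, steps=6))
# [(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1), (1, 1, 0), (0, 1, 0), (0, 0, 1)]
```

    With p = 0, the run settles into a five-state attractor cycle. A small p > 0 lets the chain escape attractors, which is why a BNP behaves as an ergodic Markov chain over the 2^n states. That exponential state space is the complexity that reduction mappings aim to shrink.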

  18. A Molecular Phylogeny of Hemiptera Inferred from Mitochondrial Genome Sequences

    PubMed Central

    Song, Nan; Liang, Ai-Ping; Bu, Cui-Ping

    2012-01-01

    Classically, Hemiptera comprises two suborders: Homoptera and Heteroptera. Homoptera includes Cicadomorpha, Fulgoromorpha and Sternorrhyncha. However, according to previous molecular phylogenetic studies based on 18S rDNA, Fulgoromorpha has a closer relationship to Heteroptera than to other hemipterans, leaving Homoptera as paraphyletic. Therefore, the position of Fulgoromorpha is important for studying the phylogenetic structure of Hemiptera. We inferred the evolutionary affiliations of twenty-five superfamilies of Hemiptera using mitochondrial protein-coding genes and rRNAs. We sequenced three mitogenomes, from Pyrops candelaria, Lycorma delicatula and Ricania marginalis, representing two additional families in Fulgoromorpha. Pyrops and Lycorma are representatives of an additional major family, Fulgoridae, in Fulgoromorpha, whereas Ricania is a second representative of the highly derived clade Ricaniidae. The organization and size of these mitogenomes are similar to those of the sequenced fulgoroid species. Our consensus phylogeny of Hemiptera largely supported the relationships (((Fulgoromorpha,Sternorrhyncha),Cicadomorpha),Heteroptera), and thus supported the classic phylogeny of Hemiptera. Selection of optimal evolutionary models (exclusion and inclusion of two rRNA genes or of third codon positions of protein-coding genes) demonstrated that rapidly evolving and saturated sites should be removed from the analyses. PMID:23144967

  19. Global efforts in structural genomics.

    PubMed

    Stevens, R C; Yokoyama, S; Wilson, I A

    2001-10-01

    A worldwide initiative in structural genomics aims to capitalize on the recent successes of the genome projects. Substantial new investments in structural genomics in the past 2 years indicate the high level of support for these international efforts. Already, enormous progress has been made on high-throughput methodologies and technologies that will speed up macromolecular structure determinations. Recent international meetings have resulted in the formation of an International Structural Genomics Organization to formulate policy and foster cooperation between the public and private efforts. PMID:11588249

  20. Genetic Variation Between Two Cucumber Genotypes Inferred from Genome-wide Microsatellite Polymorphism Analysis

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Variability at microsatellite loci has been used widely to infer the extent of genetic diversity among related plant taxa. However, typically, only the most polymorphic loci in the genome were analyzed that may result in a biased, and generally overestimated picture of genome-wide microsatellite div...

  1. Informational laws of genome structures

    PubMed Central

    Bonnici, Vincenzo; Manca, Vincenzo

    2016-01-01

    In recent years, the analysis of genomes by means of strings of length k occurring in the genomes, called k-mers, has provided important insights into the basic mechanisms and design principles of genome structures. In the present study, we focus on the proper choice of the value of k for applying information-theoretic concepts that express intrinsic aspects of genomes. The value k = lg2(n), where n is the genome length, is determined to be the best choice in the definition of some genomic informational indexes that are studied and computed for seventy genomes. These indexes, which are based on information entropies and on suitable comparisons with random genomes, suggest five informational laws, which all of the considered genomes obey. Moreover, an informational genome complexity measure is proposed, which is a generalized logistic map that balances entropic and anti-entropic components of genomes and is related to their evolutionary dynamics. Finally, applications to computational synthetic biology are briefly outlined. PMID:27354155
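To make the choice k = lg2(n) concrete, here is a minimal sketch that picks k from the genome length and computes the empirical Shannon entropy (in bits) of the k-mer distribution. Rounding log2(n) to the nearest integer is an assumption, and this entropy is only a building block; it does not reproduce the authors' informational indexes or their comparisons with random genomes.

```python
import math
from collections import Counter


def choose_k(n: int) -> int:
    """k = log2(n) for genome length n, rounded to an integer (rounding is an assumption)."""
    return max(1, round(math.log2(n)))


def kmer_entropy(genome: str, k: int) -> float:
    """Empirical Shannon entropy (bits) of overlapping k-mers in the genome."""
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

For a genome of length 1,000 this gives k = 10; a sequence with four equally frequent 1-mers, such as `"ACGT"`, has entropy 2 bits at k = 1.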

  2. Informational laws of genome structures.

    PubMed

    Bonnici, Vincenzo; Manca, Vincenzo

    2016-01-01

    In recent years, the analysis of genomes by means of strings of length k occurring in the genomes, called k-mers, has provided important insights into the basic mechanisms and design principles of genome structures. In the present study, we focus on the proper choice of the value of k for applying information-theoretic concepts that express intrinsic aspects of genomes. The value k = lg2(n), where n is the genome length, is determined to be the best choice in the definition of some genomic informational indexes that are studied and computed for seventy genomes. These indexes, which are based on information entropies and on suitable comparisons with random genomes, suggest five informational laws, which all of the considered genomes obey. Moreover, an informational genome complexity measure is proposed, which is a generalized logistic map that balances entropic and anti-entropic components of genomes and is related to their evolutionary dynamics. Finally, applications to computational synthetic biology are briefly outlined. PMID:27354155

  3. Informational laws of genome structures

    NASA Astrophysics Data System (ADS)

    Bonnici, Vincenzo; Manca, Vincenzo

    2016-06-01

    In recent years, the analysis of genomes by means of strings of length k occurring in the genomes, called k-mers, has provided important insights into the basic mechanisms and design principles of genome structures. In the present study, we focus on the proper choice of the value of k for applying information-theoretic concepts that express intrinsic aspects of genomes. The value k = lg2(n), where n is the genome length, is determined to be the best choice in the definition of some genomic informational indexes that are studied and computed for seventy genomes. These indexes, which are based on information entropies and on suitable comparisons with random genomes, suggest five informational laws, which all of the considered genomes obey. Moreover, an informational genome complexity measure is proposed, which is a generalized logistic map that balances entropic and anti-entropic components of genomes and is related to their evolutionary dynamics. Finally, applications to computational synthetic biology are briefly outlined.

  4. Inferring Demography from Runs of Homozygosity in Whole-Genome Sequence, with Correction for Sequence Errors

    PubMed Central

    MacLeod, Iona M.; Larkin, Denis M.; Lewin, Harris A.; Hayes, Ben J.; Goddard, Mike E.

    2013-01-01

    Whole-genome sequence is potentially the richest source of genetic data for inferring ancestral demography. However, full sequence also presents significant challenges in fully utilizing such large data sets and in ensuring that sequencing errors do not introduce bias into the inferred demography. Using whole-genome sequence data from two Holstein cattle, we demonstrate a new method to correct for bias caused by hidden errors and then infer stepwise changes in ancestral demography up to the present. There was a strong upward bias in estimates of recent effective population size (Ne) if the correction method was not applied to the data, both for our method and for the pairwise sequentially Markovian coalescent method of Li and Durbin (Inference of human population history from individual whole-genome sequences. Nature 475:493–496). To infer demography, we use an analytical predictor of multiloci linkage disequilibrium (LD) based on a simple coalescent model that allows for changes in Ne. The LD statistic summarizes the distribution of runs of homozygosity for any given demography. We infer a best fit demography as one that predicts a match with the observed distribution of runs of homozygosity in the corrected sequence data. We use multiloci LD because it potentially holds more information about ancestral demography than pairwise LD. The inferred demography indicates a strong reduction in the Ne around 170,000 years ago, possibly related to the divergence of African and European Bos taurus cattle. This is followed by a further reduction coinciding with the period of cattle domestication, with Ne of between 3,500 and 6,000. The most recent reduction of Ne to approximately 100 in the Holstein breed agrees well with estimates from pedigrees. Our approach can be applied to whole-genome sequence from any diploid species and can be scaled up to use sequence from multiple individuals. PMID:23842528
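The inference is fitted to the observed distribution of runs of homozygosity. A minimal sketch of how such a distribution could be tabulated from heterozygous calls on one chromosome, measuring run length in sites. The function names and the binning are hypothetical, and neither the authors' error correction nor their LD predictor is shown:

```python
from collections import Counter


def roh_lengths(het_positions, chrom_length, min_length=1):
    """Lengths (in sites) of homozygous stretches between heterozygous calls.

    het_positions: 1-based positions of heterozygous sites on one chromosome.
    Sentinels at 0 and chrom_length + 1 close the first and last runs.
    """
    bounds = [0] + sorted(het_positions) + [chrom_length + 1]
    runs = [b - a - 1 for a, b in zip(bounds, bounds[1:])]
    return [r for r in runs if r >= min_length]


def roh_histogram(lengths, bin_size):
    """Count runs per length bin [k * bin_size, (k + 1) * bin_size)."""
    return dict(sorted(Counter(r // bin_size for r in lengths).items()))
```

With heterozygous sites at positions 5 and 10 on a 12-site chromosome, `roh_lengths([5, 10], 12)` returns `[4, 4, 2]`. A single false heterozygous call inside a long run splits it in two, which plausibly explains why uncorrected sequencing errors shift the distribution toward shorter runs.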

  5. Using Genetic Distance to Infer the Accuracy of Genomic Prediction.

    PubMed

    Scutari, Marco; Mackay, Ian; Balding, David

    2016-09-01

    The prediction of phenotypic traits using high-density genomic data has many applications, such as the selection of plants and animals of commercial interest, and it is expected to play an increasing role in medical diagnostics. Statistical models used for this task are usually tested using cross-validation, which implicitly assumes that new individuals (whose phenotypes we would like to predict) originate from the same population the genomic prediction model is trained on. In this paper we propose an approach based on clustering and resampling to investigate the effect of increasing genetic distance between training and target populations when predicting quantitative traits. This is important for plant and animal genetics, where genomic selection programs rely on the precision of predictions in future rounds of breeding. Therefore, estimating how quickly predictive accuracy decays is important in deciding which training population to use and how often the model has to be recalibrated. We find that the correlation between true and predicted values decays approximately linearly with respect to either FST or mean kinship between the training and the target populations. We illustrate this relationship using simulations and a collection of data sets from mice, wheat and human genetics. PMID:27589268
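FST is one of the two distance measures against which accuracy decay is reported. The abstract does not name an estimator, so this sketch assumes the classic heterozygosity-based form FST = (H_T - H_S) / H_T for two populations at biallelic loci, combined across loci as a ratio of sums:

```python
def wright_fst(p1, p2):
    """Multi-locus FST for two populations from biallelic allele frequencies.

    p1, p2: frequency of the same allele at each locus in each population.
    Uses (sum H_T - sum H_S) / sum H_T (an assumed, textbook estimator).
    """
    if len(p1) != len(p2):
        raise ValueError("both populations need the same loci")
    h_t = h_s = 0.0
    for a, b in zip(p1, p2):
        p_bar = (a + b) / 2
        h_t += 2 * p_bar * (1 - p_bar)                  # pooled expected heterozygosity
        h_s += (2 * a * (1 - a) + 2 * b * (1 - b)) / 2  # mean within-population heterozygosity
    return (h_t - h_s) / h_t if h_t > 0 else 0.0
```

Identical frequencies give 0, fixation of alternative alleles gives 1, and frequencies of 0.2 and 0.8 at a single locus give 0.36.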

  6. Genome Size Variation and Species Relationships in Hieracium Sub-genus Pilosella (Asteraceae) as Inferred by Flow Cytometry

    PubMed Central

    Suda, Jan; Krahulcová, Anna; Trávníček, Pavel; Rosenbaumová, Radka; Peckert, Tomáš; Krahulec, František

    2007-01-01

    Background and Aims Hieracium sub-genus Pilosella (hawkweeds) is a taxonomically complicated group of vascular plants, the structure of which is substantially influenced by frequent interspecific hybridization and polyploidization. Two kinds of species, ‘basic’ and ‘intermediate’ (i.e. hybridogenous), are usually recognized. In this study, genome size variation was investigated in a representative set of Central European hawkweeds in order to assess the value of such a data set for species delineation and inference of evolutionary relationships. Methods Holoploid and monoploid genome sizes (C- and Cx-values) were determined using propidium iodide flow cytometry for 376 homogeneously cultivated individuals of Hieracium sub-genus Pilosella, including 24 species (271 individuals), five recent natural hybrids (seven individuals) and experimental F1 hybrids from four parental combinations (98 individuals). Chromosome counts were available for more than half of the plant accessions. Base composition (proportion of AT/GC bases) was cytometrically estimated in 73 individuals. Key Results Seven different ploidy levels (2x–8x) were detected, with intraspecific ploidy polymorphism (up to four different cytotypes) occurring in 11 wild species. Mean 2C-values varied approx. 4·3-fold from 3·53 pg in diploid H. hoppeanum to 15·30 pg in octoploid H. brachiatum. 1Cx-values ranged from 1·72 pg in H. pilosella to 2·16 pg in H. echioides (1·26-fold). The DNA content of (high) polyploids was usually proportional to the DNA values of their diploid/low polyploid counterparts, indicating lack of processes altering genome size (i.e. genome down-sizing). Most species showed constant nuclear DNA amounts, exceptions being three hybridogenous taxa, in which introgressive hybridization was suggested as a presumable trigger for genome size variation. Monoploid genome sizes of hybridogenous species were always between the corresponding values of their putative parents. In addition
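The relation between holoploid and monoploid genome size used above can be checked with simple arithmetic: the monoploid value 1Cx is the 2C value divided by the ploidy level (the number of basic chromosome sets x). A small sketch using the figures quoted in the abstract (function names are illustrative):

```python
def monoploid_size(two_c_pg: float, ploidy: int) -> float:
    """1Cx value (pg): holoploid 2C DNA amount divided by the ploidy level (number of x sets)."""
    return two_c_pg / ploidy


def fold_range(values) -> float:
    """Ratio of the largest to the smallest value."""
    return max(values) / min(values)
```

For octoploid H. brachiatum, `monoploid_size(15.30, 8)` is about 1.91 pg, which falls within the reported 1Cx range of 1.72 to 2.16 pg. `fold_range([3.53, 15.30])` is about 4.3 and `fold_range([1.72, 2.16])` is about 1.26, matching the fold differences stated in the abstract.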

  7. Inferring gene transcriptional modulatory relations: a genetical genomics approach

    SciTech Connect

    Li, Hongqiang; Lu, Lu; Manly, Kenneth; Chesler, Elissa J; Bao, Lei; Wang, Jintao; Zhou, Mi; Williams, Robert; Cui, Yan

    2005-01-01

    Bayesian network modeling is a promising approach to define and evaluate gene expression circuits in diverse tissues and cell types under different experimental conditions. The power and practicality of this approach can be improved by restricting the number of potential interactions among genes and by defining causal relations before evaluating posterior probabilities for billions of networks. A newly developed genetical genomics method that combines transcriptome profiling with complex trait analysis now provides strong constraints on network architecture. This method detects those chromosomal intervals responsible for differences in mRNA expression using quantitative trait locus (QTL) mapping. We have developed an efficient Bayesian approach that exploits the genetical genomics method to focus computational effort on the most plausible gene modulatory networks. We exploit a dense marker map for a genetic reference population (GRP) that consists of 32 BXD strains of mice made by intercrossing two progenitor strains, C57BL/6J and DBA/2J. These progenitors differ at 1.3 million known single nucleotide polymorphisms (SNPs), all of which can be exploited to estimate the probability that a gene contains functional polymorphisms that segregate within the GRP. We constructed 66 candidate networks that include all the candidate modulator genes located in the 209 statistically significant trans-acting QTL regions. SNPs that distinguish between the two progenitor strains were used to further winnow the list of candidate modulators. Bayesian network modeling was then used to identify the genetic modulatory relations that best explain the microarray data.

  8. Inferring Epidemic Contact Structure from Phylogenetic Trees

    PubMed Central

    Leventhal, Gabriel E.; Kouyos, Roger; Stadler, Tanja; von Wyl, Viktor; Yerly, Sabine; Böni, Jürg; Cellerai, Cristina; Klimkait, Thomas; Günthard, Huldrych F.; Bonhoeffer, Sebastian

    2012-01-01

    Contact structure is believed to have a large impact on epidemic spreading and consequently using networks to model such contact structure continues to gain interest in epidemiology. However, detailed knowledge of the exact contact structure underlying real epidemics is limited. Here we address the question whether the structure of the contact network leaves a detectable genetic fingerprint in the pathogen population. To this end we compare phylogenies generated by disease outbreaks in simulated populations with different types of contact networks. We find that the shape of these phylogenies strongly depends on contact structure. In particular, measures of tree imbalance allow us to quantify to what extent the contact structure underlying an epidemic deviates from a null model contact network and illustrate this in the case of random mixing. Using a phylogeny from the Swiss HIV epidemic, we show that this epidemic has a significantly more unbalanced tree than would be expected from random mixing. PMID:22412361
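The abstract does not say which tree-imbalance measures were used. As an illustration only, this sketch computes the Colless index, a widely used imbalance statistic: the sum, over all internal nodes of a rooted binary tree, of the absolute difference in leaf counts between the two child subtrees. Trees are assumed to be nested 2-tuples with leaf labels as strings.

```python
def _colless(tree):
    """Return (Colless index, number of leaves) for a binary tree of nested 2-tuples."""
    if not isinstance(tree, tuple):
        return 0, 1  # a leaf contributes no imbalance
    left, right = tree
    c_left, n_left = _colless(left)
    c_right, n_right = _colless(right)
    return c_left + c_right + abs(n_left - n_right), n_left + n_right


def colless_index(tree) -> int:
    """Colless imbalance: sum over internal nodes of |leaves(left) - leaves(right)|."""
    return _colless(tree)[0]
```

A perfectly balanced four-leaf tree `(("a", "b"), ("c", "d"))` scores 0, while the caterpillar `((("a", "b"), "c"), "d")` scores 3, the maximum (n - 1)(n - 2) / 2 for n = 4 leaves. Comparing an observed value against values from trees simulated under random mixing is one way to frame the test described above.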

  9. EMu: probabilistic inference of mutational processes and their localization in the cancer genome

    PubMed Central

    2013-01-01

    The spectrum of mutations discovered in cancer genomes can be explained by the activity of a few elementary mutational processes. We present a novel probabilistic method, EMu, to infer the mutational signatures of these processes from a collection of sequenced tumors. EMu naturally incorporates the tumor-specific opportunity for different mutation types according to sequence composition. Applying EMu to breast cancer data, we derive detailed maps of the activity of each process, both genome-wide and within specific local regions of the genome. Our work provides new opportunities to study the mutational processes underlying cancer development. EMu is available at http://www.sanger.ac.uk/resources/software/emu/. PMID:23628380
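A minimal sketch of the kind of model described here, assuming counts X[s, m] of mutation type m in tumor s are Poisson with mean O[s, m] * sum_k E[s, k] * P[k, m], where O is the tumor-specific opportunity, E holds exposures and P holds signatures. The updates are the standard EM (multiplicative) updates for this Poisson factorization. This is a generic illustration, not EMu's published implementation, and the function names are hypothetical.

```python
import numpy as np


def fit_signatures(X, O, K, n_iter=500, seed=0):
    """EM updates for X[s, m] ~ Poisson(O[s, m] * (E @ P)[s, m]).

    X: (S, M) mutation counts; O: (S, M) positive opportunities; K: number of processes.
    Returns exposures E (S, K) and signatures P (K, M) whose rows sum to 1.
    """
    rng = np.random.default_rng(seed)
    S, M = X.shape
    E = rng.uniform(0.5, 1.5, size=(S, K))
    P = rng.dirichlet(np.ones(M), size=K)
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        R = X / (O * (E @ P) + eps)                  # observed / expected ratio
        P *= (E.T @ (O * R)) / (E.T @ O + eps)       # update signatures
        R = X / (O * (E @ P) + eps)
        E *= ((O * R) @ P.T) / (O @ P.T + eps)       # update exposures
        scale = P.sum(axis=1, keepdims=True)         # normalize signatures to sum to 1
        P /= scale
        E *= scale.T                                 # keeps E @ P unchanged
    return E, P


def poisson_loglik(X, O, E, P):
    """Poisson log-likelihood up to the constant -sum(log X!)."""
    lam = O * (E @ P)
    return float(np.sum(X * np.log(lam + 1e-12) - lam))
```

Including O in the expected counts is what lets the same signature produce different observed spectra in tumors with different sequence composition.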

  10. Covariance Between Genotypic Effects and its Use for Genomic Inference in Half-Sib Families

    PubMed Central

    Wittenburg, Dörte; Teuscher, Friedrich; Klosa, Jan; Reinsch, Norbert

    2016-01-01

    In livestock, current statistical approaches utilize extensive molecular data, e.g., single nucleotide polymorphisms (SNPs), to improve the genetic evaluation of individuals. The number of model parameters increases with the number of SNPs, so the multicollinearity between covariates can affect the results obtained using whole genome regression methods. In this study, dependencies between SNPs due to linkage and linkage disequilibrium among the chromosome segments were explicitly considered in methods used to estimate the effects of SNPs. The population structure affects the extent of such dependencies, so the covariance among SNP genotypes was derived for half-sib families, which are typical in livestock populations. Conditional on the SNP haplotypes of the common parent (sire), the theoretical covariance was determined using the haplotype frequencies of the population from which the individual parent (dam) was derived. The resulting covariance matrix was included in a statistical model for a trait of interest, and this covariance matrix was then used to specify prior assumptions for SNP effects in a Bayesian framework. The approach was applied to one family in simulated scenarios (few and many quantitative trait loci) and using semireal data obtained from dairy cattle to identify genome segments that affect performance traits, as well as to investigate the impact on predictive ability. Compared with a method that does not explicitly consider any of the relationships among predictor variables, the accuracy of genetic value prediction was improved by 10–22%. The results show that the inclusion of dependence is particularly important for genomic inference based on small sample sizes. PMID:27402363

  11. The History of Slavs Inferred from Complete Mitochondrial Genome Sequences

    PubMed Central

    Mielnik-Sikorska, Marta; Daca, Patrycja; Malyarchuk, Boris; Derenko, Miroslava; Skonieczna, Katarzyna; Perkova, Maria; Dobosz, Tadeusz; Grzybowski, Tomasz

    2013-01-01

    To shed more light on the processes leading to crystallization of a Slavic identity, we investigated variability of complete mitochondrial genomes belonging to haplogroups H5 and H6 (63 mtDNA genomes) from the populations of Eastern and Western Slavs, including new samples of Poles, Ukrainians and Czechs presented here. Molecular dating implies formation of H5 approximately 11.5–16 thousand years ago (kya) in the areas of southern Europe. Within ancient haplogroup H6, dated at around 15–28 kya, there is a subhaplogroup H6c, which probably survived the last glaciation in Europe and has undergone expansion only 3–4 kya, together with the ancestors of some European groups, including the Slavs, because H6c has been detected in Czechs, Poles and Slovaks. Detailed analysis of complete mtDNAs allowed us to identify a number of lineages that seem specific for Central and Eastern Europe (H5a1f, H5a2, H5a1r, H5a1s, H5b4, H5e1a, H5u1, some subbranches of H5a1a and H6a1a9). Some of them could possibly be traced back to at least ∼4 kya, which indicates that some of the ancestors of today's Slavs (Poles, Czechs, Slovaks, Ukrainians and Russians) inhabited areas of Central and Eastern Europe much earlier than previously estimated on the basis of archaeological and historical data. We also sequenced entire mitochondrial genomes of several non-European lineages (A, C, D, G, L) found in contemporary populations of Poland and Ukraine. The analysis of these haplogroups confirms the presence of Siberian (C5c1, A8a1) and Ashkenazi-specific (L2a1l2a) mtDNA lineages in Slavic populations. Moreover, we were able to pinpoint some lineages which could possibly reflect the relatively recent contacts of Slavs with nomadic Altaic peoples (C4a1a, G2a, D5a2a1a1). PMID:23342138

  12. Inferring synaptic structure in presence of neural interaction time scales.

    PubMed

    Capone, Cristiano; Filosa, Carla; Gigante, Guido; Ricci-Tersenghi, Federico; Del Giudice, Paolo

    2015-01-01

    Biological networks display a variety of activity patterns reflecting a web of interactions that is complex both in space and time. Yet inference methods have mainly focused on reconstructing the spatial structure from the network's activity, by assuming equilibrium conditions or, more recently, a probabilistic dynamics with a single arbitrary time-step. Here we show that, under this latter assumption, the inference procedure fails to reconstruct the synaptic matrix of a network of integrate-and-fire neurons when the chosen time scale of interaction does not closely match the synaptic delay or when no single time scale for the interaction can be identified; such failure, moreover, exposes a distinctive bias of the inference method, which can infer excitatory synapses as inhibitory when their interaction time scales are longer than the model's time-step. We therefore introduce a new two-step method that first infers the delay structure of the network through cross-correlation profiles and then reconstructs the synaptic matrix, and we successfully test it on networks with different topologies and in different activity regimes. Although step one accurately recovers the delay structure of the network, thus removing any a priori guess about the time scales of the interaction, the inference method nonetheless introduces an arbitrary time scale, the time-bin dt used to binarize the spike trains. We therefore analytically and numerically study how the choice of dt affects the inference in our network model, finding that the relationship between the inferred couplings and the real synaptic efficacies, albeit quadratic in both cases, depends critically on dt for the excitatory synapses only, while being essentially independent of it for the inhibitory ones. PMID:25807389
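Step one infers delays from cross-correlation profiles of spike trains binarized at time-bin dt. A minimal sketch of that idea for one neuron pair: bin both trains at dt, count coincidences of a presynaptic spike in bin i with a postsynaptic spike in bin i + lag, and take the lag with the most coincidences as the delay. Considering only non-negative lags and using raw coincidence counts without normalization are simplifying assumptions. The paper's exact estimator may differ.

```python
def binarize(spike_times, dt, t_max):
    """Binary train: 1 if at least one spike falls in bin [k * dt, (k + 1) * dt)."""
    n_bins = int(t_max // dt) + 1
    train = [0] * n_bins
    for t in spike_times:
        k = int(t // dt)
        if 0 <= k < n_bins:
            train[k] = 1
    return train


def cross_correlogram(pre, post, max_lag):
    """Coincidence counts of pre[i] with post[i + lag] for lag = 0..max_lag."""
    n = min(len(pre), len(post))
    return [sum(pre[i] * post[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def estimate_delay(pre_times, post_times, dt, t_max, max_lag):
    """Delay estimate (same time units as dt) at the correlogram peak."""
    pre = binarize(pre_times, dt, t_max)
    post = binarize(post_times, dt, t_max)
    cc = cross_correlogram(pre, post, max_lag)
    best_lag = max(range(len(cc)), key=cc.__getitem__)
    return best_lag * dt
```

With spikes at 0, 10, 20 and 30 ms and responses 3 ms later, `estimate_delay([0, 10, 20, 30], [3, 13, 23, 33], 1.0, 40, 10)` returns 3.0. Changing dt shifts which bins coincide, so the recovered profile depends on the binning choice, as the abstract points out.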

  13. Inference of Homologous Recombination in Bacteria Using Whole-Genome Sequences

    PubMed Central

    Didelot, Xavier; Lawson, Daniel; Darling, Aaron; Falush, Daniel

    2010-01-01

    Bacteria and archaea reproduce clonally, but sporadically import DNA into their chromosomes from other organisms. In many of these events, the imported DNA replaces a homologous segment in the recipient genome. Here we present a new method to reconstruct the history of recombination events that affected a given sample of bacterial genomes. We introduce a mathematical model that represents both the donor and the recipient of each DNA import as an ancestor of the genomes in the sample. The model represents a simplification of the previously described coalescent with gene conversion. We implement a Markov chain Monte Carlo algorithm to perform inference under this model from sequence data alignments and show that inference is feasible for whole-genome alignments through parallelization. Using simulated data, we demonstrate accurate and reliable identification of individual recombination events and global recombination rate parameters. We applied our approach to an alignment of 13 whole genomes from the Bacillus cereus group. We find, as expected from laboratory experiments, that the recombination rate is higher between closely related organisms and also that the genome contains several broad regions of elevated levels of recombination. Application of the method to the genomic data sets that are becoming available should reveal the evolutionary history and private lives of populations of bacteria and archaea. The methods described in this article have been implemented in a computer software package, ClonalOrigin, which is freely available from http://code.google.com/p/clonalorigin/. PMID:20923983

  14. Sigma: Strain-level inference of genomes from metagenomic analysis for biosurveillance

    SciTech Connect

    Ahn, Tae-Hyuk; Chai, Juanjuan; Pan, Chongle

    2014-09-29

    Motivation: Metagenomic sequencing of clinical samples provides a promising technique for direct pathogen detection and characterization in biosurveillance. Taxonomic analysis at the strain level can be used to resolve serotypes of a pathogen in biosurveillance. Sigma was developed for strain-level identification and quantification of pathogens using their reference genomes based on metagenomic analysis. Results: Sigma provides not only accurate strain-level inferences, but also three unique capabilities: (i) Sigma quantifies the statistical uncertainty of its inferences, which includes hypothesis testing of identified genomes and confidence interval estimation of their relative abundances; (ii) Sigma enables strain variant calling by assigning metagenomic reads to their most likely reference genomes; and (iii) Sigma supports parallel computing for fast analysis of large datasets. The algorithm's performance was evaluated using simulated mock communities and fecal samples with spike-in pathogen strains. Availability and Implementation: Sigma was implemented in C++ with source code and binaries freely available at http://sigma.omicsbio.org.
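Estimating relative abundances while assigning reads to their most likely reference genomes is commonly framed as a mixture model fitted by expectation-maximization. A minimal sketch of that generic approach, assuming a precomputed table `likelihoods[r][g]` = P(read r | genome g). It does not reproduce Sigma's C++ implementation, its read-likelihood model, or its confidence intervals.

```python
def em_abundance(likelihoods, n_iter=200):
    """Relative genome abundances from per-read likelihoods via EM.

    likelihoods[r][g]: probability of read r under reference genome g (assumed given).
    Reads with zero likelihood under every genome are ignored.
    """
    n_genomes = len(likelihoods[0])
    pi = [1.0 / n_genomes] * n_genomes
    for _ in range(n_iter):
        totals = [0.0] * n_genomes
        for row in likelihoods:
            weights = [p * l for p, l in zip(pi, row)]  # E-step: unnormalized posteriors
            z = sum(weights)
            if z == 0:
                continue
            for g, w in enumerate(weights):
                totals[g] += w / z
        n = sum(totals)
        if n == 0:
            raise ValueError("no read is compatible with any genome")
        pi = [t / n for t in totals]  # M-step: expected read share per genome
    return pi


def assign_reads(likelihoods, pi):
    """Assign each read to the genome with the highest posterior."""
    return [max(range(len(pi)), key=lambda g: pi[g] * row[g]) for row in likelihoods]
```

Reads that map equally well to two genomes are split in proportion to the current abundance estimates. For instance, two reads unique to genome 0, one unique to genome 1 and one ambiguous read converge to abundances of 2/3 and 1/3.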

  15. Structural Genomics of Protein Phosphatases

    SciTech Connect

    Almo, S.; Bonanno, J.; Sauder, J.; Emtage, S.; Dilorenzo, T.; Malashkevich, V.; Wasserman, S.; Swaminathan, S.; Eswaramoorthy, S.; et al

    2007-01-01

    The New York SGX Research Center for Structural Genomics (NYSGXRC) of the NIGMS Protein Structure Initiative (PSI) has applied its high-throughput X-ray crystallographic structure determination platform to systematic studies of all human protein phosphatases and protein phosphatases from biomedically-relevant pathogens. To date, the NYSGXRC has determined structures of 21 distinct protein phosphatases: 14 from human, 2 from mouse, 2 from the pathogen Toxoplasma gondii, 1 from Trypanosoma brucei, the parasite responsible for African sleeping sickness, and 2 from the principal mosquito vector of malaria in Africa, Anopheles gambiae. These structures provide insights into both normal and pathophysiologic processes, including transcriptional regulation, regulation of major signaling pathways, neural development, and type 1 diabetes. In conjunction with the contributions of other international structural genomics consortia, these efforts promise to provide an unprecedented database and materials repository for structure-guided experimental and computational discovery of inhibitors for all classes of protein phosphatases.

  16. Genealogical lineage sorting leads to significant, but incorrect Bayesian multilocus inference of population structure

    PubMed Central

    Orozco-terWengel, Pablo; Corander, Jukka; Schlötterer, Christian

    2011-01-01

    Over the past decades, the use of molecular markers has revolutionized biology and led to the foundation of a new research discipline: phylogeography. Of particular interest has been the inference of population structure and biogeography. While initial studies focused on mtDNA as a molecular marker, it has become apparent that selection and genealogical lineage sorting could lead to erroneous inferences. As it is not clear to what extent these forces affect a given marker, it has become common practice to use the combined evidence from a set of molecular markers as an attempt to recover the signals that approximate the true underlying demography. Typically, the number of markers used is determined by either budget constraints or by statistical power required to recognize significant population differentiation. Using microsatellite markers from Drosophila and humans, we show that even large numbers of loci (>50) can frequently result in statistically well-supported, but incorrect inference of population structure using the software BAPS. Most importantly, genomic features, such as chromosomal location, variability of the markers, or recombination rate, cannot explain this observation. Instead, it can be attributed to sampling variation among loci with different realizations of the stochastic lineage sorting. This phenomenon is particularly pronounced for low levels of population differentiation. Our results have important implications for ongoing studies of population differentiation, as we unambiguously demonstrate that statistical significance of population structure inferred from a random set of genetic markers cannot necessarily be taken as evidence for a reliable demographic inference. PMID:21244537

  17. Structure and inference in annotated networks

    NASA Astrophysics Data System (ADS)

    Newman, M. E. J.; Clauset, Aaron

    2016-06-01

    For many networks of scientific interest we know both the connections of the network and information about the network nodes, such as the age or gender of individuals in a social network. Here we demonstrate how this 'metadata' can be used to improve our understanding of network structure. We focus in particular on the problem of community detection in networks and develop a mathematically principled approach that combines a network and its metadata to detect communities more accurately than can be done with either alone. Crucially, the method does not assume that the metadata are correlated with the communities we are trying to find. Instead, the method learns whether a correlation exists and correctly uses or ignores the metadata depending on whether they contain useful information. We demonstrate our method on synthetic networks with known structure and on real-world networks, large and small, drawn from social, biological and technological domains.

  18. Structure and inference in annotated networks.

    PubMed

    Newman, M E J; Clauset, Aaron

    2016-01-01

    For many networks of scientific interest we know both the connections of the network and information about the network nodes, such as the age or gender of individuals in a social network. Here we demonstrate how this 'metadata' can be used to improve our understanding of network structure. We focus in particular on the problem of community detection in networks and develop a mathematically principled approach that combines a network and its metadata to detect communities more accurately than can be done with either alone. Crucially, the method does not assume that the metadata are correlated with the communities we are trying to find. Instead, the method learns whether a correlation exists and correctly uses or ignores the metadata depending on whether they contain useful information. We demonstrate our method on synthetic networks with known structure and on real-world networks, large and small, drawn from social, biological and technological domains. PMID:27306566

  19. Structure and inference in annotated networks

    PubMed Central

    Newman, M. E. J.; Clauset, Aaron

    2016-01-01

    For many networks of scientific interest we know both the connections of the network and information about the network nodes, such as the age or gender of individuals in a social network. Here we demonstrate how this 'metadata' can be used to improve our understanding of network structure. We focus in particular on the problem of community detection in networks and develop a mathematically principled approach that combines a network and its metadata to detect communities more accurately than can be done with either alone. Crucially, the method does not assume that the metadata are correlated with the communities we are trying to find. Instead, the method learns whether a correlation exists and correctly uses or ignores the metadata depending on whether they contain useful information. We demonstrate our method on synthetic networks with known structure and on real-world networks, large and small, drawn from social, biological and technological domains. PMID:27306566

  20. Comparative Analysis of Mitochondrial Genomes in Diplura (Hexapoda, Arthropoda): Taxon Sampling Is Crucial for Phylogenetic Inferences

    PubMed Central

    Chen, Wan-Jun; Koch, Markus; Mallatt, Jon M.; Luan, Yun-Xia

    2014-01-01

    Two-pronged bristletails (Diplura) are traditionally classified into three major superfamilies: Campodeoidea, Projapygoidea, and Japygoidea. The interrelationships of these three superfamilies and the monophyly of Diplura have been much debated. Few previous studies included Projapygoidea in their phylogenetic considerations, and its position within Diplura still is a puzzle from both morphological and molecular points of view. Until now, no mitochondrial genome has been sequenced for any projapygoid species. To fill in this gap, we determined and annotated the complete mitochondrial genome of Octostigma sinensis (Octostigmatidae, Projapygoidea), and of three more dipluran species, one each from the Campodeidae, Parajapygidae, and Japygidae. All four newly sequenced dipluran mtDNAs encode the same set of genes in the same gene order as shared by most crustaceans and hexapods. Secondary structure truncations have occurred in trnR, trnC, trnS1, and trnS2, and the reduction of transfer RNA D-arms was found to be taxonomically correlated, with Campodeoidea having experienced the most reduction. Partitioned phylogenetic analyses, based on both amino acids and nucleotides of the protein-coding genes plus the ribosomal RNA genes, retrieve significant support for a monophyletic Diplura within Pancrustacea, with Projapygoidea more closely related to Campodeoidea than to Japygoidea. Another key finding is that monophyly of Diplura cannot be recovered unless Projapygoidea is included in the phylogenetic analyses; this explains the dipluran polyphyly found by past mitogenomic studies. Including Projapygoidea increased the sampling density within Diplura and probably helped by breaking up a long-branch-attraction artifact. This finding provides an example of how proper sampling is significant for phylogenetic inference. PMID:24391151

  1. Unmet Challenges of Structural Genomics

    PubMed Central

    Chruszcz, Maksymilian; Domagalski, Marcin; Osinski, Tomasz; Wlodawer, Alexander; Minor, Wladek

    2010-01-01

    Structural genomics (SG) programs have developed many novel methodologies for faster and more accurate structure determination during the last decade. These new tools and approaches led to the determination of thousands of protein structures. The generation of enormous amounts of experimental data resulted in significant improvements in the understanding of many biological processes at the molecular level. However, the amount of data collected so far is so large that traditional analysis methods are limiting the rate of extraction of biological and biochemical information from 3-D models. This situation has prompted us to review the challenges that remain unmet by structural genomics, as well as the areas in which the potential impact of SG could exceed what has been achieved so far. PMID:20810277

  2. Systems modeling approaches for microbial community studies: from metagenomics to inference of the community structure.

    PubMed

    Hanemaaijer, Mark; Röling, Wilfred F M; Olivier, Brett G; Khandelwal, Ruchir A; Teusink, Bas; Bruggeman, Frank J

    2015-01-01

    Microbial communities play important roles in health, industrial applications and Earth's ecosystems. With current molecular techniques we can characterize these systems in unprecedented detail. However, such methods provide little mechanistic insight into how the genetic properties and the dynamic couplings between individual microorganisms give rise to their dynamic activities. Neither do they give insight into what we call "the community state", that is, the fluxes and concentrations of nutrients within the community. This knowledge is a prerequisite for rational control and intervention in microbial communities. Therefore, the inference of the community structure from experimental data is a major current challenge. We will argue that this inference problem requires mathematical models that can integrate heterogeneous experimental data with existing knowledge. We propose that two types of models are needed. First, mathematical models are needed that integrate existing genomic, physiological, and physicochemical information with metagenomics data so as to maximize information content and predictive power. This can be achieved with constraint-based genome-scale stoichiometric modeling of community metabolism, which is ideally suited for this purpose. Second, we propose a simpler coarse-grained model, tailored to solve the inference problem from the experimental data. This model relates unambiguously to the more detailed genome-scale stoichiometric models, which act as heterogeneous data integrators. The simpler inference models are, in our opinion, key to understanding microbial ecosystems, yet until now they have received remarkably little attention. As a result, the modeling of microbial communities using only genome-scale models is currently more a computational, theoretical exercise than a method useful to the experimentalist. PMID:25852671

  3. An application of collaborative targeted maximum likelihood estimation in causal inference and genomics.

    PubMed

    Gruber, Susan; van der Laan, Mark J

    2010-01-01

    A concrete example of the collaborative double-robust targeted likelihood estimator (C-TMLE) introduced in a companion article in this issue is presented, and applied to the estimation of causal effects and variable importance parameters in genomic data. The focus is on non-parametric estimation in a point treatment data structure. Simulations illustrate the performance of C-TMLE relative to current competitors such as the augmented inverse probability of treatment weighted estimator that relies on an external non-collaborative estimator of the treatment mechanism, and inefficient estimation procedures including propensity score matching and standard inverse probability of treatment weighting. C-TMLE is also applied to the estimation of the covariate-adjusted marginal effect of individual HIV mutations on resistance to the anti-retroviral drug lopinavir. The influence curve of the C-TMLE is used to establish asymptotically valid statistical inference. The list of mutations found to have a statistically significant association with resistance is in excellent agreement with mutation scores provided by the Stanford HIVdb mutation scores database. PMID:21731530
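
    The abstract benchmarks C-TMLE against inverse probability of treatment weighting (IPTW). For orientation only, here is a minimal sketch of the standard IPTW estimate of an average treatment effect for a binary point treatment, assuming the propensity scores P(A = 1 | W) are already known. It is not C-TMLE, and the function name and toy data are hypothetical.

```python
def iptw_ate(treatment, outcome, propensity):
    """Standard IPTW estimate of E[Y(1)] - E[Y(0)] (a sketch, not C-TMLE).

    treatment:  list of 0/1 treatment indicators A_i
    outcome:    list of outcomes Y_i
    propensity: list of P(A_i = 1 | W_i), assumed known and strictly in (0, 1)
    """
    if any(not 0.0 < g < 1.0 for g in propensity):
        raise ValueError("propensity scores must lie strictly between 0 and 1")
    n = len(treatment)
    rows = list(zip(treatment, outcome, propensity))
    # Weight treated outcomes by 1/g and untreated outcomes by 1/(1 - g).
    treated_mean = sum(a * y / g for a, y, g in rows) / n
    control_mean = sum((1 - a) * y / (1 - g) for a, y, g in rows) / n
    return treated_mean - control_mean


# Toy example with a constant propensity of 0.5.
print(iptw_ate([1, 0, 1, 0], [3, 1, 5, 1], [0.5] * 4))
```

    Because each observation is divided by its propensity, estimates of this kind become unstable when some propensities are close to 0 or 1, which is one motivation for the more carefully targeted estimators discussed in the abstract.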

  5. Structural variations in plant genomes

    PubMed Central

    Edwards, David; Varshney, Rajeev K.

    2014-01-01

    Differences between plant genomes range from single nucleotide polymorphisms to large-scale duplications, deletions and rearrangements. The large polymorphisms are termed structural variants (SVs). SVs have received significant attention in human genetics and were found to be responsible for various chronic diseases. However, little effort has been directed towards understanding the role of SVs in plants. Many recent advances in plant genetics have resulted from improvements in high-resolution technologies for measuring SVs, including microarray-based techniques, and more recently, high-throughput DNA sequencing. In this review we describe recent reports of SV in plants and describe the genomic technologies currently used to measure these SVs. PMID:24907366

  6. Higher-level phylogeny of paraneopteran insects inferred from mitochondrial genome sequences

    PubMed Central

    Li, Hu; Shao, Renfu; Song, Nan; Song, Fan; Jiang, Pei; Li, Zhihong; Cai, Wanzhi

    2015-01-01

    Mitochondrial (mt) genome data have been proven to be informative for animal phylogenetic studies but may also suffer from systematic errors, due to the effects of accelerated substitution rate and compositional heterogeneity. We analyzed the mt genomes of 25 insect species from the four paraneopteran orders, aiming to better understand how accelerated substitution rate and compositional heterogeneity affect the inferences of the higher-level phylogeny of this diverse group of hemimetabolous insects. We found substantial heterogeneity in base composition and contrasting rates in nucleotide substitution among these paraneopteran insects, which complicate the inference of higher-level phylogeny. The phylogenies inferred with concatenated sequences of mt genes using maximum likelihood and Bayesian methods and homogeneous models failed to recover Psocodea and Hemiptera as monophyletic groups but grouped, instead, the taxa that had accelerated substitution rates together, including Sternorrhyncha (a suborder of Hemiptera), Thysanoptera, Phthiraptera and Liposcelididae (a family of Psocoptera). Bayesian inference with nucleotide sequences and heterogeneous models (CAT and CAT + GTR), however, recovered Psocodea, Thysanoptera and Hemiptera each as a monophyletic group. Within Psocodea, Liposcelididae is more closely related to Phthiraptera than to other species of Psocoptera. Furthermore, Thysanoptera was recovered as the sister group to Hemiptera. PMID:25704094

  7. Inference of gorilla demographic and selective history from whole-genome sequence data.

    PubMed

    McManus, Kimberly F; Kelley, Joanna L; Song, Shiya; Veeramah, Krishna R; Woerner, August E; Stevison, Laurie S; Ryder, Oliver A; Great Ape Genome Project; Kidd, Jeffrey M; Wall, Jeffrey D; Bustamante, Carlos D; Hammer, Michael F

    2015-03-01

    Although population-level genomic sequence data have been gathered extensively for humans, similar data from our closest living relatives are just beginning to emerge. Examination of genomic variation within great apes offers many opportunities to increase our understanding of the forces that have differentially shaped the evolutionary history of hominid taxa. Here, we expand upon the work of the Great Ape Genome Project by analyzing medium to high coverage whole-genome sequences from 14 western lowland gorillas (Gorilla gorilla gorilla), 2 eastern lowland gorillas (G. beringei graueri), and a single Cross River individual (G. gorilla diehli). We infer that the ancestors of western and eastern lowland gorillas diverged from a common ancestor approximately 261 ka, and that the ancestors of the Cross River population diverged from the western lowland gorilla lineage approximately 68 ka. Using a diffusion approximation approach to model the genome-wide site frequency spectrum, we infer a history of western lowland gorillas that includes an ancestral population expansion of 1.4-fold around 970 ka and a recent 5.6-fold contraction in population size 23 ka. The latter may correspond to a major reduction in African equatorial forests around the Last Glacial Maximum. We also analyze patterns of variation among western lowland gorillas to identify several genomic regions with strong signatures of recent selective sweeps. We find that processes related to taste, pancreatic and saliva secretion, sodium ion transmembrane transport, and cardiac muscle function are overrepresented in genomic regions predicted to have experienced recent positive selection. PMID:25534031

  8. Inference of Gorilla Demographic and Selective History from Whole-Genome Sequence Data

    PubMed Central

    McManus, Kimberly F.; Kelley, Joanna L.; Song, Shiya; Veeramah, Krishna R.; Woerner, August E.; Stevison, Laurie S.; Ryder, Oliver A.; Great Ape Genome Project; Kidd, Jeffrey M.; Wall, Jeffrey D.; Bustamante, Carlos D.; Hammer, Michael F.

    2015-01-01

    Although population-level genomic sequence data have been gathered extensively for humans, similar data from our closest living relatives are just beginning to emerge. Examination of genomic variation within great apes offers many opportunities to increase our understanding of the forces that have differentially shaped the evolutionary history of hominid taxa. Here, we expand upon the work of the Great Ape Genome Project by analyzing medium to high coverage whole-genome sequences from 14 western lowland gorillas (Gorilla gorilla gorilla), 2 eastern lowland gorillas (G. beringei graueri), and a single Cross River individual (G. gorilla diehli). We infer that the ancestors of western and eastern lowland gorillas diverged from a common ancestor approximately 261 ka, and that the ancestors of the Cross River population diverged from the western lowland gorilla lineage approximately 68 ka. Using a diffusion approximation approach to model the genome-wide site frequency spectrum, we infer a history of western lowland gorillas that includes an ancestral population expansion of 1.4-fold around 970 ka and a recent 5.6-fold contraction in population size 23 ka. The latter may correspond to a major reduction in African equatorial forests around the Last Glacial Maximum. We also analyze patterns of variation among western lowland gorillas to identify several genomic regions with strong signatures of recent selective sweeps. We find that processes related to taste, pancreatic and saliva secretion, sodium ion transmembrane transport, and cardiac muscle function are overrepresented in genomic regions predicted to have experienced recent positive selection. PMID:25534031

  9. Causal inference and the hierarchical structure of experience

    PubMed Central

    Johnson, Samuel G. B.; Keil, Frank C.

    2014-01-01

    Children and adults make rich causal inferences about the physical and social world, even in novel situations where they cannot rely on prior knowledge of causal mechanisms. We propose that this capacity is supported in part by constraints provided by event structure—the cognitive organization of experience into discrete events that are hierarchically organized. These event-structured causal inferences are guided by a level-matching principle, with events conceptualized at one level of an event hierarchy causally matched to other events at that same level, and a boundary-blocking principle, with events causally matched to other events that are parts of the same superordinate event. These principles are used to constrain inferences about plausible causal candidates in unfamiliar situations, both in diagnosing causes (Experiment 1) and predicting effects (Experiment 2). The results could not be explained by construal level (Experiment 3) or similarity-matching (Experiment 4), and were robust across a variety of physical and social causal systems. Taken together, these experiments demonstrate a novel way in which non-causal information we extract from the environment can help to constrain inferences about causal structure. PMID:25347533

  10. ClonalFrameML: Efficient Inference of Recombination in Whole Bacterial Genomes

    PubMed Central

    Didelot, Xavier; Wilson, Daniel J.

    2015-01-01

    Recombination is an important evolutionary force in bacteria, but it remains challenging to reconstruct the imports that occurred in the ancestry of a genomic sample. Here we present ClonalFrameML, which uses maximum likelihood inference to simultaneously detect recombination in bacterial genomes and account for it in phylogenetic reconstruction. ClonalFrameML can analyse hundreds of genomes in a matter of hours, and we demonstrate its usefulness on simulated and real datasets. We find evidence for recombination hotspots associated with mobile elements in Clostridium difficile ST6 and a previously undescribed 310 kb chromosomal replacement in Staphylococcus aureus ST582. ClonalFrameML is freely available at http://clonalframeml.googlecode.com/. PMID:25675341

  11. Functional Insights from Structural Genomics

    SciTech Connect

    Forouhar, F.; Kuzin, A.; Seetharaman, J.; Lee, I.; Zhou, W.; Abashidze, M.; Chen, Y.; Montelione, G.; Tong, L.; et al

    2007-01-01

    Structural genomics efforts have produced structural information, either directly or by modeling, for thousands of proteins over the past few years. While many of these proteins have known functions, a large percentage of them have not been characterized at the functional level. The structural information has provided valuable functional insights on some of these proteins, through careful structural analyses, serendipity, and structure-guided functional screening. Some of the success stories based on structures solved at the Northeast Structural Genomics Consortium (NESG) are reported here. These include a novel methyl salicylate esterase with an important role in plant innate immunity, a novel RNA methyltransferase (H. influenzae yggJ (HI0303)), a novel spermidine/spermine N-acetyltransferase (B. subtilis PaiA), a novel methyltransferase or AdoMet binding protein (A. fulgidus AF_0241), an ATP:cob(I)alamin adenosyltransferase (B. subtilis YvqK), a novel carboxysome pore (E. coli EutN), a proline racemase homolog with a disrupted active site (B. melitensis BME11586), an FMN-dependent enzyme (S. pneumoniae SP_1951), and a 12-stranded β-barrel with a novel fold (V. parahaemolyticus VPA1032).

  12. Extensive error in the number of genes inferred from draft genome assemblies.

    PubMed

    Denton, James F; Lugo-Martinez, Jose; Tucker, Abraham E; Schrider, Daniel R; Warren, Wesley C; Hahn, Matthew W

    2014-12-01

    Current sequencing methods produce large amounts of data, but genome assemblies based on these data are often woefully incomplete. These incomplete and error-filled assemblies result in many annotation errors, especially in the number of genes present in a genome. In this paper we investigate the magnitude of the problem, both in terms of total gene number and the number of copies of genes in specific families. To do this, we compare multiple draft assemblies against higher-quality versions of the same genomes, using several new assemblies of the chicken genome based on both traditional and next-generation sequencing technologies, as well as published draft assemblies of chimpanzee. We find that upwards of 40% of all gene families are inferred to have the wrong number of genes in draft assemblies, and that these incorrect assemblies both add and subtract genes. Using simulated genome assemblies of Drosophila melanogaster, we find that the major cause of increased gene numbers in draft genomes is the fragmentation of genes onto multiple individual contigs. Finally, we demonstrate the usefulness of RNA-Seq in improving the gene annotation of draft assemblies, largely by connecting genes that have been fragmented in the assembly process. PMID:25474019

  13. 2004 Structural, Function and Evolutionary Genomics

    SciTech Connect

    Brutlag, Douglas L.; Gray, Nancy Ryan

    2005-03-23

    This Gordon conference will cover the areas of structural, functional and evolutionary genomics. It will take a systematic approach to genomics, examining the evolution of proteins, protein functional sites, protein-protein interactions, regulatory networks, and metabolic networks. Emphasis will be placed on what we can learn from comparative genomics and entire genomes and proteomes.

  14. Structural damage identification using piezoelectric impedance and Bayesian inference

    NASA Astrophysics Data System (ADS)

    Shuai, Q.; Zhou, K.; Tang, J.

    2015-04-01

    Structural damage identification is a challenging subject in structural health monitoring research. Piezoelectric impedance-based damage identification, which usually relies on matrix inverse-based optimization, can in theory identify the damage location and severity. In practice, however, the sensitivity matrix is often ill-conditioned, since the number of unknowns may far exceed the number of useful measurements/inputs. In this research, a new damage identification method based on an intelligent inference framework is presented. Bayesian inference is used to directly predict damage location and severity from impedance measurements through forward prediction and comparison. A Gaussian process is employed to enrich the forward analysis results, thereby reducing computational cost. A case study is carried out to illustrate the identification performance.

  15. Novel multi-sample scheme for inferring phylogenetic markers from whole genome tumor profiles

    PubMed Central

    Subramanian, Ayshwarya; Shackney, Stanley; Schwartz, Russell

    2013-01-01

    Computational cancer phylogenetics seeks to enumerate the temporal sequences of aberrations in tumor evolution, thereby delineating the evolution of possible tumor progression pathways, molecular subtypes and mechanisms of action. We previously developed a pipeline for constructing phylogenies describing evolution between major recurring cell types computationally inferred from whole-genome tumor profiles. The accuracy and detail of the phylogenies, however, depend on the identification of accurate, high-resolution molecular markers of progression, i.e., reproducible regions of aberration that robustly differentiate different subtypes and stages of progression. Here we present a novel hidden Markov model (HMM) scheme for the problem of inferring such phylogenetically significant markers through joint segmentation and calling of multi-sample tumor data. Given sets of genome-wide DNA copy number measurements, our method partitions the samples at each probe into normal (diploid) or amplified. It differs from other similar HMM methods in being designed specifically for the needs of tumor phylogenetics, seeking to identify robust markers of progression conserved across a set of copy number profiles. We analyze our method in comparison with other methods on both synthetic and real tumor data; the analysis confirms its effectiveness for tumor phylogeny inference and suggests avenues for future advances. PMID:24407301
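
    As background on the HMM idea only, the sketch below runs Viterbi decoding with two hidden states (normal vs. amplified) on a single copy number profile of log2 ratios. It is a deliberately simplified stand-in, not the authors' joint multi-sample model: the Gaussian emission means, the shared standard deviation, the self-transition probability and the function names are all assumptions chosen for illustration.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution N(mean, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi_two_state(log_ratios, means=(0.0, 1.0), sd=0.3, stay=0.95):
    """Most likely state path for one profile (hypothetical helper).

    States: 0 = normal (diploid), 1 = amplified. Emission means, the shared
    standard deviation and the self-transition probability are assumed values.
    """
    states = range(2)
    log_stay, log_switch = math.log(stay), math.log(1 - stay)
    # Uniform initial state distribution.
    score = [math.log(0.5) + gaussian_logpdf(log_ratios[0], means[s], sd) for s in states]
    backpointers = []
    for x in log_ratios[1:]:
        new_score, pointers = [], []
        for s in states:
            # Best predecessor p of state s under stay/switch transition costs.
            cands = [score[p] + (log_stay if p == s else log_switch) for p in states]
            best = max(states, key=lambda p: cands[p])
            new_score.append(cands[best] + gaussian_logpdf(x, means[s], sd))
            pointers.append(best)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


profile = [0.05, -0.1, 0.0, 1.1, 0.9, 1.05, 0.02, -0.05]
print(viterbi_two_state(profile))
```

    The transition penalty is what turns probe-by-probe calls into contiguous segments: a short run of elevated ratios is labeled amplified only if its emission gain outweighs the cost of switching in and out.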

  16. Sigma: Strain-level inference of genomes from metagenomic analysis for biosurveillance

    DOE PAGESBeta

    Ahn, Tae-Hyuk; Chai, Juanjuan; Pan, Chongle

    2014-09-29

    Motivation: Metagenomic sequencing of clinical samples provides a promising technique for direct pathogen detection and characterization in biosurveillance. Taxonomic analysis at the strain level can be used to resolve serotypes of a pathogen in biosurveillance. Sigma was developed for strain-level identification and quantification of pathogens using their reference genomes based on metagenomic analysis. Results: Sigma provides not only accurate strain-level inferences, but also three unique capabilities: (i) Sigma quantifies the statistical uncertainty of its inferences, which includes hypothesis testing of identified genomes and confidence interval estimation of their relative abundances; (ii) Sigma enables strain variant calling by assigning metagenomic reads to their most likely reference genomes; and (iii) Sigma supports parallel computing for fast analysis of large datasets. The algorithm's performance was evaluated using simulated mock communities and fecal samples with spike-in pathogen strains. Availability and Implementation: Sigma was implemented in C++, with source code and binaries freely available at http://sigma.omicsbio.org.

  17. Inferring Bottlenecks from Genome-Wide Samples of Short Sequence Blocks.

    PubMed

    Bunnefeld, Lynsey; Frantz, Laurent A F; Lohse, Konrad

    2015-11-01

    The advent of the genomic era has necessitated the development of methods capable of analyzing large volumes of genomic data efficiently. Being able to reliably identify bottlenecks--extreme population size changes of short duration--not only is interesting in the context of speciation and extinction but also matters (as a null model) when inferring selection. Bottlenecks can be detected in polymorphism data via their distorting effect on the shape of the underlying genealogy. Here, we use the generating function of genealogies to derive the probability of mutational configurations in short sequence blocks under a simple bottleneck model. Given a large number of nonrecombining blocks, we can compute maximum-likelihood estimates of the time and strength of the bottleneck. Our method relies on a simple summary of the joint distribution of polymorphic sites. We extend the site frequency spectrum by counting mutations in frequency classes in short sequence blocks. Using linkage information over short distances in this way gives greater power to detect bottlenecks than the site frequency spectrum and potentially opens up a wide range of demographic histories to blockwise inference. Finally, we apply our method to genomic data from a species of pig (Sus cebifrons) endemic to islands in the center and west of the Philippines to estimate whether a bottleneck occurred upon island colonization and compare our scheme to Li and Durbin's pairwise sequentially Markovian coalescent (PSMC) both for the pig data and using simulations. PMID:26341659
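
    The blockwise extension of the site frequency spectrum described above can be illustrated with a small tally. This sketch assumes the derived allele is known (an unfolded spectrum), that each block is given as the list of derived-allele counts at its segregating sites in a sample of n sequences, and that the function names are hypothetical. The likelihood calculation via generating functions is not shown.

```python
from collections import Counter


def block_configuration(derived_counts, n):
    """Count mutations per frequency class 1..n-1 within one short block."""
    config = [0] * (n - 1)
    for k in derived_counts:
        if not 1 <= k <= n - 1:
            raise ValueError("derived-allele count must be between 1 and n-1")
        config[k - 1] += 1
    return tuple(config)


def blockwise_spectrum(blocks, n):
    """Tally how many blocks show each mutational configuration."""
    return Counter(block_configuration(b, n) for b in blocks)


def site_frequency_spectrum(blocks, n):
    """Ordinary SFS: summing block configurations discards within-block linkage."""
    sfs = [0] * (n - 1)
    for b in blocks:
        for i, c in enumerate(block_configuration(b, n)):
            sfs[i] += c
    return sfs


# Four toy blocks from a sample of n = 4 sequences.
blocks = [[1], [1, 2], [], [1]]
print(blockwise_spectrum(blocks, 4))
print(site_frequency_spectrum(blocks, 4))
```

    The contrast between the two summaries is the point: the SFS only records how many mutations fall in each frequency class overall, whereas the blockwise tally also keeps which classes co-occur in the same short, non-recombining block.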

  18. RegPredict: an integrated system for regulon inference in prokaryotes by comparative genomics approach

    PubMed Central

    Novichkov, Pavel S.; Rodionov, Dmitry A.; Stavrovskaya, Elena D.; Novichkova, Elena S.; Kazakov, Alexey E.; Gelfand, Mikhail S.; Arkin, Adam P.; Mironov, Andrey A.; Dubchak, Inna

    2010-01-01

    The RegPredict web server provides comparative genomics tools for the reconstruction and analysis of microbial regulons. The server allows the user to rapidly generate reference sets of regulons and regulatory motif profiles in a group of prokaryotic genomes. The new concept of a cluster of co-regulated orthologous operons allows the user to distribute the analysis of large regulons and to perform the comparative analysis of multiple clusters independently. Two major workflows are currently implemented in RegPredict: (i) regulon reconstruction for a known regulatory motif and (ii) ab initio inference of a novel regulon using several scenarios for the generation of starting gene sets. RegPredict provides a comprehensive collection of manually curated positional weight matrices of regulatory motifs. It is based on genomic sequences and on ortholog and operon predictions from MicrobesOnline. An interactive web interface integrates and presents diverse genomic and functional information about the candidate regulon members from several web resources. RegPredict is freely accessible at http://regpredict.lbl.gov. PMID:20542910
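
    Positional weight matrices are central to the workflows above. The sketch below shows one common way to build a log-odds PWM from aligned binding sites and scan a sequence with it. The pseudocount of 0.5, the uniform background, the toy sites and the function names are assumptions for illustration; this is not RegPredict's own scoring function.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Log2-odds positional weight matrix from equal-length aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for i in range(length):
        column = [s[i] for s in sites]
        total = len(column) + 4 * pseudocount
        # Smoothed base frequency at position i, relative to the background.
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def score_site(pwm, seq):
    """Sum of per-position log-odds scores for one candidate site."""
    return sum(col[b] for col, b in zip(pwm, seq))


def scan(pwm, sequence):
    """Return (start position, score) of the best-scoring window."""
    w = len(pwm)
    windows = [(i, score_site(pwm, sequence[i:i + w])) for i in range(len(sequence) - w + 1)]
    return max(windows, key=lambda t: t[1])


sites = ["TTGACA", "TTGACT", "TTGACA", "TAGACA"]  # hypothetical training sites
pwm = build_pwm(sites)
print(scan(pwm, "GGGGTTGACAGGGG"))
```

    Positive scores mean a window looks more like the training sites than like the background; in practice a cutoff on this score decides which candidate sites are reported.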

  19. RegPredict: an integrated system for regulon inference in prokaryotes by comparative genomics approach

    SciTech Connect

    Novichkov, Pavel S.; Rodionov, Dmitry A.; Stavrovskaya, Elena D.; Novichkova, Elena S.; Kazakov, Alexey E.; Gelfand, Mikhail S.; Arkin, Adam P.; Mironov, Andrey A.; Dubchak, Inna

    2010-05-26

    The RegPredict web server provides comparative genomics tools for the reconstruction and analysis of microbial regulons. The server allows the user to rapidly generate reference sets of regulons and regulatory motif profiles in a group of prokaryotic genomes. The new concept of a cluster of co-regulated orthologous operons allows the user to distribute the analysis of large regulons and to perform the comparative analysis of multiple clusters independently. Two major workflows are currently implemented in RegPredict: (i) regulon reconstruction for a known regulatory motif and (ii) ab initio inference of a novel regulon using several scenarios for the generation of starting gene sets. RegPredict provides a comprehensive collection of manually curated positional weight matrices of regulatory motifs. It is based on genomic sequences and on ortholog and operon predictions from MicrobesOnline. An interactive web interface integrates and presents diverse genomic and functional information about the candidate regulon members from several web resources. RegPredict is freely accessible at http://regpredict.lbl.gov.

  20. Simulated tornado debris tracks: implications for inferring corner flow structure

    NASA Astrophysics Data System (ADS)

    Zimmerman, Michael; Lewellen, David

    2011-11-01

    A large collection of three-dimensional large eddy simulations of tornadoes with fine debris has recently been performed as part of a longstanding effort at West Virginia University to understand tornado corner flow structure and dynamics. Debris removal and deposition is accounted for at the surface, in effect simulating formation of tornado surface marks. Physical origins and properties of the most prominent marks will be presented, and the possibility of inferring tornado corner flow structure from real marks in the field will be discussed. This material is based upon work supported by the National Science Foundation under Grants No. 0635681 and AGS-1013154.

  1. Inferring Where and When Replication Initiates from Genome-Wide Replication Timing Data

    NASA Astrophysics Data System (ADS)

    Baker, A.; Audit, B.; Yang, S. C.-H.; Bechhoefer, J.; Arneodo, A.

    2012-06-01

    Based on an analogy between DNA replication and one-dimensional nucleation-and-growth processes, various approaches to inferring the local initiation rate I(x,t) of DNA replication origins from replication timing data have been developed within the framework of phase transition kinetics theories. These approaches have all used curve-fitting strategies to estimate I(x,t) from genome-wide replication timing data. Here, we show how to invert the Kolmogorov-Johnson-Mehl-Avrami (KJMA) model analytically and extract I(x,t) directly. Tests on both simulated and experimental budding-yeast data confirm the location and firing-time distribution of replication origins.
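
    The analytic inversion mentioned above can be sketched for the one-dimensional Kolmogorov-Johnson-Mehl-Avrami (KJMA) model. This is a reconstruction from the standard formulation, not a quotation of the paper, and it assumes a constant fork velocity v, bidirectional forks, replication starting at t = 0, and s(x,t) denoting the fraction of cells in which locus x is still unreplicated at time t. A locus is unreplicated only if no origin fired inside its past "light cone", and the cone integral can be undone with the wave operator (d'Alembert's formula):

```latex
% Unreplicated fraction: no origin fired inside the past light cone of (x,t)
s(x,t) = \exp\!\left[-\int_{0}^{t}\! dt' \int_{x - v(t - t')}^{x + v(t - t')} I(x',t')\, dx'\right]

% With g(x,t) = -\ln s(x,t), differentiating the cone integral twice gives
\left(\partial_t^{2} - v^{2}\,\partial_x^{2}\right) g(x,t) = 2v\, I(x,t)

% so the initiation rate follows directly from s(x,t):
I(x,t) = -\frac{1}{2}\left(\frac{1}{v}\,\partial_t^{2} - v\,\partial_x^{2}\right) \ln s(x,t)
```

    In practice the second derivatives have to be taken on smoothed estimates of s(x,t), because differentiating noisy timing data amplifies the noise; that preprocessing step is not shown here.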

  2. Inferring human population size and separation history from multiple genome sequences

    PubMed Central

    Schiffels, Stephan; Durbin, Richard

    2014-01-01

    The availability of complete human genome sequences from populations across the world has given rise to new population genetic inference methods that explicitly model their ancestral relationship under recombination and mutation. So far, application of these methods to evolutionary history more recent than 20-30 thousand years ago and to population separations has been limited. Here we present a new method that overcomes these shortcomings. The Multiple Sequentially Markovian Coalescent (MSMC) analyses the observed pattern of mutations in multiple individuals, focusing on the first coalescence between any two individuals. Results from applying MSMC to genome sequences from nine populations across the world suggest that the genetic separation of non-African ancestors from African Yoruban ancestors started long before 50,000 years ago, and give information about human population history as recently as 2,000 years ago, including the bottleneck in the peopling of the Americas, and separations within Africa, East Asia and Europe. PMID:24952747

  3. Proteomics-inferred genome typing (PIGT) demonstrates inter-population recombination as a strategy for environmental adaptation

    SciTech Connect

    Denef, Vincent; Verberkmoes, Nathan C; Shah, Manesh B; Abraham, Paul E; Lefsrud, Mark G; Hettich, Robert (Bob) L; Banfield, Jillian F.

    2009-01-01

    Analyses of ecological and evolutionary processes that shape microbial consortia are facilitated by comprehensive studies of ecosystems with low species richness. In the current study we evaluated the role of recombination in altering the fitness of chemoautotrophic bacteria in their natural environment. Proteomics-inferred genome typing (PIGT) was used to determine the genomic make-up of Leptospirillum group II populations in 27 biofilms sampled from six locations in the Richmond Mine acid mine drainage system (Iron Mountain, CA) over a four-year period. We observed six distinct genotypes that are recombinants composed of segments from two parental genotypes. Community genomic analyses revealed additional low-abundance recombinant variants. The dominance of some genotypes despite a larger available genome pool, and patterns of spatiotemporal distribution within the ecosystem, indicate selection for distinct recombinants. Genes involved in motility, signal transduction and transport were overrepresented in the recombinant blocks, which span tens to hundreds of kilobases, whereas core metabolic functions were significantly underrepresented. Our findings demonstrate the power of PIGT and reveal that recombination is a mechanism for fine-scale adaptation in this system.

  4. Redundancy-weighting for better inference of protein structural features

    PubMed Central

    Yanover, Chen; Vanetik, Natalia; Levitt, Michael; Kolodny, Rachel; Keasar, Chen

    2014-01-01

    Motivation: Structural knowledge, extracted from the Protein Data Bank (PDB), underlies numerous potential functions and prediction methods. The PDB, however, is highly biased: many proteins have more than one entry, while entire protein families are represented by a single structure, or even not at all. The standard solution to this problem is to limit the studies to non-redundant subsets of the PDB. While alleviating biases, this solution hides the many-to-many relations between sequences and structures. That is, non-redundant datasets conceal the diversity of sequences that share the same fold and the existence of multiple conformations for the same protein. A particularly disturbing aspect of non-redundant subsets is that they hardly benefit from the rapid pace of protein structure determination, as most newly solved structures fall within existing families. Results: In this study we explore the concept of redundancy-weighted datasets, originally suggested by Miyazawa and Jernigan. Redundancy-weighted datasets include all available structures and associate them (or features thereof) with weights that are inversely proportional to the number of their homologs. Here, we provide the first systematic comparison of redundancy-weighted datasets with non-redundant ones. We test three weighting schemes and show that the distributions of structural features that they produce are smoother (having higher entropy) compared with the distributions inferred from non-redundant datasets. We further show that these smoothed distributions are both more robust and more correct than their non-redundant counterparts. We suggest that the better distributions, inferred using redundancy-weighting, may improve the accuracy of knowledge-based potentials and increase the power of protein structure prediction methods. Consequently, they may enhance model-driven molecular biology. Contact: cheny@il.ibm.com or chen.keasar@gmail.com PMID:24771517
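    The following is a minimal sketch of redundancy weighting. It assumes each structure has already been assigned to a homolog family; the clustering step and the three weighting schemes tested in the study are not reproduced. Each structure gets weight 1/(family size), so a heavily sampled family counts once in total. The toy data and helper names are hypothetical and show the flattening (higher-entropy) effect described above.

```python
import math
from collections import Counter, defaultdict


def redundancy_weights(family_labels):
    """Weight each structure by 1 / (number of structures in its homolog family)."""
    sizes = Counter(family_labels)
    return [1.0 / sizes[f] for f in family_labels]


def weighted_distribution(features, weights):
    """Normalized, weighted histogram of a categorical structural feature."""
    totals = defaultdict(float)
    for feature, w in zip(features, weights):
        totals[feature] += w
    norm = sum(totals.values())
    return {feature: t / norm for feature, t in totals.items()}


def entropy_bits(dist):
    """Shannon entropy of a distribution, in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


# Hypothetical toy data: three homologs in family "f1", one structure in family "f2".
families = ["f1", "f1", "f1", "f2"]
features = ["helix", "helix", "helix", "strand"]

unweighted = weighted_distribution(features, [1.0] * len(features))
weighted = weighted_distribution(features, redundancy_weights(families))
print(unweighted, entropy_bits(unweighted))  # helix 0.75, strand 0.25; about 0.811 bits
print(weighted, entropy_bits(weighted))      # helix and strand about 0.5 each; about 1.0 bit
```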

  5. From algae to angiosperms–inferring the phylogeny of green plants (Viridiplantae) from 360 plastid genomes

    PubMed Central

    2014-01-01

    Background Next-generation sequencing has provided a wealth of plastid genome sequence data from an increasingly diverse set of green plants (Viridiplantae). Although these data have helped resolve the phylogeny of numerous clades (e.g., green algae, angiosperms, and gymnosperms), their utility for inferring relationships across all green plants is uncertain. Viridiplantae originated 700-1500 million years ago and may comprise as many as 500,000 species. This clade represents a major source of photosynthetic carbon and contains an immense diversity of life forms, including some of the smallest and largest eukaryotes. Here we explore the limits and challenges of inferring a comprehensive green plant phylogeny from available complete or nearly complete plastid genome sequence data. Results We assembled protein-coding sequence data for 78 genes from 360 diverse green plant taxa with complete or nearly complete plastid genome sequences available from GenBank. Phylogenetic analyses of the plastid data recovered well-supported backbone relationships and strong support for relationships that were not observed in previous analyses of major subclades within Viridiplantae. However, there is also evidence of systematic error in some analyses. In several instances we obtained strongly supported but conflicting topologies from analyses of nucleotides versus amino acid characters, and the considerable variation in GC content among lineages and within single genomes affected the phylogenetic placement of several taxa. Conclusions Analyses of the plastid sequence data recovered a strongly supported framework of relationships for green plants. This framework includes: i) the placement of Zygnematophyceae as sister to land plants (Embryophyta), ii) a clade of extant gymnosperms (Acrogymnospermae) with cycads + Ginkgo sister to remaining extant gymnosperms and with gnetophytes (Gnetophyta) sister to non-Pinaceae conifers (Gnecup trees), and iii) within the monilophyte clade

  6. mStruct: Inference of Population Structure in Light of Both Genetic Admixing and Allele Mutations

    PubMed Central

    Shringarpure, Suyash; Xing, Eric P.

    2009-01-01

    Traditional methods for analyzing population structure, such as the Structure program, ignore the effect of allele mutations between the ancestral and current alleles of genetic markers, which can dramatically influence the accuracy of the structural estimation of current populations. Studying these effects can also reveal additional information about population evolution such as the divergence time and migration history of admixed populations. We propose mStruct, an admixture of population-specific mixtures of inheritance models that addresses the task of structure inference and mutation estimation jointly through a hierarchical Bayesian framework, and a variational algorithm for inference. We validated our method on synthetic data and used it to analyze the Human Genome Diversity Project–Centre d'Etude du Polymorphisme Humain (HGDP–CEPH) cell line panel of microsatellites and HGDP single-nucleotide polymorphism (SNP) data. A comparison of the structural maps of world populations estimated by mStruct and Structure is presented, and we also report potentially interesting mutation patterns in world populations estimated by mStruct. PMID:19363128

  7. Genome-Wide SNP Discovery, Genotyping and Their Preliminary Applications for Population Genetic Inference in Spotted Sea Bass (Lateolabrax maculatus)

    PubMed Central

    Wang, Juan; Xue, Dong-Xiu; Zhang, Bai-Dong; Li, Yu-Long; Liu, Bing-Jian; Liu, Jin-Xian

    2016-01-01

    Next-generation sequencing and the collection of genome-wide single-nucleotide polymorphisms (SNPs) allow identifying fine-scale population genetic structure and genomic regions under selection. The spotted sea bass (Lateolabrax maculatus) is a non-model species of ecological and commercial importance and is widely distributed in the northwestern Pacific. A total of 22 648 SNPs was discovered across the genome of L. maculatus by paired-end sequencing of restriction-site associated DNA (RAD-PE) for 30 individuals from two populations. The nucleotide diversity (π) for each population was 0.0028±0.0001 in Dandong and 0.0018±0.0001 in Beihai, respectively. Shallow but significant genetic differentiation was detected between the two populations analyzed by using both the whole data set (FST = 0.0550, P < 0.001) and the putatively neutral SNPs (FST = 0.0347, P < 0.001). However, the two populations were highly differentiated based on the putatively adaptive SNPs (FST = 0.6929, P < 0.001). Moreover, a total of 356 SNPs representing 298 unique loci were detected as outliers putatively under divergent selection by FST-based outlier tests as implemented in BAYESCAN and LOSITAN. Functional annotation of the contigs containing putatively adaptive SNPs yielded hits for 22 of 55 (40%) significant BLASTX matches. Candidate genes for local selection constituted a wide array of functions, including binding, catalytic and metabolic activities, etc. The analyses with the SNPs developed in the present study highlighted the importance of genome-wide genetic variation for inference of population structure and local adaptation in L. maculatus. PMID:27336696
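    For readers unfamiliar with the summary statistics quoted above, the sketch below shows one way to compute them. Nucleotide diversity (π) is taken as mean pairwise differences per site. F_ST uses a Hudson-style formula, F_ST = 1 - H_within/H_between. The abstract does not state which F_ST estimator was used, so Hudson's formulation here is an assumption. The outlier tests in BAYESCAN and LOSITAN are not reproduced, and the function names are hypothetical.

```python
from itertools import combinations, product


def _diffs(a: str, b: str) -> int:
    """Number of differing sites between two aligned haplotypes."""
    return sum(x != y for x, y in zip(a, b))


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site within one sample (pi)."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    return sum(_diffs(a, b) for a, b in pairs) / (len(pairs) * length)


def hudson_fst(pop1, pop2):
    """Hudson-style F_ST = 1 - H_within / H_between from aligned haplotypes.

    Raises ZeroDivisionError if the two samples share every haplotype exactly.
    """
    length = len(pop1[0])
    h_within = (nucleotide_diversity(pop1) + nucleotide_diversity(pop2)) / 2
    h_between = sum(_diffs(a, b) for a, b in product(pop1, pop2)) / (len(pop1) * len(pop2) * length)
    return 1.0 - h_within / h_between


# Hypothetical toy haplotypes for two populations.
pop_a = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
pop_b = ["TCGTACCA", "TCGTACCA", "TCGAACCA"]
print(nucleotide_diversity(pop_a), nucleotide_diversity(pop_b), hudson_fst(pop_a, pop_b))
```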

  8. Fastbreak: a tool for analysis and visualization of structural variations in genomic data

    PubMed Central

    2012-01-01

    Genomic studies are now being undertaken on thousands of samples requiring new computational tools that can rapidly analyze data to identify clinically important features. Inferring structural variations in cancer genomes from mate-paired reads is a combinatorially difficult problem. We introduce Fastbreak, a fast and scalable toolkit that enables the analysis and visualization of large amounts of data from projects such as The Cancer Genome Atlas. PMID:23046488
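    Structural-variant callers that work from read pairs typically start by classifying each pair by chromosome, orientation and insert size. The sketch below shows only that classification step and is not Fastbreak's code. It assumes a forward-reverse (FR) library with a known insert-size mean and standard deviation. Long-insert mate-pair libraries often use the opposite orientation, which would flip the orientation rules. The class, function and 3-SD cutoff are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(pair: ReadPair, mean_insert: float, sd_insert: float, n_sd: float = 3.0) -> str:
    """Classify a read pair from an FR library as concordant or as SV evidence."""
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"
    if pair.strand1 == pair.strand2:
        return "inversion"
    # Order the two reads by coordinate so that 'left' is the upstream read.
    (left_pos, left_strand), (right_pos, _) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if left_strand == "-":
        return "tandem_duplication"  # everted (RF) orientation
    insert = right_pos - left_pos  # approximate insert as the distance between read starts
    if insert > mean_insert + n_sd * sd_insert:
        return "deletion"
    if insert < mean_insert - n_sd * sd_insert:
        return "insertion"
    return "concordant"


print(classify_pair(ReadPair("chr1", 10_000, "+", "chr1", 15_000, "-"), mean_insert=350, sd_insert=40))  # deletion
```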

  9. Inferring causal genomic alterations in breast cancer using gene expression data

    PubMed Central

    2011-01-01

    Background One of the primary objectives in cancer research is to identify causal genomic alterations, such as somatic copy number variation (CNV) and somatic mutations, during tumor development. Many valuable studies lack genomic data to detect CNV; therefore, methods that are able to infer CNVs from gene expression data would help maximize the value of these studies. Results We developed a framework for identifying recurrent regions of CNV and distinguishing the cancer driver genes from the passenger genes in the regions. By inferring CNV regions across many datasets we were able to identify 109 recurrent amplified/deleted CNV regions. Many of these regions are enriched for genes involved in important processes associated with tumorigenesis and cancer progression. Genes in these recurrent CNV regions were then examined in the context of gene regulatory networks to prioritize putative cancer driver genes. The cancer driver genes uncovered by the framework include not only well-known oncogenes but also a number of novel cancer susceptibility genes validated via siRNA experiments. Conclusions To our knowledge, this is the first effort to systematically identify and validate drivers for expression-based CNV regions in breast cancer. The framework, in which wavelet analysis of expression-based copy number alteration is coupled with gene regulatory network analysis, provides a blueprint for leveraging genomic data to identify key regulatory components and gene targets. This integrative approach can be applied to many other large-scale gene expression studies and other novel types of cancer data such as next-generation sequencing based expression (RNA-Seq) as well as CNV data.
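    Smoothing expression along the genome to expose copy-number segments can be sketched with the simplest wavelet, the Haar wavelet. Discarding all Haar detail coefficients up to a given level is equivalent to replacing each block of 2^level consecutive genes by its mean. This is a hypothetical illustration that assumes log-ratio expression values already ordered by genomic position. The study's actual wavelet procedure, thresholds and network analysis are not reproduced, and the gain/loss cutoffs below are arbitrary.

```python
def haar_smooth(values, level):
    """Keep only the Haar approximation at scale 2**level.

    This is equivalent to replacing each block of 2**level consecutive values by its mean.
    """
    block = 2 ** level
    if len(values) % block:
        raise ValueError("length must be a multiple of 2**level")
    out = []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        out.extend([sum(chunk) / block] * block)
    return out


def call_regions(smoothed, gain=0.25, loss=-0.25):
    """Label each gene as gain / loss / neutral by thresholding smoothed log-ratios."""
    return ["gain" if v >= gain else "loss" if v <= loss else "neutral" for v in smoothed]


# Hypothetical log2 expression ratios for eight genes in genomic order.
log_ratios = [0.1, 0.5, 0.6, 0.4, -0.1, 0.0, -0.5, -0.7]
print(call_regions(haar_smooth(log_ratios, level=1)))
```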

  10. Structure identification in fuzzy inference using reinforcement learning

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.; Khedkar, Pratap

    1993-01-01

    In our previous work on the GARIC architecture, we have shown that the system can start with surface structure of the knowledge base (i.e., the linguistic expression of the rules) and learn the deep structure (i.e., the fuzzy membership functions of the labels used in the rules) by using reinforcement learning. Assuming the surface structure, GARIC refines the fuzzy membership functions used in the consequents of the rules using a gradient descent procedure. This hybrid fuzzy logic and reinforcement learning approach can learn to balance a cart-pole system and to back up a truck to its docking location after a few trials. In this paper, we discuss how to do structure identification using reinforcement learning in fuzzy inference systems. This involves identifying both surface as well as deep structure of the knowledge base. The term set of fuzzy linguistic labels used in describing the values of each control variable must be derived. In this process, splitting a label refers to creating new labels which are more granular than the original label and merging two labels creates a more general label. Splitting and merging of labels directly transform the structure of the action selection network used in GARIC by increasing or decreasing the number of hidden layer nodes.
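    Triangular membership functions make the idea of splitting and merging linguistic labels concrete. The abstract does not give GARIC's actual rules, so the split and merge operations below are hypothetical. A split yields two narrower, overlapping triangles, and a merge yields one wider triangle spanning both.

```python
from typing import List, Tuple

Triangle = Tuple[float, float, float]  # (left foot, peak, right foot)


def membership(label: Triangle, x: float) -> float:
    """Degree to which x belongs to a triangular fuzzy label."""
    a, b, c = label
    if x == b:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if b < x < c:
        return (c - x) / (c - b)
    return 0.0


def split(label: Triangle) -> List[Triangle]:
    """Hypothetical split into two narrower labels peaking at the midpoints of each flank.

    The two labels overlap, so every point strictly inside the original label is still covered.
    """
    a, b, c = label
    left_peak, right_peak = (a + b) / 2, (b + c) / 2
    return [(a, left_peak, right_peak), (left_peak, right_peak, c)]


def merge(first: Triangle, second: Triangle) -> Triangle:
    """Hypothetical merge into one wider label spanning both, peaking between their peaks."""
    return (min(first[0], second[0]), (first[1] + second[1]) / 2, max(first[2], second[2]))


medium = (0.0, 5.0, 10.0)  # a label such as "MEDIUM" on a hypothetical 0-10 scale
low_medium, high_medium = split(medium)
print(low_medium, high_medium)                                     # (0.0, 2.5, 7.5) (2.5, 7.5, 10.0)
print(membership(low_medium, 5.0), membership(high_medium, 5.0))  # 0.5 0.5
print(merge(low_medium, high_medium))                              # (0.0, 5.0, 10.0)
```

    For a symmetric label, merging the two halves produced by `split` recovers the original triangle. This keeps the two operations inverse to each other in the simplest case.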

  11. Integration of Multiple Genomic and Phenotype Data to Infer Novel miRNA-Disease Associations.

    PubMed

    Shi, Hongbo; Zhang, Guangde; Zhou, Meng; Cheng, Liang; Yang, Haixiu; Wang, Jing; Sun, Jie; Wang, Zhenzhen

    2016-01-01

    MicroRNAs (miRNAs) play an important role in the development and progression of human diseases. The identification of disease-associated miRNAs will be helpful for understanding the molecular mechanisms of diseases at the post-transcriptional level. Based on different types of genomic data sources, computational methods for miRNA-disease association prediction have been proposed. However, individual sources of genomic data tend to be incomplete and noisy; therefore, the integration of various types of genomic data for inferring reliable miRNA-disease associations is urgently needed. In this study, we present a computational framework, CHNmiRD, for identifying miRNA-disease associations by integrating multiple genomic and phenotype data, including protein-protein interaction data, gene ontology data, experimentally verified miRNA-target relationships, disease phenotype information and known miRNA-disease connections. The performance of CHNmiRD was evaluated against experimentally verified miRNA-disease associations, achieving an area under the ROC curve (AUC) of 0.834 in 5-fold cross-validation. In particular, CHNmiRD displayed excellent performance for diseases without any known related miRNAs. The results of case studies for three human diseases (glioblastoma, myocardial infarction and type 1 diabetes) showed that all of the top 10 ranked miRNAs having no known associations with these three diseases in existing miRNA-disease databases were directly or indirectly confirmed by our latest literature mining. All these results demonstrated the reliability and efficiency of CHNmiRD, and it is anticipated that CHNmiRD will serve as a powerful bioinformatics method for mining novel disease-related miRNAs and providing a new perspective into molecular mechanisms underlying human diseases at the post-transcriptional level. CHNmiRD is freely available at http://www.bio-bigdata.com/CHNmiRD. PMID:26849207
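    An AUC such as the one reported above can be read as the probability that a randomly chosen true association is ranked above a randomly chosen non-association. Below is a minimal sketch of that computation, using the Mann-Whitney formulation with ties counted as one half, together with a simple fold split. CHNmiRD's scoring function is not reproduced, and the helper names are hypothetical.

```python
def auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def k_fold_indices(n_items, k):
    """Split indices 0..n_items-1 into k interleaved folds (no shuffling)."""
    return [list(range(i, n_items, k)) for i in range(k)]


# Hypothetical scores for three known associations and three non-associations.
print(auc([0.9, 0.4, 0.7], [0.3, 0.5, 0.1]))  # 8/9, about 0.889
```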

  12. Adaptation, Ecology, and Evolution of the Halophilic Stromatolite Archaeon Halococcus hamelinensis Inferred through Genome Analyses

    PubMed Central

    Gudhka, Reema K.; Neilan, Brett A.; Burns, Brendan P.

    2015-01-01

    Halococcus hamelinensis was the first archaeon isolated from stromatolites. These geomicrobial ecosystems are thought to be some of the earliest known on Earth, yet, despite their evolutionary significance, the role of Archaea in these systems is still not well understood. Detailed here is the genome sequencing and analysis of an archaeon isolated from stromatolites. The genome of H. hamelinensis consisted of 3,133,046 base pairs with an average G+C content of 60.08% and contained 3,150 predicted coding sequences or ORFs, 2,196 (68.67%) of which were protein-coding genes with functional assignments and 954 (29.83%) of which were of unknown function. Codon usage of the H. hamelinensis genome was consistent with a highly acidic proteome, a major adaptive mechanism towards high salinity. Amino acid transport and metabolism, inorganic ion transport and metabolism, energy production and conversion, ribosomal structure, and unknown function COG genes were overrepresented. The genome of H. hamelinensis also revealed characteristics reflecting its survival in its extreme environment, including putative genes/pathways involved in osmoprotection, oxidative stress response, and UV damage repair. Finally, genome analyses indicated the presence of putative transposases as well as positive matches of genes of H. hamelinensis against various genomes of Bacteria, Archaea, and viruses, suggesting the potential for horizontal gene transfer. PMID:25709556
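    Genome-level statistics can be computed as below: G+C content, and the share of aspartate and glutamate residues as one simple proxy for proteome acidity. These are hypothetical helpers for illustration only. The abstract does not describe the authors' annotation pipeline.

```python
def gc_content(seq: str) -> float:
    """Fraction of G or C among unambiguous A/C/G/T bases (other symbols are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def acidic_fraction(protein: str) -> float:
    """Fraction of aspartate (D) and glutamate (E) residues in a protein sequence."""
    protein = protein.upper()
    return (protein.count("D") + protein.count("E")) / len(protein) if protein else 0.0


print(gc_content("ATGGCGCGCAAT"), acidic_fraction("MDEEKLDA"))
```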

  13. Adaptation, ecology, and evolution of the halophilic stromatolite archaeon Halococcus hamelinensis inferred through genome analyses.

    PubMed

    Gudhka, Reema K; Neilan, Brett A; Burns, Brendan P

    2015-01-01

    Halococcus hamelinensis was the first archaeon isolated from stromatolites. These geomicrobial ecosystems are thought to be some of the earliest known on Earth, yet, despite their evolutionary significance, the role of Archaea in these systems is still not well understood. Detailed here is the genome sequencing and analysis of an archaeon isolated from stromatolites. The genome of H. hamelinensis consisted of 3,133,046 base pairs with an average G+C content of 60.08% and contained 3,150 predicted coding sequences or ORFs, 2,196 (68.67%) of which were protein-coding genes with functional assignments and 954 (29.83%) of which were of unknown function. Codon usage of the H. hamelinensis genome was consistent with a highly acidic proteome, a major adaptive mechanism towards high salinity. Amino acid transport and metabolism, inorganic ion transport and metabolism, energy production and conversion, ribosomal structure, and unknown function COG genes were overrepresented. The genome of H. hamelinensis also revealed characteristics reflecting its survival in its extreme environment, including putative genes/pathways involved in osmoprotection, oxidative stress response, and UV damage repair. Finally, genome analyses indicated the presence of putative transposases as well as positive matches of genes of H. hamelinensis against various genomes of Bacteria, Archaea, and viruses, suggesting the potential for horizontal gene transfer. PMID:25709556

  14. Paleolithic Contingent in Modern Japanese: Estimation and Inference using Genome-wide Data

    PubMed Central

    He, Yungang; Wang, Wei R.; Xu, Shuhua; Jin, Li; SNP Consortium, Pan-Asia

    2012-01-01

    The genetic origins of Japanese populations have been controversial. Upper Paleolithic Japanese, i.e. Jomon, developed independently in the Japanese islands for more than 10,000 years until the isolation was ended with the influxes of continental immigrants about 2,000 years ago. However, knowledge of the origin of Jomon and its contribution to the genetic pool of contemporary Japanese is still limited, despite extensive studies using mtDNA and Y chromosomes. In this report, we aimed to infer the origin of Jomon and to estimate its contribution to Japanese by fitting an admixture model, with missing data from Jomon, to genome-wide data from 94 worldwide populations. Our results showed that the genetic contributions of Jomon, the Paleolithic contingent in Japanese, are 54.3∼62.3% in Ryukyuans and 23.1∼39.5% in mainland Japanese, respectively. Utilizing inferred allele frequencies of the Jomon population, we further showed the Paleolithic contingent in Japanese had a Northeast Asia origin. PMID:22482036
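    The core of an admixture estimate can be sketched as a least-squares fit of target allele frequencies to a mixture of two sources, p_target ≈ α p_source1 + (1 - α) p_source2. This is a simplified, hypothetical illustration. It assumes both sources' allele frequencies are known, whereas the study treats Jomon frequencies as missing data and infers them.

```python
def admixture_proportion(p_target, p_source1, p_source2):
    """Least-squares alpha in p_target ~ alpha * p_source1 + (1 - alpha) * p_source2.

    Minimizing sum((t - s2 - alpha * (s1 - s2))**2) gives
    alpha = sum((t - s2) * (s1 - s2)) / sum((s1 - s2)**2), clipped to [0, 1].
    """
    num = sum((t - s2) * (s1 - s2) for t, s1, s2 in zip(p_target, p_source1, p_source2))
    den = sum((s1 - s2) ** 2 for s1, s2 in zip(p_source1, p_source2))
    return min(1.0, max(0.0, num / den))


# Hypothetical allele frequencies at three loci; the target is an exact 30/70 mixture.
source1 = [0.2, 0.8, 0.5]
source2 = [0.6, 0.4, 0.1]
target = [0.48, 0.52, 0.22]
print(round(admixture_proportion(target, source1, source2), 3))  # 0.3
```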

  15. Inferring species divergence times using pairwise sequential Markovian coalescent modelling and low-coverage genomic data.

    PubMed

    Cahill, James A; Soares, André E R; Green, Richard E; Shapiro, Beth

    2016-07-19

    Understanding when species diverged aids in identifying the drivers of speciation, but the end of gene flow between populations can be difficult to ascertain from genetic data. We explore the use of pairwise sequential Markovian coalescent (PSMC) modelling to infer the timing of divergence between species and populations. PSMC plots generated using artificial hybrid genomes show rapid increases in effective population size at the time when the two parent lineages diverge, and this approach has been used previously to infer divergence between human lineages. We show that, even without high coverage or phased input data, PSMC can detect the end of significant gene flow between populations by comparing the PSMC output from artificial hybrids to the output of simulations with known demographic histories. We then apply PSMC to detect divergence times among lineages within two real datasets: great apes and bears within the genus Ursus. Our results confirm most previously proposed divergence times for these lineages, and suggest that gene flow between recently diverged lineages may have been common among bears and great apes, including up to one million years of continued gene flow between chimpanzees and bonobos after the formation of the Congo River. This article is part of the themed issue 'Dating species divergences using rocks and clocks'. PMID:27325835
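    PSMC takes a single diploid sequence as input. An artificial hybrid can therefore be built by merging one haploid sequence from each lineage and coding differing sites as heterozygous. The abstract does not describe the exact construction. The sketch below assumes two haploid consensus sequences aligned to the same reference, uses the standard IUPAC two-base ambiguity codes, and has a hypothetical function name.

```python
# Standard IUPAC codes for two-base (heterozygous) sites.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def artificial_hybrid(hap1: str, hap2: str) -> str:
    """Merge two aligned haploid sequences into one diploid sequence.

    Identical bases are kept, differing bases become an IUPAC heterozygous code,
    and a missing base ('N') or any unrecognized symbol makes the site 'N'.
    """
    if len(hap1) != len(hap2):
        raise ValueError("haploid sequences must be aligned to the same reference length")
    out = []
    for a, b in zip(hap1.upper(), hap2.upper()):
        if a == "N" or b == "N":
            out.append("N")
        elif a == b:
            out.append(a)
        else:
            out.append(IUPAC.get(frozenset((a, b)), "N"))
    return "".join(out)


print(artificial_hybrid("ACGTN", "AGGAA"))  # ASGWN
```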

  16. Inferring Selection Intensity and Allele Age from Multilocus Haplotype Structure

    PubMed Central

    Chen, Hua; Slatkin, Montgomery

    2013-01-01

    It is a challenging task to infer selection intensity and allele age from population genetic data. Here we present a method that can efficiently estimate selection intensity and allele age from the multilocus haplotype structure in the vicinity of a segregating mutant under positive selection. We use a structured-coalescent approach to model the effect of directional selection on the gene genealogies of neutral markers linked to the selected mutant. The frequency trajectory of the selected allele follows the Wright-Fisher model. Given the position of the selected mutant, we propose a simplified multilocus haplotype model that can efficiently model the dynamics of the ancestral haplotypes under the joint influence of selection and recombination. This model approximates the ancestral genealogies of the sample, which reduces the number of states from an exponential function of the number of single-nucleotide polymorphism loci to a quadratic function. This allows parameter inference from data covering DNA regions as large as several hundred kilobases. Importance sampling algorithms are adopted to evaluate the probability of a sample by exploring the space of both allele frequency trajectories of the selected mutation and gene genealogies of the linked sites. We demonstrate by simulation that the method can accurately estimate selection intensity for moderate and strong positive selection. We apply the method to a data set of the G6PD gene in an African population and obtain an estimate of 0.0456 (95% confidence interval 0.0144−0.0769) for the selection intensity. The proposed method is novel in jointly modeling the multilocus haplotype pattern caused by recombination and mutation, allowing the analysis of haplotype data in recombining regions. Moreover, the method is applicable to data from populations under exponential growth and a variety of other demographic histories. PMID:23797107
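    A Wright-Fisher trajectory for a positively selected allele can be sketched as a deterministic selection step followed by binomial resampling. This is a minimal forward simulation under assumed parameters: haploid-style relative fitness 1 + s and a constant number of gene copies. The paper's importance-sampling inference, which explores such trajectories jointly with gene genealogies, is not reproduced here. The function name is hypothetical.

```python
import random


def wf_trajectory(n_copies: int, s: float, p0: float, max_gen: int, seed=None):
    """Simulate the frequency of a selected allele under a Wright-Fisher model.

    Each generation: (1) deterministic selection with relative fitness 1 + s for
    carriers, p' = p(1 + s) / (1 + s p); then (2) binomial resampling of n_copies
    gene copies. Stops early when the allele is lost (0) or fixed (1).
    """
    rng = random.Random(seed)
    p = p0
    trajectory = [p]
    for _ in range(max_gen):
        p_sel = p * (1 + s) / (1 + s * p)
        count = sum(rng.random() < p_sel for _ in range(n_copies))  # Binomial(n_copies, p_sel)
        p = count / n_copies
        trajectory.append(p)
        if p == 0.0 or p == 1.0:
            break
    return trajectory


traj = wf_trajectory(n_copies=1_000, s=0.05, p0=0.05, max_gen=500, seed=42)
print(len(traj) - 1, "generations; final frequency", traj[-1])
```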

  17. Genome-Scale Phylogenetics: Inferring the Plant Tree of Life from 18,896 Gene Trees

    PubMed Central

    Burleigh, J. Gordon; Bansal, Mukul S.; Eulenstein, Oliver; Hartmann, Stefanie; Wehe, André; Vision, Todd J.

    2011-01-01

    Phylogenetic analyses using genome-scale data sets must confront incongruence among gene trees, which in plants is exacerbated by frequent gene duplications and losses. Gene tree parsimony (GTP) is a phylogenetic optimization criterion in which a species tree that minimizes the number of gene duplications induced among a set of gene trees is selected. The run time performance of previous implementations has limited its use on large-scale data sets. We used new software that incorporates recent algorithmic advances to examine the performance of GTP on a plant data set consisting of 18,896 gene trees containing 510,922 protein sequences from 136 plant taxa (giving a combined alignment length of >2.9 million characters). The relationships inferred from the GTP analysis were largely consistent with previous large-scale studies of backbone plant phylogeny and resolved some controversial nodes. The placement of taxa that were present in few gene trees generally varied the most among GTP bootstrap replicates. Excluding these taxa either before or after the GTP analysis revealed high levels of phylogenetic support across plants. The analyses supported magnoliids sister to a eudicot + monocot clade and did not support the eurosid I and II clades. This study presents a nuclear genomic perspective on the broad-scale phylogenetic relationships among plants, and it demonstrates that nuclear genes with a history of duplication and loss can be phylogenetically informative for resolving the plant tree of life. PMID:21186249
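    Gene tree parsimony needs a way to count the duplications that a gene tree implies for a given species tree. This sketch assumes a common approach, LCA reconciliation. Each gene-tree node is mapped to the least common ancestor of its species in the species tree. A node is called a duplication when it maps to the same species node as one of its children. The sketch scores a short, hand-made list of candidate species trees, whereas real analyses search tree space. Trees are nested tuples, and all names are hypothetical.

```python
from typing import Callable, Union

Tree = Union[str, tuple]  # leaves are labels; internal nodes are tuples of subtrees


def clusters(tree: Tree) -> list:
    """Leaf sets (frozensets) for every node of a rooted tree."""
    found = []

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        found.append(leaves)
        return leaves

    walk(tree)
    return found


def count_duplications(gene_tree: Tree, species_tree: Tree,
                       species_of: Callable[[str], str] = lambda g: g) -> int:
    """Duplications implied by LCA reconciliation of a gene tree with a species tree."""
    species_clusters = clusters(species_tree)

    def lca(species_set):
        # Clusters containing the set are nested, so the smallest one is the LCA.
        return min((c for c in species_clusters if species_set <= c), key=len)

    duplications = 0

    def walk(node):
        nonlocal duplications
        if isinstance(node, str):
            return lca(frozenset([species_of(node)]))
        child_maps = [walk(child) for child in node]
        here = lca(frozenset().union(*child_maps))
        if any(m == here for m in child_maps):
            duplications += 1
        return here

    walk(gene_tree)
    return duplications


def gtp_best(gene_trees, candidate_species_trees, species_of=lambda g: g):
    """Pick the candidate species tree with the fewest total duplications."""
    return min(candidate_species_trees,
               key=lambda st: sum(count_duplications(gt, st, species_of) for gt in gene_trees))


species_of = lambda g: g[0]  # hypothetical naming: first character is the species
gene_trees = [(("A1", "B1"), "C1"), ((("A1", "B1"), "C1"), (("A2", "B2"), "C2"))]
candidates = [(("A", "B"), "C"), (("A", "C"), "B")]
print(gtp_best(gene_trees, candidates, species_of))  # (('A', 'B'), 'C')
```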

  18. ecoPrimers: inference of new DNA barcode markers from whole genome sequence analysis

    PubMed Central

    Riaz, Tiayyba; Shehzad, Wasim; Viari, Alain; Pompanon, François; Taberlet, Pierre; Coissac, Eric

    2011-01-01

    Using non-conventional markers, DNA metabarcoding allows biodiversity assessment from complex substrates. In this article, we present ecoPrimers, a software for identifying new barcode markers and their associated PCR primers. ecoPrimers scans whole genomes to find such markers without a priori knowledge. ecoPrimers optimizes two quality indices measuring taxonomical range and discrimination to select the most efficient markers from a set of reference sequences, according to specific experimental constraints such as marker length or specifically targeted taxa. The key step of the algorithm is the identification of conserved regions among reference sequences for anchoring primers. We propose an efficient algorithm based on data mining that allows the analysis of huge sets of sequences. We evaluate the efficiency of ecoPrimers by running it on three different sequence sets: mitochondrial, chloroplast and bacterial genomes. Identified barcode markers correspond either to barcode regions already in use for plants or animals, or to new potential barcodes. Results from empirical experiments carried out on a promising new barcode for analyzing vertebrate diversity fully agree with expectations based on bioinformatics analysis. These tests demonstrate the efficiency of ecoPrimers for inferring new barcodes suited to diverse experimental contexts. ecoPrimers is available as an open source project at: http://www.grenoble.prabi.fr/trac/ecoPrimers. PMID:21930509
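    The two quality indices can be illustrated under simplifying assumptions. A coverage score is the share of reference taxa that a primer pair amplifies. A specificity score is the share of amplified taxa whose barcode sequence is unique to them. The function below is a hypothetical stand-in, not ecoPrimers' implementation. It skips primer matching entirely by taking amplicons as given.

```python
from collections import Counter


def coverage_and_specificity(amplicons):
    """Score a candidate barcode marker.

    amplicons maps taxon -> amplified barcode sequence, or None if the taxon is not amplified.
    coverage: fraction of taxa amplified.
    specificity: fraction of amplified taxa whose barcode sequence is unique to them.
    """
    amplified = {taxon: seq for taxon, seq in amplicons.items() if seq is not None}
    coverage = len(amplified) / len(amplicons)
    counts = Counter(amplified.values())
    unique = sum(1 for seq in amplified.values() if counts[seq] == 1)
    specificity = unique / len(amplified) if amplified else 0.0
    return coverage, specificity


# Hypothetical amplicons for four reference taxa.
print(coverage_and_specificity({"t1": "AAC", "t2": "AAC", "t3": "GGT", "t4": None}))
```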

  19. Inferring Quantitative Trait Pathways Associated with Bull Fertility from a Genome-Wide Association Study

    PubMed Central

    Peñagaricano, Francisco; Weigel, Kent A.; Rosa, Guilherme J. M.; Khatib, Hasan

    2013-01-01

    Whole-genome association studies typically focus on genetic markers with the strongest evidence of association. However, single markers often explain only a small component of the genetic variance and hence offer a limited understanding of the trait under study. As such, the objective of this study was to perform a pathway-based association analysis in Holstein dairy cattle in order to identify relevant pathways involved in bull fertility. The results of a single-marker association analysis, using 1,755 bulls with sire conception rate data and genotypes for 38,650 single nucleotide polymorphisms (SNPs), were used in this study. A total of 16,819 annotated genes, including 2,767 significantly associated with bull fertility, were used to interrogate a total of 662 Gene Ontology (GO) terms and 248 InterPro (IP) entries using a test of proportions based on the cumulative hypergeometric distribution. After multiple-testing correction, 20 GO categories and one IP entry showed significant overrepresentation of genes statistically associated with bull fertility. Several of these functional categories such as small GTPase-mediated signal transduction, neurogenesis, calcium ion binding, and cytoskeleton are known to be involved in biological processes closely related to male fertility. These results could provide insight into the genetic architecture of this complex trait in dairy cattle. In addition, this study shows that quantitative trait pathways inferred from single-marker analyses could enhance our interpretations of the results of genome-wide association studies. PMID:23335935
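    A test of proportions based on the cumulative hypergeometric distribution asks how surprising it is to see k or more significant genes among the genes annotated to a term, given the overall background. A minimal standard-library sketch with hypothetical names follows. Multiple-testing correction across terms is left out.

```python
from math import comb


def hypergeom_upper_tail(k: int, total: int, hits: int, drawn: int) -> float:
    """P(X >= k) for X ~ Hypergeometric.

    `drawn` genes (those annotated to a term) are sampled without replacement from
    `total` genes, of which `hits` are significant.
    """
    upper = min(hits, drawn)
    return sum(comb(hits, i) * comb(total - hits, drawn - i) for i in range(k, upper + 1)) / comb(total, drawn)


# Background counts from the abstract (16,819 genes, 2,767 significant); the term's
# counts (120 annotated genes, 35 of them significant) are hypothetical.
p_value = hypergeom_upper_tail(k=35, total=16_819, hits=2_767, drawn=120)
print(f"P(X >= 35) = {p_value:.3g}")
```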

  20. Genomic heterogeneity of historical gene flow between two species of newts inferred from transcriptome data.

    PubMed

    Stuglik, Michał T; Babik, Wiesław

    2016-07-01

    The role of gene flow in species formation is a major unresolved issue in speciation biology. Progress in this area requires information on the long-term patterns of gene flow between diverging species. Here, we used thousands of single-nucleotide polymorphisms derived from transcriptome resequencing and a method modeling the joint frequency spectrum of these polymorphisms to reconstruct patterns of historical gene flow between two Lissotriton newts: L. vulgaris (Lv) and L. montandoni (Lm). We tested several models of divergence including complete isolation and various scenarios of historical gene flow. The model of secondary contact received the highest support. According to this model, the species split from their common ancestor ca. 5.5 million years (MY) ago, evolved in isolation for ca. 2 MY, and have been exchanging genes for the last 3.5 MY. Demographic changes have been inferred in both species, with the current effective population size of ca. 0.7 million in Lv and 0.2 million in Lm. The postdivergence gene flow resulted in two-directional introgression which affected the genomes of both species, but was more pronounced from Lv to Lm. Interestingly, we found evidence for genomic heterogeneity of interspecific gene flow. This study demonstrates the complexity of long-term gene flow between distinct but incompletely reproductively isolated taxa whose divergence was initiated millions of years ago. PMID:27386093
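    The joint frequency spectrum is a matrix that counts SNPs by their allele count in each species' sample. The following minimal sketch builds it from per-SNP counts. It assumes polarized (derived-allele) counts. The model fitting that compares isolation and secondary-contact scenarios is not shown, and the function name is hypothetical.

```python
def joint_sfs(derived_counts, n1, n2):
    """Build the 2D joint site frequency spectrum.

    derived_counts: iterable of (count_in_pop1, count_in_pop2) per SNP, where the
    counts range over 0..n1 and 0..n2 sampled gene copies.
    Returns an (n1 + 1) x (n2 + 1) nested list of SNP counts.
    """
    sfs = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for c1, c2 in derived_counts:
        if not (0 <= c1 <= n1 and 0 <= c2 <= n2):
            raise ValueError(f"count out of range: {(c1, c2)}")
        sfs[c1][c2] += 1
    return sfs


# Hypothetical derived-allele counts for five SNPs, 4 gene copies sampled per species.
for row in joint_sfs([(1, 0), (1, 0), (4, 3), (0, 2), (2, 2)], n1=4, n2=4):
    print(row)
```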

  1. Orthology Inference in Nonmodel Organisms Using Transcriptomes and Low-Coverage Genomes: Improving Accuracy and Matrix Occupancy for Phylogenomics

    PubMed Central

    Yang, Ya; Smith, Stephen A.

    2014-01-01

    Orthology inference is central to phylogenomic analyses. Phylogenomic data sets commonly include transcriptomes and low-coverage genomes that are incomplete and contain errors and isoforms. These properties can severely violate the underlying assumptions of orthology inference with existing heuristics. We present a procedure that uses phylogenies for both homology and orthology assignment. The procedure first uses similarity scores to infer putative homologs that are then aligned, constructed into phylogenies, and pruned of spurious branches caused by deep paralogs, misassembly, frameshifts, or recombination. These final homologs are then used to identify orthologs. We explore four alternative tree-based orthology inference approaches, of which two are new. These accommodate gene and genome duplications as well as gene tree discordance. We demonstrate these methods in three published data sets including the grape family, Hymenoptera, and millipedes with divergence times ranging from approximately 100 to over 400 Ma. The procedure significantly increased the completeness and accuracy of the inferred homologs and orthologs. We also found that data sets that are more recently diverged and/or include more high-coverage genomes had more complete sets of orthologs. To explicitly evaluate sources of conflicting phylogenetic signals, we applied serial jackknife analyses of gene regions keeping each locus intact. The methods described here can scale to over 100 taxa. They have been implemented in python with independent scripts for each step, making it easy to modify or incorporate them into existing pipelines. All scripts are available from https://bitbucket.org/yangya/phylogenomic_dataset_construction. PMID:25158799
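    One simple tree-based idea in this area is to keep clades in which no taxon appears more than once. The sketch below recursively descends a rooted homolog tree, written as nested tuples, and returns the largest such clades. It is a hypothetical simplification and not one of the four approaches in the paper, which also handle unrooted trees and prune spurious branches. The authors' scripts are available from the Bitbucket repository cited above.

```python
from typing import Callable, List, Union

Tree = Union[str, tuple]  # leaves are sequence labels; internal nodes are tuples of subtrees


def leaves(tree: Tree) -> List[str]:
    """All leaf labels of a tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def one_to_one_clades(tree: Tree, species_of: Callable[[str], str], min_taxa: int = 2) -> List[Tree]:
    """Maximal clades in which no taxon occurs twice and at least min_taxa taxa are present."""
    taxa = [species_of(leaf) for leaf in leaves(tree)]
    if len(taxa) == len(set(taxa)):
        return [tree] if len(taxa) >= min_taxa else []
    # A repeated taxon means this clade mixes paralogs: descend into its children.
    return [clade for child in tree for clade in one_to_one_clades(child, species_of, min_taxa)]


species_of = lambda leaf: leaf[0]  # hypothetical naming: first letter is the taxon
homolog_tree = ((("a1", "b1"), "c1"), (("a2", "b2"), "c2"))
print(one_to_one_clades(homolog_tree, species_of, min_taxa=3))
# [(('a1', 'b1'), 'c1'), (('a2', 'b2'), 'c2')]
```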

  2. Orthology inference in nonmodel organisms using transcriptomes and low-coverage genomes: improving accuracy and matrix occupancy for phylogenomics.

    PubMed

    Yang, Ya; Smith, Stephen A

    2014-11-01

    Orthology inference is central to phylogenomic analyses. Phylogenomic data sets commonly include transcriptomes and low-coverage genomes that are incomplete and contain errors and isoforms. These properties can severely violate the underlying assumptions of orthology inference with existing heuristics. We present a procedure that uses phylogenies for both homology and orthology assignment. The procedure first uses similarity scores to infer putative homologs that are then aligned, constructed into phylogenies, and pruned of spurious branches caused by deep paralogs, misassembly, frameshifts, or recombination. These final homologs are then used to identify orthologs. We explore four alternative tree-based orthology inference approaches, of which two are new. These accommodate gene and genome duplications as well as gene tree discordance. We demonstrate these methods in three published data sets including the grape family, Hymenoptera, and millipedes with divergence times ranging from approximately 100 to over 400 Ma. The procedure significantly increased the completeness and accuracy of the inferred homologs and orthologs. We also found that data sets that are more recently diverged and/or include more high-coverage genomes had more complete sets of orthologs. To explicitly evaluate sources of conflicting phylogenetic signals, we applied serial jackknife analyses of gene regions keeping each locus intact. The methods described here can scale to over 100 taxa. They have been implemented in python with independent scripts for each step, making it easy to modify or incorporate them into existing pipelines. All scripts are available from https://bitbucket.org/yangya/phylogenomic_dataset_construction. PMID:25158799

  3. PICARA, an analytical pipeline providing probabilistic inference about a priori candidate genes underlying genome-wide association QTL in plants

    Technology Transfer Automated Retrieval System (TEKTRAN)

    PICARA is an analytical pipeline designed to systematically summarize observed SNP/trait associations identified by genome-wide association studies (GWAS) and to identify candidate genes involved in the regulation of complex trait variation. The pipeline provides probabilistic inference about a prio...

  4. Structural Inference in the Art of Violin Making.

    NASA Astrophysics Data System (ADS)

    Morse-Fortier, Leonard Joseph

    The "secrets" of success of early Italian violins have long been sought. Among their many efforts to reproduce the results of Stradivari, Guarneri, and Amati, luthiers have attempted to order and match natural resonant frequencies in the free violin plates. This tap-tone plate tuning technique is simply an eigenvalue extraction scheme. In the final stages of carving, the violin maker complements considerable intuitive knowledge of violin plate structure and of modal attributes with tap-tone frequency estimates to better understand plate structure and to inform decisions about plate carving and completeness. Examining the modal attributes of violin plates, this work develops and incorporates an impulse-response scheme for modal inference, measures resonant frequencies and modeshapes for a pair of violin plates, and presents modeshapes through a unique computer visualization scheme developed specifically for this purpose. Through simple examples, the work explores how plate modal attributes reflect underlying structure and how modeshapes and frequencies so-called "evolve" through assembly of the violin. Separately, the work develops computer code for a carved, anisotropic, plate/shell finite element. Solutions to the static displacement and free-vibration eigenvalue problems are found for an orthotropic plate and used to verify element accuracy. Next, a violin back plate is modelled with full consideration of plate thickness and arching. Model estimates for modal attributes compare very well against experimentally acquired values. Finally, the modal synthesis technique is applied to predict the modal attributes of the violin top plate with ribs attached from those of the top plate alone, together with an estimate of rib mass and stiffness. This last analysis serves to verify the modal synthesis method and to quantify its limits of applicability in problems with severe structural modification. Conclusions
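    The violin-plate abstract describes tap-tone tuning and free-vibration analysis as eigenvalue extraction. In that framing, natural frequencies ω and modeshapes φ satisfy K φ = ω² M φ for stiffness matrix K and mass matrix M. The sketch below solves that problem for a toy two-degree-of-freedom spring-mass chain. It assumes a lumped (diagonal) mass matrix, and its stiffness and mass values are made up; nothing in it is taken from the violin plate data or the author's finite element.

```python
import numpy as np


def natural_frequencies(K: np.ndarray, M: np.ndarray):
    """Solve K phi = omega^2 M phi, assuming a diagonal (lumped) mass matrix M.

    The problem is transformed to the symmetric standard form
    A = M^-1/2 K M^-1/2 so numpy.linalg.eigh applies. Returns frequencies
    in Hz (ascending) and mode shapes as columns in the original coordinates.
    """
    m_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(M)))
    A = m_inv_sqrt @ K @ m_inv_sqrt
    eigvals, vecs = np.linalg.eigh(A)                 # eigenvalues are omega^2, ascending
    omegas = np.sqrt(np.clip(eigvals, 0.0, None))     # clip tiny negative round-off
    modes = m_inv_sqrt @ vecs                         # back-transform to physical coordinates
    return omegas / (2.0 * np.pi), modes


# Toy chain: wall - k1 - m1 - k2 - m2 (illustrative values only)
k1, k2, m1, m2 = 1000.0, 1000.0, 1.0, 1.0
K = np.array([[k1 + k2, -k2], [-k2, k2]])
M = np.diag([m1, m2])
freqs, modes = natural_frequencies(K, M)
```

    A real plate model would supply far larger K and M from finite elements, and a consistent (non-diagonal) mass matrix would call for a generalized symmetric solver instead of the diagonal shortcut used here.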

  5. Effect of sampling on the extent and accuracy of the inferred genetic history of recombining genome.

    PubMed

    Platt, Daniel E; Utro, Filippo; Parida, Laxmi

    2014-06-01

    Accessible biotechnology is enabling the cataloging of genetic variants in individuals in populations at unprecedented scales. The use of phylogeny of the individuals within populations allows a model-based approach to studying these variations, which is important in understanding relationships between and across populations. For the somatic genome, however, the phylogeny must take recombinations (and other genetic mixing events) into account. Hence the resulting topology is more complex than a tree. Unlike a tree topology, it is not as apparent which events are visible from the extant samples. An earlier work presented a mathematical model (called the minimal descriptor) for teasing apart the inherent visible information from that which any specific algorithm might see. We use this framework to study the effect of sampling sizes on the overall inferred genetic history. In this paper, we seek to understand the extent, characteristics (in terms of recent versus ancient genetic events) and reliability of what was resolvable within field samples drawn from modern populations. We observed that most of the visible ancient events are recoverable from relatively small sample sizes. However, without identification of this relatively small minority of ancient genetic events, most of the signal will appear to reflect modern events and admixtures. We also found that the more ancient events are likely to be reproduced with higher fidelity between multiple samplings, and that the identified older events are less likely to yield false positive discrimination between populations. We conclude that a recombinant phylogenetic reconstruction is necessary to identify which markers are most likely to discriminate ancient events, and to discriminate between populations with lower risk of false positives. Secondly, on a broader note, this study also provides a general methodology for a critical assessment of the inferred common genetic history of populations (say, in plant cultivars or

  6. Genome Structure Gallery from the Mycobacterium Tuberculosis Structural Genomics Consortium

    DOE Data Explorer

    The TB Structural Genomics Consortium works with the structures of proteins from M. tuberculosis, analyzing these structures in the context of functional information that currently exists and that the Consortium generates. The database of linked structural and functional information constructed from this project will form a lasting basis for understanding M. tuberculosis pathogenesis and for structure-based drug design. The Consortium's structural and functional information is publicly available. The Structures Gallery makes more than 650 total structures available by PDB identifier. Some of these are not consortium targets, but all are viewable in 3D color and can be manipulated in various ways by Jmol, an open-source Java viewer for chemical structures in 3D from http://www.jmol.org/

  7. msCentipede: Modeling Heterogeneity across Genomic Sites and Replicates Improves Accuracy in the Inference of Transcription Factor Binding

    PubMed Central

    Gilad, Yoav; Pritchard, Jonathan K.; Stephens, Matthew

    2015-01-01

    Understanding global gene regulation depends critically on accurate annotation of regulatory elements that are functional in a given cell type. CENTIPEDE, a powerful, probabilistic framework for identifying transcription factor binding sites from tissue-specific DNase I cleavage patterns and genomic sequence content, leverages the hypersensitivity of factor-bound chromatin and the information in the DNase I spatial cleavage profile characteristic of each DNA binding protein to accurately infer functional factor binding sites. However, the model for the spatial profile in this framework fails to account for the substantial variation in the DNase I cleavage profiles across different binding sites. Neither does it account for variation in the profiles at the same binding site across multiple replicate DNase I experiments, which are increasingly available. In this work, we introduce new methods, based on multi-scale models for inhomogeneous Poisson processes, to account for such variation in DNase I cleavage patterns both within and across binding sites. These models account for the spatial structure in the heterogeneity in DNase I cleavage patterns for each factor. Using DNase-seq measurements assayed in a lymphoblastoid cell line, we demonstrate the improved performance of this model for several transcription factors by comparing against the ChIP-seq peaks for those factors. Finally, we explore the effects of DNase I sequence bias on inference of factor binding using a simple extension to our framework that allows for a more flexible background model. The proposed model can also be easily applied to paired-end ATAC-seq and DNase-seq data. msCentipede, a Python implementation of our algorithm, is available at http://rajanil.github.io/msCentipede. PMID:26406244

  8. msCentipede: Modeling Heterogeneity across Genomic Sites and Replicates Improves Accuracy in the Inference of Transcription Factor Binding.

    PubMed

    Raj, Anil; Shim, Heejung; Gilad, Yoav; Pritchard, Jonathan K; Stephens, Matthew

    2015-01-01

    Understanding global gene regulation depends critically on accurate annotation of regulatory elements that are functional in a given cell type. CENTIPEDE, a powerful, probabilistic framework for identifying transcription factor binding sites from tissue-specific DNase I cleavage patterns and genomic sequence content, leverages the hypersensitivity of factor-bound chromatin and the information in the DNase I spatial cleavage profile characteristic of each DNA binding protein to accurately infer functional factor binding sites. However, the model for the spatial profile in this framework fails to account for the substantial variation in the DNase I cleavage profiles across different binding sites. Neither does it account for variation in the profiles at the same binding site across multiple replicate DNase I experiments, which are increasingly available. In this work, we introduce new methods, based on multi-scale models for inhomogeneous Poisson processes, to account for such variation in DNase I cleavage patterns both within and across binding sites. These models account for the spatial structure in the heterogeneity in DNase I cleavage patterns for each factor. Using DNase-seq measurements assayed in a lymphoblastoid cell line, we demonstrate the improved performance of this model for several transcription factors by comparing against the ChIP-seq peaks for those factors. Finally, we explore the effects of DNase I sequence bias on inference of factor binding using a simple extension to our framework that allows for a more flexible background model. The proposed model can also be easily applied to paired-end ATAC-seq and DNase-seq data. msCentipede, a Python implementation of our algorithm, is available at http://rajanil.github.io/msCentipede. PMID:26406244
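    The msCentipede abstract mentions multi-scale models for inhomogeneous Poisson processes. A common form of such a model recursively halves a count profile. At each node, the count in the left half, conditioned on the node's total, is treated as binomial, so the profile's shape is captured by a tree of split probabilities. The sketch below shows only that generic decomposition plus a binomial log-likelihood. It is not the msCentipede code, and the function names and the power-of-two window length are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Split = Tuple[int, int]  # (count in left half, total count in the block)


def multiscale_decompose(counts: Sequence[int]) -> List[List[Split]]:
    """Recursively halve a cleavage-count profile (length must be a power of two).

    Returns one list of (left_count, total) pairs per scale, coarsest first.
    """
    n = len(counts)
    if n == 0 or n & (n - 1):
        raise ValueError("profile length must be a positive power of two")
    scales: List[List[Split]] = []
    blocks = [list(counts)]
    while len(blocks[0]) > 1:
        level, next_blocks = [], []
        for block in blocks:
            half = len(block) // 2
            left, right = block[:half], block[half:]
            level.append((sum(left), sum(block)))
            next_blocks.extend([left, right])
        scales.append(level)
        blocks = next_blocks
    return scales


def binomial_loglik(scales: List[List[Split]], probs: List[List[float]]) -> float:
    """Sum of binomial log-likelihoods; probs mirrors scales, each p strictly in (0, 1)."""
    total = 0.0
    for level, level_p in zip(scales, probs):
        for (k, n), p in zip(level, level_p):
            total += math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log(1 - p)
    return total
```

    In a binding-site model, the split probabilities would be learned per factor, and the abstract's point is that these probabilities should be allowed to vary across sites and replicates rather than being fixed.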

  9. Efficient inference of population size histories and locus-specific mutation rates from large-sample genomic variation data

    PubMed Central

    Bhaskar, Anand; Wang, Y.X. Rachel; Song, Yun S.

    2015-01-01

    With the recent increase in study sample sizes in human genetics, there has been growing interest in inferring historical population demography from genomic variation data. Here, we present an efficient inference method that can scale up to very large samples, with tens or hundreds of thousands of individuals. Specifically, by utilizing analytic results on the expected frequency spectrum under the coalescent and by leveraging the technique of automatic differentiation, which allows us to compute gradients exactly, we develop a very efficient algorithm to infer piecewise-exponential models of the historical effective population size from the distribution of sample allele frequencies. Our method is orders of magnitude faster than previous demographic inference methods based on the frequency spectrum. In addition to inferring demography, our method can also accurately estimate locus-specific mutation rates. We perform extensive validation of our method on simulated data and show that it can accurately infer multiple recent epochs of rapid exponential growth, a signal that is difficult to pick up with small sample sizes. Lastly, we use our method to analyze data from recent sequencing studies, including a large-sample exome-sequencing data set of tens of thousands of individuals assayed at a few hundred genic regions. PMID:25564017
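    The Bhaskar, Wang, and Song abstract builds on analytic results for the expected site frequency spectrum (SFS) under the coalescent. The simplest such result is the constant-size baseline, in which the expected number of sites where the derived allele appears in i of n sampled sequences is θ/i. Summing over i gives Watterson's estimator θ = S / a_n, with a_n = Σ_{i=1}^{n-1} 1/i. The sketch below computes only this textbook baseline. It does not implement the paper's piecewise-exponential model or its automatic differentiation, and the function names are hypothetical.

```python
from typing import List


def expected_sfs_constant(n: int, theta: float) -> List[float]:
    """Expected unfolded SFS under a constant-size coalescent: E[xi_i] = theta / i, i = 1..n-1."""
    return [theta / i for i in range(1, n)]


def watterson_theta(num_segregating: int, n: int) -> float:
    """Watterson's estimator theta_W = S / a_n, where a_n = sum_{i=1}^{n-1} 1/i."""
    a_n = sum(1.0 / i for i in range(1, n))
    return num_segregating / a_n
```

    Demographic inference methods of the kind described fit a model whose expected SFS departs from this θ/i shape. Recent exponential growth, for example, inflates the singleton class, which is why very large samples help resolve it.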

  10. Inference of upper-mantle density structure from seismic velocities

    NASA Astrophysics Data System (ADS)

    Nettles, M.; Dziewonski, A. M.

    2005-12-01

    The inverse problem for the determination of density structure from perturbations in the gravity field is highly nonunique. The combination of gravity data and other observables can, however, be used to make inferences about the Earth's density structure in three dimensions. We use the three-dimensional shear-wave velocity model of Nettles and Dziewonski (2005) to make a forward prediction of the Earth's gravity field using simple assumptions about the relationship between perturbations in shear velocity and density. A scaling factor $f = 0.25$ relating perturbations in shear velocity and density ($\delta\rho/\rho = f \cdot \delta v_S/v_S$) is determined empirically by comparison of observed variations in shear velocity in oceanic regions with density variations predicted from a simple model of conductive cooling. This value agrees well with $f = 0.27$ based on the laboratory results of Jackson et al. (1992). The observed gravity signal in the oceans is explained well by this simple thermal-scaling approach. Behavior in some continental regions, such as the Basin and Range and the East African rift zone, is found to be similar to that in the oceans: the high topography in these regions appears to be supported by hot, low-density mantle underneath, a result also found by Kaban and Mooney (2001) for the Basin and Range. A velocity-to-density scaling relationship based only on thermal considerations is clearly inadequate in regions of continental craton, where such scaling leads to unrealistically large perturbations in the predicted gravity field. This result suggests that non-thermal effects must counteract the high density that would occur due to thermal effects alone, consistent with the suggestion of Jordan (1975) and other workers that density increases due to cool temperatures in the continental roots must be balanced by density decreases due to compositional variations. Using the compositional derivatives for density and shear velocity with respect to Mg# determined by Lee (2003), and an
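    The upper-mantle abstract relates relative shear-velocity and density perturbations through δρ/ρ = f · δv_S/v_S, with f = 0.25 determined empirically and f = 0.27 from laboratory results. The sketch below applies that linear thermal scaling to relative anomalies. The function names are hypothetical, and the sketch ignores the compositional correction that the abstract says is needed beneath cratons.

```python
from typing import Iterable, List

F_EMPIRICAL = 0.25  # scaling factor determined from oceanic regions (per the abstract)
F_LAB = 0.27        # value based on Jackson et al. (1992), quoted in the abstract


def density_perturbation(dvs_over_vs: float, f: float = F_EMPIRICAL) -> float:
    """Purely thermal scaling: d(rho)/rho = f * d(v_S)/v_S."""
    return f * dvs_over_vs


def scale_profile(dvs_over_vs: Iterable[float], f: float = F_EMPIRICAL) -> List[float]:
    """Apply the same scaling to a sequence of relative shear-velocity anomalies."""
    return [density_perturbation(x, f) for x in dvs_over_vs]
```

    For example, a -2% shear-velocity anomaly maps to about -0.5% in density with f = 0.25 and about -0.54% with f = 0.27. Those density anomalies would then feed a forward gravity calculation.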

  11. Genome-wide Membrane Protein Structure Prediction

    PubMed Central

    Piccoli, Stefano; Suku, Eda; Garonzi, Marianna; Giorgetti, Alejandro

    2013-01-01

    Transmembrane proteins allow cells to communicate extensively with the external world in a very accurate and specific way. They form principal nodes in several signaling pathways and attract large interest for therapeutic intervention, as the majority of pharmaceutical compounds target membrane proteins. Thus, beyond current genome annotation methods, a detailed structural/functional characterization at the protein level of each of the elements encoded in the genome is also required. The extreme difficulty of obtaining high-resolution three-dimensional structures calls for computational approaches. Here we review the extent to which efforts made in the last few years, combining the structural characterization of membrane proteins with protein bioinformatics techniques, could help describe membrane proteins at a genome-wide scale. In particular, we analyze the use of comparative modeling techniques as a way of overcoming the lack of high-resolution three-dimensional structures in the human membrane proteome. PMID:24403851

  12. A physical map for the Amborella trichopoda genome sheds light on the evolution of angiosperm genome structure

    PubMed Central

    2011-01-01

    Background: Recent phylogenetic analyses have identified Amborella trichopoda, an understory tree species endemic to the forests of New Caledonia, as sister to a clade including all other known flowering plant species. The Amborella genome is a unique reference for understanding the evolution of angiosperm genomes because it can serve as an outgroup to root comparative analyses. A physical map, BAC end sequences and sample shotgun sequences provide a first view of the 870 Mbp Amborella genome. Results: Analysis of Amborella BAC ends sequenced from each contig suggests that the density of long terminal repeat retrotransposons is negatively correlated with that of protein coding genes. Syntenic, presumably ancestral, gene blocks were identified in comparisons of the Amborella BAC contigs and the sequenced Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera and Oryza sativa genomes. Parsimony mapping of the loss of synteny corroborates previous analyses suggesting that the rate of structural change has been more rapid on lineages leading to Arabidopsis and Oryza compared with lineages leading to Populus and Vitis. The gamma paleohexaploidy event identified in the Arabidopsis, Populus and Vitis genomes is shown to have occurred after the divergence of all other known angiosperms from the lineage leading to Amborella. Conclusions: When placed in the context of a physical map, BAC end sequences representing just 5.4% of the Amborella genome have facilitated reconstruction of gene blocks that existed in the last common ancestor of all flowering plants. The Amborella genome is an invaluable reference for inferences concerning the ancestral angiosperm and subsequent genome evolution. PMID:21619600

  13. The enzymatic nature of an anonymous protein sequence cannot reliably be inferred from superfamily level structural information alone.

    PubMed

    Roche, Daniel Barry; Brüls, Thomas

    2015-05-01

    As the largest fraction of any proteome does not carry out enzymatic functions, and in order to leverage 3D structural data for the annotation of increasingly higher volumes of sequence data, we wanted to assess the strength of the link between coarse-grained structural data (i.e., homologous superfamily level) and the enzymatic versus non-enzymatic nature of protein sequences. To probe this relationship, we took advantage of 41 phylogenetically diverse genomes (encompassing 11 distinct phyla) recently sequenced within the GEBA initiative, for which we integrated structural information, as defined by CATH, with enzyme-level information, as defined by Enzyme Commission (EC) numbers. This analysis revealed that only a very small fraction (about 1%) of domain sequences occurring in the analyzed genomes was associated with homologous superfamilies strongly indicative of enzymatic function. Resorting to less stringent criteria to define enzyme versus non-enzyme biased structural classes, or excluding highly prevalent folds from the analysis, had only a modest effect on this proportion. Thus, the low genomic coverage by structurally anchored protein domains strongly associated with catalytic activities indicates that, on its own, the power of coarse-grained structural information to infer the general property of being an enzyme is rather limited. PMID:25559918

  14. Genome at Juncture of Early Human Migration: A Systematic Analysis of Two Whole Genomes and Thirteen Exomes from Kuwaiti Population Subgroup of Inferred Saudi Arabian Tribe Ancestry

    PubMed Central

    Alsmadi, Osama; Hebbar, Prashantha; Antony, Dinu; Behbehani, Kazem; Thanaraj, Thangavel Alphonse

    2014-01-01

    The population of the State of Kuwait is composed of three genetic subgroups of inferred Persian, Saudi Arabian tribe and Bedouin ancestry. The Saudi Arabian tribe subgroup traces its origin to the Najd region of Saudi Arabia. By sequencing two whole genomes and thirteen exomes from this subgroup at high coverage (>40X), we identify 4,950,724 Single Nucleotide Polymorphisms (SNPs), 515,802 indels and 39,762 structural variations. Of the identified variants, 10,098 (8.3%) exomic SNPs, 139,923 (2.9%) non-exomic SNPs, 5,256 (54.3%) exomic indels, and 374,959 (74.08%) non-exomic indels are ‘novel’. Up to 8,070 (79.9%) of the reported novel biallelic exomic SNPs are seen at low frequency (minor allele frequency <5%). We observe 5,462 known and 1,004 novel potentially deleterious nonsynonymous SNPs. Allele frequencies of common SNPs from the 15 exomes are significantly correlated with those from genotype data of a larger cohort of 48 individuals (Pearson correlation coefficient, 0.91; $p < 2.2 \times 10^{-16}$). A set of 2,485 SNPs shows significantly different allele frequencies when compared to populations from other continents. Two notable variants having risk alleles at high frequencies in this subgroup are: a nonsynonymous deleterious SNP (rs2108622 [19:g.15990431C>T] from CYP4F2 gene [MIM:*604426]) associated with warfarin dosage levels [MIM:#122700] required to elicit normal anticoagulant response; and a 3′ UTR SNP (rs6151429 [22:g.51063477T>C] from ARSA gene [MIM:*607574]) associated with Metachromatic Leukodystrophy [MIM:#250100]. The Hemoglobin Riyadh variant (identified for the first time in a Saudi Arabian woman) is observed in the exome data. The mitochondrial haplogroup profiles of the 15 individuals are consistent with the haplogroup diversity seen in Saudi Arabian natives, who are believed to have received substantial gene flow from Africa and eastern provenance. We present the first genome resource imperative for designing future genetic studies in Saudi Arabian
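    The Kuwaiti-genome abstract summarizes variants by minor allele frequency (MAF < 5% counted as low frequency) and compares allele frequencies between two cohorts with a Pearson correlation. The sketch below shows those calculations for diploid genotypes coded 0/1/2 (copies of the alternate allele). The coding, the function names, and the 0.05 default threshold applied this way are assumptions for illustration, not the authors' pipeline.

```python
import math
from typing import Sequence


def allele_frequency(genotypes: Sequence[int]) -> float:
    """Alternate-allele frequency from diploid genotypes coded 0/1/2."""
    return sum(genotypes) / (2 * len(genotypes))


def minor_allele_frequency(genotypes: Sequence[int]) -> float:
    """Frequency of whichever allele is rarer."""
    p = allele_frequency(genotypes)
    return min(p, 1.0 - p)


def is_low_frequency(maf: float, threshold: float = 0.05) -> bool:
    """Low-frequency call as used in the abstract (MAF below 5%)."""
    return maf < threshold


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long lists of frequencies."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

    Comparing the 15 sequenced individuals with the 48-person genotyped cohort would amount to computing `allele_frequency` per SNP in each cohort and passing the two frequency lists to `pearson`.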

  15. The fractal structure of the mitochondrial genomes

    NASA Astrophysics Data System (ADS)

    Oiwa, Nestor N.; Glazier, James A.

    2002-08-01

    The mitochondrial DNA genome has a definite multifractal structure. We show that loops, hairpins and inverted palindromes are responsible for this self-similarity. We can thus establish a definite relation between the function of subsequences and their fractal dimension. Intriguingly, protein-coding DNA sequences also exhibit palindromic structures, although these do not appear in the sequence of amino acids. These structures may reflect the stabilization and transcriptional control of DNA or the control of posttranscriptional editing of mRNA.

  16. Robust Inference of Population Structure for Ancestry Prediction and Correction of Stratification in the Presence of Relatedness

    PubMed Central

    Conomos, Matthew P.; Miller, Mike; Thornton, Timothy

    2016-01-01

    Population structure inference with genetic data has been motivated by a variety of applications in population genetics and genetic association studies. Several approaches have been proposed for the identification of genetic ancestry differences in samples where study participants are assumed to be unrelated, including principal components analysis (PCA), multi-dimensional scaling (MDS), and model-based methods for proportional ancestry estimation. Many genetic studies, however, include individuals with some degree of relatedness, and existing methods for inferring genetic ancestry fail in related samples. We present a method, PC-AiR, for robust population structure inference in the presence of known or cryptic relatedness. PC-AiR utilizes genome-screen data and an efficient algorithm to identify a diverse subset of unrelated individuals that is representative of all ancestries in the sample. The PC-AiR method directly performs PCA on the identified ancestry representative subset and then predicts components of variation for all remaining individuals based on genetic similarities. In simulation studies and in applications to real data from Phase III of the HapMap Project, we demonstrate that PC-AiR provides a substantial improvement over existing approaches for population structure inference in related samples. We also demonstrate significant efficiency gains, where a single axis of variation from PC-AiR provides better prediction of ancestry in a variety of structure settings than using ten (or more) components of variation from widely used PCA and MDS approaches. Finally, we illustrate that PC-AiR can provide improved population stratification correction over existing methods in genetic association studies with population structure and relatedness. PMID:25810074
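    The PC-AiR abstract describes a two-step idea. PCA is run only on a representative subset of unrelated individuals, and the remaining (related) individuals are then placed on the same axes. The sketch below illustrates that idea with NumPy under stated assumptions. The unrelated subset is assumed to be already chosen (the kinship-based selection step is omitted), genotypes are coded 0/1/2, and related individuals are placed by projecting onto the subset's SNP loadings. The projection scaling is a common convention and is not claimed to match PC-AiR's exact implementation. The function name is hypothetical.

```python
import numpy as np


def pca_on_subset_and_project(G, unrelated_idx, n_components: int = 2) -> np.ndarray:
    """PCA on an (assumed pre-selected) unrelated subset, then project everyone.

    G: (n_individuals, n_snps) genotypes coded 0/1/2.
    unrelated_idx: row indices of the ancestry-representative unrelated subset.
    Returns an (n_individuals, n_components) array of PC coordinates.
    """
    G = np.asarray(G, dtype=float)
    sub = G[unrelated_idx]
    p = sub.mean(axis=0) / 2.0                     # allele frequencies in the subset
    keep = (p > 0) & (p < 1)                       # drop SNPs monomorphic in the subset
    scale = np.sqrt(2.0 * p[keep] * (1.0 - p[keep]))
    Z_sub = (sub[:, keep] - 2.0 * p[keep]) / scale
    # Right singular vectors of the standardized subset are the SNP loadings
    _, _, vt = np.linalg.svd(Z_sub, full_matrices=False)
    loadings = vt[:n_components].T                 # shape (n_kept_snps, n_components)
    # Standardize all individuals with subset frequencies and project onto the loadings
    Z_all = (G[:, keep] - 2.0 * p[keep]) / scale
    return Z_all @ loadings
```

    Because relatives never enter the SVD, a large family cannot pull a principal axis toward itself. That is the failure mode of ordinary PCA in related samples that the abstract describes.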

  17. Bayesian statistical treatment of the fluorescence of AFLP bands leads to accurate genetic structure inference.

    PubMed

    Gaggiotti, Oscar E

    2010-11-01

    Ever since the introduction of allozymes in the 1960s, evolutionary biologists and ecologists have continued to search for more powerful molecular markers to estimate important parameters such as effective population size and migration rates and to make inferences about the demographic history of populations, the relationships between individuals and the genetic architecture of phenotypic variation (Bensch & Akesson 2005; Bonin et al. 2007). Choosing a marker requires a thorough consideration of the trade-offs associated with the different techniques and the type of data obtained from them. Some markers can be very informative but require substantial amounts of start-up time (e.g. microsatellites), while others require very little time but are much less polymorphic. Amplified fragment length polymorphism (AFLP) is a firmly established molecular marker technique that falls in this latter category. AFLPs are widely distributed throughout the genome and can be used on organisms for which there is no a priori sequence information (Meudt & Clarke 2007). These properties together with their moderate cost and short start-up time have made them the method of choice for many molecular ecology studies of wild species (Bensch & Akesson 2005). However, they have a major disadvantage: they are dominant. This represents a very important limitation because many statistical genetics methods appropriate for molecular ecology studies require the use of codominant markers. In this issue, Foll et al. (2010) present an innovative hierarchical Bayesian method that overcomes this limitation. The proposed approach represents a comprehensive statistical treatment of the fluorescence of AFLP bands and leads to accurate inferences about the genetic structure of natural populations. Besides allowing a quasi-codominant treatment of AFLPs, this new method also solves the difficult problems posed by subjectivity in the scoring of AFLP bands. PMID:20958811

  18. Genome Structure of the Legume, Lotus japonicus

    PubMed Central

    Sato, Shusei; Nakamura, Yasukazu; Kaneko, Takakazu; Asamizu, Erika; Kato, Tomohiko; Nakao, Mitsuteru; Sasamoto, Shigemi; Watanabe, Akiko; Ono, Akiko; Kawashima, Kumiko; Fujishiro, Tsunakazu; Katoh, Midori; Kohara, Mitsuyo; Kishida, Yoshie; Minami, Chiharu; Nakayama, Shinobu; Nakazaki, Naomi; Shimizu, Yoshimi; Shinpo, Sayaka; Takahashi, Chika; Wada, Tsuyuko; Yamada, Manabu; Ohmido, Nobuko; Hayashi, Makoto; Fukui, Kiichi; Baba, Tomoya; Nakamichi, Tomoko; Mori, Hirotada; Tabata, Satoshi

    2008-01-01

    The legume Lotus japonicus has been widely used as a model system to investigate the genetic background of legume-specific phenomena such as symbiotic nitrogen fixation. Here, we report structural features of the L. japonicus genome. The 315.1-Mb sequences determined in this and previous studies correspond to 67% of the genome (472 Mb), and are likely to cover 91.3% of the gene space. Linkage mapping anchored 130-Mb sequences onto the six linkage groups. A total of 10 951 complete and 19 848 partial structures of protein-encoding genes were assigned to the genome. Comparative analysis of these genes revealed the expansion of several functional domains and gene families that are characteristic of L. japonicus. Synteny analysis detected traces of whole-genome duplication and the presence of synteny blocks with other plant genomes to various degrees. This study provides the first opportunity to look into the complex and unique genetic system of legumes. PMID:18511435

  19. Using Genomics for Natural Product Structure Elucidation.

    PubMed

    Tietz, Jonathan I; Mitchell, Douglas A

    2016-01-01

    Natural products (NPs) are the most historically bountiful source of chemical matter for drug development, especially for anti-infectives. With insights gleaned from genome mining, interest in natural product discovery has been reinvigorated. An essential stage in NP discovery is structural elucidation, which sheds light not only on the chemical composition of a molecule but also on its novelty, properties, and derivatization potential. The history of structure elucidation is replete with technique-based revolutions: combustion analysis, crystallography, UV, IR, MS, and NMR have each provided game-changing advances; the latest such advance is genomics. All natural products have a genetic basis, and the ability to obtain and interpret genomic information for structure elucidation is increasingly available at low cost to non-specialists. In this review, we describe the value of genomics as a structural elucidation technique, especially from the perspective of the natural product chemist approaching an unknown metabolite. Herein we first introduce the databases and programs of interest to the natural products chemist, with an emphasis on those currently most suited for general usability. We describe strategies for linking observed natural product-linked phenotypes to their corresponding gene clusters. We then discuss techniques for extracting structural information from genes, illustrated with numerous case examples. We also provide an analysis of the biases and limitations of the field with recommendations for future development. Our overview is not only aimed at biologically-oriented researchers already at ease with bioinformatic techniques, but also, in particular, at natural product, organic, and/or medicinal chemists not previously familiar with genomic techniques. PMID:26456468

  20. Inference of the Transcriptional Regulatory Network in Staphylococcus aureus by Integration of Experimental and Genomics-Based Evidence

    PubMed Central

    Ravcheev, Dmitry A.; Best, Aaron A.; Tintle, Nathan; DeJongh, Matthew; Osterman, Andrei L.; Novichkov, Pavel S.; Rodionov, Dmitry A.

    2011-01-01

    Transcriptional regulatory networks are fine-tuned systems that help microorganisms respond to changes in the environment and cell physiological state. We applied the comparative genomics approach implemented in the RegPredict Web server combined with SEED subsystem analysis and available information on known regulatory interactions for regulatory network reconstruction for the human pathogen Staphylococcus aureus and six related species from the family Staphylococcaceae. The resulting reference set of 46 transcription factor regulons contains more than 1,900 binding sites and 2,800 target genes involved in the central metabolism of carbohydrates, amino acids, and fatty acids; respiration; the stress response; metal homeostasis; drug and metal resistance; and virulence. The inferred regulatory network in S. aureus includes ∼320 regulatory interactions between 46 transcription factors and ∼550 candidate target genes comprising 20% of its genome. We predicted ∼170 novel interactions and 24 novel regulons for the control of the central metabolic pathways in S. aureus. The reconstructed regulons are largely variable in the Staphylococcaceae: only 20% of S. aureus regulatory interactions are conserved across all studied genomes. We used a large-scale gene expression data set for S. aureus to assess relationships between the inferred regulons and gene expression patterns. The predicted reference set of regulons is captured within the Staphylococcus collection in the RegPrecise database (http://regprecise.lbl.gov). PMID:21531804

  1. A data management system for structural genomics.

    PubMed

    Raymond, Stéphane; O'Toole, Nicholas; Cygler, Miroslaw

    2004-06-21

    BACKGROUND: Structural genomics (SG) projects aim to determine thousands of protein structures by developing high-throughput techniques for all steps of the experimental structure determination pipeline. Crucial to the success of such endeavours is the careful tracking and archiving of experimental and external data on protein targets. RESULTS: We have developed a sophisticated data management system for structural genomics. Central to the system is an Oracle-based, SQL-interfaced database. The database schema deals with all facets of the structure determination process, from target selection to data deposition. Users access the database via any web browser and enter experimental data through predefined web forms. Data can be displayed according to numerous criteria. A list of all current target proteins can be viewed, with links for each target to associated entries in external databases. To avoid unnecessary work on targets, our data management system uses BLAST weekly to match protein sequences against entries in the Protein Data Bank and against targets of other SG centers worldwide. CONCLUSION: Our system is a working, effective and user-friendly data management tool for structural genomics projects. In this report we present a detailed summary of the various capabilities of the system, using real target data as examples, and indicate our plans for future enhancements. PMID:15210054
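    The structural-genomics data management abstract describes a relational database that tracks each target from selection to deposition and records weekly sequence matches against external resources. The sketch below is a much smaller, hypothetical version of such a schema. It uses Python's built-in `sqlite3` module in place of the Oracle database described, and every table, column, and stage name is invented for illustration. The BLAST search itself is not shown; its hits are assumed to be loaded into `external_match` by a separate job.

```python
import sqlite3
from typing import List, Optional

# Hypothetical schema: not the published system's tables
SCHEMA = """
CREATE TABLE target (
    target_id   INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE,
    sequence    TEXT NOT NULL
);
CREATE TABLE stage_event (
    target_id   INTEGER NOT NULL REFERENCES target(target_id),
    stage       TEXT NOT NULL CHECK (stage IN
                ('selected', 'cloned', 'expressed', 'purified', 'crystallized', 'deposited')),
    recorded_on TEXT NOT NULL,   -- ISO date string
    PRIMARY KEY (target_id, stage)
);
CREATE TABLE external_match (
    target_id   INTEGER NOT NULL REFERENCES target(target_id),
    source      TEXT NOT NULL,   -- for example PDB or another SG center
    accession   TEXT NOT NULL,
    identity    REAL NOT NULL    -- percent identity reported by the sequence search
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def current_stage(conn: sqlite3.Connection, name: str) -> Optional[str]:
    """Most recently recorded pipeline stage for a target, or None."""
    row = conn.execute(
        """SELECT s.stage FROM stage_event s JOIN target t USING (target_id)
           WHERE t.name = ? ORDER BY s.recorded_on DESC LIMIT 1""",
        (name,),
    ).fetchone()
    return row[0] if row else None


def flag_redundant(conn: sqlite3.Connection, min_identity: float = 90.0) -> List[str]:
    """Targets with a close external match, i.e. candidates for stopping work."""
    return [r[0] for r in conn.execute(
        """SELECT DISTINCT t.name FROM target t JOIN external_match m USING (target_id)
           WHERE m.identity >= ? ORDER BY t.name""",
        (min_identity,),
    )]
```

    The 90% identity cutoff in `flag_redundant` is an arbitrary placeholder. A real project would choose its own redundancy criterion.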

  2. Demographic Divergence History of Pied Flycatcher and Collared Flycatcher Inferred from Whole-Genome Re-sequencing Data

    PubMed Central

    Nadachowska-Brzyska, Krystyna; Burri, Reto; Olason, Pall I.; Kawakami, Takeshi; Smeds, Linnéa; Ellegren, Hans

    2013-01-01

    Detailed knowledge of demographic history is a prerequisite for understanding and inferring the processes involved in the evolution of population differentiation and speciation. Together with new coalescent-based methods, the recent availability of genome-wide data enables investigation of differentiation and divergence processes at unprecedented depth. We combined two powerful approaches, full Approximate Bayesian Computation analysis (ABC) and pairwise sequentially Markovian coalescent modeling (PSMC), to reconstruct the demographic history of the split between two avian speciation model species, the pied flycatcher and collared flycatcher. Using whole-genome re-sequencing data from 20 individuals, we investigated 15 demographic models including different levels and patterns of gene flow, and changes in effective population size over time. ABC provided high support for recent (mode 0.3 my, range <0.7 my) species divergence, declines in effective population size of both species since their initial divergence, and unidirectional recent gene flow from pied flycatcher into collared flycatcher. The estimated divergence time and population size changes, supported by PSMC results, suggest that the ancestral species persisted through one of the glacial periods of the middle Pleistocene and then split into two large populations that first increased in size before going through severe bottlenecks and expanding into their current ranges. Secondary contact appears to have been established after the last glacial maximum. The severity of the bottlenecks at the last glacial maximum is indicated by the discrepancy between current effective population sizes (20,000–80,000) and census sizes (5–50 million birds) of the two species. The recent divergence time challenges the supposition that avian speciation is a relatively slow process with extended times for intrinsic postzygotic reproductive barriers to evolve. Our study emphasizes the importance of using genome-wide data to

  3. Inferring Selective Constraint from Population Genomic Data Suggests Recent Regulatory Turnover in the Human Brain.

    PubMed

    Schrider, Daniel R; Kern, Andrew D

    2015-12-01

    The comparative genomics revolution of the past decade has enabled the discovery of functional elements in the human genome via sequence comparison. However, an important class of elements, those specific to humans, is entirely missed by searching for sequence conservation across species. Here we present an analysis based on variation data among human genomes that uses a supervised machine learning approach to identify human-specific purifying selection in the genome. Using only allele frequency information from the complete low-coverage 1000 Genomes Project data set in conjunction with a support vector machine trained on known functional and nonfunctional portions of the genome, we are able to accurately identify portions of the genome constrained by purifying selection. Our method identifies previously known human-specific gains or losses of function and uncovers many novel candidates. Candidate targets for gain and loss of function along the human lineage include numerous putative regulatory regions of genes essential for normal development of the central nervous system, including a significant enrichment of gain-of-function events near neurotransmitter receptor genes. These results are consistent with regulatory turnover being a key mechanism in the evolution of human-specific characteristics of brain development. Finally, we show that the majority of the genome is currently unconstrained by natural selection, in agreement with what has been estimated from phylogenetic methods but in sharp contrast to estimates based on transcriptomics or other high-throughput functional methods. PMID:26590212

  4. Reference set of regulons in Desulfovibrionales inferred by comparative genomics approach

    SciTech Connect

    Kazakov, A.E.; Rodionov, D.A.; Price, M.N.; Arkin, A.P.; Dubchak, I.; Novichkov, P.S.

    2010-11-15

    In this study, we carried out a large-scale comparative genomics analysis of regulatory interactions in Desulfovibrio vulgaris and 12 related genomes from the order Desulfovibrionales using our recently developed web server RegPredict (http://regpredict.lbl.gov). An overall reference collection of 26 Desulfovibrionales regulogs can be accessed through the RegPrecise database (http://regprecise.lbl.gov).

  5. Inferring Selective Constraint from Population Genomic Data Suggests Recent Regulatory Turnover in the Human Brain

    PubMed Central

    Schrider, Daniel R.; Kern, Andrew D.

    2015-01-01

    The comparative genomics revolution of the past decade has enabled the discovery of functional elements in the human genome via sequence comparison. However, an important class of elements, those specific to humans, is entirely missed by searching for sequence conservation across species. Here we present an analysis based on variation data among human genomes that uses a supervised machine learning approach to identify human-specific purifying selection in the genome. Using only allele frequency information from the complete low-coverage 1000 Genomes Project data set in conjunction with a support vector machine trained on known functional and nonfunctional portions of the genome, we are able to accurately identify portions of the genome constrained by purifying selection. Our method identifies previously known human-specific gains or losses of function and uncovers many novel candidates. Candidate targets for gain and loss of function along the human lineage include numerous putative regulatory regions of genes essential for normal development of the central nervous system, including a significant enrichment of gain-of-function events near neurotransmitter receptor genes. These results are consistent with regulatory turnover being a key mechanism in the evolution of human-specific characteristics of brain development. Finally, we show that the majority of the genome is currently unconstrained by natural selection, in agreement with what has been estimated from phylogenetic methods but in sharp contrast to estimates based on transcriptomics or other high-throughput functional methods. PMID:26590212

  6. Haemonchus contortus: Genome Structure, Organization and Comparative Genomics.

    PubMed

    Laing, R; Martinelli, A; Tracey, A; Holroyd, N; Gilleard, J S; Cotton, J A

    2016-01-01

    One of the first genome sequencing projects for a parasitic nematode was that for Haemonchus contortus. The open access data from the Wellcome Trust Sanger Institute provided a valuable early resource for the research community, particularly for the identification of specific genes and genetic markers. Later, a second sequencing project was initiated by the University of Melbourne, and the two draft genome sequences for H. contortus were published back-to-back in 2013. There is a pressing need for long-range genomic information for genetic mapping, population genetics and functional genomic studies, so we are continuing to improve the Wellcome Trust Sanger Institute assembly to provide a finished reference genome for H. contortus. This review describes this process, compares the H. contortus genome assemblies with draft genomes from other members of the strongylid group and discusses future directions for parasite genomics using the H. contortus model. PMID:27238013

  7. Glycoprotein Structural Genomics: Solving the Glycosylation Problem

    PubMed Central

    Chang, Veronica T.; Crispin, Max; Aricescu, A. Radu; Harvey, David J.; Nettleship, Joanne E.; Fennelly, Janet A.; Yu, Chao; Boles, Kent S.; Evans, Edward J.; Stuart, David I.; Dwek, Raymond A.; Jones, E. Yvonne; Owens, Raymond J.; Davis, Simon J.

    2007-01-01

    Glycoproteins present special problems for structural genomic analysis because they often require glycosylation in order to fold correctly, whereas their chemical and conformational heterogeneity generally inhibits crystallization. We show that the “glycosylation problem” can be solved by expressing glycoproteins transiently in mammalian cells in the presence of the N-glycosylation processing inhibitors kifunensine or swainsonine. This allows the correct folding of the glycoproteins, but leaves them sensitive to enzymes, such as endoglycosidase H, that reduce the N-glycans to single residues, enhancing crystallization. Since the scalability of transient mammalian expression is now comparable to that of bacterial systems, this approach should relieve one of the major bottlenecks in structural genomic analysis. PMID:17355862

  8. PyClone: Statistical inference of clonal population structure in cancer

    PubMed Central

    Roth, Andrew; Khattra, Jaswinder; Yap, Damian; Wan, Adrian; Laks, Emma; Biele, Justina; Ha, Gavin; Aparicio, Samuel; Bouchard-Côté, Alexandre; Shah, Sohrab P.

    2016-01-01

    We introduce a novel statistical method, PyClone, for inference of clonal population structures in cancers. PyClone is a Bayesian clustering method for grouping sets of deeply sequenced somatic mutations into putative clonal clusters while estimating their cellular prevalences and accounting for allelic imbalances introduced by segmental copy number changes and normal cell contamination. Single cell sequencing validation demonstrates that PyClone infers accurate clustering of mutations that co-occur in individual cells. PMID:24633410

  9. Unifying Inference of Meso-Scale Structures in Networks

    PubMed Central

    Tunç, Birkan; Verma, Ragini

    2015-01-01

    Networks are among the most prevalent formal representations in scientific studies, employed to depict interactions between objects such as molecules, neuronal clusters, or social groups. Studies performed at the meso-scale, which group objects based on their distinctive interaction patterns, form one of the main lines of investigation in network science. In a social network, for instance, meso-scale structures can correspond to isolated social groupings or to groups of individuals that serve as a communication core. Currently, different meso-scale structures, such as community and core-periphery structures, have been studied through independent approaches, which precludes a single algorithmic design that can handle multiple meso-scale structures and decide which structure better explains the observed data. In this study, we propose a unified formulation for the algorithmic detection and analysis of different meso-scale structures. This facilitates the investigation of hybrid structures that capture the interplay between multiple meso-scale structures, as well as the statistical comparison of competing structures, neither of which has been available until now. We demonstrate the applicability of the methodology by analyzing the human brain network, determining both the dominant organizational structure (communities) of the brain and its auxiliary characteristics (core-periphery). PMID:26569619

  10. Bayesian inference of protein structure from chemical shift data

    PubMed Central

    Bratholm, Lars A.; Christensen, Anders S.; Hamelryck, Thomas

    2015-01-01

    Protein chemical shifts are routinely used to augment molecular mechanics force fields in protein structure simulations, with the weights of the chemical shift restraints determined empirically. These weights, however, might not be an optimal descriptor of a given protein structure and predictive model, and they introduce a bias that might result in incorrect structures. In the inferential structure determination framework, both the unknown structure and the disagreement between experimental and back-calculated data are formulated as a joint probability distribution, thus utilizing the full information content of the data. Here, we present the formulation of such a probability distribution in which the error in chemical shift prediction is described by either a Gaussian or a Cauchy distribution. The methodology is demonstrated and compared to a set of empirically weighted potentials through Markov chain Monte Carlo simulations of three small proteins (ENHD, Protein G and the SMN Tudor Domain) using the PROFASI force field and the chemical shift predictor CamShift. Using a clustering criterion to identify the best structure, together with the addition of a solvent exposure scoring term, the simulations suggest that sampling both the structure and the uncertainties in chemical shift prediction leads to more accurate structures than conventional methods using empirically determined weights. The Cauchy distribution, using either sampled uncertainties or predetermined weights, did, however, result in overall better convergence to the native fold, suggesting that both types of distribution might be useful in different aspects of protein structure prediction. PMID:25825683
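    The practical difference between the two error models named above is how strongly each penalizes a large disagreement between back-calculated and experimental shifts. The sketch below is illustrative only and is not the authors' code: it does not use PROFASI or CamShift, the residual values and the scale parameter of 0.5 are hypothetical, and it simply compares the negative log-likelihood added by one outlying residual under a zero-centred Gaussian and a zero-centred Cauchy error model.

```python
import math


def gaussian_nll(residuals, sigma):
    """Negative log-likelihood of residuals under a zero-mean Gaussian error model."""
    n = len(residuals)
    return (n * math.log(sigma * math.sqrt(2 * math.pi))
            + sum(r * r for r in residuals) / (2 * sigma * sigma))


def cauchy_nll(residuals, gamma):
    """Negative log-likelihood of residuals under a zero-centred Cauchy error model."""
    return sum(math.log(math.pi * gamma * (1 + (r / gamma) ** 2)) for r in residuals)


def outlier_penalty(nll, residuals, outlier, scale):
    """Extra negative log-likelihood incurred by appending one outlying residual."""
    return nll(residuals + [outlier], scale) - nll(residuals, scale)


# Hypothetical residuals (back-calculated minus experimental shift), plus one outlier.
residuals = [0.1, -0.2, 0.15]
g = outlier_penalty(gaussian_nll, residuals, 5.0, 0.5)
c = outlier_penalty(cauchy_nll, residuals, 5.0, 0.5)
print(f"Gaussian penalty: {g:.2f}, Cauchy penalty: {c:.2f}")
```

    The Gaussian penalty grows with the square of the residual, whereas the Cauchy penalty grows only logarithmically, so under the Cauchy model a single poorly predicted shift pulls far less on the sampled structure.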

  11. PICARA, an Analytical Pipeline Providing Probabilistic Inference about A Priori Candidates Genes Underlying Genome-Wide Association QTL in Plants

    PubMed Central

    Chen, Charles; DeClerck, Genevieve; Tian, Feng; Spooner, William; McCouch, Susan; Buckler, Edward

    2012-01-01

    PICARA is an analytical pipeline designed to systematically summarize observed SNP/trait associations identified by genome-wide association studies (GWAS) and to identify candidate genes involved in the regulation of complex trait variation. The pipeline provides probabilistic inference about a priori candidate genes using integrated information derived from genome-wide association signals, gene homology, and curated gene sets embedded in pathway descriptions. In this paper, we demonstrate the performance of PICARA using data for flowering time variation in maize, a key trait for the geographical and seasonal adaptation of plants. Among 406 curated flowering time-related genes from Arabidopsis, we identify 61 orthologs in maize that are significantly enriched for GWAS SNP signals, including key regulators such as FT (Flowering Locus T) and GI (GIGANTEA), and genes centered in the Arabidopsis circadian pathway, including TOC1 (Timing of CAB Expression 1) and LHY (Late Elongated Hypocotyl). In addition, we discover a regulatory feature that is characteristic of these a priori flowering time candidates in maize. This new probabilistic analytical pipeline helps researchers infer the functional significance of candidate genes associated with complex traits and helps guide future experiments by providing statistical support for gene candidates based on the integration of heterogeneous biological information. PMID:23144785

  12. The Quality and Validation of Structures from Structural Genomics

    PubMed Central

    Domagalski, Marcin J.; Zheng, Heping; Zimmerman, Matthew D.; Dauter, Zbigniew; Wlodawer, Alexander; Minor, Wladek

    2014-01-01

    Quality control of three-dimensional structures of macromolecules is a critical step to ensure the integrity of structural biology data, especially those produced by structural genomics centers. Whereas the Protein Data Bank (PDB) has proven to be a remarkable success overall, the inconsistent quality of structures reveals a lack of universal standards for structure/deposit validation. Here, we review the state-of-the-art methods used in macromolecular structure validation, focusing on validation of structures determined by X-ray crystallography. We describe some general protocols used in the rebuilding and re-refinement of problematic structural models. We also briefly discuss some frontier areas of structure validation, including refinement of protein–ligand complexes, automation of structure redetermination, and the use of NMR structures and computational models to solve X-ray crystal structures by molecular replacement. PMID:24203341

  13. High-level phylogeny of the Coleoptera inferred with mitochondrial genome sequences.

    PubMed

    Yuan, Ming-Long; Zhang, Qi-Lin; Zhang, Li; Guo, Zhong-Long; Liu, Yong-Jian; Shen, Yu-Ying; Shao, Renfu

    2016-11-01

    The Coleoptera (beetles) exhibit tremendous morphological, ecological, and behavioral diversity. To better understand the phylogenetics and evolution of beetles, we sequenced three complete mitogenomes from two families (Cleridae and Meloidae), which share conserved mitogenomic features with other completely sequenced beetles. We assessed the influence of six datasets and three inference methods on topology and nodal support within the Coleoptera. We found that both Bayesian inference and maximum likelihood with homogeneous-site models were greatly affected by nucleotide compositional heterogeneity, while the heterogeneous-site mixture model in PhyloBayes provided a better phylogenetic signal for the Coleoptera. The amino acid dataset generated a more reliable tree topology at the higher taxonomic levels (i.e. suborders and series), whereas the inclusion of rRNA genes and the third codon positions of protein-coding genes improved phylogenetic inference at the superfamily level, especially under a heterogeneous-site model. We recovered the suborder relationships as (Archostemata+Adephaga)+(Myxophaga+Polyphaga). The series relationships within Polyphaga were recovered as (Scirtiformia+(Elateriformia+((Bostrichiformia+Scarabaeiformia+Staphyliniformia)+Cucujiformia))). All superfamilies within Cucujiformia were recovered as monophyletic. We obtained a cucujiform phylogeny of (Cleroidea+(Coccinelloidea+((Lymexyloidea+Tenebrionoidea)+(Cucujoidea+(Chrysomeloidea+Curculionoidea))))). This study showed that although tree topologies were sensitive to data types and inference methods, mitogenomic data could provide useful information for resolving the Coleoptera phylogeny at various taxonomic levels when suitable datasets and heterogeneous-site models are used. PMID:27497607

  14. Mechanisms underlying structural variant formation in genomic disorders

    PubMed Central

    Carvalho, Claudia M. B.; Lupski, James R.

    2016-01-01

    With the recent burst of technological developments in genomics, and the clinical implementation of genome-wide assays, our understanding of the molecular basis of genomic disorders, specifically the contribution of structural variation to disease burden, is evolving quickly. Ongoing studies have revealed a ubiquitous role for genome architecture in the formation of structural variants at a given locus, both in DNA recombination-based processes and in replication-based processes. These reports showcase the influence of repeat sequences on genomic stability and structural variant complexity and also highlight the tremendous plasticity and dynamic nature of our genome in evolution, health and disease susceptibility. PMID:26924765

  15. Distribution and Evolution of Repeated Sequences in Genomes of Triatominae (Hemiptera-Reduviidae) Inferred from Genomic In Situ Hybridization

    PubMed Central

    Pita, Sebastian; Panzera, Francisco; Sánchez, Antonio; Panzera, Yanina; Palomeque, Teresa; Lorite, Pedro

    2014-01-01

    The subfamily Triatominae, vectors of Chagas disease, comprises 140 species characterized by a highly homogeneous chromosome number. We analyzed the chromosomal distribution and evolution of repeated sequences in Triatominae genomes by genomic in situ hybridization (GISH), using Triatoma delpontei and Triatoma infestans genomic DNAs as probes. Hybridizations were performed on the probe species' own chromosomes and on those of nine species from six genera belonging to the two main tribes, Triatomini and Rhodniini. The genomic probes clearly generate two different hybridization patterns: signals either dispersed or accumulated in specific regions or chromosomes. The three probes used generate the same hybridization pattern in each species. However, these patterns are species-specific. In closely related species, the probes strongly hybridized in the autosomal heterochromatic regions, resembling C-banding and DAPI patterns. However, in more distant species these co-localizations are not observed. The heterochromatic Y chromosome consists of highly repeated sequences, which are conserved among 10 species of the Triatomini tribe, suggesting that this is an ancestral character for this group. However, the Y chromosome in the Rhodniini tribe is markedly different, supporting the early evolutionary dichotomy between both tribes. In some species, sex chromosomes and autosomes share repeated sequences, suggesting meiotic chromatin exchanges among these heterologous chromosomes. Our GISH analyses provided not only reliable information about the distribution of autosomal repeated sequences but also insight into sex chromosome evolution in Triatominae. Furthermore, the differentiation obtained by GISH might be a valuable marker to establish phylogenetic relationships and to test the controversial origin of the Triatominae subfamily. PMID:25478792

  16. Inferring Speciation Processes from Patterns of Natural Variation in Microbial Genomes

    PubMed Central

    Krause, David J.; Whitaker, Rachel J.

    2015-01-01

    Microbial species concepts have long been the focus of contentious debate, fueled by technological limitations to the genetic resolution of species, by the daunting task of investigating phenotypic variation among individual microscopic organisms, and by a lack of understanding of gene flow in reproductively asexual organisms that are prone to promiscuous horizontal gene transfer. Population genomics, the emerging approach of analyzing the complete genomes of a multitude of closely related organisms, is poised to overcome these limitations by providing a window into patterns of genome variation revealing the evolutionary processes through which species diverge. This new approach is more than just an extension of previous multilocus sequencing technologies, in that it provides a comprehensive view of interacting evolutionary processes. Here we argue that the application of population genomic tools in a rigorous population genetic framework will help to identify the processes of microbial speciation and ultimately lead to a general species concept based on the unique biology and ecology of microorganisms. PMID:26316424

  17. New insights into the Tyrolean Iceman's origin and phenotype as inferred by whole-genome sequencing.

    PubMed

    Keller, Andreas; Graefen, Angela; Ball, Markus; Matzas, Mark; Boisguerin, Valesca; Maixner, Frank; Leidinger, Petra; Backes, Christina; Khairat, Rabab; Forster, Michael; Stade, Björn; Franke, Andre; Mayer, Jens; Spangler, Jessica; McLaughlin, Stephen; Shah, Minita; Lee, Clarence; Harkins, Timothy T; Sartori, Alexander; Moreno-Estrada, Andres; Henn, Brenna; Sikora, Martin; Semino, Ornella; Chiaroni, Jacques; Rootsi, Siiri; Myres, Natalie M; Cabrera, Vicente M; Underhill, Peter A; Bustamante, Carlos D; Vigl, Eduard Egarter; Samadelli, Marco; Cipollini, Giovanna; Haas, Jan; Katus, Hugo; O'Connor, Brian D; Carlson, Marc R J; Meder, Benjamin; Blin, Nikolaus; Meese, Eckart; Pusch, Carsten M; Zink, Albert

    2012-01-01

    The Tyrolean Iceman, a 5,300-year-old Copper Age individual, was discovered in 1991 on the Tisenjoch Pass in the Italian part of the Ötztal Alps. Here we report the complete genome sequence of the Iceman and show 100% concordance between the previously reported mitochondrial genome sequence and the consensus sequence generated from our genomic data. We present indications of recent common ancestry between the Iceman and present-day inhabitants of the Tyrrhenian Sea, as well as indications that the Iceman probably had brown eyes, belonged to blood group O and was lactose intolerant. His genetic predisposition shows an increased risk for coronary heart disease and may have contributed to the development of previously reported vascular calcifications. Sequences corresponding to ~60% of the genome of Borrelia burgdorferi are indicative of the earliest human case of infection with the pathogen for Lyme borreliosis. PMID:22426219

  18. Interspecific Chromosome Substitution Lines as Genetic Resources for Improvement, Trait Analyses and Genomic Inference

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Three of the research areas likely to very significantly impact genetic improvement of cotton are interspecific introgression, genetic dissection of complex traits and sequencing of [AD] genomes. Interspecific introgression is expected to increase genetic diversity of breeding germplasm, creating op...

  19. Data set of phylogenetic analysis inferred based on the complete genomes of the family Nodaviridae.

    PubMed

    Low, Chen-Fei; Bunawan, Hamidun

    2016-09-01

    In this article, nine complete genomes of viruses from the genera Alphanodavirus and Betanodavirus (family Nodaviridae) were comparatively analyzed, and data on their evolutionary origins and relatedness are reported. The nucleotide sequence alignment of the complete genomes from all species and their deduced evolutionary relationships are presented. Multiple sequence alignment of the Nodaviridae genomes revealed higher sequence similarity within the genus Betanodavirus than within the genus Alphanodavirus. The amino acid sequences of both the RNA1 and RNA2 ORFs are likewise more conserved in Betanodavirus than in Alphanodavirus. The conserved and variable regions within the virus genome that were defined based on the multiple sequence alignments are presented in this dataset. PMID:27617282

  20. Demographic History of the Genus Pan Inferred from Whole Mitochondrial Genome Reconstructions

    PubMed Central

    Tucci, Serena; de Manuel, Marc; Ghirotto, Silvia; Benazzo, Andrea; Prado-Martinez, Javier; Lorente-Galdos, Belen; Nam, Kiwoong; Dabad, Marc; Hernandez-Rodriguez, Jessica; Comas, David; Navarro, Arcadi; Schierup, Mikkel H.; Andres, Aida M.; Barbujani, Guido; Hvilsom, Christina; Marques-Bonet, Tomas

    2016-01-01

    The genus Pan is the closest genus to our own and it includes two species, Pan paniscus (bonobos) and Pan troglodytes (chimpanzees). The latter comprises four subspecies, all highly endangered. The study of the genus Pan has been persistently complicated by the intricate relationships among subspecies and by the statistical limitations imposed by the small number of samples or genomic markers analyzed. Here, we present mtArchitect, a new method to reconstruct complete mitochondrial genomes (mitogenomes) from whole genome shotgun (WGS) datasets, and show that its reconstructions are highly accurate and consistent with long-range PCR mitogenomes. We used this approach to build the mitochondrial genomes of 20 newly sequenced samples which, together with available genomes, allowed us to analyze the most complete Pan mitochondrial genome dataset to date, including 156 chimpanzee and 44 bonobo individuals, with a proportional contribution from all chimpanzee subspecies. We estimated the separation time between chimpanzees and bonobos at around 1.15 million years ago (Mya) [0.81–1.49]. Further, we found that under the most probable genealogical model the two clades of chimpanzees, Western + Nigeria-Cameroon and Central + Eastern, separated at 0.59 Mya [0.41–0.78] with further internal separations at 0.32 Mya [0.22–0.43] and 0.16 Mya [0.17–0.34], respectively. Finally, for a subset of our samples, we compared nuclear versus mitochondrial genomes and found that chimpanzee subspecies have different patterns of nuclear and mitochondrial diversity, which could result either from processes affecting the mitochondrial genome, such as hitchhiking or background selection, or from population dynamics. PMID:27345955

  1. Demographic History of the Genus Pan Inferred from Whole Mitochondrial Genome Reconstructions.

    PubMed

    Lobon, Irene; Tucci, Serena; de Manuel, Marc; Ghirotto, Silvia; Benazzo, Andrea; Prado-Martinez, Javier; Lorente-Galdos, Belen; Nam, Kiwoong; Dabad, Marc; Hernandez-Rodriguez, Jessica; Comas, David; Navarro, Arcadi; Schierup, Mikkel H; Andres, Aida M; Barbujani, Guido; Hvilsom, Christina; Marques-Bonet, Tomas

    2016-01-01

    The genus Pan is the closest genus to our own and it includes two species, Pan paniscus (bonobos) and Pan troglodytes (chimpanzees). The latter comprises four subspecies, all highly endangered. The study of the genus Pan has been persistently complicated by the intricate relationships among subspecies and by the statistical limitations imposed by the small number of samples or genomic markers analyzed. Here, we present mtArchitect, a new method to reconstruct complete mitochondrial genomes (mitogenomes) from whole genome shotgun (WGS) datasets, and show that its reconstructions are highly accurate and consistent with long-range PCR mitogenomes. We used this approach to build the mitochondrial genomes of 20 newly sequenced samples which, together with available genomes, allowed us to analyze the most complete Pan mitochondrial genome dataset to date, including 156 chimpanzee and 44 bonobo individuals, with a proportional contribution from all chimpanzee subspecies. We estimated the separation time between chimpanzees and bonobos at around 1.15 million years ago (Mya) [0.81-1.49]. Further, we found that under the most probable genealogical model the two clades of chimpanzees, Western + Nigeria-Cameroon and Central + Eastern, separated at 0.59 Mya [0.41-0.78] with further internal separations at 0.32 Mya [0.22-0.43] and 0.16 Mya [0.17-0.34], respectively. Finally, for a subset of our samples, we compared nuclear versus mitochondrial genomes and found that chimpanzee subspecies have different patterns of nuclear and mitochondrial diversity, which could result either from processes affecting the mitochondrial genome, such as hitchhiking or background selection, or from population dynamics. PMID:27345955

  2. Inference on recombination and block structure using unphased data.

    PubMed Central

    Wiuf, Carsten

    2004-01-01

    In this study, the compatibility of unphased genotype data with a tree is discussed. If the data are compatible with a tree, they are consistent with the assumption that no recombination occurred in their evolutionary history. In that case there is said to be a solution to the perfect phylogeny problem; i.e., a pair of haplotypes can be defined for each individual such that the set of all haplotypes can be explained without invoking recombination. A new algorithm to decide whether or not a sample is compatible with a tree is derived. The new algorithm relies on an equivalence relation between sites that mutually determine each other's phase. (The previous algorithm was based on advanced graph-theoretical tools.) The equivalence relation is used to derive the number of solutions to the perfect phylogeny problem. Further, a series of statistics, $R^{(j)}(M)$, $j \geq 2$, are defined. These can be used to detect recombination events in the sample's history and to divide the sample into regions that are compatible with a tree. The new statistics are applied to real data from human genes. The results are discussed with reference to recent suggestions that recombination in the human genome is highly heterogeneous. PMID:15020442
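    Tree compatibility is easiest to see in the simpler phased case. The sketch below is not Wiuf's algorithm for unphased genotypes; it substitutes the classic four-gamete test, which applies to phased binary haplotypes under the infinite-sites assumption: the haplotypes admit a perfect phylogeny, and hence need no recombination, if and only if no pair of sites shows all four gametes 00, 01, 10 and 11. The input encoding (equal-length strings of '0' and '1') and the function names are assumptions made for illustration.

```python
from itertools import combinations


def incompatible_pairs(haplotypes):
    """Return site pairs (i, j) at which all four gametes 00, 01, 10, 11 occur.

    Each haplotype is assumed to be an equal-length string of '0'/'1' characters.
    """
    n_sites = len(haplotypes[0])
    pairs = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:  # all four combinations: a recombination (or recurrent mutation) signal
            pairs.append((i, j))
    return pairs


def compatible_with_tree(haplotypes):
    """For binary sites, pairwise compatibility implies a perfect phylogeny exists."""
    return not incompatible_pairs(haplotypes)


print(compatible_with_tree(["000", "011", "110"]))
print(incompatible_pairs(["000", "011", "110", "101"]))
```

    Scanning pairs of adjacent sites in this way is also the intuition behind dividing a sample into tree-compatible regions; handling unphased genotypes, where each individual's two haplotypes are unknown, is the harder problem the abstract addresses.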

  3. Systematic Prioritization of Druggable Mutations in ∼5000 Genomes Across 16 Cancer Types Using a Structural Genomics-based Approach.

    PubMed

    Zhao, Junfei; Cheng, Feixiong; Wang, Yuanyuan; Arteaga, Carlos L; Zhao, Zhongming

    2016-02-01

    A massive number of somatic mutations have been cataloged in large-scale projects such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium projects. The majority of the somatic mutations found in tumor genomes are neutral "passenger" rather than damaging "driver" mutations. There is now an urgent need to understand their biological consequences and to prioritize them as druggable targets. Thanks to rapid advances in structural genomics technologies (e.g. X-ray crystallography), large-scale protein structural data are now available, providing critical information for deciphering the functional roles of mutations in cancer and for prioritizing, at atomic resolution, those alterations that may mediate drug binding and, as such, be druggable targets. We hypothesized that mutations at protein-ligand binding-site residues are likely to be druggable targets. Thus, to prioritize druggable mutations, we developed SGDriver, a structural genomics-based method that maps somatic missense mutations onto protein-ligand binding-site residues using a Bayesian inference framework. We applied SGDriver to 746,631 missense mutations observed in 4997 tumor-normal pairs across 16 cancer types from The Cancer Genome Atlas. SGDriver detected 14,471 potential druggable mutations in 2091 proteins (including 1,516 recurrently mutated proteins) across 3558 cancer genomes (71.2%), and further identified 298 proteins harboring mutations that were significantly enriched at protein-ligand binding-site residues (adjusted p value < 0.05). The identified proteins are significantly enriched in both oncoproteins and tumor suppressors. The follow-up drug-target network analysis suggested 98 known and 126 repurposed druggable anticancer targets (e.g. SPOP and NR3C1). Furthermore, our integrative analysis indicated that 13% of patients might benefit from current targeted therapy, and this proportion would increase to 31% when considering drug repositioning. This study

  4. GMOL: An Interactive Tool for 3D Genome Structure Visualization.

    PubMed

    Nowotny, Jackson; Wells, Avery; Oluwadare, Oluwatosin; Xu, Lingfei; Cao, Renzhi; Trieu, Tuan; He, Chenfeng; Cheng, Jianlin

    2016-01-01

    It has been shown that genome spatial structures largely affect both genome activity and DNA function. Knowing this, many researchers are currently attempting to accurately model genome structures. Despite these increased efforts, there is still a shortage of tools dedicated to visualizing the genome. A tool that accurately visualizes the genome can aid researchers by highlighting structural relationships that may not be obvious when examining the sequence information alone. Here we present a desktop application, known as GMOL, designed to effectively visualize genome structures so that researchers may better analyze genomic data. GMOL was developed based upon our multi-scale approach that allows a user to scale between six separate levels within the genome. With GMOL, a user can choose any unit at any scale and scale it up or down to visualize its structure and retrieve corresponding genome sequences. Users can also interactively manipulate and measure the whole genome structure and extract static images and machine-readable data files in PDB format from the multi-scale structure. By using GMOL, researchers will be able to better understand and analyze genome structure models and the impact their structural relations have on genome activity and DNA function. PMID:26868282
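    As an illustration of a machine-readable PDB export, the sketch below formats a list of 3D bead coordinates as fixed-column PDB ATOM records. It assumes that each genome unit is represented by a single bead; the residue name `GEN`, the atom name `CA` and the function `to_pdb_lines` are hypothetical choices and do not describe GMOL's actual exporter.

```python
def to_pdb_lines(coords, chain="A", resname="GEN", atom_name="CA"):
    """Format (x, y, z) bead coordinates as fixed-column PDB ATOM records,
    one bead per residue, followed by an END record."""
    lines = []
    for serial, (x, y, z) in enumerate(coords, start=1):
        lines.append(
            # cols 1-6 record, 7-11 serial, 13-16 name, 18-20 residue, 22 chain,
            # 23-26 residue number, 31-54 x/y/z, 55-60 occupancy, 61-66 B-factor
            f"ATOM  {serial:5d} {atom_name:^4s} {resname:>3s} {chain:1s}{serial:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}"
        )
    lines.append("END")
    return lines


coords = [(1.0, 2.0, 3.0), (-4.5, 0.0, 10.25)]
print("\n".join(to_pdb_lines(coords)))
```

    Because the residue-number field is only 4 characters wide, a model with more than 9,999 beads would need to be split across chains or files to keep the columns aligned.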

  5. Inferring Meaning from Syntactic Structures in Acquisition: The Case of Transitivity and Telicity

    ERIC Educational Resources Information Center

    Wagner, Laura

    2010-01-01

    This paper investigated children's ability to use syntactic structures to infer semantic information. The particular syntax-semantics link examined was the one between transitivity (transitive/intransitive structures) and telicity (telic/atelic perspectives; that is, boundedness). Although transitivity is an important syntactic reflex of telicity,…

  6. Genome structure of cottontail rabbit herpesvirus.

    PubMed

    Cebrian, J; Berthelot, N; Laithier, M

    1989-02-01

    The genome structure of a herpesvirus isolated from primary cultures of kidney cells from the cottontail rabbit Sylvilagus floridanus was elucidated by using electron microscopy and restriction enzyme analysis. The genome, which was about 150 kilobase pairs long and which had an average G + C composition of 45%, consisted of two regions with unique base sequences (54 and 47 kilobase pairs) enclosed by reiterations of a 925-base-pair sequence with a variable copy number. The internal repeats were in opposite polarity with respect to the terminal repeats, and both unique regions underwent inversion. The nucleotide sequence of the repeat unit was determined, and virion DNA termini were precisely localized within this sequence. Elements showing homology with the cleavage-packaging signals common to other herpesviruses were detected. The data indicate that this virus is different from the previously described herpesvirus sylvilagus. PMID:2911115

  7. Protein NMR Structure Refinement based on Bayesian Inference

    NASA Astrophysics Data System (ADS)

    Ikeya, Teppei; Ikeda, Shiro; Kigawa, Takanori; Ito, Yutaka; Güntert, Peter

    2016-03-01

    Nuclear Magnetic Resonance (NMR) spectroscopy is a tool for investigating the three-dimensional (3D) structures and dynamics of biomacromolecules at atomic resolution in solution or in more natural environments such as living cells. Because NMR data consist principally of spectra with peak signals, structural information must be carefully deduced from sparse experimental data with their imperfections and uncertainty, and 3D conformations must be obtained by NMR structure calculation. To analyse such data efficiently, Rieping et al. proposed a structure calculation method based on Bayes' theorem. We implemented a similar approach, with some modifications, in the program CYANA. It allows us to handle automatic NOE cross-peak assignments in both unambiguous and ambiguous cases, and to create a prior distribution based on a physical force field with the generalized Born implicit water model. The posterior is sampled by a hybrid Monte Carlo algorithm that combines Markov chain Monte Carlo (MCMC) using the Gibbs sampler with molecular dynamics (MD) simulation to obtain a canonical ensemble of conformations. Because the conformational space of proteins is extraordinarily large, searching the entire function space, particularly when exploring the conformational prior, is not trivial; the replica exchange method is therefore used, in which several MCMC calculations at different temperatures run in parallel as replicas. Tests with simulated data and with randomly deleted experimental peaks show that the new structure calculation method can provide accurate structures even with fewer peaks, especially compared with the conventional method. In particular, it dramatically improves the in-cell structures of the proteins GB1 and TTHA1718 determined exclusively from information obtained in living Escherichia coli (E. coli) cells.
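    The replica exchange idea can be sketched on a toy problem. The Python code below is a minimal sketch, not the CYANA implementation: it assumes a one-dimensional double-well energy in place of a protein force field, runs one random-walk Metropolis chain per inverse temperature, and periodically proposes swaps between neighbouring replicas with the standard acceptance probability min(1, exp[(beta_i - beta_j)(E_i - E_j)]). The energy function, step size, swap interval and temperature ladder are hypothetical.

```python
import math
import random


def energy(x: float) -> float:
    """Toy double-well energy with minima at x = -1 and x = +1."""
    return 5.0 * (x * x - 1.0) ** 2


def swap_acceptance(e_i: float, e_j: float, beta_i: float, beta_j: float) -> float:
    """Probability of exchanging the states of replicas i and j:
    min(1, exp[(beta_i - beta_j) * (e_i - e_j)])."""
    arg = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if arg >= 0 else math.exp(arg)  # avoids overflow for large arg


def replica_exchange(betas, n_steps=5000, step=0.5, seed=0):
    """One random-walk Metropolis chain per inverse temperature in `betas`
    (betas[0] is the coldest); neighbour swaps are proposed every 10 steps.
    Returns the trajectory of the coldest replica."""
    rng = random.Random(seed)
    states = [rng.uniform(-2.0, 2.0) for _ in betas]
    cold = []
    for t in range(n_steps):
        for k, beta in enumerate(betas):
            proposal = states[k] + rng.gauss(0.0, step)
            d_e = energy(proposal) - energy(states[k])
            if d_e <= 0 or rng.random() < math.exp(-beta * d_e):
                states[k] = proposal
        if t % 10 == 0:
            for k in range(len(betas) - 1):
                p = swap_acceptance(energy(states[k]), energy(states[k + 1]),
                                    betas[k], betas[k + 1])
                if rng.random() < p:
                    states[k], states[k + 1] = states[k + 1], states[k]
        cold.append(states[0])
    return cold


samples = replica_exchange([1.0, 0.5, 0.2], n_steps=5000, seed=1)
left = sum(1 for x in samples if x < 0) / len(samples)
print(f"fraction of cold-replica samples in the left well: {left:.2f}")
```

    The hot replicas cross the barrier between the wells easily, and accepted swaps pass those crossings down to the cold replica, which is the motivation the abstract gives for running MCMC calculations at several temperatures in parallel.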

  8. Comparative genomics of four Liliales families inferred from the complete chloroplast genome sequence of Veratrum patulum O. Loes. (Melanthiaceae).

    PubMed

    Do, Hoang Dang Khoa; Kim, Jung Sung; Kim, Joo-Hwan

    2013-11-10

    The sequence of the chloroplast genome, which is inherited maternally, contains useful information for many scientific fields such as plant systematics, biogeography and biotechnology because its characteristics are highly conserved among species. The number of sequenced angiosperm chloroplast genomes has increased in recent years. In this study, the complete nucleotide sequence of the chloroplast genome (cpDNA) of Veratrum patulum Loes. (Melanthiaceae, Liliales) was analyzed. The circular double-stranded DNA of 153,699 bp consists of two inverted repeat (IR) regions of 26,360 bp each, a large single-copy region of 83,372 bp, and a small single-copy region of 17,607 bp. This plastome contains 81 protein-coding genes, 30 distinct tRNA genes and four rRNA genes. In addition, there are six hypothetical coding regions (ycf1, ycf2, ycf3, ycf4, ycf15 and ycf68) and two open reading frames (ORF42 and ORF56), which are also found in the chloroplast genomes of other species. The gene order and gene content of the V. patulum plastid genome are similar to those of Smilax china, Lilium longiflorum and Alstroemeria aurea, members of the Smilacaceae, Liliaceae and Alstroemeriaceae (Liliales), respectively. However, the loss of rps16 exon 2 in V. patulum results in a difference in the large single-copy region compared with the other species. The base substitution rate is quite similar among the genes of these species. Additionally, the base substitution rate of the inverted repeat region was lower than that of the single-copy regions in all observed species of Liliales. The IR regions were expanded to trnH_GUG in V. patulum, to part of rps19 in L. longiflorum and A. aurea, and to the whole sequence of rps19 in S. china. Furthermore, the intergenic spacer (IGS) lengths of the rbcL-accD-psaI region were variable among Liliales species, suggesting that this region might be a hotspot of indel events and an informative site for phylogenetic studies in Liliales. In general, the whole chloroplast genome of V. patulum, a
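    The region sizes reported for the V. patulum plastome are internally consistent, because a quadripartite plastome contains the inverted repeat twice. A small Python check, where the helper name `plastome_length` is hypothetical:

```python
def plastome_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a quadripartite plastome: one large single-copy region,
    one small single-copy region and two copies of the inverted repeat."""
    return lsc + ssc + 2 * ir


# Region sizes (bp) reported for Veratrum patulum
total = plastome_length(lsc=83_372, ssc=17_607, ir=26_360)
print(total)  # 153699
```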

  9. Non-Bayesian Inference: Causal Structure Trumps Correlation

    ERIC Educational Resources Information Center

    Bes, Benedicte; Sloman, Steven; Lucas, Christopher G.; Raufaste, Eric

    2012-01-01

    The study tests the hypothesis that conditional probability judgments can be influenced by causal links between the target event and the evidence even when the statistical relations among variables are held constant. Three experiments varied the causal structure relating three variables and found that (a) the target event was perceived as more…

  10. Alternative Multiple Imputation Inference for Mean and Covariance Structure Modeling

    ERIC Educational Resources Information Center

    Lee, Taehun; Cai, Li

    2012-01-01

    Model-based multiple imputation has become an indispensable method in the educational and behavioral sciences. Mean and covariance structure models are often fitted to multiply imputed data sets. However, the presence of multiple random imputations complicates model fit testing, which is an important aspect of mean and covariance structure…

  11. Inference of expanded Lrp-like feast/famine transcription factor targets in a non-model organism using protein structure-based prediction.

    PubMed

    Ashworth, Justin; Plaisier, Christopher L; Lo, Fang Yin; Reiss, David J; Baliga, Nitin S

    2014-01-01

    Widespread microbial genome sequencing presents an opportunity to understand the gene regulatory networks of non-model organisms. This requires knowledge of the binding sites for transcription factors whose DNA-binding properties are unknown or difficult to infer. We adapted a protein structure-based method to predict the specificities and putative regulons of homologous transcription factors across diverse species. As a proof-of-concept we predicted the specificities and transcriptional target genes of divergent archaeal feast/famine regulatory proteins, several of which are encoded in the genome of Halobacterium salinarum. This was validated by comparison to experimentally determined specificities for transcription factors in distantly related extremophiles, chromatin immunoprecipitation experiments, and cis-regulatory sequence conservation across eighteen related species of halobacteria. Through this analysis we were able to infer that Halobacterium salinarum employs a divergent local trans-regulatory strategy to regulate genes (carA and carB) involved in arginine and pyrimidine metabolism, whereas Escherichia coli employs an operon. The prediction of gene regulatory binding sites using structure-based methods is useful for the inference of gene regulatory relationships in new species that are otherwise difficult to infer. PMID:25255272
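    The structure-based specificity prediction itself is not reproduced here. Once a specificity has been predicted, however, putative target sites are commonly located by scanning promoter sequences with a position weight matrix. The sketch below assumes that representation; the motif counts, pseudocount, uniform background and score threshold are all hypothetical, and only the forward strand is scanned.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, background=0.25, pseudocount=0.5):
    """Convert per-position base counts (a list of dicts) into log2 odds
    scores against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        matrix.append({b: math.log2((column.get(b, 0) + pseudocount) / total / background)
                       for b in BASES})
    return matrix


def scan(sequence, matrix, threshold):
    """Return (start, score) for every forward-strand window scoring at or
    above the threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing ambiguous bases
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Hypothetical 4-bp motif summarised from 10 aligned sites
counts = [{"T": 9, "A": 1}, {"T": 10}, {"G": 8, "C": 2}, {"A": 10}]
pwm = log_odds_matrix(counts)
hits = scan("CCTTGACCATGA", pwm, threshold=5.0)
print(hits)
```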

  12. The Generator of the Event Structure Lexicon (GESL): Automatic Annotation of Event Structure for Textual Inference Tasks

    ERIC Educational Resources Information Center

    Im, Seohyun

    2013-01-01

    This dissertation aims to develop the Generator of the Event Structure Lexicon (GESL) which is a tool to automate annotating the event structure of verbs in text to support textual inference tasks related to lexically entailed subevents. The output of the GESL is the Event Structure Lexicon (ESL), which is a lexicon of verbs in text which includes…

  13. Chromosome evolution in malaria mosquitoes inferred from physically mapped genome assemblies.

    PubMed

    Sharakhov, Igor V; Artemov, Gleb N; Sharakhova, Maria V

    2016-04-01

    Polymorphic inversions in mosquitoes are distributed nonrandomly among chromosomes and are associated with ecological, behavioral, and physiological adaptations related to pathogen transmission. Despite their significance, the patterns and mechanisms of genome rearrangements are not well understood. Recent sequencing and physical mapping of the genomes of 16 Anopheles mosquito species provided an opportunity to study chromosome evolution at the highest resolution. New studies revealed that fixed rearrangements accumulated approximately 3 times faster on the X chromosome than on autosomes. The highest densities of transposable elements (TEs) and satellites of different sizes have also been found on the X chromosome, suggesting a mechanism for inversion generation. The high rate of X chromosome rearrangements is in sharp contrast with the paucity of polymorphic inversions on the X in the majority of anopheline species. This paper highlights the advances in understanding chromosome evolution in malaria vectors and discusses possible future directions in studying mechanisms and biological roles of genome rearrangements. PMID:27021248

  14. Genome-Wide Structural Variation Detection by Genome Mapping on Nanochannel Arrays

    PubMed Central

    Mak, Angel C. Y.; Lai, Yvonne Y. Y.; Lam, Ernest T.; Kwok, Tsz-Piu; Leung, Alden K. Y.; Poon, Annie; Mostovoy, Yulia; Hastie, Alex R.; Stedman, William; Anantharaman, Thomas; Andrews, Warren; Zhou, Xiang; Pang, Andy W. C.; Dai, Heng; Chu, Catherine; Lin, Chin; Wu, Jacob J. K.; Li, Catherine M. L.; Li, Jing-Woei; Yim, Aldrin K. Y.; Chan, Saki; Sibert, Justin; Džakula, Željko; Cao, Han; Yiu, Siu-Ming; Chan, Ting-Fung; Yip, Kevin Y.; Xiao, Ming; Kwok, Pui-Yan

    2016-01-01

    Comprehensive whole-genome structural variation detection is challenging with current approaches. Because the DNA comes from diploid cells and the genome contains numerous repetitive elements, short-read DNA sequencing cannot detect structural variation efficiently. In this report, we show that genome mapping with long, fluorescently labeled DNA molecules imaged on nanochannel arrays can be used for whole-genome structural variation detection without sequencing. While whole-genome haplotyping is not achieved, local phasing (across >150-kb regions) is routine, as molecules from the parental chromosomes are examined separately. In one experiment, we generated genome maps from a trio from the 1000 Genomes Project, compared the maps against those derived from the reference human genome, and identified structural variations that are >5 kb in size. We find that these individuals have many more structural variants than those published, including some with the potential to disrupt gene function or regulation. PMID:26510793
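    The core map comparison can be sketched as follows, assuming the sample map has already been aligned to the reference so that label i in one map corresponds to label i in the other. Intervals between consecutive matched labels whose lengths differ by more than 5 kb are flagged as putative insertions or deletions. The function name and label positions are hypothetical; the alignment step and the handling of inversions, translocations and haplotypes are omitted.

```python
def call_size_changes(ref_labels, sample_labels, min_size=5_000):
    """Compare aligned label positions (bp) from a reference map and a sample
    map; report each interval whose length changes by more than min_size."""
    if len(ref_labels) != len(sample_labels):
        raise ValueError("maps must contain the same matched labels")
    calls = []
    for i in range(len(ref_labels) - 1):
        ref_len = ref_labels[i + 1] - ref_labels[i]
        sample_len = sample_labels[i + 1] - sample_labels[i]
        delta = sample_len - ref_len
        if abs(delta) > min_size:
            kind = "insertion" if delta > 0 else "deletion"
            calls.append((ref_labels[i], ref_labels[i + 1], kind, delta))
    return calls


ref = [0, 10_000, 25_000, 40_000, 52_000]      # hypothetical label positions
sample = [0, 10_200, 33_200, 48_100, 54_000]
for start, end, kind, delta in call_size_changes(ref, sample):
    print(f"{kind} of {abs(delta)} bp between reference labels at {start} and {end}")
```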

  15. Stock Portfolio Structure of Individual Investors Infers Future Trading Behavior

    PubMed Central

    Bohlin, Ludvig; Rosvall, Martin

    2014-01-01

    Although the understanding of and motivation behind individual trading behavior is an important puzzle in finance, little is known about the connection between an investor's portfolio structure and her trading behavior in practice. In this paper, we investigate the relation between what stocks investors hold and what stocks they buy, and show that investors with similar portfolio structures to a great extent trade in a similar way. With data from the central register of shareholdings in Sweden, we model the market as a similarity network, by considering investors as nodes connected by links representing portfolio similarity. From the network, we find investor groups that not only identify different investment strategies, but also represent individual investors trading in a similar way. These findings suggest that the stock portfolios of investors hold meaningful information, which could be used to gain a better understanding of stock market dynamics. PMID:25068302
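    A portfolio similarity network of this kind can be sketched as below. The Jaccard index of two holdings sets and the fixed link threshold are assumptions made for illustration, not necessarily the measure used by the authors, and the investor and stock identifiers are hypothetical. Investor groups would then be found by running a community-detection or clustering step on the resulting edge list, which is not shown.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Overlap of two holdings sets: |A & B| / |A | B| (0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_network(holdings: dict, threshold: float = 0.5):
    """Link every pair of investors whose portfolios have Jaccard similarity
    at or above the threshold; returns a weighted edge list."""
    edges = []
    for u, v in combinations(sorted(holdings), 2):
        w = jaccard(holdings[u], holdings[v])
        if w >= threshold:
            edges.append((u, v, w))
    return edges


holdings = {
    "inv1": {"S1", "S2", "S3"},
    "inv2": {"S1", "S2", "S4"},
    "inv3": {"S5", "S6"},
}
print(similarity_network(holdings))
```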

  16. Stock portfolio structure of individual investors infers future trading behavior.

    PubMed

    Bohlin, Ludvig; Rosvall, Martin

    2014-01-01

    Although the understanding of and motivation behind individual trading behavior is an important puzzle in finance, little is known about the connection between an investor's portfolio structure and her trading behavior in practice. In this paper, we investigate the relation between what stocks investors hold and what stocks they buy, and show that investors with similar portfolio structures to a great extent trade in a similar way. With data from the central register of shareholdings in Sweden, we model the market as a similarity network, by considering investors as nodes connected by links representing portfolio similarity. From the network, we find investor groups that not only identify different investment strategies, but also represent individual investors trading in a similar way. These findings suggest that the stock portfolios of investors hold meaningful information, which could be used to gain a better understanding of stock market dynamics. PMID:25068302

  17. Use of Bayesian Inference in Crystallographic Structure Refinement via Full Diffraction Profile Analysis.

    PubMed

    Fancher, Chris M; Han, Zhen; Levin, Igor; Page, Katharine; Reich, Brian J; Smith, Ralph C; Wilson, Alyson G; Jones, Jacob L

    2016-01-01

    A Bayesian inference method for refining crystallographic structures is presented. The distribution of model parameters is stochastically sampled using Markov chain Monte Carlo. Posterior probability distributions are constructed for all model parameters to properly quantify uncertainty by appropriately modeling the heteroskedasticity and correlation of the error structure. The proposed method is demonstrated by analyzing a National Institute of Standards and Technology silicon standard reference material. The results obtained by Bayesian inference are compared with those determined by Rietveld refinement. Posterior probability distributions of model parameters provide both estimates and uncertainties. The new method better estimates the true uncertainties in the model as compared to the Rietveld method. PMID:27550221
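    The basic idea of sampling a posterior with Markov chain Monte Carlo can be shown on a deliberately small problem. The sketch below is not a diffraction-profile refinement: it assumes repeated measurements of a single quantity with independent, homoskedastic Gaussian errors (the paper explicitly models heteroskedastic and correlated errors), flat priors on the mean and on log sigma, and a random-walk Metropolis sampler. The data values, step size and burn-in length are hypothetical.

```python
import math
import random
import statistics


def log_posterior(mu: float, log_sigma: float, data: list[float]) -> float:
    """Unnormalised log posterior of (mu, log sigma) for i.i.d. Gaussian
    errors, with flat priors on mu and on log sigma."""
    sigma = math.exp(log_sigma)
    sq = sum((y - mu) ** 2 for y in data)
    return -len(data) * log_sigma - sq / (2.0 * sigma**2)


def metropolis(data, n_iter=20_000, step=0.1, seed=0):
    """Random-walk Metropolis sampler; returns a list of (mu, sigma) draws."""
    rng = random.Random(seed)
    mu, log_sigma = statistics.mean(data), math.log(statistics.stdev(data))
    current = log_posterior(mu, log_sigma, data)
    chain = []
    for _ in range(n_iter):
        mu_new = mu + rng.gauss(0.0, step)
        ls_new = log_sigma + rng.gauss(0.0, step)
        proposed = log_posterior(mu_new, ls_new, data)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposed - current)):
            mu, log_sigma, current = mu_new, ls_new, proposed
        chain.append((mu, math.exp(log_sigma)))
    return chain


data = [1.0, 1.2, 0.8, 1.1, 0.9]        # hypothetical repeated measurements
draws = metropolis(data)[2_000:]        # discard burn-in
mus = sorted(m for m, _ in draws)
print("posterior mean of mu:", round(statistics.mean(mus), 3))
print("95% interval:", mus[int(0.025 * len(mus))], mus[int(0.975 * len(mus))])
```

    The spread of the retained draws, summarised here as a 95% interval, plays the role of the parameter uncertainty that the abstract contrasts with Rietveld refinement.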

  18. Use of Bayesian Inference in Crystallographic Structure Refinement via Full Diffraction Profile Analysis

    PubMed Central

    Fancher, Chris M.; Han, Zhen; Levin, Igor; Page, Katharine; Reich, Brian J.; Smith, Ralph C.; Wilson, Alyson G.; Jones, Jacob L.

    2016-01-01

    A Bayesian inference method for refining crystallographic structures is presented. The distribution of model parameters is stochastically sampled using Markov chain Monte Carlo. Posterior probability distributions are constructed for all model parameters to properly quantify uncertainty by appropriately modeling the heteroskedasticity and correlation of the error structure. The proposed method is demonstrated by analyzing a National Institute of Standards and Technology silicon standard reference material. The results obtained by Bayesian inference are compared with those determined by Rietveld refinement. Posterior probability distributions of model parameters provide both estimates and uncertainties. The new method better estimates the true uncertainties in the model as compared to the Rietveld method. PMID:27550221

  19. Chloroplast genome structure in Ilex (Aquifoliaceae)

    PubMed Central

    Yao, Xin; Tan, Yun-Hong; Liu, Ying-Ying; Song, Yu; Yang, Jun-Bo; Corlett, Richard T.

    2016-01-01

    Aquifoliaceae is the largest family in the campanulid order Aquifoliales. It consists of a single genus, Ilex, the hollies, which is the largest woody dioecious genus in the angiosperms. Most species are in East Asia or South America. The taxonomy and evolutionary history remain unclear due to the lack of a robust species-level phylogeny. We produced the first complete chloroplast genomes in this family, including seven Ilex species, by Illumina sequencing of long-range PCR products and subsequent reference-guided de novo assembly. These genomes have a typical bicyclic structure with a conserved genome arrangement and moderate divergence. The total length is 157,741 bp, comprising one large single-copy (LSC) region of 87,109 bp, one small single-copy (SSC) region of 18,436 bp, and a pair of inverted repeat (IR) regions totalling 52,196 bp. A total of 144 genes were identified, including 96 protein-coding genes, 40 tRNA genes and 8 rRNA genes. Thirty-four repetitive sequences with lengths >14 bp and identity >90% were identified in Ilex pubescens, and 11 divergence hotspot regions that could be targeted as phylogenetic markers were also identified. This study will contribute to improved resolution of the deep branches of the Ilex phylogeny and facilitate identification of Ilex species. PMID:27378489
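    A simplified way to locate repeated sequences is to index every k-mer and report those that occur more than once. The sketch below is a swapped-in, exact-match version of that idea: it uses k = 15 to mirror the >14 bp length cutoff, but it does not allow mismatches (so it cannot apply the >90% identity criterion), ignores reverse-complement repeats, and does not merge overlapping k-mers into maximal repeats. The function name and the toy sequence are hypothetical.

```python
from collections import defaultdict


def repeated_kmers(sequence: str, k: int = 15) -> dict[str, list[int]]:
    """Map each k-mer that occurs at least twice on the forward strand to
    its 0-based start positions."""
    positions = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        positions[sequence[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


seq = "ACGTACGTACGTACG" + "TTTT" + "ACGTACGTACGTACG"   # toy sequence
print(repeated_kmers(seq))
```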

  20. Inferring Earth structure from the response to ocean tidal loads

    NASA Astrophysics Data System (ADS)

    Martens, H. R.; Simons, M.; Ito, T.

    2012-12-01

    Tidal forces, generated primarily by gravitational interactions with the Moon and Sun, distort the shape of Earth's solid interior (body tides) and redistribute the mass of the oceans (ocean tides). The periodic shifting of ocean mass places cyclic loads on Earth, and the response to these loads is observable as spatial displacements in Global Positioning System (GPS) data. Gravitational and elastic responses of the solid Earth to ocean tidal loads (OTLs) are controlled by the material properties of Earth's interior and may hence be used to independently constrain the absolute values of density and the elastic moduli down to c. 300 km depth. Previous analysis of this type focused on structure in the western United States. We present observational results and modeled predictions for OTL-induced surface displacements at nearly 100 GPS stations across Brazil, Argentina, and Uruguay. Relative to the earlier study region, eastern South America is an ideal geographic location to study the effects of OTLs because it is composed primarily of stable shield and platform provinces, implying less structural complexity. Furthermore, the region is bounded to the north and east by large-amplitude ocean tides. Obtaining absolute values for material properties in the crust and upper mantle beneath South America could provide valuable insight into the structure of the Amazonian craton and hence into its long-term stability against tectonic deformation. We extract the amplitude and phase of several main tidal constituents from the GPS data using classical harmonic analysis. We then compare our observations with theoretical predictions drawn from a variety of Earth models. Predicted surface displacements derived from radially symmetric Earth models, such as PREM and ad hoc perturbations to PREM, exhibit spatially correlated residuals, suggesting a need to explore a wider family of models, including those with lateral heterogeneity. Initially we have relied on one
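    Classical harmonic analysis amounts to a least-squares fit of sinusoids at known tidal frequencies. The sketch below assumes hourly, gap-free samples, fits only the M2 (12.4206012 h) and S2 (12.00 h) constituents plus a constant offset, and reports phases relative to t = 0; operational analyses also apply nodal corrections and astronomical reference phases, which are omitted here. It uses NumPy, and the synthetic amplitudes and phases are hypothetical.

```python
import numpy as np

# Periods in hours of two principal semidiurnal constituents.
PERIODS_H = {"M2": 12.4206012, "S2": 12.0}


def harmonic_fit(t_hours, y, periods=PERIODS_H):
    """Least-squares fit of y(t) = c + sum_k [a_k cos(w_k t) + b_k sin(w_k t)];
    returns {name: (amplitude, phase in degrees)}."""
    t = np.asarray(t_hours, dtype=float)
    columns = [np.ones_like(t)]
    for period in periods.values():
        w = 2 * np.pi / period
        columns += [np.cos(w * t), np.sin(w * t)]
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    result = {}
    for k, name in enumerate(periods):
        a, b = coef[1 + 2 * k], coef[2 + 2 * k]
        # a cos(wt) + b sin(wt) = A cos(wt - phi), A = hypot(a, b), phi = atan2(b, a)
        result[name] = (float(np.hypot(a, b)), float(np.degrees(np.arctan2(b, a))))
    return result


t = np.arange(0.0, 30 * 24, 1.0)  # 30 days of hourly samples
y = (0.002
     + 0.010 * np.cos(2 * np.pi * t / PERIODS_H["M2"] - np.radians(30.0))
     + 0.004 * np.cos(2 * np.pi * t / PERIODS_H["S2"] - np.radians(60.0)))
for name, (amp, phase) in harmonic_fit(t, y).items():
    print(f"{name}: amplitude {amp:.4f} m, phase {phase:.1f} deg")
```

    Separating M2 from S2 requires a record at least as long as their beat period (about 14.8 days); the example uses 30 days for that reason.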

  1. Collinearity analysis of Brassica A and C genomes based on an updated inferred unigene order

    PubMed Central

    Bancroft, Ian; Fraser, Fiona; Morgan, Colin; Trick, Martin

    2015-01-01

    This data article includes SNP scoring across lines of the Brassica napus TNDH population based on Illumina sequencing of mRNA, expanded to 75 lines. The 21,323 mapped markers defined 887 recombination bins, representing an updated genetic linkage map for the species. Based on this new map, 5 genome sequence scaffolds were split and the order and orientation of scaffolds updated to establish a new pseudomolecule specification. The order of unigenes and SNP array probes within these pseudomolecules was determined. Unigenes were assessed for sequence similarity to the A and C genomes. The 57,246 that mapped to both enabled the collinearity of the A and C genomes to be illustrated graphically. Although the great majority were in collinear positions, some were not. Analyses of 60 such instances are presented, suggesting that the breakdown in collinearity was largely due to either the absence of the homoeologue on one genome (resulting in a sequence match to a paralogue) or the presence of multiple similar sequences. The mRNAseq datasets for the TNDH lines are available from the SRA repository (ERA283648); the remaining datasets are supplied with this article. PMID:26217717
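    A recombination bin groups markers that show no recombination among themselves in the mapping population. The sketch below assumes the simplest form of that rule for a doubled-haploid population, whose lines are homozygous: each marker is coded as a string with one parental allele (A or C) per line, and markers with identical strings fall into the same bin. This is an illustrative assumption, not the procedure used for the TNDH map; missing or erroneous calls, which real data would require handling, are ignored, and the marker data are hypothetical.

```python
from collections import defaultdict


def bin_markers(genotypes: dict[str, str]) -> list[list[str]]:
    """Group markers whose allele strings across all lines are identical
    (no observed recombination between them) into bins."""
    bins = defaultdict(list)
    for marker, pattern in genotypes.items():
        bins[pattern].append(marker)
    return list(bins.values())


genotypes = {          # one allele call per line, for five hypothetical lines
    "m1": "AACAC",
    "m2": "AACAC",
    "m3": "AACCC",
    "m4": "CCAAC",
}
print(bin_markers(genotypes))
```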

  2. Interspecific chromosome substitution lines as genetic resources for improvement, trait analysis and genomic inference

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The genetic base that cotton breeders commonly use to improve Upland cultivars is very narrow. The AD-genome species G. barbadense, G. tomentosum, and G. mustelinum are part of the primary germplasm pool, too, and constitute genetic reservoirs of genes for resistance to abiotic stress, pests and pa...

  3. Chromosomal instability in Afrotheria: fragile sites, evolutionary breakpoints and phylogenetic inference from genome sequence assemblies

    PubMed Central

    Ruiz-Herrera, Aurora; Robinson, Terence J

    2007-01-01

    Background Extant placental mammals are divided into four major clades (Laurasiatheria, Supraprimates, Xenarthra and Afrotheria). Given that Afrotheria is generally thought to root the eutherian tree in phylogenetic analyses of large nuclear gene data sets, the study of the organization of the genomes of afrotherian species provides new insights into the dynamics of mammalian chromosomal evolution. Here we test whether there are chromosomal bands with a high tendency to break and reorganize in Afrotheria and, by analyzing the expression of aphidicolin-induced common fragile sites in three afrotherian species, whether these coincide with recognized evolutionary breakpoints. Results We described 29 fragile sites in the aardvark (OAF) genome, 27 in the golden mole (CAS), and 35 in the elephant-shrew (EED) genome. We show that fragile sites are conserved among afrotherian species and that they are correlated with evolutionary breakpoints when compared to the human (HSA) genome. In addition, by computationally scanning the newly released opossum (Monodelphis domestica) and chicken sequence assemblies for use as outgroups to Placentalia, we validate the HSA 3/21/5 chromosomal synteny as a rare genomic change that defines the monophyly of this ancient African clade of mammals. On the other hand, support for HSA 1/19p, which is also thought to underpin Afrotheria, is currently ambiguous. Conclusion We provide evidence (i) that the evolutionary breakpoints that characterise human syntenies detected in the basal Afrotheria correspond at the chromosomal band level with fragile sites, (ii) that HSA 3p/21 was present in the amniote ancestor (i.e., common to turtles, lepidosaurs, crocodilians, birds and mammals) and was subsequently disrupted in the lineage leading to marsupials, its expansion to include HSA 5 in Afrotheria being unique, and (iii) that its fragmentation to HSA 3p/21 + HSA 5/21 in elephant and manatee was due to a fission within HSA 21 that is probably shared by all

  4. Structured models of infectious disease: inference with discrete data.

    PubMed

    Metcalf, C J E; Lessler, J; Klepac, P; Morice, A; Grenfell, B T; Bjørnstad, O N

    2012-12-01

    The use of structured population models can make substantial contributions to public health, particularly for infections whose clinical outcomes vary with age. There are three theoretical challenges in implementing such analyses: (i) developing an appropriate framework that models both demographic and epidemiological transitions; (ii) parameterizing the framework, where parameters may be based on data ranging from the biological course of infection, basic patterns of human demography, specific characteristics of population growth, and details of the vaccination regimes implemented; (iii) evaluating public health strategies in the face of changing human demography. We illustrate the general approach by developing a model of rubella in Costa Rica. The demographic profile of this infection is a crucial aspect of its public health impact, and we use a transient perturbation analysis to explore the impact of changing human demography on the immunization strategies implemented. PMID:22178687
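    The demographic half of such a framework is often written as a Leslie matrix projection. The sketch below shows only that part, under stated assumptions: three hypothetical age classes with invented fertility and survival rates, and no epidemiological states (susceptible, infected, immune or vaccinated), which an age-structured rubella model would add within each age class.

```python
def project(leslie, population, steps=1):
    """Project an age-structured population vector with a Leslie matrix:
    row 0 holds age-specific fertilities, the sub-diagonal holds survival
    probabilities from one age class to the next."""
    n = list(population)
    for _ in range(steps):
        n = [sum(leslie[i][j] * n[j] for j in range(len(n))) for i in range(len(leslie))]
    return n


L = [
    [0.0, 1.2, 0.8],   # births per individual in each age class (hypothetical)
    [0.9, 0.0, 0.0],   # survival from class 0 to class 1
    [0.0, 0.8, 0.0],   # survival from class 1 to class 2
]
print(project(L, [100.0, 80.0, 60.0]))
print(project(L, [100.0, 80.0, 60.0], steps=2))
```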

  5. Crustal structure beneath northeast India inferred from receiver function modeling

    NASA Astrophysics Data System (ADS)

    Borah, Kajaljyoti; Bora, Dipok K.; Goyal, Ayush; Kumar, Raju

    2016-09-01

    We estimated the crustal shear velocity structure beneath ten broadband seismic stations in northeast India by using the H-Vp/Vs stacking method and a non-linear direct search approach, the Neighbourhood Algorithm (NA), followed by joint inversion of Rayleigh wave group velocity and receiver functions calculated from teleseismic earthquake data. Results show significant variations of thickness, shear velocity (Vs) and Vp/Vs ratio in the crust of the study region. The inverted shear wave velocity models show crustal thickness variations of 32-36 km in the Shillong Plateau (North), 36-40 km in the Assam Valley and ∼44 km in the Lesser Himalaya (South). The average Vp/Vs ratio in the Shillong Plateau is lower (1.73-1.77) than in the Assam Valley and Lesser Himalaya (∼1.80). Average crustal shear velocity beneath the study region varies from 3.4 to 3.5 km/s. Beneath the Shillong Plateau and Assam Valley, the sediment structure shows a 1-2 km thick layer with low Vs (2.5-2.9 km/s) and high Vp/Vs ratio (1.8-2.1), while at RUP (Lesser Himalaya) the sediment layer is thicker (4 km), with similar Vs and high Vp/Vs (∼2.5). Both the Shillong Plateau and Assam Valley show a thick upper and middle crust (10-20 km) and a thin (4-9 km) lower crust. The average Vp/Vs ratios suggest that the crust is felsic-to-intermediate beneath the Shillong Plateau and intermediate-to-mafic beneath the Assam Valley. Results show that the lower crustal rocks beneath the Shillong Plateau and Assam Valley lie between mafic granulite and mafic garnet granulite.
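    The H-Vp/Vs stacking step relies on the predicted delay times, relative to direct P, of the Moho Ps conversion and its crustal multiples. The sketch below evaluates the standard single-layer expressions t_Ps = H(eta_s - eta_p), t_PpPs = H(eta_s + eta_p) and t_PpSs+PsPs = 2H eta_s, with eta = sqrt(1/V^2 - p^2), and inverts an observed Ps delay for thickness H. The velocities, Vp/Vs ratio and ray parameter are hypothetical; the grid search over H and Vp/Vs that stacking performs, as well as the Neighbourhood Algorithm and the joint inversion, are not shown.

```python
import math


def vertical_slowness(v: float, p: float) -> float:
    """Vertical slowness (s/km) for velocity v (km/s) and ray parameter p (s/km)."""
    return math.sqrt(1.0 / v**2 - p**2)


def moho_delays(h_km, vp, kappa, p):
    """Predicted delays (s) after direct P of the Moho Ps conversion and its
    crustal multiples, for thickness h_km and Vp/Vs ratio kappa."""
    eta_s = vertical_slowness(vp / kappa, p)
    eta_p = vertical_slowness(vp, p)
    return {
        "Ps": h_km * (eta_s - eta_p),
        "PpPs": h_km * (eta_s + eta_p),
        "PpSs+PsPs": 2.0 * h_km * eta_s,
    }


def thickness_from_ps(t_ps, vp, kappa, p):
    """Invert an observed Ps delay (s) for crustal thickness (km)."""
    return t_ps / (vertical_slowness(vp / kappa, p) - vertical_slowness(vp, p))


# Hypothetical values: H = 40 km, Vp = 6.4 km/s, Vp/Vs = 1.75, p = 0.06 s/km
delays = moho_delays(40.0, 6.4, 1.75, 0.06)
print({phase: round(t, 2) for phase, t in delays.items()})
print(round(thickness_from_ps(delays["Ps"], 6.4, 1.75, 0.06), 3))
```

    Because Ps depends on H and Vp/Vs in a correlated way, a single phase cannot resolve both; stacking the multiples, whose times combine eta_s and eta_p differently, is what breaks that trade-off.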

  6. Inferring Population Size History from Large Samples of Genome-Wide Molecular Data - An Approximate Bayesian Computation Approach.

    PubMed

    Boitard, Simon; Rodríguez, Willy; Jay, Flora; Mona, Stefano; Austerlitz, Frédéric

    2016-03-01

    Inferring the ancestral dynamics of effective population size is a long-standing question in population genetics, which can now be tackled much more accurately thanks to the massive genomic data available in many species. Several promising methods that take advantage of whole-genome sequences have been recently developed in this context. However, they can only be applied to rather small samples, which limits their ability to estimate recent population size history. Besides, they can be very sensitive to sequencing or phasing errors. Here we introduce a new approximate Bayesian computation approach named PopSizeABC that allows estimating the evolution of the effective population size through time, using a large sample of complete genomes. This sample is summarized using the folded allele frequency spectrum and the average zygotic linkage disequilibrium at different bins of physical distance, two classes of statistics that are widely used in population genetics and can be easily computed from unphased and unpolarized SNP data. Our approach provides accurate estimations of past population sizes, from the very first generations before present back to the expected time to the most recent common ancestor of the sample, as shown by simulations under a wide range of demographic scenarios. When applied to samples of 15 or 25 complete genomes in four cattle breeds (Angus, Fleckvieh, Holstein and Jersey), PopSizeABC revealed a series of population declines, related to historical events such as domestication or modern breed creation. We further highlight that our approach is robust to sequencing errors, provided summary statistics are computed from SNPs with common alleles. PMID:26943927
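    The first class of summary statistics, the folded allele frequency spectrum, needs no knowledge of which allele is ancestral, which is why it can be computed from unpolarized SNP data. The sketch below assumes that each SNP is given as the count of one arbitrary allele among `n_haplotypes` sampled chromosome copies; the function name and input values are hypothetical, and the linkage disequilibrium statistics and the approximate Bayesian computation step of PopSizeABC are not shown.

```python
def folded_sfs(allele_counts, n_haplotypes):
    """Folded allele frequency spectrum: entry i-1 counts SNPs whose minor
    allele appears in exactly i of the n_haplotypes sampled copies.
    Monomorphic sites (count 0 or n_haplotypes) are ignored."""
    spectrum = [0] * (n_haplotypes // 2)
    for count in allele_counts:
        minor = min(count, n_haplotypes - count)
        if minor > 0:
            spectrum[minor - 1] += 1
    return spectrum


# Hypothetical counts of one allele at 7 sites in 5 diploid genomes (10 copies)
print(folded_sfs([1, 9, 3, 5, 0, 7, 2], n_haplotypes=10))
```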

  7. Phylogeny and physiology of candidate phylum 'Atribacteria' (OP9/JS1) inferred from cultivation-independent genomics.

    PubMed

    Nobu, Masaru K; Dodsworth, Jeremy A; Murugapiran, Senthil K; Rinke, Christian; Gies, Esther A; Webster, Gordon; Schwientek, Patrick; Kille, Peter; Parkes, R John; Sass, Henrik; Jørgensen, Bo B; Weightman, Andrew J; Liu, Wen-Tso; Hallam, Steven J; Tsiamis, George; Woyke, Tanja; Hedlund, Brian P

    2016-02-01

    The 'Atribacteria' is a candidate phylum in the Bacteria recently proposed to include members of the OP9 and JS1 lineages. OP9 and JS1 are globally distributed, and in some cases abundant, in anaerobic marine sediments, geothermal environments, anaerobic digesters and reactors and petroleum reservoirs. However, the monophyly of OP9 and JS1 has been questioned and their physiology and ecology remain largely enigmatic due to a lack of cultivated representatives. Here cultivation-independent genomic approaches were used to provide a first comprehensive view of the phylogeny, conserved genomic features and metabolic potential of members of this ubiquitous candidate phylum. Previously available and heretofore unpublished OP9 and JS1 single-cell genomic data sets were used as recruitment platforms for the reconstruction of atribacterial metagenome bins from a terephthalate-degrading reactor biofilm and from the monimolimnion of meromictic Sakinaw Lake. The single-cell genomes and metagenome bins together comprise six species- to genus-level groups that represent most major lineages within OP9 and JS1. Phylogenomic analyses of these combined data sets confirmed the monophyly of the 'Atribacteria' inclusive of OP9 and JS1. Additional conserved features within the 'Atribacteria' were identified, including a gene cluster encoding putative bacterial microcompartments that may be involved in aldehyde and sugar metabolism, energy conservation and carbon storage. Comparative analysis of the metabolic potential inferred from these data sets revealed that members of the 'Atribacteria' are likely to be heterotrophic anaerobes that lack respiratory capacity, with some lineages predicted to specialize in either primary fermentation of carbohydrates or secondary fermentation of organic acids, such as propionate. PMID:26090992

  8. Inferring Population Size History from Large Samples of Genome-Wide Molecular Data - An Approximate Bayesian Computation Approach

    PubMed Central

    Boitard, Simon; Rodríguez, Willy; Jay, Flora; Mona, Stefano; Austerlitz, Frédéric

    2016-01-01

    Inferring the ancestral dynamics of effective population size is a long-standing question in population genetics, which can now be tackled much more accurately thanks to the massive genomic data available in many species. Several promising methods that take advantage of whole-genome sequences have been recently developed in this context. However, they can only be applied to rather small samples, which limits their ability to estimate recent population size history. Besides, they can be very sensitive to sequencing or phasing errors. Here we introduce a new approximate Bayesian computation approach named PopSizeABC that estimates the evolution of the effective population size through time, using a large sample of complete genomes. This sample is summarized using the folded allele frequency spectrum and the average zygotic linkage disequilibrium at different bins of physical distance, two classes of statistics that are widely used in population genetics and can be easily computed from unphased and unpolarized SNP data. Our approach provides accurate estimations of past population sizes, from the very first generations before present back to the expected time to the most recent common ancestor of the sample, as shown by simulations under a wide range of demographic scenarios. When applied to samples of 15 or 25 complete genomes in four cattle breeds (Angus, Fleckvieh, Holstein and Jersey), PopSizeABC revealed a series of population declines, related to historical events such as domestication or modern breed creation. We further highlight that our approach is robust to sequencing errors, provided summary statistics are computed from SNPs with common alleles. PMID:26943927
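
    The folded allele frequency spectrum mentioned above can be computed directly from unphased, unpolarized genotypes. The sketch below assumes diploid genotypes coded as 0, 1 or 2 copies of an arbitrary reference allele, with no missing data; the function name `folded_sfs` is hypothetical, and neither the linkage-disequilibrium statistics nor the ABC step of PopSizeABC itself is shown.

```python
def folded_sfs(genotypes):
    """Folded allele frequency spectrum from unphased, unpolarized diploid data.

    genotypes: list of SNPs; each SNP is a list with one entry per individual
    giving the number (0, 1 or 2) of copies of an arbitrary reference allele.
    Returns a list where entry k (k = 1..n for n individuals) counts SNPs whose
    minor allele appears on k of the 2n sampled chromosomes; entry 0 is unused.
    """
    if not genotypes:
        return []
    n_chrom = 2 * len(genotypes[0])
    spectrum = [0] * (n_chrom // 2 + 1)
    for snp in genotypes:
        count = sum(snp)
        # Folding: the minor-allele count does not depend on which allele
        # is ancestral, so no outgroup or polarization is required.
        minor = min(count, n_chrom - count)
        if minor > 0:  # monomorphic sites carry no frequency information
            spectrum[minor] += 1
    return spectrum
```

    Because only minor-allele counts are used, relabelling the reference allele leaves the spectrum unchanged. Ignoring the lowest-frequency bins would be one way to restrict the summary to common alleles, in the spirit of the robustness point made above.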

  9. Detecting Heterogeneity in Population Structure Across the Genome in Admixed Populations.

    PubMed

    McHugh, Caitlin; Brown, Lisa; Thornton, Timothy A

    2016-09-01

    The genetic structure of human populations is often characterized by aggregating measures of ancestry across the autosomal chromosomes. While it may be reasonable to assume that population structure patterns are similar genome-wide in relatively homogeneous populations, this assumption may not be appropriate for admixed populations, such as Hispanics and African-Americans, with recent ancestry from two or more continents. Recent studies have suggested that systematic ancestry differences can arise at genomic locations in admixed populations as a result of selection and nonrandom mating. Here, we propose a method, which we refer to as the chromosomal ancestry differences (CAnD) test, for detecting heterogeneity in population structure across the genome. CAnD can incorporate either local or chromosome-wide ancestry inferred from SNP genotype data to identify chromosomes harboring genomic regions with ancestry contributions that are significantly different from expected. In simulation studies with real genotype data from phase III of the HapMap Project, we demonstrate the validity and power of CAnD. We apply CAnD to the HapMap Mexican-American (MXL) and African-American (ASW) population samples; in this analysis the software RFMix is used to infer local ancestry at genomic regions, assuming admixing from Europeans, West Africans, and Native Americans. The CAnD test provides strong evidence of heterogeneity in population structure across the genome in the MXL sample ([Formula: see text]), which is largely driven by elevated Native American ancestry and a deficit of European ancestry on the X chromosomes. Among the ASW, all chromosomes are largely African derived and no heterogeneity in population structure is detected in this sample. PMID:27440868
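
    A simplified illustration of a chromosome-level ancestry comparison follows: for each chromosome, a paired t statistic contrasts each individual's ancestry proportion on that chromosome with the same individual's mean over the remaining chromosomes. This is a sketch under stated assumptions (complete data, one ancestry component at a time, hypothetical function name `paired_ancestry_t`) and is not presented as the published CAnD statistic; converting the statistics to p-values and correcting for testing many chromosomes are left out.

```python
import math
import statistics


def paired_ancestry_t(ancestry):
    """Paired t statistic per chromosome (illustrative, not the published CAnD test).

    ancestry[i][k]: proportion of one ancestry component on chromosome k
    for individual i. For chromosome k, the paired difference for each
    individual is (proportion on k) minus (mean proportion on all other
    chromosomes); the statistic is mean(diff) / (sd(diff) / sqrt(n)).
    Returns one statistic per chromosome; nan if the differences do not vary.
    """
    n_ind = len(ancestry)
    n_chr = len(ancestry[0])
    t_stats = []
    for k in range(n_chr):
        diffs = []
        for row in ancestry:
            others = [row[j] for j in range(n_chr) if j != k]
            diffs.append(row[k] - sum(others) / len(others))
        sd = statistics.stdev(diffs)  # requires at least two individuals
        if sd == 0:
            t_stats.append(float("nan"))
        else:
            t_stats.append(statistics.mean(diffs) / (sd / math.sqrt(n_ind)))
    return t_stats
```

    A positive statistic flags a chromosome with more of the chosen ancestry than the individual's other chromosomes, as reported above for Native American ancestry on the X chromosome in the MXL sample.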

  10. Structure of human serum lipoproteins inferred from compositional analysis.

    PubMed Central

    Shen, B W; Scanu, A M; Kézdy, F J

    1977-01-01

    Analysis of the correlations between size and chemical composition of lipoproteins of normolipidemic human plasma shows that the structure of all circulating lipoproteins is consistent with a spherical model of radius r in which a spherical liquid core of cholesterol esters and triglycerides, of radius r − 20.2 Å, is surrounded by a monolayer of cholesterol and phospholipids with closely packed hydrophobic ends on the surface of the core. The average molecular areas at this inner surface are S_pl = 68.5 Å²/molecule for phospholipids and S_c = 39.1 Å²/molecule for cholesterol. The proteins are closely packed with the hydrophilic head groups of phospholipids at the outer surface of the particle, with S′_pl = 62.7 Å²/molecule for phospholipids and S_aa = 15.6 Å²/amino acid for proteins. The polar head group of free cholesterol does not participate in the packing of the outer layer and thus must be masked by proteins. Free cholesterol is distributed among the circulating lipoproteins (with the exception of very high density lipoprotein and perhaps chylomicrons) according to a thermodynamic equilibrium governed by the curvature of the surface of the particle. PMID:265578
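
    The geometry of the spherical model can be made concrete with a little arithmetic. The sketch below uses only the constants quoted in the abstract (the 20.2 Å shell and the inner-surface molecular areas); the function names and example radii are hypothetical. Under the model, the area claimed by the phospholipid and cholesterol molecules lining the core should match the core surface area 4π(r − 20.2)².

```python
import math

SHELL_A = 20.2      # monolayer thickness in Å: core radius is r - 20.2
S_PL_INNER = 68.5   # Å^2 per phospholipid at the core (inner) surface
S_C_INNER = 39.1    # Å^2 per cholesterol at the core (inner) surface


def core_radius(r):
    """Radius (Å) of the neutral-lipid core for a particle of radius r (Å)."""
    rc = r - SHELL_A
    if rc <= 0:
        raise ValueError("particle radius must exceed the 20.2 Å monolayer")
    return rc


def core_surface_area(r):
    """Area (Å^2) of the core surface, 4*pi*(r - 20.2)^2."""
    return 4 * math.pi * core_radius(r) ** 2


def core_volume_fraction(r):
    """Fraction of the particle volume occupied by the core."""
    return (core_radius(r) / r) ** 3


def inner_monolayer_area(n_phospholipid, n_cholesterol):
    """Core-surface area (Å^2) occupied by the given numbers of lipid molecules."""
    return n_phospholipid * S_PL_INNER + n_cholesterol * S_C_INNER
```

    For a hypothetical particle of radius 100 Å, `core_surface_area(100)` is about 8.0 × 10⁴ Å² and `core_volume_fraction(100)` is about 0.51; comparing `inner_monolayer_area` for a measured composition against the first value is the kind of consistency check the model implies.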

  11. Structural characterization of genomes by large scale sequence-structure threading: application of reliability analysis in structural genomics

    PubMed Central

    Cherkasov, Artem; Ho Sui, Shannan J; Brunham, Robert C; Jones, Steven JM

    2004-01-01

    Background: We establish that the occurrence of protein folds among genomes can be accurately described with a Weibull function. Systems which exhibit Weibull character can be interpreted with reliability theory commonly used in engineering analysis. For instance, Weibull distributions are widely used in reliability, maintainability and safety work to model time-to-failure of mechanical devices, mechanisms, building constructions and equipment. Results: We have found that the Weibull function describes protein fold distribution within and among genomes more accurately than conventional power functions which have been used in a number of structural genomic studies reported to date. It has also been found that the Weibull reliability parameter β for protein fold distributions varies between genomes and may reflect differences in rates of gene duplication in evolutionary history of organisms. Conclusions: The results of this work demonstrate that reliability analysis can provide useful insights and testable predictions in the fields of comparative and structural genomics. PMID:15274750
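
    One way to check whether a Weibull law describes fold occurrence is to fit its reliability function R(x) = exp(−(x/η)^β) by least squares on the linearized form ln(−ln R) = β ln x − β ln η. The sketch assumes x is the number of occurrences of a fold and R(x) is the empirical fraction of folds occurring more than x times; that reading, and the function names, are assumptions rather than the authors' exact procedure.

```python
import math


def weibull_survival(x, beta, eta):
    """Weibull reliability (survival) function R(x) = exp(-(x / eta) ** beta)."""
    return math.exp(-((x / eta) ** beta))


def fit_weibull(xs, survival):
    """Estimate (beta, eta) by ordinary least squares on the linearized form
    ln(-ln R) = beta * ln x - beta * ln eta.

    Requires x > 0 and 0 < R < 1 for every point.
    """
    u = [math.log(x) for x in xs]
    v = [math.log(-math.log(r)) for r in survival]
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    slope = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / sum(
        (a - mu) ** 2 for a in u
    )
    intercept = mv - slope * mu          # equals -beta * ln(eta)
    eta = math.exp(-intercept / slope)
    return slope, eta                    # slope is the shape parameter beta
```

    A power law, R(x) ∝ x^(−α), would instead be linear in ln R versus ln x, so both fits can be compared on the same fold-count data.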

  12. Simple Math is Enough: Two Examples of Inferring Functional Associations from Genomic Data

    NASA Technical Reports Server (NTRS)

    Liang, Shoudan

    2003-01-01

    Non-random features in genomic data are usually biologically meaningful. The key is to choose the feature well. Scoring features by p-value allows the findings to be prioritized. If two proteins share an unusually large number of common interaction partners, they tend to be involved in the same biological process. We used this finding to predict the functions of 81 un-annotated proteins in yeast.
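
    The abstract does not state which null model underlies the p-value score. A common choice for shared interaction partners, assumed here, is the hypergeometric distribution: given N proteins in the network and two proteins with n₁ and n₂ partners, it gives the probability of sharing at least m partners by chance. Function and parameter names are hypothetical.

```python
import math


def shared_partner_pvalue(n_total, deg_a, deg_b, shared):
    """P(X >= shared) for X ~ Hypergeometric(n_total, deg_a, deg_b).

    Probability that two proteins with deg_a and deg_b interaction partners,
    chosen at random among n_total proteins, share at least `shared` partners.
    """
    denom = math.comb(n_total, deg_b)
    upper = min(deg_a, deg_b)
    # math.comb returns 0 when k exceeds n, so impossible terms vanish.
    tail = sum(
        math.comb(deg_a, k) * math.comb(n_total - deg_a, deg_b - k)
        for k in range(shared, upper + 1)
    )
    return tail / denom
```

    For example, with N = 4 and two proteins having two partners each, sharing both partners has probability 1/6. A score such as −log10(p) can then rank protein pairs, with the smallest p-values examined first.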

  13. Clonal genotype and population structure inference from single-cell tumor sequencing.

    PubMed

    Roth, Andrew; McPherson, Andrew; Laks, Emma; Biele, Justina; Yap, Damian; Wan, Adrian; Smith, Maia A; Nielsen, Cydney B; McAlpine, Jessica N; Aparicio, Samuel; Bouchard-Côté, Alexandre; Shah, Sohrab P

    2016-07-01

    Single-cell DNA sequencing has great potential to reveal the clonal genotypes and population structure of human cancers. However, single-cell data suffer from missing values and biased allelic counts as well as false genotype measurements owing to the sequencing of multiple cells. We describe the Single Cell Genotyper (https://bitbucket.org/aroth85/scg), an open-source software based on a statistical model coupled with a mean-field variational inference method, which can be used to address these problems and robustly infer clonal genotypes. PMID:27183439

  14. Climate-induced changes in lake ecosystem structure inferred from coupled neo- and paleoecological approaches

    USGS Publications Warehouse

    Saros, Jasmine E.; Stone, Jeffery R.; Pederson, Gregory T.; Slemmons, Krista; Spanbauer, Trisha; Schliep, Anna; Cahl, Douglas; Williamson, Craig E.; Engstrom, Daniel R.

    2015-01-01

    Over the 20th century, surface water temperatures have increased in many lake ecosystems around the world, but long-term trends in the vertical thermal structure of lakes remain unclear, despite the strong control that thermal stratification exerts on the biological response of lakes to climate change. Here we used both neo- and paleoecological approaches to develop a fossil-based inference model for lake mixing depths and thereby refine understanding of lake thermal structure change. We focused on three common planktonic diatom taxa, the distributions of which previous research suggests might be affected by mixing depth. Comparative lake surveys and growth rate experiments revealed that these species respond to lake thermal structure when nitrogen is sufficient, with species optima ranging from shallower to deeper mixing depths. The diatom-based mixing depth model was applied to sedimentary diatom profiles extending back to 1750 AD in two lakes with moderate nitrate concentrations but differing climate settings. Thermal reconstructions were consistent with expected changes, with shallower mixing depths inferred for an alpine lake where treeline has advanced, and deeper mixing depths inferred for a boreal lake where wind strength has increased. The inference model developed here provides a new tool to expand and refine understanding of climate-induced changes in lake ecosystems.
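
    The abstract does not say which calibration method the diatom-based model uses. Weighted averaging is a common transfer-function technique in paleolimnology, and the sketch below assumes it: each taxon's optimum is its abundance-weighted mean mixing depth across calibration lakes, and a fossil sample's inferred depth is the abundance-weighted mean of the optima of the taxa it contains. Deshrinking and error estimation are omitted, and the function names are hypothetical.

```python
def wa_optima(abundances, depths):
    """Taxon optima as abundance-weighted mean mixing depth.

    abundances[s][l]: relative abundance of taxon s in calibration lake l.
    depths[l]: measured mixing depth (m) of calibration lake l.
    """
    optima = []
    for row in abundances:
        total = sum(row)
        optima.append(sum(a * d for a, d in zip(row, depths)) / total)
    return optima


def wa_infer(sample, optima):
    """Inferred mixing depth of a sediment sample: abundance-weighted
    mean of the optima of the taxa present in the sample."""
    total = sum(sample)
    return sum(a * o for a, o in zip(sample, optima)) / total
```

    Applied down a dated sediment core, one inference per sample yields a mixing-depth reconstruction of the kind described above for the alpine and boreal lakes.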

  15. Reductive evolution and niche adaptation inferred from the genome of Mycobacterium ulcerans, the causative agent of Buruli ulcer

    PubMed Central

    Stinear, Timothy P.; Seemann, Torsten; Pidot, Sacha; Frigui, Wafa; Reysset, Gilles; Garnier, Thierry; Meurice, Guillaume; Simon, David; Bouchier, Christiane; Ma, Laurence; Tichit, Magali; Porter, Jessica L.; Ryan, Janine; Johnson, Paul D.R.; Davies, John K.; Jenkin, Grant A.; Small, Pamela L.C.; Jones, Louis M.; Tekaia, Fredj; Laval, Françoise; Daffé, Mamadou; Parkhill, Julian; Cole, Stewart T.

    2007-01-01

    Mycobacterium ulcerans is found in aquatic ecosystems and causes Buruli ulcer in humans, a neglected but devastating necrotic disease of subcutaneous tissue that is rampant throughout West and Central Africa. Here, we report the complete 5.8-Mb genome sequence of M. ulcerans and show that it comprises two circular replicons, a chromosome of 5632 kb and a virulence plasmid of 174 kb. The plasmid is required for production of the polyketide toxin mycolactone, which provokes necrosis. Comparisons with the recently completed 6.6-Mb genome of Mycobacterium marinum revealed >98% nucleotide sequence identity and genome-wide synteny. However, as well as the plasmid, M. ulcerans has accumulated 213 copies of the insertion sequence IS2404, 91 copies of IS2606, 771 pseudogenes, two bacteriophages, and multiple DNA deletions and rearrangements. These data indicate that M. ulcerans has recently evolved via lateral gene transfer and reductive evolution from the generalist, more rapid-growing environmental species M. marinum to become a niche-adapted specialist. Predictions based on genome inspection for the production of modified mycobacterial virulence factors, such as the highly abundant phthiodiolone lipids, were confirmed by structural analyses. Similarly, 11 protein-coding sequences identified as M. ulcerans-specific by comparative genomics were verified as such by PCR screening of a diverse collection of 33 strains of M. ulcerans and M. marinum. This work offers significant insight into the biology and evolution of mycobacterial pathogens and is an important component of international efforts to counter Buruli ulcer. PMID:17210928

  16. Revealing less derived nature of cartilaginous fish genomes with their evolutionary time scale inferred with nuclear genes.

    PubMed

    Renz, Adina J; Meyer, Axel; Kuraku, Shigehiro

    2013-01-01

    Cartilaginous fishes, divided into Holocephali (chimaeras) and Elasmobranchii (sharks, rays and skates), occupy a key phylogenetic position among extant vertebrates in reconstructing their evolutionary processes. Their accurate evolutionary time scale is indispensable for better understanding of the relationship between phenotypic and molecular evolution of cartilaginous fishes. However, our current knowledge of the time scale of cartilaginous fish evolution largely relies on estimates using mitochondrial DNA sequences. In this study, making the best use of the still partial, but large-scale sequencing data of cartilaginous fish species, we estimate the divergence times between the major cartilaginous fish lineages employing nuclear genes. By rigorous orthology assessment based on available genomic and transcriptomic sequence resources for cartilaginous fishes, we selected 20 protein-coding genes in the nuclear genome, spanning 2973 amino acid residues. Our analysis based on the Bayesian inference resulted in the mean divergence time of 421 Ma, the late Silurian, for the Holocephali-Elasmobranchii split, and 306 Ma, the late Carboniferous, for the split between sharks and rays/skates. By applying these results and other documented divergence times, we measured the relative evolutionary rate of the Hox A cluster sequences in the cartilaginous fish lineages, which resulted in a substitution rate lower by a factor of at least 2.4 in comparison to tetrapod lineages. The obtained time scale enables mapping phenotypic and molecular changes in a quantitative framework. It is of great interest to corroborate the less derived nature of cartilaginous fish at the molecular level as a genome-wide phenomenon. PMID:23825540
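
    Once divergence times are fixed, a simple strict-clock calculation shows how a relative rate can be expressed: a pairwise distance d accumulated along two branches since a split T million years ago gives a rate of d/(2T). This arithmetic is an illustrative assumption, not the authors' estimation procedure, and the distances used below are hypothetical.

```python
def substitution_rate(distance, divergence_ma):
    """Substitutions per site per Myr for two lineages that split
    divergence_ma million years ago; both branches accumulate change,
    so the pairwise distance is divided by 2 * T."""
    return distance / (2.0 * divergence_ma)


def rate_ratio(d_fast, t_fast, d_slow, t_slow):
    """How many times faster the first lineage pair evolves than the second."""
    return substitution_rate(d_fast, t_fast) / substitution_rate(d_slow, t_slow)
```

    For instance, `rate_ratio(0.48, 100, 0.20, 100)` gives about 2.4: with equal divergence times, a lineage pair that accumulated 0.48 substitutions per site is evolving roughly 2.4 times faster than one that accumulated 0.20.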

  17. Primate phylogenetic relationships and divergence dates inferred from complete mitochondrial genomes

    PubMed Central

    Hodgson, Jason A.; Burrell, Andrew S.; Sterner, Kirstin N.; Raaum, Ryan L.; Disotell, Todd R.

    2014-01-01

    The origins and the divergence times of the most basal lineages within primates have been difficult to resolve mainly due to the incomplete sampling of early fossil taxa. The main source of contention is related to the discordance between molecular and fossil estimates: while there are no crown primate fossils older than 56 Ma, most molecule-based estimates extend the origins of crown primates into the Cretaceous. Here we present a comprehensive mitogenomic study of primates. We assembled 87 mammalian mitochondrial genomes, including 62 primate species representing all the families of the order. We newly sequenced eleven mitochondrial genomes, including eight Old World monkeys and three strepsirrhines. Phylogenetic analyses support a strong topology, confirming the monophyly of all the major primate clades. In contrast to previous mitogenomic studies, the positions of tarsiers and colugos relative to strepsirrhines and anthropoids are well resolved. In order to improve our understanding of how fossil calibrations affect age estimates within primates, we explore the effect of seventeen fossil calibrations across primates and other mammalian groups and we select a subset of calibrations to date our mitogenomic tree. The divergence date estimates of the Strepsirrhine/Haplorhine split support an origin of crown primates in the Late Cretaceous, at around 74 Ma. This result supports a short fuse model of primate origins, whereby relatively little time passed between the origin of the order and the diversification of its major clades. It also suggests that the early primate fossil record is likely poorly sampled. PMID:24583291

  18. Karyotypic evolution of the family Sciuridae: inferences from the genome organizations of ground squirrels.

    PubMed

    Li, T; Wang, J; Su, W; Nie, W; Yang, F

    2006-01-01

    Cross-species chromosome painting has made a great contribution to our understanding of the evolution of karyotypes and genome organizations of mammals. Several recent papers of comparative painting between tree and flying squirrels have shed some light on the evolution of the family Sciuridae and the order Rodentia. In the present study we have extended the comparative painting to the Himalayan marmot (Marmota himalayana) and the African ground squirrel (Xerus cf. erythropus), i.e. representative species from another important squirrel group, the ground squirrels, and have established genome-wide comparative chromosome maps between human, eastern gray squirrel, and these two ground squirrels. The results show that 1) the squirrels so far studied all have conserved karyotypes that resemble the ancestral karyotype of the order Rodentia; 2) the African ground squirrels could have retained the ancestral karyotype of the family Sciuridae. Furthermore, we have mapped the evolutionary rearrangements onto a molecular-based consensus phylogenetic tree of the family Sciuridae. PMID:16484783

  19. Inferring Properties of Ancient Cyanobacteria from Biogeochemical Activity and Genomes of Siderophilic Cyanobacteria

    NASA Technical Reports Server (NTRS)

    McKay, David S.; Brown, I. I.; Tringe, S. G.; Thomas-Keprta, K. E.; Bryant, D. A.; Sarkisova, S. S.; Malley, K.; Sosa, O.; Klatt, C. G.; McKay, D. S.

    2010-01-01

    Interrelationships between life and the planetary system could have simultaneously left landmarks in genomes of microbes and physicochemical signatures in the lithosphere. Verifying the links between genomic features in living organisms and the mineralized signatures generated by these organisms will help to reveal traces of life on Earth and beyond. Among contemporary environments, iron-depositing hot springs (IDHS) may represent one of the most appropriate natural models [1] for insights into ancient life since organisms may have originated on Earth and probably Mars in association with hydrothermal activity [2,3]. IDHS also seem to be appropriate models for studying certain biogeochemical processes that could have taken place in the late Archean and/or early Paleoproterozoic eras [4, 5]. It has been suggested that inorganic polyphosphate (PPi), in chains of tens to hundreds of phosphate residues linked by high-energy bonds, is environmentally ubiquitous and abundant [6]. Cyanobacteria (CB) react to increased heavy metal concentrations and UV by enhanced generation of PPi bodies (PPB) [7], which are believed to be signatures of life [8]. However, the role of PPi in oxygenic prokaryotes for the suppression of oxidative stress induced by high Fe is poorly studied. Here we present preliminary results of a new mechanism of Fe mineralization in oxygenic prokaryotes, the effect of Fe on the generation of PPi bodies in CB, as well as preliminary analysis of the diversity and phylogeny of proteins involved in the prevention of oxidative stress in phototrophs inhabiting IDHS.

  20. Structure and function of the mammalian middle ear. II: Inferring function from structure.

    PubMed

    Mason, Matthew J

    2016-02-01

    Anatomists and zoologists who study middle ear morphology are often interested to know what the structure of an ear can reveal about the auditory acuity and hearing range of the animal in question. This paper represents an introduction to middle ear function targeted towards biological scientists with little experience in the field of auditory acoustics. Simple models of impedance matching are first described, based on the familiar concepts of the area and lever ratios of the middle ear. However, using the Mongolian gerbil Meriones unguiculatus as a test case, it is shown that the predictions made by such 'ideal transformer' models are generally not consistent with measurements derived from recent experimental studies. Electrical analogue models represent a better way to understand some of the complex, frequency-dependent responses of the middle ear: these have been used to model the effects of middle ear subcavities, and the possible function of the auditory ossicles as a transmission line. The concepts behind such models are explained here, again aimed at those with little background knowledge. Functional inferences based on middle ear anatomy are more likely to be valid at low frequencies. Acoustic impedance at low frequencies is dominated by compliance; expanded middle ear cavities, found in small desert mammals including gerbils, jerboas and the sengi Macroscelides, are expected to improve low-frequency sound transmission, as long as the ossicular system is not too stiff. PMID:26100915
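
    The 'ideal transformer' model reduces to one multiplication: the predicted pressure gain is the ratio of eardrum area to stapes footplate area times the ossicular lever ratio, often quoted in decibels. The sketch uses hypothetical values and function names; it illustrates exactly the kind of simple prediction that, according to the abstract, does not match measurements in the gerbil.

```python
import math


def ideal_transformer_gain(area_tympanic_mm2, area_footplate_mm2, lever_ratio):
    """Predicted middle-ear pressure gain of the 'ideal transformer' model:
    (eardrum area / stapes footplate area) * ossicular lever ratio."""
    return (area_tympanic_mm2 / area_footplate_mm2) * lever_ratio


def gain_db(pressure_ratio):
    """Express a pressure ratio in decibels (20 * log10)."""
    return 20 * math.log10(pressure_ratio)
```

    For hypothetical areas of 20 mm² and 1 mm² and a lever ratio of 2, the predicted pressure gain is 40, or about 32 dB. The model has no frequency dependence, which is one reason electrical analogue models describe measured responses better.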

  1. King penguin demography since the last glaciation inferred from genome-wide data

    PubMed Central

    Trucchi, Emiliano; Gratton, Paolo; Whittington, Jason D.; Cristofari, Robin; Le Maho, Yvon; Stenseth, Nils Chr; Le Bohec, Céline

    2014-01-01

    How natural climate cycles, such as past glacial/interglacial patterns, have shaped species distributions at the high-latitude regions of the Southern Hemisphere is still largely unclear. Here, we show how the post-glacial warming following the Last Glacial Maximum (ca 18 000 years ago) allowed the (re)colonization of the fragmented sub-Antarctic habitat by an upper-level marine predator, the king penguin Aptenodytes patagonicus. Using restriction site-associated DNA sequencing and standard mitochondrial data, we tested the behaviour of subsets of anonymous nuclear loci in inferring past demography through coalescent-based and allele frequency spectrum analyses. Our results show that the king penguin population breeding on Crozet archipelago steeply increased in size, closely following the Holocene warming recorded in the Epica Dome C ice core. The subsequent population growth can be explained by a threshold model in which the ecological requirements of this species (year-round ice-free habitat for breeding and access to a major source of food such as the Antarctic Polar Front) were met on Crozet soon after the Pleistocene/Holocene climatic transition. PMID:24920481

  2. Evolutionary landscape of amphibians emerging from ancient freshwater fish inferred from complete mitochondrial genomes.

    PubMed

    Wang, Xiao-Tong; Zhang, Yan-Feng; Wu, Qian; Zhang, Hao

    2012-05-01

    It is very interesting that the only extant marine amphibian is the marine frog, Fejervarya cancrivora. This study investigated the reasons for this apparent rarity by conducting a phylogenetic tree analysis of the complete mitochondrial genomes from 14 amphibians, 67 freshwater fishes, four migratory fishes, 35 saltwater fishes, and one hemichordate. The results showed that amphibians, living fossil fishes, and the common ancestors of modern fishes are phylogenetically separated. In general, amphibians, living fossil fishes, saltwater fishes, and freshwater fishes are clustered in different clades. This suggests that the ancestor of living amphibians arose from a type of primordial freshwater fish, rather than the coelacanth, lungfish, or modern saltwater fish. Modern freshwater fish and modern saltwater fish were probably separated from a common ancestor by a single event, caused by crustal movement. PMID:22503684

  3. Conflicting genomic signals affect phylogenetic inference in four species of North American pines.

    PubMed

    Koralewski, Tomasz E; Mateos, Mariana; Krutovsky, Konstantin V

    2016-01-01

    Adaptive evolutionary processes in plants may be accompanied by episodes of introgression, parallel evolution and incomplete lineage sorting that pose challenges in untangling species evolutionary history. Genus Pinus (pines) is one of the most abundant and most studied groups among gymnosperms, and a good example of a lineage where these phenomena have been observed. Pines are among the most ecologically and economically important plant species. Some, such as the pines of the southeastern USA (southern pines in subsection Australes), are subjects of intensive breeding programmes. Despite numerous published studies, the evolutionary history of Australes remains ambiguous and often controversial. We studied the phylogeny of four major southern pine species: shortleaf (Pinus echinata), slash (P. elliottii), longleaf (P. palustris) and loblolly (P. taeda), using sequences from 11 nuclear loci and maximum likelihood and Bayesian methods. Our analysis encountered resolution difficulties similar to earlier published studies. Although incomplete lineage sorting and introgression are two phenomena presumptively underlying our results, the phylogenetic inferences also seem to be influenced by the genes examined, with certain topologies supported by sets of genes sharing common putative functionalities. For example, genes involved in wood formation supported the clade echinata-taeda, genes linked to plant defence supported the clade echinata-elliottii and genes linked to water management properties supported the clade echinata-palustris. The support for these clades was very high and consistent across methods. We discuss the potential factors that could underlie these observations, including incomplete lineage sorting, hybridization and parallel or adaptive evolution. Our results likely reflect the relatively short evolutionary history of the subsection that is thought to have begun during the middle Miocene and has been influenced by climate fluctuations. PMID:27060161

  4. Conflicting genomic signals affect phylogenetic inference in four species of North American pines

    PubMed Central

    Koralewski, Tomasz E.; Mateos, Mariana; Krutovsky, Konstantin V.

    2016-01-01

    Adaptive evolutionary processes in plants may be accompanied by episodes of introgression, parallel evolution and incomplete lineage sorting that pose challenges in untangling species evolutionary history. Genus Pinus (pines) is one of the most abundant and most studied groups among gymnosperms, and a good example of a lineage where these phenomena have been observed. Pines are among the most ecologically and economically important plant species. Some, such as the pines of the southeastern USA (southern pines in subsection Australes), are subjects of intensive breeding programmes. Despite numerous published studies, the evolutionary history of Australes remains ambiguous and often controversial. We studied the phylogeny of four major southern pine species: shortleaf (Pinus echinata), slash (P. elliottii), longleaf (P. palustris) and loblolly (P. taeda), using sequences from 11 nuclear loci and maximum likelihood and Bayesian methods. Our analysis encountered resolution difficulties similar to earlier published studies. Although incomplete lineage sorting and introgression are two phenomena presumptively underlying our results, the phylogenetic inferences also seem to be influenced by the genes examined, with certain topologies supported by sets of genes sharing common putative functionalities. For example, genes involved in wood formation supported the clade echinata–taeda, genes linked to plant defence supported the clade echinata–elliottii and genes linked to water management properties supported the clade echinata–palustris. The support for these clades was very high and consistent across methods. We discuss the potential factors that could underlie these observations, including incomplete lineage sorting, hybridization and parallel or adaptive evolution. Our results likely reflect the relatively short evolutionary history of the subsection that is thought to have begun during the middle Miocene and has been influenced by climate fluctuations. PMID

  5. Accurate Inference of Subtle Population Structure (and Other Genetic Discontinuities) Using Principal Coordinates

    PubMed Central

    Reeves, Patrick A.; Richards, Christopher M.

    2009-01-01

    Background: Accurate inference of genetic discontinuities between populations is an essential component of intraspecific biodiversity and evolution studies, as well as associative genetics. The most widely used methods to infer population structure are model-based Bayesian MCMC procedures that minimize departures from Hardy-Weinberg equilibrium and linkage disequilibrium within subpopulations. These methods are useful, but suffer from large computational requirements and a dependence on modeling assumptions that may not be met in real data sets. Here we describe the development of a new approach, PCO-MC, which couples principal coordinate analysis to a clustering procedure for the inference of population structure from multilocus genotype data. Methodology/Principal Findings: PCO-MC uses data from all principal coordinate axes simultaneously to calculate a multidimensional “density landscape”, from which the number of subpopulations, and the membership within subpopulations, is determined using a valley-seeking algorithm. Using extensive simulations, we show that this approach outperforms a Bayesian MCMC procedure when many loci (e.g. 100) are sampled, but that the Bayesian procedure is marginally superior with few loci (e.g. 10). When presented with sufficient data, PCO-MC accurately delineated subpopulations with population Fst values as low as 0.03 (G'st > 0.2), whereas the limit of resolution of the Bayesian approach was Fst = 0.05 (G'st > 0.35). Conclusions/Significance: We draw a distinction between population structure inference for describing biodiversity as opposed to Type I error control in associative genetics. We suggest that discrete assignments, like those produced by PCO-MC, are appropriate for circumscribing units of biodiversity whereas expression of population structure as a continuous variable is more useful for case-control correction in structured association studies. PMID:19172174
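
    The first step of PCO-MC, principal coordinate analysis of a genetic distance matrix, can be sketched with NumPy as classical (Gower) scaling: square the distances, double-center them, and scale the eigenvectors by the square roots of their eigenvalues. The density-landscape and valley-seeking clustering steps are not shown, the choice of distance measure is left to the caller, and the function name is hypothetical.

```python
import numpy as np


def principal_coordinates(dist):
    """Classical principal coordinate analysis (Gower scaling).

    dist: (n, n) symmetric matrix of pairwise distances between individuals.
    Returns an (n, m) array of coordinates on the m axes with positive
    eigenvalues, ordered by decreasing eigenvalue.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    a = -0.5 * d ** 2
    centering = np.eye(n) - np.ones((n, n)) / n
    b = centering @ a @ centering                # double-centered Gower matrix
    eigvals, eigvecs = np.linalg.eigh(b)         # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Keep axes whose eigenvalues are positive beyond round-off.
    keep = eigvals > 1e-9 * abs(eigvals[0])
    return eigvecs[:, keep] * np.sqrt(eigvals[keep])
```

    Returning every axis with a positive eigenvalue, rather than only the first two or three, matches the abstract's point that PCO-MC uses all principal coordinate axes simultaneously.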

  6. Using evolutionary sequence variation to make inferences about protein structure and function

    NASA Astrophysics Data System (ADS)

    Colwell, Lucy

    2015-03-01

    The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. The explosive growth in the number of available protein sequences raises the possibility of using the natural variation present in homologous protein sequences to infer these constraints and thus identify residues that control different protein phenotypes. Because in many cases phenotypic changes are controlled by more than one amino acid, the mutations that separate one phenotype from another may not be independent, requiring us to understand the correlation structure of the data. To address this we build a maximum entropy probability model for the protein sequence. The parameters of the inferred model are constrained by the statistics of a large sequence alignment. Pairs of sequence positions with the strongest interactions accurately predict contacts in protein tertiary structure, enabling all atom structural models to be constructed. We describe development of a theoretical inference framework that enables the relationship between the amount of available input data and the reliability of structural predictions to be better understood.
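
    The abstract does not detail how the maximum entropy model is fitted. One widely used approximation, assumed here, is mean-field direct coupling analysis: couplings are estimated as the negative inverse of the regularized covariance matrix of one-hot-encoded alignment columns, and each pair of positions is scored by the Frobenius norm of its coupling block. Sequence reweighting and the average-product correction often used in practice are omitted, and the function name is hypothetical.

```python
import numpy as np


def mean_field_dca(alignment, alphabet, pseudocount=0.5):
    """Mean-field approximation to a pairwise maximum entropy (Potts) model.

    alignment: list of equal-length sequences over `alphabet`.
    Returns an (L, L) matrix of coupling scores; large off-diagonal values
    mark candidate contacts between alignment positions.
    """
    q = len(alphabet)
    index = {c: i for i, c in enumerate(alphabet)}
    seqs = np.array([[index[c] for c in s] for s in alignment])
    n_seq, length = seqs.shape
    k = q - 1  # one state per column is dropped as the reference (gauge)
    x = np.zeros((n_seq, length * k))
    for i in range(length):
        for a in range(k):
            x[:, i * k + a] = seqs[:, i] == a
    # Single-site and pair frequencies, mixed with a uniform pseudocount
    # so that the covariance matrix is invertible.
    fi = (1 - pseudocount) * x.mean(axis=0) + pseudocount / q
    fij = (1 - pseudocount) * (x.T @ x / n_seq) + pseudocount / q ** 2
    for i in range(length):
        blk = slice(i * k, (i + 1) * k)
        fij[blk, blk] = np.diag(fi[blk])  # a column cannot hold two states
    cov = fij - np.outer(fi, fi)
    couplings = -np.linalg.inv(cov)       # mean-field estimate of J
    scores = np.zeros((length, length))
    for i in range(length):
        for j in range(length):
            if i != j:
                block = couplings[i * k:(i + 1) * k, j * k:(j + 1) * k]
                scores[i, j] = np.linalg.norm(block)  # Frobenius norm
    return scores
```

    In a toy alignment where two columns always change together and a third varies independently, the coupled pair receives a large score while the independent column scores near zero.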

  7. Morphological homoplasy, life history evolution, and historical biogeography of plethodontid salamanders inferred from complete mitochondrial genomes

    SciTech Connect

    Mueller, Rachel Lockridge; Macey, J. Robert; Jaekel, Martin; Wake, David B.; Boore, Jeffrey L.

    2004-08-01

    The evolutionary history of the largest salamander family (Plethodontidae) is characterized by extreme morphological homoplasy. Analysis of the mechanisms generating such homoplasy requires an independent, molecular phylogeny. To this end, we sequenced 24 complete mitochondrial genomes (22 plethodontids and two outgroup taxa), added data for three species from GenBank, and performed partitioned and unpartitioned Bayesian, ML, and MP phylogenetic analyses. We explored four dataset partitioning strategies to account for evolutionary process heterogeneity among genes and codon positions, all of which yielded increased model likelihoods and decreased numbers of supported nodes in the topologies (PP > 0.95) relative to the unpartitioned analysis. Our phylogenetic analyses yielded congruent trees that contrast with the traditional morphology-based taxonomy; the monophyly of three out of four major groups is rejected. Reanalysis of current hypotheses in light of these new evolutionary relationships suggests that (1) a larval life history stage re-evolved from a direct-developing ancestor multiple times, (2) there is no phylogenetic support for the 'Out of Appalachia' hypothesis of plethodontid origins, and (3) novel scenarios must be reconstructed for the convergent evolution of projectile tongues, reduction in toe number, and specialization for defensive tail loss. Some of these novel scenarios imply morphological transformation series that proceed in the opposite direction from what was previously thought. In addition, they suggest surprising evolutionary lability in traits previously interpreted to be conservative.

  8. Origins of the Moken Sea Gypsies inferred from mitochondrial hypervariable region and whole genome sequences.

    PubMed

    Dancause, Kelsey Needham; Chan, Chim W; Arunotai, Narumon Hinshiranan; Lum, J Koji

    2009-02-01

    The origins of the Moken 'Sea Gypsies,' a group of traditionally boat-dwelling nomadic foragers, remain speculative despite previous examinations from linguistic, sociocultural and genetic perspectives. We explored Moken origin(s) and affinities by comparing whole mitochondrial genome and hypervariable segment I sequences from 12 Moken individuals, sampled from four islands of the Mergui Archipelago, to other mainland Asian, Island Southeast Asian (ISEA) and Oceanic populations. These analyses revealed a major (11/12) and a minor (1/12) haplotype in the population, indicating low mitochondrial diversity likely resulting from historically low population sizes, isolation and consequent genetic drift. Phylogenetic analyses revealed close relationships of the major lineage (MKN1) to ISEA, mainland Asian and aboriginal Malay populations, and of the minor lineage (MKN2) to populations from ISEA. MKN1 belongs to a recently defined subclade of the ancient yet localized M21 haplogroup. MKN2 is not closely related to any previously sampled lineages, but has been tentatively assigned to the basal M46 haplogroup that possibly originated among the original inhabitants of ISEA. Our analyses suggest that MKN1 originated within coastal mainland SEA and dispersed into ISEA, and rapidly into the Mergui Archipelago, within the past few thousand years as a result of population pressure induced by climate change. PMID:19158811

  9. Child Development and Structural Variation in the Human Genome

    ERIC Educational Resources Information Center

    Zhang, Ying; Haraksingh, Rajini; Grubert, Fabian; Abyzov, Alexej; Gerstein, Mark; Weissman, Sherman; Urban, Alexander E.

    2013-01-01

    Structural variation of the human genome sequence is the insertion, deletion, or rearrangement of stretches of DNA sequence sized from around 1,000 to millions of base pairs. Over the past few years, structural variation has been shown to be far more common in human genomes than previously thought. Very little is currently known about the effects…

  10. Inferring regulatory elements from a whole genome. An analysis of Helicobacter pylori sigma(80) family of promoter signals.

    PubMed

    Vanet, A; Marsan, L; Labigne, A; Sagot, M F

    2000-03-24

    Helicobacter pylori is adapted to life in a unique niche, the gastric epithelium of primates. Its promoters may therefore be different from those of other bacteria. Here, we determine motifs possibly involved in the recognition of such promoter sequences by the RNA polymerase using a new motif identification method. An important feature of this method is that the motifs are sought with as few assumptions as possible about what they may look like. The method starts by considering the whole genome of H. pylori and attempts to infer directly from it a description for a family of promoters. Thus, this approach differs from searching for such promoters with a previously established description. The method's two algorithms are based on the idea of inferring motifs by flexibly comparing words in the sequences with an external object, instead of between themselves. The first algorithm infers single motifs, the second a combination of two motifs separated from one another by strictly defined, sterically constrained distances. Besides independently finding motifs known to be present in other bacteria, such as the Shine-Dalgarno sequence and the TATA-box, this approach suggests the existence in H. pylori of a new, combined motif, TTAAGC, followed optimally 21 bp downstream by TATAAT. Between these two motifs, there is in some cases another, TTTTAA or, less frequently, a repetition of TTAAGC separated optimally from the TATA-box by 12 bp. The combined motif TTAAGCx(21+/-2)TATAAT is present with no errors immediately upstream from the only two copies of the ribosomal 23 S-5 S RNA genes in H. pylori, and with one error upstream from the only two copies of the ribosomal 16 S RNA genes. The operons of both ribosomal RNA molecules are strongly expressed, representing an encouraging sign of the pertinence of the motifs found by the algorithms. In 25 cases out of a possible 30, the combined motif is found with no more than three substitutions immediately upstream from ribosomal proteins, or
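    The combined motif reported above has a simple regular shape: a TTAAGC box, a spacer of 21 ± 2 arbitrary bases, and a TATAAT box. The sketch below scans a sequence for that already-reported pattern while tolerating a bounded number of substitutions. It is not the authors' motif-inference algorithm, which discovers motifs without a predefined description. It assumes that the spacer counts the bases strictly between the two boxes and that substitutions are totalled over both boxes; the function names and parameters are hypothetical.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_combined_motif(seq, box1="TTAAGC", box2="TATAAT",
                        spacer=21, slack=2, max_subs=3):
    """Scan seq for box1 + N(spacer +/- slack) + box2.

    Returns (start, spacer_length, substitutions) tuples, where
    substitutions is the total Hamming distance over both boxes
    (an assumption about how errors are counted).
    """
    seq = seq.upper()
    n1, n2 = len(box1), len(box2)
    hits = []
    for i in range(len(seq) - n1 + 1):
        d1 = hamming(seq[i:i + n1], box1)
        if d1 > max_subs:
            continue  # the first box alone already exceeds the error budget
        for gap in range(spacer - slack, spacer + slack + 1):
            j = i + n1 + gap  # start of the second box
            if j + n2 > len(seq):
                break  # larger gaps would run off the end as well
            d = d1 + hamming(seq[j:j + n2], box2)
            if d <= max_subs:
                hits.append((i, gap, d))
    return hits


# Toy example: an exact motif with a 21-base spacer, flanked by filler.
example = "GG" + "TTAAGC" + "A" * 21 + "TATAAT" + "CC"
best = min(find_combined_motif(example), key=lambda h: h[2])
print(best)  # (2, 21, 0)
```

    Allowing up to three substitutions also produces weaker hits at neighbouring spacer lengths, so the example keeps only the hit with the fewest errors.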

  11. Phylogeny and genetic history of the Siberian salamander (Salamandrella keyserlingii, Dybowski, 1870) inferred from complete mitochondrial genomes.

    PubMed

    Malyarchuk, Boris; Derenko, Miroslava; Denisova, Galina

    2013-05-01

    We assessed the phylogeny of the Siberian salamander (Salamandrella keyserlingii, Dybowski, 1870), the northernmost ectothermic terrestrial vertebrate in Eurasia, by sequence analysis of complete mitochondrial genomes of 26 specimens collected at different localities (China, Khabarovsk region, Sakhalin, Yakutia, Magadan region, Chukotka, Kamchatka, Ural, European part of Russia). In addition, a complete mitochondrial genome of the Schrenck salamander, Salamandrella schrenckii, was determined for the first time. Bayesian phylogenetic analysis of the complete mitochondrial genomes of S. keyserlingii demonstrates that two haplotype clades, AB and C, radiated about 1.4 million years ago (Mya). Bayesian skyline plots of population size change through time show an expansion around 250 thousand years ago (kya) and then a decline around the Last Glacial Maximum (25 kya) with subsequent restoration of population size. Climatic changes during the Quaternary period have dramatically affected the population genetic structure of the Siberian salamanders. In addition, complete mtDNA sequence analysis allowed us to recognize that the vast area of Northern Eurasia was colonized only by the Siberian salamander clade C1b during the last 150 kya. However, we were unable to find evidence of molecular adaptation in this clade by analyzing the whole mitochondrial genomes of the Siberian salamanders. PMID:23415986

  12. Super learning to hedge against incorrect inference from arbitrary parametric assumptions in marginal structural modeling

    PubMed Central

    Neugebauer, Romain; Fireman, Bruce; Roy, Jason A.; Raebel, Marsha A.; Nichols, Gregory A.; O’Connor, Patrick J.

    2013-01-01

    Objective: Clinical trials are unlikely ever to be launched for many Comparative Effectiveness Research (CER) questions. Inferences from hypothetical randomized trials may, however, be emulated with marginal structural modeling (MSM) using observational data, but success in adjusting for time-dependent confounding and selection bias typically relies on parametric modeling assumptions. If these assumptions are violated, inferences from MSM may be inaccurate. In this article, we motivate the application of a data-adaptive estimation approach called Super Learning to avoid reliance on arbitrary parametric assumptions in CER. Study Design and Setting: Using electronic health records data from adults with new-onset type 2 diabetes, we implemented MSM with inverse probability weighting (IPW) estimation to evaluate the effect of three oral anti-diabetic therapies on the worsening of glomerular filtration rate. Results: Inferences from IPW estimation were noticeably sensitive to the parametric assumptions about the associations between both the exposure and censoring processes and the main suspected source of confounding, i.e., time-dependent measurements of hemoglobin A1c. Super Learning was successfully implemented to harness flexible confounding and selection bias adjustment from existing machine learning algorithms. Conclusion: Erroneous IPW inference about clinical effectiveness due to arbitrary and incorrect modeling decisions may be avoided with Super Learning. PMID:23849160

  13. Phylogenetic Diversity of the Enteric Pathogen Salmonella enterica subsp. enterica Inferred from Genome-Wide Reference-Free SNP Characters

    PubMed Central

    Timme, Ruth E.; Pettengill, James B.; Allard, Marc W.; Strain, Errol; Barrangou, Rodolphe; Wehnes, Chris; Van Kessel, JoAnn S.; Karns, Jeffrey S.; Musser, Steven M.; Brown, Eric W.

    2013-01-01

    The enteric pathogen Salmonella enterica is one of the leading causes of foodborne illness in the world. The species is extremely diverse, containing more than 2,500 named serovars that are designated for their unique antigen characters and pathogenicity profiles; some are known to be virulent pathogens, while others are not. Questions regarding the evolution of pathogenicity, the significance of antigen characters, and the diversity of clustered regularly interspaced short palindromic repeat (CRISPR) loci, among others, will remain elusive until a strong evolutionary framework is established. We present the first large-scale S. enterica subsp. enterica phylogeny inferred from a new reference-free k-mer approach of gathering single nucleotide polymorphisms (SNPs) from whole genomes. The phylogeny of 156 isolates representing 78 serovars (102 were newly sequenced) reveals two major lineages, each with many strongly supported sublineages. One of these lineages is the S. Typhi group, which is well nested within the phylogeny. Lineage-through-time analyses suggest there have been two instances of accelerated rates of diversification within the subspecies. We also found that antigen characters and CRISPR loci reveal evolutionary patterns different from that of the phylogeny, suggesting that horizontal gene transfer or possibly a shared environmental acquisition might have influenced the present character distribution. Our study also demonstrates that reference-free SNPs can be extracted from a large set of genomes and then used for phylogenetic reconstruction. This automated, annotation-free approach is an important step forward for bacterial disease tracking and for efficiently elucidating the evolutionary history of highly clonal organisms. PMID:24158624

  14. Phylogeny of the Sphaerotilus-Leptothrix group inferred from morphological comparisons, genomic fingerprinting, and 16S ribosomal DNA sequence analyses.

    PubMed

    Siering, P L; Ghiorse, W C

    1996-01-01

    Phase-contrast light microscopy revealed that only one of eight cultivated strains belonging to the Sphaerotilus-Leptothrix group of sheathed bacteria actually produced a sheath in standard growth media. Two Sphaerotilus natans strains produced branched cells, but other morphological characteristics that were used to identify these bacteria were consistent with previously published descriptions. Genomic fingerprints, which were obtained by performing PCR amplification with primers corresponding to enterobacterial repetitive intergenic consensus sequences, were useful for distinguishing between the genera Sphaerotilus and Leptothrix, as well as among individual strains. The complete 16S ribosomal DNA (rDNA) sequences of two strains of "Leptothrix discophora" (strains SP-6 and SS-1) were determined. In addition, partial sequences (approximately 300 nucleotides) of one strain of Leptothrix cholodnii (strain LMG 7171), an unidentified Leptothrix strain (strain NC-1), and four strains of Sphaerotilus natans (strains ATCC 13338T [T = type strain], ATCC 15291, ATCC 29329, and ATCC 29330) were determined. We found that two of the S. natans strains (ATCC 15291 and ATCC 13338T), which differed in morphology and in their genomic fingerprints, had identical sequences in the 300-nucleotide region sequenced. Both parsimony and distance matrix methods were used to infer the evolutionary relationships of the eight strains in a comparison of the 16S rDNA sequences of these organisms with 16S rDNA sequences obtained from ribosomal sequence databases. All of the strains clustered in the Rubrivivax subdivision of the beta subclass of the Proteobacteria, which confirmed previously published conclusions concerning selected individual strains. Additional analyses revealed that all of the S. natans strains clustered in one closely related group, while the Leptothrix strains clustered in two separate lineages that were approximately equidistant from the S. natans cluster. This finding

  15. Limitations to estimating bacterial cross-species transmission using genetic and genomic markers: inferences from simulation modeling

    PubMed Central

    Benavides, Julio A; Cross, Paul C; Luikart, Gordon; Creel, Scott

    2014-01-01

    Cross-species transmission (CST) of bacterial pathogens has major implications for human health, livestock, and wildlife management because it determines whether control actions in one species may have subsequent effects on other potential host species. The study of bacterial transmission has benefited from methods measuring two types of genetic variation: variable number of tandem repeats (VNTRs) and single nucleotide polymorphisms (SNPs). However, it is unclear whether these data can distinguish between different epidemiological scenarios. We used a simulation model with two host species and known transmission rates (within and between species) to evaluate the utility of these markers for inferring CST. We found that CST estimates are biased for a wide range of parameters when based on VNTRs and a most parsimonious reconstructed phylogeny. However, CST rates lower than 5% can be estimated with relatively low bias using as few as 250 SNPs. CST estimates are sensitive to several parameters, including the number of mutations accumulated since introduction, stochasticity, the genetic difference of strains introduced, and the sampling effort. Our results suggest that, even with whole-genome sequences, unbiased estimates of CST will be difficult to obtain when sampling is limited, when mutation rates are low, or when pathogens were recently introduced. PMID:25469159

  16. Gorilla genome structural variation reveals evolutionary parallelisms with chimpanzee.

    PubMed

    Ventura, Mario; Catacchio, Claudia R; Alkan, Can; Marques-Bonet, Tomas; Sajjadian, Saba; Graves, Tina A; Hormozdiari, Fereydoun; Navarro, Arcadi; Malig, Maika; Baker, Carl; Lee, Choli; Turner, Emily H; Chen, Lin; Kidd, Jeffrey M; Archidiacono, Nicoletta; Shendure, Jay; Wilson, Richard K; Eichler, Evan E

    2011-10-01

    Structural variation has played an important role in the evolutionary restructuring of human and great ape genomes. Recent analyses have suggested that the genomes of chimpanzee and human have been particularly enriched for this form of genetic variation. Here, we set out to assess the extent of structural variation in the gorilla lineage by generating 10-fold genomic sequence coverage from a western lowland gorilla and integrating these data into a physical and cytogenetic framework of structural variation. We discovered and validated over 7665 structural changes within the gorilla lineage, including sequence resolution of inversions, deletions, duplications, and mobile element insertions. A comparison with human and other ape genomes shows that the gorilla genome has been subjected to the highest rate of segmental duplication. We show that both the gorilla and chimpanzee genomes have experienced independent yet convergent patterns of structural mutation that have not occurred in humans, including the formation of subtelomeric heterochromatic caps, the hyperexpansion of segmental duplications, and bursts of retroviral integrations. Our analysis suggests that the chimpanzee and gorilla genomes are structurally more derived than either orangutan or human genomes. PMID:21685127

  17. High-Throughput Computational and Experimental Techniques in Structural Genomics

    PubMed Central

    Chance, Mark R.; Fiser, Andras; Sali, Andrej; Pieper, Ursula; Eswar, Narayanan; Xu, Guiping; Fajardo, J. Eduardo; Radhakannan, Thirumuruhan; Marinkovic, Nebojsa

    2004-01-01

    Structural genomics has as its goal the provision of structural information for all possible ORF sequences through a combination of experimental and computational approaches. Access to genome sequences and cloning resources from an ever-widening array of organisms is driving high-throughput structural studies by the New York Structural Genomics Research Consortium. In this report, we outline the progress of the Consortium in establishing its pipeline for structural genomics, and some of the experimental and bioinformatics efforts leading to structural annotation of proteins. The Consortium has established a pipeline for structural biology studies, automated modeling of ORF sequences using solved (template) structures, and a novel high-throughput approach (metallomics) to examining metal binding to purified protein targets. The Consortium has so far produced 493 purified proteins from >1077 expression vectors. A total of 95 have resulted in crystal structures, and 81 are deposited in the Protein Data Bank (PDB). Comparative modeling of these structures has generated >40,000 structural models. We also initiated a high-throughput metal analysis of the purified proteins; this has determined that 10%-15% of the targets contain a stoichiometric structural or catalytic transition metal atom. The progress of the structural genomics centers in the U.S. and around the world suggests that the goal of providing useful structural information on nearly all ORF domains will be realized. This projected resource will provide structural biology information important to understanding the function of most proteins of the cell. PMID:15489337

  18. Rare Variation Facilitates Inferences of Fine-Scale Population Structure in Humans

    PubMed Central

    O’Connor, Timothy D.; Fu, Wenqing; Mychaleckyj, Josyf C.; Logsdon, Benjamin; Auer, Paul; Carlson, Christopher S.; Leal, Suzanne M.; Smith, Joshua D.; Rieder, Mark J.; Bamshad, Michael J.; Nickerson, Deborah A.; Akey, Joshua M.

    2015-01-01

    Understanding the genetic structure of human populations has important implications for the design and interpretation of disease mapping studies and reconstructing human evolutionary history. To date, inferences of human population structure have primarily been made with common variants. However, recent large-scale resequencing studies have shown an abundance of rare variation in humans, which may be particularly useful for making inferences of fine-scale population structure. To this end, we used an information theory framework and extensive coalescent simulations to rigorously quantify the informativeness of rare and common variation to detect signatures of fine-scale population structure. We show that rare variation affords unique insights into patterns of recent population structure. Furthermore, to empirically assess our theoretical findings, we analyzed high-coverage exome sequences in 6,515 European and African American individuals. As predicted, rare variants are more informative than common polymorphisms in revealing a distinct cluster of European–American individuals, and subsequent analyses demonstrate that these individuals are likely of Ashkenazi Jewish ancestry. Our results provide new insights into the population structure using rare variation, which will be an important factor to account for in rare variant association studies. PMID:25415970

  19. Process-driven inference of biological network structure: feasibility, minimality, and multiplicity.

    PubMed

    Wang, Guanyu; Rong, Yongwu; Chen, Hao; Pearson, Carl; Du, Chenghang; Simha, Rahul; Zeng, Chen

    2012-01-01

    A common problem in molecular biology is to use experimental data, such as microarray data, to infer knowledge about the structure of interactions between important molecules in subsystems of the cell. By approximating the state of each molecule as "on" or "off", it becomes possible to simplify the problem, and exploit the tools of boolean analysis for such inference. Amongst boolean techniques, the process-driven approach has shown promise in being able to identify putative network structures, as well as stability and modularity properties. This paper examines the process-driven approach more formally, and makes four contributions about the computational complexity of the inference problem, under the "dominant inhibition" assumption of molecular interactions. The first is a proof that the feasibility problem (does there exist a network that explains the data?) can be solved in polynomial-time. Second, the minimality problem (what is the smallest network that explains the data?) is shown to be NP-hard, and therefore unlikely to result in a polynomial-time algorithm. Third, a simple polynomial-time heuristic is shown to produce near-minimal solutions, as demonstrated by simulation. Fourth, the theoretical framework explains how multiplicity (the number of network solutions to realize a given biological process), which can take exponential-time to compute, can instead be accurately estimated by a fast, polynomial-time heuristic. PMID:22815739

  20. Bayesian large-scale structure inference: initial conditions and the cosmic web

    NASA Astrophysics Data System (ADS)

    Leclercq, Florent; Wandelt, Benjamin

    2014-05-01

    We describe an innovative statistical approach for the ab initio simultaneous analysis of the formation history and morphology of the large-scale structure of the inhomogeneous Universe. Our algorithm explores the joint posterior distribution of the many millions of parameters involved via efficient Hamiltonian Markov Chain Monte Carlo sampling. We describe its application to the Sloan Digital Sky Survey data release 7 and an additional non-linear filtering step. We illustrate the use of our findings for cosmic web analysis: identification of structures via tidal shear analysis and inference of dark matter voids.

  1. Multiple genome alignment for identifying the core structure among moderately related microbial genomes

    PubMed Central

    Uchiyama, Ikuo

    2008-01-01

    Background: Identifying the set of intrinsically conserved genes, or the genomic core, among related genomes is crucial for understanding prokaryotic genomes, in which horizontal gene transfers are common. Although core genome identification appears to be straightforward among very closely related genomes, it becomes more difficult when more distantly related genomes are compared. Here, we consider the core structure as a set of sufficiently long segments in which gene orders are conserved, so that they are likely to have been inherited mainly through vertical transfer, and we develop a method for identifying the core structure by finding the order of pre-identified orthologous groups (OGs) that maximally retains the conserved gene orders. Results: The method was applied to genome comparisons of two well-characterized families, Bacillaceae and Enterobacteriaceae, and identified their core structures comprising 1438 and 2125 OGs, respectively. The core sets contained most of the essential genes and their related genes, which were primarily included in the intersection of the two core sets comprising around 700 OGs. The definition of the genomic core based on gene order conservation was demonstrated to be more robust than the simpler approach based only on gene conservation. We also investigated the core structures in terms of G+C content homogeneity and phylogenetic congruence, and found that the core genes exhibited the expected characteristics, i.e., being indigenous and sharing the same history, more strongly than the non-core genes. Conclusion: The results demonstrate that our strategy of genome alignment based on gene order conservation can provide an effective approach to identify the genomic core among moderately related microbial genomes. PMID:18976470

  2. Inferring Biological Structures from Super-Resolution Single Molecule Images Using Generative Models

    PubMed Central

    Maji, Suvrajit; Bruchez, Marcel P.

    2012-01-01

    Localization-based super-resolution imaging is presently limited by sampling requirements for dynamic measurements of biological structures. Generating an image requires serial acquisition of individual molecular positions at sufficient density to define a biological structure, which increases the acquisition time. Efficient analysis of biological structures from sparse localization data could substantially improve the dynamic imaging capabilities of these methods. Using a feature extraction technique called the Hough transform, simple biological structures are identified from both simulated and real localization data. We demonstrate that these generative models can efficiently infer biological structures in the data from far fewer localizations than are required for complete spatial sampling. Analysis at partial data densities revealed efficient recovery of clathrin vesicle size distributions and microtubule orientation angles with as little as 10% of the localization data. This approach significantly increases the temporal resolution for dynamic imaging and provides quantitatively useful biological information. PMID:22629348
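    As a rough illustration of how a Hough transform recovers orientation from sparse localizations, the sketch below lets each 2-D point vote into a (θ, ρ) accumulator using the line parametrization ρ = x cos θ + y sin θ and reports the most-voted line. It covers straight structures such as microtubules only. The circle variant needed for clathrin vesicles is not shown, and the bin sizes and function names are hypothetical, not the authors' implementation.

```python
import math
from collections import Counter


def hough_line(points, n_theta=180, rho_step=1.0):
    """Return (theta_deg, rho) of the most-voted line through 2-D points.

    theta is the angle of the line's normal; rho is its signed distance
    from the origin, quantized to multiples of rho_step.
    """
    votes = Counter()
    for t in range(n_theta):
        theta = math.pi * t / n_theta
        c, s = math.cos(theta), math.sin(theta)
        for x, y in points:
            rho_bin = round((x * c + y * s) / rho_step)  # quantize rho
            votes[(t, rho_bin)] += 1
    (t, rho_bin), _count = votes.most_common(1)[0]
    return 180.0 * t / n_theta, rho_bin * rho_step


def orientation_deg(theta_deg):
    """Direction of the line itself (perpendicular to its normal), in [0, 180)."""
    return (theta_deg - 90.0) % 180.0


# Fifty localizations along y = x plus three off-line points.
points = [(x, x) for x in range(50)] + [(10, 40), (30, 5), (45, 20)]
theta, rho = hough_line(points)
print(theta, rho, orientation_deg(theta))  # 135.0 0.0 45.0
```

    The accumulator is kept in a `Counter` keyed by (angle index, ρ bin), so only occupied bins use memory. With fewer localizations, the same vote count simply grows more slowly, which is why the transform tolerates sparse sampling.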

  3. Population structure of Atlantic mackerel inferred from RAD-seq-derived SNP markers: effects of sequence clustering parameters and hierarchical SNP selection.

    PubMed

    Rodríguez-Ezpeleta, Naiara; Bradbury, Ian R; Mendibil, Iñaki; Álvarez, Paula; Cotano, Unai; Irigoien, Xabier

    2016-07-01

    Restriction-site-associated DNA sequencing (RAD-seq) and related methods are revolutionizing the field of population genomics in nonmodel organisms, as they allow the generation of an unprecedented number of single nucleotide polymorphisms (SNPs) even when no genomic information is available. Yet RAD-seq data analyses rely on assumptions about the nature and number of nucleotide variants present in a single locus, the choice of which may lead to an under- or overestimated number of SNPs and/or to incorrectly called genotypes. Using the Atlantic mackerel (Scomber scombrus L.) and a close relative, the Atlantic chub mackerel (Scomber colias), as a case study, we explore the sensitivity of population structure inferences to two crucial aspects of RAD-seq data analysis: the maximum number of mismatches allowed to merge reads into a locus and the relatedness of the individuals used for genotype calling and SNP selection. Our study resolves the population structure of the Atlantic mackerel but, most importantly, provides insights into the effects of alternative RAD-seq data analysis strategies on population structure inferences that are directly applicable to other species. PMID:26936210
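    The effect of the mismatch threshold can be seen in a toy example. The sketch below greedily merges equal-length reads into putative loci whenever a read lies within `max_mismatches` (Hamming distance) of an existing locus seed. It is a simplified, hypothetical stand-in for the clustering step of real RAD-seq pipelines, which use more elaborate merging rules, and the function names are illustrative.

```python
def hamming(a, b):
    """Mismatches between two reads; assumes equal length."""
    if len(a) != len(b):
        raise ValueError("reads must have equal length")
    return sum(x != y for x, y in zip(a, b))


def cluster_reads(reads, max_mismatches):
    """Greedily assign each read to the first locus whose seed read is
    within max_mismatches; otherwise the read seeds a new locus."""
    seeds, loci = [], []
    for read in reads:
        for k, seed in enumerate(seeds):
            if hamming(read, seed) <= max_mismatches:
                loci[k].append(read)
                break
        else:  # no existing locus was close enough
            seeds.append(read)
            loci.append([read])
    return loci


reads = ["ACGTACGT", "ACGTACGA", "ACGTTCGA", "TTTTGGGG"]
print([len(cluster_reads(reads, m)) for m in range(3)])  # [4, 3, 2]
```

    Raising the threshold from 0 to 2 collapses four putative loci into two. Too strict a threshold splits alleles of one locus apart, while too lenient a threshold merges paralogs; either outcome changes the SNPs that are called downstream.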

  4. Genome Editing of Structural Variations: Modeling and Gene Correction.

    PubMed

    Park, Chul-Yong; Sung, Jin Jea; Kim, Dong-Wook

    2016-07-01

    The analysis of chromosomal structural variations (SVs), such as inversions and translocations, was made possible by the completion of the human genome project and the development of genome-wide sequencing technologies. SVs contribute to genetic diversity and evolution, although some SVs can cause diseases such as hemophilia A in humans. Genome engineering technology using programmable nucleases (e.g., ZFNs, TALENs, and CRISPR/Cas9) has been rapidly developed, enabling precise and efficient genome editing for SV research. Here, we review advances in modeling and gene correction of SVs, focusing on inversion, translocation, and nucleotide repeat expansion. PMID:27016031

  5. On the analysis of large-scale genomic structures.

    PubMed

    Oiwa, Nestor Norio; Goldman, Carla

    2005-01-01

    We apply methods from statistical physics (histograms, correlation functions, fractal dimensions, and singularity spectra) to characterize the large-scale structure of the distribution of nucleotides along genomic sequences. We discuss the role of the extent of noncoding segments ("junk DNA") in genomic organization, and the connection between the coding segment distribution and chromatin condensation in higher eukaryotes. The following sequences taken from GenBank were analyzed: the complete genome of Xanthomonas campestris, the complete genome of yeast, chromosome V of Caenorhabditis elegans, and the region of human chromosome XVII around the gene BRCA1. The results are compared with those for random and periodic sequences and for sequences generated by simple and generalized fractal Cantor sets. PMID:15858230
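    Correlation functions of this kind are computed on a numerical encoding of the sequence. The sketch below assumes the purine/pyrimidine mapping common in DNA-walk analyses (A, G to +1; C, T to -1), since the abstract does not say which encoding was used. It computes the two-point correlation C(l) = <x_i x_{i+l}> - <x>^2 for lags up to `max_lag`; the function names are hypothetical.

```python
def purine_pyrimidine_series(seq):
    """Map A/G (purines) to +1 and C/T (pyrimidines) to -1; skip other symbols such as N."""
    return [1 if b in "AG" else -1 for b in seq.upper() if b in "ACGT"]


def correlation_function(seq, max_lag):
    """Return [C(0), ..., C(max_lag)] with C(l) = <x_i x_{i+l}> - <x>^2."""
    x = purine_pyrimidine_series(seq)
    n = len(x)
    mean = sum(x) / n
    result = []
    for lag in range(max_lag + 1):
        pairs = n - lag  # number of (i, i + lag) pairs inside the sequence
        result.append(sum(x[i] * x[i + lag] for i in range(pairs)) / pairs - mean ** 2)
    return result


# A period-4 sequence is perfectly anticorrelated at lag 2 and correlated at lag 4.
c = correlation_function("AACC" * 50, 4)
print(c[0], c[2], c[4])  # 1.0 -1.0 1.0
```

    For a random sequence, C(l) decays to about zero for l > 0, whereas long-range structure shows up as a slow decay. This is the kind of contrast the abstract draws between genomic, random, periodic, and Cantor-set sequences.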

  6. Web-Based Arabidopsis Functional and Structural Genomics Resources

    PubMed Central

    Lu, Yan; Last, Robert L.

    2008-01-01

    As plant research moves into a “post-genomic” era, many diverse internet resources have become available to the international research community. Arabidopsis thaliana, because of its small size, rapid life cycle and simple genome, has been a model system for decades, with much research funding and many projects devoted to the creation of functional and structural genomics resources. Different types of data, including genome, transcriptome, proteome, phenome, metabolome and ionome data, are stored in these resources. In this chapter, a variety of genomics resources are introduced, with simple descriptions of how some can be accessed by laboratory researchers via the internet. PMID:22303243

  7. HSA: integrating multi-track Hi-C data for genome-scale reconstruction of 3D chromatin structure.

    PubMed

    Zou, Chenchen; Zhang, Yuping; Ouyang, Zhengqing

    2016-01-01

    Genome-wide 3C technologies (Hi-C) are increasingly being employed to study three-dimensional (3D) genome conformations. Existing computational approaches are unable to integrate accumulating data to facilitate studying 3D chromatin structure and function. We present HSA ( http://ouyanglab.jax.org/hsa/ ), a flexible tool that jointly analyzes multiple contact maps to infer 3D chromatin structure at the genome scale. HSA globally searches the latent structure underlying different cleavage footprints. Its robustness and accuracy rival or exceed those of existing tools in extensive simulations and orthogonal experimental validations. Applying HSA to recent in situ Hi-C data, we found that 3D chromatin structures are highly conserved across various human cell types. PMID:26936376

  8. Mediation Analysis With Intermediate Confounding: Structural Equation Modeling Viewed Through the Causal Inference Lens

    PubMed Central

    De Stavola, Bianca L.; Daniel, Rhian M.; Ploubidis, George B.; Micali, Nadia

    2015-01-01

    The study of mediation has a long tradition in the social sciences and a relatively more recent one in epidemiology. The first school is linked to path analysis and structural equation models (SEMs), while the second is related mostly to methods developed within the potential outcomes approach to causal inference. By giving model-free definitions of direct and indirect effects and clear assumptions for their identification, the latter school has formalized notions intuitively developed in the former and has greatly increased the flexibility of the models involved. However, through its predominant focus on nonparametric identification, the causal inference approach to effect decomposition via natural effects is limited to settings that exclude intermediate confounders. Such confounders are naturally dealt with (albeit with the caveats of informality and modeling inflexibility) in the SEM framework. Therefore, it seems pertinent to revisit SEMs with intermediate confounders, armed with the formal definitions and (parametric) identification assumptions from causal inference. Here we investigate: 1) how identification assumptions affect the specification of SEMs, 2) whether the more restrictive SEM assumptions can be relaxed, and 3) whether existing sensitivity analyses can be extended to this setting. Data from the Avon Longitudinal Study of Parents and Children (1990–2005) are used for illustration. PMID:25504026

  9. Phylogeny and biogeography of highly diverged freshwater fish species (Leuciscinae, Cyprinidae, Teleostei) inferred from mitochondrial genome analysis.

    PubMed

    Imoto, Junichi M; Saitoh, Kenji; Sasaki, Takeshi; Yonezawa, Takahiro; Adachi, Jun; Kartavtsev, Yuri P; Miya, Masaki; Nishida, Mutsumi; Hanzawa, Naoto

    2013-02-10

    The distribution of freshwater taxa is a good biogeographic model for studying the patterns and processes of vicariance and dispersal. The subfamily Leuciscinae (Cyprinidae, Teleostei) consists of many species distributed widely in Eurasia and North America. Leuciscinae have been divided into two phyletic groups, leuciscin and phoxinin. The phylogenetic relationships between major clades within the subfamily are poorly understood, largely because of the overwhelming diversity of the group. The origin of the Far Eastern phoxinin is an interesting question regarding the evolutionary history of Leuciscinae. Here we present a phylogenetic analysis of 31 species of Leuciscinae and outgroups based on complete mitochondrial genome sequences to clarify the phylogenetic relationships and to infer the evolutionary history of the subfamily. Phylogenetic analysis suggests that the Far Eastern phoxinin species comprised the monophyletic clades Tribolodon, Pseudaspius, Oreoleuciscus and Far Eastern Phoxinus. The Far Eastern phoxinin clade was independent of other Leuciscinae lineages and was closer to North American phoxinins than to European leuciscins. All of our analyses also suggested that leuciscins and phoxinins each constituted monophyletic groups. Divergence time estimation suggested that Leuciscinae species diverged from outgroups such as Tincinae 83.3 million years ago (Mya), in the Late Cretaceous, and that leuciscin and phoxinin shared a common ancestor 70.7 Mya. Radiation of Leuciscinae lineages occurred during the Late Cretaceous to Paleocene. This period also witnessed the radiation of tetrapods. Reconstruction of ancestral areas indicates that Leuciscinae species originated within Europe. Leuciscin species evolved in Europe and the ancestor of phoxinin was distributed in North America. The Far Eastern phoxinins would have dispersed from North America to the Far East across the Beringia land bridge. The present study suggests important roles for the continental rearrangements during the

  10. The Genomic Threading Database: a comprehensive resource for structural annotations of the genomes from key organisms.

    PubMed

    McGuffin, Liam J; Street, Stefano A; Bryson, Kevin; Sørensen, Søren-Aksel; Jones, David T

    2004-01-01

    Currently, the Genomic Threading Database (GTD) contains structural assignments for the proteins encoded within the genomes of nine eukaryotes and 101 prokaryotes. Structural annotations are carried out using a modified version of GenTHREADER, a reliable fold recognition method. The GenTHREADER annotation jobs are distributed across multiple clusters of processors using grid technology, and the predictions are deposited in a relational database accessible via a web interface at http://bioinf.cs.ucl.ac.uk/GTD. Using this system, up to 84% of proteins encoded within a genome can be confidently assigned to known folds with 72% of the residues aligned. On average in the GTD, 64% of proteins encoded within a genome are confidently assigned to known folds and 58% of the residues are aligned to structures. PMID:14681393

  11. Inferring a District-Based Hierarchical Structure of Social Contacts from Census Data

    PubMed Central

    Yu, Zhiwen; Liu, Jiming; Zhu, Xianjun

    2015-01-01

    Researchers have recently paid attention to social contact patterns among individuals due to their useful applications in such areas as epidemic evaluation and control, public health decisions, chronic disease research and social network research. Although some studies have estimated social contact patterns from social networks and surveys, few have considered how to infer the hierarchical structure of social contacts directly from census data. In this paper, we focus on inferring an individual’s social contact patterns from detailed census data, and generate various types of social contact patterns such as hierarchical-district-structure-based, cross-district and age-district-based patterns. We evaluate newly generated contact patterns derived from detailed 2011 Hong Kong census data by incorporating them into a model and simulation of the 2009 Hong Kong H1N1 epidemic. We then compare the newly generated social contact patterns with the mixing patterns that are often used in the literature, and draw the following conclusions. First, the generation of social contact patterns based on a hierarchical district structure allows for simulations at different district levels. Second, the newly generated social contact patterns reflect individuals’ social contacts. Third, the newly generated social contact patterns improve the accuracy of the SEIR-based epidemic model. PMID:25679787

  12. Structural Genomics of Minimal Organisms: Pipeline and Results

    SciTech Connect

    Kim, Sung-Hou; Shin, Dong-Hae; Kim, Rosalind; Adams, Paul; Chandonia, John-Marc

    2007-09-14

    The initial objective of the Berkeley Structural Genomics Center was to obtain near-complete three-dimensional (3D) structural information for all soluble proteins of two minimal organisms, the closely related pathogens Mycoplasma genitalium and M. pneumoniae. The former has fewer than 500 genes and the latter has fewer than 700 genes. A semiautomated structural genomics pipeline was set up, spanning target selection, cloning, expression, purification, and ultimately structure determination. At the time of this writing, structural information for more than 93 percent of all soluble proteins of M. genitalium is available. This chapter summarizes the approaches taken by the authors' center.

  13. Inferring the Clonal Structure of Viral Populations from Time Series Sequencing

    PubMed Central

    Chedom, Donatien F.; Murcia, Pablo R.; Greenman, Chris D.

    2015-01-01

    RNA virus populations undergo processes of mutation and selection, resulting in a mixed population of viral particles. High-throughput sequencing of a viral population therefore contains a mixed signal of the underlying clones. We would like to identify the underlying evolutionary structures. We utilize two sources of information to attempt this: within-segment linkage information and mutation prevalence. We demonstrate that clone haplotypes, their prevalence, and maximum parsimony reticulate evolutionary structures can be identified, although the solutions may not be unique, even for complete sets of information. This is applied to a chain of influenza infection, where we infer evolutionary structures, including reassortment, and demonstrate some of the difficulties of interpretation that arise from deep sequencing due to artifacts such as template switching during PCR amplification. PMID:26571026

  14. Effects of vegetation canopy structure on remotely sensed canopy temperatures. [inferring plant water stress and yield]

    NASA Technical Reports Server (NTRS)

    Kimes, D. S.

    1979-01-01

    The effects of vegetation canopy structure on thermal infrared sensor response must be understood before vegetation surface temperatures of canopies with low percent ground cover can be accurately inferred. The response of a sensor is a function of vegetation geometric structure, the vertical surface temperature distribution of the canopy components, and sensor view angle. Large deviations between the nadir sensor effective radiant temperature (ERT) and vegetation ERT for a soybean canopy were observed throughout the growing season. The nadir sensor ERT of a soybean canopy with 35 percent ground cover deviated from the vegetation ERT by as much as 11 C during the mid-day. These deviations were quantitatively explained as a function of canopy structure and soil temperature. Remote sensing techniques which determine the vegetation canopy temperature(s) from the sensor response need to be studied.
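
    To see why a sparse canopy can read far from the foliage temperature, a deliberately simplified two-component mixing model is sketched below. It is not Kimes's model (which also accounts for the vertical temperature distribution and view angle); it assumes both foliage and soil radiate as blackbodies and that a nadir sensor sees them in proportion to percent ground cover. The function name and the temperatures in the usage line are hypothetical.

```python
def effective_radiant_temperature(veg_fraction, t_veg_c, t_soil_c):
    """Nadir effective radiant temperature (deg C) of a two-component scene.

    Simplifying assumptions (not from the paper): both components are
    blackbodies (equal emissivity, no reflected sky radiance), and the nadir
    view sees them in proportion veg_fraction : (1 - veg_fraction).
    """
    t_veg_k = t_veg_c + 273.15
    t_soil_k = t_soil_c + 273.15
    # Emitted power scales with T^4 (Stefan-Boltzmann); the constant cancels
    mixed = veg_fraction * t_veg_k ** 4 + (1 - veg_fraction) * t_soil_k ** 4
    return mixed ** 0.25 - 273.15

# Hypothetical mid-day values: 35% cover, foliage at 30 C, sunlit soil at 50 C
ert = effective_radiant_temperature(0.35, 30.0, 50.0)
print(f"sensor ERT = {ert:.1f} C, deviation from vegetation = {ert - 30.0:.1f} C")
```

    Because the soil term dominates at low cover, a hot soil surface pulls the sensor reading well above the foliage temperature, which is the qualitative effect the abstract reports.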

  15. Hebbian Wiring Plasticity Generates Efficient Network Structures for Robust Inference with Synaptic Weight Plasticity

    PubMed Central

    Hiratani, Naoki; Fukai, Tomoki

    2016-01-01

    In the adult mammalian cortex, a small fraction of spines are created and eliminated every day, and the resultant synaptic connection structure is highly nonrandom, even in local circuits. However, it remains unknown whether a particular synaptic connection structure is functionally advantageous in local circuits, and why creation and elimination of synaptic connections is necessary in addition to rich synaptic weight plasticity. To answer these questions, we studied an inference task model through theoretical and numerical analyses. We demonstrate that a robustly beneficial network structure naturally emerges by combining Hebbian-type synaptic weight plasticity and wiring plasticity. Especially in a sparsely connected network, wiring plasticity achieves reliable computation by enabling efficient information transmission. Furthermore, the proposed rule reproduces the experimentally observed correlation between spine dynamics and task performance. PMID:27303271

  16. Genomic structure of the human caldesmon gene.

    PubMed Central

    Hayashi, K; Yano, H; Hashida, T; Takeuchi, R; Takeda, O; Asada, K; Takahashi, E; Kato, I; Sobue, K

    1992-01-01

    The high molecular weight caldesmon (h-CaD) is predominantly expressed in smooth muscles, whereas the low molecular weight caldesmon (l-CaD) is widely distributed in nonmuscle tissues and cells. The changes in CaD isoform expression are closely correlated with the phenotypic modulation of smooth muscle cells. During a search for isoform diversity of human CaDs, l-CaD cDNAs were cloned from HeLa S3 cells. HeLa l-CaD I is composed of 558 amino acids, whereas 26 amino acids (residues 202-227 for HeLa l-CaD I) are deleted in HeLa l-CaD II. The short amino-terminal sequence of HeLa l-CaDs is different from that of fibroblast (WI-38) l-CaD II and human aorta h-CaD. We have also identified WI-38 l-CaD I, which contains a 26-amino acid insertion relative to WI-38 l-CaD II. To reveal the molecular events of the expressional regulation of the CaD isoforms, the genomic structure of the human CaD gene was determined. The human CaD gene is composed of 14 exons and was mapped to a single locus, 7q33-q34. The 26-amino acid insertion is encoded in exon 4 and is specifically spliced in the mRNAs for both h-CaD and l-CaDs I. Exon 3 is the exon that encodes the central repeating domain specific to h-CaD (residues 208-436) together with the common domain in all CaD (residues 73-207 for h-CaD and WI-38 l-CaDs, and residues 68-201 for HeLa l-CaDs). The regulation of h- and l-CaD expression is thought to depend on selection of the two 5' splice sites within exon 3. Thus, the change in expression between l-CaD and h-CaD might be caused by this splicing pathway. PMID:1465449

  17. Structural genomics-impact on biomedicine and drug discovery.

    PubMed

    Weigelt, Johan

    2010-05-01

    The field of structural genomics emerged as one of many 'omics disciplines more than a decade ago, and a multitude of large scale initiatives have been launched across the world. Development and implementation of methods for high-throughput structural biology represents a common denominator among different structural genomics programs. From another perspective a distinction between "biology-driven" versus "structure-driven" approaches can be made. This review outlines the general themes of structural genomics, its achievements and its impact on biomedicine and drug discovery. The growing number of high resolution structures of known and potential drug target proteins is expected to have tremendous value for future drug discovery programs. Moreover, the availability of large numbers of purified proteins enables generation of tool reagents, such as chemical probes and antibodies, to further explore protein function in the cell. PMID:20211166

  18. Strawberry Part 3 - structural and functional genomics

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The area of strawberry genomics is rapidly changing because of the burgeoning interest in, and need for, reference plants for the Rosaceae family, which contains many important fruit, nut, ornamental and wood crops, including peach, apple, almond, rose and cherry. This chapter describes the current...

  19. Protein surface analysis for function annotation in high-throughput structural genomics pipeline

    PubMed Central

    Binkowski, T. Andrew; Joachimiak, Andrzej; Liang, Jie

    2005-01-01

    Structural genomics (SG) initiatives are expanding the universe of protein fold space by rapidly determining structures of proteins that were intentionally selected on the basis of low sequence similarity to proteins of known structure. Often these proteins have no associated biochemical or cellular functions. The SG success has resulted in an accelerated deposition of novel structures. In some cases the structural bioinformatics analysis applied to these novel structures has provided specific functional assignment. However, this approach has also uncovered limitations in the functional analysis of uncharacterized proteins using traditional sequence and backbone structure methodologies. A novel method, named pvSOAR (pocket and void Surface of Amino Acid Residues), of comparing the protein surfaces of geometrically defined pockets and voids was developed. pvSOAR was able to detect previously unrecognized and novel functional relationships between surface features of proteins. In this study, pvSOAR is applied to several structural genomics proteins. We examined the surfaces of YecM, BioH, and RpiB from Escherichia coli as well as the CBS domains from inosine-5′-monophosphate dehydrogenase from Streptococcus pyogenes, conserved hypothetical protein Ta549 from Thermoplasma acidophilum, and CBS domain protein mt1622 from Methanobacterium thermoautotrophicum, with the goal of inferring information about their biochemical function. PMID:16322579

  20. Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies.

    PubMed Central

    Falush, Daniel; Stephens, Matthew; Pritchard, Jonathan K

    2003-01-01

    We describe extensions to the method of Pritchard et al. for inferring population structure from multilocus genotype data. Most importantly, we develop methods that allow for linkage between loci. The new model accounts for the correlations between linked loci that arise in admixed populations ("admixture linkage disequilibrium"). This modification has several advantages, allowing (1) detection of admixture events farther back into the past, (2) inference of the population of origin of chromosomal regions, and (3) more accurate estimates of statistical uncertainty when linked loci are used. It is also of potential use for admixture mapping. In addition, we describe a new prior model for the allele frequencies within each population, which allows identification of subtle population subdivisions that were not detectable using the existing method. We present results applying the new methods to study admixture in African-Americans, recombination in Helicobacter pylori, and drift in populations of Drosophila melanogaster. The methods are implemented in a program, structure, version 2.0, which is available at http://pritch.bsd.uchicago.edu. PMID:12930761
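
    For orientation, the unlinked admixture model of Pritchard et al. that this entry extends can be sketched in a few lines: each gene copy draws its population of origin from the individual's ancestry proportions q, so the probability of the reference allele at a locus is the q-weighted average of the population allele frequencies. The sketch below computes a diploid genotype log-likelihood under that assumption only; it omits the linkage model and the correlated-frequency prior introduced here, and the function names and frequencies are hypothetical.

```python
import math

def allele_probability(q, p_locus):
    # P(reference allele) for one gene copy = sum over populations k of q[k] * p_locus[k]
    return sum(qk * pk for qk, pk in zip(q, p_locus))

def genotype_log_likelihood(genotypes, q, p):
    """Log-likelihood of one diploid individual's genotypes given ancestry q.

    genotypes[l]: copies (0, 1 or 2) of the reference allele at locus l
    q[k]: ancestry proportion from population k (sums to 1)
    p[l][k]: reference-allele frequency at locus l in population k
    Loci are treated as unlinked (independent).
    """
    ll = 0.0
    for g, p_locus in zip(genotypes, p):
        pi = allele_probability(q, p_locus)
        # The two gene copies are independent draws: Binomial(2, pi)
        ll += math.log(math.comb(2, g))
        if g > 0:
            ll += g * math.log(pi)
        if g < 2:
            ll += (2 - g) * math.log(1 - pi)
    return ll

# Hypothetical reference-allele frequencies at two loci in K = 2 populations
p = [[0.9, 0.1], [0.2, 0.7]]
genotypes = [2, 0]
for q in ([1.0, 0.0], [0.5, 0.5], [0.0, 1.0]):
    print(q, round(genotype_log_likelihood(genotypes, q, p), 3))
```

    Treating loci as independent is exactly the assumption the linkage model relaxes: with admixture linkage disequilibrium, neighboring loci tend to share the same population of origin.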

  1. PSI-2: Structural Genomics to Cover Protein Domain Family Space

    PubMed Central

    Dessailly, Benoît H.; Nair, Rajesh; Jaroszewski, Lukasz; Fajardo, J. Eduardo; Kouranov, Andrei; Lee, David; Fiser, Andras; Godzik, Adam; Rost, Burkhard; Orengo, Christine

    2010-01-01

    One major objective of structural genomics efforts, including the NIH-funded Protein Structure Initiative (PSI), has been to increase the structural coverage of protein sequence space. Here, we present the target selection strategy used during the second phase of PSI (PSI-2). This strategy, jointly devised by the bioinformatics groups associated with the PSI-2 large-scale production centres, targets representatives from large, structurally uncharacterised protein domain families, and from structurally uncharacterised subfamilies in very large and diverse families with incomplete structural coverage. These very large families are extremely diverse both structurally and functionally, and are highly over-represented in known proteomes. On the basis of several metrics, we then discuss to what extent PSI-2, during its first three years, has increased the structural coverage of genomes, and contributed structural and functional novelty. Together, the results presented here suggest that PSI-2 is successfully meeting its objectives and provides useful insights into structural and functional space. PMID:19523904

  2. History of biological metal utilization inferred through phylogenomic analysis of protein structures

    PubMed Central

    Dupont, Christopher L.; Butcher, Andrew; Valas, Ruben E.; Bourne, Philip E.; Caetano-Anollés, Gustavo

    2010-01-01

    The fundamental chemistry of trace elements dictates the molecular speciation and reactivity both within cells and the environment at large. Using protein structure and comparative genomics, we elucidate several major influences this chemistry has had upon biology. All of life exhibits the same proteome size-dependent scaling for the number of metal-binding proteins within a proteome. This fundamental evolutionary constant shows that the selection of one element occurs at the exclusion of another, with the eschewal of Fe for Zn and Ca being a defining feature of eukaryotic proteomes. Early life lacked both the structures required to control intracellular metal concentrations and the metal-binding proteins that catalyze electron transport and redox transformations. The development of protein structures for metal homeostasis coincided with the emergence of metal-specific structures, which predominantly bound metals abundant in the Archean ocean. Potentially, this promoted the diversification of emerging lineages of Archaea and Bacteria through the establishment of biogeochemical cycles. In contrast, structures binding Cu and Zn evolved much later, providing further evidence that environmental availability influenced the selection of the elements. The late evolving Zn-binding proteins are fundamental to eukaryotic cellular biology, and Zn bioavailability may have been a limiting factor in eukaryotic evolution. The results presented here provide an evolutionary timeline based on genomic characteristics, and key hypotheses can be tested by alternative geochemical methods. PMID:20498051

  3. Genome-wide patterns of population structure and admixture in West Africans and African Americans.

    PubMed

    Bryc, Katarzyna; Auton, Adam; Nelson, Matthew R; Oksenberg, Jorge R; Hauser, Stephen L; Williams, Scott; Froment, Alain; Bodo, Jean-Marie; Wambebe, Charles; Tishkoff, Sarah A; Bustamante, Carlos D

    2010-01-12

    Quantifying patterns of population structure in Africans and African Americans illuminates the history of human populations and is critical for undertaking medical genomic studies on a global scale. To obtain a fine-scale genome-wide perspective of ancestry, we analyze Affymetrix GeneChip 500K genotype data from African Americans (n = 365) and individuals with ancestry from West Africa (n = 203 from 12 populations) and Europe (n = 400 from 42 countries). We find that population structure within the West African sample reflects primarily language and secondarily geographical distance, echoing the Bantu expansion. Among African Americans, analysis of genomic admixture by a principal component-based approach indicates that the median proportion of European ancestry is 18.5% (25th-75th percentiles: 11.6-27.7%), with very large variation among individuals. In the African-American sample as a whole, few autosomal regions showed exceptionally high or low mean African ancestry, but the X chromosome showed elevated levels of African ancestry, consistent with a sex-biased pattern of gene flow with an excess of European male and African female ancestry. We also find that genomic profiles of individual African Americans afford personalized ancestry reconstructions differentiating ancient vs. recent European and African ancestry. Finally, patterns of genetic similarity among inferred African segments of African-American genomes and genomes of contemporary African populations included in this study suggest African ancestry is most similar to non-Bantu Niger-Kordofanian-speaking populations, consistent with historical documents of the African Diaspora and trans-Atlantic slave trade. PMID:20080753

  4. Structural and Operational Complexity of the Geobacter Sulfurreducens Genome

    SciTech Connect

    Qiu, Yu; Cho, Byung-Kwan; Park, Young S.; Lovley, Derek R.; Palsson, Bernhard O.; Zengler, Karsten

    2010-06-30

    Prokaryotic genomes can be annotated based on their structural, operational, and functional properties. These annotations provide the pivotal scaffold for understanding cellular functions on a genome scale, such as metabolism and transcriptional regulation. Here, we describe a systems approach to simultaneously determine the structural and operational annotation of the Geobacter sulfurreducens genome. Integration of proteomics, transcriptomics, RNA polymerase, and sigma factor-binding information with deep-sequencing-based analysis of primary 5′-end transcripts allowed a highly precise annotation. The structural annotation comprises numerous previously undetected genes, noncoding RNAs, prevalent leaderless mRNA transcripts, and antisense transcripts. When compared with other prokaryotes, we found that the number of antisense transcripts was inversely correlated with genome size. The operational annotation consists of 1453 operons, 22% of which have multiple transcription start sites that use different RNA polymerase holoenzymes. Several operons with multiple transcription start sites encoded genes with essential functions, giving insight into the regulatory complexity of the genome. The experimentally determined structural and operational annotations can be combined with functional annotation, yielding a new three-level annotation that greatly expands our understanding of prokaryotic genomes.

  5. Structure and replication of geminivirus genomes.

    PubMed

    Davies, J W; Stanley, J; Donson, J; Mullineaux, P M; Boulton, M I

    1987-01-01

    The geminiviruses are a group of plant viruses containing single-stranded (ss) DNA in particles comprising two quasi-icosahedral units. Some are transmitted by whiteflies, others by leafhoppers. Comparisons were made of the genome organization and expression of cassava latent virus (CLV), maize streak virus (MSV) and beet curly top virus (BCTV), each with distinct host range and insect vector species characteristics. These studies suggest several indications of the replication mechanism(s). PMID:3503890

  6. Insights into archaeal evolution and symbiosis from the genomes of a nanoarchaeon and its inferred crenarchaeal host from Obsidian Pool, Yellowstone National Park

    PubMed Central

    2013-01-01

    Background: A single cultured marine organism, Nanoarchaeum equitans, represents the Nanoarchaeota branch of symbiotic Archaea, with a highly reduced genome and unusual features such as multiple split genes.

    Results: The first terrestrial hyperthermophilic member of the Nanoarchaeota was collected from Obsidian Pool, a thermal feature in Yellowstone National Park, separated by single cell isolation, and sequenced together with its putative host, a Sulfolobales archaeon. Both the new Nanoarchaeota (Nst1) and N. equitans lack most biosynthetic capabilities, and phylogenetic analysis of ribosomal RNA and protein sequences indicates that the two form a deep-branching archaeal lineage. However, the Nst1 genome is more than 20% larger, and encodes a complete gluconeogenesis pathway as well as the full complement of archaeal flagellum proteins. With a larger genome, a smaller repertoire of split protein encoding genes and no split non-contiguous tRNAs, Nst1 appears to have experienced less severe genome reduction than N. equitans. These findings imply that, rather than representing ancestral characters, the extremely compact genomes and multiple split genes of Nanoarchaeota are derived characters associated with their symbiotic or parasitic lifestyle. The inferred host of Nst1 is potentially autotrophic, with a streamlined genome and simplified central and energetic metabolism as compared to other Sulfolobales.

    Conclusions: Comparison of the N. equitans and Nst1 genomes suggests that the marine and terrestrial lineages of Nanoarchaeota share a common ancestor that was already a symbiont of another archaeon. The two distinct Nanoarchaeota-host genomic data sets offer novel insights into the evolution of archaeal symbiosis and parasitism, enabling further studies of the cellular and molecular mechanisms of these relationships.

    Reviewers: This article was reviewed by Patrick Forterre, Bettina Siebers (nominated by Michael Galperin) and Purification Lopez-Garcia PMID:23607440

  7. Self-Organized Inference of Spatial Structure in Randomly Deployed Sensor Networks

    NASA Astrophysics Data System (ADS)

    George, Neena A.; Minai, Ali A.; Doboli, Simona

    Randomly deployed wireless sensor networks are becoming increasingly viable for many applications. Such networks can comprise anywhere from a few hundred to thousands of sensor nodes, and these sizes are likely to grow with advancing technology, making scalability a primary concern. Each node in these sensor networks is a small unit with limited resources and localized sensing and communication. Thus, all global tasks must be accomplished through self-organized distributed algorithms, which also leads to improved scalability, robustness and flexibility. In this paper, we examine the use of distributed algorithms to infer the spatial structure of an extended environment monitored by a self-organizing sensor network. Based on its sensing, the network segments the environment into regions with distinct characteristics, thereby inferring a cognitive map of the environment. This, in turn, can be used to answer global queries about the environment efficiently and accurately. The main challenge to the network arises from the necessarily irregular spatial sampling and the need for totally distributed computation. We consider distributed machine learning techniques for segmentation and study the variation of segmentation quality with reconstruction at different node densities and in environments of varying complexity.

  8. Population-based 3D genome structure analysis reveals driving forces in spatial genome organization

    PubMed Central

    Li, Wenyuan; Kalhor, Reza; Dai, Chao; Hao, Shengli; Gong, Ke; Zhou, Yonggang; Li, Haochen; Zhou, Xianghong Jasmine; Le Gros, Mark A.; Larabell, Carolyn A.; Chen, Lin; Alber, Frank

    2016-01-01

    Conformation capture technologies (e.g., Hi-C) chart physical interactions between chromatin regions on a genome-wide scale. However, the structural variability of the genome between cells poses a great challenge to interpreting ensemble-averaged Hi-C data, particularly for long-range and interchromosomal interactions. Here, we present a probabilistic approach for deconvoluting Hi-C data into a model population of distinct diploid 3D genome structures, which facilitates the detection of chromatin interactions likely to co-occur in individual cells. Our approach incorporates the stochastic nature of chromosome conformations and allows a detailed analysis of alternative chromatin structure states. For example, we predict and experimentally confirm the presence of large centromere clusters with distinct chromosome compositions varying between individual cells. The stability of these clusters varies greatly with their chromosome identities. We show that these chromosome-specific clusters can play a key role in the overall chromosome positioning in the nucleus and stabilizing specific chromatin interactions. By explicitly considering genome structural variability, our population-based method provides an important tool for revealing novel insights into the key factors shaping the spatial genome organization. PMID:26951677

  9. Spectral entropy criteria for structural segmentation in genomic DNA sequences

    NASA Astrophysics Data System (ADS)

    Chechetkin, V. R.; Lobzin, V. V.

    2004-07-01

    The spectral entropy is calculated from Fourier structure factors and characterizes the level of structural ordering in a sequence of symbols. It can be applied efficiently to the assessment and reconstruction of the modular structure of genomic DNA sequences. We present the relevant spectral entropy criteria for local and non-local structural segmentation of DNA sequences. The results are illustrated with model examples and with an analysis of intervening exon-intron segments in protein-coding regions.
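
    As a rough illustration of the quantity involved, the sketch below computes, for each nucleotide, the structure factor of its 0/1 indicator sequence at every nonzero Fourier frequency, normalizes these into a distribution, and takes its Shannon entropy. This particular normalization (dropping q = 0 and dividing by the total power) is an assumption of the sketch, not necessarily the exact definition used by Chechetkin and Lobzin, and the function name is hypothetical. A strongly periodic sequence concentrates power in a few frequencies (low entropy), while an unordered one spreads it out (high entropy).

```python
import cmath
import math

def spectral_entropy(seq):
    """Per-nucleotide spectral entropy of a DNA string (illustrative sketch)."""
    n = len(seq)
    result = {}
    for base in "ACGT":
        # Indicator sequence: 1 where the base occurs, 0 elsewhere
        u = [1.0 if s == base else 0.0 for s in seq]
        # Structure factor F(q) = |sum_k u_k exp(-2*pi*i*q*k/N)|^2 / N for q = 1..N-1
        power = []
        for q in range(1, n):
            rho = sum(u[k] * cmath.exp(-2j * math.pi * q * k / n) for k in range(n))
            power.append(abs(rho) ** 2 / n)
        total = sum(power)
        if total < 1e-9:  # base absent or present everywhere: no spectrum to normalize
            continue
        probs = [p / total for p in power]
        result[base] = -sum(p * math.log(p) for p in probs if p > 0)
    return result

periodic = "ATG" * 20  # period-3 sequence: power only at q = 20 and q = 40
print({b: round(s, 3) for b, s in spectral_entropy(periodic).items()})
```

    For segmentation, the same function could be evaluated over sliding windows and boundaries placed where the entropy changes sharply; that windowing scheme is likewise an assumption here rather than the paper's criterion.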

  10. Interplanetary magnetic field structure at Saturn inferred from nanodust measurements during the 2013 aurora campaign

    NASA Astrophysics Data System (ADS)

    Hsu, H.-W.; Kempf, S.; Badman, S. V.; Kurth, W. S.; Postberg, F.; Srama, R.

    2016-01-01

    Interactions between the solar wind and planetary magnetospheres provide important diagnostic information about magnetospheric dynamics. The lack of monitoring of upstream solar wind conditions at the outer planets, however, limits the overall scientific output. Here we apply a new method, using Cassini nanodust stream measurements, to derive the interplanetary magnetic field (IMF) structure during the 2013 Saturn aurora campaign. Due to complex dynamical interactions with the interplanetary magnetic field, a fraction of the fast nanodust particles emerging from the Saturnian system is sent back into the magnetosphere and can be detected by a spacecraft located within it. The time-dependent directionality caused by the variable interplanetary magnetic field enables these particles to probe the solar wind structure remotely. Information about the arrival time of solar wind compression regions (coupled with heliospheric current sheet crossings), as well as the field direction associated with the solar wind sector structure, can be inferred. Here we present a tentative identification of the interplanetary magnetic field sector structure based on Cassini nanodust and radio emission measurements during the 2013 Saturn aurora campaign. Our results show that the interplanetary magnetic field near Saturn from day 080 to day 176 of 2013 was consistent with a two-sector structure. The intensifications of aurora and radio emission on days 095, 112 and 140 of 2013 coincide with the IMF sector boundaries, indicating that the encounter of compressed solar wind is the main cause of the observed activities.

  11. Crustal stress and structure at Kīlauea Volcano inferred from seismic anisotropy

    USGS Publications Warehouse

    Johnson, Jessica H.; Swanson, Donald; Roman, Diana C.; Poland, Michael P.; Thelen, Weston A.

    2015-01-01

    Seismic anisotropy, measured through shear wave splitting (SWS) analysis, can be indicative of the state of stress in Earth's crust. Changes in SWS at Kīlauea Volcano, Hawai‘i, associated with the onset of summit eruptive activity in 2008 hint at the potential of the technique for tracking volcanic activity. To use SWS observations as a monitoring tool, however, it is important to understand the cause of seismic anisotropy at the volcano throughout the eruptive cycle. To address this need, we analyzed SWS results from across Kīlauea in combination with macroscopic surface structures (mapped fractures, faults, and fissures) and stress orientations inferred from fault plane solutions. Seismic anisotropy seems to be due to pervasive aligned structures in most regions of the volcano. The upper East and Southwest Rift Zones, however, show a bimodality in stress and SWS, suggesting a stress discontinuity with depth, perhaps related to magma conduits that trend obliquely to the dominant structure. Other areas in and around Kīlauea Caldera display principal stresses of similar magnitudes, indicating that small stress perturbations can rotate the maximum horizontal compressive stress direction by up to 90°. In these locations, static structures generally control SWS, but dynamic conditions due to magmatic activity can override the structural control. Monitoring of SWS may therefore provide important signs of impending volcanism.

  12. Inferring the mesoscale structure of layered, edge-valued, and time-varying networks.

    PubMed

    Peixoto, Tiago P

    2015-10-01

    Many network systems are composed of interdependent but distinct types of interactions, which cannot be fully understood in isolation. These different types of interactions are often represented as layers, attributes on the edges, or as a time dependence of the network structure. Although they are crucial for a more comprehensive scientific understanding, these representations offer substantial challenges. Namely, it is an open problem how to precisely characterize the large or mesoscale structure of network systems in relation to these additional aspects. Furthermore, the direct incorporation of these features invariably increases the effective dimension of the network description, and hence aggravates the problem of overfitting, i.e., the use of overly complex characterizations that mistake purely random fluctuations for actual structure. In this work, we propose a robust and principled method to tackle these problems, by constructing generative models of modular network structure, incorporating layered, attributed and time-varying properties, as well as a nonparametric Bayesian methodology to infer the parameters from data and select the most appropriate model according to statistical evidence. We show that the method is capable of revealing hidden structure in layered, edge-valued, and time-varying networks, and that the most appropriate level of granularity with respect to the additional dimensions can be reliably identified. We illustrate our approach on a variety of empirical systems, including a social network of physicians, the voting correlations of deputies in the Brazilian national congress, the global airport network, and a proximity network of high-school students. PMID:26565289
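    The overfitting problem can be seen with a much simpler model than the one the paper proposes. The sketch below is not Peixoto's nonparametric Bayesian method. It computes the maximized likelihood of a plain Bernoulli stochastic block model for a toy graph, and both the graph and the partitions are hypothetical. The likelihood never decreases as groups are refined, down to one node per group. This is why model selection by statistical evidence, rather than raw fit, is needed.

```python
import math
from itertools import combinations


def _term(k, n):
    """k * log(k / n), with the convention 0 * log 0 = 0."""
    return k * math.log(k / n) if k > 0 else 0.0


def sbm_log_likelihood(edges, partition):
    """Maximized log-likelihood of a Bernoulli SBM on an undirected simple graph.

    edges: iterable of (u, v) node pairs; partition: dict node -> integer group.
    Each group-pair edge probability is set to its ML estimate e_rs / n_rs.
    """
    sizes = {}
    for g in partition.values():
        sizes[g] = sizes.get(g, 0) + 1
    counts = {}
    for u, v in edges:
        key = tuple(sorted((partition[u], partition[v])))
        counts[key] = counts.get(key, 0) + 1
    groups = sorted(sizes)
    logl = 0.0
    for r, s in [(g, g) for g in groups] + list(combinations(groups, 2)):
        # Number of node pairs that could host an edge between groups r and s.
        n_rs = sizes[r] * (sizes[r] - 1) // 2 if r == s else sizes[r] * sizes[s]
        e_rs = counts.get((r, s), 0)
        logl += _term(e_rs, n_rs) + _term(n_rs - e_rs, n_rs)
    return logl


edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
one_group = sbm_log_likelihood(edges, {0: 0, 1: 0, 2: 0, 3: 0})
two_groups = sbm_log_likelihood(edges, {0: 0, 1: 0, 2: 0, 3: 1})
singletons = sbm_log_likelihood(edges, {0: 0, 1: 1, 2: 2, 3: 3})
print(one_group, two_groups, singletons)  # rises with each refinement of the partition
```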

  13. Inferring the mesoscale structure of layered, edge-valued, and time-varying networks

    NASA Astrophysics Data System (ADS)

    Peixoto, Tiago P.

    2015-10-01

    Many network systems are composed of interdependent but distinct types of interactions, which cannot be fully understood in isolation. These different types of interactions are often represented as layers, attributes on the edges, or as a time dependence of the network structure. Although they are crucial for a more comprehensive scientific understanding, these representations offer substantial challenges. Namely, it is an open problem how to precisely characterize the large or mesoscale structure of network systems in relation to these additional aspects. Furthermore, the direct incorporation of these features invariably increases the effective dimension of the network description, and hence aggravates the problem of overfitting, i.e., the use of overly complex characterizations that mistake purely random fluctuations for actual structure. In this work, we propose a robust and principled method to tackle these problems, by constructing generative models of modular network structure, incorporating layered, attributed and time-varying properties, as well as a nonparametric Bayesian methodology to infer the parameters from data and select the most appropriate model according to statistical evidence. We show that the method is capable of revealing hidden structure in layered, edge-valued, and time-varying networks, and that the most appropriate level of granularity with respect to the additional dimensions can be reliably identified. We illustrate our approach on a variety of empirical systems, including a social network of physicians, the voting correlations of deputies in the Brazilian national congress, the global airport network, and a proximity network of high-school students.

  14. Phylogeny of Oedogoniales, Chaetophorales and Chaetopeltidales (Chlorophyceae): inferences from sequence-structure analysis of ITS2

    PubMed Central

    Buchheim, Mark A.; Sutherland, Danica M.; Schleicher, Tina; Förster, Frank; Wolf, Matthias

    2012-01-01

    Background and Aims: The green algal class Chlorophyceae comprises five orders (Chlamydomonadales, Sphaeropleales, Chaetophorales, Chaetopeltidales and Oedogoniales). Attempts to resolve the relationships among these groups have met with limited success. Studies of single genes (18S rRNA, 26S rRNA, rbcL or atpB) have largely failed to unambiguously resolve the relative positions of Oedogoniales, Chaetophorales and Chaetopeltidales (the OCC taxa). In contrast, recent genomic analyses of plastid data from OCC exemplars provided a robust phylogenetic analysis that supports a monophyletic OCC alliance. Methods: An ITS2 data set was assembled to independently test the OCC hypothesis and to evaluate the performance of these data in assessing green algal phylogeny at the ordinal or class level. Sequence-structure analysis designed for use with ITS2 data was employed for phylogenetic reconstruction. Key Results: This study yielded trees that were, in general, topologically congruent with the results of the genomic analyses, including support for the monophyly of the OCC alliance. Conclusions: Not all nodes from the ITS2 analyses exhibited robust support, but our investigation demonstrates that sequence-structure analyses of ITS2 provide a taxon-rich means of testing phylogenetic hypotheses at high taxonomic levels. Thus, ITS2 data, analyzed in a sequence-structure framework, provide an economical supplement or alternative to the single-marker approaches used in green algal phylogeny. PMID:22028463
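    Sequence-structure analysis of ITS2 treats each nucleotide together with its secondary-structure state. The minimal sketch below shows one way to build that combined encoding from a dot-bracket structure string. The tokenization, the canonical-pair check and the helper names are illustrative assumptions, not the software the authors used. Pairing 4 nucleotides with 3 structural states gives a 12-letter alphabet.

```python
from typing import List, Tuple

CANONICAL_PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}


def encode_sequence_structure(
    seq: str, dot_bracket: str
) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Merge each nucleotide with its structural state and return the base pairs.

    Tokens are nucleotide + state ('.', '(' or ')'), i.e. a 4 x 3 = 12-letter alphabet.
    Raises ValueError on length mismatch, unknown symbols or unbalanced brackets.
    """
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure lengths differ")
    tokens, pairs, stack = [], [], []
    for i, (nt, sym) in enumerate(zip(seq.upper().replace("T", "U"), dot_bracket)):
        if nt not in "ACGU" or sym not in ".()":
            raise ValueError(f"unexpected symbol at position {i}")
        if sym == "(":
            stack.append(i)
        elif sym == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        tokens.append(nt + sym)
    if stack:
        raise ValueError("unmatched '('")
    return tokens, sorted(pairs)


def non_canonical_pairs(seq: str, pairs: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Base pairs that are neither Watson-Crick nor G-U wobble pairs."""
    s = seq.upper().replace("T", "U")
    return [(i, j) for i, j in pairs if s[i] + s[j] not in CANONICAL_PAIRS]


tokens, pairs = encode_sequence_structure("GGGAAACCC", "(((...)))")
print(tokens, pairs)
```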

  15. Genomic Alteration in Head and Neck Squamous Cell Carcinoma (HNSCC) Cell Lines Inferred from Karyotyping, Molecular Cytogenetics, and Array Comparative Genomic Hybridization.

    PubMed

    Singchat, Worapong; Hitakomate, Ekarat; Rerkarmnuaychoke, Budsaba; Suntronpong, Aorarat; Fu, Beiyuan; Bodhisuwan, Winai; Peyachoknagul, Surin; Yang, Fengtang; Koontongkaew, Sittichai; Srikulnath, Kornsorn

    2016-01-01

    Genomic alteration in head and neck squamous cell carcinoma (HNSCC) was studied in two cell line pairs (HN30-HN31 and HN4-HN12) using conventional C-banding, multiplex fluorescence in situ hybridization (M-FISH), and array comparative genomic hybridization (array CGH). HN30 and HN4 were derived from primary lesions in the pharynx and base of tongue, respectively, and HN31 and HN12 were derived from lymph-node metastatic lesions of the same patients. Gains of chromosomes 1, 7, and 11 were shared by almost all cell lines. Hierarchical clustering revealed that HN31 was closely related to HN4, with which it shared eight chromosomal alterations. Large C-positive heterochromatin blocks were found in the centromeric region of chromosome 9 in HN31 and HN4, which suggests complex structural amplification of the repetitive sequence. Array CGH revealed amplification of 7p22.3p11.2, 8q11.23q12.1, and 14q32.33, regions containing genes involved in tumorigenesis and inflammation, in all cell lines. Amplification of the 2p21 (SIX3), 11p15.5 (H19), and 11q21q22.3 (MAML2, PGR, TRPC6, and MMP family) regions and deletion of the 9p23 (PTPRD) and 16q23.1 (WWOX) regions were identified in HN31 and HN12. Interestingly, partial loss of the PTPRD (9p23) and WWOX (16q23.1) genes was identified in HN31 and HN12; PTPRD expression tended to be down-regulated, and no WWOX expression was detected. This suggests that loss of PTPRD and WWOX might have played an important role in HNSCC progression, and that these genes could be considered as targets for cancer therapy or as biomarkers in molecular pathology. PMID:27501229
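    Array CGH gains and losses such as those listed above are read from the log2 ratio of test to reference signal at each probe. The minimal sketch below uses fixed ±0.3 thresholds and a naive run-merging step. The thresholds, probe positions and intensities are hypothetical assumptions, not the study's settings. Production pipelines usually rely on dedicated segmentation algorithms instead, with circular binary segmentation being one widely used example.

```python
import math


def call_copy_number(test, reference, gain=0.3, loss=-0.3):
    """Label probes 'gain', 'loss' or 'neutral' from log2(test / reference).

    The +/-0.3 thresholds are illustrative assumptions, not values from the study.
    """
    calls = []
    for t, r in zip(test, reference):
        ratio = math.log2(t / r)
        if ratio >= gain:
            calls.append("gain")
        elif ratio <= loss:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


def merge_segments(positions, calls):
    """Collapse runs of adjacent probes with the same non-neutral call into segments."""
    segments, previous = [], None
    for pos, call in zip(positions, calls):
        if call != "neutral":
            if call == previous:
                start, _, label = segments[-1]
                segments[-1] = (start, pos, label)
            else:
                segments.append((pos, pos, call))
        previous = call
    return segments


# One extra copy in a diploid sample gives log2(3/2) ~ 0.58; one lost copy gives log2(1/2) = -1.
calls = call_copy_number([3, 3, 2, 1, 1], [2, 2, 2, 2, 2])
print(merge_segments([100, 200, 300, 400, 500], calls))
```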

  16. Genomic Alteration in Head and Neck Squamous Cell Carcinoma (HNSCC) Cell Lines Inferred from Karyotyping, Molecular Cytogenetics, and Array Comparative Genomic Hybridization

    PubMed Central

    Rerkarmnuaychoke, Budsaba; Suntronpong, Aorarat; Fu, Beiyuan; Bodhisuwan, Winai; Peyachoknagul, Surin; Yang, Fengtang; Koontongkaew, Sittichai; Srikulnath, Kornsorn

    2016-01-01

    Genomic alteration in head and neck squamous cell carcinoma (HNSCC) was studied in two cell line pairs (HN30-HN31 and HN4-HN12) using conventional C-banding, multiplex fluorescence in situ hybridization (M-FISH), and array comparative genomic hybridization (array CGH). HN30 and HN4 were derived from primary lesions in the pharynx and base of tongue, respectively, and HN31 and HN12 were derived from lymph-node metastatic lesions of the same patients. Gains of chromosomes 1, 7, and 11 were shared by almost all cell lines. Hierarchical clustering revealed that HN31 was closely related to HN4, with which it shared eight chromosomal alterations. Large C-positive heterochromatin blocks were found in the centromeric region of chromosome 9 in HN31 and HN4, which suggests complex structural amplification of the repetitive sequence. Array CGH revealed amplification of 7p22.3p11.2, 8q11.23q12.1, and 14q32.33, regions containing genes involved in tumorigenesis and inflammation, in all cell lines. Amplification of the 2p21 (SIX3), 11p15.5 (H19), and 11q21q22.3 (MAML2, PGR, TRPC6, and MMP family) regions and deletion of the 9p23 (PTPRD) and 16q23.1 (WWOX) regions were identified in HN31 and HN12. Interestingly, partial loss of the PTPRD (9p23) and WWOX (16q23.1) genes was identified in HN31 and HN12; PTPRD expression tended to be down-regulated, and no WWOX expression was detected. This suggests that loss of PTPRD and WWOX might have played an important role in HNSCC progression, and that these genes could be considered as targets for cancer therapy or as biomarkers in molecular pathology. PMID:27501229

  17. On the inference of function from structure using biomechanical modelling and simulation of extinct organisms.

    PubMed

    Hutchinson, John R

    2012-02-23

    Biomechanical modelling and simulation techniques offer some hope for unravelling the complex inter-relationships of structure and function, perhaps even for extinct organisms, but they have limitations owing to this complexity and the many unknown parameters for fossil taxa. Validation and sensitivity analysis are two indispensable approaches for quantifying the accuracy and reliability of such models or simulations. But there are other subtleties in biomechanical modelling, including investigator judgements about the level of simplicity versus complexity in model design and about how uncertainty and subjectivity are dealt with. Furthermore, investigator attitudes toward models span a broad spectrum between extreme credulity and nihilism, influencing how modelling is conducted and perceived. Fundamentally, more data and more testing of methodology are required for the field to mature and build confidence in its inferences. PMID:21666064

  18. Structural Genomics and Drug Discovery for Infectious Diseases

    SciTech Connect

    Anderson, W.F.

    2010-09-03

    The application of structural genomics methods and approaches to proteins from organisms causing infectious diseases is making available the three-dimensional structures of many proteins that are potential drug targets and is laying the groundwork for structure-aided drug discovery efforts. A number of structural genomics projects focused on pathogens have been initiated worldwide. The Center for Structural Genomics of Infectious Diseases (CSGID) was recently established to apply state-of-the-art high-throughput structural biology technologies to the characterization of proteins from the National Institute of Allergy and Infectious Diseases (NIAID) category A-C pathogens and from organisms causing emerging or re-emerging infectious diseases. The target selection process emphasizes potential biomedical benefits. Selected proteins include known drug targets and their homologs, essential enzymes, virulence factors and vaccine candidates. The Center also provides a structure determination service for the infectious disease scientific community. The ultimate goal is to generate a library of structures that are available to the scientific community and can serve as a starting point for further research and structure-aided drug discovery for infectious diseases. To achieve this goal, the CSGID will determine the crystal structures of 400 proteins and protein-ligand complexes using proven, rapid, highly integrated, and cost-effective methods, primarily X-ray crystallography. High-throughput crystallographic structure determination is greatly aided by frequent, convenient access to high-performance beamlines at third-generation synchrotron X-ray sources.

  19. Life-history traits of the Miocene Hipparion concudense (Spain) inferred from bone histological structure.

    PubMed

    Martinez-Maza, Cayetana; Alberdi, Maria Teresa; Nieto-Diaz, Manuel; Prado, José Luis

    2014-01-01

    Histological analyses of fossil bones have provided clues to the growth patterns and life history traits of several extinct vertebrates that would be unavailable to classical morphological studies. We analyzed the bone histology of Hipparion to infer features of its life history traits and growth pattern. Microscope analysis of thin sections of a large sample of humeri, femora, tibiae and metapodials of Hipparion concudense from the upper Miocene site of Los Valles de Fuentidueña (Segovia, Spain) has shown that the number of growth marks is similar among the different limb bones, suggesting that equivalent skeletochronological inferences for this Hipparion population might be achieved by means of any of the elements studied. Considering their abundance, we conducted a skeletochronological study based on the large sample of third metapodials from Los Valles de Fuentidueña together with another large sample from the Upper Miocene locality of Concud (Teruel, Spain). The data obtained enabled us to distinguish four age groups in both samples and to determine that Hipparion concudense tended to reach skeletal maturity during its third year of life. Integration of bone microstructure and skeletochronological data allowed us to identify ontogenetic changes in bone structure and growth rate and to distinguish three histologic ontogenetic stages corresponding to immature, subadult and adult individuals. Data on secondary osteon density revealed an increase in bone remodeling throughout the ontogenetic stages and a lesser degree thereof in the Concud population, which indicates different biomechanical stresses in the two populations, likely due to environmental differences. Several individuals showed atypical growth patterns in the Concud sample, which may also reflect environmental differences between the two localities. Finally, classification of the specimens' age within groups enabled us to characterize the age structure of both samples, which is typical of

  20. Life-History Traits of the Miocene Hipparion concudense (Spain) Inferred from Bone Histological Structure

    PubMed Central

    Martinez-Maza, Cayetana; Alberdi, Maria Teresa; Nieto-Diaz, Manuel; Prado, José Luis

    2014-01-01

    Histological analyses of fossil bones have provided clues to the growth patterns and life history traits of several extinct vertebrates that would be unavailable to classical morphological studies. We analyzed the bone histology of Hipparion to infer features of its life history traits and growth pattern. Microscope analysis of thin sections of a large sample of humeri, femora, tibiae and metapodials of Hipparion concudense from the upper Miocene site of Los Valles de Fuentidueña (Segovia, Spain) has shown that the number of growth marks is similar among the different limb bones, suggesting that equivalent skeletochronological inferences for this Hipparion population might be achieved by means of any of the elements studied. Considering their abundance, we conducted a skeletochronological study based on the large sample of third metapodials from Los Valles de Fuentidueña together with another large sample from the Upper Miocene locality of Concud (Teruel, Spain). The data obtained enabled us to distinguish four age groups in both samples and to determine that Hipparion concudense tended to reach skeletal maturity during its third year of life. Integration of bone microstructure and skeletochronological data allowed us to identify ontogenetic changes in bone structure and growth rate and to distinguish three histologic ontogenetic stages corresponding to immature, subadult and adult individuals. Data on secondary osteon density revealed an increase in bone remodeling throughout the ontogenetic stages and a lesser degree thereof in the Concud population, which indicates different biomechanical stresses in the two populations, likely due to environmental differences. Several individuals showed atypical growth patterns in the Concud sample, which may also reflect environmental differences between the two localities. Finally, classification of the specimens’ age within groups enabled us to characterize the age structure of both samples, which is typical of

  1. Genome Pool Strategy for Structural Coverage of Protein Families

    SciTech Connect

    Jaroszewski, L.; Slabinski, L.; Wooley, J.; Deacon, A.M.; Lesley, S.A.; Wilson, I.A.; Godzik, A.

    2009-05-18

    Even closely homologous proteins often have different crystallization properties and propensities. This observation can be used to introduce an additional dimension into crystallization trials by simultaneously targeting multiple homologs in what we call a 'genome pool' strategy. We show that this strategy works because the protein physicochemical properties correlated with crystallization success have a surprisingly broad distribution within most protein families. There are also easy and difficult families, in which this distribution is tilted in one direction. This leads to uneven structural coverage of protein families, with more of the easy ones solved. Increasing the size of the genome pool can improve the chances of solving the difficult ones. In contrast, our analysis does not indicate that any specific genomes are easy or difficult. Finally, we show that the set of proteins with known 3D structures is systematically different from the general pool of known proteins, and we assess the structural consequences of these differences.
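    Elementary probability makes the argument concrete: a larger genome pool improves the odds for difficult families. The sketch assumes that each homolog succeeds independently with a known probability, a simplification the abstract does not make. The per-homolog probabilities and the target are hypothetical.

```python
import math


def pool_success_probability(p_per_homolog):
    """Probability that at least one homolog yields a structure, assuming independence."""
    all_fail = 1.0
    for p in p_per_homolog:
        all_fail *= 1.0 - p
    return 1.0 - all_fail


def homologs_needed(p, target):
    """Smallest pool size n with 1 - (1 - p)**n >= target, for identical p per homolog."""
    if not (0.0 < p < 1.0 and 0.0 < target < 1.0):
        raise ValueError("p and target must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


# A "difficult" family with a hypothetical 10% success chance per homolog.
for n in (1, 5, 10):
    print(n, round(pool_success_probability([0.1] * n), 3))
print(homologs_needed(0.1, 0.9))
```

    Under these assumptions, moving from one to ten homologs raises the chance of at least one success from 10% to about 65%.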

  2. Phylogenetic inference and SSR characterization of tropical woody bamboos tribe Bambuseae (Poaceae: Bambusoideae) based on complete plastid genome sequences.

    PubMed

    Vieira, Leila do Nascimento; Dos Anjos, Karina Goulart; Faoro, Helisson; Fraga, Hugo Pacheco de Freitas; Greco, Thiago Machado; Pedrosa, Fábio de Oliveira; de Souza, Emanuel Maltempi; Rogalski, Marcelo; de Souza, Robson Francisco; Guerra, Miguel Pedro

    2016-05-01

    Complete plastome sequencing is an efficient option for improving phylogenetic resolution and supporting evolutionary studies, and it may greatly facilitate the use of plastid DNA markers in plant population genetic studies. Merostachys and Guadua stand out as the most common bamboos native to Brazil and those with the highest potential for utilization. Here, we sequenced the complete plastomes of the Brazilian Guadua chacoensis and Merostachys sp. to perform a full plastome phylogeny and to characterize the occurrence, type, and distribution of SSRs in 20 Bambuseae species. The plastomes of Merostachys sp. and G. chacoensis are 136,334 and 135,403 bp in size, respectively, with identical gene content and the typical quadripartite structure consisting of a pair of inverted repeats (IRs) separated by the large and small single-copy (LSC and SSC) regions. The Maximum Likelihood and Bayesian Inference analyses produced phylogenomic trees with identical topologies. These trees supported the monophyly of the Paleotropical and Neotropical bamboo clades. The Neotropical bamboos segregated into three well-supported lineages, Chusqueinae, Guaduinae, and Arthrostylidiinae, with the last two forming a well-supported sister relationship. Paleotropical bamboos segregated into two well-supported lineages, Hickeliinae and Bambusinae + Melocanninae. We identified an average of 141.8 cpSSRs per Bambuseae plastome and a lower value (38.15) for plastome coding sequences. Among them, we identified 16 polymorphic SSR loci, with the number of alleles varying from 3 to 10. These 16 polymorphic cpSSR loci in the Bambuseae plastome can be used to assess intraspecific polymorphism, enabling highly sensitive phylogeographic and population genetics studies of this tribe. PMID:26643654
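    A cpSSR survey amounts to scanning plastome sequences for short tandem motifs that exceed a minimum repeat number. The minimal regex-based sketch below detects only perfect repeats. Its per-motif-length thresholds are assumptions for illustration, and the tools and exact criteria used in the study may differ. Motifs that are themselves repeats of a shorter unit (for example "ATAT") are skipped, so each SSR is reported once at its shortest period.

```python
import re

# Minimum repeat counts per motif length (mono- to hexanucleotide).
# These thresholds are an assumption for illustration, not the study's settings.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(seq: str):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # Group 2 captures one motif copy; \2 requires it to repeat min_rep - 1 more times.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(2)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(1)) // k))
    return sorted(hits)


print(find_ssrs("GC" + "A" * 12 + "GC" + "AT" * 6 + "GGC"))
```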

  3. The genome and structural proteome of an ocean siphovirus: a new window into the cyanobacterial ‘mobilome’

    PubMed Central

    Sullivan, Matthew B; Krastins, Bryan; Hughes, Jennifer L; Kelly, Libusha; Chase, Michael; Sarracino, David; Chisholm, Sallie W

    2009-01-01

    Prochlorococcus, an abundant phototroph in the oceans, is infected by members of three families of viruses: myo-, podo- and siphoviruses. Genomes of myo- and podoviruses isolated on Prochlorococcus contain DNA replication machinery and virion structural genes homologous to those of coliphages T4 and T7, respectively. They also contain a suite of genes of cyanobacterial origin, most notably photosynthesis genes, which are expressed during infection and appear integral to the evolutionary trajectory of both host and phage. Here we present the first genome of a cyanobacterial siphovirus, P-SS2, which was isolated from Atlantic slope waters using a Prochlorococcus host (MIT9313). The P-SS2 genome is larger than, and considerably divergent from, previously sequenced siphoviruses. It appears most closely related to lambdoid siphoviruses, with which it shares 13 functional homologues. The ∼108 kb P-SS2 genome encodes 131 predicted proteins and notably lacks the photosynthesis genes that have consistently been found in other marine cyanophages, but it does contain 14 other cyanobacterial homologues. While only six structural proteins were identified from the genome sequence, 35 proteins were detected experimentally; these mapped onto capsid and tail structural modules in the genome. P-SS2 is potentially capable of integrating into its host, as inferred from the bioinformatically identified genetic machinery (int, bet, exo) and a 53 bp attachment site. The host attachment site appears to be a genomic island that is tied to insertion sequence (IS) activity that could facilitate mobility of a gene involved in the nitrogen-stress response. The homologous region and a secondary IS-element hot-spot in Synechococcus RS9917 are further evidence of IS-mediated genome evolution coincident with a probable relic prophage integration event. This siphovirus genome provides a glimpse into the biology of a deep-photic zone phage as well as the ocean cyanobacterial prophage and IS element

  4. The Impact of Structural Genomics: Expectations and Outcomes

    SciTech Connect

    Chandonia, John-Marc; Brenner, Steven E.

    2005-12-21

    Structural Genomics (SG) projects aim to expand our structural knowledge of biological macromolecules while lowering the average costs of structure determination. We quantitatively analyzed the novelty, cost, and impact of structures solved by SG centers, and contrasted these results with traditional structural biology. The first structure from a protein family is particularly important because it reveals the fold and ancient relationships to other proteins. In the last year, approximately half of such structures were solved at an SG center rather than in a traditional laboratory. Furthermore, the cost of solving a structure at the most efficient U.S. center has now dropped to one-quarter of the estimated cost of solving a structure by traditional methods. However, top structural biology laboratories are much more efficient than the average, and comparable to SG centers despite working on very challenging structures. Moreover, traditional structural biology papers are cited significantly more often, suggesting greater current impact.

  5. Evaluating the Influence of the Microsatellite Marker Set on the Genetic Structure Inferred in Pyrus communis L.

    PubMed Central

    Urrestarazu, Jorge; Royo, José B.; Santesteban, Luis G.; Miranda, Carlos

    2015-01-01

    Fingerprinting information can be used to robustly elucidate the genetic structure of germplasm collections, allowing a more rational and finer assessment of genetic resources. Bayesian model-based approaches are nowadays widely preferred for inferring genetic structure, but it remains largely unresolved how marker sets should be built to obtain a robust inference. The objective was to evaluate, in Pyrus germplasm collections, the influence of SSR marker set size on the inferred genetic structure, as well as the influence of the criterion used to select those markers. Inferences were performed with an increasing number of SSR markers, ranging from just two up to 25, incorporated one at a time into the analysis. The influence of the number of SSR markers was evaluated by comparing the number of populations and the strength of the signal detected, as well as the similarity of genotype assignments to populations between analyses. To test whether those results were influenced by the criterion used to select the SSRs, several selection scenarios based on the discrimination power or the fixation index values of the SSRs were tested. Our results indicate that population structure could be inferred accurately once a certain threshold number of SSRs was reached, which depended on the underlying structure within the genotypes, whereas the method used to select the markers included in each set did not appear to be very relevant. The minimum number of SSRs required to provide robust structure inferences and adequate measurements of differentiation, even when differentiation levels within populations are low, proved similar to the complete list of markers recommended for fingerprinting. When an SSR set size similar to the minimum marker sets recommended for fingerprinting is used, only major divisions or moderate (FST > 0.05) differentiation of the germplasm are detected. PMID:26382618
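    The FST > 0.05 threshold refers to the fixation index, which compares genetic diversity within and among populations. As a simple stand-in, Nei's G_ST for a single locus can be computed from allele frequencies. The study relied on Bayesian clustering and its own differentiation measures, which are not reproduced here. The allele names and frequencies below are hypothetical.

```python
def nei_gst(pop_allele_freqs):
    """Nei's G_ST for one locus, G_ST = (H_T - H_S) / H_T, with populations weighted equally.

    pop_allele_freqs: list of dicts mapping allele -> frequency within one population.
    """
    k = len(pop_allele_freqs)
    alleles = set().union(*pop_allele_freqs)
    # H_S: mean expected heterozygosity within populations.
    h_s = sum(1.0 - sum(f * f for f in pop.values()) for pop in pop_allele_freqs) / k
    # H_T: expected heterozygosity of the pooled (averaged) allele frequencies.
    pooled = [sum(pop.get(a, 0.0) for pop in pop_allele_freqs) / k for a in alleles]
    h_t = 1.0 - sum(f * f for f in pooled)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


gst = nei_gst([{"a": 0.7, "b": 0.3}, {"a": 0.5, "b": 0.5}])
print(round(gst, 4), "moderate" if gst > 0.05 else "low")
```

    Here H_S = 0.46 and H_T = 0.48, so G_ST is about 0.042. Under the abstract's criterion, that level of differentiation would fall below what a minimal marker set detects.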

  6. Brain imaging and cognitive neuroscience. Toward strong inference in attributing function to structure.

    PubMed

    Sarter, M; Berntson, G G; Cacioppo, J T

    1996-01-01

    Cognitive neuroscience has emerged from the neurosciences and cognitive psychology as a scientific discipline that aims at the determination of "how brain function gives rise to mental activity" (S. M. Kosslyn & L. M. Shin, 1992, p. 146). While research in cognitive neuroscience combines many levels of neuroscientific and psychological analyses, modern imaging techniques that monitor brain activity during behavioral or cognitive operations have significantly contributed to the emergence of this discipline. The conclusions deduced from these studies are inherently localizationistic in nature; in other words, they describe cognitive functions as being localized in focal brain regions (brain activity in a defined brain region, phi, is involved in specific cognitive function, psi). A broad discussion about the virtues and limitations of such conclusions may help avoid the emergence of a mentalistic localizationism (i.e., the attribution of mentalistic concepts such as happiness, morality, or consciousness to brain structure) and illustrates the importance of a convergence with information generated by different research strategies (such as, for example, evidence generated by studies in which the effects of experimental manipulations of local neuronal processes on cognitive functions are assessed). Progress in capitalizing on brain-imaging studies to investigate questions of the form "brain structure or event phi is associated with cognitive function psi" may be impeded because of the way in which inferences are typically formulated in the brain imaging literature. A conceptual framework to advance the interpretation of data describing the relationships between cognitive phenomena and brain structure activity is provided. PMID:8585670

  7. Benefits of Structural Genomics for Drug Discovery Research

    SciTech Connect

    Grabowski, M.; Chruszcz, M; Zimmerman, M; Kirillova, O; Minor, W

    2009-01-01

    While three-dimensional structures have long been used to search for new drug targets, only a fraction of new drugs coming to market have been developed using a structure-based drug discovery approach. However, recent years have brought not only an avalanche of new macromolecular structures but also significant advances in protein structure determination methodology that are only now making their way into structure-based drug discovery. In this paper, we review recent developments resulting from the Structural Genomics (SG) programs, focusing on the methods and results most likely to improve our understanding of the molecular foundation of human diseases. SG programs have been around for almost a decade and in that time have contributed a significant part of the structural coverage of both the genomes of pathogens causing infectious diseases and structurally uncharacterized biological processes in general. Perhaps most importantly, SG programs have developed new methodology at all steps of the structure determination process, not only to determine new structures highly efficiently but also to screen protein/ligand interactions. We describe the methodologies, experience and technologies developed by SG, which range from improved cloning protocols to improved procedures for crystallographic structure solution, and which may be applied in 'traditional' structural biology laboratories, particularly those performing drug discovery. We also discuss the conditions that must be met to convert the present high-throughput structure determination pipeline into a high-output structure-based drug discovery system.

  8. Coevolution of the Organization and Structure of Prokaryotic Genomes.

    PubMed

    Touchon, Marie; Rocha, Eduardo P C

    2016-01-01

    The cytoplasm of prokaryotes contains many molecular machines interacting directly with the chromosome. These vital interactions depend on the chromosome structure, as a molecule, and on the genome organization, as a unit of genetic information. Strong selection for the organization of the genetic elements implicated in these interactions drives replicon ploidy, gene distribution, operon conservation, and the formation of replication-associated traits. The genomes of prokaryotes are also very plastic with high rates of horizontal gene transfer and gene loss. The evolutionary conflicts between plasticity and organization lead to the formation of regions with high genetic diversity whose impact on chromosome structure is poorly understood. Prokaryotic genomes are remarkable documents of natural history because they carry the imprint of all of these selective and mutational forces. Their study allows a better understanding of molecular mechanisms, their impact on microbial evolution, and how they can be tinkered in synthetic biology. PMID:26729648

  9. Inferring population structure and relationship using minimal independent evolutionary markers in Y-chromosome: a hybrid approach of recursive feature selection for hierarchical clustering

    PubMed Central

    Srivastava, Amit Kumar; Chopra, Rupali; Ali, Shafat; Aggarwal, Shweta; Vig, Lovekesh; Koul Bamezai, Rameshwar Nath

    2014-01-01

    The flood of evolutionary markers generated by the Human Genome Project and the 1000 Genomes Consortium has necessitated pruning of redundant and dependent variables. Various computational tools based on machine-learning and data-mining methods, such as feature selection/extraction, have been proposed to escape the curse of dimensionality in large datasets. However, evolutionary studies, which are primarily based on sequentially evolved variations, have so far not benefited from such advances. Here, we present a novel approach of recursive feature selection for hierarchical clustering of Y-chromosomal SNPs/haplogroups to select a minimal set of independent markers sufficient to infer population structure as precisely as a larger number of evolutionary markers. To validate the applicability of our approach, we designed an optimal MALDI-TOF mass spectrometry-based multiplex that accommodates the independent Y-chromosomal markers in a single assay and genotyped two geographically distinct Indian populations. An analysis of 105 worldwide populations showed that 15 independent variations/markers were optimal for defining population structure parameters, such as FST, molecular variance and correlation-based relationships. Subsequent addition of randomly selected markers had a negligible effect (close to zero, i.e. 1 × 10−3) on these parameters. The approach proved efficient in tracing complex population structures and deriving relationships among worldwide populations in a cost-effective and expedient manner. PMID:25030906

  10. Functional characterization of somatic mutations in cancer using network-based inference of protein activity | Office of Cancer Genomics

    Cancer.gov

    Identifying the multiple dysregulated oncoproteins that contribute to tumorigenesis in a given patient is crucial for developing personalized treatment plans. However, accurate inference of aberrant protein activity in biological samples is still challenging as genetic alterations are only partially predictive and direct measurements of protein activity are generally not feasible.

  11. Arthropod Phylogenetics in Light of Three Novel Millipede (Myriapoda: Diplopoda) Mitochondrial Genomes with Comments on the Appropriateness of Mitochondrial Genome Sequence Data for Inferring Deep Level Relationships

    PubMed Central

    Brewer, Michael S.; Swafford, Lynn; Spruill, Chad L.; Bond, Jason E.

    2013-01-01

    Background Arthropods are the most diverse group of eukaryotic organisms, but their phylogenetic relationships are poorly understood. Herein, we describe three mitochondrial genomes representing orders of millipedes for which complete genomes had not been characterized. Newly sequenced genomes are combined with existing data to characterize the protein coding regions of myriapods and to attempt to reconstruct the evolutionary relationships within the Myriapoda and Arthropoda. Results The newly sequenced genomes are similar to previously characterized millipede sequences in terms of synteny and length. Unique translocations occurred within the newly sequenced taxa, including one half of the Appalachioria falcifera genome, which is inverted with respect to other millipede genomes. Across myriapods, amino acid conservation levels are highly dependent on the gene region. Additionally, individual loci varied in the level of amino acid conservation. Overall, most gene regions showed low levels of conservation at many sites. Attempts to reconstruct the evolutionary relationships suffered from questionable relationships and low support values. Analyses of phylogenetic informativeness show the lack of signal deep in the trees (i.e., genes evolve too quickly). As a result, the myriapod tree resembles previously published results but lacks convincing support, and, within the arthropod tree, well established groups were recovered as polyphyletic. Conclusions The novel genome sequences described herein provide useful genomic information concerning millipede groups that had not been investigated. Taken together with existing sequences, the variety of compositions and evolution of myriapod mitochondrial genomes are shown to be more complex than previously thought. Unfortunately, the use of mitochondrial protein-coding regions in deep arthropod phylogenetics appears problematic, a result consistent with previously published studies. Lack of phylogenetic signal renders the

  12. Inferring Meaning from Syntactic Structures in Acquisition: The Case of Transitivity and Telicity

    PubMed Central

    Wagner, Laura

    2009-01-01

    This paper investigated children’s ability to use syntactic structures to infer semantic information. The particular syntax-semantics link examined was the one between transitivity (transitive/intransitive structures) and telicity (telic/atelic perspectives; that is, boundedness). Although transitivity is an important syntactic reflex of telicity, it is neither necessary nor sufficient for predicting a telicity value; it is therefore a weak cue for telicity semantics. Nevertheless, children do make use of it. Experiment 1 used a match-to-sample task and found that 3-year-old children could use transitivity information to guide their interpretations of telicity. Experiment 2 used a preferential looking task with 2-year-old children and similarly found that these children could successfully use transitivity as a cue to telicity. Children in both experiments succeeded with both causal and directed-motion events, suggesting that telicity judgments are not tied to any one event type. These results are discussed in the context of other semantic elements that children can link to transitivity, and taken together, are argued to support a largely inferential link between transitivity and telicity. PMID:21399703

  13. Symbolic extensions applied to multiscale structure of genomes.

    PubMed

    Downarowicz, Tomasz; Travisany, Dante; Montecino, Martin; Maass, Alejandro

    2014-06-01

    A genome of a living organism consists of a long string of symbols over a finite alphabet carrying critical information for the organism. This includes its ability to control postnatal growth, homeostasis, adaptation to changes in the surrounding environment, or to biochemically respond at the cellular level to various specific regulatory signals. In this sense, a genome represents a symbolic encoding of a highly organized system of information whose functioning may be revealed as a natural multilayer structure in terms of complexity and prominence. In this paper we use the mathematical theory of symbolic extensions as a framework to shed light on how this multilayer organization is reflected in the symbolic coding of the genome. The distribution of data in an element of a standard symbolic extension of a dynamical system has a specific form: the symbolic sequence is divided into several subsequences (which we call layers) encoding the dynamics on various "scales". We propose that a similar structure resides within the genomes, building our analogy on some of the most recent findings in the field of regulation of genomic DNA functioning. PMID:24728912

  14. Effects of Sample Selection Bias on the Accuracy of Population Structure and Ancestry Inference

    PubMed Central

    Shringarpure, Suyash; Xing, Eric P.

    2014-01-01

    Population stratification is an important task in genetic analyses. It provides information about the ancestry of individuals and can be an important confounder in genome-wide association studies. Public genotyping projects have made a large number of datasets available for study. However, practical constraints dictate that of a geographical/ethnic population, only a small number of individuals are genotyped. The resulting data are a sample from the entire population. If the distribution of sample sizes is not representative of the populations being sampled, the accuracy of population stratification analyses of the data could be affected. We attempt to understand the effect of biased sampling on the accuracy of population structure analysis and individual ancestry recovery. We examined two commonly used methods for analyses of such datasets, ADMIXTURE and EIGENSOFT, and found that the accuracy of recovery of population structure is affected to a large extent by the sample used for analysis and how representative it is of the underlying populations. Using simulated data and real genotype data from cattle, we show that sample selection bias can affect the results of population structure analyses. We develop a mathematical framework for sample selection bias in models for population structure and propose a correction for sample selection bias using auxiliary information about the sample. We demonstrate that such a correction is effective in practice using simulated and real data. PMID:24637351

  15. Clumpak: a program for identifying clustering modes and packaging population structure inferences across K

    PubMed Central

    Kopelman, Naama M; Mayzel, Jonathan; Jakobsson, Mattias; Rosenberg, Noah A; Mayrose, Itay

    2015-01-01

    The identification of the genetic structure of populations from multilocus genotype data has become a central component of modern population-genetic data analysis. Application of model-based clustering programs often entails a number of steps, in which the user considers different modeling assumptions, compares results across different pre-determined values of the number of assumed clusters (a parameter typically denoted K), examines multiple independent runs for each fixed value of K, and distinguishes among runs belonging to substantially distinct clustering solutions. Here, we present Clumpak (Cluster Markov Packager Across K), a method that automates the post-processing of results of model-based population structure analyses. For analyzing multiple independent runs at a single K value, Clumpak identifies sets of highly similar runs, separating distinct groups of runs that represent distinct modes in the space of possible solutions. This procedure, which generates a consensus solution for each distinct mode, is performed by the use of a Markov clustering algorithm that relies on a similarity matrix between replicate runs, as computed by the software Clumpp. Next, Clumpak identifies an optimal alignment of inferred clusters across different values of K, extending a similar approach implemented for a fixed K in Clumpp, and simplifying the comparison of clustering results across different K values. Clumpak incorporates additional features, such as implementations of methods for choosing K and comparing solutions obtained by different programs, models, or data subsets. Clumpak, available at http://clumpak.tau.ac.il, simplifies the use of model-based analyses of population structure in population genetics and molecular ecology. PMID:25684545
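Clumpak groups replicate runs with a Markov clustering algorithm applied to a run-by-run similarity matrix computed by Clumpp. The following is a minimal, generic Markov clustering (MCL) sketch, not Clumpak's code: the expansion and inflation defaults, the convergence tolerance, the attractor threshold and the example similarity values are all assumptions, and the Clumpp similarity computation itself is not reproduced.

```python
import numpy as np


def markov_cluster(similarity, expansion=2, inflation=2.0, max_iter=100, tol=1e-9):
    """Generic Markov clustering (MCL) of a symmetric similarity matrix.

    Returns a list of clusters, each a frozenset of row/column indices.
    """
    m = np.array(similarity, dtype=float)
    np.fill_diagonal(m, 1.0)                  # self-loops: each run is fully similar to itself
    m /= m.sum(axis=0, keepdims=True)         # make columns stochastic
    for _ in range(max_iter):
        previous = m
        m = np.linalg.matrix_power(m, expansion)  # expansion: spread flow along random walks
        m = m ** inflation                        # inflation: strengthen strong flows
        m /= m.sum(axis=0, keepdims=True)         # renormalize columns
        if np.allclose(m, previous, atol=tol):
            break
    clusters = []
    for row in m:                             # rows that keep mass are attractors
        members = frozenset(np.flatnonzero(row > 1e-6).tolist())
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Hypothetical similarities for four replicate runs at one K:
# runs 0-1 and runs 2-3 form two distinct modes.
sim = [[1.00, 0.95, 0.05, 0.05],
       [0.95, 1.00, 0.05, 0.05],
       [0.05, 0.05, 1.00, 0.95],
       [0.05, 0.05, 0.95, 1.00]]
print(markov_cluster(sim))
```

In MCL, raising the inflation parameter generally splits the runs into more, smaller groups; lowering it merges them.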

  16. Likelihood-free inference of population structure and local adaptation in a Bayesian hierarchical model.

    PubMed

    Bazin, Eric; Dawson, Kevin J; Beaumont, Mark A

    2010-06-01

    We address the problem of finding evidence of natural selection from genetic data, accounting for the confounding effects of demographic history. In the absence of natural selection, gene genealogies should all be sampled from the same underlying distribution, often approximated by a coalescent model. Selection at a particular locus will lead to a modified genealogy, and this motivates a number of recent approaches for detecting the effects of natural selection in the genome as "outliers" under some models. The demographic history of a population affects the sampling distribution of genealogies, and therefore the observed genotypes and the classification of outliers. Since we cannot see genealogies directly, we have to infer them from the observed data under some model of mutation and demography. Thus the accuracy of an outlier-based approach depends to a greater or a lesser extent on the uncertainty about the demographic and mutational model. A natural modeling framework for this type of problem is provided by Bayesian hierarchical models, in which parameters, such as mutation rates and selection coefficients, are allowed to vary across loci. It has proved quite difficult computationally to implement fully probabilistic genealogical models with complex demographies, and this has motivated the development of approximations such as approximate Bayesian computation (ABC). In ABC the data are compressed into summary statistics, and computation of the likelihood function is replaced by simulation of data under the model. In a hierarchical setting one may be interested both in hyperparameters and parameters, and there may be very many of the latter; for example, in a genetic model, these may be parameters describing each of many loci or populations. This poses a problem for ABC in that one then requires summary statistics for each locus, which, if used naively, leads to a consequent difficulty in conditional density estimation. We develop a general method for applying
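The core ABC idea described in this abstract, replacing likelihood evaluation with simulation and a comparison of summary statistics, can be sketched as plain rejection sampling. This toy version estimates a single parameter of a hypothetical normal model and does not address the hierarchical, many-loci setting that motivates the paper; the model, the prior, the tolerance epsilon and the use of the sample mean as the summary statistic are illustrative assumptions.

```python
import random
import statistics


def rejection_abc(observed, simulate, summary, sample_prior, n_draws, epsilon, rng):
    """Plain rejection ABC: keep prior draws whose simulated summary lies within epsilon of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior(rng)                        # draw a candidate parameter from the prior
        simulated = simulate(theta, len(observed), rng)  # simulate data instead of evaluating a likelihood
        if abs(summary(simulated) - s_obs) <= epsilon:   # compare summary statistics only
            accepted.append(theta)
    return accepted


# Toy example (hypothetical model): infer the mean of normal data with known sd = 1.
rng = random.Random(1)
data = [rng.gauss(2.0, 1.0) for _ in range(100)]
posterior = rejection_abc(
    observed=data,
    simulate=lambda mu, n, r: [r.gauss(mu, 1.0) for _ in range(n)],
    summary=statistics.fmean,
    sample_prior=lambda r: r.uniform(-5.0, 5.0),
    n_draws=10_000,
    epsilon=0.1,
    rng=rng,
)
print(len(posterior), statistics.fmean(posterior))
```

The accepted values approximate the posterior; shrinking epsilon improves the approximation but accepts fewer draws.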

  17. Genomic Structure and Evolution of Multigene Families: “Flowers” on the Human Genome

    PubMed Central

    Kim, Hie Lim; Iwase, Mineyo; Igawa, Takeshi; Nishioka, Tasuku; Kaneko, Satoko; Katsura, Yukako; Takahata, Naoyuki; Satta, Yoko

    2012-01-01

    We report the results of an extensive investigation of genomic structures in the human genome, with a particular focus on relatively large repeats (>50 kb) in adjacent chromosomal regions. We named such structures “Flowers” because the pattern observed on dot plots resembles a flower. We detected a total of 291 Flowers in the human genome. They were predominantly located in euchromatic regions. Flowers are gene-rich compared to the average gene density of the genome. Genes involved in systems receiving environmental information, such as immunity and detoxification, were overrepresented in Flowers. Within a Flower, the mean number of duplication units was approximately four. The maximum and minimum identities between homologs in a Flower showed different distributions; the maximum identity was often concentrated at 100% identity, while the minimum identity was evenly distributed in the range of 78% to 100%. Using a gene conversion detection test, we found frequent and/or recent gene conversion events within the tested Flowers. Interestingly, many of those converted regions contained protein-coding genes. Computer simulation studies suggest that one role of such frequent gene conversions is the elongation of the life span of gene families in a Flower by the resurrection of pseudogenes. PMID:22779033
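Flowers were recognized from the patterns they produce on dot plots. A minimal self-comparison dot plot based on exact k-mer matches shows how a duplicated segment appears as a line parallel to the main diagonal. The word length k and the toy sequence are assumptions; detecting repeats larger than 50 kb in real chromosomes would require alignment-based tools rather than this naive sketch.

```python
from collections import defaultdict


def self_dot_plot(seq: str, k: int) -> set:
    """Return (i, j) pairs where the k-mers starting at positions i and j are identical."""
    starts = defaultdict(list)
    for i in range(len(seq) - k + 1):
        starts[seq[i:i + k]].append(i)
    hits = set()
    for positions in starts.values():
        for i in positions:
            for j in positions:
                hits.add((i, j))
    return hits


def render(seq: str, hits: set) -> str:
    """Text dot plot: '*' marks a k-mer match, '.' marks no match."""
    n = len(seq)
    return "\n".join(
        "".join("*" if (i, j) in hits else "." for j in range(n)) for i in range(n)
    )


# Hypothetical toy sequence: a 5-bp unit duplicated in tandem.
seq = "ACGTTACGTT"
print(render(seq, self_dot_plot(seq, k=3)))
```

The stars five positions away from the main diagonal mark the duplicated unit; several nearby duplicates produce the petal-like pattern that gave Flowers their name.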

  18. Genomic structure and evolution of multigene families: "flowers" on the human genome.

    PubMed

    Kim, Hie Lim; Iwase, Mineyo; Igawa, Takeshi; Nishioka, Tasuku; Kaneko, Satoko; Katsura, Yukako; Takahata, Naoyuki; Satta, Yoko

    2012-01-01

    We report the results of an extensive investigation of genomic structures in the human genome, with a particular focus on relatively large repeats (>50 kb) in adjacent chromosomal regions. We named such structures "Flowers" because the pattern observed on dot plots resembles a flower. We detected a total of 291 Flowers in the human genome. They were predominantly located in euchromatic regions. Flowers are gene-rich compared to the average gene density of the genome. Genes involved in systems receiving environmental information, such as immunity and detoxification, were overrepresented in Flowers. Within a Flower, the mean number of duplication units was approximately four. The maximum and minimum identities between homologs in a Flower showed different distributions; the maximum identity was often concentrated at 100% identity, while the minimum identity was evenly distributed in the range of 78% to 100%. Using a gene conversion detection test, we found frequent and/or recent gene conversion events within the tested Flowers. Interestingly, many of those converted regions contained protein-coding genes. Computer simulation studies suggest that one role of such frequent gene conversions is the elongation of the life span of gene families in a Flower by the resurrection of pseudogenes. PMID:22779033

  19. Prokaryotic genes in eukaryotic genome sequences: when to infer horizontal gene transfer and when to suspect an actual microbe.

    PubMed

    Artamonova, Irena I; Lappi, Tanya; Zudina, Liudmila; Mushegian, Arcady R

    2015-07-01

    Assessment of phylogenetic positions of predicted gene and protein sequences is a routine step in any genome project, useful for validating the species' taxonomic position and for evaluating hypotheses about genome evolution and function. Several recent eukaryotic genome projects have reported multiple gene sequences that were much more similar to homologues in bacteria than to any eukaryotic sequence. In the spirit of the times, horizontal gene transfer from bacteria to eukaryotes has been invoked in some of these cases. Here, we show, using comparative sequence analysis, that some of those bacteria-like genes indeed appear likely to have been horizontally transferred from bacteria to eukaryotes. In other cases, however, the evidence strongly indicates that the eukaryotic DNA sequenced in the genome project contains a sample of non-integrated DNA from the actual bacteria, possibly providing a window into the host microbiome. Recent literature suggests also that common reagents, kits and laboratory equipment may be systematically contaminated with bacterial DNA, which appears to be sampled by metagenome projects non-specifically. We review several bioinformatic criteria that help to distinguish putative horizontal gene transfers from the admixture of genes from autonomously replicating bacteria in their hosts' genome databases or from the reagent contamination. PMID:25919787

  20. The new physician as unwitting quantum mechanic: is adapting Dirac's inference system best practice for personalized medicine, genomics, and proteomics?

    PubMed

    Robson, Barry

    2007-08-01

    What is the Best Practice for automated inference in Medical Decision Support for personalized medicine? A known system already exists as Dirac's inference system from quantum mechanics (QM) using bra-kets and bras where A and B are states, events, or measurements representing, say, clinical and biomedical rules. Dirac's system should theoretically be the universal best practice for all inference, though QM is notorious as sometimes leading to bizarre conclusions that appear not to be applicable to the macroscopic world of everyday human experience and medical practice. It is here argued that this apparent difficulty vanishes if QM is assigned one new multiplication function @, which conserves conditionality appropriately, making QM applicable to classical inference including a quantitative form of the predicate calculus. An alternative interpretation with the same consequences is if every i = √−1 in Dirac's QM is replaced by h, an entity distinct from 1 and i and arguably a hidden root of 1 such that h² = 1. With that exception, this paper is thus primarily a review of the application of Dirac's system, by application of linear algebra in the complex domain to help manipulate information about associations and ontology in complicated data. Any combined bra-ket can be shown to be composed only of the sum of QM-like bra and ket weights c(), times an exponential function of Fano's mutual information measure I(A; B) about the association between A and B, that is, an association rule from data mining. With the weights and Fano measure re-expressed as expectations on finite data using Riemann's Incomplete (i.e., Generalized) Zeta Functions, actual counts of observations for real-world sparse data can be readily utilized. Finally, the paper compares identical character, distinguishability of states, events, or measurements, correlation, mutual information, and orthogonal character, important issues in data mining
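Fano's mutual information for an association between A and B is I(A; B) = ln[P(A, B) / (P(A) P(B))], so exp(I(A; B)) is the ratio of observed to expected co-occurrence. The sketch below estimates it from raw counts with plain relative frequencies and the natural logarithm (a convention assumed here); it does not implement the paper's zeta-function expectations for sparse data, and the function name and example counts are hypothetical.

```python
import math


def fano_mutual_information(n_ab: int, n_a: int, n_b: int, n_total: int) -> float:
    """I(A; B) = ln[P(A,B) / (P(A) P(B))] estimated from counts.

    n_ab: records with both A and B; n_a, n_b: records with A, with B; n_total: all records.
    Plain relative frequencies are used, so zero counts are rejected.
    """
    if min(n_ab, n_a, n_b, n_total) <= 0:
        raise ValueError("all counts must be positive for the plain frequency estimate")
    return math.log(n_ab * n_total / (n_a * n_b))


# Hypothetical records: 1000 patients, 100 with finding A, 200 with B, 50 with both.
i_ab = fano_mutual_information(50, 100, 200, 1000)
print(i_ab, math.exp(i_ab))  # exp(I) is 2.5: A and B co-occur 2.5 times more than expected
```

A value of zero means A and B co-occur exactly as often as independence would predict; negative values indicate avoidance.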

  1. Geographic population structure analysis of worldwide human populations infers their biogeographical origins

    PubMed Central

    Elhaik, Eran; Tatarinova, Tatiana; Chebotarev, Dmitri; Piras, Ignazio S.; Maria Calò, Carla; De Montis, Antonella; Atzori, Manuela; Marini, Monica; Tofanelli, Sergio; Francalacci, Paolo; Pagani, Luca; Tyler-Smith, Chris; Xue, Yali; Cucca, Francesco; Schurr, Theodore G.; Gaieski, Jill B.; Melendez, Carlalynne; Vilar, Miguel G.; Owings, Amanda C.; Gómez, Rocío; Fujita, Ricardo; Santos, Fabrício R.; Comas, David; Balanovsky, Oleg; Balanovska, Elena; Zalloua, Pierre; Soodyall, Himla; Pitchappan, Ramasamy; GaneshPrasad, ArunKumar; Hammer, Michael; Matisoo-Smith, Lisa; Wells, R. Spencer; Acosta, Oscar; Adhikarla, Syama; Adler, Christina J.; Bertranpetit, Jaume; Clarke, Andrew C.; Cooper, Alan; Der Sarkissian, Clio S. I.; Haak, Wolfgang; Haber, Marc; Jin, Li; Kaplan, Matthew E.; Li, Hui; Li, Shilin; Martínez-Cruz, Begoña; Merchant, Nirav C.; Mitchell, John R.; Parida, Laxmi; Platt, Daniel E.; Quintana-Murci, Lluis; Renfrew, Colin; Lacerda, Daniela R.; Royyuru, Ajay K.; Sandoval, Jose Raul; Santhakumari, Arun Varatharajan; Soria Hernanz, David F.; Swamikrishnan, Pandikumar; Ziegle, Janet S.

    2014-01-01

    The search for a method that utilizes biological information to predict humans’ place of origin has occupied scientists for millennia. Over the past four decades, scientists have employed genetic data in an effort to achieve this goal but with limited success. While biogeographical algorithms using next-generation sequencing data have achieved an accuracy of 700 km in Europe, they were inaccurate elsewhere. Here we describe the Geographic Population Structure (GPS) algorithm and demonstrate its accuracy with three data sets using 40,000–130,000 SNPs. GPS placed 83% of worldwide individuals in their country of origin. Applied to over 200 Sardinian villagers, GPS placed a quarter of them in their villages and most of the rest within 50 km of their villages. GPS’s accuracy and power to infer the biogeography of worldwide individuals down to their country or, in some cases, village, of origin, underscores the promise of admixture-based methods for biogeography and has ramifications for genetic ancestry testing. PMID:24781250

  2. Population structure and demographic inferences concerning the endangered onychophoran species Epiperipatus acacioi (Onychophora: Peripatidae).

    PubMed

    Lacorte, G A; Oliveira, I S; Fonseca, C G

    2011-01-01

    Epiperipatus acacioi (Onychophora: Peripatidae) is an endemic species of the Atlantic rainforest in southeastern Brazil, with a restricted known distribution, found only in two nearby areas (Tripuí and Itacolomi). Mitochondrial gene COI sequences of 93 specimens collected across the known range of E. acacioi were used to assess the extant genetic diversity and patterns of genetic structure, as well as to infer the demographic history of this species. We found considerable variability within the populations, even though there has been recent environmental disturbance in these habitats. The samples from the two areas where this species is found showed significantly different COI sequences and constitute two distinct populations [exact test of sample differentiation (P = 0.0008) and pairwise FST analyses (FST = 0.214, P < 0.00001)]. However, there was little genetic differentiation among samples from different sampling sites within populations, suggesting that the potential for dispersal of E. acacioi is greater than would have been expected based on its cryptic behavior and reduced vagility. Mismatch analyses and neutrality tests revealed evidence of recent population expansion processes for both populations, possibly related to variations in the past distribution of this species. PMID:22095603
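Mismatch analysis starts from the distribution of pairwise nucleotide differences among sequences; a smooth, unimodal distribution is commonly read as a signature of past population expansion. The sketch below only tabulates the observed distribution from pre-aligned sequences of equal length; gap handling and the fit to an expansion model are omitted, and the input sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(seqs):
    """Map number of pairwise differences -> number of sequence pairs with that many differences."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    for a, b in combinations(seqs, 2):
        counts[sum(x != y for x, y in zip(a, b))] += 1  # count differing sites in this pair
    return dict(sorted(counts.items()))


# Hypothetical aligned COI fragments.
print(mismatch_distribution(["ACGT", "ACGA", "ACTA", "ACGT"]))  # {0: 1, 1: 3, 2: 2}
```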

  3. Geographic population structure analysis of worldwide human populations infers their biogeographical origins.

    PubMed

    Elhaik, Eran; Tatarinova, Tatiana; Chebotarev, Dmitri; Piras, Ignazio S; Maria Calò, Carla; De Montis, Antonella; Atzori, Manuela; Marini, Monica; Tofanelli, Sergio; Francalacci, Paolo; Pagani, Luca; Tyler-Smith, Chris; Xue, Yali; Cucca, Francesco; Schurr, Theodore G; Gaieski, Jill B; Melendez, Carlalynne; Vilar, Miguel G; Owings, Amanda C; Gómez, Rocío; Fujita, Ricardo; Santos, Fabrício R; Comas, David; Balanovsky, Oleg; Balanovska, Elena; Zalloua, Pierre; Soodyall, Himla; Pitchappan, Ramasamy; Ganeshprasad, Arunkumar; Hammer, Michael; Matisoo-Smith, Lisa; Wells, R Spencer

    2014-01-01

    The search for a method that utilizes biological information to predict humans' place of origin has occupied scientists for millennia. Over the past four decades, scientists have employed genetic data in an effort to achieve this goal but with limited success. While biogeographical algorithms using next-generation sequencing data have achieved an accuracy of 700 km in Europe, they were inaccurate elsewhere. Here we describe the Geographic Population Structure (GPS) algorithm and demonstrate its accuracy with three data sets using 40,000-130,000 SNPs. GPS placed 83% of worldwide individuals in their country of origin. Applied to over 200 Sardinian villagers, GPS placed a quarter of them in their villages and most of the rest within 50 km of their villages. GPS's accuracy and power to infer the biogeography of worldwide individuals down to their country or, in some cases, village, of origin, underscores the promise of admixture-based methods for biogeography and has ramifications for genetic ancestry testing. PMID:24781250

  4. Hierarchical structure analysis describing abnormal base composition of genomes

    NASA Astrophysics Data System (ADS)

    Ouyang, Zhengqing; Liu, Jian-Kun; She, Zhen-Su

    2005-10-01

    Abnormal base compositional patterns of genomic DNA sequences are studied in the framework of a hierarchical structure (HS) model originally proposed for the study of fully developed turbulence [She and Lévêque, Phys. Rev. Lett. 72, 336 (1994)]. The HS similarity law is verified over scales between 10³ bp and 10⁵ bp, and the HS parameter β is proposed to describe the degree of heterogeneity in the base composition patterns. More than one hundred bacteria, archaea, virus, yeast, and human genome sequences have been analyzed and the results show that the HS analysis efficiently captures abnormal base composition patterns, and the parameter β is a characteristic measure of the genome. Detailed examination of the values of β reveals an intriguing link to the evolutionary events of genetic material transfer. Finally, a sequence complexity (S) measure is proposed to characterize gradual increase of organizational complexity of the genome during the evolution. The present study raises several interesting issues in the evolutionary history of genomes.
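The HS similarity law is commonly written, following She and Lévêque, as F_{p+1}(l)/F_p(l) = A_p [F_p(l)/F_{p-1}(l)]^β F_∞(l)^(1-β), where F_p(l) is the p-th moment of a fluctuation measured at scale l. Taking logarithms at two scales and subtracting cancels A_p, so the differences for consecutive orders p fall on a straight line with slope β (the so-called β-test). The sketch below assumes that form; the fluctuation measure (absolute deviation of window-averaged GC content from its global mean) and the helper names are hypothetical stand-ins, not the paper's exact definitions.

```python
import numpy as np


def structure_moments(signal, scale, max_order):
    """Moments F_p(l) = <|x_l|^p> for p = 0..max_order at one window size l.

    Hypothetical fluctuation measure: x_l is the deviation of each
    non-overlapping window mean from the global mean of the signal.
    """
    signal = np.asarray(signal, dtype=float)
    n_win = len(signal) // scale
    window_means = signal[: n_win * scale].reshape(n_win, scale).mean(axis=1)
    dev = np.abs(window_means - signal.mean())
    return np.array([np.mean(dev ** p) for p in range(max_order + 1)])


def beta_test(moments_l1, moments_l2):
    """Estimate beta from moments F_0..F_P measured at two scales l1 and l2."""
    # ln(F_p / F_{p-1}) for p = 1..P at each scale, differenced between the scales
    delta = np.diff(np.log(moments_l1)) - np.diff(np.log(moments_l2))
    x, y = delta[:-1], delta[1:]          # pair order p with order p + 1
    slope, _intercept = np.polyfit(x, y, 1)
    return slope


# Hypothetical usage: a 0/1 GC indicator along a random toy sequence.
rng = np.random.default_rng(0)
seq = rng.choice(list("ACGT"), size=100_000)
gc = np.isin(seq, ["G", "C"]).astype(float)
print(beta_test(structure_moments(gc, 1_000, 6), structure_moments(gc, 10_000, 6)))
```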

  5. Structural analysis of hepatitis C RNA genome using DNA microarrays

    PubMed Central

    Martell, María; Briones, Carlos; de Vicente, Aránzazu; Piron, María; Esteban, Juan I.; Esteban, Rafael; Guardia, Jaime; Gómez, Jordi

    2004-01-01

    Many studies have tried to identify specific nucleotide sequences in the quasispecies of hepatitis C virus (HCV) that determine resistance or sensitivity to interferon (IFN) therapy, unfortunately without conclusive results. Although viral proteins represent the most evident phenotype of the virus, genomic RNA sequences determine secondary and tertiary structures which are also part of the viral phenotype and can be involved in important biological roles. In this work, a method of RNA structure analysis has been developed based on the hybridization of labelled HCV transcripts to microarrays of complementary DNA oligonucleotides. Hybridizations were carried out at non-denaturing conditions, using appropriate temperature and buffer composition to allow binding to the immobilized probes of the RNA transcript without disturbing its secondary/tertiary structural motifs. Oligonucleotides printed onto the microarray covered the entire 5′ non-coding region (5′NCR), the first three-quarters of the core region, the E2–NS2 junction and the first 400 nt of the NS3 region. We document the use of this methodology to analyse the structural degree of a large region of HCV genomic RNA in two genotypes associated with different responses to IFN treatment. The results reported here show different structural degree along the genome regions analysed, and differential hybridization patterns for distinct genotypes in NS2 and NS3 HCV regions. PMID:15247323

  6. Complete nucleotide sequence of the Cryptomeria japonica D. Don. chloroplast genome and comparative chloroplast genomics: diversified genomic structure of coniferous species

    PubMed Central

    Hirao, Tomonori; Watanabe, Atsushi; Kurita, Manabu; Kondo, Teiji; Takata, Katsuhiko

    2008-01-01

    Background The recent determination of complete chloroplast (cp) genomic sequences of various plant species has enabled numerous comparative analyses as well as advances in plant and genome evolutionary studies. In angiosperms, the complete cp genome sequences of about 70 species have been determined, whereas those of only three gymnosperm species, Cycas taitungensis, Pinus thunbergii, and Pinus koraiensis have been established. The lack of information regarding the gene content and genomic structure of gymnosperm cp genomes may severely hamper further progress of plant and cp genome evolutionary studies. To address this need, we report here the complete nucleotide sequence of the cp genome of Cryptomeria japonica, the first in the Cupressaceae sensu lato of gymnosperms, and provide a comparative analysis of their gene content and genomic structure that illustrates the unique genomic features of gymnosperms. Results The C. japonica cp genome is 131,810 bp in length, with 112 single-copy genes and two duplicated (trnI-CAU, trnQ-UUG) genes that give a total of 116 genes. Compared to other land plant cp genomes, the C. japonica cp has lost one of the relevant large inverted repeats (IRs) found in angiosperms, fern, liverwort, and gymnosperms, such as Cycas and Ginkgo, and additionally has completely lost its trnR-CCG, partially lost its trnT-GGU, and shows diversification of accD. The genomic structure of the C. japonica cp genome also differs significantly from those of other plant species. For example, we estimate that a minimum of 15 inversions would be required to transform the gene organization of the Pinus thunbergii cp genome into that of C. japonica. In the C. japonica cp genome, direct repeat and inverted repeat sequences are observed at the inversion and translocation endpoints, and these sequences may be associated with the genomic rearrangements. Conclusion The observed differences in genomic structure between C. japonica and other land plants, including
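A quick way to gauge how far apart two gene orders are is to count breakpoints: neighbouring gene pairs in one genome that are not neighbours, with consistent orientation, in the other. Because one inversion can remove at most two breakpoints, ceil(b/2) is a lower bound on the number of inversions. It is not the exact minimum, which for signed gene orders requires Hannenhalli-Pevzner-type algorithms, and it is not necessarily how the estimate in this abstract was obtained. Genes are encoded here as hypothetical signed integers (sign = strand).

```python
import math
from typing import Sequence


def count_breakpoints(order_a: Sequence[int], order_b: Sequence[int]) -> int:
    """Breakpoints of signed gene order A relative to signed gene order B.

    Each gene is a nonzero integer: abs() identifies the gene, the sign gives its strand.
    Both orders must contain the same genes exactly once.
    """
    if sorted(map(abs, order_a)) != sorted(map(abs, order_b)):
        raise ValueError("gene orders must contain the same genes")
    # Relabel genes so that order_b becomes the identity 1, 2, ..., n
    rank = {abs(g): i + 1 for i, g in enumerate(order_b)}
    strand_b = {abs(g): 1 if g > 0 else -1 for g in order_b}
    relabeled = [(1 if g > 0 else -1) * strand_b[abs(g)] * rank[abs(g)] for g in order_a]
    # Frame with 0 and n + 1; a pair (x, y) is an adjacency only if y == x + 1
    framed = [0] + relabeled + [len(order_b) + 1]
    return sum(1 for x, y in zip(framed, framed[1:]) if y - x != 1)


def min_inversions_lower_bound(order_a, order_b) -> int:
    """Each inversion removes at most two breakpoints, so at least ceil(b / 2) are needed."""
    return math.ceil(count_breakpoints(order_a, order_b) / 2)


# Hypothetical orders: inverting the segment (2, 3) turns B into A.
print(count_breakpoints([1, -3, -2, 4], [1, 2, 3, 4]))           # 2
print(min_inversions_lower_bound([1, -3, -2, 4], [1, 2, 3, 4]))  # 1
```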

  7. A Genome Wide Survey of SNP Variation Reveals the Genetic Structure of Sheep Breeds

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The genetic structure of sheep reflects their domestication and subsequent formation into discrete breeds. Understanding genetic structure is essential for achieving genetic improvement through genome-wide association studies, genomic selection and the dissection of quantitative traits. After identi...

  8. Lithospheric structure of the southern French Alps inferred from broadband analysis

    NASA Astrophysics Data System (ADS)

    Bertrand, E.; Deschamps, A.

    2000-11-01

    Broadband receiver function analysis is commonly used to evaluate the fine-scale S-velocity structure of the lithosphere. We analyse teleseismic P-waves and their coda from 30 selected teleseismic events recorded at three seismological stations of the French TGRS network in the Alpes Maritimes. Receiver functions are computed in the time domain using an SVD matrix inversion method. Dipping Moho and lateral heterogeneities beneath the array are inferred from the amplitude, arrival time and polarity of locally generated PS phases. We propose that the Moho dips 11° towards 25°±10°N below station CALF, in the outer part of the Alpine belt. At this station, we determine a Moho depth of about 20±2 km; the same depth is suggested below station SAOF, also located in the fold-thrust belt. Beneath station STET, located in the inner part of the Alpine belt, the Moho depth increases to 30 km and dips towards the N-NW. Moreover, 1D modelling of the summed receiver function from station STET constrains a crustal structure significantly different from that observed at stations located in the outer part of the Alps. Indeed, beneath stations CALF and SAOF a 2 km thick shallow low-velocity layer is needed to best fit the observed receiver functions, whereas this layer does not appear to be present beneath station STET. Because recent P-coda studies have shown that near-receiver scattering can dominate teleseismic P-wave recordings in tectonically complicated areas, we account for the effect of scattered energy in our records from array measurements. As the array aperture is wide relative to the heterogeneity scale length in the area, the array analysis produces only smooth imaging of scatterers beneath the stations.
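Time-domain receiver function estimation treats the radial trace as the vertical trace convolved with the receiver function, so the receiver function can be recovered by inverting a convolution matrix. The sketch below does this with a truncated-SVD pseudo-inverse; the relative singular-value cutoff, the function names and the toy traces are assumptions, and the study's actual inversion settings are not given in the abstract. Real processing of teleseismic records also involves component rotation, windowing and filtering, which are not shown.

```python
import numpy as np


def convolution_matrix(v, n_rf):
    """Matrix G such that G @ rf equals np.convolve(v, rf) (full linear convolution)."""
    g = np.zeros((len(v) + n_rf - 1, n_rf))
    for j in range(n_rf):
        g[j:j + len(v), j] = v  # column j is the vertical trace delayed by j samples
    return g


def receiver_function_svd(vertical, radial, n_rf, rel_cutoff=1e-3):
    """Least-squares receiver function via a truncated SVD pseudo-inverse."""
    g = convolution_matrix(np.asarray(vertical, dtype=float), n_rf)
    d = np.zeros(g.shape[0])
    m = min(len(radial), len(d))
    d[:m] = radial[:m]                       # pad or trim the radial trace to the model length
    u, s, vt = np.linalg.svd(g, full_matrices=False)
    keep = s > rel_cutoff * s[0]             # drop tiny singular values to stabilize the inversion
    return vt[keep].T @ ((u[:, keep].T @ d) / s[keep])


# Toy check (hypothetical traces): a direct-P spike plus a weaker converted phase 10 samples later.
v = np.array([1.0, 0.5, 0.2])
rf_true = np.zeros(40)
rf_true[0], rf_true[10] = 1.0, 0.3
radial = np.convolve(v, rf_true)
rf = receiver_function_svd(v, radial, n_rf=40)
print(np.allclose(rf, rf_true))  # True
```

Raising rel_cutoff discards more small singular values, which stabilizes the inversion on noisy records at the cost of resolution.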

  9. Transcriptional consequences of genomic structural aberrations in breast cancer

    PubMed Central

    Inaki, Koichiro; Hillmer, Axel M.; Ukil, Leena; Yao, Fei; Woo, Xing Yi; Vardy, Leah A.; Zawack, Kelson Folkvard Braaten; Lee, Charlie Wah Heng; Ariyaratne, Pramila Nuwantha; Chan, Yang Sun; Desai, Kartiki Vasant; Bergh, Jonas; Hall, Per; Putti, Thomas Choudary; Ong, Wai Loon; Shahab, Atif; Cacheux-Rataboul, Valere; Karuturi, Radha Krishna Murthy; Sung, Wing-Kin; Ruan, Xiaoan; Bourque, Guillaume; Ruan, Yijun; Liu, Edison T.

    2011-01-01

    Using a long-span, paired-end deep sequencing strategy, we have comprehensively identified cancer genome rearrangements in eight breast cancer genomes. Herein, we show that 40%–54% of these structural genomic rearrangements result in different forms of fusion transcripts and that 44% are potentially translated. We find that a single segmental tandem duplication spanning several genes is a major source of fusion gene transcripts in both cell lines and primary tumors; such fusions involve adjacent genes placed in reverse order by the duplication event. Certain other structural mutations, however, tend to attenuate gene expression. Among these candidate gene fusions, we found a fusion transcript (RPS6KB1–VMP1) that is recurrently expressed in ∼30% of breast cancers and associated with potential clinical consequences. This gene fusion is caused by tandem duplication on 17q23 and appears to be an indicator of local genomic instability altering the expression of oncogenic components such as MIR21 and RPS6KB1. PMID:21467264
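    The reverse-order placement produced by a tandem duplication can be made concrete with a simplified sketch. It assumes every gene lies on the plus strand and ignores exon structure and splicing; the gene names and coordinates are hypothetical. The novel junction joins the end of the first copy to the start of the second, so a gene spanning the downstream breakpoint becomes the 5' partner of a gene spanning the upstream breakpoint.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Gene:
    name: str
    start: int  # leftmost coordinate; genes assumed to be on the + strand
    end: int


def tandem_dup_fusion(genes: List[Gene], dup_start: int, dup_end: int) -> Optional[Tuple[str, str]]:
    """Return (5' partner, 3' partner) read across the novel junction, or None."""
    # The junction links position dup_end (end of copy 1) to dup_start (start of copy 2).
    five_prime = next((g for g in genes if g.start <= dup_end < g.end), None)
    three_prime = next((g for g in genes if g.start < dup_start <= g.end), None)
    if five_prime is None or three_prime is None:
        return None
    return five_prime.name, three_prime.name


genes = [Gene("GENE_A", 100, 300), Gene("GENE_B", 500, 800)]
print(tandem_dup_fusion(genes, dup_start=200, dup_end=600))  # ('GENE_B', 'GENE_A')
```

    GENE_B lies downstream of GENE_A in the reference, yet it ends up as the upstream (5') partner in the fusion transcript.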

  10. Evolution of genomic structures on Mammalian sex chromosomes.

    PubMed

    Katsura, Yukako; Iwase, Mineyo; Satta, Yoko

    2012-04-01

    Throughout mammalian evolution, recombination between the two sex chromosomes was suppressed in a stepwise manner. It is thought that the suppression of recombination led to an accumulation of deleterious mutations and frequent genomic rearrangements on the Y chromosome. In this article, we review three evolutionary aspects related to genomic rearrangements and structures, such as inverted repeats (IRs) and palindromes (PDs), on the mammalian sex chromosomes. First, we describe the stepwise manner in which recombination between the X and Y chromosomes was suppressed in placental mammals and discuss a genomic rearrangement that might have led to the formation of present pseudoautosomal boundaries (PAB). Second, we describe ectopic gene conversion between the X and Y chromosomes, and propose possible molecular causes. Third, we focus on the evolutionary mode and timing of PD formation on the X and Y chromosomes. The sequence of the chimpanzee Y chromosome was recently published by two groups. Both groups suggest that rapid evolution of genomic structure occurred on the Y chromosome. Our re-analysis of the sequences confirmed the species-specific mode of human and chimpanzee Y chromosomal evolution. Finally, we present a general outlook regarding the rapid evolution of mammalian sex chromosomes. PMID:23024603

  11. Six-layer structure for genomics and its applications.

    PubMed

    Kamatani, Naoyuki

    2016-03-01

    The term 'genetics' was coined before an understanding of DNA sequence data was achieved, and it is now insufficient to describe the broad areas in which DNA data have important roles. The term genomics is more broadly descriptive, but it does not provide a satisfactory conceptual framework that scientists can share. Here I propose a six-layer structure that describes the entire scientific field of 'genomics'. The proposed layers are 'life' as the uppermost layer, followed by 'species', 'population', 'family', 'individual' and finally 'cell' as the bottommost layer. In each pair of adjacent layers, each member of the upper layer comprises a set of members of the lower layer. In each layer, we can define consistent partial orders of members based on genomic data in the forms of phylogenetic and pedigree trees. Although total orders such as those defined for time and space in physics cannot be defined in biology, defining consistent partial orders allows mathematical analysis to be performed. I will show that mathematical genetics studies can be understood as attempts to bridge gaps between layers of the proposed six-layer structure, while genetic tests can be understood as procedures to differentiate among members of each layer by using genomic data. PMID:26559752

  12. Action starring narratives and events: Structure and inference in visual narrative comprehension

    PubMed Central

    Cohn, Neil; Wittenberg, Eva

    2015-01-01

    Studies of discourse have long placed focus on the inference generated by information that is not overtly expressed, and theories of visual narrative comprehension similarly focused on the inference generated between juxtaposed panels. Within the visual language of comics, star-shaped “flashes” commonly signify impacts, but can be enlarged to the size of a whole panel that can omit all other representational information. These “action star” panels depict a narrative culmination (a “Peak”), but have content which readers must infer, thereby posing a challenge to theories of inference generation in visual narratives that focus only on the semantic changes between juxtaposed images. This paper shows that action stars demand more inference than depicted events, and that they are more coherent in narrative sequences than scrambled sequences (Experiment 1). In addition, action stars play a felicitous narrative role in the sequence (Experiment 2). Together, these results suggest that visual narratives use conventionalized depictions that demand the generation of inferences while retaining narrative coherence of a visual sequence. PMID:26709362

  13. Deciphering the fine-structure of tribal admixture in the Bedouin population using genomic data

    PubMed Central

    Markus, B; Alshafee, I; Birk, O S

    2014-01-01

    The Bedouin Israeli population is highly inbred and structured, with a very high prevalence of recessive diseases. Many studies in the past two decades have focused on linkage analysis in large, multiple consanguineous pedigrees of this population. The advent of high-throughput technologies motivated researchers to search for rare variants shared between smaller pedigrees, integrating data from clinically similar yet seemingly non-related sporadic cases. However, such analyses are challenging because, without pedigree data, there is no prior knowledge regarding possible relatedness between the sporadic cases. Here, we describe models and techniques for the study of relationships between pedigrees and use them for the inference of tribal co-ancestry, delineating the complex social interactions between different tribes in the Negev Bedouins of southern Israel. Through our analysis, we differentiate between tribes that share many yet small genomic segments because of co-ancestry versus tribes that share larger segments because of recent admixture. The emergent pattern is well correlated with the prevalence of rare mutations in the different tribes. Tribes that do not intermarry, mostly because of social restrictions, hold private mutations, whereas tribes that do intermarry demonstrate a genetic flow of mutations between them. Thus, social structure within an inbred community can be delineated through genomic data, with implications for genetic counseling and genetic mapping. PMID:24084643
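    Why segment length separates old co-ancestry from recent admixture can be seen from a standard back-of-the-envelope model of identity-by-descent (IBD) sharing. This sketch assumes a uniform recombination rate and ignores chromosome ends and segment-detection thresholds; it is not the model used by the authors, and the function names are illustrative.

```python
def expected_ibd_length_cm(generations: float) -> float:
    """Mean length (cM) of a segment shared identical-by-descent by two individuals
    whose common ancestor lived `generations` generations ago.

    Simple model: the two lineages are separated by 2 * generations meioses and
    crossovers occur as a Poisson process at 1 per Morgan, so segment length is
    exponential with rate 2 * generations per Morgan.
    """
    if generations <= 0:
        raise ValueError("generations must be positive")
    return 100.0 / (2.0 * generations)


def generations_from_mean_length(mean_cm: float) -> float:
    """Invert the same model: approximate generations back to the common ancestor."""
    return 100.0 / (2.0 * mean_cm)


for g in (2, 5, 20):
    print(g, expected_ibd_length_cm(g))  # 25.0, 10.0 and 2.5 cM respectively
```

    Under these assumptions, segments of tens of centimorgans point to common ancestors within a few generations, whereas sharing dominated by segments of a few centimorgans points to much older co-ancestry.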

  14. A Genome-Scale Investigation of How Sequence, Function, and Tree-Based Gene Properties Influence Phylogenetic Inference.

    PubMed

    Shen, Xing-Xing; Salichos, Leonidas; Rokas, Antonis

    2016-01-01

    Molecular phylogenetic inference is inherently dependent on choices in both methodology and data. Many insightful studies have shown how choices in methodology, such as the model of sequence evolution or optimality criterion used, can strongly influence inference. In contrast, much less is known about the impact of choices in the properties of the data, typically genes, on phylogenetic inference. We investigated the relationships between 52 gene properties (24 sequence-based, 19 function-based, and 9 tree-based) with each other and with three measures of phylogenetic signal in two assembled data sets of 2,832 yeast and 2,002 mammalian genes. We found that most gene properties, such as evolutionary rate (measured through the percent average of pairwise identity across taxa) and total tree length, were highly correlated with each other. Similarly, several gene properties, such as gene alignment length, Guanine-Cytosine content, and the proportion of tree distance on internal branches divided by relative composition variability (treeness/RCV), were strongly correlated with phylogenetic signal. Analysis of partial correlations between gene properties and phylogenetic signal, in which gene evolutionary rate and alignment length were simultaneously controlled, showed similar patterns of correlations, albeit weaker in strength. Examination of the relative importance of each gene property on phylogenetic signal identified gene alignment length, alongside the number of parsimony-informative sites and variable sites, as the most important predictors. Interestingly, the subsets of gene properties that optimally predicted phylogenetic signal differed considerably across our three phylogenetic measures and two data sets; however, gene alignment length and RCV were consistently included as predictors of all three phylogenetic measures in both yeasts and mammals. These results suggest that a handful of sequence-based gene properties are reliable predictors of phylogenetic signal
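    Two of the tree- and sequence-based properties named above, treeness and relative composition variability (RCV), can be computed in a few lines. This sketch follows a commonly used definition of RCV (summed absolute deviations of per-taxon character-state counts from their across-taxon means, divided by the number of taxa times the alignment length) and treats gaps as ordinary states; the function names are illustrative, and branch lengths are assumed to be already split into internal and terminal sets.

```python
from collections import Counter
from typing import Sequence


def treeness(internal: Sequence[float], terminal: Sequence[float]) -> float:
    """Proportion of total tree length that falls on internal branches."""
    return sum(internal) / (sum(internal) + sum(terminal))


def rcv(alignment: Sequence[str]) -> float:
    """Relative composition variability of equal-length aligned sequences."""
    n_taxa = len(alignment)
    length = len(alignment[0])
    counts = [Counter(seq) for seq in alignment]
    states = set().union(*counts)  # every character state seen in any taxon
    total = 0.0
    for state in states:
        mean = sum(c[state] for c in counts) / n_taxa
        total += sum(abs(c[state] - mean) for c in counts)
    return total / (n_taxa * length)


aln = ["AACC", "AAAA"]
score = treeness([1.0], [1.0, 2.0]) / rcv(aln)  # 0.25 / 0.5 = 0.5
```

    Higher treeness/RCV values indicate that more of the tree length sits on internal branches relative to compositional heterogeneity among taxa.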

  15. A Genome-Scale Investigation of How Sequence, Function, and Tree-Based Gene Properties Influence Phylogenetic Inference

    PubMed Central

    Shen, Xing-Xing; Salichos, Leonidas; Rokas, Antonis

    2016-01-01

    Molecular phylogenetic inference is inherently dependent on choices in both methodology and data. Many insightful studies have shown how choices in methodology, such as the model of sequence evolution or optimality criterion used, can strongly influence inference. In contrast, much less is known about the impact of choices in the properties of the data, typically genes, on phylogenetic inference. We investigated the relationships between 52 gene properties (24 sequence-based, 19 function-based, and 9 tree-based) with each other and with three measures of phylogenetic signal in two assembled data sets of 2,832 yeast and 2,002 mammalian genes. We found that most gene properties, such as evolutionary rate (measured through the percent average of pairwise identity across taxa) and total tree length, were highly correlated with each other. Similarly, several gene properties, such as gene alignment length, Guanine-Cytosine content, and the proportion of tree distance on internal branches divided by relative composition variability (treeness/RCV), were strongly correlated with phylogenetic signal. Analysis of partial correlations between gene properties and phylogenetic signal, in which gene evolutionary rate and alignment length were simultaneously controlled, showed similar patterns of correlations, albeit weaker in strength. Examination of the relative importance of each gene property on phylogenetic signal identified gene alignment length, alongside the number of parsimony-informative sites and variable sites, as the most important predictors. Interestingly, the subsets of gene properties that optimally predicted phylogenetic signal differed considerably across our three phylogenetic measures and two data sets; however, gene alignment length and RCV were consistently included as predictors of all three phylogenetic measures in both yeasts and mammals. These results suggest that a handful of sequence-based gene properties are reliable predictors of phylogenetic signal

  16. Population Genetic Structure of the Cotton Bollworm Helicoverpa armigera (Hübner) (Lepidoptera: Noctuidae) in India as Inferred from EPIC-PCR DNA Markers

    PubMed Central

    Behere, Gajanan Tryambak; Tay, Wee Tek; Russell, Derek Alan; Kranthi, Keshav Raj; Batterham, Philip

    2013-01-01

    Helicoverpa armigera is an important pest of cotton and other agricultural crops in the Old World. Its wide host range, high mobility and fecundity, and the ability to adapt and develop resistance against all common groups of insecticides used for its management have exacerbated its pest status. An understanding of the population genetic structure in H. armigera under Indian agricultural conditions will help ascertain gene flow patterns across different agricultural zones. This study inferred the population genetic structure of Indian H. armigera using five Exon-Primed Intron-Crossing (EPIC)-PCR markers. Nested alternative EPIC markers detected moderate null allele frequencies (4.3% to 9.4%) in loci used to infer population genetic structure, but the apparent genome-wide heterozygote deficit suggests inbreeding or a Wahlund effect rather than a null allele effect. Population genetic analysis of the 26 populations suggested significant genetic differentiation within India, especially in cotton-feeding populations in the 2006–07 cropping season. In contrast, overall pair-wise FST estimates from populations feeding on food crops indicated no significant population substructure irrespective of cropping seasons. A Bayesian cluster analysis was used to assign the genetic make-up of individuals to likely membership of population clusters. Some evidence was found for four major clusters with individuals in two populations from cotton in one year (from two populations in northern India) showing especially high homogeneity. Taken as a whole, this study found evidence of population substructure at host crop, temporal and spatial levels in Indian H. armigera, without, however, a clear biological rationale for these structures being evident. PMID:23326431
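    The heterozygote-deficit argument can be illustrated with the single-locus inbreeding coefficient F_IS = 1 - Ho/He, where Ho is observed and He expected (Hardy-Weinberg) heterozygosity. This minimal sketch uses the simple estimator without sample-size correction and hypothetical genotypes; pooling two subpopulations fixed for different alleles shows how a Wahlund effect alone can create an apparent deficit.

```python
from collections import Counter
from typing import Hashable, List, Tuple

Genotype = Tuple[Hashable, Hashable]


def f_is(genotypes: List[Genotype]) -> float:
    """Single-locus F_IS = 1 - Ho/He from diploid genotypes of one sample."""
    n = len(genotypes)
    h_obs = sum(a != b for a, b in genotypes) / n  # observed heterozygosity
    allele_counts = Counter(allele for g in genotypes for allele in g)
    # Expected heterozygosity under Hardy-Weinberg proportions.
    h_exp = 1.0 - sum((c / (2 * n)) ** 2 for c in allele_counts.values())
    if h_exp == 0.0:
        raise ValueError("monomorphic locus: F_IS is undefined")
    return 1.0 - h_obs / h_exp


hwe_sample = [(1, 1), (1, 2), (1, 2), (2, 2)]
pooled = [(1, 1)] * 5 + [(2, 2)] * 5  # two subpopulations, each fixed for one allele
print(f_is(hwe_sample), f_is(pooled))  # 0.0 1.0
```

    A null allele would also inflate F_IS, but only at the affected loci; a deficit seen across most loci fits better with inbreeding or population subdivision.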

  17. Genome-Wide Approaches for RNA Structure Probing.

    PubMed

    Silverman, Ian M; Berkowitz, Nathan D; Gosai, Sager J; Gregory, Brian D

    2016-01-01

    RNA molecules of all types fold into complex secondary and tertiary structures that are important for their function and regulation. Structural and catalytic RNAs such as ribosomal RNA (rRNA) and transfer RNA (tRNA) are central players in protein synthesis, and only function through their proper folding into intricate three-dimensional structures. Studies of messenger RNA (mRNA) regulation have also revealed that structural elements embedded within these RNA species are important for the proper regulation of their total level in the transcriptome. More recently, the discovery of microRNAs (miRNAs) and long non-coding RNAs (lncRNAs) has shed light on the importance of RNA structure to genome, transcriptome, and proteome regulation. Due to the relatively small number, high conservation, and importance of structural and catalytic RNAs to all life, much early work in RNA structure analysis mapped out a detailed view of these molecules. Computational and physical methods were used in concert with enzymatic and chemical structure probing to create high-resolution models of these fundamental biological molecules. However, the recent expansion in our knowledge of the importance of RNA structure to coding and regulatory RNAs has left the field in need of faster and scalable methods for high-throughput structural analysis. To address this, nuclease and chemical RNA structure probing methodologies have been adapted for genome-wide analysis. These methods have been deployed to globally characterize thousands of RNA structures in a single experiment. Here, we review these experimental methodologies for high-throughput RNA structure determination and discuss the insights gained from each approach. PMID:27256381
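    As an illustration of how paired nuclease libraries can be summarised per nucleotide, the sketch below computes a log2 ratio of depth-normalised double-strand-specific to single-strand-specific cleavage counts, in the spirit of published nuclease-based scores such as PARS. The normalisation to counts per million and the pseudocount are assumptions for illustration; the reviewed methods differ in their exact scoring.

```python
import math
from typing import List, Sequence


def structure_scores(ds_counts: Sequence[int], ss_counts: Sequence[int],
                     pseudocount: float = 1.0) -> List[float]:
    """Per-nucleotide log2(ds / ss) after scaling each library to counts per million.

    Positive scores suggest paired bases; negative scores suggest unpaired bases.
    """
    ds_total, ss_total = sum(ds_counts), sum(ss_counts)
    scores = []
    for ds, ss in zip(ds_counts, ss_counts):
        ds_cpm = ds / ds_total * 1e6 + pseudocount
        ss_cpm = ss / ss_total * 1e6 + pseudocount
        scores.append(math.log2(ds_cpm / ss_cpm))
    return scores


scores = structure_scores([40, 2, 10], [1, 30, 10])
```

    The pseudocount keeps positions with zero reads in one library from producing infinite scores.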

  18. Target Selection and Determination of Function in Structural Genomics

    PubMed Central

    Watson, James D.; Todd, Annabel E.; Bray, James; Laskowski, Roman A.; Edwards, Aled; Joachimiak, Andrzej; Orengo, Christine A.; Thornton, Janet M.

    2011-01-01

    The first crucial step in any structural genomics project is the selection and prioritization of target proteins for structure determination. There may be a number of selection criteria to be satisfied, including that the proteins have novel folds, that they be representatives of large families for which no structure is known, and so on. The better the selection at this stage, the greater is the value of the structures obtained at the end of the experimental process. This value can be further enhanced once the protein structures have been solved if the functions of the given proteins can also be determined. Here we describe the methods used at either end of the experimental process: firstly, sensitive sequence comparison techniques for selecting a high-quality list of target proteins, and secondly the various computational methods that can be applied to the eventual 3D structures to determine the most likely biochemical function of the proteins in question. PMID:12880206

  19. Implications of structural genomics target selection strategies: Pfam5000, whole genome, and random approaches

    SciTech Connect

    Chandonia, John-Marc; Brenner, Steven E.

    2004-07-14

    The structural genomics project is an international effort to determine the three-dimensional shapes of all important biological macromolecules, with a primary focus on proteins. Target proteins should be selected according to a strategy which is medically and biologically relevant, of good value, and tractable. As an option to consider, we present the Pfam5000 strategy, which involves selecting the 5000 most important families from the Pfam database as sources for targets. We compare the Pfam5000 strategy to several other proposed strategies that would require similar numbers of targets. These include complete solution of several small to moderately sized bacterial proteomes, partial coverage of the human proteome, and random selection of approximately 5000 targets from sequenced genomes. We measure the impact that successful implementation of these strategies would have upon structural interpretation of the proteins in Swiss-Prot, TrEMBL, and 131 complete proteomes (including 10 of eukaryotes) from the Proteome Analysis database at EBI. Solving the structures of proteins from the 5000 largest Pfam families would allow accurate fold assignment for approximately 68 percent of all prokaryotic proteins (covering 59 percent of residues) and 61 percent of eukaryotic proteins (40 percent of residues). More fine-grained coverage, which would allow accurate modeling of these proteins, would require an order of magnitude more targets. The Pfam5000 strategy may be modified in several ways, for example to focus on larger families, bacterial sequences, or eukaryotic sequences; as long as secondary consideration is given to large families within Pfam, coverage results vary only slightly. In contrast, focusing structural genomics on a single tractable genome would have only a limited impact on structural knowledge of other proteomes: a significant fraction (about 30-40 percent of the proteins, and 40-60 percent of the residues) of each proteome is classified in small
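    The coverage arithmetic behind a top-K family strategy such as Pfam5000 can be sketched as follows. The family names and sizes are hypothetical, each protein is assumed to belong to at most one family, and one solved structure is assumed to allow fold assignment for every member; real proteins often contain several domains from different families, which this toy ignores.

```python
from typing import Dict


def top_k_family_coverage(family_sizes: Dict[str, int], total_proteins: int, k: int) -> float:
    """Fraction of proteins covered by solving one structure in each of the k largest families."""
    largest = sorted(family_sizes.values(), reverse=True)[:k]
    return sum(largest) / total_proteins


sizes = {"FAM_A": 50, "FAM_B": 30, "FAM_C": 10, "FAM_D": 5}  # hypothetical families
for k in (1, 2, 4):
    print(k, round(top_k_family_coverage(sizes, total_proteins=120, k=k), 3))
```

    Because families are taken in decreasing order of size, each additional target adds less coverage than the one before it.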

  20. Genome-wide inference of protein interaction sites: lessons from the yeast high-quality negative protein–protein interaction dataset

    PubMed Central

    Guo, Jie; Wu, Xiaomei; Zhang, Da-Yong; Lin, Kui

    2008-01-01

    High-throughput studies of protein interactions may have produced, experimentally and computationally, the most comprehensive protein–protein interaction datasets in the completely sequenced genomes. These datasets provide an opportunity to discover the underlying protein interaction patterns on a proteome scale. Here, we propose an approach to discovering motif pairs at interaction sites (often 3–8 residues) that are essential for understanding protein functions and helpful for the rational design of protein engineering and folding experiments. A gold standard positive (interacting) dataset and a gold standard negative (non-interacting) dataset were mined to infer the interacting motif pairs that are significantly overrepresented in the positive dataset compared to the negative dataset. Four negative datasets assembled by different strategies were evaluated, and the one with the best performance was used as the gold standard negatives for further analysis. To assess the efficiency of our method in detecting potential interacting motif pairs, we also compared it with previously developed approaches and found that our method achieved the highest prediction accuracy. In addition, many uncharacterized motif pairs of interest were found to be functional, with experimental evidence in other species. This investigation demonstrates the important effects of a high-quality negative dataset on the performance of such statistical inference. PMID:18281313
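    Overrepresentation of a motif pair in the positive set relative to the negative set can be scored with a one-sided Fisher's exact (hypergeometric) test. The abstract does not state which statistic the authors used, so this is a generic sketch with a hypothetical function name and counts, not their method.

```python
from math import comb


def enrichment_p_value(k_pos: int, n_pos: int, k_neg: int, n_neg: int) -> float:
    """One-sided Fisher exact p-value that a motif pair seen in k_pos of n_pos
    positive (interacting) pairs and k_neg of n_neg negative pairs is
    over-represented in the positive set."""
    total = n_pos + n_neg
    with_motif = k_pos + k_neg
    denom = comb(total, n_pos)
    upper = min(with_motif, n_pos)
    # Hypergeometric upper tail: probability of k_pos or more motif-bearing pairs
    # among n_pos draws when the motif carries no information about interaction.
    return sum(comb(with_motif, x) * comb(total - with_motif, n_pos - x)
               for x in range(k_pos, upper + 1)) / denom


print(enrichment_p_value(8, 100, 1, 100))
```

    In a genome-wide scan over many motif pairs, these p-values would still need a multiple-testing correction.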

  1. Genomic organization and gene expression of the multiple globins in Atlantic cod: conservation of globin-flanking genes in chordates infers the origin of the vertebrate globin clusters

    PubMed Central

    2010-01-01

    Background: The vertebrate globin genes encoding the α- and β-subunits of the tetrameric hemoglobins are clustered at two unlinked loci. The highly conserved linear order of the genes flanking the hemoglobins provides a strong anchor for inferring common ancestry of the globin clusters. In fish, the number of α-β-linked globin genes varies considerably between different sublineages and seems to be related to prevailing physico-chemical conditions. Draft sequences of the Atlantic cod genome enabled us to determine the genomic organization of the globin repertoire in this marine species that copes with fluctuating environments of the temperate and Arctic regions. Results: The Atlantic cod genome was shown to contain 14 globin genes, including nine hemoglobin genes organized in two unlinked clusters designated β5-α1-β1-α4 and β3-β4-α2-α3-β2. The diverged cod hemoglobin genes displayed different expression levels in adult fish, and tetrameric hemoglobins with or without a Root effect were predicted. The novel finding of maternally inherited hemoglobin mRNAs is consistent with a potential role played by fish hemoglobins in the non-specific immune response. In silico analysis of the six teleost genomes available showed that the two α-β globin clusters are flanked by paralogs of five duplicated genes, in agreement with the proposed teleost-specific duplication of the ancestral vertebrate globin cluster. Screening the genome of extant urochordate and cephalochordate species for conserved globin-flanking genes revealed linkage of RHBDF1, MPG and ARHGAP17 to globin genes in the tunicate Ciona intestinalis, while these genes together with LCMT are closely positioned in amphioxus (Branchiostoma floridae), but seem to be unlinked to the multiple globin genes identified in this species. Conclusion: The plasticity of Atlantic cod to variable environmental conditions probably involves the expression of multiple globins with potentially different properties. The

  2. The Mitochondrial Genome of Soybean Reveals Complex Genome Structures and Gene Evolution at Intercellular and Phylogenetic Levels

    PubMed Central

    Chang, Shengxin; Wang, Yankun; Lu, Jiangjie; Gai, Junyi; Li, Jijie; Chu, Pu; Guan, Rongzhan; Zhao, Tuanjie

    2013-01-01

    Determining mitochondrial genomes is important for elucidating vital activities of seed plants. Mitochondrial genomes are specific to each plant species because of their variable size, complex structures and patterns of gene losses and gains during evolution. This complexity has made research on the soybean mitochondrial genome difficult compared with its nuclear and chloroplast genomes. The present study helps to solve a 30-year mystery regarding the most complex mitochondrial genome structure, showing that pairwise rearrangements among the many large repeats may produce an enriched molecular pool of 760 circles in seed plants. The soybean mitochondrial genome harbors 58 genes of known function in addition to 52 predicted open reading frames of unknown function. The genome contains sequences of multiple identifiable origins, including 6.8 kb and 7.1 kb DNA fragments that have been transferred from the nuclear and chloroplast genomes, respectively, and some horizontal DNA transfers. The soybean mitochondrial genome has lost 16 genes, including nine protein-coding genes and seven tRNA genes; however, it has acquired five chloroplast-derived genes during evolution. Four tRNA genes, common among the three genomes, are derived from the chloroplast. Sizeable DNA transfers to the nucleus, with pericentromeric regions as hotspots, are observed, including DNA transfers of 125.0 kb and 151.6 kb identified unambiguously from the soybean mitochondrial and chloroplast genomes, respectively. The soybean nuclear genome has acquired five genes from its mitochondrial genome. These results provide biological insights into the mitochondrial genome of seed plants, and are especially helpful for deciphering vital activities in soybean. PMID:23431381

  3. Northern Bobwhite (Colinus virginianus) Mitochondrial Population Genomics Reveals Structure, Divergence, and Evidence for Heteroplasmy

    PubMed Central

    Halley, Yvette A.; Oldeschulte, David L.; Bhattarai, Eric K.; Hill, Joshua; Metz, Richard P.; Johnson, Charles D.; Presley, Steven M.; Ruzicka, Rebekah E.; Rollins, Dale; Peterson, Markus J.; Murphy, William J.; Seabury, Christopher M.

    2015-01-01

    Herein, we evaluated the concordance of population inferences and conclusions resulting from the analysis of short mitochondrial fragments (i.e., partial or complete D-Loop nucleotide sequences) versus complete mitogenome sequences for 53 bobwhites representing six ecoregions across TX and OK (USA). Median joining (MJ) haplotype networks demonstrated that analyses performed using small mitochondrial fragments were insufficient for estimating the true (i.e., complete) mitogenome haplotype structure, corresponding levels of divergence, and maternal population history of our samples. Notably, discordant demographic inferences were observed when mismatch distributions of partial (i.e., partial D-Loop) versus complete mitogenome sequences were compared, with the reduction in mitochondrial genomic information content observed to encourage spurious inferences in our samples. A probabilistic approach to variant prediction for the complete bobwhite mitogenomes revealed 344 segregating sites corresponding to 347 total mutations, including 49 putative nonsynonymous single nucleotide variants (SNVs) distributed across 12 protein coding genes. Evidence of gross heteroplasmy was observed for 13 bobwhites, with 10 of the 13 heteroplasmies involving one moderate to high frequency SNV. Haplotype network and phylogenetic analyses for the complete bobwhite mitogenome sequences revealed two divergent maternal lineages (dXY = 0.00731; FST = 0.849; P < 0.05), thereby supporting the potential for two putative subspecies. However, the diverged lineage (n = 103 variants) almost exclusively involved bobwhites geographically classified as Colinus virginianus texanus, which is discordant with the expectations of previous geographic subspecies designations. Tests of adaptive evolution for functional divergence (MKT), frequency distribution tests (D, FS) and phylogenetic analyses (RAxML) provide no evidence for positive selection or hybridization with the sympatric scaled quail (Callipepla
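    The between-lineage divergence d_XY reported above is, in its basic form, the average proportion of differing sites over all pairs of sequences drawn one from each group. This sketch uses uncorrected p-distances on hypothetical, gap-free aligned sequences; the function names are illustrative, and published estimates may apply substitution corrections.

```python
from itertools import product
from typing import Sequence


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def d_xy(group_x: Sequence[str], group_y: Sequence[str]) -> float:
    """Mean per-site divergence over all between-group sequence pairs."""
    pairs = list(product(group_x, group_y))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


lineage_1 = ["AAAA", "AAAT"]
lineage_2 = ["AATT"]
print(d_xy(lineage_1, lineage_2))  # (0.5 + 0.25) / 2 = 0.375
```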

  4. Northern Bobwhite (Colinus virginianus) Mitochondrial Population Genomics Reveals Structure, Divergence, and Evidence for Heteroplasmy.

    PubMed

    Halley, Yvette A; Oldeschulte, David L; Bhattarai, Eric K; Hill, Joshua; Metz, Richard P; Johnson, Charles D; Presley, Steven M; Ruzicka, Rebekah E; Rollins, Dale; Peterson, Markus J; Murphy, William J; Seabury, Christopher M

    2015-01-01

    Herein, we evaluated the concordance of population inferences and conclusions resulting from the analysis of short mitochondrial fragments (i.e., partial or complete D-Loop nucleotide sequences) versus complete mitogenome sequences for 53 bobwhites representing six ecoregions across TX and OK (USA). Median joining (MJ) haplotype networks demonstrated that analyses performed using small mitochondrial fragments were insufficient for estimating the true (i.e., complete) mitogenome haplotype structure, corresponding levels of divergence, and maternal population history of our samples. Notably, discordant demographic inferences were observed when mismatch distributions of partial (i.e., partial D-Loop) versus complete mitogenome sequences were compared, with the reduction in mitochondrial genomic information content observed to encourage spurious inferences in our samples. A probabilistic approach to variant prediction for the complete bobwhite mitogenomes revealed 344 segregating sites corresponding to 347 total mutations, including 49 putative nonsynonymous single nucleotide variants (SNVs) distributed across 12 protein coding genes. Evidence of gross heteroplasmy was observed for 13 bobwhites, with 10 of the 13 heteroplasmies involving one moderate to high frequency SNV. Haplotype network and phylogenetic analyses for the complete bobwhite mitogenome sequences revealed two divergent maternal lineages (dXY = 0.00731; FST = 0.849; P < 0.05), thereby supporting the potential for two putative subspecies. However, the diverged lineage (n = 103 variants) almost exclusively involved bobwhites geographically classified as Colinus virginianus texanus, which is discordant with the expectations of previous geographic subspecies designations. Tests of adaptive evolution for functional divergence (MKT), frequency distribution tests (D, FS) and phylogenetic analyses (RAxML) provide no evidence for positive selection or hybridization with the sympatric scaled quail (Callipepla

  5. USING CORONAL CELLS TO INFER THE MAGNETIC FIELD STRUCTURE AND CHIRALITY OF FILAMENT CHANNELS

    SciTech Connect

    Sheeley, N. R. Jr.; Warren, H. P.; Martin, S. F.; Panasenco, O.

    2013-08-01

    Coronal cells are visible at temperatures of ~1.2 MK in Fe XII coronal images obtained from the Solar Dynamics Observatory and Solar Terrestrial Relations Observatory spacecraft. We show that near a filament channel, the plumelike tails of these cells bend horizontally in opposite directions on the two sides of the channel like fibrils in the chromosphere. Because the cells are rooted in magnetic flux concentrations of majority polarity, these observations can be used with photospheric magnetograms to infer the direction of the horizontal field in filament channels and the chirality of the associated magnetic field. This method is similar to the procedure for inferring the direction of the magnetic field and the chirality of the fibril pattern in filament channels from Hα observations. However, the coronal cell observations are easier to use and provide clear inferences of the horizontal field direction for heights up to ~50 Mm into the corona.

  6. Bayesian inference of Earth's radial seismic structure from body-wave traveltimes using neural networks

    NASA Astrophysics Data System (ADS)

    de Wit, Ralph W. L.; Valentine, Andrew P.; Trampert, Jeannot

    2013-10-01

    How do body-wave traveltimes constrain the Earth's radial (1-D) seismic structure? Existing 1-D seismological models underpin 3-D seismic tomography and earthquake location algorithms. It is therefore crucial to assess the quality of such 1-D models, yet quantifying uncertainties in seismological models is challenging and thus often ignored. Ideally, quality assessment should be an integral part of the inverse method. Our aim in this study is twofold: (i) we show how to solve a general Bayesian non-linear inverse problem and quantify model uncertainties, and (ii) we investigate the constraint on spherically symmetric P-wave velocity (VP) structure provided by body-wave traveltimes from the EHB bulletin (phases Pn, P, PP and PKP). Our approach is based on artificial neural networks, which are very common in pattern recognition problems and can be used to approximate an arbitrary function. We use a Mixture Density Network to obtain 1-D marginal posterior probability density functions (pdfs), which provide a quantitative description of our knowledge of the individual Earth parameters. No linearization or model damping is required, which allows us to infer a model which is constrained purely by the data. We present 1-D marginal posterior pdfs for the 22 VP parameters and seven discontinuity depths in our model. P-wave velocities in the inner core, outer core and lower mantle are resolved well, with standard deviations of ~0.2 to 1 per cent with respect to the mean of the posterior pdfs. The maximum likelihoods of VP are in general similar to the corresponding ak135 values, which lie within one or two standard deviations from the posterior means, thus providing an independent validation of ak135 in this part of the radial model. Conversely, the data contain little or no information on P-wave velocity in the D'' layer, the upper mantle and the homogeneous crustal layers. Further, the data do not constrain the depth of the discontinuities in our model. Using additional
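    A Mixture Density Network usually represents each 1-D marginal posterior as a weighted sum of Gaussian kernels, so the posterior mean and standard deviation quoted above follow directly from the mixture parameters. This sketch assumes the network outputs (mixing weights, means and widths) are already available; the numbers are hypothetical, the function name is illustrative, and no network is trained here.

```python
import math
from typing import Sequence, Tuple


def mixture_mean_std(weights: Sequence[float], means: Sequence[float],
                     sigmas: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard deviation of a 1-D Gaussian mixture pdf."""
    total = sum(weights)
    w = [x / total for x in weights]  # normalise mixing coefficients to sum to 1
    mean = sum(wi * mi for wi, mi in zip(w, means))
    # Second raw moment: each kernel contributes sigma^2 + mu^2, weighted.
    second = sum(wi * (si ** 2 + mi ** 2) for wi, mi, si in zip(w, means, sigmas))
    return mean, math.sqrt(max(second - mean ** 2, 0.0))


# Hypothetical posterior for one P-wave velocity parameter (km/s).
mean, std = mixture_mean_std([0.7, 0.3], [13.6, 13.7], [0.02, 0.03])
relative_percent = 100.0 * std / mean  # standard deviation as a percentage of the mean
```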

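    The abstract summarises each 1-D marginal posterior pdf by its mean, its standard deviation and its maximum-likelihood value. As a minimal sketch, assuming the network's output for one parameter is a 1-D Gaussian mixture (weights, means, standard deviations), as in the standard Mixture Density Network formulation, those summaries could be computed as follows. The function names and the grid-search mode finder are illustrative choices, not the authors' code.

```python
import math

def mixture_pdf(x, weights, means, sigmas):
    """Density of a 1-D Gaussian mixture evaluated at x."""
    return sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, m, s in zip(weights, means, sigmas)
    )

def mixture_summary(weights, means, sigmas):
    """Mean and standard deviation of a 1-D Gaussian mixture."""
    total = sum(weights)
    w = [wi / total for wi in weights]  # normalise in case weights do not sum to 1
    mean = sum(wi * mi for wi, mi in zip(w, means))
    # E[x^2] of a mixture is the weighted sum of each component's sigma^2 + mu^2
    second_moment = sum(wi * (si ** 2 + mi ** 2) for wi, mi, si in zip(w, means, sigmas))
    std = math.sqrt(max(second_moment - mean ** 2, 0.0))
    return mean, std

def mixture_mode(weights, means, sigmas, lo, hi, n=10001):
    """Maximum of the mixture pdf on [lo, hi], found by a simple grid search."""
    step = (hi - lo) / (n - 1)
    xs = (lo + i * step for i in range(n))
    return max(xs, key=lambda x: mixture_pdf(x, weights, means, sigmas))
```

    Given `mean, std = mixture_summary(...)`, the quantity `100 * std / mean` is the relative standard deviation in per cent, the form in which the inner-core, outer-core and lower-mantle uncertainties are quoted in the abstract.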
  7. Oceanic Domains - Observed Relationship With Tomographic Features and Inferred Mantle Structure

    NASA Astrophysics Data System (ADS)

    Loubet, M.

    A persistent contradiction exists between current views of mantle stratification derived from geochemistry and a number of geophysical studies and simulations that suggest significant material exchange throughout the entire mantle and favor mixing processes. In this presentation, we show that the common interpretation of oceanic basalt heterogeneities can be contested, and that a new interpretation of these heterogeneities reveals interesting relationships between geochemical and geophysical (tomographic) features. The new approach is based on (a) identification of mantle heterogeneities at the scale of oceanic domains, which in some cases encompass both MORB and OIB basalt types, and (b) the use of incompatible element ratios in (Cx/Cz, Cy/Cz) representations, in particular the (Th/La, Nb/La) representation. This last representation is very useful for identifying magmatic processes and for estimating the compositions of magma sources. Analysis of oceanic basalt compositions based on a large set of literature data identifies 4 (possibly 5) large-scale oceanic domains: Atlantic East Pacific (AEP), Indian Ocean (IO), South Central Pacific (SCP), Kerguelen South Atlantic (KSA) (and possibly Hawaii (H)). The first two, which include MORB sources, extend to upper-mantle levels. The close geographical correspondence between the SCP and KSA domains and tomographic features located at the D" level of the mantle beneath the Central Pacific and South Africa (Masters et al., 2000) leads us to interpret the basalts from the KSA and SCP domains as derived from a D" layer source.
Two different mantle structures (general ones before discussing more complex ones), both comprising a D" layer (composed of recycled oceanic crust enriched materials) at the CMB, can be inferred from these oceanic basalt source interpretations: (a) a layered mantle with an upper and a lower mantle with primitive mantle material composing a significant part of

  8. A spruce gene map infers ancient plant genome reshuffling and subsequent slow evolution in the gymnosperm lineage leading to extant conifers

    PubMed Central

    2012-01-01

    Background: Seed plants are composed of angiosperms and gymnosperms, which diverged from each other around 300 million years ago. While much light has been shed on the mechanisms and rate of genome evolution in flowering plants, such knowledge remains conspicuously meagre for the gymnosperms. Conifers are key representatives of gymnosperms, and the sheer size of their genomes represents a significant challenge for characterization, sequencing and assembly. Results: To gain insight into the macro-organisation and long-term evolution of the conifer genome, we developed a genetic map involving 1,801 spruce genes. We designed a statistical approach based on kernel density estimation to analyse gene density and identified seven gene-rich isochores. Groups of co-localizing genes were also found that were transcriptionally co-regulated, indicative of functional clusters. Phylogenetic analyses of 157 gene families for which at least two duplicates were mapped on the spruce genome indicated that ancient gene duplicates shared by angiosperms and gymnosperms outnumbered conifer-specific duplicates by a ratio of eight to one. Ancient duplicates were much more translocated within and among spruce chromosomes than conifer-specific duplicates, which were mostly organised in tandem arrays. Both high synteny and collinearity were also observed between the genomes of spruce and pine, two conifers that diverged more than 100 million years ago. Conclusions: Taken together, these results indicate that much genomic evolution occurred in the seed plant lineage before the split between gymnosperms and angiosperms, and that the pace of evolution of the genome macro-structure has been much slower in the gymnosperm lineage leading to extant conifers than in flowering plants over the same period of time. This trend is largely congruent with the contrasting rates of diversification and morphological evolution observed between these two groups of seed plants. PMID:23102090
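    Kernel density estimation of gene density along a map can be sketched as below. This is a minimal illustration, not the authors' statistical approach: it assumes gene positions (e.g. in centimorgans) on a single linkage group, a Gaussian kernel with a user-chosen bandwidth, and a hypothetical rule that flags a stretch as gene-rich when the estimated density exceeds a fixed threshold, such as a multiple of the uniform density 1/L for a map of length L.

```python
import math

def gaussian_kde_density(positions, grid, bandwidth):
    """Gaussian kernel density estimate of gene positions, evaluated on grid.

    positions: map positions of genes; grid: evaluation points;
    bandwidth: kernel standard deviation, in the same units as positions.
    """
    n = len(positions)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    density = []
    for x in grid:
        s = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in positions)
        density.append(norm * s)
    return density

def gene_rich_windows(grid, density, threshold):
    """Return (start, end) grid intervals where density exceeds threshold.

    The threshold rule is a hypothetical choice for illustration.
    """
    regions = []
    start = None
    prev = None
    for x, d in zip(grid, density):
        if d > threshold and start is None:
            start = x                      # entering a gene-rich stretch
        elif d <= threshold and start is not None:
            regions.append((start, prev))  # leaving it: close at last point above threshold
            start = None
        prev = x
    if start is not None:
        regions.append((start, prev))      # stretch runs to the end of the grid
    return regions
```

    For example, with genes clustered near 10 to 13 and 80 to 84 cM plus one isolated gene at 50 cM on a 100 cM map, a bandwidth of 2 cM and a threshold of three times the uniform density (3/100) flag the two clusters but not the isolated gene. The bandwidth controls the trade-off: larger values merge neighbouring clusters, smaller ones fragment them.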

  9. The phylogenetic position of the roughskin skate Dipturus trachyderma (Krefft & Stehmann, 1975) (Rajiformes, Rajidae) inferred from the mitochondrial genome.

    PubMed

    Vargas-Caro, Carolina; Bustamante, Carlos; Lamilla, Julio; Bennett, Michael B; Ovenden, Jennifer R

    2016-07-01

    The complete mitochondrial genome of the roughskin skate Dipturus trachyderma is described from 1 455 724 sequence reads obtained using Illumina NGS technology. The total length of the mitogenome was 16 909 base pairs, comprising 2 rRNAs, 13 protein-coding genes, 22 tRNAs and 2 non-coding regions. Phylogenetic analysis based on mtDNA revealed low genetic divergence among longnose skates, in particular those dwelling on the continental shelf and slope off the coasts of Chile and Argentina. PMID:26122334

  10. Viral genome structures are optimal for capsid assembly

    PubMed Central

    Perlmutter, Jason D; Qiao, Cong; Hagan, Michael F

    2013-01-01

    Understanding how virus capsids assemble around their nucleic acid (NA) genomes could promote efforts to block viral propagation or to reengineer capsids for gene therapy applications. We develop a coarse-grained model of capsid proteins and NAs with which we investigate assembly dynamics and thermodynamics. In contrast to recent theoretical models, we find that capsids spontaneously ‘overcharge’; that is, the negative charge of the NA exceeds the positive charge on the capsid. When applied to specific viruses, the optimal NA lengths closely correspond to the natural genome lengths. Calculations based on linear polyelectrolytes rather than base-paired NAs underpredict the optimal length, demonstrating the importance of NA structure to capsid assembly. These results suggest that electrostatics, excluded volume, and NA tertiary structure are sufficient to predict assembly thermodynamics, and that the ability of viruses to selectively encapsidate their genomic NAs can be explained, at least in part, on a thermodynamic basis. DOI: http://dx.doi.org/10.7554/eLife.00632.001 PMID:23795290

  11. Meet me halfway: when genomics meets structural bioinformatics.

    PubMed

    Gong, Sungsam; Worth, Catherine L; Cheng, Tammy M K; Blundell, Tom L

    2011-06-01

    The DNA sequencing technology developed by Frederick Sanger in the 1970s established genomics as the basis of comparative genetics. The recent invention of next-generation sequencing (NGS) platforms has added a new dimension to genome research by generating sequencing data at unprecedented speed and throughput. The advent of NGS technology also provides the opportunity to study genetic diseases where sequence variants or mutations are sought to establish a causal relationship with disease phenotypes. However, identifying the genetic variants responsible for genetic diseases is not a trivial task, and it is even harder for complex diseases such as diabetes and cancers. In such polygenic diseases, multiple genes and alleles, which can exist in healthy individuals, come together to contribute to common disease phenotypes in a complex manner. Hence, it is desirable to have an approach that integrates omics data with both knowledge of protein structure and function and an understanding of networks/pathways, i.e. functional genomics and systems biology; in this way, genotype-phenotype relationships can be better understood. In this review, we bring this 'bottom-up' approach alongside current NGS-driven studies of genetic variation and disease aetiology. We describe experimental and computational techniques for assessing genetic variants and their deleterious effects on protein structure and function. PMID:21350909

  12. Consistency and inconsistency of consensus methods for inferring species trees from gene trees in the presence of ancestral population structure.

    PubMed

    DeGiorgio, Michael; Rosenberg, Noah A

    2016-08-01

    In the last few years, several statistically consistent consensus methods for species tree inference have been devised that are robust to the gene tree discordance caused by incomplete lineage sorting in unstructured ancestral populations. One source of gene tree discordance that has only recently been identified as a potential obstacle for phylogenetic inference is ancestral population structure. In this article, we describe a general model of ancestral population structure, and by relying on a single carefully constructed example scenario, we show that the consensus methods Democratic Vote, STEAC, STAR, R* Consensus, Rooted Triple Consensus, Minimize Deep Coalescences, and Majority-Rule Consensus are statistically inconsistent under the model. We find that among the consensus methods evaluated, the only method that is statistically consistent in the presence of ancestral population structure is GLASS/Maximum Tree. We use simulations to evaluate the behavior of the various consensus methods in a model with ancestral population structure, showing that as the number of gene trees increases, estimates on the basis of GLASS/Maximum Tree approach the true species tree topology irrespective of the level of population structure, whereas estimates based on the remaining methods only approach the true species tree topology if the level of structure is low. However, through simulations using species trees both with and without ancestral population structure, we show that GLASS/Maximum Tree performs unusually poorly on gene trees inferred from alignments with little information. This practical limitation of GLASS/Maximum Tree, together with the inconsistency of the other methods, highlights the need both for further testing of additional existing methods and for the development of novel methods under conditions that incorporate ancestral population structure. PMID:27086043

  13. Adaptive change inferred from genomic population analysis of the ST93 epidemic clone of community-associated methicillin-resistant Staphylococcus aureus.

    PubMed

    Stinear, Timothy P; Holt, Kathryn E; Chua, Kyra; Stepnell, Justin; Tuck, Kellie L; Coombs, Geoffrey; Harrison, Paul Francis; Seemann, Torsten; Howden, Benjamin P

    2014-02-01

    Community-associated methicillin-resistant Staphylococcus aureus (CA-MRSA) has emerged as a major public health problem around the world. In Australia, ST93-IV[2B] is the dominant CA-MRSA clone and displays significantly greater virulence than other S. aureus. Here, we have examined the evolution of ST93 via genomic analysis of 12 MSSA and 44 MRSA ST93 isolates, collected from around Australia over a 17-year period. Comparative analysis revealed a core genome of 2.6 Mb, sharing greater than 99.7% nucleotide identity. The accessory genome was 0.45 Mb and comprised additional mobile DNA elements, harboring resistance to erythromycin, trimethoprim, and tetracycline. Phylogenetic inference revealed a molecular clock and suggested that a single clone of methicillin susceptible, Panton-Valentine leukocidin (PVL) positive, ST93 S. aureus likely spread from North Western Australia in the early 1970s, acquiring methicillin resistance at least twice in the mid 1990s. We also explored associations between genotype and important MRSA phenotypes including oxacillin MIC and production of exotoxins (α-hemolysin [Hla], δ-hemolysin [Hld], PSMα3, and PVL). High-level expression of Hla is a signature feature of ST93 and reduced expression in eight isolates was readily explained by mutations in the agr locus. However, subtle but significant decreases in Hld were also noted over time that coincided with decreasing oxacillin resistance and were independent of agr mutations. The evolution of ST93 S. aureus is thus associated with a reduction in both exotoxin expression and oxacillin MIC, suggesting MRSA ST93 isolates are under pressure for adaptive change. PMID:24482534

  14. Adaptive Change Inferred from Genomic Population Analysis of the ST93 Epidemic Clone of Community-Associated Methicillin-Resistant Staphylococcus aureus

    PubMed Central

    Stinear, Timothy P.; Holt, Kathryn E.; Chua, Kyra; Stepnell, Justin; Tuck, Kellie L.; Coombs, Geoffrey; Harrison, Paul Francis; Seemann, Torsten; Howden, Benjamin P.

    2014-01-01

    Community-associated methicillin-resistant Staphylococcus aureus (CA-MRSA) has emerged as a major public health problem around the world. In Australia, ST93-IV[2B] is the dominant CA-MRSA clone and displays significantly greater virulence than other S. aureus. Here, we have examined the evolution of ST93 via genomic analysis of 12 MSSA and 44 MRSA ST93 isolates, collected from around Australia over a 17-year period. Comparative analysis revealed a core genome of 2.6 Mb, sharing greater than 99.7% nucleotide identity. The accessory genome was 0.45 Mb and comprised additional mobile DNA elements, harboring resistance to erythromycin, trimethoprim, and tetracycline. Phylogenetic inference revealed a molecular clock and suggested that a single clone of methicillin susceptible, Panton-Valentine leukocidin (PVL) positive, ST93 S. aureus likely spread from North Western Australia in the early 1970s, acquiring methicillin resistance at least twice in the mid 1990s. We also explored associations between genotype and important MRSA phenotypes including oxacillin MIC and production of exotoxins (α-hemolysin [Hla], δ-hemolysin [Hld], PSMα3, and PVL). High-level expression of Hla is a signature feature of ST93 and reduced expression in eight isolates was readily explained by mutations in the agr locus. However, subtle but significant decreases in Hld were also noted over time that coincided with decreasing oxacillin resistance and were independent of agr mutations. The evolution of ST93 S. aureus is thus associated with a reduction in both exotoxin expression and oxacillin MIC, suggesting MRSA ST93 isolates are under pressure for adaptive change. PMID:24482534

  15. Complete genome and molecular epidemiological data infer the maintenance of rabies among kudu (Tragelaphus strepsiceros) in Namibia.

    PubMed

    Scott, Terence P; Fischer, Melina; Khaiseb, Siegfried; Freuling, Conrad; Höper, Dirk; Hoffmann, Bernd; Markotter, Wanda; Müller, Thomas; Nel, Louis H

    2013-01-01

    Rabies in kudu is unique to Namibia, and two major peaks in the epizootic have occurred since it was first noted in 1977. Because of the large numbers of kudu that were affected, it was suspected that horizontal transmission of rabies occurs among kudu and that rabies was being maintained independently within the Namibian kudu population - separate from canid cycles, despite geographic overlap. In this study, we aimed to show, through phylogenetic analyses, that rabies was being maintained independently within the Namibian kudu population. We also tested, through complete genome sequencing of four rabies virus isolates from jackal and kudu, whether specific mutations occurred in the virus genome due to host adaptation. All rabies isolates from kudu grouped separately from those of canid species in Namibia, suggesting that rabies was being maintained independently in kudu. Additionally, we noted several mutations unique to isolates from kudu, suggesting that these mutations may be due to the adaptation of rabies to a new host. In conclusion, we show clear evidence that rabies is being maintained independently in the Namibian kudu population - a unique phenomenon with ecological and economic impacts. PMID:23527015

  16. Complete Genome and Molecular Epidemiological Data Infer the Maintenance of Rabies among Kudu (Tragelaphus strepsiceros) in Namibia

    PubMed Central

    Scott, Terence P.; Fischer, Melina; Khaiseb, Siegfried; Freuling, Conrad; Höper, Dirk; Hoffmann, Bernd; Markotter, Wanda; Müller, Thomas; Nel, Louis H.

    2013-01-01

    Rabies in kudu is unique to Namibia, and two major peaks in the epizootic have occurred since it was first noted in 1977. Because of the large numbers of kudu that were affected, it was suspected that horizontal transmission of rabies occurs among kudu and that rabies was being maintained independently within the Namibian kudu population – separate from canid cycles, despite geographic overlap. In this study, we aimed to show, through phylogenetic analyses, that rabies was being maintained independently within the Namibian kudu population. We also tested, through complete genome sequencing of four rabies virus isolates from jackal and kudu, whether specific mutations occurred in the virus genome due to host adaptation. All rabies isolates from kudu grouped separately from those of canid species in Namibia, suggesting that rabies was being maintained independently in kudu. Additionally, we noted several mutations unique to isolates from kudu, suggesting that these mutations may be due to the adaptation of rabies to a new host. In conclusion, we show clear evidence that rabies is being maintained independently in the Namibian kudu population – a unique phenomenon with ecological and economic impacts. PMID:23527015

  17. Structural Analysis of Treatment Cycles Representing Transitions between Nursing Organizational Units Inferred from Diabetes

    PubMed Central

    Dehmer, Matthias; Kurt, Zeyneb; Emmert-Streib, Frank; Them, Christa; Schulc, Eva; Hofer, Sabine

    2015-01-01

    In this paper, we investigate treatment cycles inferred from diabetes data by means of graph theory. We define the term treatment cycles graph-theoretically and perform a descriptive as well as quantitative analysis thereof. Also, we interpret our findings in terms of nursing and clinical management. PMID:26030296

  18. Rapid evolution and complex structural organization in genomic regions harboring multiple prolamin genes in the polyploid wheat genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genes encoding wheat prolamins belong to complicated multi-gene families in the wheat genome. To understand the structural complexity of storage protein loci, we sequenced and analyzed orthologous regions containing both gliadin and LMW-glutenin genes from the A and B genomes of a tetraploid wheat ...

  19. Genomic Characterization of Prenatally Detected Chromosomal Structural Abnormalities Using Oligonucleotide Array Comparative Genomic Hybridization

    PubMed Central

    Li, Peining; Pomianowski, Pawel; DiMaio, Miriam S.; Florio, Joanne R.; Rossi, Michael R.; Xiang, Bixia; Xu, Fang; Yang, Hui; Geng, Qian; Xie, Jiansheng; Mahoney, Maurice J.

    2013-01-01

    Detection of chromosomal structural abnormalities using conventional cytogenetic methods poses a challenge for prenatal genetic counseling due to unpredictable clinical outcomes and risk of recurrence. Of 1,726 prenatal cases seen over a 3-year period, we performed oligonucleotide array comparative genomic hybridization (aCGH) analysis on the 11 cases in which various structural chromosomal abnormalities had been detected. In nine cases, genomic aberrations and gene contents involving a 3p distal deletion, a marker chromosome from chromosome 4, a derivative chromosome 5 from a 5p/7q translocation, a de novo distal 6q deletion, a recombinant chromosome 8 composed of an 8p duplication and an 8q deletion, an extra derivative chromosome 9 from an 8p/9q translocation, mosaicism for chromosome 12q with added material of initially unknown origin, an unbalanced 13q/15q rearrangement, and a distal 18q duplication and deletion were delineated. No pathogenic copy number changes were found in one case with a de novo 11q/14q translocation or in another with a familial insertion of 21q into a 19q. Genomic characterization of the structural abnormalities aided in the prediction of clinical outcomes. These results demonstrated the value of aCGH analysis in prenatal cases with subtle or complex chromosomal rearrangements. Furthermore, a retrospective analysis of the clinical indications of our prenatal cases showed that approximately 20% of them had abnormal ultrasound findings and should be considered high-risk pregnancies for a combined chromosome and aCGH analysis. PMID:21671377

  20. X-ray scattering data and structural genomics

    NASA Astrophysics Data System (ADS)

    Doniach, Sebastian

    2003-03-01

    High-throughput structural genomics has the ambitious goal of determining the structure of all, or a very large number, of protein folds using the high-resolution techniques of protein crystallography and NMR. However, the program faces significant bottlenecks in reaching this goal, including problems of protein expression and crystallization. In this talk, some preliminary results on how the low-resolution technique of small-angle X-ray solution scattering (SAXS) can help ameliorate some of these bottlenecks will be presented. One of the most significant bottlenecks arises from the difficulty of crystallizing integral membrane proteins, for which only a handful of structures are available compared to thousands of structures for soluble proteins. By 3-dimensional reconstruction from SAXS data, the size and shape of detergent-solubilized integral membrane proteins can be characterized. This information can then be used to classify membrane proteins, which are encoded by some 25% of the genes in a genome. SAXS may also be used to study the dependence of interparticle interference scattering on solvent conditions, so that regions of the protein solution phase diagram which favor crystallization can be elucidated. As a further application, SAXS may be used to provide physical constraints on computational methods for protein structure prediction based on primary sequence information. This in turn can help in identifying structural homologs of a given protein, which can then give clues to its function. Walther D, Cohen F, Doniach S. Reconstruction of low resolution three-dimensional density maps from one-dimensional small angle x-ray scattering data for biomolecules. J. Appl. Cryst. 33(2):350-363 (2000). Zheng WJ, Doniach S. Protein structure prediction constrained by solution X-ray scattering data and structural homology identification. J. Mol. Biol. 316(1):173-187 (2002).

  1. Structural Genomics: From Genes to Structures With Valuable Materials And Many Questions in Between

    SciTech Connect

    Fox, B.G.; Goulding, C.; Malkowski, M.G.; Stewart, L.; Deacon, A.; /SLAC, SSRL

    2009-04-30

    The Protein Structure Initiative (PSI), funded by the US National Institutes of Health (NIH), provides a framework for the development and systematic evaluation of methods to solve protein structures. Although the PSI and other structural genomics efforts around the world have led to the solution of many new protein structures as well as the development of new methods, methodological bottlenecks still exist and are being addressed in this 'production phase' of PSI.

  2. The Seattle Structural Genomics Center for Infectious Disease (SSGCID)

    PubMed Central

    Myler, P.J.; Stacy, R.; Stewart, L.; Staker, B.L.; Van Voorhis, W.C.; Varani, G.; Buchko, G.W.

    2010-01-01

    The NIAID-funded Seattle Structural Genomics Center for Infectious Disease (SSGCID) is a consortium established to apply structural genomics approaches to potential drug targets from NIAID priority organisms for biodefense and emerging and re-emerging diseases. The mission of the SSGCID is to determine ~400 protein structures over five years ending in 2012. In order to maximize biomedical impact, ligand-based drug-lead discovery campaigns will be pursued for a small number of high-impact targets. Here we review the center’s target selection processes, which include pro-active engagement of the infectious disease research and drug therapy communities to identify drug targets, essential enzymes, virulence factors and vaccine candidates of biomedical relevance to combat infectious diseases. This is followed by a brief overview of the SSGCID structure determination pipeline and ligand screening methodology. Finally, specifics of our resources available to the scientific community are presented. Physical materials and data produced by SSGCID will be made available to the scientific community, with the aim that they will provide essential groundwork benefiting future research and drug discovery. PMID:19594426

  3. AFLP markers resolve intra-specific relationships and infer genetic structure among lineages of the canyon treefrog, Hyla arenicolor.

    PubMed

    Klymus, Katy E; Carl Gerhardt, H

    2012-11-01

    The canyon treefrog, Hyla arenicolor, is a wide-ranging hylid found from the southwestern US to southern Mexico. Recent studies have shown this species to have a complex evolutionary history, with several phylogeographically distinct lineages, a probable cryptic species, and multiple episodes of mitochondrial introgression with the sister group, the H. eximia complex. We aimed to use genome-wide AFLP markers to better resolve relationships within this group. As in other studies, our inferred phylogeny not only provides evidence for repeated mitochondrial introgression between H. arenicolor lineages and H. eximia/H. wrightorum, but also affords more resolution within the main H. arenicolor clade than was previously achieved with sequence data. However, as in a previous study, the placement of a lineage of H. arenicolor whose distribution is centered in the Balsas Basin of Mexico remains poorly resolved, perhaps due to past hybridization with the H. eximia complex. Furthermore, the AFLP data set shows no differentiation among lineages from the Grand Canyon and Colorado Plateau despite their large mitochondrial sequence divergence. Finally, our results recover a well-supported sister relationship between this combined Colorado Plateau/Grand Canyon lineage and the Sonoran Desert lineage, a relationship that strongly contradicts conclusions drawn from the mtDNA evidence. Our study provides a basis for further behavioral and ecological speciation studies of this system and highlights the importance of multi-taxon (species) sampling in phylogenetic and phylogeographic studies. PMID:22898531

  4. The phylogenetic position of the giant devil ray Mobula mobular (Bonnaterre, 1788) (Myliobatiformes, Myliobatidae) inferred from the mitochondrial genome.

    PubMed

    Bustamante, Carlos; Barría, Claudio; Vargas-Caro, Carolina; Ovenden, Jennifer R; Bennett, Michael B

    2016-09-01

    The giant devil ray, Mobula mobular, is a member of one of the most distinct groups of cartilaginous fishes, the Mobulidae (manta and devil rays), and is the only mobulid assessed as Endangered, due to its restricted distribution, high bycatch mortality and suspected population decline. The complete mitochondrial genome is 18 913 base pairs in length and comprises 2 rRNAs, 13 protein-coding genes, 22 tRNAs and 2 non-coding regions. Comparison with the partial mitogenome of M. japanica suggests a sister-cryptic species complex and two different taxonomic units. However, the limited divergence within the species (>99.9% genetic identity) may be the result of a geographically and numerically restricted population of M. mobular within the Mediterranean Sea. PMID:26260171

  5. Genomic Heterogeneity and Structural Variation in Soybean Near Isogenic Lines

    PubMed Central

    Stec, Adrian O.; Bhaskar, Pudota B.; Bolon, Yung-Tsi; Nolan, Rebecca; Shoemaker, Randy C.; Vance, Carroll P.; Stupar, Robert M.

    2013-01-01

    Near isogenic lines (NILs) are a critical genetic resource for the soybean research community. The ability to identify and characterize the genes driving the phenotypic differences between NILs is limited by the degree to which differential genetic introgressions can be resolved. Furthermore, the genetic heterogeneity extant among NIL sub-lines is an unaddressed research topic that might have implications for how genomic and phenotypic data from NILs are utilized. In this study, a recently developed high-resolution comparative genomic hybridization (CGH) platform was used to investigate the structure and diversity of genetic introgressions in two classical soybean NIL populations, respectively varying in protein content and iron deficiency chlorosis (IDC) susceptibility. There were three objectives: assess the capacity for CGH to resolve genomic introgressions, identify introgressions that are heterogeneous among NIL sub-lines, and associate heterogeneous introgressions with susceptibility to IDC. Using the CGH approach, introgression boundaries were refined and previously unknown introgressions were revealed. Furthermore, heterogeneous introgressions were identified within seven sub-lines of the IDC NIL “IsoClark.” This included three distinct introgression haplotypes linked to the major iron susceptible locus on chromosome 03. A phenotypic assessment of the seven sub-lines did not reveal any differences in IDC susceptibility, indicating that the genetic heterogeneity among the lines does not have a significant impact on the primary NIL phenotype. PMID:23630538

  6. A Roadmap for Functional Structural Variants in the Soybean Genome

    PubMed Central

    Anderson, Justin E.; Kantar, Michael B.; Kono, Thomas Y.; Fu, Fengli; Stec, Adrian O.; Song, Qijian; Cregan, Perry B.; Specht, James E.; Diers, Brian W.; Cannon, Steven B.; McHale, Leah K.; Stupar, Robert M.

    2014-01-01

    Gene structural variation (SV) has recently emerged as a key genetic mechanism underlying several important phenotypic traits in crop species. We screened a panel of 41 soybean (Glycine max) accessions serving as parents in a soybean nested association mapping population for deletions and duplications in more than 53,000 gene models. Array hybridization and whole genome resequencing methods were used as complementary technologies to identify SV in 1528 genes, or approximately 2.8%, of the soybean gene models. Although SV occurs throughout the genome, SV enrichment was noted in families of biotic defense response genes. Among accessions, SV was nearly eightfold less frequent for gene models that have retained paralogs since the last whole genome duplication event, compared with genes that have not retained paralogs. Increases in gene copy number, similar to that described at the Rhg1 resistance locus, account for approximately one-fourth of the genic SV events. This assessment of soybean SV occurrence presents a target list of genes potentially responsible for rapidly evolving and/or adaptive traits. PMID:24855315

  7. The impact of structural genomics: the first quindecennial.

    PubMed

    Grabowski, Marek; Niedzialkowska, Ewa; Zimmerman, Matthew D; Minor, Wladek

    2016-03-01

    The period 2000-2015 brought the advent of high-throughput approaches to protein structure determination. With overall funding on the order of $2 billion (in 2010 dollars), the structural genomics (SG) consortia established worldwide have developed pipelines for target selection, protein production, sample preparation, crystallization, and structure determination by X-ray crystallography and NMR. These efforts resulted in the determination of over 13,500 protein structures, mostly from unique protein families, and increased the structural coverage of the expanding protein universe. SG programs contributed over 4400 publications to the scientific literature. The NIH-funded Protein Structure Initiatives alone have produced over 2000 scientific publications, which to date have attracted more than 93,000 citations. Software and database developments that were necessary to handle high-throughput structure determination workflows have led to structures of better quality and improved integrity of the associated data. Organized and accessible data have a positive impact on the reproducibility of scientific experiments. Most of the experimental data generated by the SG centers are freely available to the community and have been used by scientists in various fields of research. SG projects have created, improved, streamlined, and validated many protocols for protein production and crystallization, data collection, and functional analysis, significantly benefiting biological and biomedical research. PMID:26935210

  8. Population structure and minimum core genome typing of Legionella pneumophila

    PubMed Central

    Qin, Tian; Zhang, Wen; Liu, Wenbin; Zhou, Haijian; Ren, Hongyu; Shao, Zhujun; Lan, Ruiting; Xu, Jianguo

    2016-01-01

    Legionella pneumophila is an important human pathogen causing Legionnaires’ disease. In this study, whole genome sequencing (WGS) was used to study the characteristics and population structure of L. pneumophila strains. We sequenced and compared 53 isolates of L. pneumophila covering different serogroups and sequence-based typing (SBT) types (STs). We found that 1,896 single-copy orthologous genes were shared by all isolates and were defined as the minimum core genome (MCG) of L. pneumophila. A total of 323,224 single-nucleotide polymorphisms (SNPs) were identified among the 53 strains. After excluding 314,059 SNPs which were likely to be results of recombination, the remaining 9,165 SNPs were referred to as MCG SNPs. Population structure analysis based on the MCG divided the 53 L. pneumophila isolates into nine MCG groups. The within-group distances were much smaller than the between-group distances, indicating considerable divergence between MCG groups. MCG groups were also supported by phylogenetic analysis and may be considered robust taxonomic units within L. pneumophila. Among the nine MCG groups, eight showed high intracellular growth ability while one showed low intracellular growth ability. Furthermore, MCG typing also showed high resolution in subtyping ST1 strains. The results obtained in this study provided significant insights into the evolution, population structure and pathogenicity of L. pneumophila. PMID:26888563

  10. Simple repetitive sequences in the genome: structure and functional significance.

    PubMed

    Brahmachari, S K; Meera, G; Sarkar, P S; Balagurumoorthy, P; Tripathi, J; Raghavan, S; Shaligram, U; Pataskar, S

    1995-09-01

    The current explosion of DNA sequence information has generated increasing evidence for the claim that noncoding repetitive DNA sequences present within and around different genes could play an important role in genetic control processes, although the precise role and mechanism by which these sequences function are poorly understood. Several of the simple repetitive sequences which occur in a large number of loci throughout the human and other eukaryotic genomes satisfy the sequence criteria for forming non-B DNA structures in vitro. We have summarized some of the features of three different types of simple repeats that highlight the importance of repetitive DNA in the control of gene expression and chromatin organization. (i) (TG/CA)n repeats are widespread and conserved in many loci. These sequences are associated with nucleosomes of varying linker length and may play a role in chromatin organization. These Z-potential sequences can help absorb superhelical stress during transcription and aid in recombination. (ii) Human telomeric repeat (TTAGGG)n adopts a novel quadruplex structure and exhibits unusual chromatin organization. This unusual structural motif could explain chromosome pairing and stability. (iii) Intragenic amplification of (CTG)n/(CAG)n trinucleotide repeat, which is now known to be associated with several genetic disorders, could down-regulate gene expression in vivo. The overall implications of these findings vis-à-vis repetitive sequences in the genome are summarized. PMID:8582360

  11. First all-in-one diagnostic tool for DNA intelligence: genome-wide inference of biogeographic ancestry, appearance, relatedness, and sex with the Identitas v1 Forensic Chip.

    PubMed

    Keating, Brendan; Bansal, Aruna T; Walsh, Susan; Millman, Jonathan; Newman, Jonathan; Kidd, Kenneth; Budowle, Bruce; Eisenberg, Arthur; Donfack, Joseph; Gasparini, Paolo; Budimlija, Zoran; Henders, Anjali K; Chandrupatla, Hareesh; Duffy, David L; Gordon, Scott D; Hysi, Pirro; Liu, Fan; Medland, Sarah E; Rubin, Laurence; Martin, Nicholas G; Spector, Timothy D; Kayser, Manfred

    2013-05-01

    When a forensic DNA sample cannot be associated directly with a previously genotyped reference sample by standard short tandem repeat profiling, the investigation required for identifying perpetrators, victims, or missing persons can be both costly and time consuming. Here, we describe the outcome of a collaborative study using the Identitas Version 1 (v1) Forensic Chip, the first commercially available all-in-one tool dedicated to the concept of developing intelligence leads based on DNA. The chip allows parallel interrogation of 201,173 genome-wide autosomal, X-chromosomal, Y-chromosomal, and mitochondrial single nucleotide polymorphisms for inference of biogeographic ancestry, appearance, relatedness, and sex. The first assessment of the chip's performance was carried out on 3,196 blinded DNA samples of varying quantities and qualities, covering a wide range of biogeographic origin and eye/hair coloration as well as variation in relatedness and sex. Overall, 95 % of the samples (N = 3,034) passed quality checks with an overall genotype call rate >90 % on variable numbers of available recorded trait information. Predictions of sex, direct match, and first to third degree relatedness were highly accurate. Chip-based predictions of biparental continental ancestry were on average ~94 % correct (further support provided by separately inferred patrilineal and matrilineal ancestry). Predictions of eye color were 85 % correct for brown and 70 % correct for blue eyes, and predictions of hair color were 72 % for brown, 63 % for blond, 58 % for black, and 48 % for red hair. From the 5 % of samples (N = 162) with <90 % call rate, 56 % yielded correct continental ancestry predictions while 7 % yielded sufficient genotypes to allow hair and eye color prediction. Our results demonstrate that the Identitas v1 Forensic Chip holds great promise for a wide range of applications including criminal investigations, missing person investigations, and for national security

  12. Demographic inferences using short-read genomic data in an approximate Bayesian computation framework: in silico evaluation of power, biases and proof of concept in Atlantic walrus.

    PubMed

    Shafer, Aaron B A; Gattepaille, Lucie M; Stewart, Robert E A; Wolf, Jochen B W

    2015-01-01

    Approximate Bayesian computation (ABC) is a powerful tool for model-based inference of demographic histories from large genetic data sets. For most organisms, its implementation has been hampered by the lack of sufficient genetic data. Genotyping-by-sequencing (GBS) provides cheap genome-scale data to fill this gap, but its potential has not fully been exploited. Here, we explored power, precision and biases of a coalescent-based ABC approach where GBS data were modelled with either a population mutation parameter (θ) or a fixed site (FS) approach, allowing single or several segregating sites per locus. With simulated data ranging from 500 to 50 000 loci, a variety of demographic models could be reliably inferred across a range of timescales and migration scenarios. Posterior estimates were informative with 1000 loci for migration and split time in simple population divergence models. In more complex models, posterior distributions were wide and almost reverted to the uninformative prior even with 50 000 loci. ABC parameter estimates, however, were generally more accurate than an alternative composite-likelihood method. Bottleneck scenarios proved particularly difficult, and only recent bottlenecks without recovery could be reliably detected and dated. Notably, minor-allele-frequency filters - usual practice for GBS data - negatively affected nearly all estimates. With this in mind, we used a combination of FS and θ approaches on empirical GBS data generated from the Atlantic walrus (Odobenus rosmarus rosmarus), collectively providing support for a population split before the last glacial maximum followed by asymmetrical migration and a high Arctic bottleneck. Overall, this study evaluates the potential and limitations of GBS data in an ABC-coalescence framework and proposes a best-practice approach. PMID:25482153
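As a minimal illustration of the rejection step that underlies ABC (not the coalescent ABC pipeline evaluated in this study), the sketch below infers a single population mutation parameter θ. It assumes one panmictic population under the standard neutral coalescent, a uniform prior on θ, the mean number of segregating sites per locus as the only summary statistic, and an arbitrary tolerance; the function names and all parameter values are hypothetical.

```python
import random


def simulate_segregating_sites(theta: float, n: int, rng: random.Random) -> int:
    """Segregating sites at one locus under the standard neutral coalescent
    for n sampled sequences and population mutation parameter theta."""
    # Total branch length in coalescent time units: while k lineages remain,
    # the waiting time is Exp(rate k(k-1)/2) and all k lineages accrue length.
    total_length = 0.0
    for k in range(n, 1, -1):
        total_length += k * rng.expovariate(k * (k - 1) / 2)
    if theta <= 0:
        return 0
    # Mutations arise as a Poisson process with rate theta/2 per unit branch
    # length, so count exponential arrivals that fall within total_length.
    count = 0
    t = rng.expovariate(theta / 2)
    while t < total_length:
        count += 1
        t += rng.expovariate(theta / 2)
    return count


def mean_sites(theta: float, n: int, n_loci: int, rng: random.Random) -> float:
    """Summary statistic: mean segregating sites over independent loci."""
    return sum(simulate_segregating_sites(theta, n, rng) for _ in range(n_loci)) / n_loci


def abc_rejection(observed: float, n: int, n_loci: int, n_draws: int,
                  tolerance: float, prior_max: float, rng: random.Random) -> list[float]:
    """Rejection ABC: keep prior draws whose simulated summary lies within
    `tolerance` of the observed summary."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.0, prior_max)  # Uniform(0, prior_max) prior
        if abs(mean_sites(theta, n, n_loci, rng) - observed) <= tolerance:
            accepted.append(theta)
    return accepted


if __name__ == "__main__":
    rng = random.Random(1)
    # Pseudo-observed data: 10 sequences, 50 loci, true theta = 5.
    observed = mean_sites(5.0, 10, 50, rng)
    posterior = abc_rejection(observed, n=10, n_loci=50, n_draws=1000,
                              tolerance=1.0, prior_max=15.0, rng=rng)
    print(len(posterior), sum(posterior) / len(posterior))
```

Shrinking the tolerance makes the accepted draws a closer approximation to the true posterior, at the cost of accepting fewer draws per simulation.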

  13. Southeast Asian origins of five Hill Tribe populations and correlation of genetic to linguistic relationships inferred with genome-wide SNP data.

    PubMed

    Listman, J B; Malison, R T; Sanichwankul, K; Ittiwut, C; Mutirangura, A; Gelernter, J

    2011-02-01

    In Thailand, the term Hill Tribe is used to describe populations whose members traditionally practice slash-and-burn agriculture and reside in the mountains. These tribes are thought to have migrated throughout Asia for up to 5,000 years, including migrations through Southern China and/or Southeast Asia. There have been continuous migrations southward from China into Thailand for approximately the past thousand years, and the present geographic range of any given tribe straddles multiple political borders. As none of these populations have autochthonous scripts, written histories have, until recently, been externally produced. Northern Asian, Tibetan, and Siberian origins of Hill Tribes have been proposed. All purport endogamy and have nonmutually intelligible languages. To test hypotheses regarding the geographic origins of these populations, relatedness and migrations among them and neighboring populations, and whether their genetic relationships correspond with their linguistic relationships, we analyzed 2,445 genome-wide SNP markers in 118 individuals from five Thai Hill Tribe populations (Akha, Hmong, Karen, Lahu, and Lisu), 90 individuals from majority Thai populations, and 826 individuals from Asian and Oceanian HGDP and HapMap populations using a Bayesian clustering method. Considering these results within the context of results of recent large-scale studies of Asian geographic genetic variation allows us to infer a shared Southeast Asian origin of these five Hill Tribe populations as well as ancestry components that distinguish among them, seen in successive levels of clustering. In addition, the inferred level of shared ancestry among the Hill Tribes corresponds well to relationships among their languages. PMID:20979205

  14. The complete mitochondrial genome structure of the jaguar (Panthera onca).

    PubMed

    Caragiulo, Anthony; Dougherty, Eric; Soto, Sofia; Rabinowitz, Salisa; Amato, George

    2016-01-01

    The jaguar (Panthera onca) is the largest felid in the Western hemisphere, and the only member of the Panthera genus in the New World. The jaguar inhabits most countries within Central and South America, and is considered near threatened by the International Union for the Conservation of Nature. This study represents the first sequence of the entire jaguar mitogenome, which was the only Panthera mitogenome that had not been sequenced. The jaguar mitogenome is 17,049 bases and possesses the same molecular structure as other felid mitogenomes. Bayesian inference (BI) and maximum likelihood (ML) were used to determine the phylogenetic placement of the jaguar within the Panthera genus. Both BI and ML analyses revealed the jaguar to be sister to the tiger/leopard/snow leopard clade. PMID:25010076

  15. RCK: accurate and efficient inference of sequence- and structure-based protein–RNA binding models from RNAcompete data

    PubMed Central

    Orenstein, Yaron; Wang, Yuhao; Berger, Bonnie

    2016-01-01

    Motivation: Protein–RNA interactions, which play vital roles in many processes, are mediated through both RNA sequence and structure. CLIP-based methods, which measure protein–RNA binding in vivo, suffer from experimental noise and systematic biases, whereas in vitro experiments capture a clearer signal of protein RNA-binding. Among them, RNAcompete provides binding affinities of a specific protein to more than 240 000 unstructured RNA probes in one experiment. The computational challenge is to infer RNA structure- and sequence-based binding models from these data. The state-of-the-art in sequence models, DeepBind, does not model structural preferences. RNAcontext models both sequence and structure preferences, but is outperformed by GraphProt. Unfortunately, GraphProt cannot detect structural preferences from RNAcompete data due to the unstructured nature of the data, as noted by its developers, nor can it be tractably run on the full RNAcompete dataset. Results: We develop RCK, an efficient, scalable algorithm that infers both sequence and structure preferences based on a new k-mer based model. Remarkably, even though RNAcompete data is designed to be unstructured, RCK can still learn structural preferences from it. RCK significantly outperforms both RNAcontext and DeepBind in in vitro binding prediction for 244 RNAcompete experiments. Moreover, RCK is also faster and uses less memory, which enables scalability. While currently on par with existing methods in in vivo binding prediction on a small scale test, we demonstrate that RCK will increasingly benefit from experimentally measured RNA structure profiles as compared to computationally predicted ones. By running RCK on the entire RNAcompete dataset, we generate and provide as a resource a set of protein–RNA structure-based models on an unprecedented scale. Availability and Implementation: Software and models are freely available at http://rck.csail.mit.edu/ Contact: bab@mit.edu

  16. GAGA: A New Algorithm for Genomic Inference of Geographic Ancestry Reveals Fine Level Population Substructure in Europeans

    PubMed Central

    Lao, Oscar; Liu, Fan; Wollstein, Andreas; Kayser, Manfred

    2014-01-01

    Attempts to detect genetic population substructure in humans are troubled by the fact that the vast majority of the total amount of observed genetic variation is present within populations rather than between populations. Here we introduce a new algorithm for transforming a genetic distance matrix that reduces the within-population variation considerably. Extensive computer simulations revealed that the transformed matrix captured the genetic population differentiation better than the original one which was based on the T1 statistic. In an empirical genomic data set comprising 2,457 individuals from 23 different European subpopulations, the proportion of individuals that were determined as a genetic neighbour to another individual from the same sampling location increased from 25% with the original matrix to 52% with the transformed matrix. Similarly, the percentage of genetic variation explained between populations by means of Analysis of Molecular Variance (AMOVA) increased from 1.62% to 7.98%. Furthermore, the first two dimensions of a classical multidimensional scaling (MDS) using the transformed matrix explained 15% of the variance, compared to 0.7% obtained with the original matrix. Application of MDS with Mclust, SPA with Mclust, and GemTools algorithms to the same dataset also showed that the transformed matrix gave a better association of the genetic clusters with the sampling locations, and particularly so when it was used in the AMOVA framework with a genetic algorithm. Overall, the new matrix transformation introduced here substantially reduces the within population genetic differentiation, and can be broadly applied to methods such as AMOVA to enhance their sensitivity to reveal population substructure. 
We herewith provide a publicly available (http://www.erasmusmc.nl/fmb/resources/GAGA) model-free method for improved genetic population substructure detection that can be applied to human as well as any other species data in future studies relevant to
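The neighbour-based evaluation described in this abstract can be computed from any pairwise genetic distance matrix. The sketch below assumes that "genetic neighbour" means the nearest other individual by distance (ties broken by index order), takes the matrix as nested lists with one sampling-location label per individual, and uses a hypothetical function name. It only scores a matrix; it does not implement the GAGA transformation, which the abstract does not specify.

```python
from typing import Sequence


def same_location_neighbour_rate(dist: Sequence[Sequence[float]],
                                 labels: Sequence[str]) -> float:
    """Fraction of individuals whose nearest other individual (smallest
    off-diagonal distance) carries the same sampling-location label."""
    n = len(labels)
    hits = 0
    for i in range(n):
        # Exclude the individual itself; min() keeps the first index on ties.
        nearest = min((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        if labels[nearest] == labels[i]:
            hits += 1
    return hits / n


if __name__ == "__main__":
    # Toy matrix: two tight pairs, each pair from one location.
    d = [[0, 1, 5, 5], [1, 0, 5, 5], [5, 5, 0, 2], [5, 5, 2, 0]]
    print(same_location_neighbour_rate(d, ["A", "A", "B", "B"]))
```

Comparing this rate for an original and a transformed matrix on the same individuals gives the kind of before/after contrast the abstract reports (25% versus 52%).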

  17. Structural genomic variation in childhood epilepsies with complex phenotypes

    PubMed Central

    Helbig, Ingo; Swinkels, Marielle E M; Aten, Emmelien; Caliebe, Almuth; van 't Slot, Ruben; Boor, Rainer; von Spiczak, Sarah; Muhle, Hiltrud; Jähn, Johanna A; van Binsbergen, Ellen; van Nieuwenhuizen, Onno; Jansen, Floor E; Braun, Kees P J; de Haan, Gerrit-Jan; Tommerup, Niels; Stephani, Ulrich; Hjalgrim, Helle; Poot, Martin; Lindhout, Dick; Brilstra, Eva H; Møller, Rikke S; Koeleman, Bobby PC

    2014-01-01

    A genetic contribution to a broad range of epilepsies has been postulated, and particularly copy number variations (CNVs) have emerged as significant genetic risk factors. However, the role of CNVs in patients with epilepsies with complex phenotypes is not known. Therefore, we investigated the role of CNVs in patients with unclassified epilepsies and complex phenotypes. A total of 222 patients from three European countries, including patients with structural lesions on magnetic resonance imaging (MRI), dysmorphic features, and multiple congenital anomalies, were clinically evaluated and screened for CNVs. MRI findings including acquired or developmental lesions and patient characteristics were subdivided and analyzed in subgroups. MRI data were available for 88.3% of patients, of whom 41.6% had abnormal MRI findings. Eighty-eight rare CNVs were discovered in 71 out of 222 patients (31.9%). Segregation of all identified variants could be assessed in 42 patients, 11 of which were de novo. The frequency of all structural variants and de novo variants was not statistically different between patients with or without MRI abnormalities or MRI subcategories. Patients with dysmorphic features were more likely to carry a rare CNV. Genome-wide screening methods for rare CNVs may provide clues for the genetic etiology in patients with a broader range of epilepsies than previously anticipated, including in patients with various brain anomalies detectable by MRI. Performing genome-wide screens for rare CNVs can be a valuable contribution to the routine diagnostic workup in patients with a broad range of childhood epilepsies. PMID:24281369

  18. Recognizing genes and other components of genomic structure

    SciTech Connect

    Burks, C.; Myers, E. (Dept. of Computer Science); Stormo, G.D. (Dept. of Molecular, Cellular and Developmental Biology)

    1991-01-01

    The Aspen Center for Physics (ACP) sponsored a three-week workshop, with 26 scientists participating, from 28 May to 15 June, 1990. The workshop, entitled Recognizing Genes and Other Components of Genomic Structure, focussed on discussion of current needs and future strategies for developing the ability to identify and predict the presence of complex functional units on sequenced, but otherwise uncharacterized, genomic DNA. We addressed the need for computationally-based, automatic tools for synthesizing available data about individual consensus sequences and local compositional patterns into the composite objects (e.g., genes) that are -- as composite entities -- the true object of interest when scanning DNA sequences. The workshop was structured to promote sustained informal contact and exchange of expertise between molecular biologists, computer scientists, and mathematicians. No participant stayed for less than one week, and most attended for two or three weeks. Computers, software, and databases were available for use as “electronic blackboards” and as the basis for collaborative exploration of ideas being discussed and developed at the workshop. 23 refs., 2 tabs.

  19. Structural constraints in the packaging of bluetongue virus genomic segments

    PubMed Central

    Burkhardt, Christiane; Sung, Po-Yu; Celma, Cristina C.

    2014-01-01

    The mechanism used by bluetongue virus (BTV) to ensure the sorting and packaging of its 10 genomic segments is still poorly understood. In this study, we investigated the packaging constraints for two BTV genomic segments from two different serotypes. Segment 4 (S4) of BTV serotype 9 was mutated sequentially and packaging of mutant ssRNAs was investigated by two newly developed RNA packaging assay systems, one in vivo and the other in vitro. Modelling of the mutated ssRNA followed by biochemical data analysis suggested that a conformational motif formed by interaction of the 5′ and 3′ ends of the molecule was necessary and sufficient for packaging. A similar structural signal was also identified in S8 of BTV serotype 1. Furthermore, the same conformational analysis of secondary structures for positive-sense ssRNAs was used to generate a chimeric segment that maintained the putative packaging motif but contained unrelated internal sequences. This chimeric segment was packaged successfully, confirming that the motif identified directs the correct packaging of the segment. PMID:24980574

  20. Secure web book to store structural genomics research data.

    PubMed

    Manjasetty, Babu A; Höppner, Klaus; Mueller, Uwe; Heinemann, Udo

    2003-01-01

    Recently established collaborative structural genomics programs aim at significantly accelerating the crystal structure analysis of proteins. These large-scale projects require efficient data management systems to ensure seamless collaboration between different groups of scientists working towards the same goal. Within the Berlin-based Protein Structure Factory, the synchrotron X-ray data collection and the subsequent crystal structure analysis tasks are located at BESSY, a third-generation synchrotron source. To organize file-based communication and data transfer at the BESSY site of the Protein Structure Factory, we have developed the web-based BCLIMS, the BESSY Crystallography Laboratory Information Management System. BCLIMS is a relational data management system which is powered by MySQL as the database engine and Apache HTTP as the web server. The database interface routines are written in the Python programming language. The software is freely available to academic users. Here we describe the storage, retrieval and manipulation of laboratory information, mainly pertaining to the synchrotron X-ray diffraction experiments and the subsequent protein structure analysis, using BCLIMS. PMID:14649296

  1. The impact of population structure on genomic prediction in stratified populations.

    PubMed

    Guo, Zhigang; Tucker, Dominic M; Basten, Christopher J; Gandhi, Harish; Ersoz, Elhan; Guo, Baohong; Xu, Zhanyou; Wang, Daolong; Gay, Gilles

    2014-03-01

    Impacts of population structure on the evaluation of genomic heritability and prediction were investigated and quantified using high-density markers in diverse panels in rice and maize. Population structure is an important factor affecting estimation of genomic heritability and assessment of genomic prediction in stratified populations. In this study, our first objective was to assess effects of population structure on estimations of genomic heritability using the diversity panels in rice and maize. Results indicate population structure explained 33 and 7.5% of genomic heritability for rice and maize, respectively, depending on traits, with the remaining heritability explained by within-subpopulation variation. Estimates of within-subpopulation heritability were higher than those derived from quantitative trait loci identified in genome-wide association studies, suggesting 65% improvement in genetic gains. The second objective was to evaluate effects of population structure on genomic prediction using cross-validation experiments. When population structure exists in both training and validation sets, correcting for population structure led to a significant decrease in accuracy with genomic prediction. In contrast, when prediction was limited to a specific subpopulation, population structure showed little effect on accuracy and within-subpopulation genetic variance dominated predictions. Finally, effects of genomic heritability on genomic prediction were investigated. Accuracies with genomic prediction increased with genomic heritability in both training and validation sets, with the former showing a slightly greater impact. In summary, our results suggest that the population structure contribution to genomic prediction varies based on prediction strategies, and is also affected by the genetic architectures of traits and populations. In practical breeding, these conclusions may be helpful to better understand and utilize the different genetic resources in genomic

  2. Ecological Inference

    NASA Astrophysics Data System (ADS)

    King, Gary; Rosen, Ori; Tanner, Martin A.

    2004-09-01

    This collection of essays brings together a diverse group of scholars to survey the latest strategies for solving ecological inference problems in various fields. The last half-decade has witnessed an explosion of research in ecological inference--the process of trying to infer individual behavior from aggregate data. Although uncertainties and information lost in aggregation make ecological inference one of the most problematic types of research to rely on, these inferences are required in many academic fields, as well as by legislatures and the Courts in redistricting, by business in marketing research, and by governments in policy analysis.

  3. Backbone Solution Structures of Proteins Using Residual Dipolar Couplings: Application to a Novel Structural Genomics Target

    PubMed Central

    Valafar, H.; Mayer, K. L.; Bougault, C. M.; LeBlond, P. D.; Jenney, F. E.; Brereton, P. S.; Adams, M.W.W.; Prestegard, J.H.

    2006-01-01

    Structural genomics (or proteomics) activities are critically dependent on the availability of high-throughput structure determination methodology. Development of such methodology has been a particular challenge for NMR based structure determination because of the demands for isotopic labeling of proteins and the requirements for very long data acquisition times. We present here a methodology that gains efficiency from a focus on determination of backbone structures of proteins as opposed to full structures with all side chains in place. This focus is appropriate given the presumption that many protein structures in the future will be built using computational methods that start from representative fold family structures and replace as many as 70% of the side chains in the course of structure determination. The methodology we present is based primarily on residual dipolar couplings (RDCs), readily accessible NMR observables that constrain the orientation of backbone fragments irrespective of separation in space. A new software tool is described for the assembly of backbone fragments under RDC constraints and an application to a structural genomics target is presented. The target is an 8.7 kDa protein from Pyrococcus furiosus, PF1061, that was previously not well annotated, and had a nearest structurally characterized neighbor with only 33% sequence identity. The structure produced shows structural similarity to this sequence homologue, but also shows similarity to other proteins that suggests a functional role in sulfur transfer. Given the backbone structure and a possible functional link this should be an ideal target for development of modeling methods. PMID:15704012

  4. Studies on cattle genomic structural variation provide insights into ruminant speciation and adaptation

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genomic structural variations, including segmental duplications (SD) and copy number variations (CNV), contribute significantly to individual health and disease in primates and rodents. As a part of the bovine genome annotation effort, we performed the first genome-wide analysis of SD in cattle usin...

  5. Physical limits of inference

    NASA Astrophysics Data System (ADS)

    Wolpert, David H.

    2008-07-01

    We show that physical devices that perform observation, prediction, or recollection share an underlying mathematical structure. We call devices with that structure “inference devices”. We present a set of existence and impossibility results concerning inference devices. These results hold independent of the precise physical laws governing our universe. In a limited sense, the impossibility results establish that Laplace was wrong to claim that even in a classical, non-chaotic universe the future can be unerringly predicted, given sufficient knowledge of the present. Alternatively, these impossibility results can be viewed as a non-quantum-mechanical “uncertainty principle”. The mathematics of inference devices has close connections to the mathematics of Turing Machines (TMs). In particular, the impossibility results for inference devices are similar to the Halting theorem for TMs. Furthermore, one can define an analog of Universal TMs (UTMs) for inference devices. We call those analogs “strong inference devices”. We use strong inference devices to define the “inference complexity” of an inference task, which is the analog of the Kolmogorov complexity of computing a string. A task-independent bound is derived on how much the inference complexity of an inference task can differ for two different inference devices. This is analogous to the “encoding” bound governing how much the Kolmogorov complexity of a string can differ between two UTMs used to compute that string. However, no universe can contain more than one strong inference device. So whereas the Kolmogorov complexity of a string is arbitrary up to specification of the UTM, there is no such arbitrariness in the inference complexity of an inference task. We informally discuss the philosophical implications of these results, e.g., for whether the universe “is” a computer. We also derive some graph-theoretic properties governing any set of multiple inference devices. We also present an

  6. 3D structures of membrane proteins from genomic sequencing

    PubMed Central

    Hopf, Thomas A.; Colwell, Lucy J.; Sheridan, Robert; Rost, Burkhard; Sander, Chris; Marks, Debora S.

    2012-01-01

    Summary: We show that amino acid co-variation in proteins, extracted from the evolutionary sequence record, can be used to fold transmembrane proteins. We use this technique to predict previously unknown, 3D structures for 11 transmembrane proteins (with up to 14 helices) from their sequences alone. The prediction method (EVfold_membrane) applies a maximum entropy approach to infer evolutionary co-variation in pairs of sequence positions within a protein family and then generates all-atom models with the derived pairwise distance constraints. We benchmark the approach with blinded, de novo computation of known transmembrane protein structures from 23 families, demonstrating unprecedented accuracy of the method for large transmembrane proteins. We show how the method can predict oligomerization, functional sites, and conformational changes in transmembrane proteins. With the rapid rise in large-scale sequencing, more accurate and more comprehensive information on evolutionary constraints can be decoded from genetic variation, greatly expanding the repertoire of transmembrane proteins amenable to modelling by this method. PMID:22579045

  7. Utility of DNA taxonomy and barcoding for the inference of larval community structure in morphologically cryptic Chironomus (Diptera) species.

    PubMed

    Pfenninger, Markus; Nowak, Carsten; Kley, Christoph; Steinke, Dirk; Streit, Bruno

    2007-05-01

    Biodiversity studies require species-level analyses for the accurate assessment of community structures. However, while specialized taxonomic knowledge is only rarely available for routine identifications, DNA taxonomy and DNA barcoding could provide the taxonomic basis for ecological inferences. In this study, we assessed the community structure of sediment-dwelling, morphologically cryptic Chironomus larvae in the Rhine Valley plain, Germany, comparing larval type classification, cytotaxonomy, DNA taxonomy and barcoding. While larval type classification performed poorly, cytotaxonomy and DNA-based methods yielded comparable results: detrended correspondence analysis and permutation analyses indicated that the assemblages are not randomly but competitively structured. However, DNA taxonomy identified an additional species that could not be resolved by the traditional method. We argue that DNA-based identification methods such as DNA barcoding can be a valuable tool to increase accuracy, objectivity and comparability of the taxonomic assessment in biodiversity and community ecology studies. PMID:17444904

  8. Genetic structure of Colletotrichum gloeosporioides sensu lato isolates infecting papaya inferred by multilocus ISSR markers.

    PubMed

    Rampersad, Sephra N

    2013-02-01

    Colletotrichum gloeosporioides sensu lato is widely distributed throughout temperate and tropical regions and causes anthracnose disease in numerous plant species. Development of effective disease management strategies is dependent on, among other factors, an understanding of pathogen genetic diversity and population stratification at the intraspecific level. For 132 isolates of C. gloeosporioides sensu lato collected from papaya in Trinidad, inter-simple-sequence repeat-polymerase chain reaction (ISSR-PCR) generated 121 polymorphic loci from five ISSR primers selected from an initial screen of 22 ISSR primers. The mean percentage of polymorphic loci was 99.18%. Bayesian cluster analysis inferred three genetic subpopulations, where group 1 consisted exclusively of isolates collected in the southern part of Trinidad whereas groups 2 and 3, although genetically distinct, were mixtures of isolates collected from both the northern and southern parts of Trinidad. Principal coordinates analysis and unweighted pair-group method with arithmetic mean phylogeny were concordant with Bayesian cluster analysis and supported subdivision into the three subpopulations. Overall, the total mean gene diversity was 0.279, the mean within-population gene diversity was 0.2161, and genetic differentiation for the Trinidad population was 0.225. Regionally, northern isolates had a lower gene diversity compared with southern isolates. Nei's gene diversity was highest for group 1 (h = 0.231), followed by group 2 (h = 0.215) and group 3 (h = 0.202). Genotypic diversity was at or near maximum for all three subpopulations after clone correction. Pairwise estimates of differentiation indicated high and significant genetic differentiation among the inferred subpopulations (Weir's θ of 0.212 to 0.325). Pairwise comparisons among subpopulations suggested restricted gene flow between groups 1 and 2 and groups 1 and 3 but not between groups 2 and 3. The null hypothesis of random mating was rejected
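    The gene-diversity values quoted in this abstract use Nei's statistic, h = 1 − Σ p_i², averaged over loci. The sketch below illustrates that formula only; it is not the study's pipeline. Because ISSR bands are dominant, it assumes Hardy-Weinberg equilibrium and estimates the null-allele frequency as q = sqrt(fraction of individuals lacking the band). The helper names and the band-absence fractions are hypothetical.

```python
import math


def nei_gene_diversity(freqs):
    """Nei's gene diversity at one locus: h = 1 - sum(p_i^2)."""
    total = sum(freqs)
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"allele frequencies must sum to 1, got {total}")
    return 1.0 - sum(p * p for p in freqs)


def dominant_marker_freqs(absent_fraction):
    """Two-allele frequencies for a dominant band (assumption: Hardy-Weinberg).

    The null (band-absent) allele frequency is q = sqrt(fraction without the band).
    """
    q = math.sqrt(absent_fraction)
    return [1.0 - q, q]


def mean_gene_diversity(loci):
    """Average h over loci; each locus is a list of allele frequencies."""
    return sum(nei_gene_diversity(f) for f in loci) / len(loci)


# Hypothetical band-absence fractions at three loci (not data from the study).
loci = [dominant_marker_freqs(a) for a in (0.25, 0.64, 0.09)]
print(round(mean_gene_diversity(loci), 4))  # -> 0.4133
```

    With two alleles, h reduces to 2pq. Per-locus values here are 0.5, 0.32 and 0.42. Estimators that correct for small-sample bias, such as Lynch and Milligan's for dominant markers, would give somewhat different numbers.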

  9. Comparative genetics and genomics of nematodes: genome structure, development, and lifestyle.

    PubMed

    Sommer, Ralf J; Streit, Adrian

    2011-01-01

    Nematodes are found in virtually all habitats on earth. Many of them are parasites of plants and animals, including humans. The free-living nematode, Caenorhabditis elegans, is one of the genetically best-studied model organisms and was the first metazoan whose genome was fully sequenced. In recent years, the draft genome sequences of another six nematodes representing four of the five major clades of nematodes were published. Compared to mammalian genomes, all these genomes are very small. Nevertheless, they contain almost the same number of genes as the human genome. Nematodes are therefore a very attractive system for comparative genetic and genomic studies, with C. elegans as an excellent baseline. Here, we review the efforts that were made to extend genetic analysis to nematodes other than C. elegans, and we compare the seven available nematode genomes. One of the most striking findings is the unexpectedly high incidence of gene acquisition through horizontal gene transfer (HGT). PMID:21721943

  10. The genome-wide structure of the Jewish people.

    PubMed

    Behar, Doron M; Yunusbayev, Bayazit; Metspalu, Mait; Metspalu, Ene; Rosset, Saharon; Parik, Jüri; Rootsi, Siiri; Chaubey, Gyaneshwer; Kutuev, Ildus; Yudkovsky, Guennady; Khusnutdinova, Elza K; Balanovsky, Oleg; Semino, Ornella; Pereira, Luisa; Comas, David; Gurwitz, David; Bonne-Tamir, Batsheva; Parfitt, Tudor; Hammer, Michael F; Skorecki, Karl; Villems, Richard

    2010-07-01

    Contemporary Jews comprise an aggregate of ethno-religious communities whose worldwide members identify with each other through various shared religious, historical and cultural traditions. Historical evidence suggests common origins in the Middle East, followed by migrations leading to the establishment of communities of Jews in Europe, Africa and Asia, in what is termed the Jewish Diaspora. This complex demographic history imposes special challenges in attempting to address the genetic structure of the Jewish people. Although many genetic studies have shed light on Jewish origins and on diseases prevalent among Jewish communities, including studies focusing on uniparentally and biparentally inherited markers, genome-wide patterns of variation across the vast geographic span of Jewish Diaspora communities and their respective neighbours have yet to be addressed. Here we use high-density bead arrays to genotype individuals from 14 Jewish Diaspora communities and compare these patterns of genome-wide diversity with those from 69 Old World non-Jewish populations, of which 25 have not previously been reported. These samples were carefully chosen to provide comprehensive comparisons between Jewish and non-Jewish populations in the Diaspora, as well as with non-Jewish populations from the Middle East and north Africa. Principal component and structure-like analyses identify previously unrecognized genetic substructure within the Middle East. Most Jewish samples form a remarkably tight subcluster that overlies Druze and Cypriot samples but not samples from other Levantine populations or paired Diaspora host populations. In contrast, Ethiopian Jews (Beta Israel) and Indian Jews (Bene Israel and Cochini) cluster with neighbouring autochthonous populations in Ethiopia and western India, respectively, despite a clear paternal link between the Bene Israel and the Levant. These results cast light on the variegated genetic architecture of the Middle East, and trace the origins

  11. Integrated consensus map of cultivated peanut and wild relatives reveals structures of the A and B genomes of Arachis and divergence of the legume genomes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The complex, tetraploid genome structure of peanut (Arachis hypogaea) has obstructed advances in genetics and genomics in the species. The aim of this study is to understand the genome structure of Arachis by developing a high-density integrated consensus map. Three recombinant inbred line populatio...

  12. A high-density genetic recombination map of sequence-tagged sites for sorghum, as a framework for comparative structural and evolutionary genomics of tropical grains and grasses.

    PubMed Central

    Bowers, John E; Abbey, Colette; Anderson, Sharon; Chang, Charlene; Draye, Xavier; Hoppe, Alison H; Jessup, Russell; Lemke, Cornelia; Lennington, Jennifer; Li, Zhikang; Lin, Yann-Rong; Liu, Sin-Chieh; Luo, Lijun; Marler, Barry S; Ming, Reiguang; Mitchell, Sharon E; Qiang, Dou; Reischmann, Kim; Schulze, Stefan R; Skinner, D Neil; Wang, Yue-Wen; Kresovich, Stephen; Schertz, Keith F; Paterson, Andrew H

    2003-01-01

    We report a genetic recombination map for Sorghum of 2512 loci spaced at an average of 0.4 cM (approximately 300 kb) intervals based on 2050 RFLP probes, including 865 heterologous probes that foster comparative genomics of Saccharum (sugarcane), Zea (maize), Oryza (rice), Pennisetum (millet, buffelgrass), the Triticeae (wheat, barley, oat, rye), and Arabidopsis. Mapped loci identify 61.5% of the recombination events in this progeny set and reveal strong positive crossover interference acting across intervals of structural rearrangements between Sorghum bicolor and S. propinquum, but not to variation in levels of intraspecific allelic richness. While cDNA and genomic clones are similarly distributed across the genome, SSR-containing clones show different abundance patterns. Rapidly evolving hypomethylated DNA may contribute to intraspecific genomic differentiation. Nonrandom distribution patterns of multiple loci detected by 357 probes suggest ancient chromosomal duplication followed by extensive rearrangement and gene loss. Exemplifying the value of these data for comparative genomics, we support and extend prior findings regarding maize-sorghum synteny; in particular, 45% of comparative loci fall outside the inferred colinear/syntenic regions, suggesting that many small rearrangements have occurred since maize-sorghum divergence. These genetically anchored sequence-tagged sites will foster many structural, functional and evolutionary genomic studies in major food, feed, and biomass crops. PMID:14504243

  13. Population-genomic variation within RNA viruses of the Western honey bee, Apis mellifera, inferred from deep sequencing

    PubMed Central

    2013-01-01

    FST. Conclusions This initial survey of genetic variation within honey bee RNA viruses suggests future directions for studies examining the underlying causes of population-genetic structure in these economically important pathogens. PMID:23497218

  14. Genetic variation and population structure of hair crab (Erimacrus isenbeckii ) in Japan inferred from mitochondrial DNA sequence analysis.

    PubMed

    Azuma, Noriko; Kunihiro, Yasushi; Sasaki, Jun; Mihara, Eiji; Mihara, Yukio; Yasunaga, Tomoaki; Jin, Deuk-Hee; Abe, Syuiti

    2008-01-01

    Genetic variation and population structure of hair crab (Erimacrus isenbeckii) were examined using nucleotide sequence analysis of 580 base pairs (bp) in the 3' portion of the mitochondrial cytochrome c oxidase subunit I gene (COI) of 20 samples collected from 16 locales in Japan (the Hokkaido and Honshu Islands) and one in Korea. A total of 27 haplotypes was defined by 23 variable nucleotide sites in the examined COI region. Pairwise population FST estimates and a neighbor-joining tree inferred distinct genetic differentiation between the representative samples from the Pacific Ocean off the Eastern Hokkaido Island and the Sea of Japan, while others were intermediate between these two groups. AMOVA also showed a weak but significant differentiation among these three groups. The present results suggest a moderate population structure of hair crab, probably influenced by high gene flow between regional populations due to sea-current-dependent larval dispersal of this species. PMID:17955293

  15. Internal structure of the Moon inferred from Apollo seismic data and selenodetic data from GRAIL and LLR

    NASA Astrophysics Data System (ADS)

    Matsumoto, Koji; Yamada, Ryuhei; Kikuchi, Fuyuhiko; Kamata, Shunichi; Ishihara, Yoshiaki; Iwata, Takahiro; Hanada, Hideo; Sasaki, Sho

    2015-09-01

    The internal structure of the Moon is important for discussions on its origin and evolution. However, the deep structure of the Moon is still debated due to the absence of comprehensive seismic data. This study explores lunar interior models by complementing Apollo seismic travel time data with selenodetic data which have recently been improved by Gravity Recovery and Interior Laboratory (GRAIL) and Lunar Laser Ranging (LLR). The observed data can be explained by models including a deep-seated zone with a low velocity (S wave velocity = 2.9 ± 0.5 km/s) and a low viscosity (~3 × 10^16 Pa s). The thickness of this zone above the core-mantle boundary is larger than 170 km, showing a negative correlation with the radius of the fluid outer core. The inferred density of the lowermost mantle suggests a high TiO2 content (>11 wt.%), which favors a mantle overturn scenario.

  16. Structural variation of the human genome: mechanisms, assays, and role in male infertility.

    PubMed

    Carvalho, Claudia M B; Zhang, Feng; Lupski, James R

    2011-02-01

    Genomic disorders are defined as diseases caused by rearrangements of the genome incited by a genomic architecture that conveys instability. Y-chromosome related dysfunctions such as male infertility are frequently associated with gross DNA rearrangements resulting from its peculiar genomic architecture. The Y-chromosome has evolved into a highly specialized chromosome to perform male functions, mainly spermatogenesis. Direct and inverted repeats, some of them palindromes with highly identical nucleotide sequences that can form DNA cruciform structures, characterize the genomic structure of the Y-chromosome long arm. Some particular Y chromosome genomic deletions can cause spermatogenic failure likely because of removal of one or more transcriptional units with a potential role in spermatogenesis. We describe mechanisms underlying the formation of human genomic rearrangements on autosomes and review Y-chromosome deletions associated with male infertility. PMID:21210740

  17. Inferred vs realized patterns of gene flow: an analysis of population structure in the Andros Island Rock Iguana.

    PubMed

    Colosimo, Giuliano; Knapp, Charles R; Wallace, Lisa E; Welch, Mark E

    2014-01-01

    Ecological data, the primary source of information on patterns and rates of migration, can be integrated with genetic data to more accurately describe the realized connectivity between geographically isolated demes. In this paper we implement this approach and discuss its implications for managing populations of the endangered Andros Island Rock Iguana, Cyclura cychlura cychlura. This iguana is endemic to Andros, a highly fragmented landmass of large islands and smaller cays. Field observations suggest that geographically isolated demes were panmictic due to high, inferred rates of gene flow. We expand on these observations using 16 polymorphic microsatellites to investigate the genetic structure and rates of gene flow from 188 Andros Iguanas collected across 23 island sites. Bayesian clustering of specimens assigned individuals to three distinct genotypic clusters. An analysis of molecular variance (AMOVA) indicates that allele frequency differences are responsible for a significant portion of the genetic variance across the three defined clusters (Fst = 0.117, p<0.01). These clusters are associated with larger islands and satellite cays isolated by broad water channels with strong currents. These findings imply that broad water channels present greater obstacles to gene flow than was inferred from field observation alone. Additionally, rates of gene flow were indirectly estimated using BAYESASS 3.0. The proportion of individuals originating from within each identified cluster varied from 94.5 to 98.7%, providing further support for local isolation. Our assessment reveals a major disparity between inferred and realized gene flow. We discuss our results in a conservation perspective for species inhabiting highly fragmented landscapes. PMID:25229344

  18. Inferred vs Realized Patterns of Gene Flow: An Analysis of Population Structure in the Andros Island Rock Iguana

    PubMed Central

    Colosimo, Giuliano; Knapp, Charles R.; Wallace, Lisa E.; Welch, Mark E.

    2014-01-01

    Ecological data, the primary source of information on patterns and rates of migration, can be integrated with genetic data to more accurately describe the realized connectivity between geographically isolated demes. In this paper we implement this approach and discuss its implications for managing populations of the endangered Andros Island Rock Iguana, Cyclura cychlura cychlura. This iguana is endemic to Andros, a highly fragmented landmass of large islands and smaller cays. Field observations suggest that geographically isolated demes were panmictic due to high, inferred rates of gene flow. We expand on these observations using 16 polymorphic microsatellites to investigate the genetic structure and rates of gene flow from 188 Andros Iguanas collected across 23 island sites. Bayesian clustering of specimens assigned individuals to three distinct genotypic clusters. An analysis of molecular variance (AMOVA) indicates that allele frequency differences are responsible for a significant portion of the genetic variance across the three defined clusters (Fst = 0.117, p<0.01). These clusters are associated with larger islands and satellite cays isolated by broad water channels with strong currents. These findings imply that broad water channels present greater obstacles to gene flow than was inferred from field observation alone. Additionally, rates of gene flow were indirectly estimated using BAYESASS 3.0. The proportion of individuals originating from within each identified cluster varied from 94.5 to 98.7%, providing further support for local isolation. Our assessment reveals a major disparity between inferred and realized gene flow. We discuss our results in a conservation perspective for species inhabiting highly fragmented landscapes. PMID:25229344

  19. Identification of novel RNA secondary structures within the hepatitis C virus genome reveals a cooperative involvement in genome packaging

    PubMed Central

    Stewart, H.; Bingham, R.J.; White, S. J.; Dykeman, E. C.; Zothner, C.; Tuplin, A. K.; Stockley, P. G.; Twarock, R.; Harris, M.

    2016-01-01

    The specific packaging of the hepatitis C virus (HCV) genome is hypothesised to be driven by Core-RNA interactions. To identify the regions of the viral genome involved in this process, we used SELEX (systematic evolution of ligands by exponential enrichment) to identify RNA aptamers which bind specifically to Core in vitro. Comparison of these aptamers to multiple HCV genomes revealed the presence of a conserved terminal loop motif within short RNA stem-loop structures. We postulated that interactions of these motifs, as well as sub-motifs which were present in HCV genomes at statistically significant levels, with the Core protein may drive virion assembly. We mutated 8 of these predicted motifs within the HCV infectious molecular clone JFH-1, thereby producing a range of mutant viruses predicted to possess altered RNA secondary structures. RNA replication and viral titre were unaltered in viruses possessing only one mutated structure. However, infectivity titres were decreased in viruses possessing a higher number of mutated regions. This work thus identified multiple novel RNA motifs which appear to contribute to genome packaging. We suggest that these structures act as cooperative packaging signals to drive specific RNA encapsidation during HCV assembly. PMID:26972799

  20. Iterative Voting for Inference of Structural Saliency and Characterization of Subcellular Events

    SciTech Connect

    Parvin, Bahram; Yang, Qing; Han, Ju; Chang, Hang; Rydberg, Bjorn; Barcellos-Hoff, Mary Helen

    2006-05-06

    Saliency is an important perceptual cue that occurs at different levels of resolution. Important attributes of saliency are symmetry, continuity, and closure. Detection of these attributes is often hindered by noise, variation in scale, and incomplete information. This paper introduces the iterative voting method, which uses oriented kernels for inferring saliency as it relates to symmetry. A unique aspect of the technique is the kernel topography, which is refined and reoriented iteratively. The technique can cluster and group nonconvex perceptual circular symmetries along the radial line of an object's shape. It has excellent noise immunity and is shown to be tolerant to perturbation in scale. The application of this technique to images obtained through various modes of microscopy is demonstrated. Furthermore, as a case example, the method has been applied to quantify the kinetics of formation of nuclear foci marked by phosphorylated histone H2AX (γH2AX) following ionizing radiation. Iterative voting has been implemented in both 2-D and 3-D for multi-image analysis.

  1. Statistical inference of the time-varying structure of gene-regulation networks

    PubMed Central

    2010-01-01

    Background Biological networks are highly dynamic in response to environmental and physiological cues. This variability is in contrast to conventional analyses of biological networks, which have overwhelmingly employed static graph models that stay constant over time to describe biological systems and their underlying molecular interactions. Methods To overcome these limitations, we propose here a new statistical modelling framework, the ARTIVA formalism (Auto Regressive TIme VArying models), and an associated inferential procedure that allows us to learn temporally varying gene-regulation networks from biological time-course expression data. ARTIVA simultaneously infers the topology of a regulatory network and how it changes over time. It allows us to recover the chronology of regulatory associations for individual genes involved in a specific biological process (development, stress response, etc.). Results We demonstrate that the ARTIVA approach generates detailed insights into the function and dynamics of complex biological systems and efficiently exploits time-course data in systems biology. In particular, two biological scenarios are analyzed: the developmental stages of Drosophila melanogaster and the response of Saccharomyces cerevisiae to benomyl poisoning. Conclusions ARTIVA does recover essential temporal dependencies in biological systems from transcriptional data, and provides a natural starting point to learn and investigate their dynamics in greater detail. PMID:20860793

  2. [Phylogeny of the order Rodentia inferred from structural analysis of short retrotransposon B1].

    PubMed

    Veniaminova, N A; Vasetskiĭ, N S; Lavrechenko, L A; Popov, S V; Kramerov, D A

    2007-07-01

    A large-scale study of the short retroposon (SINE) B1 has been conducted in the genomes of rodents from most of the known families of this mammalian order. The B1 nucleotide sequences of rodents from different families exhibited a number of characteristic features including substitutions, deletions, and tandem duplications. Comparing the distribution of these features among the rodent families, the currently discussed phylogenetic relationships were tested. The results of analysis indicated (1) an early divergence of Sciuridae and related families (Aplodontidae and Gliridae) from the other rodents; (2) a possible subsequent divergence of beavers (Castoridae); (3) a monophyletic origin of the group Hystricognathi, which includes several families, such as porcupines (Hystricidae) and guinea pigs (Caviidae); (4) a possible monophyletic origin of the group formed by the remaining families, including six families of mouselike rodents (Myodonta). Various approaches to the use of short retroposons for phylogenetic studies are discussed. PMID:17899810

  3. Population genetic structure of Indian shad, Tenualosa ilisha inferred from variation in mitochondrial DNA sequences.

    PubMed

    Behera, B K; Singh, N S; Paria, P; Sahoo, A K; Panda, D; Meena, D K; Das, P; Pakrashi, S; Biswas, D K; Sharma, A P

    2015-09-01

    Indian shad, Tenualosa ilisha, is a commercially important anadromous fish representing a major catch in the Indo-Pacific region. The present study evaluated a partial cytochrome b (Cyt b) gene sequence of mtDNA in T. ilisha to determine genetic variation between Bay of Bengal and Arabian Sea origins. Genomic DNA extracted from T. ilisha samples representing two distant rivers in the Indian subcontinent, the Bhagirathi (lower stretch of the Ganges) and the Tapi, was analyzed. Sequencing of a 307 bp mtDNA cytochrome b gene fragment revealed the presence of 5 haplotypes, with high haplotype diversity (Hd) of 0.9048 with variance 0.103 and low nucleotide diversity (π) of 0.14301. Three population-specific haplotypes were observed in the river Ganga and two haplotypes in the river Tapi. A neighbour-joining tree based on cytochrome b gene sequences of T. ilisha showed that the populations from Bay of Bengal and Arabian Sea origins belonged to two distinct clusters. PMID:26521565
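    The two summary statistics reported above can be illustrated with their standard textbook definitions. Haplotype diversity is Hd = n/(n−1) · (1 − Σ x_i²), where x_i is the sample frequency of haplotype i among n sequences. Nucleotide diversity π is the mean proportion of differing sites over all pairs of sequences. This is a minimal sketch assuming aligned, equal-length sequences with no gaps. The function names and the toy sequences are hypothetical, and it does not reproduce the study's software or data.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(haplotypes):
    """Unbiased haplotype diversity: Hd = n/(n-1) * (1 - sum(x_i^2))."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


def nucleotide_diversity(seqs):
    """Mean pairwise proportion of differing sites (pi) for aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    per_pair = [
        sum(a != b for a, b in zip(s1, s2)) / length
        for s1, s2 in combinations(seqs, 2)
    ]
    return sum(per_pair) / len(per_pair)


# Hypothetical aligned fragments (not the study's data).
seqs = ["ACGTAC", "ACGTAC", "ACGTTC", "ACTTAC"]
print(round(haplotype_diversity(seqs), 4), round(nucleotide_diversity(seqs), 4))
# -> 0.8333 0.1667
```

    Hd counts only whether sequences are identical. π also reflects how many sites differ, so the two can diverge: several haplotypes separated by single substitutions give high Hd but low π.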

  4. SPring-8 Structural Biology Beamlines / Automatic Beamline Operation at RIKEN Structural Genomics Beamlines

    SciTech Connect

    Ueno, Go; Hasegawa, Kazuya; Okazaki, Nobuo; Sakai, Hisanobu; Kumasaka, Takashi; Yamamoto, Masaki

    2007-01-19

    RIKEN Structural Genomics Beamlines (BL26B1 and BL26B2) at SPring-8 have been constructed for high-throughput protein crystallography. Beamline operation is automated in cooperation with the sample changer robot. The operation software provides centralized control using a client-server architecture. A sample management system with a networked database has been implemented to accept dry-shipped crystals from distant users.

  5. Large-scale structure of genomic methylation patterns.

    PubMed

    Rollins, Robert A; Haghighi, Fatemeh; Edwards, John R; Das, Rajdeep; Zhang, Michael Q; Ju, Jingyue; Bestor, Timothy H

    2006-02-01

    The mammalian genome depends on patterns of methylated cytosines for normal function, but the relationship between genomic methylation patterns and the underlying sequence is unclear. We have characterized the methylation landscape of the human genome by global analysis of patterns of CpG depletion and by direct sequencing of 3073 unmethylated domains and 2565 methylated domains from human brain DNA. The genome was found to consist of short (<4 kb) unmethylated domains embedded in a matrix of long methylated domains. Unmethylated domains were enriched in promoters, CpG islands, and first exons, while methylated domains comprised interspersed and tandem-repeated sequences, exons other than first exons, and non-annotated single-copy sequences that are depleted in the CpG dinucleotide. The enrichment of regulatory sequences in the relatively small unmethylated compartment suggests that cytosine methylation constrains the effective size of the genome through the selective exposure of regulatory sequences. This buffers regulatory networks against changes in total genome size and provides an explanation for the C value paradox, which concerns the wide variations in genome size that scale independently of gene number. This suggestion is compatible with the finding that cytosine methylation is universal among large-genome eukaryotes, while many eukaryotes with genome sizes <5 × 10^8 bp do not methylate their DNA. PMID:16365381

  6. Population structure and genotypic variation of Crataegus pontica inferred by molecular markers.

    PubMed

    Rahmani, Mohammad-Shafie; Shabanian, Naghi; Khadivi-Khub, Abdollah; Woeste, Keith E; Badakhshan, Hedieh; Alikhani, Leila

    2015-11-01

    Information about the natural patterns of genetic variability and their evolutionary bases is of fundamental practical importance for sustainable forest management and conservation. In the present study, the genetic diversity of 164 individuals from fourteen natural populations of Crataegus pontica K.Koch was assessed for the first time using three genome-based molecular techniques: inter-retrotransposon amplified polymorphism (IRAP), inter-simple sequence repeats (ISSR) and start codon targeted (SCoT) polymorphism. IRAP, ISSR and SCoT analyses yielded 126, 254 and 199 scorable amplified bands, respectively, of which 90.48, 93.37 and 83.78% were polymorphic. ISSR was more efficient than IRAP and SCoT owing to its higher effective multiplex ratio, marker index and resolving power. The dendrograms based on each marker and on the combined data divided individuals into three major clusters. The correlation between the coefficient matrices for the IRAP, ISSR and SCoT data was significant. A higher level of genetic variation was observed within populations than among populations for all markers. The low divergence among the studied populations could be seen as evidence of gene flow. The promotion of gene exchange will be very beneficial to conserve and utilize the enormous genetic variability. PMID:26143753

  7. Inferring R0 in emerging epidemics—the effect of common population structure is small

    PubMed Central

    Ball, Frank; Dhersin, Jean-Stéphane; Tran, Viet Chi; Wallinga, Jacco; Britton, Tom

    2016-01-01

    When controlling an emerging outbreak of an infectious disease, it is essential to know the key epidemiological parameters, such as the basic reproduction number R0 and the control effort required to prevent a large outbreak. These parameters are estimated from the observed incidence of new cases and information about the infectious contact structures of the population in which the disease spreads. However, the relevant infectious contact structures for new, emerging infections are often unknown or hard to obtain. Here, we show that, for many common true underlying heterogeneous contact structures, the simplification to neglect such structures and instead assume that all contacts are made homogeneously in the whole population results in conservative estimates for R0 and the required control effort. This means that robust control policies can be planned during the early stages of an outbreak, using such conservative estimates of the required control effort. PMID:27581480

  8. Inferring R0 in emerging epidemics-the effect of common population structure is small.

    PubMed

    Trapman, Pieter; Ball, Frank; Dhersin, Jean-Stéphane; Tran, Viet Chi; Wallinga, Jacco; Britton, Tom

    2016-08-01

    When controlling an emerging outbreak of an infectious disease, it is essential to know the key epidemiological parameters, such as the basic reproduction number R0 and the control effort required to prevent a large outbreak. These parameters are estimated from the observed incidence of new cases and information about the infectious contact structures of the population in which the disease spreads. However, the relevant infectious contact structures for new, emerging infections are often unknown or hard to obtain. Here, we show that, for many common true underlying heterogeneous contact structures, the simplification to neglect such structures and instead assume that all contacts are made homogeneously in the whole population results in conservative estimates for R0 and the required control effort. This means that robust control policies can be planned during the early stages of an outbreak, using such conservative estimates of the required control effort. PMID:27581480

  9. Entropic Inference

    NASA Astrophysics Data System (ADS)

    Caticha, Ariel

    2011-03-01

    In this tutorial we review the essential arguments behind entropic inference. We focus on the epistemological notion of information and its relation to the Bayesian beliefs of rational agents. The problem of updating from a prior to a posterior probability distribution is tackled through an eliminative induction process that singles out the logarithmic relative entropy as the unique tool for inference. The resulting method of Maximum relative Entropy (ME) includes both MaxEnt and Bayes' rule as special cases, and therefore unifies the two themes of these workshops, the Maximum Entropy and the Bayesian methods, into a single general inference scheme.
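    One concrete instance of the ME update described above is the following. Take a discrete prior q and a single expectation constraint Σ p_i f_i = F. Maximizing the relative entropy S[p, q] = −Σ p_i log(p_i / q_i) under that constraint gives an exponential tilt of the prior, p_i ∝ q_i exp(λ f_i). The multiplier λ is fixed by the constraint. The sketch below assumes a single constraint with F strictly between min f and max f. It finds λ by bisection, which works because the constrained mean increases monotonically in λ (its derivative is a variance). This is a standard textbook illustration, not the tutorial's own derivation, and the function names are hypothetical.

```python
import math


def relative_entropy(p, q):
    """S[p, q] = -sum p_i log(p_i / q_i); terms with p_i = 0 contribute 0."""
    return -sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def tilt(q, f, lam):
    """Exponentially tilted distribution: p_i proportional to q_i * exp(lam * f_i)."""
    w = [qi * math.exp(lam * fi) for qi, fi in zip(q, f)]
    z = sum(w)
    return [wi / z for wi in w]


def me_update(q, f, target, lo=-50.0, hi=50.0, iters=200):
    """Maximize S[p, q] subject to sum p_i f_i = target (single constraint).

    The maximizer is an exponential tilt of q. The tilted mean grows
    monotonically with lam, so lam is located by bisection on [lo, hi].
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        mean = sum(pi * fi for pi, fi in zip(tilt(q, f, mid), f))
        if mean < target:
            lo = mid
        else:
            hi = mid
    return tilt(q, f, 0.5 * (lo + hi))


# Hypothetical example: uniform prior over die faces 1..6, new information "mean = 4.5".
faces = [1, 2, 3, 4, 5, 6]
prior = [1 / 6] * 6
post = me_update(prior, faces, 4.5)
print([round(x, 4) for x in post])
```

    With a uniform prior this reduces to ordinary MaxEnt. S[p, q] is at most zero and equals zero only when p = q, so the update moves the prior no further than the constraint requires.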

  10. Genomic structural variants are linked with intellectual disability.

    PubMed

    Bulayeva, Kazima; Lesch, Klaus-Peter; Bulayev, Oleg; Walsh, Christopher; Glatt, Stephen; Gurgenova, Farida; Omarova, Jamilja; Berdichevets, Irina; Thompson, Paul M

    2015-09-01

    Mutations in more than 500 genes have been associated with intellectual disability (ID) and related disorders of cognitive function, such as autism and schizophrenia. Here we aimed to unravel the molecular epidemiology of non-specific ID in a genetic isolate using a combination of population and molecular genetic approaches. A large multigenerational pedigree was ascertained within the Dagestan Genetic Heritage research program in a genetic isolate of indigenous ethnic groups. Clinical characteristics of the affected members were based on combining diagnoses from regional psychiatric hospitals with our own clinical assessment. This assessment used Russian translations of the structured psychiatric interviews (the Diagnostic Interview for Genetic Studies and the Family Interview for Genetic Studies), based on DSM-IV criteria. The Weber/CHLC 9.0 STR set was used for multipoint parametric linkage analyses (Simwalk2.91). Next, we checked CNVs and LOH (based on Affymetrix SNP 5.0 data) in the genomic regions linked with ID, with the aim of identifying candidate genes associated with mutations in the linked regions. We detected statistically significant (p ≤ 0.05) suggestive linkage peaks with 1.3 < LOD < 3.0 in a total of 10 genomic regions: 1q41, 2p25.3-p24.2, 3p13-p12.1, 4q13.3, 10p11, 11q23, 12q24.22-q24.31, 17q24.2-q25.1, 21q22.13 and 22q12.3-q13.1. Three significant linkage signals with LOD > 3 were obtained:
    - at 2p25.3-p24.2 under the dominant model, with a peak at 21 cM flanked by loci D2S2976 and D2S2952;
    - at 12q24.22-q24.31 under the recessive model, with a peak at -120 cM flanked by markers D12S2070 and D12S395;
    - at 22q12.3 under the dominant model, with a peak at 32 cM flanked by markers D22S683 and D22S445.
    After a set of genes had been designated as possible candidates in these specific chromosomal regions, we conducted an exploratory search for LOH and CNV based on microarray data to detect structural genomic variants within five ID-linked regions with LOD scores between 2.0 and
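    The LOD thresholds quoted above (suggestive for 1.3 < LOD < 3.0, significant for LOD > 3) refer to log-odds linkage scores. The study used multipoint parametric analysis in Simwalk2. The sketch below shows only the simpler two-point score for phase-known, fully informative meioses, using hypothetical counts, to illustrate what the statistic measures:

```python
import math


def lod(recombinants, nonrecombinants, theta):
    """Two-point LOD score log10[L(theta) / L(1/2)] for phase-known, fully
    informative meioses, where L(theta) = theta**R * (1 - theta)**NR."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    n = recombinants + nonrecombinants
    if theta == 0.0:
        # any recombinant makes complete linkage impossible
        return float("-inf") if recombinants else n * math.log10(2.0)
    return (recombinants * math.log10(theta)
            + nonrecombinants * math.log10(1.0 - theta)
            + n * math.log10(2.0))


def max_lod(recombinants, nonrecombinants):
    """Maximum-likelihood recombination fraction (capped at 0.5) and its LOD."""
    theta_hat = min(recombinants / (recombinants + nonrecombinants), 0.5)
    return theta_hat, lod(recombinants, nonrecombinants, theta_hat)


print(max_lod(0, 10))  # ten non-recombinant meioses give LOD = 10*log10(2), just above 3
print(max_lod(1, 9))   # theta_hat = 0.1, LOD about 1.6 (suggestive range)
```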

  11. HorA web server to infer homology between proteins using sequence and structural similarity.

    PubMed

    Kim, Bong-Hyun; Cheng, Hua; Grishin, Nick V

    2009-07-01

    The biological properties of proteins are often gleaned through comparative analysis of evolutionary relatives. Although protein structure similarity search methods detect more distant homologs than purely sequence-based methods, structural resemblance can result from either homology (common ancestry) or analogy (similarity without common ancestry). While many existing web servers detect structural neighbors, they do not explicitly address the question of homology versus analogy. Here, we present a web server named HorA (Homology or Analogy) that identifies likely homologs for a query protein structure. Unlike other servers, HorA combines sequence information from state-of-the-art profile methods with structure information from spatial similarity measures using an advanced computational technique. HorA aims to identify biologically meaningful connections rather than purely 3D-geometric similarities. The HorA method finds approximately 90% of remote homologs defined in the manually curated database SCOP. HorA will be especially useful for finding remote homologs that might be overlooked by other sequence or structural similarity search servers. The HorA server is available at http://prodata.swmed.edu/horaserver. PMID:19417074

  12. HorA web server to infer homology between proteins using sequence and structural similarity

    PubMed Central

    Kim, Bong-Hyun; Cheng, Hua; Grishin, Nick V.

    2009-01-01

    The biological properties of proteins are often gleaned through comparative analysis of evolutionary relatives. Although protein structure similarity search methods detect more distant homologs than purely sequence-based methods, structural resemblance can result from either homology (common ancestry) or analogy (similarity without common ancestry). While many existing web servers detect structural neighbors, they do not explicitly address the question of homology versus analogy. Here, we present a web server named HorA (Homology or Analogy) that identifies likely homologs for a query protein structure. Unlike other servers, HorA combines sequence information from state-of-the-art profile methods with structure information from spatial similarity measures using an advanced computational technique. HorA aims to identify biologically meaningful connections rather than purely 3D-geometric similarities. The HorA method finds ∼90% of remote homologs defined in the manually curated database SCOP. HorA will be especially useful for finding remote homologs that might be overlooked by other sequence or structural similarity search servers. The HorA server is available at http://prodata.swmed.edu/horaserver. PMID:19417074

  13. Crustal Structure of the Dabie Orogenic Belt (eastern China) Inferred from Gravity and Magnetic Data

    NASA Astrophysics Data System (ADS)

    Yang, Y.; Li, Y.

    2012-12-01

    This study investigates the utility of potential fields (gravity and magnetics) in orogenic belt settings, using the Dabie orogenic belt as a case study. The methods applied are Euler deconvolution of the magnetic data, the continuous wavelet transform (CWT), and gravity forward modeling. We devise a 500-km-long profile (profile I) that trends N21°E. It begins in the north within the Hefei basin of the North China craton, passes through the Dabie orogen, and ends in the Yangtze craton. Euler deconvolution of the magnetic signal provides depth estimates for magnetized sources along the profile. The CWT method is then used to characterize the subsurface density interfaces of the gravity field, which provides a useful constraint for the forward modeling of profile I. Finally, we present a crustal cross section through the Dabie orogenic belt, derived from the gravity forward modeling results integrated with source depth estimates from the gravity and magnetic data. The forward gravity modeling results indicate low-density bodies in the central orogen. These are probably due to over-thickened crustal materials formed during the continuous convergence between the North China craton (NCC) and the Yangtze craton after break-off of the subducted slab. After delamination, or removal of the mountain root, in the Dabie orogen, the asthenosphere upwelled to replace the volume previously occupied by the delaminated lithosphere. Under the extensional environment of the early Cretaceous, the upwelling mantle was metasomatised and partially melted to produce the parental magma of the post-collisional mafic-ultramafic intrusive rocks. The CWT-based spectral analysis of the magnetic data reveals two anomalous bodies of denser material with low magnetization in the lower crust and upper mantle of the Dabie orogen. These may be interpreted as products of mass transport from the metasomatised mantle. Figure 1. A profile view from the Dabie orogen along the
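    Euler deconvolution estimates source positions by solving Euler's homogeneity equation in a least-squares sense. The following is a minimal 2D profile sketch. It assumes that the field T and its horizontal and vertical derivatives are already available at z = 0 and that the structural index N is known. The study's windowing and derivative computation are not described, so the synthetic check below is purely illustrative:

```python
import numpy as np


def euler_deconvolution_2d(x, T, Tx, Tz, N):
    """Least-squares solution of Euler's homogeneity equation along a profile
    observed at z = 0 (z positive downward):
        (x - x0) Tx + (z - z0) Tz = -N (T - B)
    rearranged to  x0*Tx + z0*Tz + N*B = x*Tx + N*T  for unknowns x0, z0, B."""
    A = np.column_stack([Tx, Tz, N * np.ones_like(T)])
    b = x * Tx + N * T
    (x0, z0, B), *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(x0), float(z0), float(B)


# Synthetic check: field of a homogeneous source (structural index N = 1)
# at x0 = 3 km, depth z0 = 5 km, plus a constant background B = 2
x = np.linspace(-20.0, 20.0, 81)
x0_true, z0_true, N, amp, B_true = 3.0, 5.0, 1.0, 100.0, 2.0
r2 = (x - x0_true) ** 2 + z0_true ** 2          # squared distance from source (z = 0)
T = amp / r2 ** (N / 2) + B_true
Tx = -N * amp * (x - x0_true) / r2 ** (N / 2 + 1)
Tz = N * amp * z0_true / r2 ** (N / 2 + 1)      # z positive downward
estimate = euler_deconvolution_2d(x, T, Tx, Tz, N)
print(estimate)  # recovers approximately (3.0, 5.0, 2.0)
```

    In practice, the solve is repeated in moving windows along the profile. The clustering of the resulting (x0, z0) solutions then gives the depth estimates.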

  14. A Markov Chain Monte Carlo Approach for Joint Inference of Population Structure and Inbreeding Rates From Multilocus Genotype Data

    PubMed Central

    Gao, Hong; Williamson, Scott; Bustamante, Carlos D.

    2007-01-01

    Nonrandom mating induces correlations in allelic states within and among loci that can be exploited to understand the genetic structure of natural populations (Wright 1965). For many species, it is of considerable interest to quantify the contribution of two forms of nonrandom mating to patterns of standing genetic variation: inbreeding (mating among relatives) and population substructure (limited dispersal of gametes). Here, we extend the popular Bayesian clustering approach STRUCTURE (Pritchard et al. 2000) for simultaneous inference of inbreeding or selfing rates and population-of-origin classification using multilocus genetic markers. This is accomplished by eliminating the assumption of Hardy–Weinberg equilibrium within clusters and, instead, calculating expected genotype frequencies on the basis of inbreeding or selfing rates. We demonstrate the need for such an extension by showing that, with the standard STRUCTURE algorithm, selfing leads to spurious signals of population substructure, with a bias toward spurious signals of admixture. We gauge the performance of our method using extensive coalescent simulations and demonstrate that our approach can correct for this bias. We also apply our approach to understanding the population structure of the wild relative of domesticated rice, Oryza rufipogon, an important partially selfing grass species. Using a sample of n = 16 individuals sequenced at 111 random loci, we find strong evidence for the existence of two subpopulations, which correlate well with the geographic location of sampling. We also estimate selfing rates for both groups that are consistent with estimates from experimental data (s ≈ 0.48–0.70). PMID:17483417
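    Replacing Hardy–Weinberg proportions with inbreeding-aware expectations can be sketched as follows. This minimal sketch assumes the standard equilibrium relation F = s/(2 - s) between a constant selfing rate s and the inbreeding coefficient F, together with the usual genotype-frequency formulas under inbreeding. The published method's exact parameterization may differ, and the selfing rate used here is hypothetical:

```python
from itertools import combinations_with_replacement


def inbreeding_from_selfing(s):
    """Equilibrium inbreeding coefficient F = s / (2 - s) under a constant selfing rate s."""
    return s / (2.0 - s)


def genotype_frequencies(allele_freqs, F):
    """Expected diploid genotype frequencies given allele frequencies and an
    inbreeding coefficient F. F = 0 recovers Hardy-Weinberg proportions."""
    alleles = sorted(allele_freqs)
    freqs = {}
    for a, b in combinations_with_replacement(alleles, 2):
        pa, pb = allele_freqs[a], allele_freqs[b]
        if a == b:
            freqs[(a, b)] = (1.0 - F) * pa * pa + F * pa  # homozygote excess
        else:
            freqs[(a, b)] = 2.0 * (1.0 - F) * pa * pb     # heterozygote deficit
    return freqs


F = inbreeding_from_selfing(0.5)           # hypothetical selfing rate
freqs = genotype_frequencies({"A": 0.5, "a": 0.5}, F)
print(F, freqs)
```

    The heterozygote deficit is what a Hardy–Weinberg-based clustering model can misread as substructure or admixture. Using these expectations in the within-cluster likelihood is the kind of correction the abstract describes.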

  15. Crustal structure beneath the Japanese Islands inferred from receiver function analysis using similar earthquakes

    NASA Astrophysics Data System (ADS)

    Igarashi, Toshihiro

    2016-04-01

    The stress concentration and strain accumulation process due to inter-plate coupling of the subducting plate should have a large effect on inland shallow earthquakes that occur in the overriding plate. Information on the crustal structure and the crustal thickness is important for understanding this process. In this study, I applied receiver function analysis using similar earthquakes to estimate the crustal velocity structures beneath the Japanese Islands. Because similar earthquakes occur repeatedly at almost the same location, they are useful for extracting information on the spatial distribution and temporal changes of seismic velocity structures beneath the seismic stations. I used data from the telemetric seismographic network covering the Japanese Islands, together with moderate-sized similar earthquakes that occurred in the Southern Hemisphere at epicentral distances between 30 and 90 degrees over about 26 years from October 1989. Data analysis was performed separately for the periods before and after the 2011 Tohoku-Oki earthquake. To identify the spatial distribution of crustal structure, I used a grid search to find, at each station, the synthetic receiver function that best correlated with the observed one. As a result, I clarified the spatial distribution of the crustal velocity structures. The spatial patterns of velocities from the ground surface to 5 km depth correspond to basement depth models, although the velocities are slower than those of tomography models. They indicate thick sediment layers in several plain and basin areas. The crustal velocity perturbations are consistent with existing tomography models. The active volcanoes correspond to low-velocity zones from the upper crust to the crust-mantle transition. A comparison of the crustal structure before and after the 2011 Tohoku-Oki earthquake suggests that velocities decreased in some areas of the northeastern Japan arc. These velocity changes might be due to other effects, such as changes of
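    The following is a toy version of the grid search described above. Synthetic receiver functions for a single crustal layer are compared with an "observed" trace by correlation, and the best (thickness, Vp/Vs) pair is kept. The sketch assumes the standard single-layer Ps and PpPs delay-time formulas and uses Gaussian pulses in place of full synthetics. The study's actual velocity models and parameterization are not given here, and all numbers are hypothetical:

```python
import numpy as np


def conversion_delays(H, vp, kappa, p):
    """Ps and PpPs delay times (s) after direct P for a single crustal layer of
    thickness H (km), P velocity vp (km/s), Vp/Vs ratio kappa, ray parameter p (s/km)."""
    vs = vp / kappa
    eta_s = np.sqrt(1.0 / vs ** 2 - p ** 2)  # vertical S slowness
    eta_p = np.sqrt(1.0 / vp ** 2 - p ** 2)  # vertical P slowness
    return H * (eta_s - eta_p), H * (eta_s + eta_p)


def toy_receiver_function(t, H, vp, kappa, p, width=0.5):
    """Toy radial receiver function: Gaussian pulses for P, Ps and PpPs
    (amplitudes 1, 0.4, 0.2 are arbitrary illustrative values)."""
    t_ps, t_ppps = conversion_delays(H, vp, kappa, p)

    def pulse(t0, amplitude):
        return amplitude * np.exp(-((t - t0) / width) ** 2)

    return pulse(0.0, 1.0) + pulse(t_ps, 0.4) + pulse(t_ppps, 0.2)


def grid_search(t, observed, vp, p, H_grid, kappa_grid):
    """Return (H, kappa, correlation) of the best-correlated synthetic."""
    best = (None, None, -np.inf)
    for H in H_grid:
        for kappa in kappa_grid:
            synthetic = toy_receiver_function(t, H, vp, kappa, p)
            cc = np.corrcoef(observed, synthetic)[0, 1]
            if cc > best[2]:
                best = (float(H), float(kappa), float(cc))
    return best


# Hypothetical station: "observed" trace generated from H = 32 km, Vp/Vs = 1.75
t = np.arange(-5.0, 30.0, 0.05)
observed = toy_receiver_function(t, 32.0, 6.3, 1.75, p=0.06)
H_best, kappa_best, cc = grid_search(t, observed, vp=6.3, p=0.06,
                                     H_grid=np.arange(20.0, 45.5, 0.5),
                                     kappa_grid=np.linspace(1.60, 1.95, 36))
print(H_best, kappa_best, cc)
```

    Using both Ps and PpPs times matters here. Ps alone trades thickness off against Vp/Vs, whereas the multiple adds a second, independent delay that breaks the trade-off.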

  16. Comparative Genomics of Sibling Fungal Pathogenic Taxa Identifies Adaptive Evolution without Divergence in Pathogenicity Genes or Genomic Structure

    PubMed Central

    Sillo, Fabiano; Garbelotto, Matteo; Friedman, Maria; Gonthier, Paolo

    2015-01-01

    It has been estimated that the sister plant pathogenic fungal species Heterobasidion irregulare and Heterobasidion annosum may have been allopatrically isolated for 34–41 Myr. They are now sympatric due to the introduction of the first species from North America into Italy, where they freely hybridize. We used a comparative genomic approach to 1) confirm that the two species are distinct at the genomic level; 2) determine which gene groups have diverged the most and the least between species; 3) show that their overall genomic structures are similar, as predicted by the viability of hybrids, and identify genomic regions that instead are incongruent; and 4) test the previously formulated hypothesis that genes involved in pathogenicity may be less divergent between the two species than genes involved in saprobic decay and sporulation. Results based on the sequencing of three genomes per species identified a high level of interspecific similarity, but clearly confirmed the status of the two as distinct taxa. Genes involved in pathogenicity were more conserved between species than genes involved in saprobic growth and sporulation, corroborating at the genomic level that invasiveness may be determined by the two latter traits, as documented by field and inoculation studies. Additionally, the majority of genes under positive selection and the majority of genes bearing interspecific structural variations were involved either in transcriptional or in mitochondrial functions. This study provides genomic-level evidence that invasiveness of pathogenic microbes can be attained without the high levels of pathogenicity presumed to exist for pathogens challenging naïve hosts. PMID:26527650

  17. Genetic structure of Mesoamerican populations of Big-leaf mahogany (Swietenia macrophylla) inferred from microsatellite analysis.

    PubMed

    Novick, Rachel Roth; Dick, Christopher W; Lemes, Maristerra R; Navarro, Carlos; Caccone, Adalgisa; Bermingham, Eldredge

    2003-11-01

    While microsatellites have been used to examine genetic structure in local populations of Neotropical trees, genetic studies based on such high-resolution markers have not been carried out for Mesoamerica as a whole. Here we assess the genetic structure of the Mesoamerican mahogany Swietenia macrophylla King (big-leaf mahogany). This Neotropical tree species was recently listed as endangered in CITES and is commercially extinct through much of its native range. We used seven variable microsatellite loci to assess genetic diversity and population structure in eight naturally established mahogany populations from six Mesoamerican countries. Measures of genetic differentiation (FST and RST) indicated significant differences between most populations. Unrooted dendrograms based on genetic distances between populations provide evidence of strong phylogeographic structure in Mesoamerican mahogany. The two populations on the Pacific coasts of Costa Rica and Panama were genetically distant from all the others, and from one another. The remaining populations formed two clusters, one comprising the northern populations of Mexico, Belize and Guatemala and the other containing the southern Atlantic populations of Nicaragua and Costa Rica. Significant correlation was found between geographical distance and all pairwise measures of genetic divergence, suggesting the importance of regional biogeography and isolation by distance in Mesoamerican mahogany. The results of this study demonstrate greater phylogeographic structure than has been found for S. macrophylla across the Amazon basin. Our findings suggest a relatively complex Mesoamerican biogeographic history and lead to the prediction that other Central American trees will show similar patterns of regional differentiation. PMID:14629370
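    The reported correlation between geographic distance and pairwise genetic divergence is the kind of isolation-by-distance pattern commonly assessed with a Mantel test. The abstract does not name the test used, so the following is a generic permutation sketch with hypothetical population positions and distances:

```python
import random


def mantel_test(d1, d2, permutations=999, seed=0):
    """Mantel test: Pearson correlation between the upper-triangle entries of
    two square distance matrices, with a one-sided permutation p-value obtained
    by jointly permuting the rows and columns of d2."""
    n = len(d1)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    xs = [d1[i][j] for i, j in pairs]
    mx = sum(xs) / len(xs)

    def corr(perm):
        ys = [d2[perm[i]][perm[j]] for i, j in pairs]
        my = sum(ys) / len(ys)
        sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        sxx = sum((a - mx) ** 2 for a in xs)
        syy = sum((b - my) ** 2 for b in ys)
        return sxy / (sxx * syy) ** 0.5

    rng = random.Random(seed)
    observed = corr(list(range(n)))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        perm = list(range(n))
        rng.shuffle(perm)
        if corr(perm) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


# Hypothetical populations along a transect (positions in km) and a genetic
# distance that grows linearly with geographic distance
pos = [0, 120, 260, 410, 700, 820, 1100, 1350]
geo = [[abs(a - b) for b in pos] for a in pos]
gen = [[0.0 if a == b else 0.01 + 0.0002 * abs(a - b) for b in pos] for a in pos]
r, p_value = mantel_test(geo, gen)
print(r, p_value)
```

    Rows and columns are permuted together because the individual matrix entries are not independent. Each population contributes to n - 1 pairwise values.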

  18. Spatial Structure and Asymmetries of Magnetospheric Currents Inferred from High-Resolution Empirical Geomagnetic Field Models

    NASA Astrophysics Data System (ADS)

    Sitnov, M. I.; Stephens, G. K.; Ukhorskiy, A. Y.; Brandt, P. C.; Korth, H.; Anderson, B. J.

    2014-12-01

    Reconstruction of the large-scale magnetospheric current systems from data has long been based on ad hoc assumptions regarding their spatial structure. A dramatic increase in the amount of data provided by space-borne magnetometers enabled the development of a new approach to empirical geomagnetic field modeling. These data come from geosynchronous satellites and from the IMP 8, Geotail, Polar, Cluster, THEMIS, and Van Allen Probes missions. In this approach, the custom-tailored modules prescribing the configuration of magnetospheric current systems were replaced by basis function expansions, freeing the model structure from previous a priori constraints. The new approach reveals a complex structure of the magnetospheric current systems and, in particular, their substantial dawn-dusk asymmetry during magnetic storms. This includes the formation of the hook-shaped current in the main phase, the double partial ring current near the Sym-H minimum, ring current erosion, and the near-magnetopause eastward current in the pre-noon sector. With the help of data from the Van Allen Probes mission, the highest-resolution empirical models have resolved the eastward current in the innermost magnetosphere and its local-time asymmetry, including the so-called banana-current structures. At the same time, increasing the number of degrees of freedom of the empirical model in the description of field-aligned currents has shown that the hook-shaped equatorial current corresponds to the spiral structure of the large-scale upward Birkeland currents, which can now be resolved in detail by AMPERE.

  19. Estimating the difference between structure-factor amplitudes using multivariate Bayesian inference.

    PubMed

    Katona, Gergely; Garcia-Bonete, Maria José; Lundholm, Ida V

    2016-05-01

    In experimental research, referencing two or more measurements to one another is a powerful tool for reducing the effect of systematic errors between different sets of measurements. The quantity of interest is usually derived from two measurements on the same sample under different conditions. While an elaborate experimental design is essential for improving the estimate, the data analysis should also fully exploit the covariance between the measurements. In X-ray crystallography, the difference between structure-factor amplitudes carries important information for solving experimental phasing problems or determining time-dependent structural changes in pump-probe experiments. Here a multivariate Bayesian method was used to analyse intensity measurement pairs to determine their underlying structure-factor amplitudes and their differences. The posterior distribution of the model parameters was approximated with a Markov chain Monte Carlo algorithm. The described merging method is shown to be especially advantageous when systematic and random errors result in recording negative intensity measurements. PMID:27126118
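    The following is a minimal sketch of the idea. It relies on several simplifying assumptions of our own:
    - a bivariate normal error model for one intensity pair, with means F1² and F2²;
    - a known error correlation between the two measurements;
    - a flat prior on non-negative amplitudes;
    - a random-walk Metropolis sampler.
    The paper's actual model and priors are not given here, and the intensity values are hypothetical:

```python
import math
import random


def log_likelihood(F1, F2, I1, I2, s1, s2, rho):
    """Bivariate normal log-likelihood (up to a constant) of an intensity pair
    (I1, I2) with means F1**2 and F2**2, standard deviations s1, s2 and
    error correlation rho."""
    u = (I1 - F1 * F1) / s1
    v = (I2 - F2 * F2) / s2
    return -(u * u - 2.0 * rho * u * v + v * v) / (2.0 * (1.0 - rho * rho))


def sample_amplitudes(I1, I2, s1, s2, rho, n=20000, step=0.3, seed=1):
    """Random-walk Metropolis over (F1, F2) with a flat prior on F >= 0."""
    rng = random.Random(seed)
    F1 = math.sqrt(max(I1, 0.0)) + 0.1  # start inside the support even if I < 0
    F2 = math.sqrt(max(I2, 0.0)) + 0.1
    ll = log_likelihood(F1, F2, I1, I2, s1, s2, rho)
    chain = []
    for _ in range(n):
        c1 = F1 + rng.gauss(0.0, step)
        c2 = F2 + rng.gauss(0.0, step)
        if c1 >= 0.0 and c2 >= 0.0:  # proposals outside the prior support are rejected
            ll_c = log_likelihood(c1, c2, I1, I2, s1, s2, rho)
            if ll_c >= ll or rng.random() < math.exp(ll_c - ll):
                F1, F2, ll = c1, c2, ll_c
        chain.append((F1, F2))
    return chain


# Hypothetical pair: a negative intensity and a positive one with correlated errors
chain = sample_amplitudes(I1=-0.5, I2=2.0, s1=1.0, s2=1.0, rho=0.5)
burned = chain[2000:]  # discard burn-in
delta = [f1 - f2 for f1, f2 in burned]
print(sum(delta) / len(delta))  # posterior mean of F1 - F2
```

    The negative intensity poses no difficulty here. The amplitude itself is sampled on its non-negative support, so the posterior of F1 and of the difference F1 - F2 remains well defined.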

  20. Estimating the difference between structure-factor amplitudes using multivariate Bayesian inference

    PubMed Central

    Katona, Gergely; Garcia-Bonete, Maria-José; Lundholm, Ida V.

    2016-01-01

    In experimental research, referencing two or more measurements to one another is a powerful tool for reducing the effect of systematic errors between different sets of measurements. The quantity of interest is usually derived from two measurements on the same sample under different conditions. While an elaborate experimental design is essential for improving the estimate, the data analysis should also fully exploit the covariance between the measurements. In X-ray crystallography, the difference between structure-factor amplitudes carries important information for solving experimental phasing problems or determining time-dependent structural changes in pump–probe experiments. Here a multivariate Bayesian method was used to analyse intensity measurement pairs to determine their underlying structure-factor amplitudes and their differences. The posterior distribution of the model parameters was approximated with a Markov chain Monte Carlo algorithm. The described merging method is shown to be especially advantageous when systematic and random errors result in recording negative intensity measurements. PMID:27126118

  1. Preliminary Results of the Lithospheric Structure Beneath the Aeolian Archipelago (Italy) Inferred from Teleseismic Receiver Functions

    NASA Astrophysics Data System (ADS)

    Musumeci, C.; Martinez-Arevalo, C.; de Lis Mancilla, F.; Patanè, D.

    2009-12-01

    The Aeolian archipelago (Italy) is an approximately one-million-year-old volcanic arc related to the subduction of the Ionian oceanic plate beneath the Calabrian continental crust. The objective of this work is to develop a better understanding of the regional structure of the whole archipelago. The crustal structure under each station was obtained by applying the P-receiver function technique to teleseismic P-coda data. These data were recorded by the broadband seismic network (10 stations) installed by the Istituto Nazionale di Geofisica e Vulcanologia (INGV-CT). Receiver functions were computed using the Extended-Time Multitaper Frequency Domain Cross-Correlation Receiver Function (ET-MTRF) method. The preliminary results suggest a very similar lithospheric structure below all the islands of the Aeolian archipelago, with the exception of Stromboli. The boundary between the subducting oceanic crust of the Ionian plate and the Tyrrhenian mantle is clearly observed below all the stations.
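    The ET-MTRF method itself is not reproduced here. As a simpler illustration of what a receiver function is, the sketch below instead uses frequency-domain water-level deconvolution, a different and more basic technique. It deconvolves the vertical from the radial component and applies a Gaussian low-pass filter. The water level, Gaussian parameter, and synthetic traces are all assumptions:

```python
import numpy as np


def water_level_rf(radial, vertical, dt, water=0.01, gauss_a=2.5):
    """Receiver function by frequency-domain water-level deconvolution:
    RF(w) = R(w) Z*(w) / max(|Z(w)|^2, water * max|Z|^2) * G(w),
    where G(w) = exp(-w^2 / (4 a^2)) is a Gaussian low-pass filter."""
    n = len(radial)
    nfft = 2 * n                                    # zero-padding limits wrap-around
    R = np.fft.rfft(radial, nfft)
    Z = np.fft.rfft(vertical, nfft)
    w = 2.0 * np.pi * np.fft.rfftfreq(nfft, dt)     # angular frequency (rad/s)
    power = np.abs(Z) ** 2
    denom = np.maximum(power, water * power.max())  # the "water level"
    G = np.exp(-w ** 2 / (4.0 * gauss_a ** 2))
    return np.fft.irfft(R * np.conj(Z) / denom * G, nfft)[:n]


# Synthetic check: the radial trace is the vertical wavelet plus a Ps copy 4 s later
dt = 0.05
t = np.arange(600) * dt
vertical = np.exp(-((t - 2.5) / 0.2) ** 2)
radial = vertical + 0.4 * np.exp(-((t - 6.5) / 0.2) ** 2)
rf = water_level_rf(radial, vertical, dt)
print(int(np.argmax(rf)), rf[80] / rf[0])  # peak at lag 0; ratio near 0.4 at 4 s
```

    The water level keeps the division stable where the vertical spectrum is nearly zero. The price is some distortion at those frequencies, which the Gaussian filter largely suppresses.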

  2. Nuclear species-diagnostic SNP markers mined from 454 amplicon sequencing reveal admixture genomic structure of modern citrus varieties.

    PubMed

    Curk, Franck; Ancillo, Gema; Ollitrault, Frédérique; Perrier, Xavier; Jacquemoud-Collet, Jean-Pierre; Garcia-Lor, Andres; Navarro, Luis; Ollitrault, Patrick

    2015-01-01

    Most cultivated Citrus species originated from interspecific hybridisation between four ancestral taxa (C. reticulata, C. maxima, C. medica, and C. micrantha), with limited further interspecific recombination due to vegetative propagation. This evolution resulted in admixture genomes with frequent interspecific heterozygosity. Moreover, a major part of the phenotypic diversity of edible citrus results from the initial differentiation between these taxa. Deciphering the phylogenomic structure of citrus germplasm is therefore essential for the efficient use of citrus biodiversity in breeding schemes. The objective of this work was twofold: to develop a set of species-diagnostic single nucleotide polymorphism (SNP) markers for the four Citrus ancestral taxa covering the nine chromosomes, and to use these markers to infer the phylogenomic structure of secondary species and modern cultivars. Species-diagnostic SNPs were mined from 454 amplicon sequencing of 57 gene fragments from 26 genotypes of the four basic taxa. Of the 1,053 SNPs mined from 28,507 kb of sequence, 273 were found to be highly diagnostic for a single basic taxon. A set of 105 species-diagnostic SNP markers was used to analyse the admixture structure of varieties and rootstocks. This revealed C. maxima introgressions in most of the old and in all recent selections of mandarins, and suggested that C. reticulata × C. maxima reticulation and introgression processes were important in edible mandarin domestication. The large range of phylogenomic constitutions between C. reticulata and C. maxima revealed in mandarins, tangelos, tangors, sweet oranges, sour oranges, grapefruits, and orangelos is favourable for genetic association studies based on phylogenomic structures of the germplasm. Inferred admixture structures were in agreement with previous hypotheses regarding the origin of several secondary species and also revealed the probable origin of several acid citrus varieties. The developed species-diagnostic SNP
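    With species-diagnostic SNPs, each diagnostic allele copy can be attributed to one ancestral taxon. The following is a minimal sketch. It assumes diploid genotypes coded as 0, 1 or 2 copies of the diagnostic allele and markers spread evenly across the genome. Marker names are hypothetical. Each taxon's fraction is computed over that taxon's own markers, so the fractions need not sum to 1 when some taxa are not covered:

```python
def ancestry_proportions(genotypes, diagnostic_taxon):
    """Per-taxon fraction of diagnostic allele copies in one diploid genotype.

    genotypes: marker -> copies (0, 1 or 2) of that marker's diagnostic allele
    diagnostic_taxon: marker -> ancestral taxon the allele is diagnostic for
    Markers missing from `genotypes` are treated as missing data and skipped."""
    copies, possible = {}, {}
    for marker, taxon in diagnostic_taxon.items():
        if marker not in genotypes:
            continue
        copies[taxon] = copies.get(taxon, 0) + genotypes[marker]
        possible[taxon] = possible.get(taxon, 0) + 2  # two allele copies per diploid locus
    return {taxon: copies[taxon] / possible[taxon] for taxon in possible}


# Hypothetical markers m1..m5 scored in one cultivar
diagnostic = {"m1": "C. reticulata", "m2": "C. reticulata",
              "m3": "C. maxima", "m4": "C. maxima", "m5": "C. medica"}
cultivar = {"m1": 2, "m2": 1, "m3": 1, "m4": 0}  # m5 not scored
print(ancestry_proportions(cultivar, diagnostic))
# {'C. reticulata': 0.75, 'C. maxima': 0.25}
```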

  3. Nuclear Species-Diagnostic SNP Markers Mined from 454 Amplicon Sequencing Reveal Admixture Genomic Structure of Modern Citrus Varieties

    PubMed Central

    Curk, Franck; Ancillo, Gema; Ollitrault, Frédérique; Perrier, Xavier; Jacquemoud-Collet, Jean-Pierre; Garcia-Lor, Andres; Navarro, Luis; Ollitrault, Patrick

    2015-01-01

    Most cultivated Citrus species originated from interspecific hybridisation between four ancestral taxa (C. reticulata, C. maxima, C. medica, and C. micrantha), with limited further interspecific recombination due to vegetative propagation. This evolution resulted in admixture genomes with frequent interspecific heterozygosity. Moreover, a major part of the phenotypic diversity of edible citrus results from the initial differentiation between these taxa. Deciphering the phylogenomic structure of citrus germplasm is therefore essential for the efficient use of citrus biodiversity in breeding schemes. The objective of this work was twofold: to develop a set of species-diagnostic single nucleotide polymorphism (SNP) markers for the four Citrus ancestral taxa covering the nine chromosomes, and to use these markers to infer the phylogenomic structure of secondary species and modern cultivars. Species-diagnostic SNPs were mined from 454 amplicon sequencing of 57 gene fragments from 26 genotypes of the four basic taxa. Of the 1,053 SNPs mined from 28,507 kb of sequence, 273 were found to be highly diagnostic for a single basic taxon. A set of 105 species-diagnostic SNP markers was used to analyse the admixture structure of varieties and rootstocks. This revealed C. maxima introgressions in most of the old and in all recent selections of mandarins, and suggested that C. reticulata × C. maxima reticulation and introgression processes were important in edible mandarin domestication. The large range of phylogenomic constitutions between C. reticulata and C. maxima revealed in mandarins, tangelos, tangors, sweet oranges, sour oranges, grapefruits, and orangelos is favourable for genetic association studies based on phylogenomic structures of the germplasm. Inferred admixture structures were in agreement with previous hypotheses regarding the origin of several secondary species and also revealed the probable origin of several acid citrus varieties. The developed species-diagnostic SNP

  4. Sexual structures in Aspergillus: morphology, importance and genomics.

    PubMed

    Geiser, David M

    2009-01-01

    The genus Aspergillus comprises a few hundred species sharing a common asexual spore-forming structure, the aspergillum. Approximately one-third of these species also produce a sexual stage, and all but five of these are known to be homothallic. Sexual stages associated with Aspergillus fall into approximately ten different genera, reflecting a tremendous degree of phylogenetic and biological diversity. Sexual stages in Aspergillus are plectomycetous, as is typical of the order in which the genus resides, the Eurotiales. Theoretically, a homothallic Aspergillus species can produce both asexual conidia and sexual ascospores in both clonal and recombinant fashion, although the actual significance of these potential modes of reproduction is unclear. Aspergillus species with known sexual stages tend to be minor players in infections of humans, perhaps because they tend to produce fewer asexual spores than their non-teleomorphic congeners. Population genetic and genomic evidence for sex has been found in species with no known sexual stage. This indicates that no assumptions can be made about the clonal versus recombinant life histories of a species based on its known mitotic and/or meiotic reproductive modes. PMID:18608901

  5. Sequence, genomic structure, and chromosomal assignment of human DOC-2

    SciTech Connect

    Albertsen, H.M.; Williams, B.; Smith, S.A.

    1996-04-15

    DOC-2 is a human gene originally identified as a 767-bp cDNA fragment isolated from normal ovarian epithelial cells by differential display against ovarian carcinoma cells. We have now determined the complete cDNA sequence of the 3.2-kb DOC-2 transcript and localized the gene to chromosome 5. A 12.5-kb genomic fragment at the 5′ end of DOC-2 has also been sequenced, revealing the intron-exon structure of the first eight exons (788 bases) of the DOC-2 gene. Translation of the DOC-2 cDNA predicts a hydrophobic protein of 770 amino acid residues with a molecular weight of 82.5 kDa. Comparison of the DNA and amino acid sequences of DOC-2 with publicly accessible sequence databases revealed 83% identity to p96, a murine-responsive phosphoprotein. In addition, about 45% identity was observed between the first 140 N-terminal residues of DOC-2 and the Caenorhabditis elegans M110.5 and Drosophila melanogaster Dab genes. 14 refs., 3 figs.

  6. Population Structure and Comparative Genome Hybridization of European Flor Yeast Reveal a Unique Group of Saccharomyces cerevisiae Strains with Few Gene Duplications in Their Genome

    PubMed Central

    Legras, Jean-Luc; Erny, Claude; Charpentier, Claudine

    2014-01-01

    Wine biological aging is a winemaking process used to produce specific beverages in several European countries, including Spain, Italy, France, and Hungary. This process involves the formation of a velum at the surface of the wine. Here, we present the first large-scale comparison of all European flor strains involved in this process. We inferred the population structure of these European flor strains from their microsatellite genotype diversity and analyzed their ploidy. We show that almost all of these flor strains belong to the same cluster and are diploid, except for a few Spanish strains. Comparison of the array hybridization profiles of six flor strains originating from these four countries with those of three wine strains did not reveal any large segmental amplification. Nonetheless, some genes, including YKL221W/MCH2 and YKL222C, were amplified in the genomes of four of the six flor strains. Finally, we correlated ICR1 ncRNA and FLO11 polymorphisms with flor yeast population structure. We also associated the presence of wild-type ICR1 and a long Flo11p with thin velum formation in a cluster of Jura strains. These results provide new insight into the diversity of flor yeast and show that combinations of different adaptive changes can lead to an increase in hydrophobicity and affect velum formation. PMID:25272156

  7. Automated search of natively folded protein fragments for high-throughput structure determination in structural genomics.

    PubMed Central

    Kuroda, Y.; Tani, K.; Matsuo, Y.; Yokoyama, S.

    2000-01-01

    Structural genomics projects envision almost routine protein structure determinations, which are currently feasible only for small proteins with molecular weights below 25,000 Da. For larger proteins, structural insight can be obtained by breaking them into small segments of amino acid sequence that can fold into native structures even when isolated from the rest of the protein. Such segments are autonomously folding units (AFU) and have sizes suitable for fast structural analyses. Here, we propose to turn an intuitive procedure often employed for identifying biologically important domains into an automatic method for detecting putative folded protein fragments. The procedure is based on the recognition that large proteins can be regarded as combinations of independent domains conserved among diverse organisms. We have therefore developed a program that reorganizes the output of BLAST searches and detects regions with a large number of similar sequences. To automate the detection process, it is reduced to a simple geometrical problem: recognizing rectangular elevations in a graph that plots the number of similar sequences at each residue of a query sequence. We used our program to quantitatively corroborate the premise that segments with conserved sequences correspond to domains that fold into native structures. We applied our program to a test data set of 99 amino acid sequences containing 150 segments with structures listed in the Protein Data Bank, which are thus known to fold into native structures. Overall, the fragments identified by our program have an almost 50% probability of forming a native structure, and comparable results are observed with sequences containing domain linkers classified in SCOP. Furthermore, we verified that our program identifies AFU in libraries from various organisms. We found a significant number of AFU candidates for structural analysis, covering an estimated 5 to 20% of the genomic databases. Altogether, these
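    The detection step reduces to finding rectangular elevations in a per-residue count profile. The following simplified sketch treats a rectangle as any run of at least min_length residues whose count stays at or above min_height. Both thresholds are hypothetical parameters, and the published program's criteria may differ:

```python
def find_plateaus(counts, min_height, min_length):
    """Return (start, end) residue intervals (0-based, end exclusive) where the
    number of similar sequences stays >= min_height for >= min_length residues."""
    segments = []
    start = None
    for i, c in enumerate(list(counts) + [float("-inf")]):  # sentinel closes a trailing run
        if c >= min_height:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_length:
                segments.append((start, i))
            start = None
    return segments


# Hypothetical per-residue counts of BLAST hits for a short query
profile = [1, 1, 8, 9, 9, 8, 2, 2, 7, 7, 1]
print(find_plateaus(profile, min_height=5, min_length=3))  # [(2, 6)]
```

    The short run at residues 8 and 9 is discarded by the length threshold. This mirrors the idea that a candidate folding unit must be both well conserved and long enough to form a domain.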

  8. Origin and evolution of protein fold designs inferred from phylogenomic analysis of CATH domain structures in proteomes.

    PubMed

    Bukhari, Syed Abbas; Caetano-Anollés, Gustavo

    2013-01-01

    The spatial arrangements of secondary structures in proteins, irrespective of their connectivity, depict the overall shape and organization of protein domains. These features have been used in the CATH and SCOP classifications to hierarchically partition fold space and define the architectural makeup of proteins. Here we use phylogenomic methods and a census of CATH structures in hundreds of genomes to study the origin and diversification of protein architectures (A) and their associated topologies (T) and superfamilies (H). Phylogenies that describe the evolution of domain structures and proteomes were reconstructed from the structural census and used to generate timelines of domain discovery. Phylogenies of CATH domains at T and H levels of structural abstraction and associated chronologies revealed patterns of reductive evolution, the early rise of Archaea, three epochs in the evolution of the protein world, and patterns of structural sharing between superkingdoms. Phylogenies of proteomes confirmed the early appearance of Archaea. While these findings are in agreement with previous phylogenomic studies based on the SCOP classification, phylogenies unveiled sharing patterns between Archaea and Eukarya that are recent and can explain the canonical bacterial rooting typically recovered from sequence analysis. Phylogenies of CATH domains at the A level uncovered general patterns of architectural origin and diversification. The tree of A structures showed that ancient structural designs such as the 3-layer (αβα) sandwich (3.40) or the orthogonal bundle (1.10) are comparatively simpler in their makeup and are involved in basic cellular functions. In contrast, modern structural designs such as prisms, propellers, 2-solenoid, super-roll, clam, trefoil and box are not widely distributed and were probably adopted to perform specialized functions. Our timelines therefore uncover a remarkable universal tendency towards protein structural complexity. PMID:23555236

  9. Genome mapping in capsicum and the evolution of genome structure in the solanaceae.

    PubMed Central

    Livingstone, K D; Lackney, V K; Blauth, J R; van Wijk, R; Jahn, M K

    1999-01-01

    We have created a genetic map of Capsicum (pepper) from an interspecific F2 population consisting of 11 large (76.2-192.3 cM) and 2 small (19.1 and 12.5 cM) linkage groups that cover a total of 1245.7 cM. Many of the markers are tomato probes that were chosen to cover the tomato genome, allowing comparison of this pepper map to the genetic map of tomato. Hybridization of all tomato-derived probes included in this study to positions throughout the pepper map suggests that no major losses have occurred during the divergence of these genomes. Comparison of the pepper and tomato genetic maps showed that 18 homeologous linkage blocks cover 98.1% of the tomato genome and 95.0% of the pepper genome. Through these maps and the potato map, we determined the number and types of rearrangements that differentiate these species and reconstructed a hypothetical progenitor genome. We conclude there have been 30 breaks as part of 5 translocations, 10 paracentric inversions, 2 pericentric inversions, and 4 disassociations or associations of genomic regions that differentiate tomato, potato, and pepper, as well as an additional reciprocal translocation, nonreciprocal translocation, and a duplication or deletion that differentiate the two pepper mapping parents. PMID:10388833

  10. GenomeTools: a comprehensive software library for efficient processing of structured genome annotations.

    PubMed

    Gremme, Gordon; Steinbiss, Sascha; Kurtz, Stefan

    2013-01-01

    Genome annotations are often published as plain text files describing genomic features and their subcomponents by an implicit annotation graph. In this paper, we present the GenomeTools, a convenient and efficient software library and associated software tools for developing bioinformatics software intended to create, process or convert annotation graphs. The GenomeTools strictly follow the annotation graph approach, offering a unified graph-based representation. This gives the developer intuitive and immediate access to genomic features and tools for their manipulation. To process large annotation sets with low memory overhead, we have designed and implemented an efficient pull-based approach for sequential processing of annotations. This makes it possible to handle even the largest annotation sets, such as a complete catalogue of human variations. Our object-oriented C-based software library enables a developer to conveniently implement their own functionality on annotation graphs and to integrate it into larger workflows, simultaneously accessing compressed sequence data if required. The careful C implementation of the GenomeTools not only ensures a lightweight memory footprint while allowing full sequential as well as random access to the annotation graph, but also facilitates the creation of bindings to a variety of scripting languages (like Python and Ruby) sharing the same interface. PMID:24091398
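The pull-based design described above, in which each downstream stage requests the next annotation from the stage before it so that only a small window of the input is held in memory, can be illustrated with Python generators. This is a sketch of the general idea, not the GenomeTools API (a C library with its own stream types); the function names are hypothetical, and the simplified parser keeps only a few of the nine GFF3 columns and does not build the parent/child annotation graph.

```python
from typing import Iterable, Iterator, NamedTuple


class Feature(NamedTuple):
    seqid: str
    ftype: str
    start: int
    end: int
    strand: str


def parse_gff3(lines: Iterable[str]) -> Iterator[Feature]:
    """Source stage: yields one feature per data line, skipping comments and malformed lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # simplification: a real parser would report the error
        yield Feature(cols[0], cols[2], int(cols[3]), int(cols[4]), cols[6])


def keep_type(features: Iterable[Feature], ftype: str) -> Iterator[Feature]:
    """Filter stage: pulls from upstream and passes on only features of one type."""
    for feature in features:
        if feature.ftype == ftype:
            yield feature


def total_length(features: Iterable[Feature]) -> int:
    """Sink stage: drives the whole pipeline by pulling until upstream is exhausted."""
    return sum(f.end - f.start + 1 for f in features)
```

Because every stage is lazy, passing an open file handle to `parse_gff3` and chaining `keep_type` and `total_length` reads the file one line at a time, however large it is.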

  11. Inferring functional network-based signatures via structurally-weighted LASSO model

    PubMed Central

    Zhu, Dajiang; Shen, Dinggang; Liu, Tianming

    2014-01-01

    Most current research approaches for functional/effective connectivity analysis focus on pair-wise connectivity and cannot deal with network-scale functional interactions. In this paper, we propose a structurally-weighted LASSO (SW-LASSO) regression model to represent the functional interaction among multiple regions of interest (ROIs) based on resting state fMRI (R-fMRI) data. The structural connectivity constraints derived from diffusion tensor imaging (DTI) data guide the selection of the weights, which adjust the penalty levels of the coefficients corresponding to different ROIs. Using the Default Mode Network (DMN) as a test-bed, our results indicate that the learned SW-LASSO can differentiate Mild Cognitive Impairment (MCI) subjects from their normal controls and has promising potential to characterize brain function under different conditions, thus serving as a functional network-based signature. PMID:25002915
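A structurally weighted LASSO, as described above, amounts to an L1 penalty whose strength differs per coefficient. The Python sketch below is an illustration rather than the authors' SW-LASSO code: it regresses one target ROI's time series `y` on the other ROIs' series (the columns of `X`) by coordinate descent on (1/2n) * ||y - X b||^2 + lam * sum_j w_j * |b_j|. The rule that turns DTI connectivity strengths into penalty weights, `structural_weights`, is a hypothetical choice; the paper's mapping may differ.

```python
from typing import List, Sequence


def soft_threshold(z: float, t: float) -> float:
    """Shrink z toward zero by t (the proximal operator of t * |b|)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X: Sequence[Sequence[float]], y: Sequence[float],
                   weights: Sequence[float], lam: float,
                   n_iter: int = 200) -> List[float]:
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * sum_j w_j |b_j|."""
    n, p = len(y), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    residual = list(y)  # y - X b, with b starting at zero
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that excludes b_j's contribution.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


def structural_weights(strengths: Sequence[float]) -> List[float]:
    """Hypothetical mapping: a stronger structural (DTI) connection gives a smaller penalty."""
    return [1.0 / (1.0 + s) for s in strengths]
```

With uniform weights this reduces to the ordinary LASSO; lowering a weight lets the corresponding ROI enter the model at a smaller effect size, which is how structural connectivity can steer which functional links are selected.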

  12. Geophysical inferences of thermal-chemical structures in the lower mantle

    NASA Technical Reports Server (NTRS)

    Yuen, D. A.; Cadek, O.; Chopelas, A.; Matyska, C.

    1993-01-01

    Lateral variations of the temperature field in the lower mantle have been reconstructed using new results in mineral physics and seismic tomographic data. We show that, with the application of high-pressure experimental values of thermal expansivity and of sound velocities, the slow seismic anomalies in the lower mantle under the Pacific and Africa can be converted into realistic-looking plume structures with large dimensions of O(1000 km). The outer fringes of the plumes have an excess temperature of around 400 K. In the core of the plumes are found tonguelike structures with extremely high thermal anomalies. These values can exceed 1200 K and are too high to be explained on the basis of thermal anomalies alone. We suggest that these major plumes in the deep mantle may be driven by both thermal and chemical buoyancies or that enhanced conductive heat-transfer may be important there.

  13. Correlation between local structural dynamics of proteins inferred from NMR ensembles and evolutionary dynamics of homologues of known structure.

    PubMed

    Mahajan, Swapnil; de Brevern, Alexandre G; Offmann, Bernard; Srinivasan, Narayanaswamy

    2014-01-01

    Conformational changes in proteins are extremely important for their biochemical functions. The correlation between inherent conformational variations in a protein and conformational differences in its homologues of known structure is still unclear. In this study, we have used a structural alphabet called Protein Blocks (PBs). PBs abstract protein 3-D structures into 1-D strings over a 16-letter alphabet (a-p) based on the dihedral angles of overlapping pentapeptides. We have analyzed the variations in local conformations, in terms of PBs, represented in the ensembles of 801 protein structures determined using NMR spectroscopy. In the analysis of concatenated data over all the residues in all the NMR ensembles, we observe that the overall nature of inherent local structural variations in NMR ensembles is similar to the nature of local structural differences in homologous proteins, with a high correlation coefficient of .94. High correlation at the alignment positions corresponding to helical and β-sheet regions is to be expected. However, the correlation coefficient when considering only the loop regions is also quite high (.91). Surprisingly, segregated position-wise analysis shows that this high correlation does not hold for loop regions at the structurally equivalent positions in NMR ensembles and their homologues of known structure. This suggests that the general nature of local structural changes is unique; however, most of the local structural variations in loop regions of NMR ensembles do not correlate with their local structural differences at structurally equivalent positions in homologues. PMID:23730714

  14. Interferometric synthetic aperture microscopy: tissue structure inferred by computed imaging techniques

    NASA Astrophysics Data System (ADS)

    Marks, Daniel L.; Ralston, Tyler S.; Davis, Brynmor J.; Carney, P. Scott; Boppart, Stephen A.

    2008-02-01

    Interferometric Synthetic Aperture Microscopy (ISAM) is an optical microscopy computed-imaging technique for measuring the optical properties of three-dimensional structures and biological tissues. In this work, the principle of ISAM is reviewed, and its application to imaging tissue properties in various scanning geometries and instrument configurations is explored. The practicality of ISAM is demonstrated by imaging a rat heart and muscle using a real-time implementation of ISAM in conjunction with a clinical cart Optical Coherence Tomography instrument.

  15. Phylogeography and population structure of the red stingray, Dasyatis akajei inferred by mitochondrial control region.

    PubMed

    Li, Ning; Chen, Xiao; Sun, Dianrong; Song, Na; Lin, Qin; Gao, Tianxiang

    2015-08-01

    The red stingray Dasyatis akajei is distributed in both marine and freshwater habitats, but little is known about its phylogeography and population structure. We sampled 107 individuals from one freshwater region and 6 coastal localities within the distribution range of D. akajei. Analyses of the 474 bp first hypervariable region of the mitochondrial DNA control region revealed only 17 polymorphic sites that defined 28 haplotypes, with no unique haplotype for the freshwater population. A high level of haplotype diversity and low nucleotide diversity were observed in both marine (h = 0.9393 ± 0.0104, π = 0.0069 ± 0.0040) and freshwater populations (h = 0.8333 ± 0.2224, π = 0.0084 ± 0.0063). A significant level of genetic structure was detected among four marine populations (TZ, WZ, ND and ZZ) via both hierarchical analysis of molecular variance (AMOVA) and pairwise FST (with two exceptions), which is unusual compared with previous findings for elasmobranchs over such short geographical distances. However, limited sampling suggested that the freshwater population was not particularly distinct (p > 0.05), but additional samples would be needed to confirm this. Demersal and slow-moving traits have likely contributed to the genetically heterogeneous population structure. The demographic history of D. akajei examined by mismatch distribution analyses, neutrality tests and Bayesian skyline analyses suggested a sudden population expansion dating to the upper Pleistocene. The information on genetic diversity and genetic structure will have implications for the management of fisheries and conservation efforts. PMID:24409898
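The h and π values above are standard summary statistics. As a sketch, assuming aligned sequences of equal length with no gaps or missing data, Nei's unbiased haplotype diversity h = n/(n-1) * (1 - sum of p_i^2), where p_i is the frequency of haplotype i, and nucleotide diversity π (mean pairwise differences per site) can be computed as below. This is not necessarily the software or exact estimator the authors used, and the ± variance terms reported above are not computed here.

```python
from collections import Counter
from itertools import combinations
from typing import Sequence


def haplotype_diversity(sequences: Sequence[str]) -> float:
    """Nei's unbiased haplotype (gene) diversity: n/(n-1) * (1 - sum p_i^2)."""
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(sequences)  # identical sequences count as the same haplotype
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


def nucleotide_diversity(sequences: Sequence[str]) -> float:
    """Mean number of pairwise differences per site (pi) for aligned, gap-free sequences."""
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(sequences, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)
```

The pattern reported for D. akajei, high h with low π, is what these two formulas yield when many haplotypes exist but each differs from the others at only a few sites.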

  16. Internal structure of a cold dark molecular cloud inferred from the extinction of background starlight.

    PubMed

    Alves, J F; Lada, C J; Lada, E A

    2001-01-11

    Stars and planets form within dark molecular clouds, but little is understood about the internal structure of these clouds, and consequently about the initial conditions that give rise to star and planet formation. The clouds are primarily composed of molecular hydrogen, which is virtually inaccessible to direct observation. But the clouds also contain dust, which is well mixed with the gas and which has well understood effects on the transmission of light. Here we use sensitive near-infrared measurements of the light from background stars as it is absorbed and scattered by trace amounts of dust to probe the internal structure of the dark cloud Barnard 68 with unprecedented detail. We find the cloud's density structure to be very well described by the equations for a pressure-confined, self-gravitating isothermal sphere that is critically stable according to the Bonnor-Ebert criteria. As a result we can precisely specify the physical conditions inside a dark cloud on the verge of collapse to form a star. PMID:11196632

  17. Prominence fine-structure dynamics as inferred from 2D non-LTE models

    NASA Astrophysics Data System (ADS)

    Gunar, Stanislav; Schmieder, Brigitte; Mein, Pierre; Heinzel, Petr

    2012-07-01

    2D multi-thread prominence fine-structure models are able to produce synthetic Lyman spectra in very good agreement with spectral observations by SOHO/SUMER, including the spectral line asymmetries. The synthetic differential emission measure curves derived from these models are also in good agreement with observations. Now we show that these models are also able to produce synthetic H-alpha line profiles in very good agreement with observations, which allows us to analyze not only the physical parameters of the prominence fine-structure plasma but also some aspects of its dynamical behaviour. We compare the synthetic H-alpha spectra with the observed spectra of the April 26, 2007 prominence using three statistical parameters: the line integrated intensity, the line full-width at half-maximum (FWHM), and the Doppler velocity derived from shifts of the line profiles. This statistical analysis allows us to conclude that the overall statistical distribution of the LOS velocities in the April 26, 2007 prominence at the time of the observations was below +/-15 km/s and in the prominence core was close to +/-10 km/s. In combination with the analysis of the Lyman spectra, we determine several physical parameters of the observed prominence fine structures, which show that the April 26, 2007 prominence was relatively less massive. We are also able to put some constraints on the prominence core temperature, which might be relatively low, reaching values below 6000 K.
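Two of the three statistics above, the FWHM and the Doppler velocity, follow from simple operations on a sampled line profile. The Python sketch below assumes a background-subtracted emission profile with a single peak on an ascending wavelength grid in ångströms, linear interpolation at the half-maximum crossings, the non-relativistic Doppler formula, and an approximate H-alpha rest (air) wavelength of 6562.8 Å; positive velocity means a redshift (motion away from the observer). It illustrates the definitions and is not the authors' analysis pipeline.

```python
from typing import Sequence

C_KM_S = 299_792.458     # speed of light, km/s
H_ALPHA_REST_A = 6562.8  # approximate H-alpha rest wavelength in air, angstroms


def doppler_velocity(observed_center: float, rest: float = H_ALPHA_REST_A) -> float:
    """Non-relativistic line-of-sight velocity in km/s; positive means redshift."""
    return C_KM_S * (observed_center - rest) / rest


def _crossing(x0: float, y0: float, x1: float, y1: float, level: float) -> float:
    """Linearly interpolate the x at which the segment (x0, y0)-(x1, y1) reaches level."""
    return x0 + (level - y0) * (x1 - x0) / (y1 - y0)


def fwhm(wavelengths: Sequence[float], intensities: Sequence[float]) -> float:
    """Full width at half maximum of a single-peaked, background-subtracted profile."""
    peak = max(range(len(intensities)), key=lambda k: intensities[k])
    half = intensities[peak] / 2.0
    left = peak
    while left > 0 and intensities[left - 1] >= half:
        left -= 1
    right = peak
    while right < len(intensities) - 1 and intensities[right + 1] >= half:
        right += 1
    if left == 0 or right == len(intensities) - 1:
        raise ValueError("profile does not drop below half maximum on both sides")
    x_left = _crossing(wavelengths[left - 1], intensities[left - 1],
                       wavelengths[left], intensities[left], half)
    x_right = _crossing(wavelengths[right], intensities[right],
                        wavelengths[right + 1], intensities[right + 1], half)
    return x_right - x_left
```

At H-alpha, a shift of about 0.22 Å corresponds to roughly 10 km/s, the order of the core velocities quoted above.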

  18. Inferring population structure and demographic history using Y-STR data from worldwide populations.

    PubMed

    Xu, Hongyang; Wang, Chuan-Chao; Shrestha, Rukesh; Wang, Ling-Xiang; Zhang, Manfei; He, Yungang; Kidd, Judith R; Kidd, Kenneth K; Jin, Li; Li, Hui

    2015-02-01

    The Y chromosome is one of the best genetic markers for exploring the evolutionary history of human populations. Global analyses of Y chromosomal short tandem repeat (STR) data can reveal world population structure and history. However, previous Y-STR studies tended to focus on small geographical ranges or included only limited sample sizes. In this study, we have investigated population structure and demographic history using data for 17 Y chromosomal STRs from 979 males from 44 worldwide populations. The largest genetic distances have been observed between pairs of African and non-African populations. American populations with the lowest genetic diversities also showed large genetic distances and coancestry coefficients with other populations, whereas Eurasian populations displayed close genetic affinities. African populations tend to have the oldest times to the most recent common ancestor (TMRCAs), the largest effective population sizes and the earliest expansion times, whereas the American, Siberian, Melanesian, and isolated Atayal populations have the most recent TMRCAs and expansion times, and the smallest effective population sizes. This clear geographic pattern is consistent with the serial founder model for the origin of populations outside Africa. The Y-STR dataset presented here provides the most detailed view of worldwide population structure and human male demographic history, and additionally will be of great benefit to future forensic applications and population genetic studies. PMID:25159112

  19. Sensitivity of inferred regional CO source estimates to the vertical structure in CO as observed by MOPITT

    NASA Astrophysics Data System (ADS)

    Jiang, Z.; Jones, D. B. A.; Worden, H. M.; Henze, D. K.

    2014-09-01

    Vertical transport of surface emission to the free troposphere, usually associated with frontal lifting in warm conveyor belts or ascent in deep convection, has significant influence on the vertical structure of atmospheric trace gases. Consequently, it may impact estimates of the surface fluxes of these gases inferred from remote sensing observations that are based on thermal infrared radiances (TIR), since these measurements are sensitive mainly to signals in the free troposphere. In this work, we assessed the sensitivity of regional CO source estimates to the vertical CO distribution, by assimilating multi-spectral MOPITT V5J CO retrievals with the GEOS-Chem model. We compared the source estimates obtained by assimilating the CO profiles and the surface layer retrievals from June 2004 to May 2005. The inversion analyses all produced a reduction in CO emissions in the tropics and subtropics and an increase in the extratropics. The tropical decreases were particularly pronounced for regions where the biogenic source of CO was dominant, suggesting an overestimate of the a priori isoprene source of CO in the model. We found that the differences between the regional source estimates inferred from the profile and surface layer retrievals for 2004-2005 were small, generally less than 5% for the main continental regions, except for estimates for South Asia, North America, and Europe. Because of discrepancies in convective transport in the model, the CO source estimates for India and Southeast Asia inferred from the CO profiles were significantly higher than those estimated from the surface layer retrievals during June-August 2004. On the other hand, the profile inversion underestimated the CO emissions from North America and Europe compared to the assimilation of the surface layer retrievals. 
We showed that vertical transport of air from the North American and European boundary layer is slower than from other continental regions and thus air in the free troposphere from

  20. Inferring coarse-grain histone-DNA interaction potentials from high-resolution structures of the nucleosome

    NASA Astrophysics Data System (ADS)

    Meyer, Sam; Everaers, Ralf

    2015-02-01

    The histone-DNA interaction in the nucleosome is a fundamental mechanism of genomic compaction and regulation, whose details remain largely unknown despite increasing structural knowledge of the complex. In this paper, we propose a framework for the extraction of a nanoscale histone-DNA force-field from a collection of high-resolution structures, which may be adapted to a larger class of protein-DNA complexes. We applied the procedure to a large crystallographic database extended by snapshots from molecular dynamics simulations. The comparison of the structural models first shows that, at histone-DNA contact sites, the DNA base-pairs are shifted outwards locally, consistent with locally repulsive forces exerted by the histones. The second step shows that the various force profiles of the structures under analysis derive locally from a unique, sequence-independent, quadratic repulsive force-field, while the sequence preferences are entirely due to internal DNA mechanics. We have thus obtained the first knowledge-derived nanoscale interaction potential for histone-DNA in the nucleosome. The conformations obtained by relaxation of nucleosomal DNA with high-affinity sequences in this potential accurately reproduce the experimental values of binding preferences. Finally, we address the more generic binding mechanisms relevant to the 80% of genomic sequences incorporated in nucleosomes by computing the conformation of nucleosomal DNA with sequence-averaged properties. This conformation differs from those found in crystals, and the analysis suggests that repulsive histone forces are related to local stretch tension in nucleosomal DNA, mostly between adjacent contact points. This tension could play a role in the stability of the complex.

  1. Draft Genome of the Wheat Rust Pathogen (Puccinia triticina) Unravels Genome-Wide Structural Variations during Evolution.

    PubMed

    Kiran, Kanti; Rawal, Hukam C; Dubey, Himanshu; Jaswal, Rajdeep; Devanna, B N; Gupta, Deepak Kumar; Bhardwaj, Subhash C; Prasad, P; Pal, Dharam; Chhuneja, Parveen; Balasubramanian, P; Kumar, J; Swami, M; Solanke, Amolkumar U; Gaikwad, Kishor; Singh, Nagendra K; Sharma, Tilak Raj

    2016-01-01

    Leaf rust is one of the most important diseases of wheat and is caused by Puccinia triticina, a highly variable rust pathogen prevalent worldwide. Decoding the genome of this pathogen will help in unraveling the molecular basis of its evolution and in the identification of genes responsible for its various biological functions. We generated high-quality draft genome sequences (approximately 100-106 Mb) of two races of P. triticina: the variable and virulent Race77 and the old, avirulent Race106. The genomes of races 77 and 106 had 33X and 27X coverage, respectively. We predicted 27,678 and 26,384 genes, with average lengths of 1,129 and 1,086 bases in races 77 and 106, respectively, and found that the genomes consisted of 37.49% and 39.99% repetitive sequences. Genome-wide comparative analysis revealed that Race77 differs substantially from Race106 with regard to segmental duplication (SD), repeat element, and SNP/InDel characteristics. Comparative analyses showed that Race77 is a recent, highly variable and adapted race compared with Race106. Further sequence analyses of 13 additional pathotypes of Race77 clearly differentiated the recent, active and virulent pathotypes from the older ones. Average densities of 2.4 SNPs and 0.32 InDels per kb were obtained for all P. triticina pathotypes. Secretome analysis demonstrated that Race77 has more virulence factors than Race106, which may be responsible for the greater degree of adaptation of this pathogen. We also found that genes under greater selection pressure were conserved in the genomes of both races, and may affect functions crucial for the higher levels of virulence factors in Race77. This study provides insights into the genome structure, genome organization, molecular basis of variation, and pathogenicity of P. triticina. The genome sequence data generated in this study have been submitted to public domain databases and will be an important resource for comparative genomics studies of the more than 4000 existing

  2. Recent Developments in Parameter Estimation and Structure Identification of Biochemical and Genomic Systems

    PubMed Central

    Chou, I-Chun; Voit, Eberhard O.

    2009-01-01

    The organization, regulation and dynamical responses of biological systems are in many cases too complex to allow intuitive predictions and require the support of mathematical modeling for quantitative assessments and a reliable understanding of system functioning. All steps of constructing mathematical models for biological systems are challenging, but arguably the most difficult task among them is the estimation of model parameters and the identification of the structure and regulation of the underlying biological networks. Recent advances in high-throughput techniques now allow the generation of time series data that characterize the dynamics of genomic, proteomic, metabolic, and physiological responses and enable us, at least in principle, to tackle estimation and identification tasks using “top-down” or “inverse” approaches. While the rewards of a successful inverse estimation or identification are great, the process of extracting structural and regulatory information is technically difficult. The challenges can generally be categorized into four areas, namely, issues related to the data, the model, the mathematical structure of the system, and the optimization and support algorithms. Many recent articles have addressed inverse problems within the modeling framework of Biochemical Systems Theory (BST). BST was chosen for these tasks because of its unique structural flexibility and the fact that the structure and regulation of a biological system are mapped essentially one-to-one onto the parameters of the describing model. The proposed methods mainly focused on various optimization algorithms, but also on support techniques, including methods for circumventing the time-consuming numerical integration of systems of differential equations, smoothing overly noisy data, estimating slopes of time series, reducing the complexity of the inference task, and constraining the parameter search space. Other methods targeted issues of data
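Within BST, one common model form is the S-system, in which each dependent variable's rate of change is the difference between one aggregated production term and one aggregated degradation term, each a product of power-law functions. The block below shows that form and, as one instance of the slope-estimation technique mentioned above, the decoupled algebraic problem obtained by replacing each derivative with a slope S_i(t_k) estimated from (smoothed) time series at sampling times t_1, ..., t_K. It is a sketch of the general approach, not of any specific method reviewed in the paper. Here X_1, ..., X_n are dependent and X_{n+1}, ..., X_{n+m} independent variables, alpha_i and beta_i are non-negative rate constants, and g_ij and h_ij are real-valued kinetic orders.

```latex
% S-system form of a BST model
\frac{dX_i}{dt} = \alpha_i \prod_{j=1}^{n+m} X_j^{g_{ij}}
                - \beta_i  \prod_{j=1}^{n+m} X_j^{h_{ij}},
\qquad i = 1, \dots, n

% Decoupled estimation: replace dX_i/dt by slopes S_i(t_k) estimated from data
S_i(t_k) \approx \alpha_i \prod_{j=1}^{n+m} X_j(t_k)^{g_{ij}}
               - \beta_i  \prod_{j=1}^{n+m} X_j(t_k)^{h_{ij}},
\qquad k = 1, \dots, K
```

A kinetic order of zero means X_j does not affect that term, which is why the network's structure and regulation map onto the parameters; in the decoupled form each equation i can be fitted separately, without numerically integrating the differential equations.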

  3. Phylogenetic Gaussian Process Model for the Inference of Functionally Important Regions in Protein Tertiary Structures

    PubMed Central

    Huang, Yi-Fei; Golding, G. Brian

    2014-01-01

    A critical question in biology is the identification of functionally important amino acid sites in proteins. Because functionally important sites are under stronger purifying selection, site-specific substitution rates tend to be lower than usual at these sites. A large number of phylogenetic models have been developed to estimate site-specific substitution rates in proteins and the extraordinarily low substitution rates have been used as evidence of function. Most of the existing tools, e.g. Rate4Site, assume that site-specific substitution rates are independent across sites. However, site-specific substitution rates may be strongly correlated in the protein tertiary structure, since functionally important sites tend to be clustered together to form functional patches. We have developed a new model, GP4Rate, which incorporates the Gaussian process model with the standard phylogenetic model to identify slowly evolved regions in protein tertiary structures. GP4Rate uses the Gaussian process to define a nonparametric prior distribution of site-specific substitution rates, which naturally captures the spatial correlation of substitution rates. Simulations suggest that GP4Rate can potentially estimate site-specific substitution rates with a much higher accuracy than Rate4Site and tends to report slowly evolved regions rather than individual sites. In addition, GP4Rate can estimate the strength of the spatial correlation of substitution rates from the data. By applying GP4Rate to a set of mammalian B7-1 genes, we found a highly conserved region which coincides with experimental evidence. GP4Rate may be a useful tool for the in silico prediction of functionally important regions in the proteins with known structures. PMID:24453956
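GP4Rate's key idea, as described above, is a Gaussian process prior whose covariance depends on the spatial distance between residues, so that nearby sites receive correlated rates. The Python sketch below assumes a squared-exponential kernel on 3-D residue coordinates (for example Cα positions), placed on log substitution rates, with hypothetical `variance` and `length_scale` values (length in ångströms); GP4Rate's actual kernel and parameterization may differ. It builds the covariance matrix and draws one correlated prior sample through a Cholesky factor.

```python
import math
import random
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def squared_exponential_cov(coords: Sequence[Coord], variance: float = 1.0,
                            length_scale: float = 8.0) -> List[List[float]]:
    """K[i][j] = variance * exp(-d_ij^2 / (2 * length_scale^2)), d_ij = Euclidean distance."""
    n = len(coords)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            cov[i][j] = variance * math.exp(-d2 / (2.0 * length_scale ** 2))
    return cov


def cholesky(matrix: List[List[float]], jitter: float = 1e-9) -> List[List[float]]:
    """Lower-triangular L with L L^T = matrix + jitter * I (jitter aids numerical stability)."""
    n = len(matrix)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(matrix[i][i] + jitter - s)
            else:
                low[i][j] = (matrix[i][j] - s) / low[j][j]
    return low


def sample_log_rates(coords: Sequence[Coord], rng: random.Random,
                     variance: float = 1.0, length_scale: float = 8.0) -> List[float]:
    """One prior draw of spatially correlated log-rates: L z with z ~ N(0, I)."""
    low = cholesky(squared_exponential_cov(coords, variance, length_scale))
    z = [rng.gauss(0.0, 1.0) for _ in coords]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(coords))]
```

Exponentiating a draw, e.g. `[math.exp(v) for v in sample_log_rates(coords, random.Random(0))]`, gives positive rates that tend to be similar for residues close in space, which is the clustering of slowly evolving sites the abstract describes.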

  4. Structure of the D″ layer inferred from the Earth's free oscillations

    NASA Astrophysics Data System (ADS)

    Kumagai, Hiroyuki; Fukao, Yoshio; Suda, Naoki; Kobayashi, Naoki

    1992-09-01

    We detected, by applying the Sompi method of spectral analysis to the long-period records of the International Deployment of Accelerometers (IDA), the spheroidal modes of the Earth's free oscillations 1S8, 1S9, 1S10 and 2S16, which are primarily sensitive to the structure adjacent to the core-mantle boundary (CMB). The observed central frequencies are systematically lower than those predicted from the standard Preliminary Reference Earth Model (PREM). These frequencies and those of several other lower-mantle modes were inverted into a spherical structure of the D″ layer. A global feature resolvable from the data is a decrease in P velocity gradient in the D″ layer. A possibility of anomalous density increase in the D″ layer is suggested but a trade-off between density and S velocity hampers a definitive conclusion. There is no indication of a strong decrease in Q in the D″ layer as might be expected from a simple thermal boundary layer model. We retrieved the splitting functions up to spherical harmonic degree four from the frequencies of apparently split peaks of 0S7, 1S8 and 1S10. The splitting functions of these modes consistently show negative anomalies in the Pacific and in Africa and positive anomalies in the circum-Pacific region, in good agreement with body wave tomographic models. Their intensities increase significantly with the increasing sensitivity to the structure near the CMB, indicating enhancement of lateral heterogeneity in the lowermost mantle, a result difficult to explain by a simple thermal boundary layer model, in which the isothermal CMB tends to minimize the thermally induced heterogeneity of the D″ layer.

  5. Architecture and Secondary Structure of an Entire HIV-1 RNA Genome

    PubMed Central

    Watts, Joseph M.; Dang, Kristen K.; Gorelick, Robert J.; Leonard, Christopher W.; Bess, Julian W.; Swanstrom, Ronald; Burch, Christina L.; Weeks, Kevin M.

    2009-01-01

    Single-stranded RNA viruses encompass broad classes of infectious agents and cause the common cold, cancer, AIDS, and other serious health threats. Viral replication is regulated at many levels, including using conserved genomic RNA structures. Most potential regulatory elements within viral RNA genomes are uncharacterized. Here we report the structure of an entire HIV-1 genome at single nucleotide resolution using SHAPE, a high-throughput RNA analysis technology. The genome encodes protein structure at two levels. In addition to the correspondence between RNA and protein primary sequences, a correlation exists between high levels of RNA structure and sequences that encode inter-domain loops in HIV proteins. This correlation suggests RNA structure modulates ribosome elongation to promote native protein folding. Some simple genome elements previously shown to be important, including the ribosomal gag-pol frameshift stem-loop, are components of larger RNA motifs. We also identify organizational principles for unstructured RNA regions. Highly used splice acceptors lie in unstructured motifs and hypervariable regions are sequestered from flanking genome regions by stable insulator helices. These results emphasize that the HIV-1 genome and, potentially, many coding RNAs are punctuated by numerous previously unrecognized regulatory motifs and that extensive RNA structure may constitute an additional level of the genetic code. PMID:19661910

  6. Insight into asphaltene nanoaggregate structure inferred by small angle neutron and X-ray scattering.

    PubMed

    Eyssautier, Joëlle; Levitz, Pierre; Espinat, Didier; Jestin, Jacques; Gummel, Jérémie; Grillo, Isabelle; Barré, Loïc

    2011-06-01

    Complementary neutron and X-ray small angle scattering results provide detailed information on the asphaltene nanostructure. Precise SANS and SAXS measurements over a wide q-range were performed on the same dilute asphaltene-toluene solution, and absolute intensity scaling was carried out. Direct comparison of neutron and X-ray spectra enables description of a fractal organization made from the aggregation of small entities of 16 kDa, exhibiting an internal fine structure. Neutron contrast variation experiments enhance the description of this nanoaggregate in terms of core-shell disk organization, giving insight into core and shell dimensions and chemical compositions. The nanoaggregates are best described by a disk of total radius 32 Å with 30% polydispersity and a height of 6.7 Å. Composition and density calculations show that the core is a dense and aromatic structure, contrary to the shell, which is highly aliphatic. These results agree well with the general view of the Yen model (Yen, T. F.; et al. Anal. Chem. 1961, 33, 1587-1594) and, as with the modified Yen model (Mullins, O. C. Energy Fuels 2010, 24, 2179-2207), provide characteristic dimensions of the asphaltene nanoaggregate in a good solvent. PMID:21553910

  7. Inferring functional connectivity in MRI using Bayesian network structure learning with a modified PC algorithm.

    PubMed

    Iyer, Swathi P; Shafran, Izhak; Grayson, David; Gates, Kathleen; Nigg, Joel T; Fair, Damien A

    2013-07-15

    Resting state functional connectivity MRI (rs-fcMRI) is a popular technique used to gauge the functional relatedness between regions in the brain for typical and special populations. Most work to date determines this relationship by using Pearson's correlation on BOLD fMRI time series. However, it has been recognized that this method has at least two key limitations. First, it cannot distinguish direct from indirect connections/influences. Second, it cannot differentiate the direction of information flow between regions. In the current paper, we follow up on recent work by Smith et al. (2011) and apply the PC algorithm to both simulated and empirical data to determine whether these two factors can be discerned with group average, as opposed to single subject, functional connectivity data. When applied to simulated individual subjects, the algorithm performs well in determining indirect and direct connections but fails to determine directionality. However, when applied at the group level, the PC algorithm gives strong results for both indirect and direct connections and the direction of information flow. Applied to empirical data, with a diffusion-weighted imaging (DWI) structural connectivity matrix as the baseline, the PC algorithm outperformed direct correlations. We conclude that, under certain conditions, the PC algorithm leads to an improved estimate of brain network structure compared with traditional connectivity analysis based on correlations. PMID:23501054
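
The first limitation of correlation described above can be illustrated with a minimal sketch, assuming a simulated three-region chain X -> Y -> Z with Gaussian noise (the chain, sample size and function names are hypothetical, not taken from the study). The PC algorithm starts from a fully connected graph and removes an edge whenever two variables are conditionally independent given some set of other variables; for Gaussian data with one conditioning variable, that test is commonly based on the first-order partial correlation computed below. This shows only the conditional-independence step, not a full PC implementation.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def partial_corr(x, z, y):
    """First-order partial correlation of x and z, conditioning on y."""
    rxz, rxy, ryz = pearson(x, z), pearson(x, y), pearson(y, z)
    return (rxz - rxy * ryz) / math.sqrt((1 - rxy ** 2) * (1 - ryz ** 2))


# Hypothetical chain X -> Y -> Z: X influences Z only through Y.
rng = random.Random(42)
n = 5000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [xi + rng.gauss(0, 1) for xi in x]
z = [yi + rng.gauss(0, 1) for yi in y]

print(round(pearson(x, z), 2))          # sizeable: the indirect link looks like a connection
print(round(partial_corr(x, z, y), 2))  # near zero: no direct X-Z edge is needed
```

Plain correlation reports a strong X-Z relationship even though no direct edge exists; conditioning on Y removes it. Orienting the remaining edges, the step the abstract reports as failing for single subjects, is a separate stage not sketched here.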

  8. Inferring the interplay between network structure and market effects in Bitcoin

    NASA Astrophysics Data System (ADS)

    Kondor, Dániel; Csabai, István; Szüle, János; Pósfai, Márton; Vattay, Gábor

    2014-12-01

    A main focus in economics research is understanding the time series of prices of goods and assets. While statistical models using only the properties of the time series itself have been successful in many aspects, we expect to gain a better understanding of the phenomena involved if we can model the underlying system of interacting agents. In this article, we consider the history of Bitcoin, a novel digital currency system, for which the complete list of transactions is available for analysis. Using this dataset, we reconstruct the transaction network between users and analyze changes in the structure of the subgraph induced by the most active users. Our approach is based on the unsupervised identification of important features of the time variation of the network. Applying the widely used method of Principal Component Analysis to the matrix constructed from snapshots of the network at different times, we are able to show how structural changes in the network accompany significant changes in the exchange price of bitcoins.
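
The general technique named above, principal component analysis on a matrix built from network snapshots, can be sketched as follows, assuming each snapshot is flattened into one row of edge weights over a fixed set of user pairs. The data layout, function name and hand-rolled power iteration are illustrative assumptions, not the authors' pipeline; in practice a linear-algebra library's SVD would replace the loop.

```python
import math
import random


def leading_component(snapshots, iters=200, seed=0):
    """First principal component of a snapshot matrix via power iteration.

    snapshots: T rows (one per network snapshot), each a list of E edge
    weights in a fixed column order (one column per user pair).
    Returns (loading, scores): a unit-length E-vector and one score per snapshot.
    """
    t, e = len(snapshots), len(snapshots[0])
    means = [sum(row[j] for row in snapshots) / t for j in range(e)]
    centered = [[row[j] - means[j] for j in range(e)] for row in snapshots]
    # Sample covariance of the edge columns (E x E).
    cov = [[sum(centered[k][i] * centered[k][j] for k in range(t)) / (t - 1)
            for j in range(e)] for i in range(e)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(e)]
    for _ in range(iters):
        # Repeated multiplication by cov converges to its leading eigenvector.
        w = [sum(cov[i][j] * v[j] for j in range(e)) for i in range(e)]
        norm = math.sqrt(sum(val * val for val in w))
        v = [val / norm for val in w]
    # Project each centered snapshot onto the component: one value per time step.
    scores = [sum(row[j] * v[j] for j in range(e)) for row in centered]
    return v, scores


# Hypothetical data: four edges whose weights all follow one shared signal.
pattern = [1.0, 2.0, 0.5, 3.0]
noise = random.Random(1)
snaps = [[math.sin(step / 5) * p + noise.gauss(0, 0.01) for p in pattern]
         for step in range(60)]
loading, series = leading_component(snaps)
```

The loading describes which edges change together, and the scores form a time series that could be set against an external series such as the exchange price. The sign of a principal component is arbitrary, so only its magnitude and shape are meaningful.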

  9. Genetic structure and diversity among maize inbred lines as inferred from DNA microsatellites.

    PubMed Central

    Liu, Kejun; Goodman, Major; Muse, Spencer; Smith, J Stephen; Buckler, Ed; Doebley, John

    2003-01-01

    Two hundred and sixty maize inbred lines, representative of the genetic diversity among essentially all public lines of importance to temperate breeding and many important tropical and subtropical lines, were assayed for polymorphism at 94 microsatellite loci. The 2039 alleles identified served as raw data for estimating genetic structure and diversity. A model-based clustering analysis placed the inbred lines in five clusters that correspond to major breeding groups plus a set of lines showing evidence of mixed origins. A "phylogenetic" tree was constructed to further assess the genetic structure of maize inbreds, showing good agreement with the pedigree information and the cluster analysis. Tropical and subtropical inbreds possess a greater number of alleles and greater gene diversity than their temperate counterparts. The temperate Stiff Stalk lines are on average the most divergent from all other inbred groups. Comparison of diversity in equivalent samples of inbreds and open-pollinated landraces revealed that maize inbreds capture <80% of the alleles in the landraces, suggesting that landraces can provide additional genetic diversity for maize breeding. The contributions of four different segments of the landrace gene pool to each inbred group's gene pool were estimated using a novel likelihood-based model. The estimates are largely consistent with known histories of the inbreds and indicate that tropical highland germplasm is poorly represented in maize inbreds. Core sets of inbreds that capture maximal allelic richness were defined. These or similar core sets can be used for a variety of genetic applications in maize. PMID:14704191

  10. Full-length RNA structure prediction of the HIV-1 genome reveals a conserved core domain.

    PubMed

    Sükösd, Zsuzsanna; Andersen, Ebbe S; Seemann, Stefan E; Jensen, Mads Krogh; Hansen, Mathias; Gorodkin, Jan; Kjems, Jørgen

    2015-12-01

    A distance-constrained secondary structure model of the ≈10 kb RNA genome of HIV-1 has been predicted, but higher-order structures involving long-distance interactions are currently unknown. We present the first global RNA secondary structure model for the HIV-1 genome, which integrates both comparative structure analysis and information from experimental data in a full-length prediction without distance constraints. Besides recovering known structural elements, we predict several novel structural elements that are conserved in HIV-1 evolution. Our results also indicate that the structure of the HIV-1 genome is highly variable in most regions, with a limited number of stable and conserved RNA secondary structures. Most interestingly, a set of long-distance interactions forms a core organizing structure (COS) that organizes the genome into three major structural domains. Despite overlapping protein-coding regions, the COS is supported by a particularly high frequency of compensatory base changes, suggesting functional importance for this element. This new structural element potentially organizes the whole genome into three major domains protruding from a conserved core structure, with potential roles in replication and evolution for the virus. PMID:26476446
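
What counts as a compensatory base change can be made concrete with a minimal sketch, assuming a structure model given as a list of paired alignment columns and two aligned sequences. The function and the simple count are hypothetical illustrations, not the statistic used in the study; a pair counts only when both positions differ between the sequences and both versions are still canonical (Watson-Crick or G-U) pairs, which is why such changes support a helix even within protein-coding regions.

```python
# Canonical RNA base pairs: Watson-Crick plus G-U wobble.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}


def compensatory_changes(ref, other, pairs):
    """Count base pairs that differ at both positions between two aligned
    sequences yet remain canonical in both.

    ref, other: aligned RNA sequences of equal length (gaps never pair).
    pairs: list of (i, j) 0-based columns paired in the structure model.
    """
    count = 0
    for i, j in pairs:
        both_changed = ref[i] != other[i] and ref[j] != other[j]
        still_paired = ((ref[i], ref[j]) in CANONICAL
                        and (other[i], other[j]) in CANONICAL)
        if both_changed and still_paired:
            count += 1
    return count


# Hypothetical 7-nt hairpin stem: G-C and C-G flip to C-G and G-C.
print(compensatory_changes("GCAAAGC", "CGAAACG", [(0, 6), (1, 5)]))
```

The count does not depend on how far apart the paired columns are, so the same check applies to long-range pairs like those forming the COS.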

  11. Unsupervised pattern discovery in human chromatin structure through genomic segmentation

    PubMed Central

    Hoffman, Michael M.; Buske, Orion J.; Wang, Jie; Weng, Zhiping; Bilmes, Jeff A.; Noble, William Stafford

    2012-01-01

    We applied a dynamic Bayesian network method that identifies joint patterns from multiple functional genomics experiments to ChIP-seq histone modification and transcription factor data, and DNaseI-seq and FAIRE-seq open chromatin readouts from the human cell line K562. In an unsupervised fashion, we identified patterns associated with transcription start sites, gene ends, enhancers, CTCF elements, and repressed regions. Software and genome browser tracks are at http://noble.gs.washington.edu/proj/segway/. PMID:22426492

  12. Perceptual inference.

    PubMed

    Aggelopoulos, Nikolaos C

    2015-08-01

    Perceptual inference refers to the ability to infer sensory stimuli from predictions that result from internal neural representations built through prior experience. Methods of Bayesian statistical inference and decision theory model cognition adequately by using error sensing either in guiding action or in "generative" models that predict the sensory information. In this framework, perception can be seen as a process qualitatively distinct from sensation: a process of information evaluation, guided by sensory feedback, that uses previously acquired and stored representations (memories). The stored representations can be utilised as internal models of sensory stimuli, enabling long-term associations, for example in operant conditioning. Evidence for perceptual inference comes from phenomena such as the cortical co-localisation of object perception with object memory, the invariance of some neurons' responses to variations in the stimulus, and situations in which perception can be dissociated from sensation. In the context of perceptual inference, sensory areas of the cerebral cortex that have been facilitated by a priming signal may be regarded as comparators in a closed feedback loop, similar to the better known motor reflexes in the sensorimotor system. The adult cerebral cortex can be regarded as similar to a servomechanism, in using sensory feedback to correct internal models, producing predictions of the outside world on the basis of past experience. PMID:25976632

  13. Micro and nanofluidic structures for cell sorting and genomic analysis

    NASA Astrophysics Data System (ADS)

    Morton, Keith J.

    Microfluidic systems promise rapid analysis of small samples in a compact and inexpensive format. But direct scaling of lab bench protocols on-chip is challenging because laminar flows in typical microfluidic devices are characterized by non-mixing streamlines. Common microfluidic mixers and sorters work by diffusion, limiting their usefulness for objects that diffuse slowly, such as cells and DNA. Recently, Huang et al. developed a passive microfluidic element to continuously separate bio-particles deterministically. In Deterministic Lateral Displacement (DLD), objects are sorted by size as they transit an asymmetric array of microfabricated posts. This thesis further develops DLD arrays with applications in three broad new areas. First, the arrays are used not simply to sort particles but to move streams of cells through functional flows for chemical treatment, such as on-chip immunofluorescent labeling of blood cells with washing, and on-chip E. coli cell lysis with simultaneous chromosome extraction. Second, modular tiling of the basic DLD element is used to construct complex particle handling modes that include beam steering for jets of cells and beads. Third, nanostructured DLD arrays are built using Nanoimprint Lithography (NIL), and continuous-flow separation of 100 nm and 200 nm particles is demonstrated. Finally, a number of ancillary nanofabrication techniques were developed in support of these overall goals, including methods to interface nanofluidic structures with standard microfluidic components such as inlet channels and reservoirs, precision etching of ultra-high aspect ratio (>50:1) silicon nanostructures, and fabrication of narrow (~35 nm) channels used to stretch genomic-length DNA.

  14. Three-dimensional structure of the heliosphere as inferred from observations with a Japanese Halley spacecraft

    NASA Technical Reports Server (NTRS)

    Saito, T.; Yumoto, K.; Hirao, K.; Aoyama, I.; Smith, E. J.

    1986-01-01

    A sinusoidal neutral line with a pair of giant regions appeared on the sun about one year before the launch of 'Sakigake', the first of two Japanese Comet Halley spacecraft. The Sakigake magnetometer data during the early part of the mission (February-March 1985) are well interpreted by an eastward shift of the tilting neutral sheet. The shift is further explained by the effect of a new giant region appearing at about 10 deg heliolatitude and about 50 deg Carrington longitude in August 1984. The toward polarity ratio of the IMF observed by Sakigake changed from about 22 percent in February to about 62 percent in early June and then increased rapidly to 98 percent. This change is interpreted as a decrease of the tilt angle of the sheet to only about 4 deg. This is the first spacecraft observation of 'the disappearing sector structure' with such a small tilt angle.

  15. Causal Inference in Occupational Epidemiology: Accounting for the Healthy Worker Effect by Using Structural Nested Models

    PubMed Central

    Naimi, Ashley I.; Richardson, David B.; Cole, Stephen R.

    2013-01-01

    In a recent issue of the Journal, Kirkeleit et al. (Am J Epidemiol. 2013;177(11):1218–1224) provided empirical evidence for the potential of the healthy worker effect in a large cohort of Norwegian workers across a range of occupations. In this commentary, we provide some historical context, define the healthy worker effect by using causal diagrams, and use simulated data to illustrate how structural nested models can be used to estimate exposure effects while accounting for the healthy worker survivor effect in 4 simple steps. We provide technical details and annotated SAS software (SAS Institute, Inc., Cary, North Carolina) code corresponding to the example analysis in the Web Appendices, available at http://aje.oxfordjournals.org/. PMID:24077092

  16. Geologic structure of the northern New Caledonia ridge, as inferred from magnetic and gravity anomalies

    USGS Publications Warehouse

    Collot, J.-Y.; Rigolot, P.; Missegue, F.

    1988-01-01

    Bathymetric, gravity, and magnetic data collected in the southwest Pacific Ocean over the northern New Caledonia ridge show that the main geological units known from the island of New Caledonia extend northward from this island, beneath the Grand Lagon Nord, the Grand Passage, and the d'Entrecasteaux reefs. These data support the model of tectonic evolution of the New Caledonia region proposed by Kroenke (1984). Differences in structure, geophysical signatures and morphology evident between areas north and those south of the Grand Passage, together with the nearness of the Le Noroit massif west of the Grand Passage, suggest that contemporaneously with Eocene to early Oligocene subduction along the western New Caledonia margin, an arc-ridge collision may have occurred near the northern termination of this subduction zone. -from Authors

  17. Inferring demographic structure with moccasin size data from the Promontory Caves, Utah.

    PubMed

    Billinger, Michael; Ives, John W

    2015-01-01

    The moccasin assemblage Julian Steward recovered from the Promontory caves in 1930-31 provides a novel example in which material culture can be used to understand the structure of a thirteenth-century AD population. Several studies shed light on the relationship between shoe size, foot size, and stature. We develop an anthropometric model for understanding the composition of the Promontory Cave population by using moccasin size as a proxy for foot size. We then predict the stature of the individual who would have worn each moccasin. Stature is closely related to age for children, subadults and adult males. Although there are predictable sex and age factors biasing moccasin discard practices, moccasin dimensions suggest that a relatively large proportion of children and subadults occupied the Promontory caves. This bison and antelope hunting population appears to have thrived during its stay on Promontory Point. PMID:25297837

  18. The Structure of Olivine Grain Boundaries Inferred from Transient and Steady State Deformation Experiments

    NASA Astrophysics Data System (ADS)

    Faul, U.; Jackson, I.

    2015-12-01

    A consensus has not been reached regarding the structure of general, high angle grain boundaries in olivine. Published high resolution transmission electron microscope images show either abutting lattice planes of the grains on either side of the boundary, or a distinct grain boundary region, about 1 nm wide, that is more disordered than the grain interiors. However, there is agreement that the grain boundary region is enriched in olivine trace elements such as Ti, Ca and Al. These analytical methods cannot resolve the thickness of this region. The properties of grain boundaries can be interrogated by experimentation, but the interpretation of the experimental results is tied to microphysical models. Models for diffusion creep predict a square grain size dependence for diffusion through grain interiors, and a cubic grain size dependence for diffusion along grain boundaries. Experimental observations for polycrystalline, Fe-bearing olivine are best fit with a cubic grain size dependence, indicating diffusion along grain boundaries. Similarly, models for small strain, transient creep predict that time-dependent, recoverable deformation involves diffusion along grain boundaries. For this process the models predict a linear grain size dependence. Forced torsional oscillation experiments can be employed to investigate the transient creep behaviour over a range of frequencies, temperatures and grain sizes. The observed grain size dependence for the same materials used for conventional, large strain deformation experiments is near linear, indicating diffusionally assisted grain boundary sliding. Both transient and steady state deformation therefore implicate diffusion along grain boundaries as the rate-controlling mechanism. Diffusion and viscous sliding along grain boundaries imply that they are a separate phase with a less ordered structure, consistent with their interpretation as a (narrow) region that is distinct from grain interiors. This region likely also accommodates

  19. Genetic structure of the Korean black scraper Thamnaconus modestus inferred from microsatellite marker analysis.

    PubMed

    An, Hye Suck; Lee, Jang Wook; Park, Jung Yeon; Jung, Hyung Taek

    2013-05-01

    The Korean black scraper, Thamnaconus modestus, is one of the most economically important maricultural fish species in Korea. However, the annual catch of this fish has been continuously declining over the past several decades. In this study, the genetic diversity and relationships among four wild populations and two hatchery stocks of Korean black scraper were assessed based on 16 microsatellite (MS) markers. A total of 319 different alleles were detected over all loci with an average of 19.94 alleles per locus. The hatchery stocks [mean number of alleles (N(A)) = 12, allelic richness (A(R)) = 12, expected heterozygosity (He) = 0.834] showed a slight reduction (P > 0.05) in genetic variability in comparison with wild populations (mean N(A) = 13.86, A(R) = 12.35, He = 0.844), suggesting a sufficient level of genetic variation in the hatchery populations. Similarly low levels of inbreeding and significant Hardy-Weinberg equilibrium deviations were detected in both wild and hatchery populations. The genetic subdivision among all six populations was low but significant (overall F(ST) = 0.008, P < 0.01). Pairwise F(ST), a phylogenetic tree, and multidimensional scaling analysis suggested the existence of three geographically structured populations based on different sea basin origins, although the isolation-by-distance model was rejected. This result was corroborated by an analysis of molecular variance. This genetic differentiation may result from the co-effects of various factors, such as historical dispersal, local environment and ocean currents. These three geographical groups can be considered as independent management units. Our results show that MS markers may be suitable not only for the genetic monitoring of hatchery stocks but also for revealing the population structure of Korean black scraper populations. These results will provide critical information for breeding programs, the management of cultured stocks and the conservation of this species. PMID
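
The diversity and differentiation statistics reported above can be illustrated with a minimal sketch, assuming allele frequencies have already been estimated for each population at one locus and populations are weighted equally. Expected heterozygosity is computed as Nei's gene diversity He = 1 - Σp² without a small-sample correction, and differentiation as Nei's G_ST = (H_T - H_S) / H_T, a simpler relative of F_ST. The abstract does not state which F_ST estimator the study used, so this is not a reproduction of its analysis, and the function names are hypothetical.

```python
def expected_heterozygosity(freqs):
    """Nei's gene diversity at one locus: He = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def nei_gst(populations):
    """Simplified Nei G_ST = (H_T - H_S) / H_T for one locus.

    populations: list of allele-frequency lists, all in the same allele
    order; every population gets equal weight.
    """
    k = len(populations)
    # H_S: mean within-population expected heterozygosity.
    h_s = sum(expected_heterozygosity(pop) for pop in populations) / k
    # H_T: expected heterozygosity of the pooled allele frequencies.
    n_alleles = len(populations[0])
    pooled = [sum(pop[a] for pop in populations) / k for a in range(n_alleles)]
    h_t = expected_heterozygosity(pooled)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# Hypothetical locus with three alleles in two similar populations.
print(nei_gst([[0.5, 0.3, 0.2], [0.4, 0.4, 0.2]]))
```

Values near 0 mean that almost all diversity lies within populations, which is the situation the abstract describes (overall F_ST = 0.008); a value of 1 would mean the populations share no alleles.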

  20. A sequence-based survey of the complex structural organization of tumor genomes

    SciTech Connect

    Collins, Colin; Raphael, Benjamin J.; Volik, Stanislav; Yu, Peng; Wu, Chunxiao; Huang, Guiqing; Linardopoulou, Elena V.; Trask, Barbara J.; Waldman, Frederic; Costello, Joseph; Pienta, Kenneth J.; Mills, Gordon B.; Bajsarowicz, Krystyna; Kobayashi, Yasuko; Sridharan, Shivaranjani; Paris, Pamela; Tao, Quanzhou; Aerni, Sarah J.; Brown, Raymond P.; Bashir, Ali; Gray, Joe W.; Cheng, Jan-Fang; de Jong, Pieter; Nefedov, Mikhail; Ried, Thomas; Padilla-Nash, Hesed M.; Collins, Colin C.

    2008-04-03

    The genomes of many epithelial tumors exhibit extensive chromosomal rearrangements. All classes of genome rearrangements can be identified using End Sequencing Profiling (ESP), which relies on paired-end sequencing of cloned tumor genomes. In this study, brain, breast, ovary and prostate tumors along with three breast cancer cell lines were surveyed with ESP yielding the largest available collection of sequence-ready tumor genome breakpoints and providing evidence that some rearrangements may be recurrent. Sequencing and fluorescence in situ hybridization (FISH) confirmed translocations and complex tumor genome structures that include coamplification and packaging of disparate genomic loci with associated molecular heterogeneity. Comparison of the tumor genomes suggests recurrent rearrangements. Some are likely to be novel structural polymorphisms, whereas others may be bona fide somatic rearrangements. A recurrent fusion transcript in breast tumors and a constitutional fusion transcript resulting from a segmental duplication were identified. Analysis of end sequences for single nucleotide polymorphisms (SNPs) revealed candidate somatic mutations and an elevated rate of novel SNPs in an ovarian tumor. These results suggest that the genomes of many epithelial tumors may be far more dynamic and complex than previously appreciated and that genomic fusions including fusion transcripts and proteins may be common, possibly yielding tumor-specific biomarkers and therapeutic targets.

  1. Computational inference of the structure and regulation of the lignin pathway in Panicum virgatum

    SciTech Connect

    Faraji, Mojdeh; Fonseca, Luis L.; Escamilla-Treviño, Luis; Dixon, Richard A.; Voit, Eberhard O.

    2015-09-17

    Switchgrass is a prime target for biofuel production from inedible plant parts and has been the subject of numerous investigations in recent years. Yet one of the main obstacles to effective biofuel production remains the problem of recalcitrance. Recalcitrance emerges in part from the 3-D structure of lignin as a polymer in the secondary cell wall. Lignin limits accessibility of the sugars in the cellulose and hemicellulose polymers to enzymes and ultimately decreases ethanol yield. Monolignols, the building blocks of lignin polymers, are synthesized in the cytosol and translocated to the plant cell wall, where they undergo polymerization. The biosynthetic pathway leading to monolignols in switchgrass is not completely known, and difficulties associated with in vivo measurements of these intermediates pose a challenge for a true understanding of the functioning of the pathway. In this study, a systems biological modeling approach is used to address this challenge and to elucidate the structure and regulation of the lignin pathway through a computational characterization of alternate candidate topologies. The analysis is based on experimental data characterizing stem and tiller tissue of four transgenic lines (knock-downs of genes coding for key enzymes in the pathway) as well as wild-type switchgrass plants. These data consist of the observed content and composition of monolignols. The possibility of a G-lignin specific metabolic channel associated with the production and degradation of coniferaldehyde is examined, and the results support previous findings from another plant species. The computational analysis suggests regulatory mechanisms of product inhibition and enzyme competition, which are well known in biochemistry but had not so far been reported in switchgrass. By including these mechanisms, the pathway model is able to represent all observations. In conclusion, the results show that the presence of the coniferaldehyde channel is necessary

  2. Computational inference of the structure and regulation of the lignin pathway in Panicum virgatum

    DOE PAGESBeta

    Faraji, Mojdeh; Fonseca, Luis L.; Escamilla-Treviño, Luis; Dixon, Richard A.; Voit, Eberhard O.

    2015-09-17

    Switchgrass is a prime target for biofuel production from inedible plant parts and has been the subject of numerous investigations in recent years. Yet one of the main obstacles to effective biofuel production remains the problem of recalcitrance. Recalcitrance emerges in part from the 3-D structure of lignin as a polymer in the secondary cell wall. Lignin limits accessibility of the sugars in the cellulose and hemicellulose polymers to enzymes and ultimately decreases ethanol yield. Monolignols, the building blocks of lignin polymers, are synthesized in the cytosol and translocated to the plant cell wall, where they undergo polymerization. The biosynthetic pathway leading to monolignols in switchgrass is not completely known, and difficulties associated with in vivo measurements of these intermediates pose a challenge for a true understanding of the functioning of the pathway. In this study, a systems biological modeling approach is used to address this challenge and to elucidate the structure and regulation of the lignin pathway through a computational characterization of alternate candidate topologies. The analysis is based on experimental data characterizing stem and tiller tissue of four transgenic lines (knock-downs of genes coding for key enzymes in the pathway) as well as wild-type switchgrass plants. These data consist of the observed content and composition of monolignols. The possibility of a G-lignin specific metabolic channel associated with the production and degradation of coniferaldehyde is examined, and the results support previous findings from another plant species. The computational analysis suggests regulatory mechanisms of product inhibition and enzyme competition, which are well known in biochemistry but had not so far been reported in switchgrass. By including these mechanisms, the pathway model is able to represent all observations. In conclusion, the results show that the presence of the coniferaldehyde channel is

  3. The First Complete Chloroplast Genome Sequences in Actinidiaceae: Genome Structure and Comparative Analysis

    PubMed Central

    Yao, Xiaohong; Tang, Ping; Li, Zuozhou; Li, Dawei; Liu, Yifei; Huang, Hongwen

    2015-01-01

    Actinidia chinensis is an important economic plant belonging to the basal lineage of the asterids. Availability of a complete Actinidia chloroplast genome sequence is crucial to understanding phylogenetic relationships among major lineages of angiosperms and facilitates kiwifruit genetic improvement. We report here the complete nucleotide sequences of the chloroplast genomes for Actinidia chinensis and A. chinensis var. deliciosa obtained through de novo assembly of Illumina paired-end reads produced by total DNA sequencing. The total genome size ranges from 155,446 to 157,557 bp, with an inverted repeat (IR) of 24,013 to 24,391 bp, a large single copy region (LSC) of 87,984 to 88,337 bp and a small single copy region (SSC) of 20,332 to 20,336 bp. The genome encodes 113 different genes, including 79 unique protein-coding genes, 30 tRNA genes and 4 ribosomal RNA genes, with 16 duplicated in the inverted repeats, and a tRNA gene (trnfM-CAU) duplicated once in the LSC region. Comparisons of IR boundaries among four asterid species showed that IR/LSC borders were extended into the 5’ portion of the psbA gene and that IR contraction occurred in Actinidia. The clpP gene has been lost from the chloroplast genome in Actinidia and may have been transferred to the nucleus during chloroplast evolution. Twenty-seven polymorphic simple sequence repeat (SSR) loci were identified in the Actinidia chloroplast genome. Maximum parsimony analyses of a 72-gene, 16-taxon angiosperm dataset strongly support the placement of Actinidiaceae in Ericales within the basal asterids. PMID:26046631

  4. Genetic Structure and Inferences on Potential Source Areas for Bactrocera dorsalis (Hendel) Based on Mitochondrial and Microsatellite Markers

    PubMed Central

    Shi, Wei; Kerdelhué, Carole; Ye, Hui

    2012-01-01

    Bactrocera dorsalis (Diptera: Tephritidae) is mainly distributed in tropical and subtropical Asia and in the Pacific region. Despite its economic importance, very few studies have addressed the broad-scale genetic structure and potential source area of this species. This pilot study attempts to infer the native region of this pest and its colonization pathways in Asia. Combining mitochondrial and microsatellite markers, we evaluated the level of genetic diversity, genetic structure, and the gene flow among fly populations collected across Southeast Asia and China. A complex and significant genetic structure corresponding to the geographic pattern was found with both types of molecular markers. However, the genetic structure found was rather weak in both cases, and no pattern of isolation by distance was identified. Multiple long-distance dispersal events and miscellaneous host selection by this species may explain the results. These complex patterns may have been influenced by human-mediated transportation of the pest from one area to another and by the complex topography of the study region. For both mitochondrial and microsatellite data, no signs of bottleneck or founder events could be identified. Nonetheless, maximal genetic diversity was observed in Myanmar, Vietnam and Guangdong (China), and asymmetric migration patterns were found. These results provide indirect evidence that the tropical regions of Southeast Asia and the southern coast of China may be considered the native range of the species and that the population expansion was northward. Yunnan (China) is a contact zone that has been colonized from different sources. Regions along the southern coast of Vietnam and China probably served to colonize mainly the southern region of China. Southern coastal regions of China may also have colonized central parts of China and of central Yunnan. PMID:22615898

  5. Inferring social network structure in ecological systems from spatio-temporal data streams

    PubMed Central

    Psorakis, Ioannis; Roberts, Stephen J.; Rezek, Iead; Sheldon, Ben C.

    2012-01-01

    We propose a methodology for extracting social network structure from spatio-temporal datasets that describe timestamped occurrences of individuals. Our approach identifies temporal regions of dense agent activity and links are drawn between individuals based on their co-occurrences across these ‘gathering events’. The statistical significance of these connections is then tested against an appropriate null model. Such a framework allows us to exploit the wealth of analytical and computational tools of network analysis in settings where the underlying connectivity pattern between interacting agents (commonly termed the adjacency matrix) is not given a priori. We perform experiments on two large-scale datasets (greater than 10⁶ points) of great tit Parus major wild bird foraging records and illustrate the use of this approach by examining the temporal dynamics of pairing behaviour, a process that was previously very hard to observe. We show that established pair bonds are maintained continuously, whereas new pair bonds form at variable times before breeding, but are characterized by a rapid development of network proximity. The method proposed here is general, and can be applied to any system with information about the temporal co-occurrence of interacting agents. PMID:22696481
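
The linking step described above, drawing edges between individuals according to their co-occurrence in gathering events, can be sketched as follows. This is a minimal illustration under a loud assumption: the study's own event-detection method is not reproduced, and a hypothetical fixed time-gap rule (`max_gap`) is swapped in to split the timestamped stream into events. The significance test against a null model is also not shown.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(observations, max_gap):
    """Link individuals seen in the same gathering event.

    observations: iterable of (timestamp, individual_id) tuples.
    max_gap: hypothetical rule; a new event starts whenever consecutive
             timestamps differ by more than this value.
    Returns a Counter mapping sorted (id_a, id_b) pairs to the number of
    events in which both individuals appeared.
    """
    events, current, last_t = [], set(), None
    for t, who in sorted(observations):
        if last_t is not None and t - last_t > max_gap:
            events.append(current)
            current = set()
        current.add(who)
        last_t = t
    if current:
        events.append(current)

    edges = Counter()
    for members in events:
        for a, b in combinations(sorted(members), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical feeder records: (seconds, bird id).
records = [(0, "a"), (1, "b"), (2, "a"), (100, "b"), (101, "c")]
print(cooccurrence_network(records, max_gap=10))
```

The resulting weighted edge list is an adjacency structure that standard network-analysis tools can consume, which is the point the abstract makes about data where the adjacency matrix is not given a priori.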

  6. Emplacement history of Granophyre dikes in the Vredefort Impact Structure, South Africa, inferred from geochemical evidence

    NASA Astrophysics Data System (ADS)

    Lieger, Daniel; Riller, Ulrich

    2012-05-01

    The central Vredefort Impact Structure is characterised by impact melt rocks, known as the Vredefort Granophyre dikes, the mode of emplacement of which is not well known. Whole-rock and petrographic analyses of two dikes were conducted and compared to published geochemical data to elucidate the mode and timing of dike formation. The dikes are characterised by compositional and textural heterogeneity between, and within, individual dikes. Specifically, central dike portions are felsic and rich in wall rock fragments, whereas marginal dike phases are more mafic and fragment-poor. Collectively, this suggests that melt was derived from compositionally different parental melts and emplaced in at least two pulses. In addition, the chemical heterogeneity between fragment-rich and fragment-poor dike zones can be explained by variable assimilation of a mafic component, notably Ventersdorp basalt, at the base of the impact melt sheet, from which melt of the Granophyre dikes is derived. This scenario accounts for the mafic and fragment-poor character of melt emplaced first in the dikes and the more felsic and fragment-rich nature of melts of the following emplacement pulse, i.e., when the impact melt was less hot and thus less capable of digesting large quantities of (mafic) wall rock fragments. Differences in geometrical, textural, chemical and fragment characteristics between the Granophyre dikes and pseudotachylite bodies can be explained by the same process, i.e., impact melt drainage, but operating at different times after impact.

  7. Phylogeographic Structure in Anastrepha ludens (Diptera: Tephritidae) Populations Inferred With mtDNA Sequencing.

    PubMed

    Ruiz-Arce, Raul; Owen, Christopher L; Thomas, Donald B; Barr, Norman B; McPheron, Bruce A

    2015-06-01

    Anastrepha ludens (Loew) (Diptera: Tephritidae), the Mexican fruit fly, is a major pest of citrus and mango. It is widely distributed in Mexico and Central America, with infestations occurring in Texas, California, and Florida, and its origin is believed to have been centered in northeastern Mexico. This research evaluates the utility of a sequence-based approach for two mitochondrial (COI and ND6) gene regions. We use these markers to examine genetic diversity, estimate population structure, and identify diagnostic information for A. ludens populations. We analyzed 543 individuals from 67 geographic collections and found one predominant haplotype occurring in the majority of specimens. We observed 68 haplotypes in all and found differences between haplotypes belonging to northern and southern collections. Mexican haplotypes differ by only a few bases, possibly as a result of a recent bottleneck event. In contrast to the hypothesis suggesting northeastern Mexico as the origin of this species, specimens from two southern collections show high genetic variability delineating three mitochondrial groups. These data suggest that Central America is the origin of A. ludens. We show that COI and ND6 are useful for phylogeographic studies of A. ludens. PMID:26470261

  8. Population genetic structure of sexual and parthenogenetic damselflies inferred from mitochondrial and nuclear markers.

    PubMed

    Lorenzo-Carballa, M O; Hadrys, H; Cordero-Rivera, A; Andrés, J A

    2012-04-01

    It has been postulated that obligate asexual lineages may persist in the long term if they escape negative interactions with either sexual lineages or biological enemies; thus, parthenogenetic populations will be more likely to occur in places that are difficult for sexual lineages to colonize, or in which biological interactions are rare, such as islands or island-like habitats. Ischnura hastata is the only known example of natural parthenogenesis within the insect order Odonata, and it also represents a typical example of geographic parthenogenesis, as sexual populations are widely distributed in North America, whereas parthenogenetic populations of this species have only been found in the Azores archipelago. To gain insight into the origin and distribution of parthenogenetic I. hastata lineages, we used microsatellites and mitochondrial and nuclear DNA sequence data to examine the population genetic structure of this species over a wide geographic area. Our results suggest that sexual populations of I. hastata in North America conform to a large subdivided population that has gone through a recent spatial expansion. A recent single long-distance dispersal event, followed by a demographic expansion, is the most parsimonious hypothesis explaining the origin of the parthenogenetic population of this species in the Azores islands. PMID:21915148

  9. Methodological assessment of 2b-RAD genotyping technique for population structure inferences in yellowfin tuna (Thunnus albacares).

    PubMed

    Pecoraro, Carlo; Babbucci, Massimiliano; Villamor, Adriana; Franch, Rafaella; Papetti, Chiara; Leroy, Bruno; Ortega-Garcia, Sofia; Muir, Jeff; Rooker, Jay; Arocha, Freddy; Murua, Hilario; Zudaire, Iker; Chassot, Emmanuel; Bodin, Nathalie; Tinti, Fausto; Bargelloni, Luca; Cariani, Alessia

    2016-02-01

    Global population genetic structure of yellowfin tuna (Thunnus albacares; YFT) is still poorly understood despite its relevance for the tuna fishery industry. Low levels of genetic differentiation among oceans speak in favour of a single panmictic population worldwide of this highly migratory fish. However, recent studies indicated genetic structuring at much smaller geographic scales than previously considered, suggesting that YFT population genetic structure has not yet been properly assessed. In this study, we demonstrated for the first time the utility of the 2b-RAD genotyping technique for investigating population genetic diversity and differentiation in high gene-flow species. Using the de novo pipeline in Stacks, a total of 6772 high-quality genome-wide SNPs were identified across Atlantic, Indian and Pacific population samples representing all major distribution areas. Preliminary analyses showed shallow but significant population structure among oceans (FST = 0.0273; P < 0.01). Discriminant Analysis of Principal Components supported the presence of genetically discrete yellowfin tuna populations among the three oceanic pools. Although this evidence needs to be corroborated with larger sample sizes, these results show the efficiency of this genotyping technique in assessing genetic divergence in a marine fish with high dispersal potential. PMID:26711352

  10. Structures of Metal-Substituted Human Histone Deacetylase 8 Provide Mechanistic Inferences on Biological Function

    SciTech Connect

    Dowling, Daniel P.; Gattis, Samuel G.; Fierke, Carol A.; Christianson, David W.

    2010-08-23

    The metal-dependent histone deacetylases (HDACs) adopt an α/β protein fold first identified in rat liver arginase. Despite insignificant overall amino acid sequence identity, these enzymes share a strictly conserved metal binding site with divergent metal specificity and stoichiometry. HDAC8, originally thought to be a Zn²⁺-metallohydrolase, exhibits increased activity with Co²⁺ and Fe²⁺ cofactors based on kcat/KM (Gantt, S. L., Gattis, S. G., and Fierke, C. A. (2006) Biochemistry 45, 6170-6178). Here, we report the first X-ray crystal structures of metal-substituted HDAC8: Co²⁺-HDAC8, D101L Co²⁺-HDAC8, D101L Mn²⁺-HDAC8, and D101L Fe²⁺-HDAC8, each complexed with the inhibitor M344. The metal content of protein samples in solution is confirmed by inductively coupled plasma mass spectrometry. For the crystalline enzymes, peaks in Bijvoet difference Fourier maps calculated from X-ray diffraction data collected near the respective elemental absorption edges confirm metal substitution. Additional solution studies confirm incorporation of Cu²⁺; Fe³⁺ and Ni²⁺ do not bind under the conditions tested. The metal dependence of the substrate KM values and the Ki values of hydroxamate inhibitors that chelate the active-site metal is consistent with substrate-metal coordination in the precatalytic Michaelis complex that enhances catalysis. Additionally, although HDAC8 binds Zn²⁺ nearly 10⁶-fold more tightly than Fe²⁺, the affinities for both metal ions are comparable to the readily exchangeable metal concentrations estimated in living cells, suggesting that HDAC8 could bind Fe²⁺, Zn²⁺, or both in vivo.

  11. Deep structure of the Argentine margin inferred from 3D gravity and temperature modelling, Colorado Basin

    NASA Astrophysics Data System (ADS)

    Autin, J.; Scheck-Wenderoth, M.; Götze, H.-J.; Reichert, C.; Marchal, D.

    2016-04-01

    Following previous work on the Colorado Basin using a 3D crustal structural model, we now investigate the presence of lower crustal bodies at the base of the crust using 3D lithospheric gravity modelling and calculations of the conductive thermal field. Our first study highlighted two fault directions and depocentres associated with thinned crust (NW-SE in the West and NE-SW at the distal margin). Fault relative chronology argues for two periods of extension: (1) NW-SE faulting and thinning in the western Colorado Basin and (2) NE-SW faulting and thinning related to the continental breakup and formation of the NE-SW-striking volcanic margins of the Atlantic Ocean. In this study, the geometry of modelled high-density Lower Crustal Bodies (LCBs) enables the reproduction of the gravimetric field as well as of the temperature measured in wells down to 4500 m. The modelled LCBs correlate with geological observations: (1) NW-SE LCBs below the deepest depocentres in the West, (2) NE-SW LCBs below the distal margin faults and the seaward dipping reflectors. Thus the proposed poly-phased evolution of the margin could also correspond to two emplacement phases of the LCBs. The calculated conductive thermal field fits the measured temperatures best if the thermal properties (thermal conductivity and radiogenic heat production) assigned to the LCBs correspond to either high-grade metamorphic rocks or mafic magmatic intrusions. To explain the possible lithology of the LCBs, we propose that the two successive phases of extension were accompanied by magma supply, emplaced (1) in the thinnest crust below the older NW-SE depocentres, then (2) along the NE-SW continentward boundary of the distal margin and below the volcanic seaward dipping reflectors. The South African conjugate margin records only the second NE-SW event, and we discuss hypotheses which could explain these differences between the conjugate margins.

  12. Crustal and uppermost mantle structures of Atlas Mountains of Morocco inferred from electromagnetic imaging

    NASA Astrophysics Data System (ADS)

    Kiyan, D.; Jones, A. G.; Fullea, J.; Ledo, J.; Siniscalchi, A.; Romano, G.

    2012-12-01

    The second phase of the PICASSO (Program to Investigate Convective Alboran Sea System Overturn) project, together with the concomitant TopoMed (Plate re-organization in the western Mediterranean: Lithospheric causes and topographic consequences - an ESF EUROCORES TOPO-EUROPE Collaborative Research Project), is designed to determine the internal structure of the crust and lithosphere of the Atlas Mountains of Morocco. A multi-institutional magnetotelluric (MT) experiment across the Atlas Mountains region comprises the acquisition of broadband and long-period MT data along two profiles: a N-S oriented profile through the Middle Atlas to the east and a NE-SW profile through Marrakech to the west. The preliminary results of interpretation of the MT data collected over the first profile were presented by Ledo et al. (2011). In this study, we present the results from 3D MT inversion using the codes WSINV3DMT (Siripunvaraporn et al., 2005) and Modular system for Electromagnetic Inversion (ModEM; Egbert and Kelbert, 2012). There is generally good agreement between the main features obtained from the 2D models and the new results of the 3D modelling. Models inverting only the off-diagonal tensor components showed a distinct conductivity contrast between the Middle-High Atlas and the Anti Atlas that correlates with the South Atlas Front fault, the depth extent of which appears to be limited to the uppermost mantle (approximately 55 km). The resistivity of the lithosphere increases gradually towards the Anti Atlas. In addition, a prominent conductive anomaly at the lower crust/uppermost mantle is imaged west of the profile at the junction between the High and Middle Atlas (Moulouya plain). The conductive body, which extends from the southern boundary of the Middle Atlas to the northern boundary of the High Atlas, is interpreted as resulting from the presence of partial melt and/or migrated fluids.

  13. Structure of the magnetopause inferred from one-dimensional hybrid simulations

    SciTech Connect

    Omidi, N.; Winske, D.

    1995-07-01

    The structure of the magnetopause is investigated by studying the interaction between two plasmas with solar wind and magnetospheric properties. Both Hall-MHD and hybrid (particle ions and fluid electrons) simulations are performed to compare and contrast the nature of the solutions in the fluid and the kinetic limits. It is shown that, in accordance with previous studies, the fluid solutions consist of multiple discontinuities and waves such as slow shocks and waves, as well as contact and rotational discontinuities. In contrast, the kinetic solutions consist of fewer discontinuities and include non-MHD boundaries. The differences between the two types of solutions are attributed to the absence of contact surfaces in collisionless plasmas and the possible Landau damping of slow waves. The kinetic solutions are found to be in much better agreement with magnetopause observations, which have shown little evidence for the presence of slow shocks or contact discontinuities. The results of kinetic calculations suggest that the presence of a small but finite normal component of the magnetic field allows for the mixing of magnetosheath and magnetospheric plasmas and may, in part, be responsible for the formation of the boundary layer even during periods of northward interplanetary magnetic field. These results also show that the necessary changes in the fields and the plasma parameters across the magnetopause do not occur simultaneously (i.e., in the same spatial location). As such, the magnetopause cannot be viewed as a single boundary but instead is a region which includes both abrupt and gradual changes in the fields and plasma parameters.

  14. On how to avoid input and structural uncertainties corrupt the inference of hydrological parameters using a Bayesian framework

    NASA Astrophysics Data System (ADS)

    Hernández, Mario R.; Francés, Félix

    2015-04-01

    One phase of hydrological model implementation that contributes significantly to predictive uncertainty is calibration, in which the values of the unknown model parameters are tuned by optimizing an objective function. An unsuitable error model (e.g., Standard Least Squares, SLS) introduces noise into the parameter estimates. The main sources of this noise are input errors and structural deficiencies of the hydrological model. The resulting biased parameters cause the model divergence phenomenon, in which the error variance of the (spatially and temporally) forecast flows far exceeds the error variance in the fitting period, and cause part or all of the physical meaning of the modeled processes to be lost. In other words, the calibrated hydrological model works well, but not for the right reasons. Moreover, an unsuitable error model yields an unreliable assessment of predictive uncertainty. To prevent these undesirable effects, this research focuses on the Bayesian joint inference (BJI) of both the hydrological and error model parameters, considering a general additive (GA) error model that allows for correlation, non-stationarity (in variance and bias) and non-normality of model residuals. The hydrological model used is TETIS, a conceptual distributed model with a particular split structure of the effective model parameters. Bayesian inference was performed with a Markov Chain Monte Carlo (MCMC) algorithm called Dream-ZS. The MCMC algorithm quantifies the uncertainty of the hydrological and error model parameters by obtaining their joint posterior probability distribution, conditioned on the observed flows. The BJI methodology is a very powerful and reliable tool, but it must be used correctly; that is, if non-stationarity in error variance and bias is modeled, the Total Laws must be taken into account. The results of this research show that the

  15. Shear-wave Velocity Structure of Surabaya, Indonesia, Inferred from Microtremor Observation

    NASA Astrophysics Data System (ADS)

    Deng, X.; Megawati, K.; Yamanaka, H.

    2010-12-01

    Surabaya, the second-largest city of Indonesia, is located on the northern shore of eastern Java at the mouth of the Mas River and along the edge of the Madura Strait. Although large earthquakes are rare, the region is frequently exposed to moderate earthquakes. Moreover, Surabaya lies on alluvial deposits that can be critical for the amplification of seismic waves. Sedimentary rocks are exposed towards the southwestern part of the study area. Shear-wave velocity (Vs) is an important parameter for evaluating the dynamic behavior of subsurface soils. Small-array microtremor measurements were carried out at 36 sites in Surabaya to estimate shallow Vs profiles for site-effect analysis. The spatial autocorrelation coefficient (SPAC) method was used to determine the phase velocity over periods from 0.1 s to 0.5 s, and an iterative inversion technique was then applied to determine the Vs profile at each site. Large-array microtremor measurements were also carried out at 11 sites to estimate deep Vs profiles. Building on the shallow Vs profiles from SPAC, the frequency-wavenumber (FK) method was used to determine the phase velocity over periods from 0.5 s to 3 s and to resolve soil profiles down to 800 m depth. Vs30, the average shear-wave velocity in the top 30 m of the soil profile, is a representative value used to classify site conditions; it was calculated at each site according to the definition in the International Building Code (IBC 2006). Site classification maps were produced by Kriging interpolation. Site class D is assigned to the southwest corner, while site class E is assigned to the rest of Surabaya. The low Vs values are consistent with the loose soil deposits of the area. Additionally, a regression analysis between topographic elevation and Vs on a logarithmic scale shows a linear relationship. However, it is difficult to conclude the relationship between
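    Under the usual IBC/NEHRP definition, Vs30 is the time-averaged shear-wave velocity of the top 30 m: 30 m divided by the total vertical shear-wave travel time through the layers in that depth range, Vs30 = 30 / Σ(d_i / v_i). A minimal Python sketch follows, assuming a layered profile given as (thickness in m, Vs in m/s) pairs from the surface down. The function names are hypothetical, extending the deepest layer to 30 m for shallow profiles is an assumption, and the site-class boundaries (1500, 760, 360 and 180 m/s) are rounded metric conversions of the IBC ft/s limits, not values quoted by the authors; classes based on soft-clay criteria and class F are ignored.

```python
def vs30(layers):
    """Time-averaged shear-wave velocity (m/s) of the top 30 m.

    layers: non-empty list of (thickness_m, vs_m_per_s), surface first.
    Assumption: if the profile is shallower than 30 m, the deepest layer
    is extended down to 30 m.
    """
    remaining = 30.0
    travel_time = 0.0
    for thickness, vs in layers:
        if remaining <= 0:
            break
        h = min(thickness, remaining)  # only count depth within 30 m
        travel_time += h / vs
        remaining -= h
    if remaining > 0:
        travel_time += remaining / layers[-1][1]
    return 30.0 / travel_time


def site_class(v):
    """Approximate IBC site class (A-E) from Vs30 in m/s (rounded metric limits)."""
    if v > 1500:
        return "A"
    if v > 760:
        return "B"
    if v > 360:
        return "C"
    if v >= 180:
        return "D"
    return "E"
```

    For example, a hypothetical profile of 10 m at 150 m/s over 20 m at 250 m/s gives a Vs30 of about 204.5 m/s (class D). Averaging slownesses rather than velocities is the point of the definition: slow near-surface layers dominate the travel time, which matches their outsized role in site amplification.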

  16. Local chromatin structure of heterochromatin regulates repeated DNA stability, nucleolus structure, and genome integrity

    SciTech Connect

    Peng, Jamy C.

    2007-05-05

    Heterochromatin constitutes a significant portion of the genome in higher eukaryotes; approximately 30% in Drosophila and humans. Heterochromatin contains a high repeat DNA content and a low density of protein-encoding genes. In contrast, euchromatin is composed mostly of unique sequences and contains the majority of single-copy genes. Genetic and cytological studies demonstrated that heterochromatin exhibits regulatory roles in chromosome organization, centromere function and telomere protection. As an epigenetically regulated structure, heterochromatin formation is not defined by any DNA sequence consensus. Heterochromatin is characterized by its association with nucleosomes containing methylated lysine 9 of histone H3 (H3K9me), heterochromatin protein 1 (HP1) that binds H3K9me, and Su(var)3-9, which methylates H3K9 and binds HP1. Heterochromatin formation and functions are influenced by HP1, Su(var)3-9, and the RNA interference (RNAi) pathway. My thesis project investigates how heterochromatin formation and function impact nuclear architecture, repeated DNA organization, and genome stability in Drosophila melanogaster. H3K9me-based chromatin reduces extrachromosomal DNA formation, most likely by restricting the access of repair machinery to repeated DNAs. Reducing extrachromosomal ribosomal DNA stabilizes rDNA repeats and the nucleolus structure. H3K9me-based chromatin also inhibits DNA damage in heterochromatin. Cells with compromised heterochromatin structure, due to Su(var)3-9 or dcr-2 (a component of the RNAi pathway) mutations, display severe DNA damage in heterochromatin compared to wild type. In these mutant cells, accumulated DNA damage leads to chromosomal defects such as translocations, defective DNA repair response, and activation of the G2-M DNA repair and mitotic checkpoints that ensure cellular and animal viability. My thesis research suggests that DNA replication, repair, and recombination mechanisms in heterochromatin differ from those in

  17. Comparative Chloroplast Genome Analyses of Streptophyte Green Algae Uncover Major Structural Alterations in the Klebsormidiophyceae, Coleochaetophyceae and Zygnematophyceae.

    PubMed

    Lemieux, Claude; Otis, Christian; Turmel, Monique

    2016-01-01

    The Streptophyta comprises all land plants and six main lineages of freshwater green algae: Mesostigmatophyceae, Chlorokybophyceae, Klebsormidiophyceae, Charophyceae, Coleochaetophyceae and Zygnematophyceae. Previous comparisons of the chloroplast genome from nine streptophyte algae (including four zygnematophyceans) revealed that, although land plant chloroplast DNAs (cpDNAs) inherited most of their highly conserved structural features from green algal ancestors, considerable cpDNA changes took place during the evolution of the Zygnematophyceae, the sister group of land plants. To gain deeper insights into the evolutionary dynamics of the chloroplast genome in streptophyte algae, we sequenced the cpDNAs of nine additional taxa: two klebsormidiophyceans (Entransia fimbriata and Klebsormidium sp. SAG 51.86), one coleochaetophycean (Coleochaete scutata) and six zygnematophyceans (Cylindrocystis brebissonii, Netrium digitus, Roya obtusa, Spirogyra maxima, Cosmarium botrytis and Closterium baillyanum). Our comparative analyses of these genomes with their streptophyte algal counterparts indicate that the large inverted repeat (IR) encoding the rDNA operon experienced loss or expansion/contraction in all three sampled classes and that genes were extensively shuffled in both the Klebsormidiophyceae and Zygnematophyceae. The klebsormidiophycean genomes boast greatly expanded IRs, with the Entransia 60,590-bp IR being the largest known among green algae. The 206,025-bp Entransia cpDNA, which is one of the largest genomes among streptophytes, encodes 118 standard genes, i.e., four additional genes compared to its Klebsormidium flaccidum homolog. We inferred that seven of the 21 group II introns usually found in land plants were already present in the common ancestor of the Klebsormidiophyceae and its sister lineages. At 107,236 bp and with 117 standard genes, the Coleochaete IR-less genome is both the smallest and most compact among the streptophyte algal cpDNAs analyzed thus

  18. Comparative Chloroplast Genome Analyses of Streptophyte Green Algae Uncover Major Structural Alterations in the Klebsormidiophyceae, Coleochaetophyceae and Zygnematophyceae

    PubMed Central

    Lemieux, Claude; Otis, Christian; Turmel, Monique

    2016-01-01

    The Streptophyta comprises all land plants and six main lineages of freshwater green algae: Mesostigmatophyceae, Chlorokybophyceae, Klebsormidiophyceae, Charophyceae, Coleochaetophyceae and Zygnematophyceae. Previous comparisons of the chloroplast genome from nine streptophyte algae (including four zygnematophyceans) revealed that, although land plant chloroplast DNAs (cpDNAs) inherited most of their highly conserved structural features from green algal ancestors, considerable cpDNA changes took place during the evolution of the Zygnematophyceae, the sister group of land plants. To gain deeper insights into the evolutionary dynamics of the chloroplast genome in streptophyte algae, we sequenced the cpDNAs of nine additional taxa: two klebsormidiophyceans (Entransia fimbriata and Klebsormidium sp. SAG 51.86), one coleochaetophycean (Coleochaete scutata) and six zygnematophyceans (Cylindrocystis brebissonii, Netrium digitus, Roya obtusa, Spirogyra maxima, Cosmarium botrytis and Closterium baillyanum). Our comparative analyses of these genomes with their streptophyte algal counterparts indicate that the large inverted repeat (IR) encoding the rDNA operon experienced loss or expansion/contraction in all three sampled classes and that genes were extensively shuffled in both the Klebsormidiophyceae and Zygnematophyceae. The klebsormidiophycean genomes boast greatly expanded IRs, with the Entransia 60,590-bp IR being the largest known among green algae. The 206,025-bp Entransia cpDNA, which is one of the largest genomes among streptophytes, encodes 118 standard genes, i.e., four additional genes compared to its Klebsormidium flaccidum homolog. We inferred that seven of the 21 group II introns usually found in land plants were already present in the common ancestor of the Klebsormidiophyceae and its sister lineages. At 107,236 bp and with 117 standard genes, the Coleochaete IR-less genome is both the smallest and most compact among the streptophyte algal cpDNAs analyzed thus

  19. The discrepancies in the results of bioinformatics tools for genomic structural annotation

    NASA Astrophysics Data System (ADS)

    Pawełkowicz, Magdalena; Nowak, Robert; Osipowski, Paweł; Rymuszka, Jacek; Świerkula, Katarzyna; Wojcieszek, Michał; Przybecki, Zbigniew

    2014-11-01

    A major focus of sequencing projects is to identify the genes in genomes. However, it is necessary to define the variety of genes and the criteria for identifying them. In this work we present discrepancies and dependencies arising from the application of different bioinformatic programs for structural annotation to the cucumber data set from the Polish Consortium of Cucumber Genome Sequencing. We used Fgenesh, GenScan and GeneMark for automated structural annotation and compared the results with the reference annotation.

  20. Computational structural variation discovery in genomes: state of the art and challenges

    NASA Astrophysics Data System (ADS)

    Osipowski, Paweł; Pawełkowicz, Magdalena; Przybecki, Zbigniew

    2014-11-01

    Identifying structural variations is crucial for obtaining comprehensive knowledge of genomic differentiation. The massive data volumes generated by current technologies require researchers to use computational methods for variation discovery in genomes. Focusing on results, and seeking to specify the remaining challenges and possible future solutions, we review the state-of-the-art methods and software used for structural variation discovery.

  1. Inferring social structure and its drivers from refuge use in the desert tortoise, a relatively solitary species

    USGS Publications Warehouse

    Sah, Pratha; Nussear, Kenneth E.; Esque, Todd C.; Aiello, Christina M.; Hudson, Peter J.; Bansal, Shweta

    2016-01-01

    For several species, refuges (such as burrows, dens, roosts, nests) are an essential resource for protection from predators and extreme environmental conditions. Refuges also serve as focal sites for social interactions including mating, courtship and aggression. Knowledge of refuge use patterns can therefore provide information about social structure, mating and foraging success, as well as the robustness and health of wildlife populations, especially for species considered to be relatively solitary. In this study, we construct networks of burrow use to infer social associations in a threatened wildlife species typically considered solitary: the desert tortoise. We show that tortoise social networks are significantly different from null networks of random associations, and have moderate spatial constraints. We next use statistical models to identify major mechanisms behind individual-level variation in tortoise burrow use and in the popularity of burrows in desert tortoise habitat, and to test for stressor-driven changes in refuge use patterns. We show that seasonal variation has a strong impact on tortoise burrow switching behavior. On the other hand, burrow age and topographical condition influence the number of tortoises visiting a burrow in desert tortoise habitat. Of three major population stressors affecting this species (translocation, drought, disease), translocation alters tortoise burrow switching behavior, with translocated animals visiting fewer unique burrows than residents. In a species that is not social, our study highlights the importance of leveraging refuge use behavior to study the presence of and mechanisms behind non-random social structure and individual-level variation. Our analysis of the impact of stressors on refuge-based social structure further emphasizes the potential of this method to detect environmental or anthropogenic disturbances.

  2. Diversity of Genome Structure in Salmonella enterica Serovar Typhi Populations

    PubMed Central

    Kothapalli, Sushma; Nair, Satheesh; Alokam, Suneetha; Pang, Tikki; Khakhria, Rasik; Woodward, David; Johnson, Wendy; Stocker, Bruce A. D.; Sanderson, Kenneth E.; Liu, Shu-Lin

    2005-01-01

    The genomes of most strains of Salmonella and Escherichia coli are highly conserved. In contrast, all 136 wild-type strains of Salmonella enterica serovar Typhi analyzed by partial digestion with I-CeuI (an endonuclease which cuts within the rrn operons) and pulsed-field gel electrophoresis and by PCR have rearrangements due to homologous recombination between the rrn operons leading to inversions and translocations. Recombination between rrn operons in culture is known to be equally frequent in S. enterica serovar Typhi and S. enterica serovar Typhimurium; thus, the recombinants in S. enterica serovar Typhi, but not those in S. enterica serovar Typhimurium, are able to survive in nature. However, even in S. enterica serovar Typhi the need for genome balance and the need for gene dosage impose limits on rearrangements. Of 100 strains of genome types 1 to 6, 72 were only 25.5 kb off genome balance (the relative lengths of the replichores during bidirectional replication from oriC to the termination of replication [Ter]), while 28 strains were less balanced (41 kb off balance), indicating that the survival of the best-balanced strains was greater. In addition, the need for appropriate gene dosage apparently selected against rearrangements which moved genes from their accustomed distance from oriC. Although rearrangements involving the seven rrn operons are very common in S. enterica serovar Typhi, other duplicated regions, such as the 25 IS200 elements, are very rarely involved in rearrangements. Large deletions and insertions in the genome are uncommon, except for deletions of Salmonella pathogenicity island 7 (usually 134 kb) from fragment I-CeuI-G and 40-kb insertions, possibly a prophage, in fragment I-CeuI-E. The phage types were determined, and the origins of the phage types appeared to be independent of the origins of the genome types. PMID:15805510

  3. Genome structure of introgressive lines Triticum aestivum/Aegilops sharonensis.

    PubMed

    Antonyuk, M Z; Bodylyova, M V; Ternovskaya, T K

    2009-01-01

    The lines Triticum aestivum/Aegilops sharonensis were examined for the presence of introgressions in the line genomes, their extent, and their assignment to specific homoeologous groups. The results of studying chromosome associations at meiotic metaphase I (M1) in pollen mother cells of hybrids between the lines, and between each line and the recurrent common wheat genotype Avrora, were compared with the assessment of the lines for chromosomal, biochemical and morphological markers. The 26 lines were assigned to six groups, each with a specific genome rearrangement relative to the recurrent genotype. PMID:20458978

  4. Inferring Population Genetic Structure in Widely and Continuously Distributed Carnivores: The Stone Marten (Martes foina) as a Case Study.

    PubMed

    Vergara, María; Basto, Mafalda P; Madeira, María José; Gómez-Moliner, Benjamín J; Santos-Reis, Margarida; Fernandes, Carlos; Ruiz-González, Aritz

    2015-01-01

    The stone marten is a widely distributed mustelid in the Palaearctic region that exhibits variable habitat preferences in different parts of its range. The species is a Holocene immigrant from southwest Asia which, according to fossil remains, followed the expansion of the Neolithic farming cultures into Europe and possibly colonized the Iberian Peninsula during the Early Neolithic (ca. 7,000 years BP). However, the population genetic structure and historical biogeography of this generalist carnivore remain essentially unknown. In this study we have combined mitochondrial DNA (mtDNA) sequencing (621 bp) and microsatellite genotyping (23 polymorphic markers) to infer the population genetic structure of the stone marten within the Iberian Peninsula. The mtDNA data revealed low haplotype and nucleotide diversities and a lack of phylogeographic structure, most likely due to a recent colonization of the Iberian Peninsula by a few mtDNA lineages during the Early Neolithic. The microsatellite data set was analysed with a) spatial and non-spatial Bayesian individual-based clustering (IBC) approaches (STRUCTURE, TESS, BAPS and GENELAND), and b) multivariate methods [discriminant analysis of principal components (DAPC) and spatial principal component analysis (sPCA)]. Additionally, because isolation by distance (IBD) is a common spatial genetic pattern in mobile and continuously distributed species and may challenge the performance of the above methods, the microsatellite data set was tested for its presence. Overall, the genetic structure of the stone marten in the Iberian Peninsula was characterized by a NE-SW spatial pattern of IBD, which may explain the observed disagreement between clustering solutions obtained by the different IBC methods. However, there was significant evidence of contemporary genetic structuring, albeit weak, into at least three different subpopulations. The detected subdivision could be attributed to the influence of the

  5. Inferring Population Genetic Structure in Widely and Continuously Distributed Carnivores: The Stone Marten (Martes foina) as a Case Study

    PubMed Central

    Vergara, María; Basto, Mafalda P.; Madeira, María José; Gómez-Moliner, Benjamín J.; Santos-Reis, Margarida; Fernandes, Carlos; Ruiz-González, Aritz

    2015-01-01

    The stone marten is a widely distributed mustelid in the Palaearctic region that exhibits variable habitat preferences in different parts of its range. The species is a Holocene immigrant from southwest Asia which, according to fossil remains, followed the expansion of the Neolithic farming cultures into Europe and possibly colonized the Iberian Peninsula during the Early Neolithic (ca. 7,000 years BP). However, the population genetic structure and historical biogeography of this generalist carnivore remain essentially unknown. In this study we combined mitochondrial DNA (mtDNA) sequencing (621 bp) and microsatellite genotyping (23 polymorphic markers) to infer the population genetic structure of the stone marten within the Iberian Peninsula. The mtDNA data revealed low haplotype and nucleotide diversities and a lack of phylogeographic structure, most likely due to a recent colonization of the Iberian Peninsula by a few mtDNA lineages during the Early Neolithic. The microsatellite data set was analysed with a) spatial and non-spatial Bayesian individual-based clustering (IBC) approaches (STRUCTURE, TESS, BAPS and GENELAND), and b) multivariate methods [discriminant analysis of principal components (DAPC) and spatial principal component analysis (sPCA)]. Additionally, because isolation by distance (IBD) is a common spatial genetic pattern in mobile and continuously distributed species and may challenge the performance of the above methods, the microsatellite data set was tested for its presence. Overall, the genetic structure of the stone marten in the Iberian Peninsula was characterized by a NE-SW spatial pattern of IBD, which may explain the disagreement observed between the clustering solutions obtained by the different IBC methods. However, there was significant evidence of weak contemporary genetic structuring into at least three subpopulations. The detected subdivision could be attributed to the influence of the
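    The abstract says the microsatellite data were tested for isolation by distance but does not name the test. A common choice is a Mantel test, which correlates pairwise genetic distances with pairwise geographic distances and assesses significance by permuting individuals. The sketch below assumes both distance matrices have already been computed. The function names are hypothetical and are not taken from the authors' analysis.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def upper_triangle(m: Matrix, order: Sequence[int]) -> List[float]:
    """Flatten the upper triangle of m after relabelling individuals by `order`."""
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("distance matrix has zero variance")
    return sxy / (sxx * syy) ** 0.5


def mantel_test(genetic: Matrix, geographic: Matrix,
                permutations: int = 999, seed: int = 1) -> Tuple[float, float]:
    """Simple Mantel test: r between two distance matrices and a one-sided permutation p-value."""
    n = len(genetic)
    identity = list(range(n))
    geo = upper_triangle(geographic, identity)
    r_obs = pearson(upper_triangle(genetic, identity), geo)
    rng = random.Random(seed)
    at_least_as_large = 1  # the observed labelling counts as one permutation
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)
        # Permute rows and columns of the genetic matrix together, keeping geography fixed.
        if pearson(upper_triangle(genetic, order), geo) >= r_obs:
            at_least_as_large += 1
    return r_obs, at_least_as_large / (permutations + 1)


if __name__ == "__main__":
    # Ten hypothetical individuals on a transect; genetic distance grows with geographic distance.
    geo = [[abs(i - j) for j in range(10)] for i in range(10)]
    gen = [[0.5 * abs(i - j) for j in range(10)] for i in range(10)]
    r, p = mantel_test(gen, geo)
    print(f"r = {r:.3f}, p = {p:.3f}")
```

    A significant Mantel correlation shows only that genetic and geographic distances covary. It cannot by itself distinguish continuous IBD from discrete clusters sampled at different locations. That ambiguity is why the abstract treats IBD as a potential confounder for the clustering methods.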

  6. Wavelet phase analysis of two velocity components to infer the structure of interscale transfers in a turbulent boundary-layer

    NASA Astrophysics Data System (ADS)

    Keylock, Christopher J.; Nishimura, Kouichi

    2016-04-01

    Scale-dependent phase analysis of velocity time series measured in a zero pressure gradient boundary layer shows that phase coupling between longitudinal and vertical velocity components is strong at both large and small scales, but minimal in the middle of the inertial regime. The same general pattern is observed at all vertical positions studied, but there is stronger phase coherence as the vertical coordinate, y, increases. The phase difference histograms evolve from a unimodal shape at small scales to the development of significant bimodality at the integral scale and above. The asymmetry in the off-diagonal couplings changes sign at the midpoint of the inertial regime, with the small scale relation consistent with intense ejections followed by a more prolonged sweep motion. These results may be interpreted in a manner that is consistent with the action of low speed streaks and hairpin vortices near the wall, with large scale motions further from the wall, the effect of which penetrates to smaller scales. Hence, a measure of phase coupling, when combined with a scale-by-scale decomposition of perpendicular velocity components, is a useful tool for investigating boundary-layer structure and inferring process from single-point measurements.

  7. Duplex stem-loop-containing quadruplex motifs in the human genome: a combined genomic and structural study

    PubMed Central

    Lim, Kah Wai; Jenjaroenpun, Piroon; Low, Zhen Jie; Khong, Zi Jian; Ng, Yi Siang; Kuznetsov, Vladimir Andreevich; Phan, Anh Tuân

    2015-01-01

    Duplex stem-loops and four-stranded G-quadruplexes have been implicated in (patho)biological processes. Overlap of stem-loop- and quadruplex-forming sequences could give rise to quadruplex–duplex hybrids (QDH), which combine features of both structural forms and could exhibit unique properties. Here, we present a combined genomic and structural study of stem-loop-containing quadruplex sequences (SLQS) in the human genome. Based on a maximum loop length of 20 nt, our survey identified 80 307 SLQS, embedded within 60 172 unique clusters. Our analysis suggested that these should cover close to half of the total SLQS in the entire genome. Among these, 48 508 SLQS were strand-specifically located in genic/promoter regions, with the majority of genes containing a low number of SLQS. Notably, genes containing abundant SLQS clusters were strongly associated with brain tissues. Enrichment analysis of SLQS-positive genes and mapping of SLQS onto transcriptional/mutagenesis hotspots and cancer-associated genes provided a statistical framework supporting the biological involvement of SLQS. In vitro formation of diverse QDH by selected SLQS hits was successfully verified by nuclear magnetic resonance spectroscopy. The folding topologies of two SLQS were elucidated in detail. We also demonstrated that sequence changes at mutation/single-nucleotide polymorphism loci could affect the structural conformations adopted by SLQS. Thus, our predicted SLQS offer novel insights into the potential involvement of QDH in diverse (patho)biological processes and could represent novel regulatory signals. PMID:25958397
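    The survey above looks for quadruplex-forming sequences in which a loop of up to 20 nt can also fold into a duplex stem-loop. The sketch below shows a much simplified version of that idea. It scans one strand for four G-runs of at least 3 G separated by loops of 1 to `max_loop` nt, and then flags candidates in which some loop contains a perfect hairpin. The minimum stem length (3 bp), the minimum hairpin loop (3 nt) and the choice of one lazy loop decomposition per start position are assumptions made for illustration. This is not the authors' search pipeline.

```python
import re
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(s: str) -> str:
    return s.translate(_COMPLEMENT)[::-1]


def has_hairpin(loop: str, min_stem: int = 3, min_hairpin_loop: int = 3) -> bool:
    """True if `loop` holds a perfect stem of `min_stem` bp closing a loop of >= `min_hairpin_loop` nt."""
    n = len(loop)
    for i in range(n - min_stem + 1):
        target = reverse_complement(loop[i:i + min_stem])
        # The right arm must start at least min_hairpin_loop nt after the left arm ends.
        for k in range(i + min_stem + min_hairpin_loop, n - min_stem + 1):
            if loop[k:k + min_stem] == target:
                return True
    return False


def find_slqs(seq: str, max_loop: int = 20) -> List[Tuple[int, List[str]]]:
    """Return (start, loops) for G4 candidates on this strand with at least one hairpin-forming loop."""
    loop = f"([ACGT]{{1,{max_loop}}}?)"
    # Zero-width lookahead so that overlapping candidates are all reported.
    pattern = re.compile(f"(?=G{{3,}}{loop}G{{3,}}{loop}G{{3,}}{loop}G{{3,}})")
    hits = []
    for m in pattern.finditer(seq.upper()):
        loops = [m.group(1), m.group(2), m.group(3)]
        if any(has_hairpin(lp) for lp in loops):
            hits.append((m.start(), loops))
    return hits


if __name__ == "__main__":
    # Hypothetical sequence: the first loop ACG-TTT-CGT can form a 3-bp hairpin.
    print(find_slqs("GGGACGTTTCGTGGGTGGGTGGG"))
```

    A genome-wide survey would also scan the reverse strand, allow imperfect or longer stems, and merge overlapping hits into clusters, as the "60 172 unique clusters" above implies. The sketch leaves all of that out.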

  8. Sector structure of the interplanetary magnetic field in the second half of the 19th century inferred from ground-based magnetometers

    NASA Astrophysics Data System (ADS)

    Vokhmyanin, M.; Ponyavin, D. I.

    2012-12-01

    Interplanetary magnetic field (IMF) polarities in the pre-satellite era can be inferred using the Svalgaard-Mansurov effect, according to which different IMF directions lead to different geomagnetic variations at polar stations. Based on this effect, we propose a method to derive the sector structure of the IMF when only ground-based data are available. Details of the method and results have been presented in our recent paper: Vokhmyanin, M. V., and D. I. Ponyavin (2012), Inferring interplanetary magnetic field polarities from geomagnetic variations, J. Geophys. Res., 117, A06102, doi:10.1029/2011JA017060. Using data from eight stations (Sitka, Sodankyla, Godhavn, Lerwick, Thule, Baker Lake, Vostok and Mirny), we reconstructed the sector structure back to 1905. The success rate of the inference from 1965 to 2005 ranges between 78% and 90%, depending on the set of stations used. Our results show both a high success rate and good agreement with the well-known Russell-McPherron and Rosenberg-Coleman effects. In the current study we applied the technique to historical data from the Helsinki observatory, for which digital versions of hourly geomagnetic components are available from 1844 to 1897. Because the Helsinki station stopped operating at the beginning of the 20th century, we analyzed data from the nearby Nurmijarvi station to build a model of the local Svalgaard-Mansurov effect. The success rate of this reconstruction from 1965 to 2005 is around 82%, so we assume that the IMF polarities obtained for the period 1869-1889 are of sufficient quality. The inferred sector structure at this time consists of two sectors, which is typical of all declining phases of the solar activity cycle. Catalogues of IMF proxies seem to be important for analyzing the structure and dynamics of solar magnetic fields in the past. [Figure: left, Bartels diagram of the IMF sector structure inferred from Helsinki data; right, sunspot number indicating solar cycles.]
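    The 78-90% and 82% figures above are success rates. They are obtained by comparing ground-based inferences with polarities measured in the satellite era (1965-2005). The sketch below shows one plausible way to compute such a score. Each day is classified from the sign of a station deviation, and the fraction of days that agree with an observed polarity series is then counted. The "positive means away" convention, the dead-band `threshold` and the function names are all hypothetical. In reality the sign of the Svalgaard-Mansurov response depends on the station and the hemisphere, and the authors' model is more elaborate.

```python
from typing import List, Optional, Sequence


def infer_polarity(deviations: Sequence[float], threshold: float = 0.0) -> List[Optional[str]]:
    """Classify each day as 'away' or 'toward' from the sign of a station deviation.

    Days with |deviation| <= threshold are left unclassified (None).
    Positive -> 'away' is a hypothetical convention chosen for illustration.
    """
    polarities: List[Optional[str]] = []
    for d in deviations:
        if d > threshold:
            polarities.append("away")
        elif d < -threshold:
            polarities.append("toward")
        else:
            polarities.append(None)
    return polarities


def success_rate(inferred: Sequence[Optional[str]], observed: Sequence[Optional[str]]) -> float:
    """Fraction of days classified in both series on which the inferred polarity matches the observed one."""
    pairs = [(i, o) for i, o in zip(inferred, observed) if i is not None and o is not None]
    if not pairs:
        raise ValueError("no overlapping classified days")
    return sum(i == o for i, o in pairs) / len(pairs)


if __name__ == "__main__":
    inferred = infer_polarity([1.2, -0.5, 0.3, -2.0, 0.05], threshold=0.1)
    observed = ["away", "toward", "toward", "toward", "away"]
    print(inferred, success_rate(inferred, observed))
```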

  9. Divergence of the mitochondrial genome structure in the apicomplexan parasites, Babesia and Theileria.

    PubMed

    Hikosaka, Kenji; Watanabe, Yoh-Ichi; Tsuji, Naotoshi; Kita, Kiyoshi; Kishine, Hiroe; Arisue, Nobuko; Palacpac, Nirianne Marie Q; Kawazu, Shin-Ichiro; Sawai, Hiromi; Horii, Toshihiro; Igarashi, Ikuo; Tanabe, Kazuyuki

    2010-05-01

    Mitochondrial (mt) genomes from diverse phylogenetic groups vary considerably in size, structure, and organization. The genus Plasmodium (phylum Apicomplexa), the causative agent of malaria, has the smallest mt genome, in the form of a circular and/or tandemly repeated linear element of 6 kb that encodes only three protein genes (cox1, cox3, and cob). The closely related genera Babesia and Theileria also have small mt genomes (6.6 kb), which are monomeric linear molecules with an organization distinct from that of Plasmodium. To elucidate the structural divergence and evolution of mt genomes between Babesia/Theileria and Plasmodium, we determined five new sequences from Babesia bigemina, B. caballi, B. gibsoni, Theileria orientalis, and T. equi. Together with the previously reported sequences of B. bovis, T. annulata, and T. parva, all eight Babesia and Theileria mt genomes are linear molecules with terminal inverted repeats (TIRs) at both ends, containing three protein-coding genes (cox1, cox3, and cob) and six large subunit (LSU) ribosomal RNA (rRNA) gene fragments. The organization and transcriptional direction of the protein-coding genes and the rRNA gene fragments were completely conserved in the four Babesia species. In contrast, notable variation occurred in the four Theileria species. Although the genome structures of T. annulata and T. parva were nearly identical to those of Babesia, an inversion in the 3-kb central region was found in T. orientalis. Moreover, the T. equi mt genome is the largest (8.2 kb) and most divergent, with unusually long TIR sequences in which cox3 and two LSU rRNA gene fragments are located. The T. equi mt genome showed little synteny with the other species. These results suggest that the Theileria mt genome is highly diverse, with lineage-specific evolution in two Theileria species: genome inversion in T. orientalis and gene-embedded long TIRs in T. equi. PMID:20034997

  10. Mosaic Structure Of Foot-And-Mouth Disease Virus Genomes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We report the results of a simple pairwise scanning analysis designed to identify inter-serotype recombination events applied to genome data from 144 isolates of foot-and-mouth disease virus (FMDV) representing all seven serotypes. We identify large numbers of candidate recombinant fragments from a...

  11. Mosaic Structure of Foot-and-Mouth Disease Virus Genomes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We report the results of a simple pairwise scanning analysis designed to identify inter-serotype recombination events applied to genome data from 144 isolates of foot-and-mouth disease virus (FMDV) representing all seven serotypes. We identify large numbers of candidate recombinant fragments from al...

  12. Studying Cattle Genomic Structural Variations in the Green Economy Era

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Transgenic cattle carrying multiple genomic modifications have been produced by serial rounds of somatic cell chromatin transfer (cloning) of sequentially genetically targeted somatic cells. However, cloning efficiency tends to decline with the increase of rounds of cloning. It is possible that mult...

  13. Training set optimization under population structure in genomic selection

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The optimization of the training set (TRS) in genomic selection (GS) has received much interest in both animal and plant breeding, because it is critical to the accuracy of the prediction models. In this study, five different TRS sampling algorithms, stratified sampling, mean of the Coefficient of D...

  14. Training set optimization under population structure in genomic selection

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The optimization of the training set (TRS) in genomic selection has received much interest in both animal and plant breeding, because it is critical to the accuracy of the prediction models. In this study, five different TRS sampling algorithms, stratified sampling, mean of the coefficient of determ...

  15. Model-based verification of hypotheses on the origin of modern Japanese revisited by Bayesian inference based on genome-wide SNP data.

    PubMed

    Nakagome, Shigeki; Sato, Takehiro; Ishida, Hajime; Hanihara, Tsunehiko; Yamaguchi, Tetsutaro; Kimura, Ryosuke; Mano, Shuhei; Oota, Hiroki

    2015-06-01

    Various hypotheses for the peopling of the Japanese archipelago have been proposed, which can be classified into three models: transformation, replacement, and hybridization. In recent years, one of the hybridization models (the "dual-structure model") has been widely accepted. According to this model, Neolithic hunter-gatherers known as the Jomon, who are assumed to have originated in southeast Asia and lived in the Japanese archipelago more than 10,000 years ago, admixed with an agricultural people known as the Yayoi, who migrated from the East Asian continent 2,000-3,000 years ago. Meanwhile, some anthropologists propose instead that the morphological differences between the Jomon and Yayoi people can be explained by microevolution following a change in lifestyle. To resolve this controversy, we compared three demographic models by approximate Bayesian computation using genome-wide single nucleotide polymorphism (gwSNP) data from the Ainu people, who are thought to be direct descendants of the indigenous Jomon. If we assume that the Chinese people sampled in Beijing from HapMap share the same ancestry as the Yayoi, then the hybridization model is predicted to be 29 and 63 times more likely than the replacement and transformation models, respectively. Furthermore, our data provide strong support for a model in which the Jomon lineages had already diversified into locally structured populations before the admixture event. The initial divergence between the Jomon and Yayoi ancestries was dated to the late Pleistocene, followed by the divergence of Jomon lineages in the early Holocene. These results suggest that gwSNP data provide a detailed picture of the complex hybridization model for Japanese population history. PMID:25758010
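    The model comparison above relies on approximate Bayesian computation (ABC). The sketch below illustrates the simplest form, rejection ABC for model choice. On each iteration a model is drawn from a uniform prior, its parameter is drawn from that model's prior, and a summary statistic is simulated. The draw is accepted if the statistic falls within `epsilon` of the observed value. The accepted proportions then approximate posterior model probabilities, and with equal model priors their ratio estimates a Bayes factor, the kind of quantity behind "29 and 63 times more likely". The two toy models, their uniform-plus-Gaussian form and every numerical value are hypothetical. The authors' demographic models simulate SNP data under coalescent scenarios and are far richer than this.

```python
import random
from typing import Callable, Dict

Simulator = Callable[[random.Random], float]


def abc_model_choice(models: Dict[str, Simulator], observed: float, epsilon: float,
                     n_sims: int = 20000, seed: int = 0) -> Dict[str, float]:
    """Rejection ABC: approximate posterior model probabilities under a uniform model prior."""
    rng = random.Random(seed)
    names = list(models)
    accepted = {name: 0 for name in names}
    for _ in range(n_sims):
        name = rng.choice(names)      # draw a model from the uniform prior
        stat = models[name](rng)      # draw a parameter and simulate a summary statistic
        if abs(stat - observed) <= epsilon:
            accepted[name] += 1
    total = sum(accepted.values())
    if total == 0:
        raise ValueError("no simulations accepted; increase epsilon or n_sims")
    return {name: count / total for name, count in accepted.items()}


def toy_model(lo: float, hi: float, sd: float) -> Simulator:
    """Hypothetical model: theta ~ Uniform(lo, hi), statistic ~ Normal(theta, sd)."""
    def simulate(rng: random.Random) -> float:
        theta = rng.uniform(lo, hi)
        return rng.gauss(theta, sd)
    return simulate


if __name__ == "__main__":
    probs = abc_model_choice({"A": toy_model(0.0, 1.0, 0.5), "B": toy_model(2.0, 3.0, 0.5)},
                             observed=0.5, epsilon=0.2)
    print(probs)
```

    Real ABC analyses usually compare several summary statistics through a distance function and often apply regression adjustment. The single-statistic rejection step shown here is only the core idea.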

  16. Coverage of whole proteome by structural genomics observed through protein homology modeling database

    PubMed Central

    Yamaguchi, Akihiro; Go, Mitiko

    2006-01-01

    We have been developing FAMSBASE, a protein homology-modeling database of whole ORFs predicted from genome sequences. The latest update of FAMSBASE (http://daisy.nagahama-i-bio.ac.jp/Famsbase/), which is based on the protein three-dimensional (3D) structures released by November 2003, contains modeled 3D structures for 368,724 open reading frames (ORFs) derived from the genomes of 276 species, namely 17 archaebacterial, 130 eubacterial, 18 eukaryotic and 111 phage genomes. Those 276 genomes are predicted to have 734,193 ORFs in total, so the current FAMSBASE contains 3D structures for approximately 50% of the ORF products. However, cases in which a modeled 3D structure covers an entire ORF product are rare. When the portion of each ORF covered by a 3D structure is compared across the three kingdoms of life, approximately 60% of archaebacterial and eubacterial ORFs have modeled 3D structures covering almost the entire amino acid sequence, whereas the percentage falls to about 30% in eukaryotes. Based on the annual differences in the number of ORFs with modeled 3D structures, the fraction of soluble proteins with modeled 3D structures increased by 5% for archaebacteria and by 7% for eubacteria over the last 3 years. Assuming that this rate is maintained and that determining 3D structures for predicted disordered regions is unattainable, model structures of all soluble prokaryotic proteins, excluding the putative disordered regions, will be in hand within 15 years; for eukaryotic proteins, they will be in hand within 25 years. The 3D structures available at that time will not be those of the entire proteins encoded in single ORFs, but those of separate structural domains. Measuring or predicting the spatial arrangements of structural domains within an ORF will then become an important issue for structural genomics. PMID:17146617
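    The "approximately 50%" coverage follows directly from the counts in the abstract (368,724 of 734,193 ORFs), and the genome counts add up to 276. The 15- and 25-year projections come from a linear extrapolation of the annual gain in coverage. However, the abstract does not report the starting fractions or the exact target, so the extrapolation helper below is run only on hypothetical inputs. It is not meant to reproduce the authors' numbers.

```python
# Figures quoted in the FAMSBASE abstract
GENOMES = {"archaebacterial": 17, "eubacterial": 130, "eukaryotic": 18, "phage": 111}
MODELED_ORFS = 368_724
TOTAL_ORFS = 734_193


def coverage(modeled: int, total: int) -> float:
    """Fraction of predicted ORFs that have at least a partial modeled 3D structure."""
    return modeled / total


def years_to_target(current: float, target: float, gain_per_year: float) -> float:
    """Linear extrapolation: years until `current` reaches `target` at a constant annual gain."""
    if gain_per_year <= 0:
        raise ValueError("gain_per_year must be positive")
    return max(0.0, (target - current) / gain_per_year)


if __name__ == "__main__":
    print(sum(GENOMES.values()), f"{coverage(MODELED_ORFS, TOTAL_ORFS):.1%}")
    # Hypothetical inputs, not values from the abstract: start at 50%, stop at 90%,
    # gain 5 percentage points per 3 years.
    print(years_to_target(0.50, 0.90, 0.05 / 3))
```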

  17. Inferring the Structure of the Solar Corona and Inner Heliosphere During the Maunder Minimum Using Global Thermodynamic Magnetohydrodynamic Simulations

    NASA Astrophysics Data System (ADS)

    Riley, Pete; Lionello, Roberto; Linker, Jon A.; Cliver, Ed; Balogh, Andre; Beer, Jürg; Charbonneau, Paul; Crooker, Nancy; DeRosa, Marc; Lockwood, Mike; Owens, Matt; McCracken, Ken; Usoskin, Ilya; Koutchmy, S.

    2015-04-01

    Observations of the Sun’s corona during the space era have led to a picture of relatively constant, but cyclically varying, solar output and structure. Longer-term, more indirect measurements, such as from ¹⁰Be, coupled with other, albeit less reliable, contemporaneous reports, however, suggest periods of significant departure from this standard. The Maunder Minimum was one such epoch, in which: (1) sunspots effectively disappeared for long intervals during a 70-yr period; (2) eclipse observations suggested the distinct lack of a visible K-corona but possible appearance of the F-corona; (3) reports of aurora were notably reduced; and (4) cosmic ray intensities at Earth were inferred to be substantially higher. Using a global thermodynamic MHD model, we have constructed a range of possible coronal configurations for the Maunder Minimum period and compared their predictions with these limited observational constraints. We conclude that the most likely state of the corona during at least the later portion of the Maunder Minimum was not merely that of the 2008/2009 solar minimum, as has been suggested recently, but rather a state devoid of any large-scale structure, driven by a photospheric field composed only of ephemeral regions, and likely substantially reduced in strength. Moreover, we suggest that the Sun evolved from a 2008/2009-like configuration at the start of the Maunder Minimum toward an ephemeral-only configuration by the end of it, supporting a prediction that we may be on the cusp of a new grand solar minimum.

  18. The influence of landscape configuration and environment on population genetic structure in a sedentary passerine: insights from loci located in different genomic regions.

    PubMed

    Ferrer, E S; García-Navas, V; Bueno-Enciso, J; Barrientos, R; Serrano-Davies, E; Cáliz-Campal, C; Sanz, J J; Ortego, J

    2016-01-01

    The study of the factors structuring genetic variation can help to infer the neutral and adaptive processes shaping the demographic and evolutionary trajectories of natural populations. Here, we analyse the role of isolation by distance (IBD), isolation by resistance (IBR, defined by landscape composition) and isolation by environment (IBE, estimated as habitat and elevation dissimilarity) in structuring genetic variation in 25 blue tit (Cyanistes caeruleus) populations. We typed 1385 individuals at 26 microsatellite loci, classified into two groups according to whether they are located in genomic regions that are actively transcribed to RNA (TL; 12 loci) or not (NTL; 14 loci). Population genetic differentiation was mostly detected using the panel of NTL. Landscape genetic analyses showed a pattern of IBD for all loci and for the panel of NTL, but genetic differentiation estimated at TL was only explained by IBR models considering high resistance for natural vegetation and low resistance for agricultural lands. Finally, the absence of IBE suggests a lack of divergent selection pressures associated with differences in habitat and elevation. Overall, our study shows that markers located in different genomic regions can yield contrasting inferences on landscape-level patterns of realized gene flow in natural populations. PMID:26492434

  19. Cell-of-Origin-Specific 3D Genome Structure Acquired during Somatic Cell Reprogramming

    PubMed Central

    Krijger, Peter Hugo Lodewijk; Di Stefano, Bruno; de Wit, Elzo; Limone, Francesco; van Oevelen, Chris; de Laat, Wouter; Graf, Thomas

    2016-01-01

    Forced expression of reprogramming factors can convert somatic cells into induced pluripotent stem cells (iPSCs). Here we studied genome topology dynamics during reprogramming of different somatic cell types with highly distinct genome conformations. We find large-scale repositioning of topologically associated domains (TADs) and alterations of tissue-restricted genomic neighborhoods and chromatin loops, effectively erasing the somatic-cell-specific genome structures while establishing an embryonic stem-cell-like 3D genome. Yet, early-passage iPSCs carry topological hallmarks that enable recognition of their cell of origin. These hallmarks are not remnants of somatic chromosome topologies. Instead, the distinguishing topological features are acquired during reprogramming, as we also find for cell-of-origin-dependent gene expression patterns. PMID:26971819

  20. Genome sequence, comparative analysis and haplotype structure of the domestic dog.

    PubMed

    Lindblad-Toh, Kerstin; Wade, Claire M; Mikkelsen, Tarjei S; Karlsson, Elinor K; Jaffe, David B; Kamal, Michael; Clamp, Michele; Chang, Jean L; Kulbokas, Edward J; Zody, Michael C; Mauceli, Evan; Xie, Xiaohui; Breen, Matthew; Wayne, Robert K; Ostrander, Elaine A; Ponting, Chris P; Galibert, Francis; Smith, Douglas R; DeJong, Pieter J; Kirkness, Ewen; Alvarez, Pablo; Biagi, Tara; Brockman, William; Butler, Jonathan; Chin, Chee-Wye; Cook, April; Cuff, James; Daly, Mark J; DeCaprio, David; Gnerre, Sante; Grabherr, Manfred; Kellis, Manolis; Kleber, Michael; Bardeleben, Carolyne; Goodstadt, Leo; Heger, Andreas; Hitte, Christophe; Kim, Lisa; Koepfli, Klaus-Peter; Parker, Heidi G; Pollinger, John P; Searle, Stephen M J; Sutter, Nathan B; Thomas, Rachael; Webber, Caleb; Baldwin, Jennifer; Abebe, Adal; Abouelleil, Amr; Aftuck, Lynne; Ait-Zahra, Mostafa; Aldredge, Tyler; Allen, Nicole; An, Peter; Anderson, Scott; Antoine, Claudel; Arachchi, Harindra; Aslam, Ali; Ayotte, Laura; Bachantsang, Pasang; Barry, Andrew; Bayul, Tashi; Benamara, Mostafa; Berlin, Aaron; Bessette, Daniel; Blitshteyn, Berta; Bloom, Toby; Blye, Jason; Boguslavskiy, Leonid; Bonnet, Claude; Boukhgalter, Boris; Brown, Adam; Cahill, Patrick; Calixte, Nadia; Camarata, Jody; Cheshatsang, Yama; Chu, Jeffrey; Citroen, Mieke; Collymore, Alville; Cooke, Patrick; Dawoe, Tenzin; Daza, Riza; Decktor, Karin; DeGray, Stuart; Dhargay, Norbu; Dooley, Kimberly; Dooley, Kathleen; Dorje, Passang; Dorjee, Kunsang; Dorris, Lester; Duffey, Noah; Dupes, Alan; Egbiremolen, Osebhajajeme; Elong, Richard; Falk, Jill; Farina, Abderrahim; Faro, Susan; Ferguson, Diallo; Ferreira, Patricia; Fisher, Sheila; FitzGerald, Mike; Foley, Karen; Foley, Chelsea; Franke, Alicia; Friedrich, Dennis; Gage, Diane; Garber, Manuel; Gearin, Gary; Giannoukos, Georgia; Goode, Tina; Goyette, Audra; Graham, Joseph; Grandbois, Edward; Gyaltsen, Kunsang; Hafez, Nabil; Hagopian, Daniel; Hagos, Birhane; Hall, Jennifer; Healy, Claire; Hegarty, Ryan; Honan, Tracey; Horn, Andrea; Houde, Nathan; Hughes, Leanne; Hunnicutt, Leigh; Husby, M; Jester, Benjamin; Jones, Charlien; Kamat, Asha; Kanga, Ben; Kells, Cristyn; Khazanovich, Dmitry; Kieu, Alix Chinh; Kisner, Peter; Kumar, Mayank; Lance, Krista; Landers, Thomas; Lara, Marcia; Lee, William; Leger, Jean-Pierre; Lennon, Niall; Leuper, Lisa; LeVine, Sarah; Liu, Jinlei; Liu, Xiaohong; Lokyitsang, Yeshi; Lokyitsang, Tashi; Lui, Annie; Macdonald, Jan; Major, John; Marabella, Richard; Maru, Kebede; Matthews, Charles; McDonough, Susan; Mehta, Teena; Meldrim, James; Melnikov, Alexandre; Meneus, Louis; Mihalev, Atanas; Mihova, Tanya; Miller, Karen; Mittelman, Rachel; Mlenga, Valentine; Mulrain, Leonidas; Munson, Glen; Navidi, Adam; Naylor, Jerome; Nguyen, Tuyen; Nguyen, Nga; Nguyen, Cindy; Nguyen, Thu; Nicol, Robert; Norbu, Nyima; Norbu, Choe; Novod, Nathaniel; Nyima, Tenchoe; Olandt, Peter; O'Neill, Barry; O'Neill, Keith; Osman, Sahal; Oyono, Lucien; Patti, Christopher; Perrin, Danielle; Phunkhang, Pema; Pierre, Fritz; Priest, Margaret; Rachupka, Anthony; Raghuraman, Sujaa; Rameau, Rayale; Ray, Verneda; Raymond, Christina; Rege, Filip; Rise, Cecil; Rogers, Julie; Rogov, Peter; Sahalie, Julie; Settipalli, Sampath; Sharpe, Theodore; Shea, Terrance; Sheehan, Mechele; Sherpa, Ngawang; Shi, Jianying; Shih, Diana; Sloan, Jessie; Smith, Cherylyn; Sparrow, Todd; Stalker, John; Stange-Thomann, Nicole; Stavropoulos, Sharon; Stone, Catherine; Stone, Sabrina; Sykes, Sean; Tchuinga, Pierre; Tenzing, Pema; Tesfaye, Senait; Thoulutsang, Dawa; Thoulutsang, Yama; Topham, Kerri; Topping, Ira; Tsamla, Tsamla; Vassiliev, Helen; Venkataraman, Vijay; Vo, Andy; Wangchuk, Tsering; Wangdi, Tsering; Weiand, Michael; Wilkinson, Jane; Wilson, Adam; Yadav, Shailendra; Yang, Shuli; Yang, Xiaoping; Young, Geneva; Yu, Qing; Zainoun, Joanne; Zembek, Lisa; Zimmer, Andrew; Lander, Eric S

    2005-12-01

    Here we report a high-quality draft genome sequence of the domestic dog (Canis familiaris), together with a dense map of single nucleotide polymorphisms (SNPs) across breeds. The dog is of particular interest because it provides important evolutionary information and because existing breeds show great phenotypic diversity for morphological, physiological and behavioural traits. We use sequence comparison with the primate and rodent lineages to shed light on the structure and evolution of genomes and genes. Notably, the majority of the most highly conserved non-coding sequences in mammalian genomes are clustered near a small subset of genes with important roles in development. Analysis of SNPs reveals long-range haplotypes across the entire dog genome, and defines the nature of genetic diversity within and across breeds. The current SNP map now makes it possible for genome-wide association studies to identify genes responsible for diseases and traits, with important consequences for human and companion animal health. PMID:16341006

  1. Asymmetric cryo-EM reconstruction of phage MS2 reveals genome structure in situ.

    PubMed

    Koning, Roman I; Gomez-Blanco, Josue; Akopjana, Inara; Vargas, Javier; Kazaks, Andris; Tars, Kaspars; Carazo, José María; Koster, Abraham J

    2016-01-01

    In single-stranded ribonucleic acid (RNA) viruses, virus capsid assembly and genome packaging are intertwined processes. Using cryo-electron microscopy and single particle analysis we determined the asymmetric virion structure of bacteriophage MS2, which includes 178 copies of the coat protein, a single copy of the A-protein and the RNA genome. This reveals that in situ, the viral RNA genome can adopt a defined conformation. The RNA forms a branched network of stem-loops, almost all of which are located near the inner surface of the capsid and predominantly bind to coat protein dimers located in one half of the capsid. This suggests that the genomic RNA is highly involved in genome packaging and virion assembly. PMID:27561669

  2. The emerging biofuel crop Camelina sativa retains a highly undifferentiated hexaploid genome structure.

    PubMed

    Kagale, Sateesh; Koh, Chushin; Nixon, John; Bollina, Venkatesh; Clarke, Wayne E; Tuteja, Reetu; Spillane, Charles; Robinson, Stephen J; Links, Matthew G; Clarke, Carling; Higgins, Erin E; Huebert, Terry; Sharpe, Andrew G; Parkin, Isobel A P

    2014-01-01

    Camelina sativa is an oilseed with desirable agronomic and oil-quality attributes for a viable industrial oil platform crop. Here we generated the first chromosome-scale high-quality reference genome sequence for C. sativa and annotated 89,418 protein-coding genes, representing a whole-genome triplication event relative to the crucifer model Arabidopsis thaliana. C. sativa represents the first crop species to be sequenced from lineage I of the Brassicaceae. The well-preserved hexaploid genome structure of C. sativa surprisingly mirrors those of economically important amphidiploid Brassica crop species from lineage II as well as wheat and cotton. The three genomes of C. sativa show no evidence of fractionation bias and limited expression-level bias, both characteristics commonly associated with polyploid evolution. The highly undifferentiated polyploid genome of C. sativa has significant consequences for breeding and genetic manipulation of this industrial oil crop. PMID:24759634

  3. The emerging biofuel crop Camelina sativa retains a highly undifferentiated hexaploid genome structure

    PubMed Central

    Kagale, Sateesh; Koh, Chushin; Nixon, John; Bollina, Venkatesh; Clarke, Wayne E.; Tuteja, Reetu; Spillane, Charles; Robinson, Stephen J.; Links, Matthew G.; Clarke, Carling; Higgins, Erin E.; Huebert, Terry; Sharpe, Andrew G.; Parkin, Isobel A. P.

    2014-01-01

    Camelina sativa is an oilseed with desirable agronomic and oil-quality attributes for a viable industrial oil platform crop. Here we generated the first chromosome-scale high-quality reference genome sequence for C. sativa and annotated 89,418 protein-coding genes, representing a whole-genome triplication event relative to the crucifer model Arabidopsis thaliana. C. sativa represents the first crop species to be sequenced from lineage I of the Brassicaceae. The well-preserved hexaploid genome structure of C. sativa surprisingly mirrors those of economically important amphidiploid Brassica crop species from lineage II as well as wheat and cotton. The three genomes of C. sativa show no evidence of fractionation bias and limited expression-level bias, both characteristics commonly associated with polyploid evolution. The highly undifferentiated polyploid genome of C. sativa has significant consequences for breeding and genetic manipulation of this industrial oil crop. PMID:24759634

  4. Asymmetric cryo-EM reconstruction of phage MS2 reveals genome structure in situ

    PubMed Central

    Koning, Roman I; Gomez-Blanco, Josue; Akopjana, Inara; Vargas, Javier; Kazaks, Andris; Tars, Kaspars; Carazo, José María; Koster, Abraham J.

    2016-01-01

    In single-stranded ribonucleic acid (RNA) viruses, virus capsid assembly and genome packaging are intertwined processes. Using cryo-electron microscopy and single particle analysis we determined the asymmetric virion structure of bacteriophage MS2, which includes 178 copies of the coat protein, a single copy of the A-protein and the RNA genome. This reveals that in situ, the viral RNA genome can adopt a defined conformation. The RNA forms a branched network of stem-loops, almost all of which are located near the inner surface of the capsid and predominantly bind to coat protein dimers located in one half of the capsid. This suggests that the genomic RNA is highly involved in genome packaging and virion assembly. PMID:27561669

  5. Viscosity structure of Earth's mantle inferred from rotational variations due to GIA process and recent melting events

    NASA Astrophysics Data System (ADS)

    Nakada, Masao; Okuno, Jun'ichi; Lambeck, Kurt; Purcell, Anthony

    2015-08-01

    We examine the geodetically derived rotational variations, namely the rate of change of the degree-two harmonic of Earth's geopotential, J̇₂, and true polar wander. To do so, we combine a recent melting model of glaciers and the Greenland and Antarctic ice sheets taken from the IPCC 2013 Report (AR5) with two representative GIA ice models describing the last deglaciation: ICE5G and the ANU model developed at the Australian National University. Geodetically derived observations of J̇₂ are characterized by temporal changes of −(3.7 ± 0.1) × 10⁻¹¹ yr⁻¹ for the period 1976-1990 and −(0.3 ± 0.1) × 10⁻¹¹ yr⁻¹ after ~2000. The AR5 results make it possible to evaluate the recent melting of the major ice sheets and glaciers for three periods: 1900-1990, 1991-2001 and after 2002. The observed J̇₂ and the component of J̇₂ due to recent melting for the different periods indicate a long-term change in J̇₂ (attributed to the Earth's response to the last glacial cycle) of −(6.0-6.5) × 10⁻¹¹ yr⁻¹. This value is significantly different from those adopted to infer the viscosity structure of the mantle in most previous studies, and it is a main conclusion of this study. We next compare this estimate with the values of J̇₂ predicted by GIA ice models to infer the viscosity structure of the mantle. We obtain two permissible solutions for the lower-mantle viscosity (ηₗₘ), ~10²² and (5-10) × 10²² Pa s, for both adopted ice models. These two solutions are largely insensitive to the lithospheric thickness and upper-mantle viscosity, as indicated by previous studies, and relatively insensitive to the viscosity structure of the D″ layer. The equivalent sea-level (ESL) contributions from the Antarctic ice sheet since the Last Glacial Maximum (LGM) are about 20 m for ICE5G and 30 m for ANU, but glaciological reconstructions of the Antarctic LGM ice sheet have suggested that its ESL contribution may have been less than ~10 m. The GIA-induced J̇₂ for GIA

  6. SV-Bay: structural variant detection in cancer genomes using a Bayesian approach with correction for GC-content and read mappability

    PubMed Central

    Iakovishina, Daria; Janoueix-Lerosey, Isabelle; Barillot, Emmanuel; Regnier, Mireille; Boeva, Valentina

    2016-01-01

    Motivation: Whole genome sequencing of paired-end reads can be applied to characterize the landscape of large somatic rearrangements of cancer genomes. Several methods for detecting structural variants with whole genome sequencing data have been developed. So far, none of these methods has combined information about abnormally mapped read pairs connecting rearranged regions with the associated global copy number changes automatically inferred from the same sequencing data file. Our aim was to create a computational method that could use both types of information, i.e. normal and abnormal reads, and to demonstrate that doing so substantially improves both the sensitivity and the specificity of structural variant prediction. Results: We developed a computational method, SV-Bay, to detect structural variants from whole genome sequencing mate-pair or paired-end data using a probabilistic Bayesian approach. This approach takes into account depth of coverage by normal reads and abnormalities in read pair mappings. To estimate the model likelihood, SV-Bay considers GC-content and read mappability of the genome, thus making important corrections to the expected read count. For the detection of somatic variants, SV-Bay makes use of a matched normal sample when it is available. We validated SV-Bay on simulated datasets and an experimental mate-pair dataset for the CLB-GA neuroblastoma cell line. The comparison of SV-Bay with several other methods for structural variant detection demonstrated that SV-Bay has better prediction accuracy in terms of both sensitivity and false-positive detection rate. Availability and implementation: https://github.com/InstitutCurie/SV-Bay Contact: valentina.boeva@inserm.fr Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26740523
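    To illustrate why GC-content and mappability corrections matter for the expected read count, here is a minimal sketch of the read-depth side only, assuming a Poisson count model, a flat prior over copy numbers, and multiplicative GC and mappability factors. The function names and parameters (expected_count, gc_factor, mappability, base_rate) are hypothetical; this is not SV-Bay's actual model, which also integrates abnormally mapped read pairs.

```python
import math

def expected_count(base_rate, gc_factor, mappability, copy_number, ploidy=2):
    """Expected normal-read count in a window (hypothetical multiplicative model).

    base_rate:   mean reads per window for a diploid, GC-neutral, fully mappable window
    gc_factor:   multiplicative GC bias for this window (1.0 = no bias)
    mappability: fraction of the window where reads map uniquely (0..1)
    """
    return base_rate * gc_factor * mappability * copy_number / ploidy

def poisson_loglik(observed, expected):
    # log P(observed | Poisson(expected)); lgamma(k + 1) = log(k!)
    if expected <= 0:
        return 0.0 if observed == 0 else float("-inf")
    return observed * math.log(expected) - expected - math.lgamma(observed + 1)

def most_likely_copy_number(observed, base_rate, gc_factor, mappability, max_cn=6):
    # Flat prior over 0..max_cn, so the posterior mode equals the likelihood maximum
    scores = {
        cn: poisson_loglik(observed, expected_count(base_rate, gc_factor, mappability, cn))
        for cn in range(max_cn + 1)
    }
    return max(scores, key=scores.get)

# A GC-poor window (gc_factor 0.6) with 60 reads is called diploid once corrected,
# but would be called a single-copy loss if the GC bias were ignored.
print(most_likely_copy_number(60, 100, 0.6, 1.0))  # corrected
print(most_likely_copy_number(60, 100, 1.0, 1.0))  # uncorrected
```

    The usage lines show the effect the abstract describes: without the correction, a low observed count in a GC-poor window is misread as a copy-number change.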

  7. Mining 3D genome structure populations identifies major factors governing the stability of regulatory communities

    PubMed Central

    Dai, Chao; Li, Wenyuan; Tjong, Harianto; Hao, Shengli; Zhou, Yonggang; Li, Qingjiao; Chen, Lin; Zhu, Bing; Alber, Frank; Jasmine Zhou, Xianghong

    2016-01-01

    Three-dimensional (3D) genome structures vary from cell to cell even in an isogenic sample. Unlike protein structures, genome structures are highly plastic, posing a significant challenge for structure-function mapping. Here we report an approach to comprehensively identify 3D chromatin clusters that each occur frequently across a population of genome structures, either deconvoluted from ensemble-averaged Hi-C data or from a collection of single-cell Hi-C data. Applying our method to a population of genome structures (at the macrodomain resolution) of lymphoblastoid cells, we identify an atlas of stable inter-chromosomal chromatin clusters. A large number of these clusters are enriched in binding of specific regulatory factors and are therefore defined as 'Regulatory Communities'. We reveal two major factors, centromere clustering and transcription factor binding, which significantly stabilize such communities. Finally, we show that the regulatory communities differ substantially from cell to cell, indicating that expression variability could be impacted by genome structures. PMID:27240697
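    The idea of a cluster that "occurs frequently across a population of genome structures" can be made concrete with a simplified pairwise proxy: count, for each pair of domains, the fraction of structures in which the two lie within a distance threshold. This is a hypothetical sketch, assuming each structure is a list of (x, y, z) domain coordinates in a shared order; the authors' method mines multi-domain clusters, which this pairwise version does not attempt.

```python
import itertools
import math

def contact_frequencies(structures, threshold):
    """Fraction of structures in which each domain pair lies within `threshold`.

    structures: list of structures; each is a list of (x, y, z) domain coordinates,
    with the same domain order in every structure (assumption).
    """
    n_domains = len(structures[0])
    counts = {pair: 0 for pair in itertools.combinations(range(n_domains), 2)}
    for coords in structures:
        for i, j in counts:
            if math.dist(coords[i], coords[j]) <= threshold:
                counts[(i, j)] += 1
    return {pair: c / len(structures) for pair, c in counts.items()}

def stable_pairs(structures, threshold, min_freq):
    # Keep only pairs that co-localize in at least min_freq of the population
    freqs = contact_frequencies(structures, threshold)
    return sorted(pair for pair, f in freqs.items() if f >= min_freq)

# Three toy structures of three domains each (hypothetical coordinates)
population = [
    [(0, 0, 0), (1, 0, 0), (10, 0, 0)],
    [(0, 0, 0), (1, 0, 0), (0, 10, 0)],
    [(0, 0, 0), (5, 0, 0), (0, 1, 0)],
]
print(stable_pairs(population, threshold=2.0, min_freq=0.5))
```

    Domains 0 and 1 are close in two of three structures and pass a 50% frequency cut; the pair (0, 2) co-localizes in only one structure, illustrating the cell-to-cell variability the abstract emphasizes.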

  8. Hemipteran Mitochondrial Genomes: Features, Structures and Implications for Phylogeny

    PubMed Central

    Wang, Yuan; Chen, Jing; Jiang, Li-Yun; Qiao, Ge-Xia

    2015-01-01

    The study of Hemipteran mitochondrial genomes (mitogenomes) began with the Chagas disease vector, Triatoma dimidiata, in 2001. At present, 90 complete Hemipteran mitogenomes have been sequenced and annotated. This review examines the history of Hemipteran mitogenome research and summarizes their main features, including genome organization, nucleotide composition, protein-coding genes, tRNAs and rRNAs, and non-coding regions. Special attention is given to the comparative analysis of repeat regions. Gene rearrangements provide an additional data type for a few families, and most mitogenomes are arranged in the same gene order as the proposed ancestral insect. We also discuss and provide insights on phylogenetic analyses at a variety of taxonomic levels. This review is expected to further expand our understanding of research in this field and serve as a valuable reference resource. PMID:26039239

  9. Nanopatterned structures for biomolecular analysis toward genomic and proteomic applications

    NASA Astrophysics Data System (ADS)

    Chou, Chia-Fu; Gu, Jian; Wei, Qihuo; Liu, Yingjie; Gupta, Ravi; Nishio, Takeyoshi; Zenhausern, Frederic

    2005-01-01

    We report our fabrication of nanoscale devices using electron beam and nanoimprint lithography (NIL). We focus our study on the emerging fields of NIL, nanophotonics and nanobiotechnology and give a few examples of how these nanodevices may be applied toward genomic and proteomic applications for molecular analysis. The examples include reverse-NIL-fabricated nanofluidic channels for DNA stretching, nanoscale molecular traps constructed from dielectric constrictions for DNA or protein focusing by dielectrophoresis, multi-layer nanoburger and nanoburger multiplets for optimized surface-plasmon-enhanced Raman scattering for protein detection, and biomolecular motor-based nanosystems. The development of advanced nanopatterning techniques promises reliable and high-throughput manufacturing of nanodevices, which could significantly impact genomics, proteomics, drug discovery and molecular clinical diagnostics.

  10. Phylogenetic relationship and virulence inference of Streptococcus Anginosus Group: curated annotation and whole-genome comparative analysis support distinct species designation

    PubMed Central

    2013-01-01

    Background The Streptococcus Anginosus Group (SAG) represents three closely related species of the viridans group streptococci recognized as commensal bacteria of the oral, gastrointestinal and urogenital tracts. The SAG also cause severe invasive infections and are pathogens during cystic fibrosis (CF) pulmonary exacerbation. Little genomic information or description of virulence mechanisms is currently available for SAG. We conducted intra- and interspecies whole-genome comparative analyses with 59 publicly available Streptococcus genomes and seven in-house closed, high-quality finished SAG genomes: S. constellatus (3), S. intermedius (2), and S. anginosus (2). For each SAG species, we sequenced at least one numerically dominant strain from CF airways recovered during acute exacerbation and an invasive, non-lung isolate. We also evaluated microevolution that occurred within two isolates that were cultured from one individual one year apart. Results The SAG genomes were most closely related to S. gordonii and S. sanguinis, based on shared orthologs, and harbor a similar number of proteins within each COG category as other Streptococcus species. Numerous characterized streptococcus virulence factor homologs were identified within the SAG genomes, including adherence, invasion and spreading factors, LPxTG cell wall proteins, and two-component histidine kinases known to be involved in virulence gene regulation. Mobile elements, primarily integrative conjugative elements and bacteriophage, account for more than 10% of the SAG genomes. S. anginosus was the most variable species sequenced in this study, yielding both the smallest and the largest SAG genomes containing multiple genomic rearrangements, insertions and deletions. In contrast, within the S. constellatus and S. intermedius species, there was extensive continuous synteny, with only slight differences in genome size between strains. Within S. constellatus we were able to determine important SNPs and changes in

  11. Unsupervised pattern discovery in human chromatin structure through genomic segmentation.

    PubMed

    Hoffman, Michael M; Buske, Orion J; Wang, Jie; Weng, Zhiping; Bilmes, Jeff A; Noble, William Stafford

    2012-05-01

    We trained Segway, a dynamic Bayesian network method, simultaneously on chromatin data from multiple experiments, including positions of histone modifications, transcription-factor binding and open chromatin, all derived from a human chronic myeloid leukemia cell line. In an unsupervised fashion, we identified patterns associated with transcription start sites, gene ends, enhancers, transcriptional regulator CTCF-binding regions and repressed regions. Software and genome browser tracks are at http://noble.gs.washington.edu/proj/segway/. PMID:22426492
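    Segway is a dynamic Bayesian network; to show how unsupervised segmentation assigns one label per genomic bin from several chromatin tracks at once, here is a minimal sketch that swaps in a simpler model, a discrete hidden Markov model (a special case of a DBN) decoded with the Viterbi algorithm. It assumes each bin's tracks are binarized (mark present or absent) and independent given the state; all parameter values below are hypothetical, and Segway itself models real-valued signal and learns its parameters.

```python
import math

def viterbi_segment(obs, start, trans, emit):
    """Most likely state path for binary multi-track observations.

    obs:   list of tuples of 0/1 values, one tuple per genomic bin (one entry per track)
    start: start[s] = P(first state = s)
    trans: trans[s][t] = P(next state = t | current state = s)
    emit:  emit[s][k] = P(track k is 1 | state s); tracks independent given the state
    """
    n_states = len(start)

    def log_emit(s, x):
        return sum(math.log(emit[s][k] if v else 1.0 - emit[s][k]) for k, v in enumerate(x))

    # scores[s]: best log-probability of any path ending in state s at the current bin
    scores = [math.log(start[s]) + log_emit(s, obs[0]) for s in range(n_states)]
    back = []  # back[i][t]: best predecessor of state t at bin i + 1
    for x in obs[1:]:
        new_scores, pointers = [], []
        for t in range(n_states):
            best = max(range(n_states), key=lambda s: scores[s] + math.log(trans[s][t]))
            new_scores.append(scores[best] + math.log(trans[best][t]) + log_emit(t, x))
            pointers.append(best)
        scores = new_scores
        back.append(pointers)
    # Trace back from the best final state
    state = max(range(n_states), key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Two hypothetical tracks (an "active" mark, a "repressive" mark) and two states
emit = [[0.9, 0.1], [0.1, 0.9]]
trans = [[0.9, 0.1], [0.1, 0.9]]
bins = [(1, 0), (1, 0), (1, 0), (0, 1), (0, 1), (0, 1)]
print(viterbi_segment(bins, [0.5, 0.5], trans, emit))
```

    As in the unsupervised setting the abstract describes, the states carry no built-in meaning; a label such as "enhancer" or "repressed region" is assigned afterwards by inspecting each state's emission profile.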

  12. Physical and genetic structure of the maize genome reflects its complex evolutionary history.

    PubMed

    Wei, Fusheng; Coe, Ed; Nelson, William; Bharti, Arvind K; Engler, Fred; Butler, Ed; Kim, HyeRan; Goicoechea, Jose Luis; Chen, Mingsheng; Lee, Seunghee; Fuks, Galina; Sanchez-Villeda, Hector; Schroeder, Steven; Fang, Zhiwei; McMullen, Michael; Davis, Georgia; Bowers, John E; Paterson, Andrew H; Schaeffer, Mary; Gardiner, Jack; Cone, Karen; Messing, Joachim; Soderlund, Carol; Wing, Rod A

    2007-07-01

    Maize (Zea mays L.) is one of the most important cereal crops and a model for the study of genetics, evolution, and domestication. To better understand maize genome organization and to build a framework for genome sequencing, we constructed a sequence-ready fingerprinted contig-based physical map that covers 93.5% of the genome, of which 86.1% is aligned to the genetic map. The fingerprinted contig map contains 25,908 genic markers that enabled us to align nearly 73% of the anchored maize genome to the rice genome. The distribution pattern of expressed sequence tags correlates to that of recombination. In collinear regions, 1 kb in rice corresponds to an average of 3.2 kb in maize, yet maize has a 6-fold genome size expansion. This can be explained by the fact that most rice regions correspond to two regions in maize as a result of its recent polyploid origin. Inversions account for the majority of chromosome structural variations during subsequent maize diploidization. We also find clear evidence of ancient genome duplication predating the divergence of the progenitors of maize and rice. Reconstructing the paleoethnobotany of the maize genome indicates that the progenitors of modern maize contained ten chromosomes. PMID:17658954
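    The explanation for the 6-fold expansion can be checked with one line of arithmetic. This approximation assumes that every rice region corresponds to exactly two maize regions, whereas the abstract says "most" do:

```latex
\underbrace{3.2\ \text{kb}}_{\text{maize per 1 kb rice, per collinear copy}}
\times
\underbrace{2}_{\text{maize copies per rice region}}
\approx 6.4\ \text{kb maize per 1 kb rice}
\approx 6\text{-fold expansion}
```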

  13. Informational structure of two closely related eukaryotic genomes

    NASA Astrophysics Data System (ADS)

    Dehnert, Manuel; Helm, Werner E.; Hütt, Marc-Thorsten

    2006-08-01

    Attempts to identify a species on the basis of its DNA sequence on purely statistical grounds have been formulated for more than a decade. The most prominent of such genome signatures relies on neighborhood correlations (i.e., dinucleotide frequencies) and, consequently, attributes species identification to mechanisms operating on the dinucleotide level (e.g., neighbor-dependent mutations). For the examples of Mus musculus and Rattus norvegicus, we analyze short- and intermediate-range statistical correlations in DNA sequences. These correlation profiles are computed for all chromosomes of the two species. We find that as the range of correlations increases, the capacity to distinguish between the species on the basis of this correlation profile improves, and ever shorter sequence segments are required to obtain full species separation. This finding suggests that distinctive traits within the sequence are situated beyond the level of a few nucleotides. The large-scale statistical patterning of DNA sequences on which such genome signatures are based is thus substantially determined by mobile elements (e.g., transposons and retrotransposons). The study and interspecies comparison of such correlation profiles can, therefore, reveal features of retrotransposition, segmental duplications, and other processes of genome evolution.
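    One standard way to compute a correlation profile over increasing ranges is the mutual information function: for each distance k, measure how much knowing the nucleotide at position i tells you about the nucleotide at position i + k. The abstract does not state which correlation measure the authors used, so this is a sketch under that assumption; the function names are hypothetical. Distance k = 1 corresponds to the dinucleotide statistics mentioned above.

```python
import math
from collections import Counter

def mutual_information(seq, k):
    """Mutual information (bits) between symbols k positions apart in seq."""
    pairs = Counter(zip(seq, seq[k:]))
    total = sum(pairs.values())
    # Marginal counts for the left and right symbol of each pair
    left, right = Counter(), Counter()
    for (a, b), n in pairs.items():
        left[a] += n
        right[b] += n
    mi = 0.0
    for (a, b), n in pairs.items():
        p_ab = n / total
        p_a, p_b = left[a] / total, right[b] / total
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi

def correlation_profile(seq, max_k):
    # One MI value per distance k = 1..max_k
    return [mutual_information(seq, k) for k in range(1, max_k + 1)]

print(correlation_profile("ACGTTGCAAC" * 20, 5))
```

    A species comparison in this spirit would compute such profiles per chromosome or per segment and ask how well the profile vectors separate the two genomes as max_k grows.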

  14. Three-dimensional structure of a viral genome-delivery portal vertex

    PubMed Central

    Olia, Adam S.; Prevelige, Peter E.; Johnson, John E.; Cingolani, Gino

    2011-01-01

    DNA viruses such as bacteriophages and herpesviruses deliver their genome into and out of the capsid through large proteinaceous assemblies, known as portal proteins. Here we report two snapshots of the dodecameric portal protein of bacteriophage P22. The 3.25 Å resolution structure of the portal protein core bound to twelve copies of gp4 reveals a ~1.1 MDa assembly formed by 24 proteins. Unexpectedly, a lower resolution structure of the full length portal protein unveils the unique topology of the C-terminal domain, which forms a ~200 Å long, α-helical barrel. This domain inserts deeply into the virion and is highly conserved in the Podoviridae family. We propose that the barrel domain facilitates genome spooling onto the interior surface of the capsid during genome packaging and, in analogy to a rifle barrel, increases the accuracy of genome ejection into the host cell. PMID:21499245

  15. Three-dimensional Structure of a Viral Genome-delivery Portal Vertex

    SciTech Connect

    A Olia; P Prevelige Jr.; J Johnson; G Cingolani

    2011-12-31

    DNA viruses such as bacteriophages and herpesviruses deliver their genome into and out of the capsid through large proteinaceous assemblies, known as portal proteins. Here, we report two snapshots of the dodecameric portal protein of bacteriophage P22. The 3.25-Å-resolution structure of the portal-protein core bound to 12 copies of gene product 4 (gp4) reveals a ~1.1-MDa assembly formed by 24 proteins. Unexpectedly, a lower-resolution structure of the full-length portal protein unveils the unique topology of the C-terminal domain, which forms a ~200-Å-long α-helical barrel. This domain inserts deeply into the virion and is highly conserved in the Podoviridae family. We propose that the barrel domain facilitates genome spooling onto the interior surface of the capsid during genome packaging and, in analogy to a rifle barrel, increases the accuracy of genome ejection into the host cell.

  16. Detection of Genomic Structural Variants from Next-Generation Sequencing Data

    PubMed Central

    Tattini, Lorenzo; D’Aurizio, Romina; Magi, Alberto

    2015-01-01

    Structural variants are genomic rearrangements larger than 50 bp, accounting for around 1% of the variation among human genomes. They affect phenotypic diversity and play a role in various diseases, including neurological/neurocognitive disorders and cancer development and progression. Dissecting structural variants from next-generation sequencing data presents several challenges, and a number of approaches have been proposed in the literature. In this mini review, we describe and summarize the latest tools (and their underlying algorithms) designed for the analysis of whole-genome sequencing, whole-exome sequencing, custom captures, and amplicon sequencing data, pointing out their major advantages and drawbacks. We also report a summary of the most recent applications of third-generation sequencing platforms. This assessment provides guidance, with particular emphasis on human genetics and copy number variants, for researchers involved in the investigation of these genomic events. PMID:26161383
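    Many of the read-pair approaches surveyed in such reviews start from the same signatures: pairs mapping to different chromosomes, with unexpected orientation, or with an insert size far from the library mean. The sketch below classifies a single pair under those textbook rules. It is hypothetical and not the logic of any particular tool: it assumes a forward-reverse library, uses mapped read positions as a stand-in for insert size, and takes a mean ± n_sd × sd cutoff.

```python
from dataclasses import dataclass

@dataclass
class ReadPair:
    chrom1: str
    pos1: int      # leftmost mapped position of read 1
    strand1: str   # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str

def classify_pair(pair, mean_insert, sd_insert, n_sd=3):
    """Label a paired-end alignment by the SV signature it suggests (simplified).

    Assumes a forward-reverse library: a concordant pair has the left read on "+"
    and the right read on "-".
    """
    if pair.chrom1 != pair.chrom2:
        return "translocation"
    (lpos, lstrand), (rpos, rstrand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if lstrand == rstrand:
        return "inversion"           # both reads on the same strand
    if lstrand == "-" and rstrand == "+":
        return "tandem_duplication"  # outward-facing pair
    span = rpos - lpos
    if span > mean_insert + n_sd * sd_insert:
        return "deletion"            # reads map farther apart than expected
    if span < mean_insert - n_sd * sd_insert:
        return "insertion"           # reads map closer together than expected
    return "concordant"

print(classify_pair(ReadPair("chr1", 1000, "+", "chr1", 5000, "-"), 300, 30))
```

    Real callers cluster many such discordant pairs supporting the same breakpoint and often combine them with read-depth or split-read evidence rather than trusting one pair.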

  17. Crustal shear wave velocity structure of the western United States inferred from ambient seismic noise and earthquake data

    NASA Astrophysics Data System (ADS)

    Moschetti, M. P.; Ritzwoller, M. H.; Lin, F.-C.; Yang, Y.

    2010-10-01

    heterogeneity. We argue that these lower crustal structures are predominantly caused by interactions with the uppermost mantle, including the intrusion and underplating of mafic mantle materials and the thermal depression of wave speeds caused by conductive heating from the mantle. Upper and middle crustal wave speeds are generally correlated, and notable anomalies are inferred to result from terrane accretion at the continental margin and volcanic intrusions.

  18. Visualizing the global secondary structure of a viral RNA genome with cryo-electron microscopy

    PubMed Central

    Garmann, Rees F.; Gopal, Ajaykumar; Athavale, Shreyas S.; Knobler, Charles M.; Gelbart, William M.; Harvey, Stephen C.

    2015-01-01

    The lifecycle, and therefore the virulence, of single-stranded (ss)-RNA viruses is regulated not only by their particular protein gene products, but also by the secondary and tertiary structure of their genomes. The secondary structure of the entire genomic RNA of satellite tobacco mosaic virus (STMV) was recently determined by selective 2′-hydroxyl acylation analyzed by primer extension (SHAPE). The SHAPE analysis suggested a single highly extended secondary structure with much less branching than occurs in the ensemble of structures predicted by purely thermodynamic algorithms. Here we examine the solution-equilibrated STMV genome by direct visualization with cryo-electron microscopy (cryo-EM), using an RNA of similar length transcribed from the yeast genome as a control. The cryo-EM data reveal an ensemble of branching patterns that are collectively consistent with the SHAPE-derived secondary structure model. Thus, our results both elucidate the statistical nature of the secondary structure of large ss-RNAs and give visual support for modern RNA structure determination methods. Additionally, this work introduces cryo-EM as a means to distinguish between competing secondary structure models if the models differ significantly in terms of the number and/or length of branches. Furthermore, with the latest advances in cryo-EM technology, we suggest the possibility of developing methods that incorporate restraints from cryo-EM into the next generation of algorithms for the determination of RNA secondary and tertiary structures. PMID:25752599
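    Since the abstract proposes cryo-EM to discriminate between secondary structure models that differ in the number and length of branches, it helps to see how such branch statistics can be read off a model. The sketch below assumes models in dot-bracket notation and uses two hypothetical proxies: the number of hairpin loops (branch tips) and the maximum base-pair nesting depth (a rough measure of how extended the structure is). These metrics are illustrative, not the authors' analysis.

```python
import re

def count_hairpins(dot_bracket, min_loop=3):
    """Count hairpin loops: a '(' whose partner ')' encloses only unpaired '.' bases."""
    return len(re.findall(r"\(\.{%d,}\)" % min_loop, dot_bracket))

def max_nesting_depth(dot_bracket):
    # Deepest nesting of base pairs; an extended, little-branched model scores high
    depth = best = 0
    for ch in dot_bracket:
        if ch == "(":
            depth += 1
            best = max(best, depth)
        elif ch == ")":
            depth -= 1
    return best

extended = "((((((...))))))"
branched = "((((...))((...))))"
for name, model in [("extended", extended), ("branched", branched)]:
    print(name, count_hairpins(model), max_nesting_depth(model))
```

    For two models of the same RNA, a large difference in hairpin count is the kind of contrast the abstract suggests could be visible as a difference in branching in cryo-EM images.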

  19. The complete genome sequence and genome structure of passion fruit mosaic virus.

    PubMed

    Song, Yeon Sook; Ryu, Ki Hyun

    2011-06-01

    In this study, we determined the complete sequence of the genomic RNA of a Florida isolate of maracuja mosaic virus (MarMV-FL) and compared it with that of a Peru isolate of the virus (MarMV-P) and those of other known tobamoviruses. Complete sequence analysis revealed that the isolate should be considered a member of a new species, named passion fruit mosaic virus (PafMV). The genomic RNA of PafMV consists of 6,791 nucleotides and encodes four open reading frames (ORFs) coding for proteins of 125 kDa (1,101 aa), 184 kDa (1,612 aa), 34 kDa (311 aa) and 18 kDa (164 aa), in consecutive order from the 5' to the 3' end. The sequence homologies of the four ORFs of PafMV to those of MarMV-P ranged from 78.8% to 81.6% at the amino acid level. The sequence homologies of the four ORFs of PafMV to those of other tobamoviruses ranged from 36.0% to 77.9% at the nucleotide level and from 21.7% to 81.6% at the amino acid level. Phylogenetic analysis revealed that these PafMV-encoded proteins are closely related to those of MarMV-P. In conclusion, the results indicate that PafMV and MarMV-P belong to different species within the genus Tobamovirus. PMID:21547441
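    Annotating a genome like this starts with locating its ORFs. Below is a minimal sketch of a stop-to-stop ORF scan on the genomic-sense strand, assuming AUG starts, the standard stop codons and a hypothetical min_aa length cutoff; it is not the authors' annotation pipeline. One caveat: in tobamoviruses the larger replicase protein is generally produced by read-through of the first ORF's stop codon, and a strict scan like this one would not report such a read-through product.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def find_orfs(rna, min_aa=100):
    """Return sorted (start, end, n_aa) tuples for AUG-initiated ORFs on the given strand.

    start: 0-based index of the A in AUG; end: index just past the stop codon;
    n_aa: number of amino acids encoded (stop codon excluded).
    """
    rna = rna.upper().replace("T", "U")
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(rna):
            if rna[i:i + 3] == "AUG":
                for j in range(i, len(rna) - 2, 3):
                    if rna[j:j + 3] in STOP_CODONS:
                        n_aa = (j - i) // 3
                        if n_aa >= min_aa:
                            orfs.append((i, j + 3, n_aa))
                        i = j  # resume scanning after this stop codon
                        break
                else:
                    break  # no in-frame stop downstream: stop scanning this frame
            i += 3
    return sorted(orfs)

# Toy sequence: AUG + five GCU codons + UAA stop in frame 2
print(find_orfs("GGAUGGCUGCUGCUGCUGCUUAACC", min_aa=1))
```

    With this convention, an ORF reported with n_aa = 1,101 would correspond to the 125 kDa protein listed in the abstract.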