Gene context analysis in the Integrated Microbial Genomes (IMG) data management system.
Mavromatis, Konstantinos; Chu, Ken; Ivanova, Natalia; Hooper, Sean D; Markowitz, Victor M; Kyrpides, Nikos C
2009-11-24
Computational methods for determining the function of genes in newly sequenced genomes have traditionally been based on sequence similarity to genes whose function has been identified experimentally. Function prediction methods can be extended using gene context analysis approaches such as examining the conservation of chromosomal gene clusters, gene fusion events and co-occurrence profiles across genomes. Context analysis is based on the observation that functionally related genes often have similar gene context, and it relies on the identification of such events across a phylogenetically diverse collection of genomes. We have used the data management system of the Integrated Microbial Genomes (IMG) as the framework to implement and explore the power of gene context analysis methods because it provides one of the largest available genome integrations. Visualization and search tools to facilitate gene context analysis have been developed and applied across all publicly available archaeal and bacterial genomes in IMG. These computations are now maintained as part of IMG's regular genome content update cycle. IMG is available at: http://img.jgi.doe.gov.
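The co-occurrence profile idea named in the abstract above can be illustrated with a minimal Python sketch. This is not IMG's implementation: the gene family names, genome identifiers, the Jaccard score and the 0.7 threshold are all assumptions chosen for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of genome identifiers."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical presence data: gene family -> genomes in which it occurs.
profiles = {
    "famA": {"g1", "g2", "g3", "g5"},
    "famB": {"g1", "g2", "g3"},
    "famC": {"g4", "g6"},
}

def co_occurring(profiles, threshold=0.7):
    """Return (family, family, score) for pairs with similar occurrence profiles."""
    names = sorted(profiles)
    pairs = []
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            score = jaccard(profiles[x], profiles[y])
            if score >= threshold:
                pairs.append((x, y, score))
    return pairs

print(co_occurring(profiles))
```

Families that tend to be present or absent in the same genomes score close to 1; a real analysis would also need to correct for phylogenetic relatedness among the genomes, which this sketch ignores.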
IMG: the integrated microbial genomes database and comparative analysis system
Markowitz, Victor M.; Chen, I-Min A.; Palaniappan, Krishna; Chu, Ken; Szeto, Ernest; Grechkin, Yuri; Ratner, Anna; Jacob, Biju; Huang, Jinghua; Williams, Peter; Huntemann, Marcel; Anderson, Iain; Mavromatis, Konstantinos; Ivanova, Natalia N.; Kyrpides, Nikos C.
2012-01-01
The Integrated Microbial Genomes (IMG) system serves as a community resource for comparative analysis of publicly available genomes in a comprehensive integrated context. IMG integrates publicly available draft and complete genomes from all three domains of life with a large number of plasmids and viruses. IMG provides tools and viewers for analyzing and reviewing the annotations of genes and genomes in a comparative context. IMG's data content and analytical capabilities have been continuously extended through regular updates since its first release in March 2005. IMG is available at http://img.jgi.doe.gov. Companion IMG systems provide support for expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er), teaching courses and training in microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu) and analysis of genomes related to the Human Microbiome Project (IMG/HMP: http://www.hmpdacc-resources.org/img_hmp). PMID:22194640
Ensembl Genomes: an integrative resource for genome-scale data from non-vertebrate species.
Kersey, Paul J; Staines, Daniel M; Lawson, Daniel; Kulesha, Eugene; Derwent, Paul; Humphrey, Jay C; Hughes, Daniel S T; Keenan, Stephan; Kerhornou, Arnaud; Koscielny, Gautier; Langridge, Nicholas; McDowall, Mark D; Megy, Karine; Maheswari, Uma; Nuhn, Michael; Paulini, Michael; Pedro, Helder; Toneva, Iliana; Wilson, Derek; Yates, Andrew; Birney, Ewan
2012-01-01
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrative resource for genome-scale data from non-vertebrate species. The project exploits and extends technology (for genome annotation, analysis and dissemination) developed in the context of the (vertebrate-focused) Ensembl project and provides a complementary set of resources for non-vertebrate species through a consistent set of programmatic and interactive interfaces. These provide access to data including reference sequence, gene models, transcriptional data, polymorphisms and comparative analysis. Since its launch in 2009, Ensembl Genomes has undergone rapid expansion, with the goal of providing coverage of all major experimental organisms, and additionally including taxonomic reference points to provide the evolutionary context in which genes can be understood. Against the backdrop of a continuing increase in genome sequencing activities in all parts of the tree of life, we seek to work, wherever possible, with the communities actively generating and using data, and are participants in a growing range of collaborations involved in the annotation and analysis of genomes.
Ensembl Genomes 2013: scaling up access to genome-wide data
USDA-ARS's Scientific Manuscript database
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species. The project exploits and extends technologies for genome annotation, analysis and dissemination, developed in the context of the vertebrate-focused Ensembl project, and provi...
Comparative genome analysis in the integrated microbial genomes (IMG) system.
Markowitz, Victor M; Kyrpides, Nikos C
2007-01-01
Comparative genome analysis is critical for the effective exploration of a rapidly growing number of complete and draft sequences for microbial genomes. The Integrated Microbial Genomes (IMG) system (img.jgi.doe.gov) has been developed as a community resource that provides support for comparative analysis of microbial genomes in an integrated context. IMG allows users to navigate the multidimensional microbial genome data space and focus their analysis on a subset of genes, genomes, and functions of interest. IMG provides graphical viewers, summaries, and occurrence profile tools for comparing genes, pathways, and functions (terms) across specific genomes. Genes can be further examined using gene neighborhoods and compared with sequence alignment tools.
Vallenet, David; Belda, Eugeni; Calteau, Alexandra; Cruveiller, Stéphane; Engelen, Stefan; Lajus, Aurélie; Le Fèvre, François; Longin, Cyrille; Mornico, Damien; Roche, David; Rouy, Zoé; Salvignol, Gregory; Scarpelli, Claude; Thil Smith, Adam Alexander; Weiman, Marion; Médigue, Claudine
2013-01-01
MicroScope is an integrated platform dedicated to both the methodical updating of microbial genome annotation and to comparative analysis. The resource provides data from completed and ongoing genome projects (automatic and expert annotations), together with data sources from post-genomic experiments (i.e. transcriptomics, mutant collections) allowing users to perfect and improve the understanding of gene functions. MicroScope (http://www.genoscope.cns.fr/agc/microscope) combines tools and graphical interfaces to analyse genomes and to perform the manual curation of gene annotations in a comparative context. Since its first publication in January 2006, the system (previously named MaGe for Magnifying Genomes) has been continuously extended both in terms of data content and analysis tools. The last update of MicroScope was published in 2009 in the Database journal. Today, the resource contains data for >1600 microbial genomes, of which ∼300 are manually curated and maintained by biologists (1200 personal accounts today). Expert annotations are continuously gathered in the MicroScope database (∼50 000 a year), contributing to the improvement of the quality of microbial genomes annotations. Improved data browsing and searching tools have been added, original tools useful in the context of expert annotation have been developed and integrated and the website has been significantly redesigned to be more user-friendly. Furthermore, in the context of the European project Microme (Framework Program 7 Collaborative Project), MicroScope is becoming a resource providing for the curation and analysis of both genomic and metabolic data. An increasing number of projects are related to the study of environmental bacterial (meta)genomes that are able to metabolize a large variety of chemical compounds that may be of high industrial interest. PMID:23193269
Prabha, Ratna; Singh, Dhananjaya P; Sinha, Swati; Ahmad, Khurshid; Rai, Anil
2017-04-01
With the increasing accumulation of genomic sequence information of prokaryotes, the study of codon usage bias has gained renewed attention. The purpose of this study was to examine codon selection patterns within and across cyanobacterial species belonging to diverse taxonomic orders and habitats. We performed a detailed comparative analysis of cyanobacterial genomes with respect to codon bias. Our analysis shows that in cyanobacterial genomes, A- and/or T-ending codons were used predominantly in the genes, whereas G- and/or C-ending codons were largely avoided. Variation in the codon context usage of cyanobacterial genes corresponded to the clustering of cyanobacteria by GC content. Analysis of the codon adaptation index (CAI) and synonymous codon usage order (SCUO) revealed that the majority of genes are associated with low codon bias. Codon selection patterns in cyanobacterial genomes reflected compositional constraints as the major influencing factor. Although mutational constraint may play some role in codon usage bias in cyanobacteria, compositional constraint in terms of genomic GC composition, coupled with environmental factors, affected the codon selection pattern in cyanobacterial genomes. Copyright © 2016 Elsevier B.V. All rights reserved.
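The preference for A/T-ending over G/C-ending codons described in the abstract above can be checked on a single coding sequence with a few lines of Python. This is only a sketch of a third-position (GC3) count, which is a simpler measure than the CAI and SCUO statistics used in the study; the function names and the example sequence are illustrative assumptions.

```python
from collections import Counter

def third_base_usage(cds):
    """Count the third-position base of every complete codon in a coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return Counter(cds[i + 2] for i in range(0, usable, 3))

def gc3_fraction(cds):
    """Fraction of codons ending in G or C (ambiguous bases are ignored)."""
    counts = third_base_usage(cds)
    total = sum(counts[base] for base in "ACGT")
    return (counts["G"] + counts["C"]) / total if total else 0.0

# Made-up sequence with codons ATG, AAA, TTT, GGC.
print(gc3_fraction("ATGAAATTTGGC"))
```

A value well below 0.5 across the genes of a genome would be consistent with the A/T-ending preference the authors report.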
Genome-wide screening and identification of antigens for rickettsial vaccine development
USDA-ARS's Scientific Manuscript database
The capacity to identify immunogens for vaccine development by genome-wide screening has been markedly enhanced by the availability of complete microbial genome sequences coupled to rapid proteomic and bioinformatic analysis. Critical to this genome-wide screening is in vivo testing in the context o...
[Genome-scale sequence data processing and epigenetic analysis of DNA methylation].
Wang, Ting-Zhang; Shan, Gao; Xu, Jian-Hong; Xue, Qing-Zhong
2013-06-01
A recently developed approach for detecting cytosine DNA methylation (mC) and profiling genome-scale DNA methylation is BS-Seq, which is based on bisulfite conversion of genomic DNA combined with next-generation sequencing. The method can not only provide insight into the differences in genome-scale DNA methylation among organisms, but also reveal the conservation of DNA methylation in all contexts and the nucleotide preference for different genomic regions, including genes, exons, and repetitive DNA sequences. It will help to understand the epigenetic impact of cytosine DNA methylation on the regulation of gene expression and on maintaining the silence of repetitive sequences, such as transposable elements. In this paper, we introduce the preprocessing steps of DNA methylation data, by which cytosine (C) and guanine (G) in the reference sequence are converted to thymine (T) and adenine (A), respectively, and cytosine in reads is converted to thymine. We also comprehensively review the main content of DNA methylation analysis on the genomic scale: (1) cytosine methylation in different sequence contexts; (2) the distribution of genomic methylcytosine; (3) DNA methylation context and nucleotide preference; (4) DNA-protein interaction sites of DNA methylation; (5) the degree of cytosine methylation in the different structural elements of genes. DNA methylation analysis provides a powerful tool for epigenome studies in human and other species and for studies of gene-environment interaction, and lays the theoretical basis for further development of disease diagnostics and therapeutics in human.
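The preprocessing step described in the abstract above (C to T and G to A conversion of the reference, C to T conversion of reads) can be sketched in Python as follows. The function names and the dictionary keys are assumptions made for this illustration; real BS-Seq aligners also handle strands, quality values and indexing, none of which is shown here.

```python
def c_to_t(seq):
    """Replace every cytosine with thymine."""
    return seq.upper().replace("C", "T")

def g_to_a(seq):
    """Replace every guanine with adenine."""
    return seq.upper().replace("G", "A")

def convert_reference(ref):
    """Return the two fully converted copies of a reference sequence."""
    return {"C2T": c_to_t(ref), "G2A": g_to_a(ref)}

def convert_read(read):
    """Convert a read in silico so it can be matched against the C2T reference."""
    return c_to_t(read)

print(convert_reference("ACGTCG"))
print(convert_read("ccgta"))
```

After alignment against the converted references, the original unconverted read is compared with the original reference: a C retained in the read at a reference C indicates methylation, while a T indicates an unmethylated cytosine.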
Phanerochaete chrysosporium genomics
Luis F. Larrondo; Rafael Vicuna; Dan Cullen
2005-01-01
A high quality draft genome sequence has been generated for the lignocellulose-degrading basidiomycete Phanerochaete chrysosporium (Martinez et al. 2004). Analysis of the genome in the context of previously established genetics and physiology is presented. Transposable elements and their potential relationship to genes involved in lignin degradation are systematically...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Peterson, Elena S.; McCue, Lee Ann; Rutledge, Alexandra C.
2012-04-25
Visual Exploration and Statistics to Promote Annotation (VESPA) is an interactive visual analysis software tool that facilitates the discovery of structural mis-annotations in prokaryotic genomes. VESPA integrates high-throughput peptide-centric proteomics data and oligo-centric or RNA-Seq transcriptomics data into a genomic context. The data may be interrogated via visual analysis across multiple levels of genomic resolution, linked searches, exports and interaction with BLAST to rapidly identify locations of interest within the genome and evaluate potential mis-annotations.
Uddin, Mohammed; Woodbury-Smith, Marc; Chan, Ada J S; Albanna, Ammar; Minassian, Berge; Boelman, Cyrus; Scherer, Stephen W
2018-03-28
Mutations within STXBP1 have been associated with a range of neurodevelopmental disorders, implicating the pleiotropic impact of this gene. Although the frequency of de novo mutations within STXBP1 for selective cohorts with early onset epileptic encephalopathy is more than 1%, there is no evidence for a hotspot within the gene. In this study, we analyzed the genomic context of de novo STXBP1 mutations to examine whether certain motifs indicated a greater risk of mutation. Through a comprehensive context analysis of 136 de novo/rare mutation (SNV/Indels) sites in this gene, strikingly 26.92% of all SNV mutations occurred within 5bp upstream or downstream of a 'GTA' motif (P < 0.0005). This implies a genomic context modulated mutagenesis. Moreover, 51.85% (14 out of 27) of the 'GTA' mutations are splicing compared to 14.70% (20 out of 136) of all reported mutations within STXBP1. We also noted that 11 of these 14 'GTA' associated mutations are de novo in origin. Our analysis provides strong evidence of DNA motif modulated mutagenesis for STXBP1 de novo splicing mutations. Copyright © 2018 Uddin et al.
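The kind of motif-proximity count reported in the abstract above (sites within 5bp upstream or downstream of a 'GTA' motif) can be sketched in Python. This is a hedged illustration rather than the authors' pipeline: it scans the forward strand only, treats positions inside the motif as "near", uses 0-based coordinates, and runs on a made-up sequence.

```python
def motif_starts(seq, motif="GTA"):
    """Return 0-based start positions of every (possibly overlapping) motif match."""
    seq = seq.upper()
    starts = []
    pos = seq.find(motif)
    while pos != -1:
        starts.append(pos)
        pos = seq.find(motif, pos + 1)
    return starts

def near_motif(seq, site, motif="GTA", flank=5):
    """True if the 0-based site lies inside a motif match or within `flank` bp of it."""
    for start in motif_starts(seq, motif):
        end = start + len(motif) - 1
        if start - flank <= site <= end + flank:
            return True
    return False

def fraction_near_motif(seq, sites, motif="GTA", flank=5):
    """Fraction of mutation sites that fall near a motif match."""
    if not sites:
        return 0.0
    hits = sum(near_motif(seq, s, motif, flank) for s in sites)
    return hits / len(sites)

# Made-up sequence: one GTA at positions 10-12, flanked by A's.
example = "A" * 10 + "GTA" + "A" * 10
print(fraction_near_motif(example, [0, 5, 12, 20]))
```

Judging whether an observed fraction is unusual would additionally require a null expectation, for example from randomly placed sites, which this sketch does not compute.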
tRNAscan-SE On-line: integrating search and context for analysis of transfer RNA genes.
Lowe, Todd M; Chan, Patricia P
2016-07-08
High-throughput genome sequencing continues to grow the need for rapid, accurate genome annotation and tRNA genes constitute the largest family of essential, ever-present non-coding RNA genes. Newly developed tRNAscan-SE 2.0 has advanced the state-of-the-art methodology in tRNA gene detection and functional prediction, captured by rich new content of the companion Genomic tRNA Database. Previously, web-server tRNA detection was isolated from knowledge of existing tRNAs and their annotation. In this update of the tRNAscan-SE On-line resource, we tie together improvements in tRNA classification with greatly enhanced biological context via dynamically generated links between web server search results, the most relevant genes in the GtRNAdb and interactive, rich genome context provided by UCSC genome browsers. The tRNAscan-SE On-line web server can be accessed at http://trna.ucsc.edu/tRNAscan-SE/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Global analysis of bacterial transcription factors to predict cellular target processes.
Doerks, Tobias; Andrade, Miguel A; Lathe, Warren; von Mering, Christian; Bork, Peer
2004-03-01
Whole-genome sequences are now available for >100 bacterial species, giving unprecedented power to comparative genomics approaches. We have applied genome-context methods to predict target processes that are regulated by transcription factors (TFs). Of 128 orthologous groups of proteins annotated as TFs, to date, 36 are functionally uncharacterized; in our analysis we predict a probable cellular target process or biochemical pathway for half of these functionally uncharacterized TFs.
MycoCosm, an Integrated Fungal Genomics Resource
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shabalov, Igor; Grigoriev, Igor
2012-03-16
MycoCosm is a web-based interactive fungal genomics resource, which was first released in March 2010, in response to an urgent call from the fungal community for integration of all fungal genomes and analytical tools in one place (Pan-fungal data resources meeting, Feb 21-22, 2010, Alexandria, VA). MycoCosm integrates genomics data and analysis tools to navigate through over 100 fungal genomes sequenced at JGI and elsewhere. This resource allows users to explore fungal genomes in the context of both genome-centric analysis and comparative genomics, and promotes user community participation in data submission, annotation and analysis. MycoCosm has over 4500 unique visitors/month or 35000+ visitors/year as well as hundreds of registered users contributing their data and expertise to this resource. Its scalable architecture allows significant expansion of the data expected from JGI Fungal Genomics Program, its users, and integration with external resources used by fungal community.
IMG 4 version of the integrated microbial genomes comparative analysis system
Markowitz, Victor M.; Chen, I-Min A.; Palaniappan, Krishna; Chu, Ken; Szeto, Ernest; Pillay, Manoj; Ratner, Anna; Huang, Jinghua; Woyke, Tanja; Huntemann, Marcel; Anderson, Iain; Billis, Konstantinos; Varghese, Neha; Mavromatis, Konstantinos; Pati, Amrita; Ivanova, Natalia N.; Kyrpides, Nikos C.
2014-01-01
The Integrated Microbial Genomes (IMG) data warehouse integrates genomes from all three domains of life, as well as plasmids, viruses and genome fragments. IMG provides tools for analyzing and reviewing the structural and functional annotations of genomes in a comparative context. IMG’s data content and analytical capabilities have increased continuously since its first version released in 2005. Since the last report published in the 2012 NAR Database Issue, IMG’s annotation and data integration pipelines have evolved while new tools have been added for recording and analyzing single cell genomes, RNA Seq and biosynthetic cluster data. Different IMG datamarts provide support for the analysis of publicly available genomes (IMG/W: http://img.jgi.doe.gov/w), expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er) and teaching and training in the area of microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu). PMID:24165883
IMG 4 version of the integrated microbial genomes comparative analysis system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Markowitz, Victor M.; Chen, I-Min A.; Palaniappan, Krishna
The Integrated Microbial Genomes (IMG) data warehouse integrates genomes from all three domains of life, as well as plasmids, viruses and genome fragments. IMG provides tools for analyzing and reviewing the structural and functional annotations of genomes in a comparative context. IMG’s data content and analytical capabilities have increased continuously since its first version released in 2005. Since the last report published in the 2012 NAR Database Issue, IMG’s annotation and data integration pipelines have evolved while new tools have been added for recording and analyzing single cell genomes, RNA Seq and biosynthetic cluster data. Finally, different IMG datamarts provide support for the analysis of publicly available genomes (IMG/W: http://img.jgi.doe.gov/w), expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er) and teaching and training in the area of microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu).
TEA: the epigenome platform for Arabidopsis methylome study.
Su, Sheng-Yao; Chen, Shu-Hwa; Lu, I-Hsuan; Chiang, Yih-Shien; Wang, Yu-Bin; Chen, Pao-Yang; Lin, Chung-Yen
2016-12-22
Bisulfite sequencing (BS-seq) has become a standard technology to profile genome-wide DNA methylation at single-base resolution. It allows researchers to conduct genome-wide cytosine methylation analyses on questions of genomic imprinting, transcriptional regulation, cellular development and differentiation. A single dataset from a BS-Seq experiment is resolved into many features according to the sequence contexts, making methylome data analysis and data visualization a complex task. We developed a streamlined platform, TEA, for analyzing and visualizing data from whole-genome BS-Seq (WGBS) experiments conducted in the model plant Arabidopsis thaliana. To capture the essence of the genome methylation level and to be efficient enough to run online, we introduce a straightforward method for measuring genome methylation in each sequence context by gene. The method is scripted in Java to process BS-Seq mapping results. Through a simple data uploading process, the TEA server deploys a web-based platform for deep analysis by linking data to an updated Arabidopsis annotation database and toolkits. TEA is an intuitive and efficient online platform for analyzing the Arabidopsis genomic DNA methylation landscape. It provides several ways to help users exploit WGBS data. TEA is freely accessible for academic users at: http://tea.iis.sinica.edu.tw .
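Measuring methylation "in each sequence context" first requires classifying every cytosine as CG, CHG or CHH (H being A, C or T). The Python sketch below shows that classification plus a simple read-weighted methylation level per context. It is not TEA's Java code: the weighted-level formula, the input layout and all names are assumptions for illustration, and only the forward strand is considered.

```python
def cytosine_context(seq, i):
    """Classify the forward-strand cytosine at 0-based index i as CG, CHG or CHH.

    Returns None if seq[i] is not C or the context cannot be determined.
    """
    seq = seq.upper()
    if i < 0 or i >= len(seq) or seq[i] != "C":
        return None
    nxt = seq[i + 1:i + 3]
    if len(nxt) >= 1 and nxt[0] == "G":
        return "CG"
    if len(nxt) < 2 or "N" in nxt:
        return None
    return "CHG" if nxt[1] == "G" else "CHH"

def context_levels(seq, calls):
    """Compute methylated/total read ratios per context.

    calls maps a 0-based position to (methylated_reads, total_reads).
    """
    sums = {}
    for pos, (meth, total) in calls.items():
        ctx = cytosine_context(seq, pos)
        if ctx is None or total == 0:
            continue
        m, t = sums.get(ctx, (0, 0))
        sums[ctx] = (m + meth, t + total)
    return {ctx: m / t for ctx, (m, t) in sums.items()}

# Made-up gene fragment and read counts.
gene = "ACGTCAGCTT"
print(context_levels(gene, {1: (8, 10), 4: (1, 4), 7: (0, 5)}))
```

Running the same function over the sequence of each annotated gene would give one level per context per gene, which is the shape of summary the abstract describes.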
Yoshida, Catherine E; Kruczkiewicz, Peter; Laing, Chad R; Lingohr, Erika J; Gannon, Victor P J; Nash, John H E; Taboada, Eduardo N
2016-01-01
For nearly 100 years serotyping has been the gold standard for the identification of Salmonella serovars. Despite the increasing adoption of DNA-based subtyping approaches, serotype information remains a cornerstone in food safety and public health activities aimed at reducing the burden of salmonellosis. At the same time, recent advances in whole-genome sequencing (WGS) promise to revolutionize our ability to perform advanced pathogen characterization in support of improved source attribution and outbreak analysis. We present the Salmonella In Silico Typing Resource (SISTR), a bioinformatics platform for rapidly performing simultaneous in silico analyses for several leading subtyping methods on draft Salmonella genome assemblies. In addition to performing serovar prediction by genoserotyping, this resource integrates sequence-based typing analyses for: Multi-Locus Sequence Typing (MLST), ribosomal MLST (rMLST), and core genome MLST (cgMLST). We show how phylogenetic context from cgMLST analysis can supplement the genoserotyping analysis and increase the accuracy of in silico serovar prediction to over 94.6% on a dataset comprised of 4,188 finished genomes and WGS draft assemblies. In addition to allowing analysis of user-uploaded whole-genome assemblies, the SISTR platform incorporates a database comprising over 4,000 publicly available genomes, allowing users to place their isolates in a broader phylogenetic and epidemiological context. The resource incorporates several metadata driven visualizations to examine the phylogenetic, geospatial and temporal distribution of genome-sequenced isolates. As sequencing of Salmonella isolates at public health laboratories around the world becomes increasingly common, rapid in silico analysis of minimally processed draft genome assemblies provides a powerful approach for molecular epidemiology in support of public health investigations. 
Moreover, this type of integrated analysis using multiple sequence-based methods of sub-typing allows for continuity with historical serotyping data as we transition towards the increasing adoption of genomic analyses in epidemiology. The SISTR platform is freely available on the web at https://lfz.corefacility.ca/sistr-app/.
APPLaUD: access for patients and participants to individual level uninterpreted genomic data.
Thorogood, Adrian; Bobe, Jason; Prainsack, Barbara; Middleton, Anna; Scott, Erick; Nelson, Sarah; Corpas, Manuel; Bonhomme, Natasha; Rodriguez, Laura Lyman; Murtagh, Madeleine; Kleiderman, Erika
2018-02-17
There is a growing support for the stance that patients and research participants should have better and easier access to their raw (uninterpreted) genomic sequence data in both clinical and research contexts. We review legal frameworks and literature on the benefits, risks, and practical barriers of providing individuals access to their data. We also survey genomic sequencing initiatives that provide or plan to provide individual access. Many patients and research participants expect to be able to access their health and genomic data. Individuals have a legal right to access their genomic data in some countries and contexts. Moreover, increasing numbers of participatory research projects, direct-to-consumer genetic testing companies, and now major national sequencing initiatives grant individuals access to their genomic sequence data upon request. Drawing on current practice and regulatory analysis, we outline legal, ethical, and practical guidance for genomic sequencing initiatives seeking to offer interested patients and participants access to their raw genomic data.
Ensembl Genomes 2016: more genomes, more complexity.
Kersey, Paul Julian; Allen, James E; Armean, Irina; Boddu, Sanjay; Bolt, Bruce J; Carvalho-Silva, Denise; Christensen, Mikkel; Davis, Paul; Falin, Lee J; Grabmueller, Christoph; Humphrey, Jay; Kerhornou, Arnaud; Khobova, Julia; Aranganathan, Naveen K; Langridge, Nicholas; Lowy, Ernesto; McDowall, Mark D; Maheswari, Uma; Nuhn, Michael; Ong, Chuang Kee; Overduin, Bert; Paulini, Michael; Pedro, Helder; Perry, Emily; Spudich, Giulietta; Tapanari, Electra; Walts, Brandon; Williams, Gareth; Tello-Ruiz, Marcela; Stein, Joshua; Wei, Sharon; Ware, Doreen; Bolser, Daniel M; Howe, Kevin L; Kulesha, Eugene; Lawson, Daniel; Maslen, Gareth; Staines, Daniel M
2016-01-04
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including reference sequence, gene models, transcriptional data, genetic variation and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments. These include the development of new analyses and views to represent polyploid genomes (of which bread wheat is the primary exemplar); and the continued up-scaling of the resource, which now includes over 23 000 bacterial genomes, 400 fungal genomes and 100 protist genomes, in addition to 55 genomes from invertebrate metazoa and 39 genomes from plants. This dramatic increase in the number of included genomes is one part of a broader effort to automate the integration of archival data (genome sequence, but also associated RNA sequence data and variant calls) within the context of reference genomes and make it available through the Ensembl user interfaces. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Ensembl Genomes 2016: more genomes, more complexity
Kersey, Paul Julian; Allen, James E.; Armean, Irina; Boddu, Sanjay; Bolt, Bruce J.; Carvalho-Silva, Denise; Christensen, Mikkel; Davis, Paul; Falin, Lee J.; Grabmueller, Christoph; Humphrey, Jay; Kerhornou, Arnaud; Khobova, Julia; Aranganathan, Naveen K.; Langridge, Nicholas; Lowy, Ernesto; McDowall, Mark D.; Maheswari, Uma; Nuhn, Michael; Ong, Chuang Kee; Overduin, Bert; Paulini, Michael; Pedro, Helder; Perry, Emily; Spudich, Giulietta; Tapanari, Electra; Walts, Brandon; Williams, Gareth; Tello–Ruiz, Marcela; Stein, Joshua; Wei, Sharon; Ware, Doreen; Bolser, Daniel M.; Howe, Kevin L.; Kulesha, Eugene; Lawson, Daniel; Maslen, Gareth; Staines, Daniel M.
2016-01-01
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including reference sequence, gene models, transcriptional data, genetic variation and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments. These include the development of new analyses and views to represent polyploid genomes (of which bread wheat is the primary exemplar); and the continued up-scaling of the resource, which now includes over 23 000 bacterial genomes, 400 fungal genomes and 100 protist genomes, in addition to 55 genomes from invertebrate metazoa and 39 genomes from plants. This dramatic increase in the number of included genomes is one part of a broader effort to automate the integration of archival data (genome sequence, but also associated RNA sequence data and variant calls) within the context of reference genomes and make it available through the Ensembl user interfaces. PMID:26578574
Improving Microbial Genome Annotations in an Integrated Database Context
Chen, I-Min A.; Markowitz, Victor M.; Chu, Ken; Anderson, Iain; Mavromatis, Konstantinos; Kyrpides, Nikos C.; Ivanova, Natalia N.
2013-01-01
Effective comparative analysis of microbial genomes requires a consistent and complete view of biological data. Consistency regards the biological coherence of annotations, while completeness regards the extent and coverage of functional characterization for genomes. We have developed tools that allow scientists to assess and improve the consistency and completeness of microbial genome annotations in the context of the Integrated Microbial Genomes (IMG) family of systems. All publicly available microbial genomes are characterized in IMG using different functional annotation and pathway resources, thus providing a comprehensive framework for identifying and resolving annotation discrepancies. A rule based system for predicting phenotypes in IMG provides a powerful mechanism for validating functional annotations, whereby the phenotypic traits of an organism are inferred based on the presence of certain metabolic reactions and pathways and compared to experimentally observed phenotypes. The IMG family of systems are available at http://img.jgi.doe.gov/. PMID:23424620
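The rule-based validation idea in the abstract above (infer phenotypes from the presence of pathways, then compare with observed phenotypes) can be sketched in Python. The rules, trait names and pathway names below are entirely hypothetical and do not come from IMG; the sketch only shows the shape of the comparison.

```python
# Hypothetical rules: phenotype -> pathways that must all be present.
RULES = {
    "trait_X": {"pathway_1", "pathway_2"},
    "trait_Y": {"pathway_3"},
}

def predict_phenotypes(pathways, rules=RULES):
    """Return the traits whose required pathways are all annotated in the genome."""
    present = set(pathways)
    return {trait for trait, required in rules.items() if required <= present}

def annotation_discrepancies(pathways, observed, rules=RULES):
    """Compare predicted traits with experimentally observed ones."""
    predicted = predict_phenotypes(pathways, rules)
    observed = set(observed)
    return {
        "predicted_not_observed": predicted - observed,
        "observed_not_predicted": observed - predicted,
    }

print(annotation_discrepancies(["pathway_1", "pathway_2"], ["trait_Y"]))
```

A trait that is observed but not predicted points to a possibly missing annotation, while a trait that is predicted but not observed may indicate an over-annotation or an untested phenotype.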
McNeil, Leslie Klis; Reich, Claudia; Aziz, Ramy K; Bartels, Daniela; Cohoon, Matthew; Disz, Terry; Edwards, Robert A; Gerdes, Svetlana; Hwang, Kaitlyn; Kubal, Michael; Margaryan, Gohar Rem; Meyer, Folker; Mihalo, William; Olsen, Gary J; Olson, Robert; Osterman, Andrei; Paarmann, Daniel; Paczian, Tobias; Parrello, Bruce; Pusch, Gordon D; Rodionov, Dmitry A; Shi, Xinghua; Vassieva, Olga; Vonstein, Veronika; Zagnitko, Olga; Xia, Fangfang; Zinner, Jenifer; Overbeek, Ross; Stevens, Rick
2007-01-01
The National Microbial Pathogen Data Resource (NMPDR) (http://www.nmpdr.org) is a National Institute of Allergy and Infectious Diseases (NIAID)-funded Bioinformatics Resource Center that supports research in selected Category B pathogens. NMPDR contains the complete genomes of approximately 50 strains of pathogenic bacteria that are the focus of our curators, as well as >400 other genomes that provide a broad context for comparative analysis across the three phylogenetic Domains. NMPDR integrates complete, public genomes with expertly curated biological subsystems to provide the most consistent genome annotations. Subsystems are sets of functional roles related by a biologically meaningful organizing principle, which are built over large collections of genomes; they provide researchers with consistent functional assignments in a biologically structured context. Investigators can browse subsystems and reactions to develop accurate reconstructions of the metabolic networks of any sequenced organism. NMPDR provides a comprehensive bioinformatics platform, with tools and viewers for genome analysis. Results of precomputed gene clustering analyses can be retrieved in tabular or graphic format with one-click tools. NMPDR tools include Signature Genes, which finds the set of genes that two groups of organisms have in common or that differentiates them. Essentiality data collated from genome-wide studies have been curated. Drug target identification and high-throughput, in silico, compound screening are in development.
Alonso, Conchita; Pérez, Ricardo; Bazaga, Pilar; Medrano, Mónica; Herrera, Carlos M
2016-01-01
Methylation of DNA cytosines affects whether transposons are silenced and genes are expressed, and is a major epigenetic mechanism whereby plants respond to environmental change. Analyses of methylation-sensitive amplification polymorphism (MS-AFLP or MSAP) have often been used to assess methyl-cytosine changes in response to stress treatments and, more recently, in ecological studies of wild plant populations. The MSAP technique does not require a sequenced reference genome and provides many anonymous loci randomly distributed over the genome for which the methylation status can be ascertained. Scoring of MSAP data, however, is not straightforward, and efforts are still required to standardize this step to make use of the potential to distinguish between methylation at different nucleotide contexts. Furthermore, it is not known how accurately MSAP infers genome-wide cytosine methylation levels in plants. Here, we analyse the relationship between MSAP results and the percentage of global cytosine methylation in genomic DNA obtained by HPLC analysis. A screening of the literature revealed that methylation of cytosines at cleavage sites assayed by MSAP was greater than genome-wide estimates obtained by HPLC, and percentages of methylation at different nucleotide contexts varied within and across species. Concurrent HPLC and MSAP analyses of DNA from 200 individuals of the perennial herb Helleborus foetidus confirmed that methyl-cytosine was more frequent in CCGG contexts than in the genome as a whole. In this species, global methylation was unrelated to methylation at the inner CG site. We suggest that global HPLC and context-specific MSAP methylation estimates provide complementary information whose combination can improve our current understanding of methylation-based epigenetic processes in nonmodel plants. © 2015 John Wiley & Sons Ltd.
GCView: the genomic context viewer for protein homology searches
Grin, Iwan; Linke, Dirk
2011-01-01
Genomic neighborhood can provide important insights into evolution and function of a protein or gene. When looking at operons, changes in operon structure and composition can only be revealed by looking at the operon as a whole. To facilitate the analysis of the genomic context of a query in multiple organisms we have developed Genomic Context Viewer (GCView). GCView accepts results from one or multiple protein homology searches such as BLASTp as input. For each hit, the neighboring protein-coding genes are extracted, the regions of homology are labeled for each input and the results are presented as a clear, interactive graphical output. It is also possible to add more searches to iteratively refine the output. GCView groups outputs by the hits for different proteins. This allows for easy comparison of different operon compositions and structures. The tool is embedded in the framework of the Bioinformatics Toolkit of the Max-Planck Institute for Developmental Biology (MPI Toolkit). Job results from the homology search tools inside the MPI Toolkit can be forwarded to GCView and results can be subsequently analyzed by sequence analysis tools. Results are stored online, allowing for later reinspection. GCView is freely available at http://toolkit.tuebingen.mpg.de/gcview. PMID:21609955
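The neighbor-extraction step described in the GCView abstract can be illustrated with a minimal sketch. This is not GCView's code: the `Gene` record, the `neighborhood` function and the `flank` parameter are hypothetical names chosen for illustration, and the sketch assumes each homology hit has already been mapped to a locus on a single replicon.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical minimal record for a protein-coding gene."""
    locus: str
    start: int
    end: int
    strand: str


def neighborhood(genes, hit_locus, flank=2):
    """Return the hit plus up to `flank` genes on each side, in genome order.

    Returns an empty list when the hit locus is not among `genes`.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    index = next((i for i, g in enumerate(ordered) if g.locus == hit_locus), None)
    if index is None:
        return []
    lower = max(0, index - flank)
    return ordered[lower:index + flank + 1]


# Toy replicon with five genes (illustrative coordinates only).
toy_genes = [
    Gene("g3", 2100, 2900, "+"),
    Gene("g1", 100, 900, "+"),
    Gene("g2", 1000, 2000, "-"),
    Gene("g5", 4100, 4800, "+"),
    Gene("g4", 3000, 4000, "-"),
]
context = [g.locus for g in neighborhood(toy_genes, "g3", flank=1)]
```

Running the same function for the hits of several searches and aligning the resulting windows on the hit gene is one plausible way to compare operon composition across organisms.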
Using the Saccharomyces Genome Database (SGD) for analysis of genomic information
Skrzypek, Marek S.; Hirschman, Jodi
2011-01-01
Analysis of genomic data requires access to software tools that place the sequence-derived information in the context of biology. The Saccharomyces Genome Database (SGD) integrates functional information about budding yeast genes and their products with a set of analysis tools that facilitate exploring their biological details. This unit describes how the various types of functional data available at SGD can be searched, retrieved, and analyzed. Starting with the guided tour of the SGD Home page and Locus Summary page, this unit highlights how to retrieve data using YeastMine, how to visualize genomic information with GBrowse, how to explore gene expression patterns with SPELL, and how to use Gene Ontology tools to characterize large-scale datasets. PMID:21901739
Visualizing conserved gene location across microbe genomes
NASA Astrophysics Data System (ADS)
Shaw, Chris D.
2009-01-01
This paper introduces an analysis-based zoomable visualization technique for displaying the location of genes across many related species of microbes. The purpose of this visualization is to enable a biologist to examine the layout of genes in the organism of interest with respect to the gene organization of related organisms. During the genomic annotation process, the ability to observe gene organization in common with previously annotated genomes can help a biologist better confirm the structure and function of newly analyzed microbe DNA sequences. We have developed a visualization and analysis tool that enables the biologist to observe and examine gene organization among genomes, in the context of the primary sequence of interest. This paper describes the visualization and analysis steps, and presents a case study using a number of Rickettsia genomes.
Oud, Bart; van Maris, Antonius J A; Daran, Jean-Marc; Pronk, Jack T
2012-01-01
Successful reverse engineering of mutants that have been obtained by nontargeted strain improvement has long presented a major challenge in yeast biotechnology. This paper reviews the use of genome-wide approaches for analysis of Saccharomyces cerevisiae strains originating from evolutionary engineering or random mutagenesis. On the basis of an evaluation of the strengths and weaknesses of different methods, we conclude that for the initial identification of relevant genetic changes, whole genome sequencing is superior to other analytical techniques, such as transcriptome, metabolome, proteome, or array-based genome analysis. Key advantages of this technique over gene expression analysis include the independency of genome sequences on experimental context and the possibility to directly and precisely reproduce the identified changes in naive strains. The predictive value of genome-wide analysis of strains with industrially relevant characteristics can be further improved by classical genetics or simultaneous analysis of strains derived from parallel, independent strain improvement lineages. PMID:22152095
Zhu, Zhou; Ihle, Nathan T; Rejto, Paul A; Zarrinkar, Patrick P
2016-06-13
Genome-scale functional genomic screens across large cell line panels provide a rich resource for discovering tumor vulnerabilities that can lead to the next generation of targeted therapies. Their data analysis typically has focused on identifying genes whose knockdown enhances response in various pre-defined genetic contexts, which are limited by biological complexities as well as the incompleteness of our knowledge. We thus introduce a complementary data mining strategy to identify genes with exceptional sensitivity in subsets, or outlier groups, of cell lines, allowing an unbiased analysis without any a priori assumption about the underlying biology of dependency. Genes with outlier features are strongly and specifically enriched with those known to be associated with cancer and relevant biological processes, despite no a priori knowledge being used to drive the analysis. Identification of exceptional responders (outliers) may lead not only to new candidates for therapeutic intervention, but also to tumor indications and response biomarkers for companion precision medicine strategies. Several tumor suppressors have an outlier sensitivity pattern, supporting and generalizing the notion that tumor suppressors can play context-dependent oncogenic roles. The novel application of outlier analysis described here demonstrates a systematic and data-driven analytical strategy to decipher large-scale functional genomic data for oncology target and precision medicine discoveries.
Gundogdu, Aycan; Nalbantoglu, Ufuk
2017-04-01
A short while ago, the human genome and microbiome were analysed simultaneously for the first time as a multi-omic approach. The analyses of heterogeneous population cohorts showed that microbiome components were associated with human genome variations. In-depth analysis of these results reveals that the majority of those relationships are between immune pathways and autoimmune disease-associated microbiome components. Thus, it can be hypothesized that autoimmunity may be associated with homeostatic disequilibrium of the human-microbiome interactome. Further analysis of human genome-human microbiome relationships in disease contexts with tailored systems biology approaches may yield insights into disease pathogenesis and prognosis. PMID:28785422
A dictionary based informational genome analysis
2012-01-01
Background: In the post-genomic era, several methods of computational genomics are emerging to understand how information is structured within whole genomes. The literature of the last five years describes several alignment-free methods that have arisen as alternative metrics for the dissimilarity of biological sequences. Among these, recent approaches are based on the empirical frequencies of DNA k-mers in whole genomes. Results: Any set of words (factors) occurring in a genome provides a genomic dictionary. About sixty genomes were analyzed by means of informational indexes based on genomic dictionaries, where a systemic view replaces a local sequence analysis. A software prototype applying the methodology outlined here carried out computations on genomic data. We computed informational indexes and built genomic dictionaries of different sizes, along with their frequency distributions. The software performed three main tasks: computation of informational indexes, storage of these in a database, and index analysis and visualization. The validation was done by investigating genomes of various organisms. A systematic analysis of genomic repeats of several lengths, which is of keen interest in biology (for example to find over-represented functional sequences, such as promoters), was discussed, and suggested a method to define synthetic genetic networks. Conclusions: We introduced a methodology based on dictionaries, and an efficient motif-finding software application for comparative genomics. This approach could be extended along many lines of investigation, namely exported to other contexts of computational genomics, as a basis for discrimination of genomic pathologies. PMID:22985068
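The idea of a genomic dictionary built from k-mer frequencies can be made concrete with a small sketch. The function names below (`kmer_dictionary`, `empirical_entropy`, `repeated_words`) are hypothetical and are not taken from the authors' software, and the empirical Shannon entropy stands in for the paper's informational indexes purely as an assumed example; the abstract does not define those indexes.

```python
from collections import Counter
from math import log2


def kmer_dictionary(genome, k):
    """Count every length-k word (factor) that occurs in the sequence."""
    genome = genome.upper()
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def empirical_entropy(dictionary):
    """Shannon entropy (in bits) of the empirical k-mer frequency distribution."""
    total = sum(dictionary.values())
    return -sum((n / total) * log2(n / total) for n in dictionary.values())


def repeated_words(dictionary):
    """Words occurring more than once, i.e. the repeats of length k."""
    return {word: n for word, n in dictionary.items() if n > 1}


# Toy sequence; a real analysis would read a whole genome from a FASTA file.
toy = "ACGTACGTAC"
dictionary_2 = kmer_dictionary(toy, 2)
entropy_2 = empirical_entropy(dictionary_2)
repeats_2 = repeated_words(dictionary_2)
```

Varying `k` and comparing dictionary size, entropy and repeat counts across genomes gives the kind of alignment-free, whole-genome view the abstract describes.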
Visualization of RNA structure models within the Integrative Genomics Viewer.
Busan, Steven; Weeks, Kevin M
2017-07-01
Analyses of the interrelationships between RNA structure and function are increasingly important components of genomic studies. The SHAPE-MaP strategy enables accurate RNA structure probing and realistic structure modeling of kilobase-length noncoding RNAs and mRNAs. Existing tools for visualizing RNA structure models are not suitable for efficient analysis of long, structurally heterogeneous RNAs. In addition, structure models are often advantageously interpreted in the context of other experimental data and gene annotation information, for which few tools currently exist. We have developed a module within the widely used and well supported open-source Integrative Genomics Viewer (IGV) that allows visualization of SHAPE and other chemical probing data, including raw reactivities, data-driven structural entropies, and data-constrained base-pair secondary structure models, in context with linear genomic data tracks. We illustrate the usefulness of visualizing RNA structure in the IGV by exploring structure models for a large viral RNA genome, comparing bacterial mRNA structure in cells with its structure under cell- and protein-free conditions, and comparing a noncoding RNA structure modeled using SHAPE data with a base-pairing model inferred through sequence covariation analysis. © 2017 Busan and Weeks; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
GIANT 2.0: genome-scale integrated analysis of gene networks in tissues.
Wong, Aaron K; Krishnan, Arjun; Troyanskaya, Olga G
2018-05-25
GIANT2 (Genome-wide Integrated Analysis of gene Networks in Tissues) is an interactive web server that enables biomedical researchers to analyze their proteins and pathways of interest and generate hypotheses in the context of genome-scale functional maps of human tissues. The precise actions of genes are frequently dependent on their tissue context, yet direct assay of tissue-specific protein function and interactions remains infeasible in many normal human tissues and cell-types. With GIANT2, researchers can explore predicted tissue-specific functional roles of genes and reveal changes in those roles across tissues, all through interactive multi-network visualizations and analyses. Additionally, the NetWAS approach available through the server uses tissue-specific/cell-type networks predicted by GIANT2 to re-prioritize statistical associations from GWAS studies and identify disease-associated genes. GIANT2 predicts tissue-specific interactions by integrating diverse functional genomics data from now over 61 400 experiments for 283 diverse tissues and cell-types. GIANT2 does not require any registration or installation and is freely available for use at http://giant-v2.princeton.edu.
Médigue, Claudine; Calteau, Alexandra; Cruveiller, Stéphane; Gachet, Mathieu; Gautreau, Guillaume; Josso, Adrien; Lajus, Aurélie; Langlois, Jordan; Pereira, Hugo; Planel, Rémi; Roche, David; Rollin, Johan; Rouy, Zoe; Vallenet, David
2017-09-12
The overwhelming list of new bacterial genomes becoming available on a daily basis makes accurate genome annotation an essential step that ultimately determines the relevance of thousands of genomes stored in public databanks. The MicroScope platform (http://www.genoscope.cns.fr/agc/microscope) is an integrative resource that supports systematic and efficient revision of microbial genome annotation, data management and comparative analysis. Starting from the results of our syntactic, functional and relational annotation pipelines, MicroScope provides an integrated environment for the expert annotation and comparative analysis of prokaryotic genomes. It combines tools and graphical interfaces to analyze genomes and to perform the manual curation of gene function in a comparative genomics and metabolic context. In this article, we describe the free-of-charge MicroScope services for the annotation and analysis of microbial (meta)genomes, transcriptomic and re-sequencing data. Then, the functionalities of the platform are presented in a way that provides practical guidance and help to nonspecialists in bioinformatics. Newly integrated analysis tools (i.e. prediction of virulence and resistance genes in bacterial genomes) and an original, recently developed method (the pan-genome graph representation) are also described. Integrated environments such as MicroScope clearly contribute, through the user community, to maintaining accurate resources. © The Author 2017. Published by Oxford University Press.
Conserved noncoding sequences conserve biological networks and influence genome evolution.
Xie, Jianbo; Qian, Kecheng; Si, Jingna; Xiao, Liang; Ci, Dong; Zhang, Deqiang
2018-05-01
Comparative genomics approaches have identified numerous conserved cis-regulatory sequences near genes in plant genomes. Despite the identification of these conserved noncoding sequences (CNSs), our knowledge of their functional importance and selection remains limited. Here, we used a combination of DNA methylome analysis, microarray expression analyses, and functional annotation to study these sequences in the model tree Populus trichocarpa. Methylation in CG contexts and non-CG contexts was lower in CNSs, particularly CNSs in the 5'-upstream regions of genes, compared with other sites in the genome. We observed that CNSs are enriched in genes with transcription and binding functions, and this is also associated with syntenic genes and those from whole-genome duplications, suggesting that cis-regulatory sequences play a key role in genome evolution. We detected a significant positive correlation between CNS number and protein interactions, suggesting that CNSs may have roles in the evolution and maintenance of biological networks. The divergence of CNSs indicates that duplication-degeneration-complementation drives the subfunctionalization of a proportion of duplicated genes from whole-genome duplication. Furthermore, population genomics confirmed that most CNSs are under strong purifying selection and only a small subset of CNSs shows evidence of adaptive evolution. These findings provide a foundation for future studies exploring these key genomic features in the maintenance of biological networks, local adaptation, and transcription.
A comprehensive assessment of somatic mutation detection in cancer using whole-genome sequencing
Alioto, Tyler S.; Buchhalter, Ivo; Derdak, Sophia; Hutter, Barbara; Eldridge, Matthew D.; Hovig, Eivind; Heisler, Lawrence E.; Beck, Timothy A.; Simpson, Jared T.; Tonon, Laurie; Sertier, Anne-Sophie; Patch, Ann-Marie; Jäger, Natalie; Ginsbach, Philip; Drews, Ruben; Paramasivam, Nagarajan; Kabbe, Rolf; Chotewutmontri, Sasithorn; Diessl, Nicolle; Previti, Christopher; Schmidt, Sabine; Brors, Benedikt; Feuerbach, Lars; Heinold, Michael; Gröbner, Susanne; Korshunov, Andrey; Tarpey, Patrick S.; Butler, Adam P.; Hinton, Jonathan; Jones, David; Menzies, Andrew; Raine, Keiran; Shepherd, Rebecca; Stebbings, Lucy; Teague, Jon W.; Ribeca, Paolo; Giner, Francesc Castro; Beltran, Sergi; Raineri, Emanuele; Dabad, Marc; Heath, Simon C.; Gut, Marta; Denroche, Robert E.; Harding, Nicholas J.; Yamaguchi, Takafumi N.; Fujimoto, Akihiro; Nakagawa, Hidewaki; Quesada, Víctor; Valdés-Mas, Rafael; Nakken, Sigve; Vodák, Daniel; Bower, Lawrence; Lynch, Andrew G.; Anderson, Charlotte L.; Waddell, Nicola; Pearson, John V.; Grimmond, Sean M.; Peto, Myron; Spellman, Paul; He, Minghui; Kandoth, Cyriac; Lee, Semin; Zhang, John; Létourneau, Louis; Ma, Singer; Seth, Sahil; Torrents, David; Xi, Liu; Wheeler, David A.; López-Otín, Carlos; Campo, Elías; Campbell, Peter J.; Boutros, Paul C.; Puente, Xose S.; Gerhard, Daniela S.; Pfister, Stefan M.; McPherson, John D.; Hudson, Thomas J.; Schlesner, Matthias; Lichter, Peter; Eils, Roland; Jones, David T. W.; Gut, Ivo G.
2015-01-01
As whole-genome sequencing for cancer genome analysis becomes a clinical tool, a full understanding of the variables affecting sequencing analysis output is required. Here using tumour-normal sample pairs from two different types of cancer, chronic lymphocytic leukaemia and medulloblastoma, we conduct a benchmarking exercise within the context of the International Cancer Genome Consortium. We compare sequencing methods, analysis pipelines and validation methods. We show that using PCR-free methods and increasing sequencing depth to ∼100 × shows benefits, as long as the tumour:control coverage ratio remains balanced. We observe widely varying mutation call rates and low concordance among analysis pipelines, reflecting the artefact-prone nature of the raw data and lack of standards for dealing with the artefacts. However, we show that, using the benchmark mutation set we have created, many issues are in fact easy to remedy and have an immediate positive impact on mutation detection accuracy. PMID:26647970
Jakupciak, John P.; Wells, Jeffrey M.; Karalus, Richard J.; Pawlowski, David R.; Lin, Jeffrey S.; Feldman, Andrew B.
2013-01-01
Large-scale genomics projects are identifying biomarkers to detect human disease. B. pseudomallei and B. mallei are two closely related select agents that cause melioidosis and glanders. Accurate characterization of metagenomic samples is dependent on accurate measurements of genetic variation between isolates with resolution down to strain level. Often single biomarker sensitivity is augmented by use of multiple or panels of biomarkers. In parallel with single biomarker validation, advances in DNA sequencing enable analysis of entire genomes in a single run: population-sequencing. Potentially, direct sequencing could be used to analyze an entire genome to serve as the biomarker for genome identification. However, genome variation and population diversity complicate use of direct sequencing, as well as differences caused by sample preparation protocols including sequencing artifacts and mistakes. As part of a Department of Homeland Security program in bacterial forensics, we examined how to implement whole genome sequencing (WGS) analysis as a judicially defensible forensic method for attributing microbial sample relatedness; and also to determine the strengths and limitations of whole genome sequence analysis in a forensics context. Herein, we demonstrate use of sequencing to provide genetic characterization of populations: direct sequencing of populations. PMID:24455204
Vassy, Jason L; Davis, J Kelly; Kirby, Christine; Richardson, Ian J; Green, Robert C; McGuire, Amy L; Ubel, Peter A
2018-06-01
Genomics will play an increasingly prominent role in clinical medicine. To describe how primary care physicians (PCPs) discuss and make clinical recommendations about genome sequencing results. Qualitative analysis. PCPs and their generally healthy patients undergoing genome sequencing. Patients received clinical genome reports that included four categories of results: monogenic disease risk variants (if present), carrier status, five pharmacogenetics results, and polygenic risk estimates for eight cardiometabolic traits. Patients' office visits with their PCPs were audio-recorded, and summative content analysis was used to describe how PCPs discussed genomic results. For each genomic result discussed in 48 PCP-patient visits, we identified a "take-home" message (recommendation), categorized as continuing current management, further treatment, further evaluation, behavior change, remembering for future care, or sharing with family members. We analyzed how PCPs came to each recommendation by identifying 1) how they described the risk or importance of the given result and 2) the rationale they gave for translating that risk into a specific recommendation. Quantitative analysis showed that continuing current management was the most commonly coded recommendation across results overall (492/749, 66%) and for each individual result type except monogenic disease risk results. Pharmacogenetics was the most common result type to prompt a recommendation to remember for future care (94/119, 79%); carrier status was the most common type prompting a recommendation to share with family members (45/54, 83%); and polygenic results were the most common type prompting a behavior change recommendation (55/58, 95%). One-quarter of recommendation codes associated with monogenic results were for further evaluation (6/24, 25%). Rationales for these recommendations included patient context, family context, and scientific/clinical limitations of sequencing.
PCPs distinguish substantive differences among categories of genome sequencing results and use clinical judgment to justify continuing current management in generally healthy patients with genomic results.
Genomic signal analysis of pathogen variability
NASA Astrophysics Data System (ADS)
Cristea, Paul Dan
2006-02-01
The paper presents results in the study of pathogen variability by using genomic signals. The conversion of symbolic nucleotide sequences into digital signals offers the possibility to apply signal processing methods to the analysis of genomic data. The method is particularly well suited to characterize small size genomic sequences, such as those found in viruses and bacteria, being a promising tool in tracking the variability of pathogens, especially in the context of developing drug resistance. The paper is based on data downloaded from GenBank [32], and comprises results on the variability of the eight segments of the influenza type A, subtype H5N1, virus genome, and of the Hemagglutinin (HA) gene, for the H1, H2, H3, H4, H5 and H16 types. Data from human and avian virus isolates are used.
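The conversion of a symbolic nucleotide sequence into a digital signal can be sketched as follows. The abstract does not state which mapping the paper uses, so the complex mapping below (one nucleotide per quadrant of the complex plane) is an assumption made for illustration, as are the function names `to_signal`, `cumulated_phase` and `differing_positions`.

```python
import cmath

# Assumed nucleotide-to-complex mapping (one symbol per quadrant); the
# abstract does not specify the representation used in the paper.
NUCLEOTIDE_TO_COMPLEX = {
    "A": 1 + 1j,
    "C": -1 - 1j,
    "G": -1 + 1j,
    "T": 1 - 1j,
}


def to_signal(sequence):
    """Map a nucleotide string to complex samples, skipping unknown symbols."""
    return [
        NUCLEOTIDE_TO_COMPLEX[base]
        for base in sequence.upper()
        if base in NUCLEOTIDE_TO_COMPLEX
    ]


def cumulated_phase(signal):
    """Running sum of the phase (in radians) of each sample."""
    total = 0.0
    phases = []
    for sample in signal:
        total += cmath.phase(sample)
        phases.append(total)
    return phases


def differing_positions(seq_a, seq_b):
    """Indices at which two aligned, equal-length sequences differ."""
    pairs = zip(seq_a.upper(), seq_b.upper())
    return [i for i, (a, b) in enumerate(pairs) if a != b]


reference_phase = cumulated_phase(to_signal("ACGT"))
variant_sites = differing_positions("ACGT", "ACCT")
```

Once sequences are numeric, standard signal-processing tools (differences, filtering, spectra) can be applied to compare isolates; which of these the paper actually uses is not stated in the abstract.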
Constraints on genes shape long-term conservation of macro-synteny in metazoan genomes.
Lv, Jie; Havlak, Paul; Putnam, Nicholas H
2011-10-05
Many metazoan genomes conserve chromosome-scale gene linkage relationships ("macro-synteny") from the common ancestor of multicellular animal life [1-4], but the biological explanation for this conservation is still unknown. Double cut and join (DCJ) is a simple, well-studied model of neutral genome evolution amenable to both simulation and mathematical analysis [5], but as we show here, it is not sufficient to explain long-term macro-synteny conservation. We examine a family of simple (one-parameter) extensions of DCJ to identify models and choices of parameters consistent with the levels of macro- and micro-synteny conservation observed among animal genomes. Our software implements a flexible strategy for incorporating various types of genomic context into the DCJ model ("DCJ-[C]"), and is available as open source software from http://github.com/putnamlab/dcj-c. A simple model of genome evolution, in which DCJ moves are allowed only if they maintain chromosomal linkage among a set of constrained genes, can simultaneously account for the level of macro-synteny conservation and for correlated conservation among multiple pairs of species. Simulations under this model indicate that a constraint on approximately 7% of metazoan genes is sufficient to constrain genome rearrangement to an average rate of 25 inversions and 1.7 translocations per million years.
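The constrained model, in which a move is accepted only if it keeps constrained genes linked, can be illustrated with a toy sketch. This is not the authors' DCJ-[C] software: it models only inversions and reciprocal translocations (two special cases of DCJ), represents a genome as a list of chromosomes holding signed gene identifiers, and uses hypothetical function names throughout.

```python
def inversion(genome, chrom, i, j):
    """Reverse genes i..j-1 on one chromosome and flip their orientation."""
    new = [list(c) for c in genome]
    segment = new[chrom][i:j]
    new[chrom][i:j] = [-g for g in reversed(segment)]
    return new


def translocation(genome, a, b, cut_a, cut_b):
    """Exchange the tails of chromosomes a and b at the given cut points."""
    new = [list(c) for c in genome]
    new[a] = genome[a][:cut_a] + genome[b][cut_b:]
    new[b] = genome[b][:cut_b] + genome[a][cut_a:]
    return new


def keeps_linkage(before, after, constrained):
    """True if constrained genes that shared a chromosome still share one."""
    location = {abs(g): idx for idx, chrom in enumerate(after) for g in chrom}
    for chrom in before:
        homes = {location[abs(g)] for g in chrom if abs(g) in constrained}
        if len(homes) > 1:
            return False
    return True


def constrained_move(genome, candidate, constrained):
    """Accept the candidate genome only if it preserves constrained linkage."""
    return candidate if keeps_linkage(genome, candidate, constrained) else genome


genome = [[1, 2, 3], [4, 5, 6]]
constrained = {1, 3}  # genes 1 and 3 must stay on the same chromosome

inverted = inversion(genome, 0, 0, 2)
splitting = translocation(genome, 0, 1, 2, 1)  # would separate genes 1 and 3
after_inversion = constrained_move(genome, inverted, constrained)
after_translocation = constrained_move(genome, splitting, constrained)
```

In this simplification an inversion never changes chromosome membership, so it is always accepted, whereas a translocation is rejected whenever it would split constrained genes; repeating such moves at random gives a rough picture of how a constraint can preserve macro-synteny while gene order is scrambled.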
Fungal Genes in Context: Genome Architecture Reflects Regulatory Complexity and Function
Noble, Luke M.; Andrianopoulos, Alex
2013-01-01
Gene context determines gene expression, with local chromosomal environment most influential. Comparative genomic analysis is often limited in scope to conserved or divergent gene and protein families, and fungi are well suited to this approach with low functional redundancy and relatively streamlined genomes. We show here that one aspect of gene context, the amount of potential upstream regulatory sequence maintained through evolution, is highly predictive of both molecular function and biological process in diverse fungi. Orthologs with large upstream intergenic regions (UIRs) are strongly enriched in information processing functions, such as signal transduction and sequence-specific DNA binding, and, in the genus Aspergillus, include the majority of experimentally studied, high-level developmental and metabolic transcriptional regulators. Many uncharacterized genes are also present in this class and, by implication, may be of similar importance. Large intergenic regions also share two novel sequence characteristics, currently of unknown significance: they are enriched for plus-strand polypyrimidine tracts and an information-rich, putative regulatory motif that was present in the last common ancestor of the Pezizomycotina. Systematic consideration of gene UIR in comparative genomics, particularly for poorly characterized species, could help reveal organisms’ regulatory priorities. PMID:23699226
Nalbantoglu, Ufuk
2017-01-01
A short while ago, the human genome and microbiome were analysed simultaneously for the first time as a multi-omic approach. The analyses of heterogeneous population cohorts showed that microbiome components were associated with human genome variations. In-depth analysis of these results reveals that the majority of those relationships are between immune pathways and autoimmune disease-associated microbiome components. Thus, it can be hypothesized that autoimmunity may be associated with homeostatic disequilibrium of the human-microbiome interactome. Further analysis of human genome–human microbiome relationships in disease contexts with tailored systems biology approaches may yield insights into disease pathogenesis and prognosis. PMID:28785422
Public consultation in ethics: an experiment in representative ethics.
Burgess, Michael M
2004-01-01
Genome Canada has funded a research project to evaluate the usefulness of different forms of ethical analysis for assessing the moral weight of public opinion in the governance of genomics. This paper will describe a role of public consultation for ethical analysis and a contribution of ethical analysis to public consultation and the governance of genomics/biotechnology. Public consultation increases the robustness of ethical analysis with a more diverse set of moral experiences. Consultation must be carefully and respectfully designed to generate sufficiently diverse and rich accounts of moral experiences. Since dominant groups tend to define ethical or policy issues in a manner that excludes some interests or perspectives, it is important to identify the range of interests that diverse publics hold before defining the issue and scope of the discussion and the premature foreclosure of ethical dialogue. Consequently, a significant contribution of ethical dialogue strengthened by social analysis is to consider the context and non-policy use of power to govern genomics and to sustain social debate on enduring ethical issues.
Singh, Vinod Kumar; Krishnamachari, Annangarachari
2016-09-01
Genome-wide experimental studies in Saccharomyces cerevisiae reveal that autonomous replicating sequence (ARS) requires an essential consensus sequence (ACS) for replication activity. Computational studies identified thousands of ACS-like patterns in the genome. However, only a few hundred of these sites act as replicating sites and the rest are considered dormant or evolving sites. In a bid to understand the sequence makeup of replication sites, a content and context-based analysis was performed on a set of replicating ACS sequences that bind to the origin-recognition complex (ORC), denoted as ORC-ACS, and non-replicating ACS sequences (nrACS) that are not bound by ORC. In this study, DNA properties such as base composition, correlation, sequence-dependent thermodynamic and DNA structural profiles, and their positions have been considered for characterizing ORC-ACS and nrACS. Analysis reveals that ORC-ACS show marked differences in nucleotide composition and context features in their vicinity compared to nrACS. Interestingly, an A-rich motif was also discovered in ORC-ACS sequences within their nucleosome-free region. Profound changes in conformational features, such as DNA helical twist, inclination angle and stacking energy, between ORC-ACS and nrACS were observed. Distribution of ACS motifs in the non-coding segments points to the locations of ORC-ACS, which are found far away from the adjacent gene start position compared to nrACS, thereby enabling an accessible environment for ORC proteins. Our attempt is novel in considering the contextual view of ACS and its flanking region along with nucleosome positioning in the S. cerevisiae genome and may be useful for any computational prediction scheme.
Caryoscope: An Open Source Java application for viewing microarray data in a genomic context
Awad, Ihab AB; Rees, Christian A; Hernandez-Boussard, Tina; Ball, Catherine A; Sherlock, Gavin
2004-01-01
Background Microarray-based comparative genome hybridization experiments generate data that can be mapped onto the genome. These data are interpreted more easily when represented graphically in a genomic context. Results We have developed Caryoscope, which is an open source Java application for visualizing microarray data from array comparative genome hybridization experiments in a genomic context. Caryoscope can read General Feature Format files (GFF files), as well as comma- and tab-delimited files, that define the genomic positions of the microarray reporters for which data are obtained. The microarray data can be browsed using an interactive, zoomable interface, which helps users identify regions of chromosomal deletion or amplification. The graphical representation of the data can be exported in a number of graphic formats, including publication-quality formats such as PostScript. Conclusion Caryoscope is a useful tool that can aid in the visualization, exploration and interpretation of microarray data in a genomic context. PMID:15488149
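The abstract above mentions General Feature Format (GFF) files as input. As a rough illustration of what such a record holds, the sketch below parses one tab-delimited GFF line into a dictionary. It relies on the standard nine-column layout (sequence, source, feature, start, end, score, strand, frame, attributes); the function name and returned keys are hypothetical, and this is not Caryoscope's own Java parser.

```python
def parse_gff_line(line):
    """Parse one GFF record; return None for blank lines and '#' comments."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("expected at least 8 tab-separated GFF columns")
    seqid, source, feature, start, end, score, strand = fields[:7]
    return {
        "seqid": seqid,
        "source": source,
        "feature": feature,
        "start": int(start),  # 1-based, inclusive
        "end": int(end),
        # A '.' in the score column means no value was recorded.
        "score": None if score == "." else float(score),
        "strand": strand,
        "attributes": fields[8] if len(fields) > 8 else "",
    }
```

For array CGH data, the score column would typically carry the measured ratio for the reporter at that genomic position.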
Advances and perspectives on the use of CRISPR/Cas9 systems in plant genomics research
Liu, Degao; Hu, Rongbin; Palla, Kaitlin J.; ...
2016-02-18
Genome editing with site-specific nucleases has become a powerful tool for functional characterization of plant genes and genetic improvement of agricultural crops. Among the various site-specific nuclease-based technologies available for genome editing, the clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated protein 9 (Cas9) systems have shown the greatest potential for rapid and efficient editing of genomes in plant species. Here, this article reviews the current status of application of CRISPR/Cas9 to plant genomics research, with a focus on loss-of-function and gain-of-function analysis of individual genes in the context of perennial plants and the potential application of CRISPR/Cas9 to perturbation of gene expression, as well as identification and analysis of gene modules as part of an accelerated domestication and synthetic biology effort.
Predicting Protein Function by Genomic Context: Quantitative Evaluation and Qualitative Inferences
Huynen, Martijn; Snel, Berend; Lathe, Warren; Bork, Peer
2000-01-01
Various new methods have been proposed to predict functional interactions between proteins based on the genomic context of their genes. The types of genomic context that they use are Type I: the fusion of genes; Type II: the conservation of gene-order or co-occurrence of genes in potential operons; and Type III: the co-occurrence of genes across genomes (phylogenetic profiles). Here we compare these types for their coverage, their correlations with various types of functional interaction, and their overlap with homology-based function assignment. We apply the methods to Mycoplasma genitalium, the standard benchmarking genome in computational and experimental genomics. Quantitatively, conservation of gene order is the technique with the highest coverage, applying to 37% of the genes. By combining gene order conservation with gene fusion (6%), the co-occurrence of genes in operons in absence of gene order conservation (8%), and the co-occurrence of genes across genomes (11%), significant context information can be obtained for 50% of the genes (the categories overlap). Qualitatively, we observe that the functional interactions between genes are stronger as the requirements for physical neighborhood on the genome are more stringent, while the fraction of potential false positives decreases. Moreover, only in cases in which gene order is conserved in a substantial fraction of the genomes, in this case six out of twenty-five, does a single type of functional interaction (physical interaction) clearly dominate (>80%). In other cases, complementary function information from homology searches, which is available for most of the genes with significant genomic context, is essential to predict the type of interaction. Using a combination of genomic context and homology searches, new functional features can be predicted for 10% of M. genitalium genes. PMID:10958638
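The Type III context described above (co-occurrence of genes across genomes) can be sketched as a comparison of presence/absence profiles. The example below scores gene pairs with the Jaccard index over the sets of genomes in which each gene occurs. The Jaccard index and the threshold are assumptions chosen for illustration; the abstract does not state which score the authors used.

```python
from itertools import combinations


def profile_similarity(profile_a, profile_b):
    """Jaccard index between two sets of genome identifiers."""
    a, b = set(profile_a), set(profile_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def co_occurring_pairs(profiles, threshold=0.8):
    """Return gene pairs whose profiles are at least `threshold` similar.

    profiles: dict mapping gene name -> set of genomes containing the gene.
    """
    pairs = []
    for gene_a, gene_b in combinations(sorted(profiles), 2):
        if profile_similarity(profiles[gene_a], profiles[gene_b]) >= threshold:
            pairs.append((gene_a, gene_b))
    return pairs
```

Genes present in every genome match everything under this score, so in practice such uninformative profiles would likely need to be filtered out first.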
Contemporary Network Proteomics and Its Requirements
Goh, Wilson Wen Bin; Wong, Limsoon; Sng, Judy Chia Ghee
2013-01-01
The integration of networks with genomics (network genomics) is a familiar field. Conventional network analysis takes advantage of the larger coverage and relative stability of gene expression measurements. Network proteomics on the other hand has to develop further on two critical factors: (1) expanded data coverage and consistency, and (2) suitable reference network libraries, and data mining from them. Concerning (1) we discuss several contemporary themes that can improve data quality, which in turn will boost the outcome of downstream network analysis. For (2), we focus on network analysis developments, specifically, the need for context-specific networks and essential considerations for localized network analysis. PMID:24833333
IMG/M-HMP: a metagenome comparative analysis system for the Human Microbiome Project.
Markowitz, Victor M; Chen, I-Min A; Chu, Ken; Szeto, Ernest; Palaniappan, Krishna; Jacob, Biju; Ratner, Anna; Liolios, Konstantinos; Pagani, Ioanna; Huntemann, Marcel; Mavromatis, Konstantinos; Ivanova, Natalia N; Kyrpides, Nikos C
2012-01-01
The Integrated Microbial Genomes and Metagenomes (IMG/M) resource is a data management system that supports the analysis of sequence data from microbial communities in the integrated context of all publicly available draft and complete genomes from the three domains of life as well as a large number of plasmids and viruses. IMG/M currently contains thousands of genomes and metagenome samples with billions of genes. IMG/M-HMP is an IMG/M data mart serving the US National Institutes of Health (NIH) Human Microbiome Project (HMP), focussed on HMP generated metagenome datasets, and is one of the central resources provided from the HMP Data Analysis and Coordination Center (DACC). IMG/M-HMP is available at http://www.hmpdacc-resources.org/imgm_hmp/.
Lightweight genome viewer: portable software for browsing genomics data in its chromosomal context
Faith, Jeremiah J; Olson, Andrew J; Gardner, Timothy S; Sachidanandam, Ravi
2007-01-01
Background Lightweight genome viewer (lwgv) is a web-based tool for visualization of sequence annotations in their chromosomal context. It performs most of the functions of larger genome browsers, while relying on standard flat-file formats and bypassing the database needs of most visualization tools. Visualization as an aid to discovery requires display of novel data in conjunction with static annotations in their chromosomal context. With database-based systems, displaying dynamic results requires temporary tables that need to be tracked for removal. Results lwgv simplifies the visualization of user-generated results on a local computer. The dynamic results of these analyses are written to transient files, which can import static content from a more permanent file. lwgv is currently used in many different applications, from whole genome browsers to single-gene RNAi design visualization, demonstrating its applicability in a large variety of contexts and scales. Conclusion lwgv provides a lightweight alternative to large genome browsers for visualizing biological annotations and dynamic analyses in their chromosomal context. It is particularly suited for applications ranging from short sequences to medium-sized genomes when the creation and maintenance of a large software and database infrastructure is not necessary or desired. PMID:17877794
Lightweight genome viewer: portable software for browsing genomics data in its chromosomal context.
Faith, Jeremiah J; Olson, Andrew J; Gardner, Timothy S; Sachidanandam, Ravi
2007-09-18
Lightweight genome viewer (lwgv) is a web-based tool for visualization of sequence annotations in their chromosomal context. It performs most of the functions of larger genome browsers, while relying on standard flat-file formats and bypassing the database needs of most visualization tools. Visualization as an aid to discovery requires display of novel data in conjunction with static annotations in their chromosomal context. With database-based systems, displaying dynamic results requires temporary tables that need to be tracked for removal. lwgv simplifies the visualization of user-generated results on a local computer. The dynamic results of these analyses are written to transient files, which can import static content from a more permanent file. lwgv is currently used in many different applications, from whole genome browsers to single-gene RNAi design visualization, demonstrating its applicability in a large variety of contexts and scales. lwgv provides a lightweight alternative to large genome browsers for visualizing biological annotations and dynamic analyses in their chromosomal context. It is particularly suited for applications ranging from short sequences to medium-sized genomes when the creation and maintenance of a large software and database infrastructure is not necessary or desired.
El Shanti, Hatem; Chouchane, Lotfi; Badii, Ramin; Gallouzi, Imed Eddine; Gasparini, Paolo
2015-11-14
In 2013 both Saudi Arabia and Qatar launched genome projects with the aim of providing information for better diagnosis, treatment and prevention of diseases and, ultimately, of realizing personalized medicine by sequencing hundreds of thousands of samples. These population-based genome activities raise a series of relevant ethical, legal and social issues, both general and related to the specific population structure as well as to the Islamic perspective on genomic analysis and genetic testing. To contribute to the debate, the authors, after reviewing the existing literature and drawing on their professional experience in the field and in the geographic area, discuss these issues and provide their opinions. In particular, the authors focus on the impact of consanguinity on population structure and disease frequency in the Arab world, on genetic testing and genomic analysis (i.e. technical aspects, impact, etc.) and on their regulation. A comparison between the Islamic perspective and the ethical, social and legal issues raised in other population contexts is also carried out. In conclusion, this opinion article, with an up-to-date contribution to the discussion on the relevance and impact of genomic analysis and genetic testing in the Arab world, might help in producing specific national guidelines on genetic testing and genomic analysis and help accelerate the implementation and roll-out of genome projects in Muslim countries, and more specifically in Qatar and other countries of the Gulf.
Integration and visualization of systems biology data in context of the genome
2010-01-01
Background High-density tiling arrays and new sequencing technologies are generating rapidly increasing volumes of transcriptome and protein-DNA interaction data. Visualization and exploration of this data is critical to understanding the regulatory logic encoded in the genome by which the cell dynamically affects its physiology and interacts with its environment. Results The Gaggle Genome Browser is a cross-platform desktop program for interactively visualizing high-throughput data in the context of the genome. Important features include dynamic panning and zooming, keyword search and open interoperability through the Gaggle framework. Users may bookmark locations on the genome with descriptive annotations and share these bookmarks with other users. The program handles large sets of user-generated data using an in-process database and leverages the facilities of SQL and the R environment for importing and manipulating data. A key aspect of the Gaggle Genome Browser is interoperability. By connecting to the Gaggle framework, the genome browser joins a suite of interconnected bioinformatics tools for analysis and visualization with connectivity to major public repositories of sequences, interactions and pathways. To this flexible environment for exploring and combining data, the Gaggle Genome Browser adds the ability to visualize diverse types of data in relation to its coordinates on the genome. Conclusions Genomic coordinates function as a common key by which disparate biological data types can be related to one another. In the Gaggle Genome Browser, heterogeneous data are joined by their location on the genome to create information-rich visualizations yielding insight into genome organization, transcription and its regulation and, ultimately, a better understanding of the mechanisms that enable the cell to dynamically respond to its environment. PMID:20642854
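The idea that genomic coordinates act as a common key, combined with the abstract's mention of an in-process database and SQL, can be sketched with Python's built-in `sqlite3` module. The example joins gene annotations to probe measurements by coordinate overlap and averages the signal per gene. The table names, column names, and query are hypothetical and are not the Gaggle Genome Browser's actual schema.

```python
import sqlite3


def mean_signal_per_gene(genes, probes):
    """Join genes and probes on overlapping coordinates; return (gene, mean) rows.

    genes:  iterable of (name, seqid, start_pos, end_pos)
    probes: iterable of (seqid, start_pos, end_pos, intensity)
    """
    conn = sqlite3.connect(":memory:")  # in-process, nothing written to disk
    conn.execute(
        "CREATE TABLE genes (name TEXT, seqid TEXT, start_pos INTEGER, end_pos INTEGER)"
    )
    conn.execute(
        "CREATE TABLE probes (seqid TEXT, start_pos INTEGER, end_pos INTEGER, intensity REAL)"
    )
    conn.executemany("INSERT INTO genes VALUES (?, ?, ?, ?)", genes)
    conn.executemany("INSERT INTO probes VALUES (?, ?, ?, ?)", probes)
    rows = conn.execute(
        """
        SELECT g.name, AVG(p.intensity)
        FROM genes AS g
        JOIN probes AS p
          ON p.seqid = g.seqid
         AND p.start_pos <= g.end_pos   -- two intervals overlap when each
         AND p.end_pos >= g.start_pos   -- one starts before the other ends
        GROUP BY g.name
        ORDER BY g.name
        """
    ).fetchall()
    conn.close()
    return rows
```

Because the join uses only sequence name and position, any data type that carries coordinates could be related to the gene table in the same way.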
The coffee genome hub: a resource for coffee genomes
Dereeper, Alexis; Bocs, Stéphanie; Rouard, Mathieu; Guignon, Valentin; Ravel, Sébastien; Tranchant-Dubreuil, Christine; Poncet, Valérie; Garsmeur, Olivier; Lashermes, Philippe; Droc, Gaëtan
2015-01-01
The whole genome sequence of Coffea canephora, the perennial diploid species known as Robusta, has been recently released. In the context of the C. canephora genome sequencing project and to support post-genomics efforts, we developed the Coffee Genome Hub (http://coffee-genome.org/), an integrative genome information system that allows centralized access to genomics and genetics data and analysis tools to facilitate translational and applied research in coffee. We provide the complete genome sequence of C. canephora along with gene structure, gene product information, metabolism, gene families, transcriptomics, syntenic blocks, genetic markers and genetic maps. The hub relies on generic software (e.g. GMOD tools) for easy querying, visualizing and downloading research data. It includes a Genome Browser enhanced by a Community Annotation System, enabling the improvement of automatic gene annotation through an annotation editor. In addition, the hub aims at developing interoperability among other existing South Green tools managing coffee data (phylogenomics resources, SNPs) and/or supporting data analyses with the Galaxy workflow manager. PMID:25392413
Behura, Susanta K.; Severson, David W.
2014-01-01
The mosquito Aedes aegypti is the primary vector of dengue virus (DENV) infection in most of the subtropical and tropical countries. Besides DENV, yellow fever virus (YFV) is also transmitted by A. aegypti. Susceptibility of A. aegypti to West Nile virus (WNV) has also been confirmed. Although studies have indicated correlation of codon bias between flaviviridae and their animal/insect hosts, it is not clear if codon sequences have any relation to susceptibility of A. aegypti to DENV, YFV and WNV. In the current study, usages of codon context sequences (codon pairs for neighboring amino acids) of the vector (A. aegypti) genome as well as the flaviviral genomes are investigated. We used bioinformatics methods to quantify codon context bias in a genome-wide manner of A. aegypti as well as DENV, WNV and YFV sequences. Mutual information statistics was applied to perform bicluster analysis of codon context bias between vector and flaviviral sequences. Functional relevance of the bicluster pattern was inferred from published microarray data. Our study shows that codon context bias of DENV, WNV and YFV sequences varies in a bicluster manner with that of specific sets of genes of A. aegypti. Many of these mosquito genes are known to be differentially expressed in response to flaviviral infection, suggesting that codon context sequences of A. aegypti and the flaviviruses may play a role in the susceptible interaction between flaviviruses and this mosquito. The bias in usages of codon context sequences likely has a functional association with susceptibility of A. aegypti to flaviviral infection. The results from this study will allow us to conduct hypothesis-driven tests to examine the role of codon context bias in evolution of vector-virus interactions at the molecular level. PMID:24838953
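The abstract defines codon context as pairs of codons for neighboring amino acids. A minimal sketch of the counting step is shown below. It assumes an in-frame coding sequence given as a plain string, and it covers only the tallying of adjacent codon pairs, not the mutual information or bicluster analysis described in the abstract; the function name is hypothetical.

```python
from collections import Counter


def codon_pair_counts(cds):
    """Count adjacent codon pairs in an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    # zip pairs each codon with its immediate 3' neighbour.
    return Counter(zip(codons, codons[1:]))
```

Summing such counters over all genes of a genome would give the observed pair frequencies that a bias statistic could then compare against expectations from single-codon usage.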
Education through Fiction: Acquiring Opinion-Forming Skills in the Context of Genomics
ERIC Educational Resources Information Center
Knippels, Marie-Christine P. J.; Severiens, Sabine E.; Klop, Tanja
2009-01-01
The present study examined the outcomes of a newly designed four-lesson science module on opinion-forming in the context of genomics in upper secondary education. The lesson plan aims to foster 16-year-old students' opinion-forming skills in the context of genomics and to test the effect of the use of fiction in the module. The basic hypothesis…
Identification of cis-suppression of human disease mutations by comparative genomics.
Jordan, Daniel M; Frangakis, Stephan G; Golzio, Christelle; Cassa, Christopher A; Kurtzberg, Joanne; Davis, Erica E; Sunyaev, Shamil R; Katsanis, Nicholas
2015-08-13
Patterns of amino acid conservation have served as a tool for understanding protein evolution. The same principles have also found broad application in human genomics, driven by the need to interpret the pathogenic potential of variants in patients. Here we performed a systematic comparative genomics analysis of human disease-causing missense variants. We found that an appreciable fraction of disease-causing alleles are fixed in the genomes of other species, suggesting a role for genomic context. We developed a model of genetic interactions that predicts most of these to be simple pairwise compensations. Functional testing of this model on two known human disease genes revealed discrete cis amino acid residues that, although benign on their own, could rescue the human mutations in vivo. This approach was also applied to ab initio gene discovery to support the identification of a de novo disease driver in BTG2 that is subject to protective cis-modification in more than 50 species. Finally, on the basis of our data and models, we developed a computational tool to predict candidate residues subject to compensation. Taken together, our data highlight the importance of cis-genomic context as a contributor to protein evolution; they provide an insight into the complexity of allele effect on phenotype; and they are likely to assist methods for predicting allele pathogenicity.
Sikora, Martin; Carpenter, Meredith L.; Moreno-Estrada, Andres; Henn, Brenna M.; Underhill, Peter A.; Sánchez-Quinto, Federico; Zara, Ilenia; Pitzalis, Maristella; Sidore, Carlo; Busonero, Fabio; Maschio, Andrea; Angius, Andrea; Jones, Chris; Mendoza-Revilla, Javier; Nekhrizov, Georgi; Dimitrova, Diana; Theodossiev, Nikola; Harkins, Timothy T.; Keller, Andreas; Maixner, Frank; Zink, Albert; Abecasis, Goncalo; Sanna, Serena; Cucca, Francesco; Bustamante, Carlos D.
2014-01-01
Genome sequencing of the 5,300-year-old mummy of the Tyrolean Iceman, found in 1991 on a glacier near the border of Italy and Austria, has yielded new insights into his origin and relationship to modern European populations. A key finding of that study was an apparent recent common ancestry with individuals from Sardinia, based largely on the Y chromosome haplogroup and common autosomal SNP variation. Here, we compiled and analyzed genomic datasets from both modern and ancient Europeans, including genome sequence data from over 400 Sardinians and two ancient Thracians from Bulgaria, to investigate this result in greater detail and determine its implications for the genetic structure of Neolithic Europe. Using whole-genome sequencing data, we confirm that the Iceman is, indeed, most closely related to Sardinians. Furthermore, we show that this relationship extends to other individuals from cultural contexts associated with the spread of agriculture during the Neolithic transition, in contrast to individuals from a hunter-gatherer context. We hypothesize that this genetic affinity of ancient samples from different parts of Europe with Sardinians represents a common genetic component that was geographically widespread across Europe during the Neolithic, likely related to migrations and population expansions associated with the spread of agriculture. PMID:24809476
Single genome retrieval of context-dependent variability in mutation rates for human germline.
Sahakyan, Aleksandr B; Balasubramanian, Shankar
2017-01-13
Accurate knowledge of the core components of substitution rates is of vital importance to understand genome evolution and dynamics. By performing a single-genome and direct analysis of 39,894 retrotransposon remnants, we reveal sequence context-dependent germline nucleotide substitution rates for the human genome. The rates are characterised through rate constants in a time-domain, and are made available through a dedicated program (Trek) and a stand-alone database. Due to the nature of the method design and the imposed stringency criteria, we expect our rate constants to be good estimates for the rates of spontaneous mutations. Benefiting from such data, we study the short-range nucleotide (up to 7-mer) organisation and the germline basal substitution propensity (BSP) profile of the human genome; characterise novel, CpG-independent, substitution prone and resistant motifs; confirm a decreased tendency of moieties with low BSP to undergo somatic mutations in a number of cancer types; and, produce a Trek-based estimate of the overall mutation rate in human. The extended set of rate constants we report may enrich our resources and help advance our understanding of genome dynamics and evolution, with possible implications for the role of spontaneous mutations in the emergence of pathological genotypes and neutral evolution of proteomes.
Peterson, Elena S; McCue, Lee Ann; Schrimpe-Rutledge, Alexandra C; Jensen, Jeffrey L; Walker, Hyunjoo; Kobold, Markus A; Webb, Samantha R; Payne, Samuel H; Ansong, Charles; Adkins, Joshua N; Cannon, William R; Webb-Robertson, Bobbie-Jo M
2012-04-05
The procedural aspects of genome sequencing and assembly have become relatively inexpensive, yet the full, accurate structural annotation of these genomes remains a challenge. Next-generation sequencing transcriptomics (RNA-Seq), global microarrays, and tandem mass spectrometry (MS/MS)-based proteomics have demonstrated immense value to genome curators as individual sources of information; however, integrating these data types to validate and improve structural annotation remains a major challenge. Current visual and statistical analytic tools are focused on a single data type, or existing software tools are retrofitted to analyze new data forms. We present Visual Exploration and Statistics to Promote Annotation (VESPA), a new interactive visual analysis software tool focused on assisting scientists with the annotation of prokaryotic genomes through the integration of proteomics and transcriptomics data with current genome location coordinates. VESPA is a desktop Java™ application that integrates high-throughput proteomics data (peptide-centric) and transcriptomics (probe or RNA-Seq) data into a genomic context, all of which can be visualized at three levels of genomic resolution. Data is interrogated via searches linked to the genome visualizations to find regions with high likelihood of mis-annotation. Search results are linked to exports for further validation outside of VESPA, or potential coding regions can be analyzed concurrently with the software through interaction with BLAST. VESPA is demonstrated on two use cases (Yersinia pestis Pestoides F and Synechococcus sp. PCC 7002) to show the rapid manner in which mis-annotations can be found and explored in VESPA using either proteomics data alone, or in combination with transcriptomic data. VESPA is an interactive visual analytics tool that integrates high-throughput data into a genomic context to facilitate the discovery of structural mis-annotations in prokaryotic genomes. Data is evaluated via visual analysis across multiple levels of genomic resolution, linked searches and interaction with existing bioinformatics tools. We highlight the novel functionality of VESPA and core programming requirements for visualization of these large heterogeneous datasets for a client-side application. The software is freely available at https://www.biopilot.org/docs/Software/Vespa.php.
2012-01-01
Background The procedural aspects of genome sequencing and assembly have become relatively inexpensive, yet the full, accurate structural annotation of these genomes remains a challenge. Next-generation sequencing transcriptomics (RNA-Seq), global microarrays, and tandem mass spectrometry (MS/MS)-based proteomics have demonstrated immense value to genome curators as individual sources of information; however, integrating these data types to validate and improve structural annotation remains a major challenge. Current visual and statistical analytic tools are focused on a single data type, or existing software tools are retrofitted to analyze new data forms. We present Visual Exploration and Statistics to Promote Annotation (VESPA), a new interactive visual analysis software tool focused on assisting scientists with the annotation of prokaryotic genomes through the integration of proteomics and transcriptomics data with current genome location coordinates. Results VESPA is a desktop Java™ application that integrates high-throughput proteomics data (peptide-centric) and transcriptomics (probe or RNA-Seq) data into a genomic context, all of which can be visualized at three levels of genomic resolution. Data is interrogated via searches linked to the genome visualizations to find regions with high likelihood of mis-annotation. Search results are linked to exports for further validation outside of VESPA, or potential coding regions can be analyzed concurrently with the software through interaction with BLAST. VESPA is demonstrated on two use cases (Yersinia pestis Pestoides F and Synechococcus sp. PCC 7002) to show the rapid manner in which mis-annotations can be found and explored in VESPA using either proteomics data alone, or in combination with transcriptomic data.
Conclusions VESPA is an interactive visual analytics tool that integrates high-throughput data into a genomic context to facilitate the discovery of structural mis-annotations in prokaryotic genomes. Data is evaluated via visual analysis across multiple levels of genomic resolution, linked searches and interaction with existing bioinformatics tools. We highlight the novel functionality of VESPA and core programming requirements for visualization of these large heterogeneous datasets for a client-side application. The software is freely available at https://www.biopilot.org/docs/Software/Vespa.php. PMID:22480257
Viral to metazoan marine plankton nucleotide sequences from the Tara Oceans expedition
Alberti, Adriana; Poulain, Julie; Engelen, Stefan; Labadie, Karine; Romac, Sarah; Ferrera, Isabel; Albini, Guillaume; Aury, Jean-Marc; Belser, Caroline; Bertrand, Alexis; Cruaud, Corinne; Da Silva, Corinne; Dossat, Carole; Gavory, Frédérick; Gas, Shahinaz; Guy, Julie; Haquelle, Maud; Jacoby, E'krame; Jaillon, Olivier; Lemainque, Arnaud; Pelletier, Eric; Samson, Gaëlle; Wessner, Mark; Bazire, Pascal; Beluche, Odette; Bertrand, Laurie; Besnard-Gonnet, Marielle; Bordelais, Isabelle; Boutard, Magali; Dubois, Maria; Dumont, Corinne; Ettedgui, Evelyne; Fernandez, Patricia; Garcia, Espérance; Aiach, Nathalie Giordanenco; Guerin, Thomas; Hamon, Chadia; Brun, Elodie; Lebled, Sandrine; Lenoble, Patricia; Louesse, Claudine; Mahieu, Eric; Mairey, Barbara; Martins, Nathalie; Megret, Catherine; Milani, Claire; Muanga, Jacqueline; Orvain, Céline; Payen, Emilie; Perroud, Peggy; Petit, Emmanuelle; Robert, Dominique; Ronsin, Murielle; Vacherie, Benoit; Acinas, Silvia G.; Royo-Llonch, Marta; Cornejo-Castillo, Francisco M.; Logares, Ramiro; Fernández-Gómez, Beatriz; Bowler, Chris; Cochrane, Guy; Amid, Clara; Hoopen, Petra Ten; De Vargas, Colomban; Grimsley, Nigel; Desgranges, Elodie; Kandels-Lewis, Stefanie; Ogata, Hiroyuki; Poulton, Nicole; Sieracki, Michael E.; Stepanauskas, Ramunas; Sullivan, Matthew B.; Brum, Jennifer R.; Duhaime, Melissa B.; Poulos, Bonnie T.; Hurwitz, Bonnie L.; Acinas, Silvia G.; Bork, Peer; Boss, Emmanuel; Bowler, Chris; De Vargas, Colomban; Follows, Michael; Gorsky, Gabriel; Grimsley, Nigel; Hingamp, Pascal; Iudicone, Daniele; Jaillon, Olivier; Kandels-Lewis, Stefanie; Karp-Boss, Lee; Karsenti, Eric; Not, Fabrice; Ogata, Hiroyuki; Pesant, Stéphane; Raes, Jeroen; Sardet, Christian; Sieracki, Michael E.; Speich, Sabrina; Stemmann, Lars; Sullivan, Matthew B.; Sunagawa, Shinichi; Wincker, Patrick; Pesant, Stéphane; Karsenti, Eric; Wincker, Patrick
2017-01-01
A unique collection of oceanic samples was gathered by the Tara Oceans expeditions (2009–2013), targeting plankton organisms ranging from viruses to metazoans, and providing rich environmental context measurements. Thanks to recent advances in the field of genomics, extensive sequencing has been performed for a deep genomic analysis of this huge collection of samples. A strategy based on different approaches, such as metabarcoding, metagenomics, single-cell genomics and metatranscriptomics, has been chosen for analysis of size-fractionated plankton communities. Here, we provide detailed procedures applied for genomic data generation, from nucleic acids extraction to sequence production, and we describe registries of genomics datasets available at the European Nucleotide Archive (ENA, www.ebi.ac.uk/ena). The association of these metadata to the experimental procedures applied for their generation will help the scientific community to access these data and facilitate their analysis. This paper complements other efforts to provide a full description of experiments and open science resources generated from the Tara Oceans project, further extending their value for the study of the world’s planktonic ecosystems. PMID:28763055
TabPath: interactive tables for metabolic pathway analysis.
Moraes, Lauro Ângelo Gonçalves de; Felestrino, Érica Barbosa; Assis, Renata de Almeida Barbosa; Matos, Diogo; Lima, Joubert de Castro; Lima, Leandro de Araújo; Almeida, Nalvo Franco; Setubal, João Carlos; Garcia, Camila Carrião Machado; Moreira, Leandro Marcio
2018-03-15
Information about metabolic pathways in a comparative context is one of the most powerful tools for understanding genome-based differences in phenotypes among organisms. Although several platforms exist that provide a wealth of information on metabolic pathways of diverse organisms, the comparison among organisms using metabolic pathways is still a difficult task. We present TabPath (Tables for Metabolic Pathway), a web-based tool to facilitate comparison of metabolic pathways in genomes based on KEGG. From a selection of pathways and genomes of interest on the menu, TabPath generates user-friendly tables that facilitate analysis of variations in metabolism among the selected organisms. TabPath is available at http://200.239.132.160:8686. lmmorei@gmail.com.
Minari, Jusaku; Shirai, Tetsuya; Kato, Kazuto
2014-12-01
As evidenced by high-throughput sequencers, genomic technologies have recently undergone radical advances. These technologies enable comprehensive sequencing of personal genomes considerably more efficiently and less expensively than heretofore. These developments present a challenge to the conventional framework of biomedical ethics; under these changing circumstances, each research project has to develop a pragmatic research policy. Based on the experience with a new large-scale project, the Genome Science Project, this article presents a novel approach to developing a specific policy for personal genome research in the Japanese context. In creating an original informed-consent form template for the project, we present a two-tiered process: first, drafting the template following an analysis of national and international policies; second, refining the draft template in conjunction with genome project researchers for practical application. Through practical use of the template, we have gained valuable experience in addressing challenges in the ethical review process, such as the importance of sharing details of the latest developments in genomics with members of research ethics committees. We discuss certain limitations of the conventional concept of informed consent and its governance system and suggest the potential of an alternative process using information technology.
Rueda, Manuel; Torkamani, Ali
2017-08-18
Whole genome and exome sequencing usually include reads containing mitochondrial DNA (mtDNA). Yet, state-of-the-art pipelines and services for human nuclear genome variant calling and annotation do not handle mitochondrial genome data appropriately. As a consequence, any researcher desiring to add mtDNA variant analysis to their investigations is forced to explore the literature for mtDNA pipelines, evaluate them, and implement their own instance of the desired tool. This task is far from trivial, and can be prohibitive for non-bioinformaticians. We have developed SG-ADVISER mtDNA, a web server to facilitate the analysis and interpretation of mtDNA genomic data coming from next generation sequencing (NGS) experiments. The server was built in the context of our SG-ADVISER framework and on top of the MToolBox platform (Calabrese et al., Bioinformatics 30(21):3115-3117, 2014), and includes most of its functionalities (i.e., assembly of mitochondrial genomes, heteroplasmic fractions, haplogroup assignment, functional and prioritization analysis of mitochondrial variants) as well as a back-end and a front-end interface. The server has been tested with unpublished data from 200 individuals of a healthy aging cohort (Erikson et al., Cell 165(4):1002-1011, 2016) and their data is made publicly available here along with a preliminary analysis of the variants. We observed that individuals over ~90 years old carried low levels of heteroplasmic variants in their genomes. SG-ADVISER mtDNA is a fast and functional tool that allows for variant calling and annotation of human mtDNA data coming from NGS experiments. The server was built with simplicity in mind, and builds on our own experience in interpreting mtDNA variants in the context of sudden death and rare diseases. Our objective is to provide an interface for non-bioinformaticians aiming to acquire (or contrast) mtDNA annotations via MToolBox.
SG-ADVISER web server is freely available to all users at https://genomics.scripps.edu/mtdna.
Genomic instability is a hallmark of human cancer, and results in widespread somatic copy number alterations. We used a genome-scale shRNA viability screen in human cancer cell lines to systematically identify genes that are essential in the context of particular copy-number alterations (copy-number associated gene dependencies). The most enriched class of copy-number associated gene dependencies was CYCLOPS (Copy-number alterations Yielding Cancer Liabilities Owing to Partial losS) genes, and spliceosome components were the most prevalent.
PSAT: A web tool to compare genomic neighborhoods of multiple prokaryotic genomes
Fong, Christine; Rohmer, Laurence; Radey, Matthew; Wasnick, Michael; Brittnacher, Mitchell J
2008-01-01
Background The conservation of gene order among prokaryotic genomes can provide valuable insight into gene function, protein interactions, or events by which genomes have evolved. Although some tools are available for visualizing and comparing the order of genes between genomes of study, few support an efficient and organized analysis between large numbers of genomes. The Prokaryotic Sequence homology Analysis Tool (PSAT) is a web tool for comparing gene neighborhoods among multiple prokaryotic genomes. Results PSAT utilizes a database that is preloaded with gene annotation, BLAST hit results, and gene-clustering scores designed to help identify regions of conserved gene order. Researchers use the PSAT web interface to find a gene of interest in a reference genome and efficiently retrieve the sequence homologs found in other bacterial genomes. The tool generates a graphic of the genomic neighborhood surrounding the selected gene and the corresponding regions for its homologs in each comparison genome. Homologs in each region are color coded to assist users with analyzing gene order among various genomes. In contrast to common comparative analysis methods that filter sequence homolog data based on alignment score cutoffs, PSAT leverages gene context information for homologs, including those with weak alignment scores, enabling a more sensitive analysis. Features for constraining or ordering results are designed to help researchers browse results from large numbers of comparison genomes in an organized manner. PSAT has been demonstrated to be useful for helping to identify gene orthologs and potential functional gene clusters, and detecting genome modifications that may result in loss of function. Conclusion PSAT allows researchers to investigate the order of genes within local genomic neighborhoods of multiple genomes. 
A PSAT web server for public use is available for performing analyses on a growing set of reference genomes through any web browser with no client side software setup or installation required. Source code is freely available to researchers interested in setting up a local version of PSAT for analysis of genomes not available through the public server. Access to the public web server and instructions for obtaining source code can be found at . PMID:18366802
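The neighborhood comparison described in the PSAT abstract can be illustrated with a minimal Python sketch. This is not PSAT's code: the function names, the `flank` parameter, the toy gene identifiers and the `family` mapping (gene to homolog family) are all hypothetical, and the sketch assumes a linear gene order with no wrap-around on circular chromosomes.

```python
from typing import Dict, List, Set


def neighborhood(genes: List[str], index: int, flank: int = 3) -> List[str]:
    """Return the genes within `flank` positions of genes[index].

    Assumes a linear gene order (no wrap-around for circular replicons).
    """
    start = max(0, index - flank)
    return genes[start:index + flank + 1]


def shared_families(ref: List[str], other: List[str],
                    family: Dict[str, str]) -> Set[str]:
    """Return homolog families that occur in both neighborhoods."""
    ref_fams = {family[g] for g in ref if g in family}
    other_fams = {family[g] for g in other if g in family}
    return ref_fams & other_fams


# Hypothetical toy data: gene order in two genomes and a gene -> family map.
genome_a = ["a1", "a2", "a3", "a4", "a5", "a6"]
genome_b = ["b1", "b2", "b3", "b4", "b5"]
family = {"a2": "F1", "a3": "F2", "a4": "F3", "a5": "F4",
          "b2": "F2", "b3": "F3", "b4": "F9"}

# Neighborhood of gene a3 and of its homolog b2 (both in family F2).
ref_hood = neighborhood(genome_a, genome_a.index("a3"), flank=1)
cmp_hood = neighborhood(genome_b, genome_b.index("b2"), flank=1)
conserved = shared_families(ref_hood, cmp_hood, family)
```

Counting shared families around a homolog pair is one simple way to let gene context support a weak sequence hit; an actual scoring scheme would likely also weigh gene order and orientation.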
Inferring genome-wide interplay landscape between DNA methylation and transcriptional regulation.
Tang, Binhua; Wang, Xin
2015-01-01
DNA methylation and transcriptional regulation play important roles in cancer cell development and differentiation processes. Based on the currently available cell line profiling information from the ENCODE Consortium, we propose a Bayesian inference model to infer and construct a genome-wide interaction landscape between DNA methylation and transcriptional regulation, which sheds light on the underlying complex functional mechanisms important within the human cancer and disease context. For the first time, we select all the currently available cell lines (>=20) and transcription factors (>=80) profiling information from the ENCODE Consortium portal. Through the integration of those genome-wide profiling sources, our genome-wide analysis detects multiple functional loci of interest, and indicates that DNA methylation is cell- and region-specific, due to the interplay mechanisms with transcription regulatory activities. We validate our analysis results with the corresponding RNA-sequencing technique for those detected genomic loci. Our results provide novel and meaningful insights into the interplay mechanisms of transcriptional regulation and gene expression for human cancer and disease studies.
Gupta, Amit Kumar; Kaur, Karambir; Rajput, Akanksha; Dhanda, Sandeep Kumar; Sehgal, Manika; Khan, Md. Shoaib; Monga, Isha; Dar, Showkat Ahmad; Singh, Sandeep; Nagpal, Gandharva; Usmani, Salman Sadullah; Thakur, Anamika; Kaur, Gazaldeep; Sharma, Shivangi; Bhardwaj, Aman; Qureshi, Abid; Raghava, Gajendra Pal Singh; Kumar, Manoj
2016-01-01
Current Zika virus (ZIKV) outbreaks, which have spread across several areas of Africa, Southeast Asia, and the Pacific islands, have been declared a global health emergency by the World Health Organization (WHO). ZIKV causes Zika fever and illness ranging from severe autoimmune to neurological complications in humans. To facilitate research on this virus, we have developed an integrative multi-omics platform, ZikaVR (http://bioinfo.imtech.res.in/manojk/zikavr/), dedicated to ZIKV genomic, proteomic and therapeutic knowledge. It comprises whole genome sequences and their respective functional information regarding proteins, genes, and structural content. Additionally, it delivers sophisticated analyses such as whole-genome alignments, conservation and variation, CpG islands, codon context, usage bias and phylogenetic inferences at the whole genome and proteome level, with a user-friendly visual environment. Further, glycosylation sites and molecular diagnostic primers were also analyzed. Most importantly, we also proposed potential therapeutically imperative constituents, namely vaccine epitopes, siRNAs, miRNAs, sgRNAs and repurposing drug candidates. PMID:27633273
COGNAT: a web server for comparative analysis of genomic neighborhoods.
Klimchuk, Olesya I; Konovalov, Kirill A; Perekhvatov, Vadim V; Skulachev, Konstantin V; Dibrova, Daria V; Mulkidjanian, Armen Y
2017-11-22
In prokaryotic genomes, functionally coupled genes can be organized in conserved gene clusters enabling their coordinated regulation. Such clusters could contain one or several operons, which are groups of co-transcribed genes. Those genes that evolved from a common ancestral gene by speciation (i.e. orthologs) are expected to have similar genomic neighborhoods in different organisms, whereas those copies of the gene that are responsible for dissimilar functions (i.e. paralogs) could be found in dissimilar genomic contexts. Comparative analysis of genomic neighborhoods facilitates the prediction of co-regulated genes and helps to discern different functions in large protein families. We intended, building on the attribution of gene sequences to the clusters of orthologous groups of proteins (COGs), to provide a method for visualization and comparative analysis of genomic neighborhoods of evolutionarily related genes, as well as a respective web server. Here we introduce the COmparative Gene Neighborhoods Analysis Tool (COGNAT), a web server for comparative analysis of genomic neighborhoods. The tool is based on the COG database, as well as the Pfam protein families database. As an example, we show the utility of COGNAT in identifying a new type of membrane protein complex that is formed by paralog(s) of one of the membrane subunits of the NADH:quinone oxidoreductase of type 1 (COG1009) and a cytoplasmic protein of unknown function (COG3002). This article was reviewed by Drs. Igor Zhulin, Uri Gophna and Igor Rogozin.
Genome-wide network analysis of Wnt signaling in three pediatric cancers
NASA Astrophysics Data System (ADS)
Bao, Ju; Lee, Ho-Jin; Zheng, Jie J.
2013-10-01
Genomic structural alteration is common in pediatric cancers, and analysis of data generated by the Pediatric Cancer Genome Project reveals such tumor-related alterations in many Wnt signaling-associated genes. Most pediatric cancers are thought to arise within developing tissues that undergo substantial expansion during early organ formation, growth and maturation, and Wnt signaling plays an important role in this development. We examined three pediatric tumors--medulloblastoma, early T-cell precursor acute lymphoblastic leukemia, and retinoblastoma--that show multiple genomic structural variations within Wnt signaling pathways. We mathematically modeled this pathway to investigate the effects of cancer-related structural variations on Wnt signaling. Surprisingly, we found that an outcome measure of canonical Wnt signaling was consistently similar in matched cancer cells and normal cells, even in the context of different cancers, different mutations, and different Wnt-related genes. Our results suggest that the cancer cells maintain a normal level of Wnt signaling by developing multiple mutations.
Tebel, Katrin; Boldt, Vivien; Steininger, Anne; Port, Matthias; Ebert, Grit; Ullmann, Reinhard
2017-01-06
The analysis of DNA copy number variants (CNV) has increasing impact in the field of genetic diagnostics and research. However, the interpretation of CNV data derived from high resolution array CGH or NGS platforms is complicated by the considerable variability of the human genome. Therefore, tools for multidimensional data analysis and comparison of patient cohorts are needed to assist in the discrimination of clinically relevant CNVs from others. We developed GenomeCAT, a standalone Java application for the analysis and integrative visualization of CNVs. GenomeCAT is composed of three modules dedicated to the inspection of single cases, comparative analysis of multidimensional data and group comparisons aiming at the identification of recurrent aberrations in patients sharing the same phenotype, respectively. Its flexible import options ease the comparative analysis of own results derived from microarray or NGS platforms with data from literature or public depositories. Multidimensional data obtained from different experiment types can be merged into a common data matrix to enable common visualization and analysis. All results are stored in the integrated MySQL database, but can also be exported as tab delimited files for further statistical calculations in external programs. GenomeCAT offers a broad spectrum of visualization and analysis tools that assist in the evaluation of CNVs in the context of other experiment data and annotations. The use of GenomeCAT does not require any specialized computer skills. The various R packages implemented for data analysis are fully integrated into GenomeCAT's graphical user interface and the installation process is supported by a wizard. The flexibility in terms of data import and export in combination with the ability to create a common data matrix makes the program also well suited as an interface between genomic data from heterogeneous sources and external software tools.
Due to the modular architecture the functionality of GenomeCAT can be easily extended by further R packages or customized plug-ins to meet future requirements.
Network analysis of transcriptomics expands regulatory landscapes in Synechococcus sp. PCC 7002
DOE Office of Scientific and Technical Information (OSTI.GOV)
McClure, Ryan S.; Overall, Christopher C.; McDermott, Jason E.
Cyanobacterial regulation of gene expression must contend with a genome organization that lacks apparent functional context, as the majority of cellular processes and metabolic pathways are encoded by genes found at disparate locations across the genome. In addition, the fact that coordinated regulation of cyanobacterial cellular machinery takes place with significantly fewer transcription factors, compared to other Eubacteria, suggests the involvement of post-transcriptional mechanisms and regulatory adaptations which are not fully understood. Global transcript abundance from model cyanobacterium Synechococcus sp. PCC 7002 grown under 42 different conditions was analyzed using context-likelihood of relatedness. The resulting 903-gene network, which was organized into 11 modules, not only allowed classification of cyanobacterial responses to specific environmental variables but provided insight into the transcriptional network topology and led to the expansion of predicted regulons. When used in conjunction with genome sequence, the global transcript abundance allowed identification of putative post-transcriptional changes in expression as well as novel potential targets of both DNA binding proteins and asRNA regulators. The results offer a new perspective into the multi-level regulation that governs cellular adaptations of fast-growing physiologically robust cyanobacterium Synechococcus sp. PCC 7002 to changing environmental variables. It also extends a methodological knowledge-based framework for studying multi-scale regulatory mechanisms that operate in cyanobacteria. Finally, it provides valuable context for integrating systems-level data to enhance evidence-driven genomic annotation, especially in organisms where traditional context analyses cannot be implemented due to lack of operon-based functional organization.
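The context-likelihood of relatedness (CLR) step named in this abstract can be sketched as follows. The sketch follows the commonly described form of CLR, in which each pairwise mutual information value is converted to a z-score against the background of each of the two genes and the two z-scores are combined; it is not the authors' pipeline. It assumes a precomputed symmetric mutual-information matrix, and the function name `clr_scores` and the toy matrix are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import List


def clr_scores(mi: List[List[float]]) -> List[List[float]]:
    """Combine per-gene z-scores of mutual information into CLR scores.

    `mi` is assumed to be a symmetric n x n matrix of pairwise mutual
    information; diagonal entries are ignored.
    """
    n = len(mi)
    mu, sigma = [], []
    for i in range(n):
        # Background for gene i: its MI values with every other gene.
        row = [mi[i][j] for j in range(n) if j != i]
        mu.append(mean(row))
        sigma.append(pstdev(row))

    def z(i: int, j: int) -> float:
        # Negative z-scores are clipped to zero; a flat background gives 0.
        if sigma[i] == 0:
            return 0.0
        return max(0.0, (mi[i][j] - mu[i]) / sigma[i])

    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = math.sqrt(z(i, j) ** 2 + z(j, i) ** 2)
            scores[i][j] = scores[j][i] = s
    return scores


# Hypothetical toy matrix: genes 0 and 1 stand out against a flat background.
toy_mi = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
]
toy_scores = clr_scores(toy_mi)
```

Edges are then typically kept when their score exceeds a chosen threshold; the threshold behind the 903-gene network is not stated in the abstract.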
Variation resources at UC Santa Cruz.
Thomas, Daryl J; Trumbower, Heather; Kern, Andrew D; Rhead, Brooke L; Kuhn, Robert M; Haussler, David; Kent, W James
2007-01-01
The variation resources within the University of California Santa Cruz Genome Browser include polymorphism data drawn from public collections and analyses of these data, along with their display in the context of other genomic annotations. Primary data from dbSNP is included for many organisms, with added information including genomic alleles and orthologous alleles for closely related organisms. Display filtering and coloring is available by variant type, functional class or other annotations. Annotation of potential errors is highlighted and a genomic alignment of the variant's flanking sequence is displayed. HapMap allele frequencies and linkage disequilibrium (LD) are available for each HapMap population, along with non-human primate alleles. The browsing and analysis tools, downloadable data files and links to documentation and other information can be found at http://genome.ucsc.edu/.
Ensembl Genomes 2013: scaling up access to genome-wide data.
Kersey, Paul Julian; Allen, James E; Christensen, Mikkel; Davis, Paul; Falin, Lee J; Grabmueller, Christoph; Hughes, Daniel Seth Toney; Humphrey, Jay; Kerhornou, Arnaud; Khobova, Julia; Langridge, Nicholas; McDowall, Mark D; Maheswari, Uma; Maslen, Gareth; Nuhn, Michael; Ong, Chuang Kee; Paulini, Michael; Pedro, Helder; Toneva, Iliana; Tuli, Mary Ann; Walts, Brandon; Williams, Gareth; Wilson, Derek; Youens-Clark, Ken; Monaco, Marcela K; Stein, Joshua; Wei, Xuehong; Ware, Doreen; Bolser, Daniel M; Howe, Kevin Lee; Kulesha, Eugene; Lawson, Daniel; Staines, Daniel Michael
2014-01-01
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species. The project exploits and extends technologies for genome annotation, analysis and dissemination, developed in the context of the vertebrate-focused Ensembl project, and provides a complementary set of resources for non-vertebrate species through a consistent set of programmatic and interactive interfaces. These provide access to data including reference sequence, gene models, transcriptional data, polymorphisms and comparative analysis. This article provides an update to the previous publications about the resource, with a focus on recent developments. These include the addition of important new genomes (and related data sets) including crop plants, vectors of human disease and eukaryotic pathogens. In addition, the resource has scaled up its representation of bacterial genomes, and now includes the genomes of over 9000 bacteria. Specific extensions to the web and programmatic interfaces have been developed to support users in navigating these large data sets. Looking forward, analytic tools to allow targeted selection of data for visualization and download are likely to become increasingly important in future as the number of available genomes increases within all domains of life, and some of the challenges faced in representing bacterial data are likely to become commonplace for eukaryotes in future.
WholePathwayScope: a comprehensive pathway-based analysis tool for high-throughput data
Yi, Ming; Horton, Jay D; Cohen, Jonathan C; Hobbs, Helen H; Stephens, Robert M
2006-01-01
Background Analysis of High Throughput (HTP) Data such as microarray and proteomics data has provided a powerful methodology to study patterns of gene regulation at genome scale. A major unresolved problem in the post-genomic era is to assemble the large amounts of data generated into a meaningful biological context. We have developed a comprehensive software tool, WholePathwayScope (WPS), for deriving biological insights from analysis of HTP data. Result WPS extracts gene lists with shared biological themes through color cue templates. WPS statistically evaluates global functional category enrichment of gene lists and pathway-level pattern enrichment of data. WPS incorporates well-known biological pathways from KEGG (Kyoto Encyclopedia of Genes and Genomes) and Biocarta, GO (Gene Ontology) terms as well as user-defined pathways or relevant gene clusters or groups, and explores gene-term relationships within the derived gene-term association networks (GTANs). WPS simultaneously compares multiple datasets within biological contexts either as pathways or as association networks. WPS also integrates Genetic Association Database and Partial MedGene Database for disease-association information. We have used this program to analyze and compare microarray and proteomics datasets derived from a variety of biological systems. Application examples demonstrated the capacity of WPS to significantly facilitate the analysis of HTP data for integrative discovery. Conclusion This tool represents a pathway-based platform for discovery integration to maximize analysis power. The tool is freely available at . PMID:16423281
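Functional category enrichment of a gene list, as mentioned in the WPS abstract, is often assessed with a one-sided hypergeometric test. The abstract does not say which statistic WPS uses, so the Python sketch below is an assumption rather than a description of the tool; the function name and the argument names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int,
                           sample: int, hits: int) -> float:
    """Upper-tail hypergeometric p-value P(X >= hits).

    population: genes in the background set
    annotated:  background genes belonging to the category
    sample:     genes in the list of interest
    hits:       genes in the list that belong to the category
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # math.comb returns 0 when k > n, so impossible terms drop out.
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return favourable / total


# Toy example: 10 background genes, 4 in the category, list of 3 with 2 hits.
p_value = hypergeom_enrichment_p(10, 4, 3, 2)
```

When many categories are tested at once, the resulting p-values would normally be corrected for multiple testing.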
Macas, Jiří; Novák, Petr; Pellicer, Jaume; Čížková, Jana; Koblížková, Andrea; Neumann, Pavel; Fuková, Iva; Doležel, Jaroslav; Kelly, Laura J; Leitch, Ilia J
2015-01-01
The differential accumulation and elimination of repetitive DNA are key drivers of genome size variation in flowering plants, yet there have been few studies which have analysed how different types of repeats in related species contribute to genome size evolution within a phylogenetic context. This question is addressed here by conducting large-scale comparative analysis of repeats in 23 species from four genera of the monophyletic legume tribe Fabeae, representing a 7.6-fold variation in genome size. Phylogenetic analysis and genome size reconstruction revealed that this diversity arose from genome size expansions and contractions in different lineages during the evolution of Fabeae. Employing a combination of low-pass genome sequencing with novel bioinformatic approaches resulted in identification and quantification of repeats making up 55-83% of the investigated genomes. In turn, this enabled an analysis of how each major repeat type contributed to the genome size variation encountered. Differential accumulation of repetitive DNA was found to account for 85% of the genome size differences between the species, and most (57%) of this variation was found to be driven by a single lineage of Ty3/gypsy LTR-retrotransposons, the Ogre elements. Although the amounts of several other lineages of LTR-retrotransposons and the total amount of satellite DNA were also positively correlated with genome size, their contributions to genome size variation were much smaller (up to 6%). Repeat analysis within a phylogenetic framework also revealed profound differences in the extent of sequence conservation between different repeat types across Fabeae. In addition to these findings, the study has provided a proof of concept for the approach combining recent developments in sequencing and bioinformatics to perform comparative analyses of repetitive DNAs in a large number of non-model species without the need to assemble their genomes.
ITEP: an integrated toolkit for exploration of microbial pan-genomes.
Benedict, Matthew N; Henriksen, James R; Metcalf, William W; Whitaker, Rachel J; Price, Nathan D
2014-01-03
Comparative genomics is a powerful approach for studying variation in physiological traits as well as the evolution and ecology of microorganisms. Recent technological advances have enabled sequencing large numbers of related genomes in a single project, requiring computational tools for their integrated analysis. In particular, accurate annotations and identification of gene presence and absence are critical for understanding and modeling the cellular physiology of newly sequenced genomes. Although many tools are available to compare the gene contents of related genomes, new tools are necessary to enable close examination and curation of protein families from large numbers of closely related organisms, to integrate curation with the analysis of gain and loss, and to generate metabolic networks linking the annotations to observed phenotypes. We have developed ITEP, an Integrated Toolkit for Exploration of microbial Pan-genomes, to curate protein families, compute similarities to externally-defined domains, analyze gene gain and loss, and generate draft metabolic networks from one or more curated reference network reconstructions in groups of related microbial species among which the combination of core and variable genes constitutes their "pan-genomes". The ITEP toolkit consists of: (1) a series of modular command-line scripts for identification, comparison, curation, and analysis of protein families and their distribution across many genomes; (2) a set of Python libraries for programmatic access to the same data; and (3) pre-packaged scripts to perform common analysis workflows on a collection of genomes. ITEP's capabilities include de novo protein family prediction, ortholog detection, analysis of functional domains, identification of core and variable genes and gene regions, sequence alignments and tree generation, annotation curation, and the integration of cross-genome analysis and metabolic networks for study of metabolic network evolution.
ITEP is a powerful, flexible toolkit for generation and curation of protein families. ITEP's modular design allows for straightforward extension as analysis methods and tools evolve. By integrating comparative genomics with the development of draft metabolic networks, ITEP harnesses the power of comparative genomics to build confidence in links between genotype and phenotype and helps disambiguate gene annotations when they are evaluated in both evolutionary and metabolic network contexts.
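The distinction between core and variable genes in a pan-genome can be shown with a short Python sketch. It does not use ITEP's libraries or scripts; the function name, the genome labels and the protein family identifiers are hypothetical, and each genome is assumed to be already reduced to the set of protein families it contains.

```python
from typing import Dict, Set, Tuple


def core_and_variable(presence: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split protein families into core (in every genome) and variable (the rest).

    `presence` maps a genome name to the set of protein families found in it.
    """
    genomes = list(presence.values())
    if not genomes:
        return set(), set()
    pan = set().union(*genomes)          # every family seen in any genome
    core = set.intersection(*genomes)    # families shared by all genomes
    return core, pan - core


# Hypothetical toy input: three genomes and their protein families.
toy_presence = {
    "genome_1": {"famA", "famB", "famC"},
    "genome_2": {"famA", "famB", "famD"},
    "genome_3": {"famA", "famB"},
}
core, variable = core_and_variable(toy_presence)
```

Together the two sets make up the pan-genome; with draft genomes, a family missing from a single assembly may reflect incomplete sequencing rather than true absence, so a strict intersection can underestimate the core.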
Identification of cis-suppression of human disease mutations by comparative genomics
Jordan, Daniel M.; Frangakis, Stephan G.; Golzio, Christelle; Cassa, Christopher A.; Kurtzberg, Joanne; Davis, Erica E.; Sunyaev, Shamil R.; Katsanis, Nicholas
2015-01-01
Patterns of amino acid conservation have served as a tool for understanding protein evolution. The same principles have also found broad application in human genomics, driven by the need to interpret the pathogenic potential of variants in patients. Here we performed a systematic comparative genomics analysis of human disease-causing missense variants. We found that an appreciable fraction of disease-causing alleles are fixed in the genomes of other species, suggesting a role for genomic context. We developed a model of genetic interactions that predicts most of these to be simple pairwise compensations. Functional testing of this model on two known human disease genes revealed discrete cis amino acid residues that, although benign on their own, could rescue the human mutations in vivo. This approach was also applied to ab initio gene discovery to support the identification of a de novo disease driver in BTG2 that is subject to protective cis-modification in more than 50 species. Finally, on the basis of our data and models, we developed a computational tool to predict candidate residues subject to compensation. Taken together, our data highlight the importance of cis-genomic context as a contributor to protein evolution; they provide an insight into the complexity of allele effect on phenotype; and they are likely to assist methods for predicting allele pathogenicity. PMID:26123021
Weniger, Markus; Engelmann, Julia C; Schultz, Jörg
2007-01-01
Background Regulation of gene expression is relevant to many areas of biology and medicine, in the study of treatments, diseases, and developmental stages. Microarrays can be used to measure the expression level of thousands of mRNAs at the same time, allowing insight into or comparison of different cellular conditions. The data derived from microarray experiments are high-dimensional and often noisy, and interpretation of the results can be intricate. Although programs for the statistical analysis of microarray data exist, most of them lack an integration of analysis results and biological interpretation. Results We have developed GEPAT, Genome Expression Pathway Analysis Tool, offering an analysis of gene expression data under genomic, proteomic and metabolic context. We provide an integration of statistical methods for data import and data analysis together with a biological interpretation for subsets of probes or single probes on the chip. GEPAT imports various types of oligonucleotide and cDNA array data formats. Different normalization methods can be applied to the data; afterwards, data annotation is performed. After import, GEPAT offers various statistical data analysis methods, such as hierarchical, k-means and PCA clustering, a linear model based t-test or chromosomal profile comparison. The results of the analysis can be interpreted by enrichment of biological terms, pathway analysis or interaction networks. Different biological databases are included to provide various information for each probe on the chip. GEPAT does not impose a linear workflow, but allows the usage of any subset of probes and samples as a start for a new data analysis. GEPAT relies on established data analysis packages, offers a modular approach for easy extension, and can be run on a computer grid to allow a large number of users. It is freely available under the LGPL open source license for academic and commercial users at .
Conclusion GEPAT is a modular, scalable and professional-grade software integrating analysis and interpretation of microarray gene expression data. An installation available for academic users can be found at . PMID:17543125
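The abstract above mentions interpreting results by enrichment of biological terms but does not say which statistic GEPAT uses. As a minimal sketch, assuming a one-sided hypergeometric test (a common choice, not confirmed for GEPAT), the enrichment p-value for a selected probe subset could be computed as follows. The function name and its parameters are hypothetical.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int, selected: int, hits: int) -> float:
    """One-sided hypergeometric p-value P(X >= hits).

    population: probes on the chip
    annotated:  probes on the chip carrying the term of interest
    selected:   probes in the subset under study
    hits:       probes in the subset carrying the term
    Assumption: this is an illustrative test, not GEPAT's documented method.
    """
    if not 0 <= hits <= min(annotated, selected):
        raise ValueError("hits must lie between 0 and min(annotated, selected)")
    upper = min(annotated, selected)
    total = comb(population, selected)
    # Sum the probability mass of observing `hits` or more annotated probes.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(hits, upper + 1)
    )
    return tail / total
```

For example, with 10 probes of which 3 carry a term, selecting 3 probes and finding all 3 annotated gives `enrichment_p_value(10, 3, 3, 3)`, i.e. 1 chance in 120 under random selection.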
PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses
Purcell, Shaun ; Neale, Benjamin ; Todd-Brown, Kathe ; Thomas, Lori ; Ferreira, Manuel A. R. ; Bender, David ; Maller, Julian ; Sklar, Pamela ; de Bakker, Paul I. W. ; Daly, Mark J. ; Sham, Pak C.
2007-01-01
Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis. PMID:17701901
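To make the identity-by-state idea in the abstract above concrete, the following is a minimal sketch of a pairwise IBS proportion. It assumes genotypes are coded as minor-allele counts (0, 1 or 2) with `None` for missing calls; the function name is hypothetical and this is not PLINK's implementation or command-line interface.

```python
from typing import Optional, Sequence


def ibs_proportion(a: Sequence[Optional[int]], b: Sequence[Optional[int]]) -> float:
    """Mean proportion of alleles shared identical by state between two individuals.

    Each genotype is a minor-allele count (0, 1 or 2); None marks a missing call.
    Markers missing in either individual are skipped.
    """
    if len(a) != len(b):
        raise ValueError("genotype vectors must have the same length")
    shared = 0
    informative = 0
    for ga, gb in zip(a, b):
        if ga is None or gb is None:
            continue
        # Two diploid genotypes share 2, 1 or 0 alleles by state.
        shared += 2 - abs(ga - gb)
        informative += 1
    if informative == 0:
        raise ValueError("no marker genotyped in both individuals")
    return shared / (2 * informative)
```

A value of 1.0 means the two individuals carry identical genotypes at every jointly genotyped marker; lower values indicate greater allelic divergence.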
Precision medicine in pediatric oncology: Lessons learned and next steps
Mody, Rajen J.; Prensner, John R.; Everett, Jessica; Parsons, D. Williams; Chinnaiyan, Arul M.
2017-01-01
The maturation of genomic technologies has enabled new discoveries in disease pathogenesis as well as new approaches to patient care. In pediatric oncology, patients may now receive individualized genomic analysis to identify molecular aberrations of relevance for diagnosis and/or treatment. In this context, several recent clinical studies have begun to explore the feasibility and utility of genomics-driven precision medicine. Here, we review the major developments in this field, discuss current limitations, and explore aspects of the clinical implementation of precision medicine, which lack consensus. Lastly, we discuss ongoing scientific efforts in this arena, which may yield future clinical applications. PMID:27748023
Deciphering the genomic targets of alkylating polyamide conjugates using high-throughput sequencing
Chandran, Anandhakumar; Syed, Junetha; Taylor, Rhys D.; Kashiwazaki, Gengo; Sato, Shinsuke; Hashiya, Kaori; Bando, Toshikazu; Sugiyama, Hiroshi
2016-01-01
Chemically engineered small molecules targeting specific genomic sequences play an important role in drug development research. Pyrrole-imidazole polyamides (PIPs) are a group of molecules that can bind to the DNA minor-groove and can be engineered to target specific sequences. Their biological effects rely primarily on their selective DNA binding. However, the binding mechanism of PIPs at the chromatinized genome level is poorly understood. Herein, we report a method using high-throughput sequencing to identify the DNA-alkylating sites of PIP-indole-seco-CBI conjugates. High-throughput sequencing analysis of conjugate 2 showed highly similar DNA-alkylating sites on synthetic oligos (histone-free DNA) and on human genomes (chromatinized DNA context). To our knowledge, this is the first report identifying alkylation sites across genomic DNA by alkylating PIP conjugates using high-throughput sequencing. PMID:27098039
Positional orthology: putting genomic evolutionary relationships into context.
Dewey, Colin N
2011-09-01
Orthology is a powerful refinement of homology that allows us to describe more precisely the evolution of genomes and understand the function of the genes they contain. However, because orthology is not concerned with genomic position, it is limited in its ability to describe genes that are likely to have equivalent roles in different genomes. Because of this limitation, the concept of ‘positional orthology’ has emerged, which describes the relation between orthologous genes that retain their ancestral genomic positions. In this review, we formally define this concept, for which we introduce the shorter term ‘toporthology’, with respect to the evolutionary events experienced by a gene’s ancestors. Through a discussion of recent studies on the role of genomic context in gene evolution, we show that the distinction between orthology and toporthology is biologically significant. We then review a number of orthology prediction methods that take genomic context into account and thus that may be used to infer the important relation of toporthology. PMID:21705766
Analysis of Multiallelic CNVs by Emulsion Haplotype Fusion PCR.
Tyson, Jess; Armour, John A L
2017-01-01
Emulsion-fusion PCR recovers long-range sequence information by combining products in cis from individual genomic DNA molecules. Emulsion droplets act as very numerous small reaction chambers in which different PCR products from a single genomic DNA molecule are condensed into short joint products, to unite sequences in cis from widely separated genomic sites. These products can therefore provide information about the arrangement of sequences and variants at a larger scale than established long-read sequencing methods. The method has been useful in defining the phase of variants in haplotypes, the typing of inversions, and determining the configuration of sequence variants in multiallelic CNVs. In this description we outline the rationale for the application of emulsion-fusion PCR methods to the analysis of multiallelic CNVs, and give practical details for our own implementation of the method in that context.
Gene expression profiling--Opening the black box of plant ecosystem responses to global change
DOE Office of Scientific and Technical Information (OSTI.GOV)
Leakey, A.D.B.; Ainsworth, E.A.; Bernard, S.M.
The use of genomic techniques to address ecological questions is emerging as the field of genomic ecology. Experimentation under environmentally realistic conditions to investigate the molecular response of plants to meaningful changes in growth conditions and ecological interactions is the defining feature of genomic ecology. Since the impact of global change factors on plant performance is mediated by direct effects at the molecular, biochemical and physiological scales, gene expression analysis promises important advances in understanding factors that have previously been consigned to the 'black box' of unknown mechanism. Various tools and approaches are available for assessing gene expression in model and non-model species as part of global change biology studies. Each approach has its own unique advantages and constraints. A first generation of genomic ecology studies in managed ecosystems and mesocosms has provided a testbed for the approach and has begun to reveal how the experimental design and data analysis of gene expression studies can be tailored for use in an ecological context.
Epigenetics of human papillomaviruses
DOE Office of Scientific and Technical Information (OSTI.GOV)
Johannsen, Eric; Department of Medicine, School of Medicine and Public Health, University of Wisconsin, Madison, WI 53706; McArdle Laboratory for Cancer Research, School of Medicine and Public Health, University of Wisconsin, Madison, WI 53706
Human papillomaviruses (HPVs) are common human pathogens that infect cutaneous or mucosal epithelia in which they cause warts, self-contained benign lesions that commonly regress. The HPV life cycle is intricately tied to the differentiation of the host epithelium it infects. Mucosotropic HPVs are the most common sexually transmitted pathogen known to mankind. A subset of the mucosotropic HPVs, so-called high risk HPVs, is etiologically associated with numerous cancers of the anogenital tract, most notably the cervix, as well as a growing fraction of head and neck cancers. In these cancers, the HPV genome, which normally exists as a double-stranded, circular, nuclear plasmid, is commonly found integrated into the host genome and expresses two viral oncogenes, E6 and E7, that are implicated in the development and maintenance of the cancers caused by these high risk HPVs. Numerous studies, primarily on the high risk HPV16, have documented that the methylation status of the viral genome changes not only in the context of the viral life cycle but also in the context of the progressive neoplastic disease that culminates in cancer. In this article, we summarize the knowledge gained from those studies. We also provide the first analysis of available ChIP-seq data on the occupancy of both epigenetically modified histones as well as transcription factors on the high risk HPV18 genome in the context of HeLa cells, a cervical cancer-derived cell line that has been the subject of extensive analyses using this technique. Highlights: • Methylation status of HPV genomes is dynamic. • Changes are seen in both the viral life cycle and neoplasia. • Histone modification status at LCR is predictive of transcription factor occupancy. • Novel transcription factor binding noted by ChIP-seq.
Discovering novel subsystems using comparative genomics
Ferrer, Luciana; Shearer, Alexander G.; Karp, Peter D.
2011-01-01
Motivation: Key problems for computational genomics include discovering novel pathways in genome data, and discovering functional interaction partners for genes to define new members of partially elucidated pathways. Results: We propose a novel method for the discovery of subsystems from annotated genomes. For each gene pair, a score measuring the likelihood that the two genes belong to a same subsystem is computed using genome context methods. Genes are then grouped based on these scores, and the resulting groups are filtered to keep only high-confidence groups. Since the method is based on genome context analysis, it relies solely on structural annotation of the genomes. The method can be used to discover new pathways, find missing genes from a known pathway, find new protein complexes or other kinds of functional groups and assign function to genes. We tested the accuracy of our method in Escherichia coli K-12. In one configuration of the system, we find that 31.6% of the candidate groups generated by our method match a known pathway or protein complex closely, and that we rediscover 31.2% of all known pathways and protein complexes of at least 4 genes. We believe that a significant proportion of the candidates that do not match any known group in E.coli K-12 corresponds to novel subsystems that may represent promising leads for future laboratory research. We discuss in-depth examples of these findings. Availability: Predicted subsystems are available at http://brg.ai.sri.com/pwy-discovery/journal.html. Contact: lferrer@ai.sri.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21775308
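The abstract above describes scoring gene pairs, grouping genes by those scores, and filtering the groups, but does not specify the grouping algorithm. The sketch below assumes the simplest possible scheme: link pairs whose score meets a threshold, take connected components with a union-find structure, and drop groups below a minimum size. The function name, the threshold, and the gene identifiers are all hypothetical and stand in for the authors' actual procedure.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_genes(
    pair_scores: Dict[Tuple[str, str], float],
    threshold: float,
    min_size: int = 2,
) -> List[List[str]]:
    """Group genes into candidate subsystems by thresholded pairwise scores.

    Assumption: connected components over high-scoring pairs; the published
    method may group and filter differently.
    """
    parent: Dict[str, str] = {}

    def find(gene: str) -> str:
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving
            gene = parent[gene]
        return gene

    for (g1, g2), score in pair_scores.items():
        r1, r2 = find(g1), find(g2)  # registers both genes
        if score >= threshold and r1 != r2:
            parent[r1] = r2

    groups: Dict[str, set] = defaultdict(set)
    for gene in list(parent):
        groups[find(gene)].add(gene)

    # Keep only groups that are large enough to be interesting.
    kept = [sorted(members) for members in groups.values() if len(members) >= min_size]
    return sorted(kept, key=lambda members: members[0])
```

With scores `{("geneA", "geneB"): 0.9, ("geneB", "geneC"): 0.8, ("geneC", "geneD"): 0.1}` and a threshold of 0.5, the sketch returns one group containing geneA, geneB and geneC, while geneD is left out as a singleton.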
Snitkin, Evan S; Won, Sarah; Pirani, Ali; Lapp, Zena; Weinstein, Robert A; Lolans, Karen; Hayden, Mary K
2017-11-22
Development of effective strategies to limit the proliferation of multidrug-resistant organisms requires a thorough understanding of how such organisms spread among health care facilities. We sought to uncover the chains of transmission underlying a 2008 U.S. regional outbreak of carbapenem-resistant Klebsiella pneumoniae by performing an integrated analysis of genomic and interfacility patient-transfer data. Genomic analysis yielded a high-resolution transmission network that assigned directionality to regional transmission events and discriminated between intra- and interfacility transmission when epidemiologic data were ambiguous or misleading. Examining the genomic transmission network in the context of interfacility patient transfers (patient-sharing networks) supported the role of patient transfers in driving the outbreak, with genomic analysis revealing that a small subset of patient-transfer events was sufficient to explain regional spread. Further integration of the genomic and patient-sharing networks identified one nursing home as an important bridge facility early in the outbreak, a role that was not apparent from analysis of genomic or patient-transfer data alone. Last, we found that when simulating a real-time regional outbreak, our methodology was able to accurately infer the facility at which patients acquired their infections. This approach has the potential to identify facilities with high rates of intra- or interfacility transmission, data that will be useful for triggering targeted interventions to prevent further spread of multidrug-resistant organisms. Copyright © 2017 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
Scribl: an HTML5 Canvas-based graphics library for visualizing genomic data over the web.
Miller, Chase A; Anthony, Jon; Meyer, Michelle M; Marth, Gabor
2013-02-01
High-throughput biological research requires simultaneous visualization as well as analysis of genomic data, e.g. read alignments, variant calls and genomic annotations. Traditionally, such integrative analysis required desktop applications operating on locally stored data. Many current terabyte-size datasets generated by large public consortia projects, however, are already only feasibly stored at specialist genome analysis centers. As even small laboratories can afford very large datasets, local storage and analysis are becoming increasingly limiting, and it is likely that most such datasets will soon be stored remotely, e.g. in the cloud. These developments will require web-based tools that enable users to access, analyze and view vast remotely stored data with a level of sophistication and interactivity that approximates desktop applications. As rapidly dropping cost enables researchers to collect data intended to answer questions in very specialized contexts, developers must also provide software libraries that empower users to implement customized data analyses and data views for their particular application. Such specialized, yet lightweight, applications would empower scientists to better answer specific biological questions than possible with general-purpose genome browsers currently available. Using recent advances in core web technologies (HTML5), we developed Scribl, a flexible genomic visualization library specifically targeting coordinate-based data such as genomic features, DNA sequence and genetic variants. Scribl simplifies the development of sophisticated web-based graphical tools that approach the dynamism and interactivity of desktop applications. Software is freely available online at http://chmille4.github.com/Scribl/ and is implemented in JavaScript with all modern browsers supported.
The Genome of Naegleria gruberi Illuminates Early Eukaryotic Versatility
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fritz-Laylin, Lillian K.; Prochnik, Simon E.; Ginger, Michael L.
2010-03-01
Genome sequences of diverse free-living protists are essential for understanding eukaryotic evolution and molecular and cell biology. The free-living amoeboflagellate Naegleria gruberi belongs to a varied and ubiquitous protist clade (Heterolobosea) that diverged from other eukaryotic lineages over a billion years ago. Analysis of the 15,727 protein-coding genes encoded by Naegleria's 41 Mb nuclear genome indicates a capacity for both aerobic respiration and anaerobic metabolism with concomitant hydrogen production, with fundamental implications for the evolution of organelle metabolism. The Naegleria genome facilitates substantially broader phylogenomic comparisons of free-living eukaryotes than previously possible, allowing us to identify thousands of genes likely present in the pan-eukaryotic ancestor, with 40% likely eukaryotic inventions. Moreover, we construct a comprehensive catalog of amoeboid-motility genes. The Naegleria genome, analyzed in the context of other protists, reveals a remarkably complex ancestral eukaryote with a rich repertoire of cytoskeletal, sexual, signaling, and metabolic modules.
Ou, Hong-Yu; He, Xinyi; Harrison, Ewan M.; Kulasekara, Bridget R.; Thani, Ali Bin; Kadioglu, Aras; Lory, Stephen; Hinton, Jay C. D.; Barer, Michael R.; Rajakumar, Kumar
2007-01-01
MobilomeFINDER (http://mml.sjtu.edu.cn/MobilomeFINDER) is an interactive online tool that facilitates bacterial genomic island or ‘mobile genome’ (mobilome) discovery; it integrates the ArrayOme and tRNAcc software packages. ArrayOme utilizes a microarray-derived comparative genomic hybridization input data set to generate ‘inferred contigs’ produced by merging adjacent genes classified as ‘present’. Collectively these ‘fragments’ represent a hypothetical ‘microarray-visualized genome (MVG)’. ArrayOme permits recognition of discordances between physical genome and MVG sizes, thereby enabling identification of strains rich in microarray-elusive novel genes. Individual tRNAcc tools facilitate automated identification of genomic islands by comparative analysis of the contents and contexts of tRNA sites and other integration hotspots in closely related sequenced genomes. Accessory tools facilitate design of hotspot-flanking primers for in silico and/or wet-science-based interrogation of cognate loci in unsequenced strains and analysis of islands for features suggestive of foreign origins; island-specific and genome-contextual features are tabulated and represented in schematic and graphical forms. To date we have used MobilomeFINDER to analyse several Enterobacteriaceae, Pseudomonas aeruginosa and Streptococcus suis genomes. MobilomeFINDER enables high-throughput island identification and characterization through increased exploitation of emerging sequence data and PCR-based profiling of unsequenced test strains; subsequent targeted yeast recombination-based capture permits full-length sequencing and detailed functional studies of novel genomic islands. PMID:17537813
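The ArrayOme step described above, merging adjacent genes classified as 'present' into 'inferred contigs' and summing them into a microarray-visualized genome (MVG), can be sketched as follows. This is an illustration under stated assumptions, not ArrayOme's code: genes are assumed to arrive sorted by position on the reference genome, coordinates are assumed 1-based and inclusive, and the function names and tuple layout are hypothetical.

```python
from typing import Dict, List, Tuple

Gene = Tuple[str, int, int, bool]  # (gene_id, start, end, present) -- hypothetical layout


def inferred_contigs(genes: List[Gene]) -> List[Dict]:
    """Merge runs of consecutive 'present' genes into inferred contigs."""
    contigs: List[Dict] = []
    current = None
    for gene_id, start, end, present in genes:
        if not present:
            current = None  # an absent gene breaks the current run
            continue
        if current is None:
            current = {"start": start, "end": end, "genes": [gene_id]}
            contigs.append(current)
        else:
            current["end"] = max(current["end"], end)
            current["genes"].append(gene_id)
    return contigs


def mvg_size(contigs: List[Dict]) -> int:
    """Total length of all inferred contigs (the hypothetical MVG size)."""
    return sum(c["end"] - c["start"] + 1 for c in contigs)
```

Comparing `mvg_size(...)` with a physically measured genome size is the kind of discordance check the abstract attributes to ArrayOme; a much larger physical genome would suggest genes that the microarray cannot detect.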
Vallenet, David; Calteau, Alexandra; Cruveiller, Stéphane; Gachet, Mathieu; Lajus, Aurélie; Josso, Adrien; Mercier, Jonathan; Renaux, Alexandre; Rollin, Johan; Rouy, Zoe; Roche, David; Scarpelli, Claude; Médigue, Claudine
2017-01-01
The annotation of genomes from NGS platforms needs to be automated and fully integrated. However, maintaining consistency and accuracy in genome annotation is a challenging problem because millions of protein database entries are not assigned reliable functions. This shortcoming limits the knowledge that can be extracted from genomes and metabolic models. Launched in 2005, the MicroScope platform (http://www.genoscope.cns.fr/agc/microscope) is an integrative resource that supports systematic and efficient revision of microbial genome annotation, data management and comparative analysis. Effective comparative analysis requires a consistent and complete view of biological data, and therefore, support for reviewing the quality of functional annotation is critical. MicroScope allows users to analyze microbial (meta)genomes together with post-genomic experiment results if any (i.e. transcriptomics, re-sequencing of evolved strains, mutant collections, phenotype data). It combines tools and graphical interfaces to analyze genomes and to perform the expert curation of gene functions in a comparative context. Starting with a short overview of the MicroScope system, this paper focuses on some major improvements of the Web interface, mainly for the submission of genomic data and on original tools and pipelines that have been developed and integrated in the platform: computation of pan-genomes and prediction of biosynthetic gene clusters. Today the resource contains data for more than 6000 microbial genomes, and among the 2700 personal accounts (65% of which are now from foreign countries), 14% of the users are performing expert annotations, on at least a weekly basis, contributing to improve the quality of microbial genome annotations. PMID:27899624
Genomics and transcriptomics in drug discovery.
Dopazo, Joaquin
2014-02-01
The popularization of genomic high-throughput technologies is causing a revolution in biomedical research and, particularly, is transforming the field of drug discovery. Systems biology offers a framework to understand the extensive human genetic heterogeneity revealed by genomic sequencing in the context of the network of functional, regulatory and physical protein-drug interactions. Thus, approaches to find biomarkers and therapeutic targets will have to take into account the complex system nature of the relationships of the proteins with the disease. Pharmaceutical companies will have to reorient their drug discovery strategies considering the human genetic heterogeneity. Consequently, modeling and computational data analysis will have an increasingly important role in drug discovery. Copyright © 2013 Elsevier Ltd. All rights reserved.
Patient perspectives on whole-genome sequencing for undiagnosed diseases.
Boeldt, Debra L; Cheung, Cynthia; Ariniello, Lauren; Darst, Burcu F; Topol, Sarah; Schork, Nicholas J; Philis-Tsimikas, Athena; Torkamani, Ali; Fortmann, Addie L; Bloss, Cinnamon S
2017-01-01
This study assessed perspectives on whole-genome sequencing (WGS) for rare disease diagnosis and the process of receiving genetic results. Semistructured interviews were conducted with adult patients and parents of minor patients affected by idiopathic diseases (n = 10 cases). Three main themes were identified through qualitative data analysis and interpretation: perceived benefits of WGS; perceived drawbacks of WGS; and perceptions of the return of results from WGS. Findings suggest that patients and their families have important perspectives on the use of WGS in diagnostic odyssey cases. These perspectives could inform clinical sequencing research study designs as well as the appropriate deployment of patient and family support services in the context of clinical genome sequencing.
Phillips, Anastasia; Sotomayor, Cristina; Wang, Qinning; Holmes, Nadine; Furlong, Catriona; Ward, Kate; Howard, Peter; Octavia, Sophie; Lan, Ruiting; Sintchenko, Vitali
2016-09-15
Salmonella Typhimurium (STM) is an important cause of foodborne outbreaks worldwide. Subtyping of STM remains critical to outbreak investigation, yet current techniques (e.g. multilocus variable number tandem repeat analysis, MLVA) may provide insufficient discrimination. Whole genome sequencing (WGS) offers potentially greater discriminatory power to support infectious disease surveillance. We performed WGS on 62 STM isolates of a single, endemic MLVA type associated with two epidemiologically independent, food-borne outbreaks along with sporadic cases in New South Wales, Australia, during 2014. Genomes of case and environmental isolates were sequenced using HiSeq (Illumina) and the genetic distance between them was assessed by single nucleotide polymorphism (SNP) analysis. SNP analysis was compared to the epidemiological context. The WGS analysis supported epidemiological evidence and genomes of within-outbreak isolates were nearly identical. Sporadic cases differed from outbreak cases by a small number of SNPs, although their close relationship to outbreak cases may represent an unidentified common food source that may warrant further public health follow up. Previously unrecognised mini-clusters were detected. WGS of STM can discriminate foodborne community outbreaks within a single endemic MLVA clone. Our findings support the translation of WGS into public health laboratory surveillance of salmonellosis.
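The genetic distance used above is a count of single nucleotide differences between isolates. The sketch below assumes the isolates are already represented as equal-length aligned sequences and that ambiguous or gapped positions are skipped; the abstract does not describe the study's actual SNP-calling pipeline, and the isolate names and function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned sequences carry different unambiguous bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for x, y in zip(seq_a.upper(), seq_b.upper())
        if x in valid and y in valid and x != y  # skip N, gaps and other codes
    )


def pairwise_snp_distances(alignment: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """SNP distance for every unordered pair of isolates, keyed by sorted names."""
    return {
        (a, b): snp_distance(alignment[a], alignment[b])
        for a, b in combinations(sorted(alignment), 2)
    }
```

Isolates from the same outbreak would be expected to sit at or near distance 0 in such a matrix, which is the pattern the abstract reports for within-outbreak genomes.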
Gherghe, Cristina; Lombo, Tania; Leonard, Christopher W.; Datta, Siddhartha A. K.; Bess, Julian W.; Gorelick, Robert J.; Rein, Alan; Weeks, Kevin M.
2010-01-01
All retroviral genomic RNAs contain a cis-acting packaging signal by which dimeric genomes are selectively packaged into nascent virions. However, it is not understood how Gag (the viral structural protein) interacts with these signals to package the genome with high selectivity. We probed the structure of murine leukemia virus RNA inside virus particles using SHAPE, a high-throughput RNA structure analysis technology. These experiments showed that NC (the nucleic acid binding domain derived from Gag) binds within the virus to the sequence UCUG-UR-UCUG. Recombinant Gag and NC proteins bound to this same RNA sequence in dimeric RNA in vitro; in all cases, interactions were strongest with the first U and final G in each UCUG element. The RNA structural context is critical: High-affinity binding requires base-paired regions flanking this motif, and two UCUG-UR-UCUG motifs are specifically exposed in the viral RNA dimer. Mutating the guanosine residues in these two motifs—only four nucleotides per genomic RNA—reduced packaging 100-fold, comparable to the level of nonspecific packaging. These results thus explain the selective packaging of dimeric RNA. This paradigm has implications for RNA recognition in general, illustrating how local context and RNA structure can create information-rich recognition signals from simple single-stranded sequence elements in large RNAs. PMID:20974908
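Locating candidate copies of the UCUG-UR-UCUG element in an RNA sequence can be sketched with a regular expression. This assumes that R is the IUPAC purine code (A or G) and that the hyphens in the motif are visual separators only; the function name is hypothetical and the example sequence in the note below is synthetic, not murine leukemia virus RNA. Sequence matching alone does not capture the structural context (flanking base-paired regions) that the abstract says is required for high-affinity binding.

```python
import re
from typing import List, Tuple

# Lookahead so that overlapping occurrences are also reported.
MOTIF = re.compile(r"(?=(UCUGU[AG]UCUG))")


def find_motifs(rna: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched text) for each UCUG-UR-UCUG occurrence."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(rna)]
```

On the synthetic sequence `GGUCUGUAUCUGCCUCUGUGUCUGAA`, the sketch reports two occurrences, one with A and one with G at the R position.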
Rousseau-Gueutin, M; Bellot, S; Martin, G E; Boutte, J; Chelaifa, H; Lima, O; Michon-Coudouel, S; Naquin, D; Salmon, A; Ainouche, K; Ainouche, M
2015-12-01
The history of many plant lineages is complicated by reticulate evolution with cases of hybridization often followed by genome duplication (allopolyploidy). In such a context, the inference of phylogenetic relationships and biogeographic scenarios based on molecular data is easier using haploid markers like chloroplast genome sequences. Hybridization and polyploidization occurred recurrently in the genus Spartina (Poaceae, Chloridoideae), as illustrated by the recent formation of the invasive allododecaploid S. anglica during the 19th century in Europe. Until now, only a few plastid markers were available to explore the history of this genus and their low variability limited the resolution of species relationships. We sequenced the complete chloroplast genome (plastome) of S. maritima, the native European parent of S. anglica, and compared it to the plastomes of other Poaceae. Our analysis revealed the presence of fast-evolving regions of potential taxonomic, phylogeographic and phylogenetic utility at various levels within the Poaceae family. Using secondary calibrations, we show that the tetraploid and hexaploid lineages of Spartina diverged 6-10 my ago, and that the two parents of the invasive allopolyploid S. anglica separated 2-4 my ago via long distance dispersal of the ancestor of S. maritima over the Atlantic Ocean. Finally, we discuss the meaning of divergence times between chloroplast genomes in the context of reticulate evolution. Copyright © 2015 Elsevier Inc. All rights reserved.
A Plant-Associated Microbe Genome Initiative
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jan E. Leach; Scott Gold; Sue Tolin
2003-03-06
Plant-associated microorganisms are critical to agricultural and food security and are key components in maintaining the balance of our ecosystems. Some of these diverse microbes, which include viruses, bacteria, oomycetes, fungi, and nematodes, cause plant diseases, whereas others prevent diseases or enhance plant growth. Despite their importance, we know little about them on a genomic level. To intervene in disease and understand the basis of biological control or symbiotic relationships, a concerted and coordinated genomic analysis of these microbes is essential. Genome analysis, in this context, refers to the structural and functional analysis of the microbe DNA including the genes, the proteins encoded by those genes, as well as noncoding sequences involved in genome dynamics and function. The ultimate emphasis is on understanding genomic functions involved in plant associations. Members of The American Phytopathological Society (APS) developed a prioritized list of plant-associated microbes for genome analysis. With this list as a foundation for discussions, a Workshop on Genomic Analysis of Plant-Associated Microorganisms was held in Washington, D.C., on 9 to 11 April 2002. The workshop was organized by the Public Policy Board of APS, and was funded by the Department of Energy (DOE), the National Science Foundation (NSF), U.S. Department of Agriculture-Agricultural Research Service (USDA-ARS), and USDA-National Research Initiatives (USDA-NRI). The workshop included academic, industrial, and governmental experts from the genomics and microbial research communities and observers from the federal funding agencies. After reviewing current and near-term technologies, workshop participants proposed a comprehensive, international initiative to obtain the genomic information needed to understand these important microbes and their interactions with host plants and the environment.
Specifically, the recommendations call for a 5-year, $500 million international public effort for genome analysis of plant-associated microbes. The goals are to (i) obtain genome sequence information for several representative groups of microbes; (ii) identify and determine function for the genes/proteins and other genomic elements involved in plant-microbe interactions; (iii) develop and implement standardized bioinformatic tools and a database system that is applicable across all microbes; and (iv) educate and train scientists with skills and knowledge of biological and computational sciences who will apply the information to the protection of our food sources and environment.
Methods for Genome-Wide Analysis of Gene Expression Changes in Polyploids
Wang, Jianlin; Lee, Jinsuk J.; Tian, Lu; Lee, Hyeon-Se; Chen, Meng; Rao, Sheetal; Wei, Edward N.; Doerge, R. W.; Comai, Luca; Jeffrey Chen, Z.
2007-01-01
Polyploidy is an evolutionary innovation, providing extra sets of genetic material for phenotypic variation and adaptation. It is predicted that changes of gene expression by genetic and epigenetic mechanisms are responsible for novel variation in nascent and established polyploids (Liu and Wendel, 2002; Osborn et al., 2003; Pikaard, 2001). Studying gene expression changes in allopolyploids is more complicated than in autopolyploids, because allopolyploids contain more than two sets of genomes originating from divergent, but related, species. Here we describe two methods that are applicable to the genome-wide analysis of gene expression differences resulting from genome duplication in autopolyploids or interactions between homoeologous genomes in allopolyploids. First, we describe an amplified fragment length polymorphism (AFLP)–complementary DNA (cDNA) display method that allows the discrimination of homoeologous loci based on restriction polymorphisms between the progenitors. Second, we describe microarray analyses that can be used to compare gene expression differences between the allopolyploids and respective progenitors using appropriate experimental design and statistical analysis. We demonstrate the utility of these two complementary methods and discuss the pros and cons of using the methods to analyze gene expression changes in autopolyploids and allopolyploids. Furthermore, we describe these methods in general terms to be of wider applicability for comparative gene expression in a variety of evolutionary, genetic, biological, and physiological contexts. PMID:15865985
MicroScope: a platform for microbial genome annotation and comparative genomics
Vallenet, D.; Engelen, S.; Mornico, D.; Cruveiller, S.; Fleury, L.; Lajus, A.; Rouy, Z.; Roche, D.; Salvignol, G.; Scarpelli, C.; Médigue, C.
2009-01-01
The initial outcome of genome sequencing is the creation of long text strings written in a four letter alphabet. The role of in silico sequence analysis is to assist biologists in the act of associating biological knowledge with these sequences, allowing investigators to make inferences and predictions that can be tested experimentally. A wide variety of software is available to the scientific community, and can be used to identify genomic objects, before predicting their biological functions. However, only a limited number of biologically interesting features can be revealed from an isolated sequence. Comparative genomics tools, on the other hand, by bringing together the information contained in numerous genomes simultaneously, allow annotators to make inferences based on the idea that evolution and natural selection are central to the definition of all biological processes. We have developed the MicroScope platform in order to offer a web-based framework for the systematic and efficient revision of microbial genome annotation and comparative analysis (http://www.genoscope.cns.fr/agc/microscope). Starting with the description of the flow chart of the annotation processes implemented in the MicroScope pipeline, and the development of traditional and novel microbial annotation and comparative analysis tools, this article emphasizes the essential role of expert annotation as a complement of automatic annotation. Several examples illustrate the use of implemented tools for the review and curation of annotations of both new and publicly available microbial genomes within MicroScope’s rich integrated genome framework. The platform is used as a viewer in order to browse updated annotation information of available microbial genomes (more than 440 organisms to date), and in the context of new annotation projects (117 bacterial genomes). 
The human expertise gathered in the MicroScope database (about 280,000 independent annotations) contributes to improve the quality of microbial genome annotation, especially for genomes initially analyzed by automatic procedures alone. Database URLs: http://www.genoscope.cns.fr/agc/mage and http://www.genoscope.cns.fr/agc/microcyc PMID:20157493
NASA Technical Reports Server (NTRS)
Blakely, Randy D. (Inventor); Robertson, David (Inventor)
2006-01-01
Isolated polynucleotide molecules and peptides encoded by these molecules are used in the analysis of human norepinephrine (NE) transporter variants, as well as in diagnostic and therapeutic applications, relating to a human NE transporter polymorphism. By analyzing genomic DNA or amplified genomic DNA, or amplified cDNA derived from mRNA, it is possible to type a human NE transporter with regard to the human NE transporter polymorphism, for example, in the context of diagnosing and treating NE transport impairments, and disorders associated with NE transport impairments, such as orthostatic intolerance.
Parson, Walther; Strobl, Christina; Huber, Gabriela; Zimmermann, Bettina; Gomes, Sibylle M.; Souto, Luis; Fendt, Liane; Delport, Rhena; Langit, Reina; Wootton, Sharon; Lagacé, Robert; Irwin, Jodi
2013-01-01
Insights into the human mitochondrial phylogeny have been primarily achieved by sequencing full mitochondrial genomes (mtGenomes). In forensic genetics (partial) mtGenome information can be used to assign haplotypes to their phylogenetic backgrounds, which may, in turn, have characteristic geographic distributions that would offer useful information in a forensic case. In addition and perhaps even more relevant in the forensic context, haplogroup-specific patterns of mutations form the basis for quality control of mtDNA sequences. The current method for establishing (partial) mtDNA haplotypes is Sanger-type sequencing (STS), which is laborious, time-consuming, and expensive. With the emergence of Next Generation Sequencing (NGS) technologies, the body of available mtDNA data can potentially be extended much more quickly and cost-efficiently. Customized chemistries, laboratory workflows and data analysis packages could support the community and increase the utility of mtDNA analysis in forensics. We have evaluated the performance of mtGenome sequencing using the Personal Genome Machine (PGM) and compared the resulting haplotypes directly with conventional Sanger-type sequencing. A total of 64 mtGenomes (>1 million bases) were established that yielded high concordance with the corresponding STS haplotypes (<0.02% differences). About two-thirds of the differences were observed in or around homopolymeric sequence stretches. In addition, the sequence alignment algorithm employed to align NGS reads played a significant role in the analysis of the data and the resulting mtDNA haplotypes. Further development of alignment software would be desirable to facilitate the application of NGS in mtDNA forensic genetics. PMID:23948325
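The scale of the comparison reported above can be checked with a few lines of arithmetic. This is a rough sketch only: the per-genome length of 16,569 bp (the length of the revised Cambridge Reference Sequence) is an assumption not stated in the abstract, and the function names are hypothetical.

```python
# Assumption (not from the abstract): each mtGenome is about 16,569 bp long,
# the length of the revised Cambridge Reference Sequence.
MTGENOME_LENGTH_BP = 16_569


def total_bases(n_genomes, genome_length_bp=MTGENOME_LENGTH_BP):
    """Total number of bases across n_genomes genomes of equal length."""
    return n_genomes * genome_length_bp


def differences_at_rate(bases, rate_percent):
    """Number of differing bases implied by a difference rate given in percent."""
    return bases * rate_percent / 100


bases = total_bases(64)                     # 64 mtGenomes, as in the abstract
bound = differences_at_rate(bases, 0.02)    # upper bound implied by "<0.02%"
print(f"{bases} bases; fewer than about {bound:.0f} differences")
```

Under that assumption, 64 mtGenomes come to slightly more than 1 million bases, which agrees with the figure quoted in the abstract, and a difference rate below 0.02% corresponds to roughly two hundred discordant positions at most.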
Tiengwe, Calvin; Marcello, Lucio; Farr, Helen; Dickens, Nicholas; Kelly, Steven; Swiderski, Michal; Vaughan, Diane; Gull, Keith; Barry, J. David; Bell, Stephen D.; McCulloch, Richard
2012-01-01
Summary: Identification of replication initiation sites, termed origins, is a crucial step in understanding genome transmission in any organism. Transcription of the Trypanosoma brucei genome is highly unusual, with each chromosome comprising a few discrete transcription units. To understand how DNA replication occurs in the context of such organization, we have performed genome-wide mapping of the binding sites of the replication initiator ORC1/CDC6 and have identified replication origins, revealing that both localize to the boundaries of the transcription units. A remarkably small number of active origins is seen, whose spacing is greater than in any other eukaryote. We show that replication and transcription in T. brucei have a profound functional overlap, as reducing ORC1/CDC6 levels leads to genome-wide increases in mRNA levels arising from the boundaries of the transcription units. In addition, ORC1/CDC6 loss causes derepression of silent Variant Surface Glycoprotein genes, which are critical for host immune evasion. PMID:22840408
Sharma, Abhay
2015-11-01
New discoveries are increasingly demanding integration of epigenetics, molecular biology, genomic networks and physiology with evolution. This article provides a proof of concept for evolutionary transgenerational systems biology, proposed recently in the context of epigenetic inheritance in mammals. Gene set enrichment analysis of available genome-level mammalian data presented here seems consistent with the concept that: (1) heritable information about environmental effects in somatic cells is communicated to the germline by circulating microRNAs (miRNAs) or other RNAs released in physiological fluids; (2) epigenetic factors including miRNA-like small RNAs, DNA methylation and histone modifications are propagated across generations via gene networks; and (3) inherited epigenetic variations in the form of methylated cytosines are fixed in the population as thymines over the evolutionary time course. The analysis supports integration of physiology and epigenetics with inheritance and evolution. This may catalyze efforts to develop a unified theory of biology. © 2015. Published by The Company of Biologists Ltd.
A new tool called DISSECT for analysing large genomic data sets using a Big Data approach
Canela-Xandri, Oriol; Law, Andy; Gray, Alan; Woolliams, John A.; Tenesa, Albert
2015-01-01
Large-scale genetic and genomic data are increasingly available and the major bottleneck in their analysis is a lack of sufficiently scalable computational tools. To address this problem in the context of complex traits analysis, we present DISSECT. DISSECT is a new and freely available software that is able to exploit the distributed-memory parallel computational architectures of compute clusters, to perform a wide range of genomic and epidemiologic analyses, which currently can only be carried out on reduced sample sizes or under restricted conditions. We demonstrate the usefulness of our new tool by addressing the challenge of predicting phenotypes from genotype data in human populations using mixed-linear model analysis. We analyse simulated traits from 470,000 individuals genotyped for 590,004 SNPs in ∼4 h using the combined computational power of 8,400 processor cores. We find that prediction accuracies in excess of 80% of the theoretical maximum could be achieved with large sample sizes. PMID:26657010
Boycott, Kym; Hartley, Taila; Adam, Shelin; Bernier, Francois; Chong, Karen; Fernandez, Bridget A; Friedman, Jan M; Geraghty, Michael T; Hume, Stacey; Knoppers, Bartha M; Laberge, Anne-Marie; Majewski, Jacek; Mendoza-Londono, Roberto; Meyn, M Stephen; Michaud, Jacques L; Nelson, Tanya N; Richer, Julie; Sadikovic, Bekim; Skidmore, David L; Stockley, Tracy; Taylor, Sherry; van Karnebeek, Clara; Zawati, Ma'n H; Lauzon, Julie; Armour, Christine M
2015-01-01
Purpose and scope: The aim of this Position Statement is to provide recommendations for Canadian medical geneticists, clinical laboratory geneticists, genetic counsellors and other physicians regarding the use of genome-wide sequencing of germline DNA in the context of clinical genetic diagnosis. This statement has been developed to facilitate the clinical translation and development of best practices for clinical genome-wide sequencing for genetic diagnosis of monogenic diseases in Canada; it does not address the clinical application of this technology in other fields such as molecular investigation of cancer or for population screening of healthy individuals. Methods of statement development: Two multidisciplinary groups consisting of medical geneticists, clinical laboratory geneticists, genetic counsellors, ethicists, lawyers and genetic researchers were assembled to review existing literature and guidelines on genome-wide sequencing for clinical genetic diagnosis in the context of monogenic diseases, and to make recommendations relevant to the Canadian context. The statement was circulated for comment to the Canadian College of Medical Geneticists (CCMG) membership-at-large and, following incorporation of feedback, approved by the CCMG Board of Directors. The CCMG is a Canadian organisation responsible for certifying medical geneticists and clinical laboratory geneticists, and for establishing professional and ethical standards for clinical genetics services in Canada.
Results and conclusions: Recommendations include (1) clinical genome-wide sequencing is an appropriate approach in the diagnostic assessment of a patient for whom there is suspicion of a significant monogenic disease that is associated with a high degree of genetic heterogeneity, or where specific genetic tests have failed to provide a diagnosis; (2) until the benefits of reporting incidental findings are established, we do not endorse the intentional clinical analysis of disease-associated genes other than those linked to the primary indication; and (3) clinicians should provide genetic counselling and obtain informed consent prior to undertaking clinical genome-wide sequencing. Counselling should include discussion of the limitations of testing, likelihood and implications of diagnosis and incidental findings, and the potential need for further analysis to facilitate clinical interpretation, including studies performed in a research setting. These recommendations will be routinely re-evaluated as knowledge of diagnostic and clinical utility of clinical genome-wide sequencing improves. While the document was developed to direct practice in Canada, the applicability of the statement is broader and will be of interest to clinicians and health jurisdictions internationally. PMID:25951830
Ensembl genomes 2016: more genomes, more complexity
USDA-ARS's Scientific Manuscript database
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent...
Microfluidics for genome-wide studies involving next generation sequencing
Murphy, Travis W.; Lu, Chang
2017-01-01
Next-generation sequencing (NGS) has revolutionized how molecular biology studies are conducted. Its decreasing cost and increasing throughput permit profiling of genomic, transcriptomic, and epigenomic features for a wide range of applications. Microfluidics has been proven to be highly complementary to NGS technology with its unique capabilities for handling small volumes of samples and providing platforms for automation, integration, and multiplexing. In this article, we review recent progress on applying microfluidics to facilitate genome-wide studies. We emphasize several technical aspects of NGS and how they benefit from coupling with microfluidic technology. We also summarize recent efforts on developing microfluidic technology for genomic, transcriptomic, and epigenomic studies, with emphasis on single cell analysis. We envision rapid growth in these directions, driven by the need to test scarce primary cell samples from patients in the context of precision medicine. PMID:28396707
Draft genome sequence of the silver pomfret fish, Pampus argenteus.
AlMomin, Sabah; Kumar, Vinod; Al-Amad, Sami; Al-Hussaini, Mohsen; Dashti, Talal; Al-Enezi, Khaznah; Akbar, Abrar
2016-01-01
Silver pomfret, Pampus argenteus, is a fish species from coastal waters. Despite its high commercial value, this edible fish has not been sequenced. Hence, its genetic and genomic studies have been limited. We report the first draft genome sequence of the silver pomfret obtained using a Next Generation Sequencing (NGS) technology. We assembled 38.7 Gb of nucleotides into scaffolds of 350 Mb with N50 of about 1.5 kb, using high quality paired end reads. These scaffolds represent 63.7% of the estimated silver pomfret genome length. The newly sequenced and assembled genome has 11.06% repetitive DNA regions, and this percentage is comparable to that of the tilapia genome. The genome analysis predicted 16 322 genes. About 91% of these genes showed homology with known proteins. Many gene clusters were annotated to protein and fatty-acid metabolism pathways that may be important in the context of the meat texture and immune system developmental processes. The reference genome can pave the way for the identification of many other genomic features that could improve breeding and population-management strategies, and it can also help characterize the genetic diversity of P. argenteus.
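The assembly statistics quoted above (N50, and the fraction of the estimated genome covered) follow simple definitions. The sketch below is illustrative only: the function names are hypothetical, the toy scaffold lengths are invented, and it assumes the common definition of N50 as the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that brings the running sum to half of the
        # assembly defines the N50.
        if running >= half_total:
            return length


def estimated_genome_size(assembled, covered_fraction):
    """Genome size implied when `assembled` covers `covered_fraction` of it."""
    if not 0 < covered_fraction <= 1:
        raise ValueError("covered_fraction must be in (0, 1]")
    return assembled / covered_fraction


# Toy scaffold lengths (invented for illustration, arbitrary units).
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))

# Figures from the abstract: 350 Mb of scaffolds representing 63.7% of the genome.
print(round(estimated_genome_size(350, 0.637), 1), "Mb")
```

Read this way, the abstract's figures imply an estimated genome length of roughly 550 Mb, while an N50 of about 1.5 kb indicates that the assembly is highly fragmented.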
Smith, Jennifer A; Zhao, Wei; Yasutake, Kalyn; August, Carmella; Ratliff, Scott M; Faul, Jessica D; Boerwinkle, Eric; Chakravarti, Aravinda; Diez Roux, Ana V; Gao, Yan; Griswold, Michael E; Heiss, Gerardo; Kardia, Sharon L R; Morrison, Alanna C; Musani, Solomon K; Mwasongwe, Stanford; North, Kari E; Rose, Kathryn M; Sims, Mario; Sun, Yan V; Weir, David R; Needham, Belinda L
2017-12-18
Inter-individual variability in blood pressure (BP) is influenced by both genetic and non-genetic factors including socioeconomic and psychosocial stressors. A deeper understanding of the gene-by-socioeconomic/psychosocial factor interactions on BP may help to identify individuals that are genetically susceptible to high BP in specific social contexts. In this study, we used a genomic region-based method for longitudinal analysis, Longitudinal Gene-Environment-Wide Interaction Studies (LGEWIS), to evaluate the effects of interactions between known socioeconomic/psychosocial and genetic risk factors on systolic and diastolic BP in four large epidemiologic cohorts of European and/or African ancestry. After correction for multiple testing, two interactions were significantly associated with diastolic BP. In European ancestry participants, outward/trait anger score had a significant interaction with the C10orf107 genomic region (p = 0.0019). In African ancestry participants, depressive symptom score had a significant interaction with the HFE genomic region (p = 0.0048). This study provides a foundation for using genomic region-based longitudinal analysis to identify subgroups of the population that may be at greater risk of elevated BP due to the combined influence of genetic and socioeconomic/psychosocial risk factors.
Adrenocortical carcinoma: the dawn of a new era of genomic and molecular biology analysis.
Armignacco, R; Cantini, G; Canu, L; Poli, G; Ercolino, T; Mannelli, M; Luconi, M
2018-05-01
Over the last decade, the development of novel and high penetrance genomic approaches to analyze biological samples has provided very new insights in the comprehension of the molecular biology and genetics of tumors. The use of these techniques, consisting of exome sequencing, transcriptome, miRNome, chromosome alteration, genome, and epigenome analysis, has also been successfully applied to adrenocortical carcinoma (ACC). In fact, the analysis of large cohorts of patients allowed the stratification of ACC with different patterns of molecular alterations, associated with different outcomes, thus providing a novel molecular classification of the malignancy to be associated with the classical pathological analysis. Improving our knowledge about ACC molecular features will result not only in a better diagnostic and prognostic accuracy, but also in the identification of more specific therapeutic targets for the development of more effective pharmacological anti-cancer approaches. In particular, the specific molecular alteration profiles identified in ACC may represent targetable events by the use of already developed or newly designed drugs enabling a better and more efficacious management of the ACC patient in the context of new frontiers of personalized precision medicine.
Godoy, Liliana; Silva-Moreno, Evelyn; Mardones, Wladimir; Guzman, Darwin; Cubillos, Francisco A; Ganga, Angélica
2017-01-01
Wine production is an important commercial issue for the liquor industry. The global production was estimated at 275.7 million hectoliters in 2015. The loss of wine production due to Brettanomyces bruxellensis contamination is currently a problem. This yeast causes a "horse sweat" flavor in wine, which is an undesired organoleptic attribute. To date, 6 B. bruxellensis annotated genome sequences are available (LAMAP2480, AWRI1499, AWRI1608, AWRI1613, ST05.12/22, and CBS2499), and whole genome comparisons between strains are limited. In this article, we reassembled and reannotated the genome of B. bruxellensis LAMAP2480, obtaining a 27-Mb assembly with an N50 of 5.5 kb. The genome of B. bruxellensis LAMAP2480 was then analyzed in the context of spoilage yeast and its potential as a biotechnological tool. We also carried out an exploratory transcriptomic analysis of this strain grown in synthetic wine. Several genes related to stress tolerance, micronutrient acquisition, ethanol production, and lignocellulose assimilation were found. In conclusion, the analysis of the genome of B. bruxellensis LAMAP2480 reaffirms the biotechnological potential of this strain. This research represents an interesting platform for the study of the spoilage yeast B. bruxellensis. © 2017 S. Karger AG, Basel.
Steichen, Clara; Maluenda, Jérôme; Tosca, Lucie; Luce, Eléanor; Pineau, Dominique; Dianat, Noushin; Hannoun, Zara; Tachdjian, Gérard; Melki, Judith
2015-01-01
Human induced pluripotent stem cells (hiPSCs) hold great promise for cell therapy through their use as vital tools for regenerative and personalized medicine. However, the genomic integrity of hiPSCs still raises some concern and is one of the barriers limiting their use in clinical applications. Numerous articles have reported the occurrence of aneuploidies, copy number variations, or single point mutations in hiPSCs, and nonintegrative reprogramming strategies have been developed to minimize the impact of the reprogramming process on the hiPSC genome. Here, we report the characterization of an hiPSC line generated by daily transfections of modified messenger RNAs, displaying several genomic abnormalities. Karyotype analysis showed a complex genomic rearrangement, which remained stable during long-term culture. Fluorescent in situ hybridization analyses were performed on the hiPSC line showing that this karyotype is balanced. Interestingly, single-nucleotide polymorphism analysis revealed the presence of a large 1q region of uniparental disomy (UPD), demonstrating for the first time that UPD can occur in a noncompensatory context during nonintegrative reprogramming of normal fibroblasts. PMID:25650439
Sadhukhan, Priyanka P; Raghunathan, Anu
2014-01-01
Genome Scale Metabolic Modeling methods represent one way to compute whole cell function starting from the genome sequence of an organism and contribute towards understanding and predicting the genotype-phenotype relationship. About 80 models spanning all the kingdoms of life from archaea to eukaryotes have been built to date and used to interrogate cell phenotype under varying conditions. These models have been used not only to understand the flux distribution in evolutionarily conserved pathways like glycolysis and the Krebs cycle but also in applications ranging from value added product formation in Escherichia coli to predicting inborn errors of Homo sapiens metabolism. This chapter describes a protocol that delineates the process of genome scale metabolic modeling for analysing host-pathogen behavior and interaction using flux balance analysis (FBA). The steps discussed in the process include (1) reconstruction of a metabolic network from the genome sequence, (2) its representation in a precise mathematical framework, (3) its translation to a model, and (4) the analysis using linear algebra and optimization. The methods for biological interpretations of computed cell phenotypes in the context of individual host and pathogen models and their integration are also discussed.
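The mathematical framework and optimization steps mentioned above are commonly written as a linear program. The formulation below is the standard textbook form of FBA, not one taken from this chapter; the symbols are notation assumed here for illustration: S is the stoichiometric matrix (metabolites by reactions), v the vector of reaction fluxes, c the objective weights (for example, selecting a biomass reaction), and v^min, v^max the flux bounds.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& v^{\min}_j \le v_j \le v^{\max}_j \quad \text{for each reaction } j .
\end{aligned}
```

The equality constraint expresses the steady-state assumption that each internal metabolite is produced and consumed at equal rates. The bounds encode reaction reversibility and nutrient availability, which is where condition-specific information typically enters the model.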
Scribl: an HTML5 Canvas-based graphics library for visualizing genomic data over the web
Miller, Chase A.; Anthony, Jon; Meyer, Michelle M.; Marth, Gabor
2013-01-01
Motivation: High-throughput biological research requires simultaneous visualization as well as analysis of genomic data, e.g. read alignments, variant calls and genomic annotations. Traditionally, such integrative analysis required desktop applications operating on locally stored data. Many current terabyte-size datasets generated by large public consortia projects, however, are already only feasibly stored at specialist genome analysis centers. As even small laboratories can afford very large datasets, local storage and analysis are becoming increasingly limiting, and it is likely that most such datasets will soon be stored remotely, e.g. in the cloud. These developments will require web-based tools that enable users to access, analyze and view vast remotely stored data with a level of sophistication and interactivity that approximates desktop applications. As rapidly dropping cost enables researchers to collect data intended to answer questions in very specialized contexts, developers must also provide software libraries that empower users to implement customized data analyses and data views for their particular application. Such specialized, yet lightweight, applications would empower scientists to better answer specific biological questions than possible with general-purpose genome browsers currently available. Results: Using recent advances in core web technologies (HTML5), we developed Scribl, a flexible genomic visualization library specifically targeting coordinate-based data such as genomic features, DNA sequence and genetic variants. Scribl simplifies the development of sophisticated web-based graphical tools that approach the dynamism and interactivity of desktop applications. Availability and implementation: Software is freely available online at http://chmille4.github.com/Scribl/ and is implemented in JavaScript with all modern browsers supported. 
Contact: gabor.marth@bc.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23172864
Genetic Analysis of Pathways to Parkinson Disease
Hardy, John
2010-01-01
In this review I outline the arguments as to whether we should consider Parkinson disease one or more than one entity and discuss genetic findings from Mendelian and whole-genome association analysis in that context. I discuss what the demonstration of disease spread implies for our analysis of the genetic and epidemiologic risk factors for disease and outline the surprising fact that we now have genetically identified on the order of half our risk for developing the disease. PMID:20955928
Vallenet, David; Calteau, Alexandra; Cruveiller, Stéphane; Gachet, Mathieu; Lajus, Aurélie; Josso, Adrien; Mercier, Jonathan; Renaux, Alexandre; Rollin, Johan; Rouy, Zoe; Roche, David; Scarpelli, Claude; Médigue, Claudine
2017-01-04
The annotation of genomes from NGS platforms needs to be automated and fully integrated. However, maintaining consistency and accuracy in genome annotation is a challenging problem because millions of protein database entries are not assigned reliable functions. This shortcoming limits the knowledge that can be extracted from genomes and metabolic models. Launched in 2005, the MicroScope platform (http://www.genoscope.cns.fr/agc/microscope) is an integrative resource that supports systematic and efficient revision of microbial genome annotation, data management and comparative analysis. Effective comparative analysis requires a consistent and complete view of biological data, and therefore, support for reviewing the quality of functional annotation is critical. MicroScope allows users to analyze microbial (meta)genomes together with post-genomic experiment results if any (i.e. transcriptomics, re-sequencing of evolved strains, mutant collections, phenotype data). It combines tools and graphical interfaces to analyze genomes and to perform the expert curation of gene functions in a comparative context. Starting with a short overview of the MicroScope system, this paper focuses on some major improvements of the Web interface, mainly for the submission of genomic data and on original tools and pipelines that have been developed and integrated in the platform: computation of pan-genomes and prediction of biosynthetic gene clusters. Today the resource contains data for more than 6000 microbial genomes, and among the 2700 personal accounts (65% of which are now from foreign countries), 14% of the users are performing expert annotations, on at least a weekly basis, contributing to improve the quality of microbial genome annotations. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Jackson, Christopher J; Norman, John E; Schnare, Murray N; Gray, Michael W; Keeling, Patrick J; Waller, Ross F
2007-01-01
Background Dinoflagellates comprise an ecologically significant and diverse eukaryotic phylum that is sister to the phylum containing apicomplexan endoparasites. The mitochondrial genome of apicomplexans is uniquely reduced in gene content and size, encoding only three proteins and two ribosomal RNAs (rRNAs) within a highly compacted 6 kb DNA. Dinoflagellate mitochondrial genomes have been comparatively poorly studied: limited available data suggest some similarities with apicomplexan mitochondrial genomes but an even more radical type of genomic organization. Here, we investigate structure, content and expression of dinoflagellate mitochondrial genomes. Results From two dinoflagellates, Crypthecodinium cohnii and Karlodinium micrum, we generated over 42 kb of mitochondrial genomic data that indicate a reduced gene content paralleling that of mitochondrial genomes in apicomplexans, i.e., only three protein-encoding genes and at least eight conserved components of the highly fragmented large and small subunit rRNAs. Unlike in apicomplexans, dinoflagellate mitochondrial genes occur in multiple copies, often as gene fragments, and in numerous genomic contexts. Analysis of cDNAs suggests several novel aspects of dinoflagellate mitochondrial gene expression. Polycistronic transcripts were found, standard start codons are absent, and oligoadenylation occurs upstream of stop codons, resulting in the absence of termination codons. Transcripts of at least one gene, cox3, are apparently trans-spliced to generate full-length mRNAs. RNA substitutional editing, a process previously identified for mRNAs in dinoflagellate mitochondria, is also implicated in rRNA expression. Conclusion The dinoflagellate mitochondrial genome shares the same gene complement and fragmentation of rRNA genes with its apicomplexan counterpart. However, it also exhibits several unique characteristics. 
Most notable are the expansion of gene copy numbers and their arrangements within the genome, RNA editing, loss of stop codons, and use of trans-splicing. PMID:17897476
A Nursing Informatics Research Agenda for 2008–18: Contextual Influences and Key Components
Bakken, Suzanne; Stone, Patricia W.; Larson, Elaine L.
2008-01-01
The context for nursing informatics research has changed significantly since the National Institute of Nursing Research-funded Nursing Informatics Research Agenda was published in 1993 and the Delphi study of nursing informatics research priorities reported a decade ago. The authors focus on three specific aspects of context - genomic health care, shifting research paradigms, and social (Web 2.0) technologies - that must be considered in formulating a nursing informatics research agenda. These influences are illustrated using the significant issue of healthcare associated infections (HAI). A nursing informatics research agenda for 2008–18 must expand users of interest to include interdisciplinary researchers; build upon the knowledge gained in nursing concept representation to address genomic and environmental data; guide the reengineering of nursing practice; harness new technologies to empower patients and their caregivers for collaborative knowledge development; develop user-configurable software approaches that support complex data visualization, analysis, and predictive modeling; facilitate the development of middle-range nursing informatics theories; and encourage innovative evaluation methodologies that attend to human-computer interface factors and organizational context. PMID:18922269
Phylogenetic Analysis of Klebsiella pneumoniae from Hospitalized Children, Pakistan.
Ejaz, Hasan; Wang, Nancy; Wilksch, Jonathan J; Page, Andrew J; Cao, Hanwei; Gujaran, Shruti; Keane, Jacqueline A; Lithgow, Trevor; Ul-Haq, Ikram; Dougan, Gordon; Strugnell, Richard A; Heinz, Eva
2017-11-01
Klebsiella pneumoniae shows increasing emergence of multidrug-resistant lineages, including strains resistant to all available antimicrobial drugs. We conducted whole-genome sequencing of 178 highly drug-resistant isolates from a tertiary hospital in Lahore, Pakistan. Phylogenetic analyses to place these isolates into global context demonstrate the expansion of multiple independent lineages, including K. quasipneumoniae.
Gene integrated set profile analysis: a context-based approach for inferring biological endpoints
Kowalski, Jeanne; Dwivedi, Bhakti; Newman, Scott; Switchenko, Jeffery M.; Pauly, Rini; Gutman, David A.; Arora, Jyoti; Gandhi, Khanjan; Ainslie, Kylie; Doho, Gregory; Qin, Zhaohui; Moreno, Carlos S.; Rossi, Michael R.; Vertino, Paula M.; Lonial, Sagar; Bernal-Mizrachi, Leon; Boise, Lawrence H.
2016-01-01
The identification of genes with specific patterns of change (e.g. down-regulated and methylated) as phenotype drivers or samples with similar profiles for a given gene set as drivers of clinical outcome, requires the integration of several genomic data types for which an ‘integrate by intersection’ (IBI) approach is often applied. In this approach, results from separate analyses of each data type are intersected, which has the limitation of a smaller intersection with more data types. We introduce a new method, GISPA (Gene Integrated Set Profile Analysis) for integrated genomic analysis and its variation, SISPA (Sample Integrated Set Profile Analysis) for defining respective genes and samples with the context of similar, a priori specified molecular profiles. With GISPA, the user defines a molecular profile that is compared among several classes and obtains ranked gene sets that satisfy the profile as drivers of each class. With SISPA, the user defines a gene set that satisfies a profile and obtains sample groups of profile activity. Our results from applying GISPA to human multiple myeloma (MM) cell lines contained genes of known profiles and importance, along with several novel targets, and their further SISPA application to MM coMMpass trial data showed clinical relevance. PMID:26826710
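The limitation of 'integrate by intersection' noted in the abstract above can be illustrated with a minimal Python sketch. This is not the GISPA method or its code. The gene identifiers, data-type names and the helper `integrate_by_intersection` are hypothetical, chosen only to show that the intersected gene list can only stay the same size or shrink as more data types are added.

```python
from functools import reduce


def integrate_by_intersection(hits_by_data_type):
    """Return the genes flagged in every data type (hypothetical helper)."""
    gene_sets = [set(genes) for genes in hits_by_data_type.values()]
    return reduce(set.intersection, gene_sets) if gene_sets else set()


# Made-up per-data-type results; not taken from the paper.
hits = {
    "expression": {"G1", "G2", "G3", "G4"},
    "methylation": {"G2", "G3", "G4", "G5"},
    "copy_number": {"G3", "G4", "G6"},
}

two_types = integrate_by_intersection(
    {name: hits[name] for name in ("expression", "methylation")}
)
three_types = integrate_by_intersection(hits)

print(sorted(two_types))    # genes shared by two data types
print(sorted(three_types))  # genes shared by all three data types
```

Adding the third data type can only remove genes from the result, which is the shrinking-intersection effect the authors give as their motivation for a profile-based alternative.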
Deciphering the Origin of Dogs: From Fossils to Genomes.
Freedman, Adam H; Wayne, Robert K
2017-02-08
Understanding the timing and geographic context of dog origins is a crucial component for understanding human history, as well as the evolutionary context in which the morphological and behavioral divergence of dogs from wolves occurred. A substantial challenge to understanding domestication is that dogs have experienced a complicated demographic history. An initial severe bottleneck was associated with domestication followed by postdivergence gene flow between dogs and wolves, as well as population expansions, contractions, and replacements. In addition, because the domestication of dogs occurred in the relatively recent past, much of the observed polymorphism may be shared between dogs and wolves, limiting the power to distinguish between alternative models of dog history. Greater insight into the domestication process will require explicit tests of alternative models of domestication through the joint analysis of whole genomes from modern lineages and ancient wolves and dogs from across Eurasia.
Formin homology 2 domains occur in multiple contexts in angiosperms
Cvrčková, Fatima; Novotný, Marian; Pícková, Denisa; Žárský, Viktor
2004-01-01
Background Involvement of conservative molecular modules and cellular mechanisms in the widely diversified processes of eukaryotic cell morphogenesis leads to the intriguing question: how do similar proteins contribute to dissimilar morphogenetic outputs. Formins (FH2 proteins) play a central part in the control of actin organization and dynamics, providing a good example of evolutionarily versatile use of a conserved protein domain in the context of a variety of lineage-specific structural and signalling interactions. Results In order to identify possible plant-specific sequence features within the FH2 protein family, we performed a detailed analysis of angiosperm formin-related sequences available in public databases, with particular focus on the complete Arabidopsis genome and the nearly finished rice genome sequence. This has led to revision of the current annotation of half of the 22 Arabidopsis formin-related genes. Comparative analysis of the two plant genomes revealed a good conservation of the previously described two subfamilies of plant formins (Class I and Class II), as well as several subfamilies within them that appear to predate the separation of monocot and dicot plants. Moreover, a number of plant Class II formins share an additional conserved domain, related to the protein phosphatase/tensin/auxilin fold. However, considerable inter-species variability sets limits to generalization of any functional conclusions reached on a single species such as Arabidopsis. Conclusions The plant-specific domain context of the conserved FH2 domain, as well as plant-specific features of the domain itself, may reflect distinct functional requirements in plant cells. The variability of formin structures found in plants far exceeds that known from both fungi and metazoans, suggesting a possible contribution of FH2 proteins in the evolution of the plant type of multicellularity. PMID:15256004
2013-01-01
Background The binding of transcription factors to DNA plays an essential role in the regulation of gene expression. Numerous experiments elucidated binding sequences which subsequently have been used to derive statistical models for predicting potential transcription factor binding sites (TFBS). The rapidly increasing number of genome sequence data requires sophisticated computational approaches to manage and query experimental and predicted TFBS data in the context of other epigenetic factors and across different organisms. Results We have developed D-Light, a novel client-server software package to store and query large amounts of TFBS data for any number of genomes. Users can add small-scale data to the server database and query them in a large scale, genome-wide promoter context. The client is implemented in Java and provides simple graphical user interfaces and data visualization. Here we also performed a statistical analysis showing what a user can expect for certain parameter settings and we illustrate the usage of D-Light with the help of a microarray data set. Conclusions D-Light is an easy to use software tool to integrate, store and query annotation data for promoters. A public D-Light server, the client and server software for local installation and the source code under GNU GPL license are available at http://biwww.che.sbg.ac.at/dlight. PMID:23617301
Damienikan, Aliaksandr U.
2016-01-01
The majority of bacterial genome annotations are currently automated and based on a ‘gene by gene’ approach. Regulatory signals and operon structures are rarely taken into account which often results in incomplete and even incorrect gene function assignments. Here we present SigmoID, a cross-platform (OS X, Linux and Windows) open-source application aiming at simplifying the identification of transcription regulatory sites (promoters, transcription factor binding sites and terminators) in bacterial genomes and providing assistance in correcting annotations in accordance with regulatory information. SigmoID combines a user-friendly graphical interface to well known command line tools with a genome browser for visualising regulatory elements in genomic context. Integrated access to online databases with regulatory information (RegPrecise and RegulonDB) and web-based search engines speeds up genome analysis and simplifies correction of genome annotation. We demonstrate some features of SigmoID by constructing a series of regulatory protein binding site profiles for two groups of bacteria: Soft Rot Enterobacteriaceae (Pectobacterium and Dickeya spp.) and Pseudomonas spp. Furthermore, we inferred over 900 transcription factor binding sites and alternative sigma factor promoters in the annotated genome of Pectobacterium atrosepticum. These regulatory signals control putative transcription units covering about 40% of the P. atrosepticum chromosome. Reviewing the annotation in cases where it didn’t fit with regulatory information allowed us to correct product and gene names for over 300 loci. PMID:27257541
Deppdb--DNA electrostatic potential properties database: electrostatic properties of genome DNA.
Osypov, Alexander A; Krutinin, Gleb G; Kamzolova, Svetlana G
2010-06-01
The electrostatic properties of genome DNA influence its interactions with different proteins, in particular, the regulation of transcription by RNA-polymerases. DEPPDB--DNA Electrostatic Potential Properties Database--was developed to hold and provide all available information on the electrostatic properties of genome DNA combined with its sequence and annotation of biological and structural properties of genome elements and whole genomes. Genomes in DEPPDB are organized on a taxonomical basis. Currently, the database contains all the completely sequenced bacterial and viral genomes according to NCBI RefSeq. General properties of the genome DNA electrostatic potential profile and principles of its formation are revealed. This potential correlates with the GC content but does not correspond to it exactly and strongly depends on both the sequence arrangement and its context (flanking regions). Analysis of the promoter regions for bacterial and viral RNA polymerases revealed a correspondence between the scale of these proteins' physical properties and electrostatic profile patterns. We also discovered a direct correlation between the potential value and the binding frequency of RNA polymerase to DNA, supporting the idea of the role of electrostatics in these interactions. This matches a pronounced tendency of the promoter regions to possess higher values of the electrostatic potential.
Palacios-Flores, Kim; García-Sotelo, Jair; Castillo, Alejandra; Uribe, Carina; Aguilar, Luis; Morales, Lucía; Gómez-Romero, Laura; Reyes, José; Garciarubio, Alejandro; Boege, Margareta; Dávila, Guillermo
2018-01-01
We present a conceptually simple, sensitive, precise, and essentially nonstatistical solution for the analysis of genome variation in haploid organisms. The generation of a Perfect Match Genomic Landscape (PMGL), which computes intergenome identity with single nucleotide resolution, reveals signatures of variation wherever a query genome differs from a reference genome. Such signatures encode the precise location of different types of variants, including single nucleotide variants, deletions, insertions, and amplifications, effectively introducing the concept of a general signature of variation. The precise nature of variants is then resolved through the generation of targeted alignments between specific sets of sequence reads and known regions of the reference genome. Thus, the perfect match logic decouples the identification of the location of variants from the characterization of their nature, providing a unified framework for the detection of genome variation. We assessed the performance of the PMGL strategy via simulation experiments. We determined the variation profiles of natural genomes and of a synthetic chromosome, both in the context of haploid yeast strains. Our approach uncovered variants that have previously escaped detection. Moreover, our strategy is ideally suited for further refining high-quality reference genomes. The source codes for the automated PMGL pipeline have been deposited in a public repository. PMID:29367403
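The idea of a perfect-match landscape described in the abstract above can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the authors' PMGL pipeline. It assumes fixed-length substrings (k-mers) of the reference are counted by exact match against the reads, and it ignores reverse-complement strands and sequencing errors. The function name, the toy sequences and the value of `k` are all hypothetical.

```python
from collections import Counter


def perfect_match_landscape(reference, reads, k):
    """Count, for each reference position, the exact occurrences in the
    reads of the k-mer starting at that position (hypothetical helper)."""
    read_kmers = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            read_kmers[read[i:i + k]] += 1
    # Counter returns 0 for k-mers never seen in the reads.
    return [read_kmers[reference[i:i + k]]
            for i in range(len(reference) - k + 1)]


reference = "ACGTACGGTCA"
reads = ["ACGTACGATCA"]  # toy read with a single substitution at index 7
landscape = perfect_match_landscape(reference, reads, k=4)
print(landscape)
```

In this toy case the counts fall to zero for every reference k-mer that overlaps the substituted base. That localized drop is the kind of signature of variation the abstract refers to; resolving the nature of the variant would need a separate targeted alignment step.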
The Sorghum bicolor genome and the diversification of grasses
DOE Office of Scientific and Technical Information (OSTI.GOV)
Paterson, Andrew H.; Bowers, John E.; Bruggmann, Remy
2008-08-20
Sorghum, an African grass related to sugar cane and maize, is grown for food, feed, fibre and fuel. We present an initial analysis of the approximately 730-megabase Sorghum bicolor (L.) Moench genome, placing approximately 98% of genes in their chromosomal context using whole-genome shotgun sequence validated by genetic, physical and syntenic information. Genetic recombination is largely confined to about one-third of the sorghum genome with gene order and density similar to those of rice. Retrotransposon accumulation in recombinationally recalcitrant heterochromatin explains the approximately 75% larger genome size of sorghum compared with rice. Although gene and repetitive DNA distributions have been preserved since palaeopolyploidization approximately 70 million years ago, most duplicated gene sets lost one member before the sorghum-rice divergence. Concerted evolution makes one duplicated chromosomal segment appear to be only a few million years old. About 24% of genes are grass-specific and 7% are sorghum-specific. Recent gene and microRNA duplications may contribute to sorghum's drought tolerance.
The Sorghum bicolor genome and the diversification of grasses.
Paterson, Andrew H; Bowers, John E; Bruggmann, Rémy; Dubchak, Inna; Grimwood, Jane; Gundlach, Heidrun; Haberer, Georg; Hellsten, Uffe; Mitros, Therese; Poliakov, Alexander; Schmutz, Jeremy; Spannagl, Manuel; Tang, Haibao; Wang, Xiyin; Wicker, Thomas; Bharti, Arvind K; Chapman, Jarrod; Feltus, F Alex; Gowik, Udo; Grigoriev, Igor V; Lyons, Eric; Maher, Christopher A; Martis, Mihaela; Narechania, Apurva; Otillar, Robert P; Penning, Bryan W; Salamov, Asaf A; Wang, Yu; Zhang, Lifang; Carpita, Nicholas C; Freeling, Michael; Gingle, Alan R; Hash, C Thomas; Keller, Beat; Klein, Patricia; Kresovich, Stephen; McCann, Maureen C; Ming, Ray; Peterson, Daniel G; Mehboob-ur-Rahman; Ware, Doreen; Westhoff, Peter; Mayer, Klaus F X; Messing, Joachim; Rokhsar, Daniel S
2009-01-29
Sorghum, an African grass related to sugar cane and maize, is grown for food, feed, fibre and fuel. We present an initial analysis of the approximately 730-megabase Sorghum bicolor (L.) Moench genome, placing approximately 98% of genes in their chromosomal context using whole-genome shotgun sequence validated by genetic, physical and syntenic information. Genetic recombination is largely confined to about one-third of the sorghum genome with gene order and density similar to those of rice. Retrotransposon accumulation in recombinationally recalcitrant heterochromatin explains the approximately 75% larger genome size of sorghum compared with rice. Although gene and repetitive DNA distributions have been preserved since palaeopolyploidization approximately 70 million years ago, most duplicated gene sets lost one member before the sorghum-rice divergence. Concerted evolution makes one duplicated chromosomal segment appear to be only a few million years old. About 24% of genes are grass-specific and 7% are sorghum-specific. Recent gene and microRNA duplications may contribute to sorghum's drought tolerance.
Building a genome analysis pipeline to predict disease risk and prevent disease.
Bromberg, Y
2013-11-01
Reduced costs and increased speed and accuracy of sequencing can bring the genome-based evaluation of individual disease risk to the bedside. While past efforts have identified a number of actionable mutations, the bulk of genetic risk remains hidden in sequence data. The biggest challenge facing genomic medicine today is the development of new techniques to predict the specifics of a given human phenome (set of all expressed phenotypes) encoded by each individual variome (full set of genome variants) in the context of the given environment. Numerous tools exist for the computational identification of the functional effects of a single variant. However, the pipelines taking advantage of full genomic, exomic, transcriptomic (and other) sequences have only recently become a reality. This review looks at the building of methodologies for predicting "variome"-defined disease risk. It also discusses some of the challenges for incorporating such a pipeline into everyday medical practice. © 2013. Published by Elsevier Ltd. All rights reserved.
Baichoo, Shakuntala; Ouzounis, Christos A
A multitude of algorithms for sequence comparison, short-read assembly and whole-genome alignment have been developed in the general context of molecular biology, to support technology development for high-throughput sequencing, numerous applications in genome biology and fundamental research on comparative genomics. The computational complexity of these algorithms has been previously reported in original research papers, yet this often neglected property has not been reviewed previously in a systematic manner and for a wider audience. We provide a review of space and time complexity of key sequence analysis algorithms and highlight their properties in a comprehensive manner, in order to identify potential opportunities for further research in algorithm or data structure optimization. The complexity aspect is poised to become pivotal as we will be facing challenges related to the continuous increase of genomic data on unprecedented scales and complexity in the foreseeable future, when robust biological simulation at the cell level and above becomes a reality. Copyright © 2017 Elsevier B.V. All rights reserved.
Sequences Associated with Centromere Competency in the Human Genome
Hayden, Karen E.; Strome, Erin D.; Merrett, Stephanie L.; Lee, Hye-Ran; Rudd, M. Katharine
2013-01-01
Centromeres, the sites of spindle attachment during mitosis and meiosis, are located in specific positions in the human genome, normally coincident with diverse subsets of alpha satellite DNA. While there is strong evidence supporting the association of some subfamilies of alpha satellite with centromere function, the basis for establishing whether a given alpha satellite sequence is or is not designated a functional centromere is unknown, and attempts to understand the role of particular sequence features in establishing centromere identity have been limited by the near identity and repetitive nature of satellite sequences. Utilizing a broadly applicable experimental approach to test sequence competency for centromere specification, we have carried out a genomic and epigenetic functional analysis of endogenous human centromere sequences available in the current human genome assembly. The data support a model in which functionally competent sequences confer an opportunity for centromere specification, integrating genomic and epigenetic signals and promoting the concept of context-dependent centromere inheritance. PMID:23230266
Genomics of Actinobacteria: Tracing the Evolutionary History of an Ancient Phylum†
Ventura, Marco; Canchaya, Carlos; Tauch, Andreas; Chandra, Govind; Fitzgerald, Gerald F.; Chater, Keith F.; van Sinderen, Douwe
2007-01-01
Summary: Actinobacteria constitute one of the largest phyla among Bacteria and represent gram-positive bacteria with a high G+C content in their DNA. This bacterial group includes microorganisms exhibiting a wide spectrum of morphologies, from coccoid to fragmenting hyphal forms, as well as possessing highly variable physiological and metabolic properties. Furthermore, Actinobacteria members have adopted different lifestyles, and can be pathogens (e.g., Corynebacterium, Mycobacterium, Nocardia, Tropheryma, and Propionibacterium), soil inhabitants (Streptomyces), plant commensals (Leifsonia), or gastrointestinal commensals (Bifidobacterium). The divergence of Actinobacteria from other bacteria is ancient, making it impossible to identify the phylogenetically closest bacterial group to Actinobacteria. Genome sequence analysis has revolutionized every aspect of bacterial biology by enhancing the understanding of the genetics, physiology, and evolutionary development of bacteria. Various actinobacterial genomes have been sequenced, revealing a wide genomic heterogeneity probably as a reflection of their biodiversity. This review provides an account of the recent explosion of actinobacterial genomics data and an attempt to place this in a biological and evolutionary context. PMID:17804669
Genomics Portals: integrative web-platform for mining genomics data.
Shinde, Kaustubh; Phatak, Mukta; Johannes, Freudenberg M; Chen, Jing; Li, Qian; Vineet, Joshi K; Hu, Zhen; Ghosh, Krishnendu; Meller, Jaroslaw; Medvedovic, Mario
2010-01-13
A large amount of experimental data generated by modern high-throughput technologies is available through various public repositories. Our knowledge about molecular interaction networks, functional biological pathways and transcriptional regulatory modules is rapidly expanding, and is being organized in lists of functionally related genes. Jointly, these two sources of information hold a tremendous potential for gaining new insights into functioning of living systems. Genomics Portals platform integrates access to an extensive knowledge base and a large database of human, mouse, and rat genomics data with basic analytical visualization tools. It provides the context for analyzing and interpreting new experimental data and the tool for effective mining of a large number of publicly available genomics datasets stored in the back-end databases. The uniqueness of this platform lies in the volume and the diversity of genomics data that can be accessed and analyzed (gene expression, ChIP-chip, ChIP-seq, epigenomics, computationally predicted binding sites, etc), and the integration with an extensive knowledge base that can be used in such analysis. The integrated access to primary genomics data, functional knowledge and analytical tools makes Genomics Portals platform a unique tool for interpreting results of new genomics experiments and for mining the vast amount of data stored in the Genomics Portals backend databases. Genomics Portals can be accessed and used freely at http://GenomicsPortals.org.
Genomics Portals: integrative web-platform for mining genomics data
2010-01-01
Background A large amount of experimental data generated by modern high-throughput technologies is available through various public repositories. Our knowledge about molecular interaction networks, functional biological pathways and transcriptional regulatory modules is rapidly expanding, and is being organized in lists of functionally related genes. Jointly, these two sources of information hold a tremendous potential for gaining new insights into functioning of living systems. Results Genomics Portals platform integrates access to an extensive knowledge base and a large database of human, mouse, and rat genomics data with basic analytical visualization tools. It provides the context for analyzing and interpreting new experimental data and the tool for effective mining of a large number of publicly available genomics datasets stored in the back-end databases. The uniqueness of this platform lies in the volume and the diversity of genomics data that can be accessed and analyzed (gene expression, ChIP-chip, ChIP-seq, epigenomics, computationally predicted binding sites, etc), and the integration with an extensive knowledge base that can be used in such analysis. Conclusion The integrated access to primary genomics data, functional knowledge and analytical tools makes Genomics Portals platform a unique tool for interpreting results of new genomics experiments and for mining the vast amount of data stored in the Genomics Portals backend databases. Genomics Portals can be accessed and used freely at http://GenomicsPortals.org. PMID:20070909
Shestov, Maksim; Ontañón, Santiago; Tozeren, Aydin
2015-10-13
Bacterial infections comprise a global health challenge as the incidence of antibiotic resistance increases. Pathogenic potential of bacteria has been shown to be context dependent, varying in response to environment and even within the strains of the same genus. We used the KEGG repository and extensive literature searches to identify among the 2527 bacterial genomes in the literature those implicated as pathogenic to the host, including those which show pathogenicity in a context dependent manner. Using data on the gene contents of these genomes, we identified sets of genes highly abundant in pathogenic but relatively absent in commensal strains and vice versa. In addition, we carried out genome comparison within a genus for the seventeen largest genera in our genome collection. We projected the resultant lists of ortholog genes onto KEGG bacterial pathways to identify clusters and circuits, which can be linked to either pathogenicity or synergy. Gene circuits relatively abundant in nonpathogenic bacteria often mediated biosynthesis of antibiotics. Other synergy-linked circuits reduced drug-induced toxicity. Pathogen-abundant gene circuits included modules in one-carbon folate, two-component system, type-3 secretion system, and peptidoglycan biosynthesis. Antibiotic-resistant bacterial strains possessed genes modulating phagocytosis, vesicle trafficking, cytoskeletal reorganization, and regulation of the inflammatory response. Our study also identified bacterial genera containing a circuit, elements of which were previously linked to Alzheimer's disease. The present study produces, for the first time, a signature in the form of a robust list of gene circuitry whose presence or absence could potentially define the pathogenicity of a microbiome. An extensive literature search substantiated a large majority of the commensal and pathogenic circuitry in our predicted list. Scanning microbiome libraries for these circuitry motifs will provide further insights into the complex and context dependent pathogenicity of bacteria.
PhytoPath: an integrative resource for plant pathogen genomics.
Pedro, Helder; Maheswari, Uma; Urban, Martin; Irvine, Alistair George; Cuzick, Alayne; McDowall, Mark D; Staines, Daniel M; Kulesha, Eugene; Hammond-Kosack, Kim Elizabeth; Kersey, Paul Julian
2016-01-04
PhytoPath (www.phytopathdb.org) is a resource for genomic and phenotypic data from plant pathogen species, that integrates phenotypic data for genes from PHI-base, an expertly curated catalog of genes with experimentally verified pathogenicity, with the Ensembl tools for data visualization and analysis. The resource is focused on fungi, protists (oomycetes) and bacterial plant pathogens that have genomes that have been sequenced and annotated. Genes with associated PHI-base data can be easily identified across all plant pathogen species using a BioMart-based query tool and visualized in their genomic context on the Ensembl genome browser. The PhytoPath resource contains data for 135 genomic sequences from 87 plant pathogen species, and 1364 genes curated for their role in pathogenicity and as targets for chemical intervention. Support for community annotation of gene models is provided using the WebApollo online gene editor, and we are working with interested communities to improve reference annotation for selected species. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Epigenomics Reveals a Functional Genome Anatomy and a New Approach to Common Disease
Feinberg, Andrew P.
2010-01-01
Epigenomics provides the functional context of genome sequence, analogous to the functional anatomy of the human body provided by Vesalius a half millennium ago. Much of what appears to be inconclusive genetic data for common disease could therefore become meaningful in an epigenomic context. PMID:20944596
Teaching Expression Proteomics: From the Wet-Lab to the Laptop
ERIC Educational Resources Information Center
Teixeira, Miguel C.; Santos, Pedro M.; Rodrigues, Catarina; Sa-Correia, Isabel
2009-01-01
Expression proteomics has become, in recent years, a key genome-wide expression approach in fundamental and applied life sciences. This postgenomic technology aims at the quantitative analysis of all the proteins or protein forms (the so-called proteome) of a given organism in a given environmental and genetic context. It is a challenge to provide…
Phylogenetic Analysis of Klebsiella pneumoniae from Hospitalized Children, Pakistan
Ejaz, Hasan; Wang, Nancy; Wilksch, Jonathan J.; Page, Andrew J.; Cao, Hanwei; Gujaran, Shruti; Keane, Jacqueline A.; Lithgow, Trevor; ul-Haq, Ikram; Dougan, Gordon
2017-01-01
Klebsiella pneumoniae shows increasing emergence of multidrug-resistant lineages, including strains resistant to all available antimicrobial drugs. We conducted whole-genome sequencing of 178 highly drug-resistant isolates from a tertiary hospital in Lahore, Pakistan. Phylogenetic analyses to place these isolates into global context demonstrate the expansion of multiple independent lineages, including K. quasipneumoniae. PMID:29048298
First generation annotations for the fathead minnow (Pimephales promelas) genome
Ab initio gene prediction and evidence alignment were used to produce the first annotations for the fathead minnow SOAPdenovo genome assembly. Additionally, a genome browser hosted at genome.setac.org provides simplified access to the annotation data in context with fathead minno...
Corona, Erik; Chen, Rong; Sikora, Martin; Morgan, Alexander A.; Patel, Chirag J.; Ramesh, Aditya; Bustamante, Carlos D.; Butte, Atul J.
2013-01-01
Genetic diversity across different human populations can enhance understanding of the genetic basis of disease. We calculated the genetic risk of 102 diseases in 1,043 unrelated individuals across 51 populations of the Human Genome Diversity Panel. We found that genetic risk for type 2 diabetes and pancreatic cancer decreased as humans migrated toward East Asia. In addition, biliary liver cirrhosis, alopecia areata, bladder cancer, inflammatory bowel disease, membranous nephropathy, systemic lupus erythematosus, systemic sclerosis, ulcerative colitis, and vitiligo have undergone genetic risk differentiation. This analysis represents a large-scale attempt to characterize genetic risk differentiation in the context of migration. We anticipate that our findings will enable detailed analysis pertaining to the driving forces behind genetic risk differentiation. PMID:23717210
El-Telbany, Ahmed
2012-01-01
Cancer is now known as a disease of genomic alterations. Mutational analysis and genomics profiling in recent years have advanced the field of lung cancer genetics/genomics significantly. It is becoming more accepted now that the identification of genomic alterations in lung cancer can impact therapeutics, especially when the alterations represent “oncogenic drivers” in the processes of tumorigenesis and progression. In this review, we will highlight the key driver oncogenic gene mutations and fusions identified in lung cancer. The review will summarize and report the available demographic and clinicopathological data as well as molecular details behind various lung cancer gene alterations in the context of race. We hope to shed some light into the disparities in the incidence of various genetic mutations among lung cancer patients of different racial backgrounds. As molecularly targeted therapy continues to advance in lung cancer, racial differences in specific genetic/genomic alterations can have an important impact in the choices of therapeutics and in our understanding of the drug sensitivity/resistance profile. The most relevant genes in lung cancer described in this review include the following: EGFR, KRAS, MET, LKB1, BRAF, PIK3CA, ALK, RET, and ROS1. Commonly identified genetic/genomic alterations such as missense or nonsense mutations, small insertions or deletions, alternative splicing, and chromosomal fusion rearrangements were discussed. Relevance in current targeted therapeutic drugs was mentioned when appropriate. We also highlighted various targeted therapeutics that are currently under clinical development, such as the MET inhibitors and antibodies. With the advent of next-generation sequencing, the landscape of genomic alterations in lung cancer is expected to be much transformed and detailed in upcoming years. 
These genomic landscape differences in the context of racial disparities should be emphasized both in tumorigenesis and in drug sensitivity/resistance. It is hoped that such effort will help to diminish racial disparities in lung cancer outcome in the future. PMID:23264847
Kent, Michael; García-Deister, Vivette; López-Beltrán, Carlos; Santos, Ricardo Ventura; Schwartz-Marín, Ernesto; Wade, Peter
2015-01-01
This article explores the relationship between genetic research, nationalism and the construction of collective social identities in Latin America. It makes a comparative analysis of two research projects – the ‘Genoma Mexicano’ and the ‘Homo Brasilis’ – both of which sought to establish national and genetic profiles. Both have reproduced and strengthened the idea of their respective nations of focus, incorporating biological elements into debates on social identities. Also, both have placed the unifying figure of the mestizo/mestiço at the heart of national identity constructions, and in so doing have displaced alternative identity categories, such as those based on race. However, having been developed in different national contexts, these projects have had distinct scientific and social trajectories: in Mexico, the genomic mestizo is mobilized mainly in relation to health, while in Brazil the key arena is that of race. We show the importance of the nation as a frame for mobilizing genetic data in public policy debates, and demonstrate how race comes in and out of focus in different Latin American national contexts of genomic research, while never completely disappearing. PMID:27479999
2014-01-01
Background Neisseria meningitidis expresses type four pili (Tfp) which are important for colonisation and virulence. Tfp have been considered as one of the most variable structures on the bacterial surface due to high frequency gene conversion, resulting in amino acid sequence variation of the major pilin subunit (PilE). Meningococci express either a class I or a class II pilE gene and recent work has indicated that class II pilins do not undergo antigenic variation, as class II pilE genes encode conserved pilin subunits. The purpose of this work was to use whole genome sequences to further investigate the frequency and variability of the class II pilE genes in meningococcal isolate collections. Results We analysed over 600 publically available whole genome sequences of N. meningitidis isolates to determine the sequence and genomic organization of pilE. We confirmed that meningococcal strains belonging to a limited number of clonal complexes (ccs, namely cc1, cc5, cc8, cc11 and cc174) harbour a class II pilE gene which is conserved in terms of sequence and chromosomal context. We also identified pilS cassettes in all isolates with class II pilE, however, our analysis indicates that these do not serve as donor sequences for pilE/pilS recombination. Furthermore, our work reveals that the class II pilE locus lacks the DNA sequence motifs that enable (G4) or enhance (Sma/Cla repeat) pilin antigenic variation. Finally, through analysis of pilin genes in commensal Neisseria species we found that meningococcal class II pilE genes are closely related to pilE from Neisseria lactamica and Neisseria polysaccharea, suggesting horizontal transfer among these species. Conclusions Class II pilins can be defined by their amino acid sequence and genomic context and are present in meningococcal isolates which have persisted and spread globally. 
The absence of G4 and Sma/Cla sequences adjacent to the class II pilE genes is consistent with the lack of pilin subunit variation in these isolates, although horizontal transfer may generate class II pilin diversity. This study supports the suggestion that high frequency antigenic variation of pilin is not universal in pathogenic Neisseria. PMID:24690385
Vanneste, Kevin; Baele, Guy; Maere, Steven; Van de Peer, Yves
2014-01-01
Ancient whole-genome duplications (WGDs), also referred to as paleopolyploidizations, have been reported in most evolutionary lineages. Their attributed role remains a major topic of discussion, ranging from an evolutionary dead end to a road toward evolutionary success, with evidence supporting both fates. Previously, based on dating WGDs in a limited number of plant species, we found a clustering of angiosperm paleopolyploidizations around the Cretaceous–Paleogene (K–Pg) extinction event about 66 million years ago. Here we revisit this finding, which has proven controversial, by combining genome sequence information for many more plant lineages and using more sophisticated analyses. We include 38 full genome sequences and three transcriptome assemblies in a Bayesian evolutionary analysis framework that incorporates uncorrelated relaxed clock methods and fossil uncertainty. In accordance with earlier findings, we demonstrate a strongly nonrandom pattern of genome duplications over time with many WGDs clustering around the K–Pg boundary. We interpret these results in the context of recent studies on invasive polyploid plant species, and suggest that polyploid establishment is promoted during times of environmental stress. We argue that considering the evolutionary potential of polyploids in light of the environmental and ecological conditions present around the time of polyploidization could mitigate the stark contrast in the proposed evolutionary fates of polyploids. PMID:24835588
Wenzel, Marius A; Douglas, Alex; James, Marianne C; Redpath, Steve M; Piertney, Stuart B
2016-01-01
Landscape genomics promises to provide novel insights into how neutral and adaptive processes shape genome-wide variation within and among populations. However, there has been little emphasis on examining whether individual-based phenotype-genotype relationships derived from approaches such as genome-wide association (GWAS) manifest themselves as a population-level signature of selection in a landscape context. The two may prove irreconcilable as individual-level patterns become diluted by high levels of gene flow and complex phenotypic or environmental heterogeneity. We illustrate this issue with a case study that examines the role of the highly prevalent gastrointestinal nematode Trichostrongylus tenuis in shaping genomic signatures of selection in red grouse (Lagopus lagopus scotica). Individual-level GWAS involving 384 SNPs has previously identified five SNPs that explain variation in T. tenuis burden. Here, we examine whether these same SNPs display population-level relationships between T. tenuis burden and genetic structure across a small-scale landscape of 21 sites with heterogeneous parasite pressure. Moreover, we identify adaptive SNPs showing signatures of directional selection using F(ST) outlier analysis and relate population- and individual-level patterns of multilocus neutral and adaptive genetic structure to T. tenuis burden. The five candidate SNPs for parasite-driven selection were neither associated with T. tenuis burden on a population level, nor under directional selection. Similarly, there was no evidence of parasite-driven selection in SNPs identified as candidates for directional selection. We discuss these results in the context of red grouse ecology and highlight the broader consequences for the utility of landscape genomics approaches for identifying signatures of selection. © 2015 John Wiley & Sons Ltd.
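The F(ST) statistic behind the outlier analysis mentioned above can be illustrated with a minimal sketch. This is Wright's textbook estimator for a single biallelic SNP with equally weighted populations, not the specific outlier method used in the study; the function name `fst` and the example allele frequencies are assumptions made for illustration only.

```python
def fst(allele_freqs):
    """Wright's F_ST for one biallelic SNP.

    allele_freqs: frequency of the same allele in each population
    (hypothetical input; populations are weighted equally).
    """
    n = len(allele_freqs)
    # Mean expected heterozygosity within populations (H_S).
    h_s = sum(2 * p * (1 - p) for p in allele_freqs) / n
    # Expected heterozygosity of the pooled population (H_T).
    p_bar = sum(allele_freqs) / n
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # monomorphic locus: no differentiation to measure
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    # Strongly differentiated SNP versus an undifferentiated one.
    print(fst([0.2, 0.8]))
    print(fst([0.5, 0.5]))
```

An outlier scan would compute such a value per SNP and flag loci whose F(ST) is unusually high relative to the genome-wide distribution; the choice of null distribution is left open here.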
Mandage, Rajendra; Telford, Marco; Rodríguez, Juan Antonio; Farré, Xavier; Layouni, Hafid; Marigorta, Urko M; Cundiff, Caitlin; Heredia-Genestar, Jose Maria; Navarro, Arcadi; Santpere, Gabriel
2017-01-01
Epstein-Barr virus (EBV), human herpes virus 4, has been classically associated with infectious mononucleosis, multiple sclerosis and several types of cancers. Many of these diseases show marked geographical differences in prevalence, which points to underlying genetic and/or environmental factors. Those factors may include a different susceptibility to EBV infection and viral copy number among human populations. Since EBV is commonly used to transform B-cells into lymphoblastoid cell lines (LCLs) we hypothesize that differences in EBV copy number among individual LCLs may reflect differential susceptibility to EBV infection. To test this hypothesis, we retrieved whole-genome sequenced EBV-mapping reads from 1,753 LCL samples derived from 19 populations worldwide that were sequenced within the context of the 1000 Genomes Project. An in silico methodology was developed to estimate EBV copy number in LCLs, and these estimates were validated by real-time PCR. After experimentally confirming that EBV relative copy number remains stable over cell passages, we performed a genome-wide association analysis (GWAS) to try to detect genetic variants of the host that may be associated with EBV copy number. Our GWAS has yielded several genomic regions suggestively associated with the number of EBV genomes per cell in LCLs, unraveling promising candidate genes such as CAND1, a known inhibitor of EBV replication. While this GWAS does not unequivocally establish the degree to which the genetic makeup of individuals determines viral levels within their derived LCLs, for which a larger sample size will be needed, it potentially highlighted human genes affecting EBV-related processes, which constitute interesting candidates to follow up in the context of EBV-related pathologies.
Treadwell, Marsha J.; Makani, Julie; Ohene-Frempong, Kwaku; Ofori-Acquah, Solomon; McCurdy, Sheryl; de Vries, Jantina; Bukini, Daima; Dennis-Antwi, Jemima; Kamga, Karen Kengne; Mbekenga, Columba; Wonkam, Edmond Tingang; Tangwa, Godfrey; Royal, Charmaine D.
2017-01-01
Abstract Advances in omics technologies alone are not a guarantee that science will translate to robust responsible innovation that is firmly grounded in societal values. This study aimed to identify best practices for Ethical, Legal, and Social Implications (ELSI) research in Africa that allows for optimal integration of community perspectives into the design and implementation of genomics research. In a large sample of 346 stakeholders in Cameroon, Ghana, and Tanzania (59% women), we used a qualitative study design with a phenomenological approach and conducted 32 group and 74 individual interviews (25% rural). We imported interview recordings into NVivo software for analysis. We created a “concept map” to organize the coded information, with Perspectives on Genomics and Sickle Cell Disease (SCD) Public Health Interventions as the central themes. We found that (1) analyses of major subthemes across and within countries revealed differential knowledge and experiences of SCD, and perspectives on various aspects of research and genomics; (2) we were able to gather empirical data efficiently from urban and rural stakeholders, to study the issues related to sample sharing, consent processes, and return of clinical and genomic study results; (3) the concept of nondirectiveness in modern genetic medicine practice can be challenged by the views of stakeholders in the context of a high-burden disease such as SCD; and (4) linking community views to current and proposed public health interventions could be understood within the context of each specific country. Our work informs future qualitative social science and technology policy research designs on genomics applications in Africa.
Pyne, Michael E; Liu, Xuejia; Moo-Young, Murray; Chung, Duane A; Chou, C Perry
2016-09-19
Clostridium pasteurianum is emerging as a prospective host for the production of biofuels and chemicals, and has recently been shown to directly consume electric current. Despite this growing biotechnological appeal, the organism's genetics and central metabolism remain poorly understood. Here we present a concurrent genome sequence for the C. pasteurianum type strain and provide extensive genomic analysis of the organism's defence mechanisms and central fermentative metabolism. Next generation genome sequencing produced reads corresponding to spontaneous excision of a novel phage, designated φ6013, which could be induced using mitomycin C and detected using PCR and transmission electron microscopy. Methylome analysis of sequencing reads provided a near-complete glimpse into the organism's restriction-modification systems. We also unveiled the chief C. pasteurianum Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) locus, which was found to exemplify a Type I-B system. Finally, we show that C. pasteurianum possesses a highly complex fermentative metabolism whereby the metabolic pathways enlisted by the cell are governed by the degree of reductance of the substrate. Four distinct fermentation profiles, ranging from exclusively acidogenic to predominantly alcohologenic, were observed through redox consideration of the substrate. A detailed discussion of the organism's central metabolism within the context of metabolic engineering is provided.
A genome-wide resource for the analysis of protein localisation in Drosophila
Sarov, Mihail; Barz, Christiane; Jambor, Helena; Hein, Marco Y; Schmied, Christopher; Suchold, Dana; Stender, Bettina; Janosch, Stephan; KJ, Vinay Vikas; Krishnan, RT; Krishnamoorthy, Aishwarya; Ferreira, Irene RS; Ejsmont, Radoslaw K; Finkl, Katja; Hasse, Susanne; Kämpfer, Philipp; Plewka, Nicole; Vinis, Elisabeth; Schloissnig, Siegfried; Knust, Elisabeth; Hartenstein, Volker; Mann, Matthias; Ramaswami, Mani; VijayRaghavan, K; Tomancak, Pavel; Schnorrer, Frank
2016-01-01
The Drosophila genome contains >13000 protein-coding genes, the majority of which remain poorly investigated. Important reasons include the lack of antibodies or reporter constructs to visualise these proteins. Here, we present a genome-wide fosmid library of 10000 GFP-tagged clones, comprising tagged genes and most of their regulatory information. For 880 tagged proteins, we created transgenic lines, and for a total of 207 lines, we assessed protein expression and localisation in ovaries, embryos, pupae or adults by stainings and live imaging approaches. Importantly, we visualised many proteins at endogenous expression levels and found a large fraction of them localising to subcellular compartments. By applying genetic complementation tests, we estimate that about two-thirds of the tagged proteins are functional. Moreover, these tagged proteins enable interaction proteomics from developing pupae and adult flies. Taken together, this resource will boost systematic analysis of protein expression and localisation in various cellular and developmental contexts. DOI: http://dx.doi.org/10.7554/eLife.12068.001 PMID:26896675
Baker, Kate S.; Dallman, Timothy J.; Behar, Adi; Weill, François-Xavier; Gouali, Malika; Sobel, Jeremy; Fookes, Maria; Valinsky, Lea; Gal-Mor, Ohad; Connor, Thomas R.; Nissan, Israel; Bertrand, Sophie; Parkhill, Julian; Jenkins, Claire; Cohen, Dani; Thomson, Nicholas R.
2016-09-01
Shigellae are sensitive indicator species for studying trends in the international transmission of antimicrobial-resistant Enterobacteriaceae. Orthodox Jewish communities (OJCs) are a known risk group for shigellosis; Shigella sonnei is cyclically epidemic in OJCs in Israel, and sporadic outbreaks occur in OJCs elsewhere. We generated whole-genome sequences for 437 isolates of S. sonnei from OJCs and non-OJCs collected over 22 years in Europe (the United Kingdom, France, and Belgium), the United States, Canada, and Israel and analyzed these within a known global genomic context. Through phylogenetic and genomic analysis, we showed that strains from outbreaks in OJCs outside of Israel are distinct from strains in the general population and relate to a single multidrug-resistant sublineage of S. sonnei that prevails in Israel. Further Bayesian phylogenetic analysis showed that this strain emerged approximately 30 years ago, demonstrating the speed at which antimicrobial drug-resistant pathogens can spread widely through geographically dispersed, but internationally connected, communities. PMID:27532625
Biological invasions, climate change and genomics
Chown, Steven L; Hodgins, Kathryn A; Griffin, Philippa C; Oakeshott, John G; Byrne, Margaret; Hoffmann, Ary A
2015-01-01
The rate of biological invasions is expected to increase as the effects of climate change on biological communities become widespread. Climate change enhances habitat disturbance which facilitates the establishment of invasive species, which in turn provides opportunities for hybridization and introgression. These effects influence local biodiversity that can be tracked through genetic and genomic approaches. Metabarcoding and metagenomic approaches provide a way of monitoring some types of communities under climate change for the appearance of invasives. Introgression and hybridization can be followed by the analysis of entire genomes so that rapidly changing areas of the genome are identified and instances of genetic pollution monitored. Genomic markers enable accurate tracking of invasive species’ geographic origin well beyond what was previously possible. New genomic tools are promoting fresh insights into classic questions about invading organisms under climate change, such as the role of genetic variation, local adaptation and climate pre-adaptation in successful invasions. These tools are providing managers with often more effective means to identify potential threats, improve surveillance and assess impacts on communities. We provide a framework for the application of genomic techniques within a management context and also indicate some important limitations in what can be achieved. PMID:25667601
Introduction to the fathead minnow genome browser and opportunities for collaborative development
Ab initio gene prediction and evidence alignment were used to produce the first annotations for the fathead minnow SOAPdenovo genome assembly. Additionally, a genome browser hosted at genome.setac.org provides simplified access to the annotation data in context with fathead minno...
Lee, Moon Young; Park, Chanjae; Berent, Robyn M.; Park, Paul J.; Fuchs, Robert; Syn, Hannah; Chin, Albert; Townsend, Jared; Benson, Craig C.; Redelman, Doug; Shen, Tsai-wei; Park, Jong Kun; Miano, Joseph M.; Sanders, Kenton M.; Ro, Seungil
2015-01-01
Genome-scale expression data on the absolute numbers of gene isoforms offers essential clues in cellular functions and biological processes. Smooth muscle cells (SMCs) perform a unique contractile function through expression of specific genes controlled by serum response factor (SRF), a transcription factor that binds to DNA sites known as the CArG boxes. To identify SRF-regulated genes specifically expressed in SMCs, we isolated SMC populations from mouse small intestine and colon, obtained their transcriptomes, and constructed an interactive SMC genome and CArGome browser. To our knowledge, this is the first online resource that provides a comprehensive library of all genetic transcripts expressed in primary SMCs. The browser also serves as the first genome-wide map of SRF binding sites. The browser analysis revealed novel SMC-specific transcriptional variants and SRF target genes, which provided new and unique insights into the cellular and biological functions of the cells in gastrointestinal (GI) physiology. The SRF target genes in SMCs, which were discovered in silico, were confirmed by proteomic analysis of SMC-specific Srf knockout mice. Our genome browser offers a new perspective into the alternative expression of genes in the context of SRF binding sites in SMCs and provides a valuable reference for future functional studies. PMID:26241044
Clinical evaluation incorporating a personal genome
Ashley, Euan A.; Butte, Atul J.; Wheeler, Matthew T.; Chen, Rong; Klein, Teri E.; Dewey, Frederick E.; Dudley, Joel T.; Ormond, Kelly E.; Pavlovic, Aleksandra; Hudgins, Louanne; Gong, Li; Hodges, Laura M.; Berlin, Dorit S.; Thorn, Caroline F.; Sangkuhl, Katrin; Hebert, Joan M.; Woon, Mark; Sagreiya, Hersh; Whaley, Ryan; Morgan, Alexander A.; Pushkarev, Dmitry; Neff, Norma F; Knowles, Joshua W.; Chou, Mike; Thakuria, Joseph; Rosenbaum, Abraham; Zaranek, Alexander Wait; Church, George; Greely, Henry T.; Quake, Stephen R.; Altman, Russ B.
2010-01-01
Background The cost of genomic information has fallen steeply but the path to clinical translation of risk estimates for common variants found in genome wide association studies remains unclear. Since the speed and cost of sequencing complete genomes is rapidly declining, more comprehensive means of analyzing these data in concert with rare variants for genetic risk assessment and individualisation of therapy are required. Here, we present the first integrated analysis of a complete human genome in a clinical context. Methods An individual with a family history of vascular disease and early sudden death was evaluated. Clinical assessment included risk prediction for coronary artery disease, screening for causes of sudden cardiac death, and genetic counselling. Genetic analysis included the development of novel methods for the integration of whole genome sequence data including 2.6 million single nucleotide polymorphisms and 752 copy number variations. The algorithm focused on predicting genetic risk of genes associated with known Mendelian disease, recognised drug responses, and pathogenicity for novel variants. In addition, since integration of risk ratios derived from case control studies is challenging, we estimated posterior probabilities from age and sex appropriate prior probability and likelihood ratios derived for each genotype. In addition, we developed a visualisation approach to account for gene-environment interactions and conditionally dependent risks. Findings We found increased genetic risk for myocardial infarction, type II diabetes and certain cancers. Rare variants in LPA are consistent with the family history of coronary artery disease. Pharmacogenomic analysis suggested a positive response to lipid lowering therapy, likely clopidogrel resistance, and a low initial dosing requirement for warfarin. Many variants of uncertain significance were reported. 
Interpretation Although challenges remain, our results suggest that whole genome sequencing can yield useful and clinically relevant information for individual patients, especially for those with a strong family history of significant disease. PMID:20435227
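The step of estimating posterior probabilities from a prior probability and per-genotype likelihood ratios can be sketched as follows. This is a minimal illustration of the standard odds form of Bayes' rule, not the authors' algorithm; the function name `posterior_probability`, the example numbers, and the assumption that the likelihood ratios are conditionally independent (so that they can be multiplied) are all introduced here for illustration.

```python
import math


def posterior_probability(prior, likelihood_ratios):
    """Combine a pre-test probability with likelihood ratios.

    prior: pre-test probability of disease, strictly between 0 and 1
           (e.g. an age- and sex-appropriate population risk).
    likelihood_ratios: one ratio per genotype; treated as independent,
           which is a simplifying assumption of this sketch.
    """
    if not 0 < prior < 1:
        raise ValueError("prior must be strictly between 0 and 1")
    pre_test_odds = prior / (1 - prior)
    # Odds form of Bayes' rule: post-test odds = pre-test odds x product of LRs.
    post_test_odds = pre_test_odds * math.prod(likelihood_ratios)
    return post_test_odds / (1 + post_test_odds)


if __name__ == "__main__":
    # Hypothetical numbers: 10% prior risk, two risk-increasing genotypes.
    print(posterior_probability(0.10, [2.0, 1.5]))
```

With an empty list of likelihood ratios the function returns the prior unchanged, which is a convenient sanity check when no informative genotypes are available.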
Niemiec, Emilia; Borry, Pascal; Pinxten, Wim; Howard, Heidi Carmen
2016-12-01
Whole exome sequencing (WES) and whole genome sequencing (WGS) have become increasingly available in the research and clinical settings and are now also being offered by direct-to-consumer (DTC) genetic testing (GT) companies. This offer can be perceived as amplifying the already identified concerns regarding adequacy of informed consent (IC) for both WES/WGS and the DTC GT context. We performed a qualitative content analysis of Websites of four companies offering WES/WGS DTC regarding the following elements of IC: pre-test counseling, benefits and risks, and incidental findings (IFs). The analysis revealed concerns, including the potential lack of pre-test counseling in three of the companies studied, missing relevant information in the risks and benefits sections, and potentially misleading information for consumers. Regarding IFs, only one company, which provides opportunistic screening, provides basic information about their management. In conclusion, some of the information (and related practices) present on the companies' Web pages salient to the consent process are not adequate in reference to recommendations for IC for WGS or WES in the clinical context. Requisite resources should be allocated to ensure that commercial companies are offering high-throughput sequencing under responsible conditions, including an adequate consent process. © 2016 WILEY PERIODICALS, INC.
The need for high-quality whole-genome sequence databases in microbial forensics.
Sjödin, Andreas; Broman, Tina; Melefors, Öjar; Andersson, Gunnar; Rasmusson, Birgitta; Knutsson, Rickard; Forsman, Mats
2013-09-01
Microbial forensics is an important part of a strengthened capability to respond to biocrime and bioterrorism incidents to aid in the complex task of distinguishing between natural outbreaks and deliberate acts. The goal of a microbial forensic investigation is to identify and criminally prosecute those responsible for a biological attack, and it involves a detailed analysis of the weapon--that is, the pathogen. The recent development of next-generation sequencing (NGS) technologies has greatly increased the resolution that can be achieved in microbial forensic analyses. It is now possible to identify, quickly and in an unbiased manner, previously undetectable genome differences between closely related isolates. This development is particularly relevant for the most deadly bacterial diseases that are caused by bacterial lineages with extremely low levels of genetic diversity. Whole-genome analysis of pathogens is envisaged to be increasingly essential for this purpose. In a microbial forensic context, whole-genome sequence analysis is the ultimate method for strain comparisons as it is informative during identification, characterization, and attribution--all 3 major stages of the investigation--and at all levels of microbial strain identity resolution (ie, it resolves the full spectrum from family to isolate). Given these capabilities, one bottleneck in microbial forensics investigations is the availability of high-quality reference databases of bacterial whole-genome sequences. To be of high quality, databases need to be curated and accurate in terms of sequences, metadata, and genetic diversity coverage. The development of whole-genome sequence databases will be instrumental in successfully tracing pathogens in the future.
VISMapper: ultra-fast exhaustive cartography of viral insertion sites for gene therapy.
Juanes, José M; Gallego, Asunción; Tárraga, Joaquín; Chaves, Felipe J; Marín-Garcia, Pablo; Medina, Ignacio; Arnau, Vicente; Dopazo, Joaquín
2017-09-20
The possibility of integrating viral vectors to become a persistent part of the host genome makes them a crucial element of clinical gene therapy. However, viral integration has associated risks, such as the unintentional activation of oncogenes that can result in cancer. Therefore, the analysis of integration sites of retroviral vectors is a crucial step in developing safer vectors for therapeutic use. Here we present VISMapper, a vector integration site analysis web server, to analyze next-generation sequencing data for retroviral vector integration sites. VISMapper can be found at: http://vismapper.babelomics.org. Because it uses novel mapping algorithms, VISMapper is remarkably faster than previously available programs. It also provides a useful graphical interface to analyze the integration sites found in the genomic context.
Palacios-Flores, Kim; García-Sotelo, Jair; Castillo, Alejandra; Uribe, Carina; Aguilar, Luis; Morales, Lucía; Gómez-Romero, Laura; Reyes, José; Garciarubio, Alejandro; Boege, Margareta; Dávila, Guillermo
2018-04-01
We present a conceptually simple, sensitive, precise, and essentially nonstatistical solution for the analysis of genome variation in haploid organisms. The generation of a Perfect Match Genomic Landscape (PMGL), which computes intergenome identity with single nucleotide resolution, reveals signatures of variation wherever a query genome differs from a reference genome. Such signatures encode the precise location of different types of variants, including single nucleotide variants, deletions, insertions, and amplifications, effectively introducing the concept of a general signature of variation. The precise nature of variants is then resolved through the generation of targeted alignments between specific sets of sequence reads and known regions of the reference genome. Thus, the perfect match logic decouples the identification of the location of variants from the characterization of their nature, providing a unified framework for the detection of genome variation. We assessed the performance of the PMGL strategy via simulation experiments. We determined the variation profiles of natural genomes and of a synthetic chromosome, both in the context of haploid yeast strains. Our approach uncovered variants that have previously escaped detection. Moreover, our strategy is ideally suited for further refining high-quality reference genomes. The source codes for the automated PMGL pipeline have been deposited in a public repository. Copyright © 2018 by the Genetics Society of America.
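The perfect-match idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' PMGL pipeline: the function names, the choice of k, and the use of exact k-mer counts are assumptions made for illustration, and the sketch ignores reverse-complement strands and sequencing errors. For every reference position it counts how often the k-mer starting there occurs in the query reads; a run of zeros marks a region where the query differs from the reference.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer observed in the query reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def perfect_match_landscape(reference, reads, k):
    """For each reference position, count query k-mers that match exactly."""
    counts = kmer_counts(reads, k)
    return [counts[reference[i:i + k]] for i in range(len(reference) - k + 1)]


def zero_runs(landscape):
    """Return half-open (start, end) intervals where the landscape is zero."""
    runs, start = [], None
    for i, value in enumerate(landscape):
        if value == 0 and start is None:
            start = i
        elif value != 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(landscape)))
    return runs


# Hypothetical toy data: the query carries one substitution at reference position 5.
reference = "ACGTTGCAAGCT"
reads = ["ACGTTCCAAGCT"]
landscape = perfect_match_landscape(reference, reads, k=4)
print(landscape)
print(zero_runs(landscape))
```

In this toy case the zero run spans the k reference k-mers that overlap the substituted base, which gives the location of the variant but not its nature; the abstract describes resolving the nature separately through targeted alignments.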
Gruenstaeudl, Michael; Gerschler, Nico; Borsch, Thomas
2018-06-21
The sequencing and comparison of plastid genomes are becoming a standard method in plant genomics, and many researchers are using this approach to infer plant phylogenetic relationships. Due to the widespread availability of next-generation sequencing, plastid genome sequences are being generated at breakneck pace. This trend towards massive sequencing of plastid genomes highlights the need for standardized bioinformatic workflows. In particular, documentation and dissemination of the details of genome assembly, annotation, alignment and phylogenetic tree inference are needed, as these processes are highly sensitive to the choice of software and the precise settings used. Here, we present the procedure and results of sequencing, assembling, annotating and quality-checking three complete plastid genomes of the aquatic plant genus Cabomba as well as subsequent gene alignment and phylogenetic tree inference. We accompany our findings with a detailed description of the bioinformatic workflow employed. Importantly, we share a total of eleven software scripts for each of these bioinformatic processes, enabling other researchers to evaluate and replicate our analyses step by step. The results of our analyses illustrate that the plastid genomes of Cabomba are highly conserved in both structure and gene content.
Single-cell analysis of population context advances RNAi screening at multiple levels
Snijder, Berend; Sacher, Raphael; Rämö, Pauli; Liberali, Prisca; Mench, Karin; Wolfrum, Nina; Burleigh, Laura; Scott, Cameron C; Verheije, Monique H; Mercer, Jason; Moese, Stefan; Heger, Thomas; Theusner, Kristina; Jurgeit, Andreas; Lamparter, David; Balistreri, Giuseppe; Schelhaas, Mario; De Haan, Cornelis A M; Marjomäki, Varpu; Hyypiä, Timo; Rottier, Peter J M; Sodeik, Beate; Marsh, Mark; Gruenberg, Jean; Amara, Ali; Greber, Urs; Helenius, Ari; Pelkmans, Lucas
2012-01-01
Isogenic cells in culture show strong variability, which arises from dynamic adaptations to the microenvironment of individual cells. Here we study the influence of the cell population context, which determines a single cell's microenvironment, in image-based RNAi screens. We developed a comprehensive computational approach that employs Bayesian and multivariate methods at the single-cell level. We applied these methods to 45 RNA interference screens of various sizes, including 7 druggable genome and 2 genome-wide screens, analysing 17 different mammalian virus infections and four related cell physiological processes. Analysing cell-based screens at this depth reveals widespread RNAi-induced changes in the population context of individual cells leading to indirect RNAi effects, as well as perturbations of cell-to-cell variability regulators. We find that accounting for indirect effects improves the consistency between siRNAs targeted against the same gene, and between replicate RNAi screens performed in different cell lines, in different labs, and with different siRNA libraries. In an era where large-scale RNAi screens are increasingly performed to reach a systems-level understanding of cellular processes, we show that this is often improved by analyses that account for and incorporate the single-cell microenvironment. PMID:22531119
Montaña, Sabrina; Schramm, Sareda T J; Traglia, German Matías; Chiem, Kevin; Parmeciano Di Noto, Gisela; Almuzara, Marisa; Barberis, Claudia; Vay, Carlos; Quiroga, Cecilia; Tolmasky, Marcelo E; Iriarte, Andrés; Ramírez, María Soledad
2016-01-01
Acinetobacter johnsonii rarely causes human infections. While most A. johnsonii isolates are susceptible to virtually all antibiotics, strains harboring a variety of β-lactamases have recently been described. An A. johnsonii Aj2199 clinical strain recovered from a hospital in Buenos Aires produces PER-2 and OXA-58. We decided to delve into its genome by obtaining the whole genome sequence of the Aj2199 strain. Genome comparison studies on Aj2199 revealed 240 unique genes and a close relation to strain WJ10621, isolated from the urine of a patient in China. Genomic analysis showed evidence of horizontal genetic transfer (HGT) events. Forty-five insertion sequences and two intact prophages were found in addition to several resistance determinants such as blaPER-2, blaOXA-58, blaTEM-1, strA, strB, ereA, sul1, aacC2 and a new variant of blaOXA-211, called blaOXA-498. In particular, blaPER-2 and blaTEM-1 are present within the typical contexts previously described in the Enterobacteriaceae family. These results suggest that A. johnsonii actively acquires exogenous DNA from other bacterial species and concomitantly becomes a reservoir of resistance genes.
Consumer Health Informatics Aspects of Direct-to-Consumer Personal Genomic Testing.
Gray, Kathleen; Stephen, Remya; Terrill, Bronwyn; Wilson, Brenda; Middleton, Anna; Tytherleigh, Rigan; Turbitt, Erin; Gaff, Clara; Savard, Jacqueline; Hickerton, Chriselle; Newson, Ainsley; Metcalfe, Sylvia
2017-01-01
This paper uses consumer health informatics as a framework to explore whether and how direct-to-consumer personal genomic testing can be regarded as a form of information which assists consumers to manage their health. It presents findings from qualitative content analysis of web sites that offer testing services, and of transcripts from focus groups conducted as part of a study of the Australian public's expectations of personal genomics. Content analysis showed that service offerings have some features of consumer health information but lack consistency. Focus group participants were mostly unfamiliar with the specifics of test reports and related information services. Some of their ideas about aids to knowledge were in line with the benefits described on provider web sites, but some expectations were inflated. People were ambivalent about whether these services would address consumers' health needs, interests and contexts and whether they would support consumers' health self-management decisions and outcomes. There is scope for consumer health informatics approaches to refine the usage and the utility of direct-to-consumer personal genomic testing. Further research may focus on how uptake is affected by consumers' health literacy or by services' engagement with consumers about what they really want.
Consortium biology in immunology: the perspective from the Immunological Genome Project.
Benoist, Christophe; Lanier, Lewis; Merad, Miriam; Mathis, Diane
2012-10-01
Although the field has a long collaborative tradition, immunology has made less use than genetics of 'consortium biology', wherein groups of investigators together tackle large integrated questions or problems. However, immunology is naturally suited to large-scale integrative and systems-level approaches, owing to the multicellular and adaptive nature of the cells it encompasses. Here, we discuss the value and drawbacks of this organization of research, in the context of the long-running 'big science' debate, and consider the opportunities that may exist for the immunology community. We position this analysis in light of our own experience, both positive and negative, as participants of the Immunological Genome Project.
Kang, Zhen; Huang, Hao; Zhang, Yunfeng; Du, Guocheng; Chen, Jian
2017-01-01
Pichia pastoris (reclassified as Komagataella phaffii), a methylotrophic yeast, has been widely used for heterologous protein production because of its unique advantages, such as readily achievable high-density fermentation, tractable genetic modifications and typical eukaryotic post-translational modifications. More recently, P. pastoris has also gained much attention as a metabolic pathway engineering platform. In this mini-review, we address recent advances in molecular toolboxes, including synthetic promoters, signal peptides, and genome engineering tools, that have been established for P. pastoris. Furthermore, the applications of P. pastoris in synthetic biology are discussed and their prospects considered, especially in the context of genome-scale metabolic pathway analysis.
Genetic Variation in the Acorn Barnacle from Allozymes to Population Genomics
Flight, Patrick A.; Rand, David M.
2012-01-01
Understanding the patterns of genetic variation within and among populations is a central problem in population and evolutionary genetics. We examine this question in the acorn barnacle, Semibalanus balanoides, in which the allozyme loci Mpi and Gpi have been implicated in balancing selection due to varying selective pressures at different spatial scales. We review the patterns of genetic variation at the Mpi locus, compare this to levels of population differentiation at mtDNA and microsatellites, and place these data in the context of genome-wide variation from high-throughput sequencing of population samples spanning the North Atlantic. Despite considerable geographic variation in the patterns of selection at the Mpi allozyme, this locus shows rather low levels of population differentiation at ecological and trans-oceanic scales (FST ∼ 5%). Pooled population sequencing was performed on samples from Rhode Island (RI), Maine (ME), and Southwold, England (UK). Analysis of more than 650 million reads identified approximately 335,000 high-quality SNPs in 19 million base pairs of the S. balanoides genome. Much variation is shared across the Atlantic, but there are significant examples of strong population differentiation among samples from RI, ME, and UK. An FST outlier screen of more than 22,000 contigs provided a genome-wide context for interpretation of earlier studies on allozymes, mtDNA, and microsatellites. FST values for allozymes, mtDNA and microsatellites are close to the genome-wide average for random SNPs, with the exception of the trans-Atlantic FST for mtDNA. The majority of FST outliers were unique between individual pairs of populations, but some genes show shared patterns of excess differentiation. These data indicate that gene flow is high, that selection is strong on a subset of genes, and that a variety of genes are experiencing diversifying selection at large spatial scales. This survey of polymorphism in S. balanoides provides a number of genomic tools that promise to make this a powerful model for ecological genomics of the rocky intertidal. PMID:22767487
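For readers unfamiliar with the FST statistic used throughout this abstract, the following Python sketch computes it for a single biallelic SNP from per-population allele frequencies, using the textbook heterozygosity form FST = (HT - HS) / HT with equally weighted populations. This is an assumption made for illustration; the abstract does not say which estimator the authors applied to their pooled sequencing data, and the function name is hypothetical.

```python
def fst_biallelic(freqs):
    """F_ST for one biallelic SNP from per-population allele frequencies.

    Uses F_ST = (H_T - H_S) / H_T with equal population weights
    (an illustrative choice, not necessarily the study's estimator).
    """
    if not freqs:
        raise ValueError("need at least one population")
    p_bar = sum(freqs) / len(freqs)
    # Expected heterozygosity of the pooled population.
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # monomorphic site: no differentiation to measure
    # Mean expected heterozygosity within populations.
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies in three population samples.
print(fst_biallelic([0.4, 0.6, 0.5]))
```

Identical frequencies in all populations give 0, and fixed alternative alleles give 1, so an FST near 5% corresponds to populations whose allele frequencies differ only modestly.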
A metabolite-centric view on flux distributions in genome-scale metabolic models
2013-01-01
Background Genome-scale metabolic models are important tools in systems biology. They permit the in-silico prediction of cellular phenotypes via mathematical optimisation procedures, most importantly flux balance analysis. Current studies on metabolic models mostly consider reaction fluxes in isolation. Based on a recently proposed metabolite-centric approach, we here describe a set of methods that enable the analysis and interpretation of flux distributions in an integrated metabolite-centric view. We demonstrate how this framework can be used for the refinement of genome-scale metabolic models. Results We applied the metabolite-centric view developed here to the most recent metabolic reconstruction of Escherichia coli. By compiling the balance sheets of a small number of currency metabolites, we were able to fully characterise the energy metabolism as predicted by the model and to identify a possibility for model refinement in NADPH metabolism. Selected branch points were examined in detail in order to demonstrate how a metabolite-centric view allows identifying functional roles of metabolites. Fructose 6-phosphate aldolase and the sedoheptulose bisphosphate bypass were identified as enzymatic reactions that can carry high fluxes in the model but are unlikely to exhibit significant activity in vivo. Performing a metabolite essentiality analysis, unconstrained import and export of iron ions could be identified as potentially problematic for the quality of model predictions. Conclusions The system-wide analysis of split ratios and branch points allows a much deeper insight into the metabolic network than reaction-centric analyses. Extending an earlier metabolite-centric approach, the methods introduced here establish an integrated metabolite-centric framework for the interpretation of flux distributions in genome-scale metabolic networks that can complement the classical reaction-centric framework. 
Analysing fluxes and their metabolic context simultaneously opens the door to systems biological interpretations that are not apparent from isolated reaction fluxes. Particularly powerful demonstrations of this are the analyses of the complete metabolic contexts of energy metabolism and the folate-dependent one-carbon pool presented in this work. Finally, a metabolite-centric view on flux distributions can guide the refinement of metabolic reconstructions for specific growth scenarios. PMID:23587327
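The balance sheets and split ratios mentioned in this abstract can be sketched in a few lines of Python. The sketch assumes a flux distribution has already been obtained (for example from flux balance analysis); the reaction names, stoichiometric coefficients, and flux values below are invented toy numbers, and the functions are hypothetical rather than taken from the authors' software. For one metabolite, each reaction's contribution is its stoichiometric coefficient times its flux: positive contributions produce the metabolite and negative ones consume it.

```python
def metabolite_balance(stoichiometry, fluxes, metabolite):
    """Split the flux through one metabolite into producing and consuming reactions."""
    producing, consuming = {}, {}
    for reaction, coefficients in stoichiometry.items():
        rate = coefficients.get(metabolite, 0.0) * fluxes.get(reaction, 0.0)
        if rate > 0:
            producing[reaction] = rate
        elif rate < 0:
            consuming[reaction] = -rate
    return producing, consuming


def split_ratios(rates):
    """Fraction of the total turnover carried by each reaction."""
    total = sum(rates.values())
    return {r: v / total for r, v in rates.items()} if total else {}


# Toy network (assumed values): coefficient > 0 means the reaction produces ATP.
stoichiometry = {
    "glycolysis": {"ATP": 2},
    "atp_synthase": {"ATP": 1},
    "maintenance": {"ATP": -1},
    "biomass": {"ATP": -1},
}
fluxes = {"glycolysis": 1.0, "atp_synthase": 6.0, "maintenance": 3.0, "biomass": 5.0}

producing, consuming = metabolite_balance(stoichiometry, fluxes, "ATP")
print(producing, consuming)
print(split_ratios(consuming))
```

At steady state the producing and consuming totals are equal, which makes the balance sheet a quick consistency check as well as a summary of where a currency metabolite comes from and goes to.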
Molecular definition of the identity and activation of natural killer cells.
Bezman, Natalie A; Kim, Charles C; Sun, Joseph C; Min-Oo, Gundula; Hendricks, Deborah W; Kamimura, Yosuke; Best, J Adam; Goldrath, Ananda W; Lanier, Lewis L
2012-10-01
Using whole-genome microarray data sets of the Immunological Genome Project, we demonstrate a closer transcriptional relationship between NK cells and T cells than between any other leukocytes, distinguished by their shared expression of genes encoding molecules with similar signaling functions. Whereas resting NK cells are known to share expression of a few genes with cytotoxic CD8(+) T cells, our transcriptome-wide analysis demonstrates that the commonalities extend to hundreds of genes, many encoding molecules with unknown functions. Resting NK cells demonstrate a 'preprimed' state compared with naive T cells, which allows NK cells to respond more rapidly to viral infection. Collectively, our data provide a global context for known and previously unknown molecular aspects of NK cell identity and function by delineating the genome-wide repertoire of gene expression of NK cells in various states.
GreenPhylDB v2.0: comparative and functional genomics in plants.
Rouard, Mathieu; Guignon, Valentin; Aluome, Christelle; Laporte, Marie-Angélique; Droc, Gaëtan; Walde, Christian; Zmasek, Christian M; Périn, Christophe; Conte, Matthieu G
2011-01-01
GreenPhylDB is a database designed for comparative and functional genomics based on complete genomes. Version 2 now contains sixteen full genomes of members of the plantae kingdom, ranging from algae to angiosperms, automatically clustered into gene families. Gene families are manually annotated and then analyzed phylogenetically in order to elucidate orthologous and paralogous relationships. The database offers various lists of gene families including plant, phylum and species specific gene families. For each gene cluster or gene family, easy access to gene composition, protein domains, publications, external links and orthologous gene predictions is provided. Web interfaces have been further developed to improve the navigation through information related to gene families. New analysis tools are also available, such as a gene family ontology browser that facilitates exploration. GreenPhylDB is a component of the South Green Bioinformatics Platform (http://southgreen.cirad.fr/) and is accessible at http://greenphyl.cirad.fr. It enables comparative genomics in a broad taxonomy context to enhance the understanding of evolutionary processes and thus tends to speed up gene discovery.
Gene context conservation of a higher order than operons.
Lathe, W C; Snel, B; Bork, P
2000-10-01
Operons, co-transcribed and co-regulated contiguous sets of genes, are poorly conserved over short periods of evolutionary time. The gene order, gene content and regulatory mechanisms of operons can be very different, even in closely related species. Here, we present several lines of evidence which suggest that, although an operon and its individual genes and regulatory structures are rearranged when comparing the genomes of different species, this rearrangement is a conservative process. Genomic rearrangements invariably maintain individual genes in very specific functional and regulatory contexts. We call this conserved context an uber-operon.
Form and function of topologically associating genomic domains in budding yeast.
Eser, Umut; Chandler-Brown, Devon; Ay, Ferhat; Straight, Aaron F; Duan, Zhijun; Noble, William Stafford; Skotheim, Jan M
2017-04-11
The genome of metazoan cells is organized into topologically associating domains (TADs) that have similar histone modifications, transcription level, and DNA replication timing. Although similar structures appear to be conserved in fission yeast, computational modeling and analysis of high-throughput chromosome conformation capture (Hi-C) data have been used to argue that the small, highly constrained budding yeast chromosomes could not have these structures. In contrast, herein we analyze Hi-C data for budding yeast and identify 200-kb scale TADs, whose boundaries are enriched for transcriptional activity. Furthermore, these boundaries separate regions of similarly timed replication origins connecting the long-known effect of genomic context on replication timing to genome architecture. To investigate the molecular basis of TAD formation, we performed Hi-C experiments on cells depleted for the Forkhead transcription factors, Fkh1 and Fkh2, previously associated with replication timing. Forkhead factors do not regulate TAD formation, but do promote longer-range genomic interactions and control interactions between origins near the centromere. Thus, our work defines spatial organization within the budding yeast nucleus, demonstrates the conserved role of genome architecture in regulating DNA replication, and identifies a molecular mechanism specifically regulating interactions between pericentric origins.
Initial sequence and comparative analysis of the cat genome
Pontius, Joan U.; Mullikin, James C.; Smith, Douglas R.; Lindblad-Toh, Kerstin; Gnerre, Sante; Clamp, Michele; Chang, Jean; Stephens, Robert; Neelam, Beena; Volfovsky, Natalia; Schäffer, Alejandro A.; Agarwala, Richa; Narfström, Kristina; Murphy, William J.; Giger, Urs; Roca, Alfred L.; Antunes, Agostinho; Menotti-Raymond, Marilyn; Yuhki, Naoya; Pecon-Slattery, Jill; Johnson, Warren E.; Bourque, Guillaume; Tesler, Glenn; O’Brien, Stephen J.
2007-01-01
The genome sequence (1.9-fold coverage) of an inbred Abyssinian domestic cat was assembled, mapped, and annotated with a comparative approach that involved cross-reference to annotated genome assemblies of six mammals (human, chimpanzee, mouse, rat, dog, and cow). The results resolved chromosomal positions for 663,480 contigs, 20,285 putative feline gene orthologs, and 133,499 conserved sequence blocks (CSBs). Additional annotated features include repetitive elements, endogenous retroviral sequences, nuclear mitochondrial (numt) sequences, micro-RNAs, and evolutionary breakpoints that suggest historic balancing of translocation and inversion incidences in distinct mammalian lineages. Large numbers of single nucleotide polymorphisms (SNPs), deletion insertion polymorphisms (DIPs), and short tandem repeats (STRs), suitable for linkage or association studies were characterized in the context of long stretches of chromosome homozygosity. In spite of the light coverage capturing ∼65% of euchromatin sequence from the cat genome, these comparative insights shed new light on the tempo and mode of gene/genome evolution in mammals, promise several research applications for the cat, and also illustrate that a comparative approach using more deeply covered mammals provides an informative, preliminary annotation of a light (1.9-fold) coverage mammal genome sequence. PMID:17975172
A tag-based approach for high-throughput analysis of CCWGG methylation.
Denisova, Oksana V; Chernov, Andrei V; Koledachkina, Tatyana Y; Matvienko, Nicholas I
2007-10-15
Non-CpG methylation occurring in the context of CNG sequences is found in plants at a large number of genomic loci. However, there is still little information available about non-CpG methylation in mammals. Efficient methods that would allow detection of scarcely localized methylated sites in small quantities of DNA are required to elucidate the biological role of non-CpG methylation in both plants and animals. In this study, we tested a new whole genome approach to identify sites of CCWGG methylation (W is A or T), a particular case of CNG methylation, in genomic DNA. This technique is based on digestion of DNAs with methylation-sensitive restriction endonucleases EcoRII-C and AjnI. Short DNAs flanking methylated CCWGG sites (tags) are selectively purified and assembled in tandem arrays of up to nine tags. This allows high-throughput sequencing of tags, identification of flanking regions, and their exact positions in the genome. In this study, we tested specificity and efficiency of the approach.
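The CCWGG motif defined in this abstract (W is A or T) is easy to locate computationally. The Python sketch below is only an illustration of the motif definition and is unrelated to the authors' wet-lab tag protocol; the function name and the example sequence are invented. Because the reverse complement of CCAGG is CCTGG, the motif is its own reverse complement, so scanning one strand finds the sites on both.

```python
import re

# W in IUPAC nucleotide notation stands for A or T.
CCWGG = re.compile(r"CC[AT]GG")


def find_ccwgg_sites(sequence):
    """Return 0-based start positions of CCWGG sites in a DNA sequence."""
    return [match.start() for match in CCWGG.finditer(sequence.upper())]


# Hypothetical sequence: CCAGG at 2, CCTGG at 9; the final CCGGG is not a CCWGG site.
print(find_ccwgg_sites("AACCAGGTTCCTGGACCGGG"))
```

Two CCWGG sites cannot overlap, since a site ends in G and starts with C, so a plain non-overlapping regular-expression scan does not miss any occurrence.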
The UCSC Genome Browser: What Every Molecular Biologist Should Know
Mangan, Mary E.; Williams, Jennifer M.; Kuhn, Robert M.; Lathe, Warren C.
2014-01-01
Electronic data resources can enable molecular biologists to quickly get information from around the world that a decade ago would have been buried in papers scattered throughout the library. The ability to access, query, and display these data makes benchwork much more efficient and drives new discoveries. Increasingly, mastery of software resources and corresponding data repositories is required to fully explore the volume of data generated in biomedical and agricultural research, because only small amounts of data are actually found in traditional publications. The UCSC Genome Browser provides a wealth of data and tools that advance understanding of genomic context for many species, enable detailed analysis of data, and provide the ability to interrogate regions of interest across disparate data sets from a wide variety of sources. Researchers can also supplement the standard display with their own data to query and share this with others. Effective use of these resources has become crucial to biological research today, and this unit describes some practical applications of the UCSC Genome Browser. PMID:24984850
A search for H/ACA snoRNAs in yeast using MFE secondary structure prediction.
Edvardsson, Sverker; Gardner, Paul P; Poole, Anthony M; Hendy, Michael D; Penny, David; Moulton, Vincent
2003-05-01
Noncoding RNA genes produce functional RNA molecules rather than coding for proteins. One such family is the H/ACA snoRNAs. Unlike the related C/D snoRNAs these have resisted automated detection to date. We develop an algorithm to screen the yeast genome for novel H/ACA snoRNAs. To achieve this, we introduce some new methods for facilitating the search for noncoding RNAs in genomic sequences which are based on properties of predicted minimum free-energy (MFE) secondary structures. The algorithm has been implemented and can be generalized to enable screening of other eukaryote genomes. We find that use of primary sequence alone is insufficient for identifying novel H/ACA snoRNAs. Only the use of secondary structure filters reduces the number of candidates to a manageable size. From genomic context, we identify three strong H/ACA snoRNA candidates. These together with a further 47 candidates obtained by our analysis are being experimentally screened.
Poland, Jesse
2015-04-01
The revolution of inexpensive sequencing has ushered in an unprecedented age of genomics. The promise of using this technology to accelerate plant breeding is being realized with a vision of genomics-assisted breeding that will lead to rapid genetic gain for expensive and difficult traits. The reality now is that robust phenotypic data are an increasingly limiting resource to complement the current wealth of genomic information. While genomics has been hailed as the discipline to fundamentally change the scope of plant breeding, a more symbiotic relationship is likely to emerge. In developing and evaluating the large populations needed for functional genomics, none excel more than plant breeders. While genetic studies have long relied on dedicated, well-structured populations, the resources dedicated to these populations, in the context of readily available, inexpensive genotyping, are making this philosophy less tractable relative to directly focusing functional genomics on material in breeding programs. By shifting effort for basic genomic studies from dedicated structured populations to capturing the entire scope of genetic determinants in breeding lines, we can move towards not only furthering our understanding of functional genomics in plants, but also rapidly improving crops for increased food security, availability and nutrition. Copyright © 2015 Elsevier Ltd. All rights reserved.
Verma, Subhash C.; Lu, Jie; Cai, Qiliang; Kosiyatrakul, Settapong; McDowell, Maria E.; Schildkraut, Carl L.; Robertson, Erle S.
2011-01-01
Kaposi's sarcoma associated herpesvirus (KSHV), an etiologic agent of Kaposi's sarcoma, Body Cavity Based Lymphoma and Multicentric Castleman's Disease, establishes lifelong latency in infected cells. The KSHV genome tethers to the host chromosome with the help of a latency associated nuclear antigen (LANA). Additionally, LANA supports replication of the latent origins within the terminal repeats by recruiting cellular factors. Our previous studies identified and characterized another latent origin, which supported the replication of plasmids ex-vivo without LANA expression in trans. Therefore identification of an additional origin site prompted us to analyze the entire KSHV genome for replication initiation sites using single molecule analysis of replicated DNA (SMARD). Our results showed that replication of DNA can initiate throughout the KSHV genome and the usage of these regions is not conserved in two different KSHV strains investigated. SMARD also showed that the utilization of multiple replication initiation sites occurs across large regions of the genome rather than a specified sequence. The replication origin of the terminal repeats showed only a slight preference for their usage indicating that LANA dependent origin at the terminal repeats (TR) plays only a limited role in genome duplication. Furthermore, we performed chromatin immunoprecipitation for ORC2 and MCM3, which are part of the pre-replication initiation complex to determine the genomic sites where these proteins accumulate, to provide further characterization of potential replication initiation sites on the KSHV genome. The ChIP data confirmed accumulation of these pre-RC proteins at multiple genomic sites in a cell cycle dependent manner. Our data also show that both the frequency and the sites of replication initiation vary within the two KSHV genomes studied here, suggesting that initiation of replication is likely to be affected by the genomic context rather than the DNA sequences. 
PMID:22072974
Chi, Bryan; DeLeeuw, Ronald J; Coe, Bradley P; MacAulay, Calum; Lam, Wan L
2004-02-09
Array comparative genomic hybridization (CGH) is a technique which detects copy number differences in DNA segments. Complete sequencing of the human genome and the development of an array representing a tiling set of tens of thousands of DNA segments spanning the entire human genome have made high resolution copy number analysis throughout the genome possible. Since array CGH provides a signal ratio for each DNA segment, visualization requires the reassembly of individual data points into chromosome profiles. We have developed a visualization tool for displaying whole genome array CGH data in the context of chromosomal location. SeeGH is an application that translates spot signal ratio data from array CGH experiments to displays of high resolution chromosome profiles. Data is imported from a simple tab delimited text file obtained from standard microarray image analysis software. SeeGH processes the signal ratio data and graphically displays it in a conventional CGH karyotype diagram with the added features of magnification and DNA segment annotation. In this process, SeeGH imports the data into a database, calculates the average ratio and standard deviation for each replicate spot, and links them to chromosome regions for graphical display. Once the data is displayed, users have the option of hiding or flagging DNA segments based on user defined criteria, and of retrieving annotation information such as clone name, NCBI sequence accession number, ratio, base pair position on the chromosome, and standard deviation. SeeGH represents a novel software tool used to view and analyze array CGH data. The software gives users the ability to view the data in an overall genomic view as well as magnify specific chromosomal regions, facilitating the precise localization of genetic alterations. SeeGH is easily installed and runs on Microsoft Windows 2000 or later environments.
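The replicate-averaging step described in this abstract can be sketched in Python as follows. This is not SeeGH's code: the column names `clone` and `ratio`, the use of the sample standard deviation, and the example values are assumptions made for illustration, since the abstract does not specify the file layout or the exact statistic.

```python
import csv
import io
import statistics
from collections import defaultdict


def summarize_replicates(tsv_text):
    """Return {clone: (mean ratio, standard deviation)} from tab-delimited text.

    Assumes a header row with hypothetical columns 'clone' and 'ratio'.
    """
    ratios = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        ratios[row["clone"]].append(float(row["ratio"]))
    summary = {}
    for clone, values in ratios.items():
        # The sample standard deviation needs at least two replicate spots.
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[clone] = (statistics.mean(values), sd)
    return summary


# Invented example: clone_A is spotted twice, clone_B once.
example = "clone\tratio\nclone_A\t1.0\nclone_A\t3.0\nclone_B\t2.0\n"
print(summarize_replicates(example))
```

A large standard deviation relative to the mean is the kind of user-defined criterion that could be used to flag or hide unreliable DNA segments before the profile is plotted.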
Pacheco-Arjona, Jose Ramon; Ramirez-Prado, Jorge Humberto
2014-01-01
The cell wall is a protective and versatile structure distributed in all fungi. The component responsible for its rigidity is chitin, a product of chitin synthase (Chsp) enzymes. There are seven classes of chitin synthase genes (CHS) and the number and type encoded in fungal genomes vary considerably from one species to another. Previous Chsp sequence analyses focused on their study as individual units, regardless of genomic context. The identification of blocks of conserved genes between genomes can provide important clues about the interactions and localization of chitin synthases. In the present study, we carried out an in silico search of all putative Chsp encoded in 54 full fungal genomes, encompassing 21 orders from five phyla. Phylogenetic studies of these Chsp were able to confidently classify 347 out of the 369 Chsp identified (94%). Patterns in the distribution of Chsp related to taxonomy were identified, the most prominent being related to the type of fungal growth. More importantly, a synteny analysis for genomic blocks centered on class IV Chsp (the most abundant and widely distributed Chsp class) identified a putative cell wall metabolism gene cluster in members of the genus Aspergillus, the first such association reported for any fungal genome. PMID:25148134
Herrera, Carlos M; Bazaga, Pilar
2010-08-01
In plants, epigenetic variations based on DNA methylation are often heritable and could influence the course of evolution. Before this hypothesis can be assessed, fundamental questions about epigenetic variation remain to be addressed in a real-world context, including its magnitude, structuring within and among natural populations, and autonomy in relation to the genetic context. Extent and patterns of cytosine methylation, and the relationship to adaptive genetic divergence between populations, were investigated for wild populations of the southern Spanish violet Viola cazorlensis (Violaceae) using the methylation-sensitive amplified polymorphism (MSAP) technique, a modification of the amplified fragment length polymorphism method (AFLP) based on the differential sensitivity of isoschizomeric restriction enzymes to site-specific cytosine methylation. The genome of V. cazorlensis plants exhibited extensive levels of methylation, and methylation-based epigenetic variation was structured into distinct between- and within-population components. Epigenetic differentiation of populations was correlated with adaptive genetic divergence revealed by a Bayesian population-genomic analysis of AFLP data. Significant associations existed at the individual genome level between adaptive AFLP loci and the methylation state of methylation-susceptible MSAP loci. Population-specific, divergent patterns of correlated selection on epigenetic and genetic individual variation could account for the coordinated epigenetic-genetic adaptive population differentiation revealed by this study.
Makarova, Kira S; Wolf, Yuri I; Koonin, Eugene V
2009-06-03
The prokaryotic toxin-antitoxin systems (TAS, also referred to as TA loci) are widespread, mobile two-gene modules that can be viewed as selfish genetic elements because they evolved mechanisms to become addictive for replicons and cells in which they reside, but also possess "normal" cellular functions in various forms of stress response and management of prokaryotic population. Several distinct TAS of type 1, where the toxin is a protein and the antitoxin is an antisense RNA, and numerous, unrelated TAS of type 2, in which both the toxin and the antitoxin are proteins, have been experimentally characterized, and it is suspected that many more remain to be identified. We report a comprehensive comparative-genomic analysis of Type 2 toxin-antitoxin systems in prokaryotes. Using sensitive methods for distant sequence similarity search, genome context analysis and a new approach for the identification of mobile two-component systems, we identified numerous, previously unnoticed protein families that are homologous to toxins and antitoxins of known type 2 TAS. In addition, we predict 12 new families of toxins and 13 families of antitoxins, and also, predict a TAS or TAS-like activity for several gene modules that were not previously suspected to function in that capacity. In particular, we present indications that the two-gene module that encodes a minimal nucleotidyl transferase and the accompanying HEPN protein, and is extremely abundant in many archaea and bacteria, especially, thermophiles might comprise a novel TAS. We present a survey of previously known and newly predicted TAS in 750 complete genomes of archaea and bacteria, quantitatively demonstrate the exceptional mobility of the TAS, and explore the network of toxin-antitoxin pairings that combines plasticity with selectivity. 
The defining properties of the TAS, namely, the typically small size of the toxin and antitoxin genes, fast evolution, and extensive horizontal mobility, make the task of comprehensive identification of these systems particularly challenging. However, these same properties can be exploited to develop context-based computational approaches which, combined with exhaustive analysis of subtle sequence similarities were employed in this work to substantially expand the current collection of TAS by predicting both previously unnoticed, derived versions of known toxins and antitoxins, and putative novel TAS-like systems. In a broader context, the TAS belong to the resistome domain of the prokaryotic mobilome which includes partially selfish, addictive gene cassettes involved in various aspects of stress response and organized under the same general principles as the TAS. The "selfish altruism", or "responsible selfishness", of TAS-like systems appears to be a defining feature of the resistome and an important characteristic of the entire prokaryotic pan-genome given that in the prokaryotic world the mobilome and the "stable" chromosomes form a dynamic continuum. This paper was reviewed by Kenn Gerdes (nominated by Arcady Mushegian), Daniel Haft, Arcady Mushegian, and Andrei Osterman. For full reviews, go to the Reviewers' Reports section.
Makarova, Kira S; Wolf, Yuri I; Koonin, Eugene V
2009-01-01
Background The prokaryotic toxin-antitoxin systems (TAS, also referred to as TA loci) are widespread, mobile two-gene modules that can be viewed as selfish genetic elements because they evolved mechanisms to become addictive for replicons and cells in which they reside, but also possess "normal" cellular functions in various forms of stress response and management of prokaryotic population. Several distinct TAS of type 1, where the toxin is a protein and the antitoxin is an antisense RNA, and numerous, unrelated TAS of type 2, in which both the toxin and the antitoxin are proteins, have been experimentally characterized, and it is suspected that many more remain to be identified. Results We report a comprehensive comparative-genomic analysis of Type 2 toxin-antitoxin systems in prokaryotes. Using sensitive methods for distant sequence similarity search, genome context analysis and a new approach for the identification of mobile two-component systems, we identified numerous, previously unnoticed protein families that are homologous to toxins and antitoxins of known type 2 TAS. In addition, we predict 12 new families of toxins and 13 families of antitoxins, and also, predict a TAS or TAS-like activity for several gene modules that were not previously suspected to function in that capacity. In particular, we present indications that the two-gene module that encodes a minimal nucleotidyl transferase and the accompanying HEPN protein, and is extremely abundant in many archaea and bacteria, especially, thermophiles might comprise a novel TAS. We present a survey of previously known and newly predicted TAS in 750 complete genomes of archaea and bacteria, quantitatively demonstrate the exceptional mobility of the TAS, and explore the network of toxin-antitoxin pairings that combines plasticity with selectivity. 
Conclusion The defining properties of the TAS, namely, the typically small size of the toxin and antitoxin genes, fast evolution, and extensive horizontal mobility, make the task of comprehensive identification of these systems particularly challenging. However, these same properties can be exploited to develop context-based computational approaches which, combined with exhaustive analysis of subtle sequence similarities were employed in this work to substantially expand the current collection of TAS by predicting both previously unnoticed, derived versions of known toxins and antitoxins, and putative novel TAS-like systems. In a broader context, the TAS belong to the resistome domain of the prokaryotic mobilome which includes partially selfish, addictive gene cassettes involved in various aspects of stress response and organized under the same general principles as the TAS. The "selfish altruism", or "responsible selfishness", of TAS-like systems appears to be a defining feature of the resistome and an important characteristic of the entire prokaryotic pan-genome given that in the prokaryotic world the mobilome and the "stable" chromosomes form a dynamic continuum. Reviewers This paper was reviewed by Kenn Gerdes (nominated by Arcady Mushegian), Daniel Haft, Arcady Mushegian, and Andrei Osterman. For full reviews, go to the Reviewers' Reports section. PMID:19493340
Reconstruction of a Bacterial Genome from DNA Cassettes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Christopher Dupont; John Glass; Laura Sheahan
2011-12-31
This basic research program comprised two major areas: (1) acquisition and analysis of marine microbial metagenomic data and development of genomic analysis tools for broad, external community use; (2) development of a minimal bacterial genome. Our Marine Metagenomic Diversity effort generated and analyzed shotgun sequencing data from microbial communities sampled from over 250 sites around the world. About 40% of the 26 Gbp of sequence data has been made publicly available to date with a complete release anticipated in six months. Our results and those mining the deposited data have revealed a vast diversity of genes coding for critical metabolic processes whose phylogenetic and geographic distributions will enable a deeper understanding of carbon and nutrient cycling, microbial ecology, and rapid rate evolutionary processes such as horizontal gene transfer by viruses and plasmids. A global assembly of the generated dataset resulted in a massive set (5 Gbp) of genome fragments that provide context to the majority of the generated data that originated from uncultivated organisms. Our Synthetic Biology team has made significant progress towards the goal of synthesizing a minimal mycoplasma genome that will have all of the machinery for independent life. This project, once completed, will provide fundamentally new knowledge about requirements for microbial life and help to lay a basic research foundation for developing microbiological approaches to bioenergy.
Pang, Chi Nam Ignatius; Tay, Aidan P; Aya, Carlos; Twine, Natalie A; Harkness, Linda; Hart-Smith, Gene; Chia, Samantha Z; Chen, Zhiliang; Deshpande, Nandan P; Kaakoush, Nadeem O; Mitchell, Hazel M; Kassem, Moustapha; Wilkins, Marc R
2014-01-03
Direct links between proteomic and genomic/transcriptomic data are not frequently made, partly because of a lack of appropriate bioinformatics tools. To help address this, we have developed the PG Nexus pipeline. The PG Nexus allows users to covisualize peptides in the context of genomes or genomic contigs, along with RNA-seq reads. This is done in the Integrative Genomics Viewer (IGV). A Results Analyzer reports the precise base position where LC-MS/MS-derived peptides cover genes or gene isoforms, on the chromosomes or contigs where this occurs. In prokaryotes, the PG Nexus pipeline facilitates the validation of genes, where annotation or gene prediction is available, or the discovery of genes using a "virtual protein"-based unbiased approach. We illustrate this with a comprehensive proteogenomics analysis of two strains of Campylobacter concisus. For higher eukaryotes, the PG Nexus facilitates gene validation and supports the identification of mRNA splice junction boundaries and splice variants that are protein-coding. This is illustrated with an analysis of splice junctions covered by human phosphopeptides, and other examples of relevance to the Chromosome-Centric Human Proteome Project. The PG Nexus is open-source and available from https://github.com/IntersectAustralia/ap11_Samifier. It has been integrated into Galaxy and made available in the Galaxy tool shed.
Fleischmann, Andreas; Michael, Todd P.; Rivadavia, Fernando; Sousa, Aretuza; Wang, Wenqin; Temsch, Eva M.; Greilhuber, Johann; Müller, Kai F.; Heubl, Günther
2014-01-01
Background and Aims Some species of Genlisea possess ultrasmall nuclear genomes, the smallest known among angiosperms, and some have been found to have chromosomes of diminutive size, which may explain why chromosome numbers and karyotypes are not known for the majority of species of the genus. However, other members of the genus do not possess ultrasmall genomes, nor do most taxa studied in related genera of the family or order. This study therefore examined the evolution of genome sizes and chromosome numbers in Genlisea in a phylogenetic context. The correlations of genome size with chromosome number and size, with the phylogeny of the group and with growth forms and habitats were also examined. Methods Nuclear genome sizes were measured from cultivated plant material for a comprehensive sampling of taxa, including nearly half of all species of Genlisea and representing all major lineages. Flow cytometric measurements were conducted in parallel in two laboratories in order to compare the consistency of different methods and controls. Chromosome counts were performed for the majority of taxa, comparing different staining techniques for the ultrasmall chromosomes. Key Results Genome sizes of 15 taxa of Genlisea are presented and interpreted in a phylogenetic context. A high degree of congruence was found between genome size distribution and the major phylogenetic lineages. Ultrasmall genomes with 1C values of <100 Mbp were almost exclusively found in a derived lineage of South American species. The ancestral haploid chromosome number was inferred to be n = 8. Chromosome numbers in Genlisea ranged from 2n = 2x = 16 to 2n = 4x = 32. Ascendant dysploid series (2n = 36, 38) are documented for three derived taxa. The different ploidy levels corresponded to the two subgenera, but were not directly correlated to differences in genome size; the three different karyotype ranges mirrored the different sections of the genus. 
The smallest known plant genomes were not found in G. margaretae, as previously reported, but in G. tuberosa (1C ≈ 61 Mbp) and some strains of G. aurea (1C ≈ 64 Mbp). Conclusions Genlisea is an ideal candidate model organism for the understanding of genome reduction as the genus includes species with both relatively large (∼1700 Mbp) and ultrasmall (∼61 Mbp) genomes. This comparative, phylogeny-based analysis of genome sizes and karyotypes in Genlisea provides essential data for selection of suitable species for comparative whole-genome analyses, as well as for further studies on both the molecular and cytogenetic basis of genome reduction in plants. PMID:25274549
Belkorchia, Abdel; Biderre, Corinne; Militon, Cécile; Polonais, Valérie; Wincker, Patrick; Jubin, Claire; Delbac, Frédéric; Peyretaillade, Eric; Peyret, Pierre
2008-03-01
Brachiola algerae has a broad host spectrum from human to mosquitoes. The successful infection of two mosquito cell lines (Mos55: embryonic cells and Sua 4.0: hemocyte-like cells) and a human cell line (HFF) highlights the efficient adaptive capacity of this microsporidian pathogen. The molecular karyotype of this microsporidian species was determined in the context of the B. algerae genome sequencing project, showing that its haploid genome consists of 30 chromosomal-sized DNAs ranging from 160 to 2240 kbp giving an estimated genome size of 23 Mbp. A contig of 12,269 bp including the DNA sequence of the B. algerae ribosomal transcription unit has been built from initial genomic sequences and the secondary structure of the large subunit rRNA constructed. The data obtained indicate that B. algerae should be an excellent parasitic model to understand genome evolution in relation to infectious capacity.
Kuo, Wen-Hua
2011-10-01
This paper compares the development of genomics as a form of state project in Japan and Taiwan. Broadening the concepts of genomic sovereignty and bionationalism, I argue that the establishment and use of genomic databases vary according to techno-political context. While both Japan and Taiwan hold population-based databases to be necessary for scientific advance and competitiveness, they differ in how they have attempted to transform the information produced by databases into regulatory schemes for drug approval. The effectiveness of Taiwan's biobank is severely limited by the IRB reviewing process. By contrast, while updating its regulations for drug approval, Japan is using pharmacogenomics to deal with matters relating to ethnic identity. By analysing genomic initiatives in the political context that nurtures them, this paper seeks to capture how global science and local societies interact and offers insight into the assessment of state-sponsored science in East Asia as it becomes transnational. Copyright © 2011 Elsevier Ltd. All rights reserved.
DNA methylation in amphioxus: from ancestral functions to new roles in vertebrates.
Albalat, Ricard; Martí-Solans, Josep; Cañestro, Cristian
2012-03-01
In vertebrates, DNA methylation is an epigenetic mechanism that modulates gene transcription and plays crucial roles during development, cell fate maintenance, germ cell pluripotency and inheritable genome imprinting. DNA methylation might also play a role as a genome defense mechanism against the mutational activity derived from transposon mobility. In contrast to the heavily methylated genomes in vertebrates, most genomes in invertebrates are poorly or just moderately methylated, and the function of DNA methylation remains unclear. Here, we review the DNA methylation system in the cephalochordate amphioxus, which belongs to the most basally divergent group of our own phylum, the chordates. First, surveys of the amphioxus genome database reveal the presence of the DNA methylation machinery, DNA methyltransferases and methyl-CpG-binding domain proteins. Second, comparative genomics and analyses of conserved synteny between amphioxus and vertebrates provide robust evidence that the DNA methylation machinery of amphioxus represents the ancestral toolkit of chordates, and that its expansion in vertebrates originated from the two rounds of whole-genome duplication that occurred in stem vertebrates. Third, in silico analysis of CpGo/e ratios throughout the amphioxus genome suggests a bimodal distribution of DNA methylation, consistent with a mosaic pattern comprising domains of methylated DNA interspersed with domains of unmethylated DNA, similar to the situation described in ascidians, but radically different from the globally methylated vertebrate genomes. Finally, we discuss potential roles of the DNA methylation system in amphioxus in the context of chordate genome evolution and the origin of vertebrates.
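The CpG observed/expected (CpGo/e) ratio mentioned in this abstract can be illustrated with a short sketch. The abstract does not say which formula the review used, so the code below assumes one commonly used definition: the number of CG dinucleotides times the sequence length, divided by the product of the C and G counts. The function name and example sequences are hypothetical.

```python
def cpg_observed_expected(seq):
    """Return the CpG o/e ratio of a DNA sequence.

    Assumed formula: (count of 'CG' * length) / (count of 'C' * count of 'G').
    Methylated cytosines in CpG context tend to mutate to thymine over
    evolutionary time, so historically methylated regions are expected to
    show a depleted ratio (below 1).
    """
    s = seq.upper()
    n_c = s.count("C")
    n_g = s.count("G")
    if n_c == 0 or n_g == 0:
        # No CpG can be expected without both bases; report 0.0 by convention.
        return 0.0
    return s.count("CG") * len(s) / (n_c * n_g)


if __name__ == "__main__":
    for name, seq in [("cpg_rich", "CGCG"), ("neutral", "CCGG"), ("at_only", "ATAT")]:
        print(name, cpg_observed_expected(seq))
```

Computing this ratio per gene or per genomic window and plotting the histogram is how a bimodal distribution would show up; the window size would be a further assumption not given in the abstract.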
Cameron, Linda D.; Biesecker, Barbara Bowles; Peters, Ellen; Taber, Jennifer M.; Klein, William M. P.
2017-01-01
Advances in theory and research on self-regulation and decision-making processes have yielded important insights into how cognitive, emotional, and social processes shape risk perceptions and risk-related decisions. We examine how self-regulation theory can be applied to inform our understanding of decision-making processes within the context of genomic testing, a clinical arena in which individuals face complex risk information and potentially life-altering decisions. After presenting key principles of self-regulation, we present a genomic testing case example to illustrate how principles related to risk representations, approach and avoidance motivations, emotion regulation, defensive responses, temporal construals, and capacities such as numeric abilities can shape decisions and psychological responses during the genomic testing process. We conclude with implications for using self-regulation theory to advance science within genomic testing and opportunities for how this research can inform further developments in self-regulation theory. PMID:29225669
Cameron, Linda D; Biesecker, Barbara Bowles; Peters, Ellen; Taber, Jennifer M; Klein, William M P
2017-05-01
Advances in theory and research on self-regulation and decision-making processes have yielded important insights into how cognitive, emotional, and social processes shape risk perceptions and risk-related decisions. We examine how self-regulation theory can be applied to inform our understanding of decision-making processes within the context of genomic testing, a clinical arena in which individuals face complex risk information and potentially life-altering decisions. After presenting key principles of self-regulation, we present a genomic testing case example to illustrate how principles related to risk representations, approach and avoidance motivations, emotion regulation, defensive responses, temporal construals, and capacities such as numeric abilities can shape decisions and psychological responses during the genomic testing process. We conclude with implications for using self-regulation theory to advance science within genomic testing and opportunities for how this research can inform further developments in self-regulation theory.
GUIDEseq: a bioconductor package to analyze GUIDE-Seq datasets for CRISPR-Cas nucleases.
Zhu, Lihua Julie; Lawrence, Michael; Gupta, Ankit; Pagès, Hervé; Kucukural, Alper; Garber, Manuel; Wolfe, Scot A
2017-05-15
Genome editing technologies developed around the CRISPR-Cas9 nuclease system have facilitated the investigation of a broad range of biological questions. These nucleases also hold tremendous promise for treating a variety of genetic disorders. In the context of their therapeutic application, it is important to identify the spectrum of genomic sequences that are cleaved by a candidate nuclease when programmed with a particular guide RNA, as well as the cleavage efficiency of these sites. Powerful new experimental approaches, such as GUIDE-seq, facilitate the sensitive, unbiased genome-wide detection of nuclease cleavage sites within the genome. Flexible bioinformatics analysis tools for processing GUIDE-seq data are needed. Here, we describe an open source, open development software suite, GUIDEseq, for GUIDE-seq data analysis and annotation as a Bioconductor package in R. The GUIDEseq package provides a flexible platform with more than 60 adjustable parameters for the analysis of datasets associated with custom nuclease applications. These parameters allow data analysis to be tailored to different nuclease platforms with different length and complexity in their guide and PAM recognition sequences or their DNA cleavage position. They also enable users to customize sequence aggregation criteria, and vary peak calling thresholds that can influence the number of potential off-target sites recovered. GUIDEseq also annotates potential off-target sites that overlap with genes based on genome annotation information, as these may be the most important off-target sites for further characterization. In addition, GUIDEseq enables the comparison and visualization of off-target site overlap between different datasets for a rapid comparison of different nuclease configurations or experimental conditions. 
For each identified off-target, the GUIDEseq package outputs the mapped GUIDE-seq read count as well as a cleavage score from a user-specified off-target cleavage score prediction algorithm, permitting the identification of genomic sequences with unexpected cleavage activity. The GUIDEseq package enables analysis of GUIDE-seq data from various nuclease platforms for any species with a defined genomic sequence. This software package has been used successfully to analyze several GUIDE-seq datasets. The software, source code and documentation are freely available at http://www.bioconductor.org/packages/release/bioc/html/GUIDEseq.html.
Three crocodilian genomes reveal ancestral patterns of evolution among archosaurs
Green, Richard E; Braun, Edward L; Armstrong, Joel; Earl, Dent; Nguyen, Ngan; Hickey, Glenn; Vandewege, Michael W; St John, John A; Capella-Gutiérrez, Salvador; Castoe, Todd A; Kern, Colin; Fujita, Matthew K; Opazo, Juan C; Jurka, Jerzy; Kojima, Kenji K; Caballero, Juan; Hubley, Robert M; Smit, Arian F; Platt, Roy N; Lavoie, Christine A; Ramakodi, Meganathan P; Finger, John W; Suh, Alexander; Isberg, Sally R; Miles, Lee; Chong, Amanda Y; Jaratlerdsiri, Weerachai; Gongora, Jaime; Moran, Christopher; Iriarte, Andrés; McCormack, John; Burgess, Shane C; Edwards, Scott V; Lyons, Eric; Williams, Christina; Breen, Matthew; Howard, Jason T; Gresham, Cathy R; Peterson, Daniel G; Schmitz, Jürgen; Pollock, David D; Haussler, David; Triplett, Eric W; Zhang, Guojie; Irie, Naoki; Jarvis, Erich D; Brochu, Christopher A; Schmidt, Carl J; McCarthy, Fiona M; Faircloth, Brant C; Hoffmann, Federico G; Glenn, Travis C; Gabaldón, Toni; Paten, Benedict; Ray, David A
2015-01-01
To provide context for the diversifications of archosaurs, the group that includes crocodilians, dinosaurs and birds, we generated draft genomes of three crocodilians, Alligator mississippiensis (the American alligator), Crocodylus porosus (the saltwater crocodile), and Gavialis gangeticus (the Indian gharial). We observed an exceptionally slow rate of genome evolution within crocodilians at all levels, including nucleotide substitutions, indels, transposable element content and movement, gene family evolution, and chromosomal synteny. When placed within the context of related taxa including birds and turtles, this suggests that the common ancestor of all of these taxa also exhibited slow genome evolution and that the relatively rapid evolution of bird genomes represents an autapomorphy within that clade. The data also provided the opportunity to analyze heterozygosity in crocodilians, which indicates a likely reduction in population size for all three taxa through the Pleistocene. Finally, these new data combined with newly published bird genomes allowed us to reconstruct the partial genome of the common ancestor of archosaurs providing a tool to investigate the genetic starting material of crocodilians, birds, and dinosaurs. PMID:25504731
Agren, Rasmus; Liu, Liming; Shoaie, Saeed; Vongsangnak, Wanwipa; Nookaew, Intawat; Nielsen, Jens
2013-01-01
We present the RAVEN (Reconstruction, Analysis and Visualization of Metabolic Networks) Toolbox: a software suite that allows for semi-automated reconstruction of genome-scale models. It makes use of published models and/or the KEGG database, coupled with extensive gap-filling and quality control features. The software suite also contains methods for visualizing simulation results and omics data, as well as a range of methods for performing simulations and analyzing the results. The software is a useful tool for system-wide data analysis in a metabolic context and for streamlined reconstruction of metabolic networks based on protein homology. The RAVEN Toolbox workflow was applied in order to reconstruct a genome-scale metabolic model for the important microbial cell factory Penicillium chrysogenum Wisconsin54-1255. The model was validated in a bibliomic study of in total 440 references, and it comprises 1471 unique biochemical reactions and 1006 ORFs. It was then used to study the roles of ATP and NADPH in the biosynthesis of penicillin, and to identify potential metabolic engineering targets for maximization of penicillin production. PMID:23555215
High-Performance Mixed Models Based Genome-Wide Association Analysis with omicABEL software
Fabregat-Traver, Diego; Sharapov, Sodbo Zh.; Hayward, Caroline; Rudan, Igor; Campbell, Harry; Aulchenko, Yurii; Bientinesi, Paolo
2014-01-01
To raise the power of genome-wide association studies (GWAS) and avoid false-positive results in structured populations, one can rely on mixed model based tests. When large samples are used, and when multiple traits are to be studied in the ’omics’ context, this approach becomes computationally challenging. Here we consider the problem of mixed-model based GWAS for arbitrary number of traits, and demonstrate that for the analysis of single-trait and multiple-trait scenarios different computational algorithms are optimal. We implement these optimal algorithms in a high-performance computing framework that uses state-of-the-art linear algebra kernels, incorporates optimizations, and avoids redundant computations, increasing throughput while reducing memory usage and energy consumption. We show that, compared to existing libraries, our algorithms and software achieve considerable speed-ups. The OmicABEL software described in this manuscript is available under the GNU GPL v. 3 license as part of the GenABEL project for statistical genomics at http://www.genabel.org/packages/OmicABEL. PMID:25717363
High-Performance Mixed Models Based Genome-Wide Association Analysis with omicABEL software.
Fabregat-Traver, Diego; Sharapov, Sodbo Zh; Hayward, Caroline; Rudan, Igor; Campbell, Harry; Aulchenko, Yurii; Bientinesi, Paolo
2014-01-01
To raise the power of genome-wide association studies (GWAS) and avoid false-positive results in structured populations, one can rely on mixed model based tests. When large samples are used, and when multiple traits are to be studied in the 'omics' context, this approach becomes computationally challenging. Here we consider the problem of mixed-model based GWAS for arbitrary number of traits, and demonstrate that for the analysis of single-trait and multiple-trait scenarios different computational algorithms are optimal. We implement these optimal algorithms in a high-performance computing framework that uses state-of-the-art linear algebra kernels, incorporates optimizations, and avoids redundant computations, increasing throughput while reducing memory usage and energy consumption. We show that, compared to existing libraries, our algorithms and software achieve considerable speed-ups. The OmicABEL software described in this manuscript is available under the GNU GPL v. 3 license as part of the GenABEL project for statistical genomics at http://www.genabel.org/packages/OmicABEL.
Kehdy, Fernanda S G; Gouveia, Mateus H; Machado, Moara; Magalhães, Wagner C S; Horimoto, Andrea R; Horta, Bernardo L; Moreira, Rennan G; Leal, Thiago P; Scliar, Marilia O; Soares-Souza, Giordano B; Rodrigues-Soares, Fernanda; Araújo, Gilderlanio S; Zamudio, Roxana; Sant Anna, Hanaisa P; Santos, Hadassa C; Duarte, Nubia E; Fiaccone, Rosemeire L; Figueiredo, Camila A; Silva, Thiago M; Costa, Gustavo N O; Beleza, Sandra; Berg, Douglas E; Cabrera, Lilia; Debortoli, Guilherme; Duarte, Denise; Ghirotto, Silvia; Gilman, Robert H; Gonçalves, Vanessa F; Marrero, Andrea R; Muniz, Yara C; Weissensteiner, Hansi; Yeager, Meredith; Rodrigues, Laura C; Barreto, Mauricio L; Lima-Costa, M Fernanda; Pereira, Alexandre C; Rodrigues, Maíra R; Tarazona-Santos, Eduardo
2015-07-14
While South Americans are underrepresented in human genomic diversity studies, Brazil has been a classical model for population genetics studies on admixture. We present the results of the EPIGEN Brazil Initiative, the most comprehensive up-to-date genomic analysis of any Latin-American population. A population-based genome-wide analysis of 6,487 individuals was performed in the context of worldwide genomic diversity to elucidate how ancestry, kinship, and inbreeding interact in three populations with different histories from the Northeast (African ancestry: 50%), Southeast, and South (both with European ancestry >70%) of Brazil. We showed that ancestry-positive assortative mating permeated Brazilian history. We traced European ancestry in the Southeast/South to a wider European/Middle Eastern region with respect to the Northeast, where ancestry seems restricted to Iberia. By developing an approximate Bayesian computation framework, we infer more recent European immigration to the Southeast/South than to the Northeast. Also, the observed low Native-American ancestry (6-8%) was mostly introduced in different regions of Brazil soon after the European Conquest. We broadened our understanding of the African diaspora, the major destination of which was Brazil, by revealing that Brazilians display two within-Africa ancestry components: one associated with non-Bantu/western Africans (more evident in the Northeast and African Americans) and one associated with Bantu/eastern Africans (more present in the Southeast/South). Furthermore, the whole-genome analysis of 30 individuals (42-fold deep coverage) shows that continental admixture rather than local post-Columbian history is the main and complex determinant of the individual amount of deleterious genotypes.
Power, Robert A; Cohen-Woods, Sarah; Ng, Mandy Y; Butler, Amy W; Craddock, Nick; Korszun, Ania; Jones, Lisa; Jones, Ian; Gill, Michael; Rice, John P; Maier, Wolfgang; Zobel, Astrid; Mors, Ole; Placentino, Anna; Rietschel, Marcella; Aitchison, Katherine J; Tozzi, Federica; Muglia, Pierandrea; Breen, Gerome; Farmer, Anne E; McGuffin, Peter; Lewis, Cathryn M; Uher, Rudolf
2013-09-01
Stressful life events are an established trigger for depression and may contribute to the heterogeneity within genome-wide association analyses. With depression cases showing an excess of exposure to stressful events compared to controls, there is difficulty in distinguishing between "true" cases and a "normal" response to a stressful environment. This potential contamination of cases, and that from genetically at-risk controls that have not yet experienced environmental triggers for onset, may reduce the power of studies to detect causal variants. In the RADIANT sample of 3,690 European individuals, we used propensity score matching to pair cases and controls on exposure to stressful life events. In 805 case-control pairs matched on stressful life events, we tested the influence of 457,670 common genetic variants on the propensity to depression under comparable levels of adversity with a sign test. While this analysis produced no significant findings after genome-wide correction for multiple testing, we outline a novel methodology and perspective for providing environmental context in genetic studies. We recommend contextualizing depression by incorporating environmental exposure into genome-wide analyses as a complementary approach to testing gene-environment interactions. Possible explanations for negative findings include a lack of statistical power due to small sample size and conditional effects, resulting from the low rate of adequate matching. Our findings underscore the importance of collecting information on environmental risk factors in studies of depression and other complex phenotypes, so that sufficient sample sizes are available to investigate their effect in genome-wide association analysis. Copyright © 2013 Wiley Periodicals, Inc.
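The sign test on matched case-control pairs mentioned in this abstract can be illustrated with a minimal sketch. The function name, the use of allele dosages (0, 1, 2) as the compared quantity, and the choice to drop tied pairs are assumptions made here for illustration; the abstract does not give the authors' exact procedure.

```python
from math import comb

def sign_test(case_dosages, control_dosages):
    """Two-sided exact sign test on matched pairs (hypothetical helper).

    Each position holds the allele dosage of one variant for the case and
    the control of the same matched pair. Tied pairs are dropped, which is
    an assumption of this sketch.
    """
    plus = sum(1 for c, k in zip(case_dosages, control_dosages) if c > k)
    minus = sum(1 for c, k in zip(case_dosages, control_dosages) if c < k)
    n = plus + minus
    if n == 0:
        return 1.0  # no informative pairs
    k = min(plus, minus)
    # Probability of a split at least this uneven under Binomial(n, 0.5)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

In a genome-wide setting such a test would be repeated for every variant, and the resulting p-values would then need correction for multiple testing, as the abstract notes.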
Kehdy, Fernanda S. G.; Gouveia, Mateus H.; Machado, Moara; Magalhães, Wagner C. S.; Horimoto, Andrea R.; Horta, Bernardo L.; Moreira, Rennan G.; Leal, Thiago P.; Scliar, Marilia O.; Soares-Souza, Giordano B.; Rodrigues-Soares, Fernanda; Araújo, Gilderlanio S.; Zamudio, Roxana; Sant Anna, Hanaisa P.; Santos, Hadassa C.; Duarte, Nubia E.; Fiaccone, Rosemeire L.; Figueiredo, Camila A.; Silva, Thiago M.; Costa, Gustavo N. O.; Beleza, Sandra; Berg, Douglas E.; Cabrera, Lilia; Debortoli, Guilherme; Duarte, Denise; Ghirotto, Silvia; Gilman, Robert H.; Gonçalves, Vanessa F.; Marrero, Andrea R.; Muniz, Yara C.; Weissensteiner, Hansi; Yeager, Meredith; Rodrigues, Laura C.; Barreto, Mauricio L.; Lima-Costa, M. Fernanda; Pereira, Alexandre C.; Rodrigues, Maíra R.; Tarazona-Santos, Eduardo
2015-01-01
While South Americans are underrepresented in human genomic diversity studies, Brazil has been a classical model for population genetics studies on admixture. We present the results of the EPIGEN Brazil Initiative, the most comprehensive up-to-date genomic analysis of any Latin-American population. A population-based genome-wide analysis of 6,487 individuals was performed in the context of worldwide genomic diversity to elucidate how ancestry, kinship, and inbreeding interact in three populations with different histories from the Northeast (African ancestry: 50%), Southeast, and South (both with European ancestry >70%) of Brazil. We showed that ancestry-positive assortative mating permeated Brazilian history. We traced European ancestry in the Southeast/South to a wider European/Middle Eastern region with respect to the Northeast, where ancestry seems restricted to Iberia. By developing an approximate Bayesian computation framework, we infer more recent European immigration to the Southeast/South than to the Northeast. Also, the observed low Native-American ancestry (6–8%) was mostly introduced in different regions of Brazil soon after the European Conquest. We broadened our understanding of the African diaspora, the major destination of which was Brazil, by revealing that Brazilians display two within-Africa ancestry components: one associated with non-Bantu/western Africans (more evident in the Northeast and African Americans) and one associated with Bantu/eastern Africans (more present in the Southeast/South). Furthermore, the whole-genome analysis of 30 individuals (42-fold deep coverage) shows that continental admixture rather than local post-Columbian history is the main and complex determinant of the individual amount of deleterious genotypes. PMID:26124090
Ontology-based meta-analysis of global collections of high-throughput public data.
Kupershmidt, Ilya; Su, Qiaojuan Jane; Grewal, Anoop; Sundaresh, Suman; Halperin, Inbal; Flynn, James; Shekar, Mamatha; Wang, Helen; Park, Jenny; Cui, Wenwu; Wall, Gregory D; Wisotzkey, Robert; Alag, Satnam; Akhtari, Saeid; Ronaghi, Mostafa
2010-09-29
The investigation of the interconnections between the molecular and genetic events that govern biological systems is essential if we are to understand the development of disease and design effective novel treatments. Microarray and next-generation sequencing technologies have the potential to provide this information. However, taking full advantage of these approaches requires that biological connections be made across large quantities of highly heterogeneous genomic datasets. Leveraging the increasingly huge quantities of genomic data in the public domain is fast becoming one of the key challenges in the research community today. We have developed a novel data mining framework that enables researchers to use this growing collection of public high-throughput data to investigate any set of genes or proteins. The connectivity between molecular states across thousands of heterogeneous datasets from microarrays and other genomic platforms is determined through a combination of rank-based enrichment statistics, meta-analyses, and biomedical ontologies. We address data quality concerns through dataset replication and meta-analysis and ensure that the majority of the findings are derived using multiple lines of evidence. As an example of our strategy and the utility of this framework, we apply our data mining approach to explore the biology of brown fat within the context of the thousands of publicly available gene expression datasets. Our work presents a practical strategy for organizing, mining, and correlating global collections of large-scale genomic data to explore normal and disease biology. Using a hypothesis-free approach, we demonstrate how a data-driven analysis across very large collections of genomic data can reveal novel discoveries and evidence to support existing hypotheses.
Comparative genomics identifies distinct lineages of S. Enteritidis from Queensland, Australia.
Graham, Rikki M A; Hiley, Lester; Rathnayake, Irani U; Jennison, Amy V
2018-01-01
Salmonella enterica is a major cause of gastroenteritis and foodborne illness in Australia where notification rates in the state of Queensland are the highest in the country. S. Enteritidis is among the five most common serotypes reported in Queensland and it is a priority for epidemiological surveillance due to concerns regarding its emergence in Australia. Using whole genome sequencing, we have analysed the genomic epidemiology of 217 S. Enteritidis isolates from Queensland, and observed that they fall into three distinct clades, which we have differentiated as Clades A, B and C. Phage types and MLST sequence types differed between the clades and comparative genomic analysis has shown that each has a unique profile of prophage and genomic islands. Several of the phage regions present in the S. Enteritidis reference strain P125109 were absent in Clades A and C, and these clades also differed in the presence of pathogenicity islands, containing complete SPI-6 and SPI-19 regions, while P125109 does not. Antimicrobial resistance markers were found in 39 isolates, all but one of which belonged to Clade B. Phylogenetic analysis of the Queensland isolates in the context of 170 international strains showed that Queensland Clade B isolates group together with the previously identified global clade, while the other two clades are distinct and appear largely restricted to Australia. Locally sourced environmental isolates included in this analysis all belonged to Clades A and C, which is consistent with the theory that these clades are a source of locally acquired infection, while Clade B isolates are mostly travel related.
Bible, Paul W; Kanno, Yuka; Wei, Lai; Brooks, Stephen R; O'Shea, John J; Morasso, Maria I; Loganantharaj, Rasiah; Sun, Hong-Wei
2015-01-01
Comparative co-localization analysis of transcription factors (TFs) and epigenetic marks (EMs) in specific biological contexts is one of the most critical areas of ChIP-Seq data analysis beyond peak calling. Yet there is a significant lack of user-friendly and powerful tools geared towards co-localization analysis based exploratory research. Most tools currently used for co-localization analysis are command line only and require extensive installation procedures and Linux expertise. Online tools partially address the usability issues of command line tools, but slow response times and few customization features make them unsuitable for rapid data-driven interactive exploratory research. We have developed PAPST: Peak Assignment and Profile Search Tool, a user-friendly yet powerful platform with a unique design, which integrates both gene-centric and peak-centric co-localization analysis into a single package. Most of PAPST's functions can be completed in less than five seconds, allowing quick cycles of data-driven hypothesis generation and testing. With PAPST, a researcher with or without computational expertise can perform sophisticated co-localization pattern analysis of multiple TFs and EMs, either against all known genes or a set of genomic regions obtained from public repositories or prior analysis. PAPST is a versatile, efficient, and customizable tool for genome-wide data-driven exploratory research. Creatively used, PAPST can be quickly applied to any genomic data analysis that involves a comparison of two or more sets of genomic coordinate intervals, making it a powerful tool for a wide range of exploratory genomic research. We first present PAPST's general purpose features then apply it to several public ChIP-Seq data sets to demonstrate its rapid execution and potential for cutting-edge research with a case study in enhancer analysis. 
To our knowledge, PAPST is the first software of its kind to provide efficient and sophisticated post peak-calling ChIP-Seq data analysis as an easy-to-use interactive application. PAPST is available at https://github.com/paulbible/papst and is a public domain work.
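The core operation this abstract describes, comparing two sets of genomic coordinate intervals, can be sketched as follows. This is not PAPST's implementation or API; the function name, the (chrom, start, end) tuple layout, and the half-open coordinate convention are assumptions chosen for illustration.

```python
def overlapping_pairs(a, b):
    """Return (i, j) index pairs where interval a[i] overlaps interval b[j].

    Intervals are (chrom, start, end) tuples with half-open coordinates,
    an assumed convention for this sketch.
    """
    # Group the second set by chromosome so only same-chromosome
    # intervals are compared.
    by_chrom = {}
    for j, (chrom, start, end) in enumerate(b):
        by_chrom.setdefault(chrom, []).append((start, end, j))

    pairs = []
    for i, (chrom, start, end) in enumerate(a):
        for b_start, b_end, j in by_chrom.get(chrom, []):
            # Two half-open intervals overlap when each starts before
            # the other ends.
            if start < b_end and b_start < end:
                pairs.append((i, j))
    return pairs
```

The nested scan is quadratic per chromosome, which is adequate for small examples; an interactive tool working on full peak sets would more plausibly rely on sorted or indexed intervals.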
Bible, Paul W.; Kanno, Yuka; Wei, Lai; Brooks, Stephen R.; O’Shea, John J.; Morasso, Maria I.; Loganantharaj, Rasiah; Sun, Hong-Wei
2015-01-01
Comparative co-localization analysis of transcription factors (TFs) and epigenetic marks (EMs) in specific biological contexts is one of the most critical areas of ChIP-Seq data analysis beyond peak calling. Yet there is a significant lack of user-friendly and powerful tools geared towards co-localization analysis based exploratory research. Most tools currently used for co-localization analysis are command line only and require extensive installation procedures and Linux expertise. Online tools partially address the usability issues of command line tools, but slow response times and few customization features make them unsuitable for rapid data-driven interactive exploratory research. We have developed PAPST: Peak Assignment and Profile Search Tool, a user-friendly yet powerful platform with a unique design, which integrates both gene-centric and peak-centric co-localization analysis into a single package. Most of PAPST’s functions can be completed in less than five seconds, allowing quick cycles of data-driven hypothesis generation and testing. With PAPST, a researcher with or without computational expertise can perform sophisticated co-localization pattern analysis of multiple TFs and EMs, either against all known genes or a set of genomic regions obtained from public repositories or prior analysis. PAPST is a versatile, efficient, and customizable tool for genome-wide data-driven exploratory research. Creatively used, PAPST can be quickly applied to any genomic data analysis that involves a comparison of two or more sets of genomic coordinate intervals, making it a powerful tool for a wide range of exploratory genomic research. We first present PAPST’s general purpose features then apply it to several public ChIP-Seq data sets to demonstrate its rapid execution and potential for cutting-edge research with a case study in enhancer analysis. 
To our knowledge, PAPST is the first software of its kind to provide efficient and sophisticated post peak-calling ChIP-Seq data analysis as an easy-to-use interactive application. PAPST is available at https://github.com/paulbible/papst and is a public domain work. PMID:25970601
2008-01-01
Background The phosphoenolpyruvate phosphotransferase system (PTS) plays a major role in sugar transport and in the regulation of essential physiological processes in many bacteria. The PTS couples solute transport to its phosphorylation at the expense of phosphoenolpyruvate (PEP) and it consists of general cytoplasmic phosphoryl transfer proteins and specific enzyme II complexes which catalyze the uptake and phosphorylation of solutes. Previous studies have suggested that the evolution of the constituents of the enzyme II complexes has been driven largely by horizontal gene transfer whereas vertical inheritance has been prevalent in the general phosphoryl transfer proteins in some bacterial groups. The aim of this work is to test this hypothesis by studying the evolution of the phosphoryl transfer proteins of the PTS. Results We have analyzed the evolutionary history of the PTS phosphoryl transfer chain (PTS-ptc) components in 222 complete genomes by combining phylogenetic methods and analysis of genomic context. Phylogenetic analyses alone were not conclusive for the deepest nodes but when complemented with analyses of genomic context and functional information, the main evolutionary trends of this system could be depicted. Conclusion The PTS-ptc evolved in bacteria after the divergence of early lineages such as Aquificales, Thermotogales and Thermus/Deinococcus. The subsequent evolutionary history of the PTS-ptc varied in different bacterial lineages: vertical inheritance and lineage-specific gene losses mainly explain the current situation in Actinobacteria and Firmicutes whereas horizontal gene transfer (HGT) also played a major role in Proteobacteria. Most remarkably, we have identified a HGT event from Firmicutes or Fusobacteria to the last common ancestor of the Enterobacteriaceae, Pasteurellaceae, Shewanellaceae and Vibrionaceae. 
This transfer led to extensive changes in the metabolic and regulatory networks of these bacteria including the development of a novel carbon catabolite repression system. Hence, this example illustrates that HGT can drive major physiological modifications in bacteria. PMID:18485189
Genomic research, publics and experts in Latin America: Nation, race and body
Wade, Peter; López-Beltrán, Carlos; Restrepo, Eduardo; Santos, Ricardo Ventura
2015-01-01
The articles in this issue highlight contributions that studies of Latin America can make to wider debates about the effects of genomic science on public ideas about race and nation. We argue that current ideas about the power of genomics to transfigure and transform existing ways of thinking about human diversity are often overstated. If a range of social contexts are examined, the effects are uneven. Our data show that genomic knowledge can unsettle and reinforce ideas of nation and race; it can be both banal and highly politicized. In this introduction, we outline concepts of genetic knowledge in society; theories of genetics, nation and race; approaches to public understandings of science; and the Latin American contexts of transnational ideas of nation and race. PMID:27479996
Proteogenomic characterization of human colon and rectal cancer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Bing; Wang, Jing; Wang, Xiaojing
2014-09-18
We analyzed proteomes of colon and rectal tumors previously characterized by the Cancer Genome Atlas (TCGA) and performed integrated proteogenomic analyses. Protein sequence variants encoded by somatic genomic variations displayed reduced expression compared to protein variants encoded by germline variations. mRNA transcript abundance did not reliably predict protein expression differences between tumors. Proteomics identified five protein expression subtypes, two of which were associated with the TCGA "MSI/CIMP" transcriptional subtype, but had distinct mutation and methylation patterns and associated with different clinical outcomes. Although CNAs showed strong cis- and trans-effects on mRNA expression, relatively few of these extend to the protein level. Thus, proteomics data enabled prioritization of candidate driver genes. Our analyses identified HNF4A, a novel candidate driver gene in tumors with chromosome 20q amplifications. Integrated proteogenomic analysis provides functional context to interpret genomic abnormalities and affords novel insights into cancer biology.
Marian, Ali J.; van Rooij, Eva; Roberts, Robert
2016-01-01
This is the first of 2 review papers on genetics and genomics appearing as part of the series on “omics.” Genomics pertains to all components of an organism’s genes, whereas genetics involves analysis of a specific gene(s) in the context of heredity. The paper provides introductory comments, describes the basis of human genetic diversity, and addresses the phenotypic consequences of genetic variants. Rare variants with large effect sizes are responsible for single-gene disorders, whereas complex polygenic diseases are typically due to multiple genetic variants, each exerting a modest effect size. To illustrate the clinical implications of genetic variants with large effect sizes, 3 common forms of hereditary cardiomyopathies are discussed as prototypic examples of single-gene disorders, including their genetics, clinical manifestations, pathogenesis, and treatment. The genetic basis of complex traits is discussed in a separate paper. PMID:28007145
SuperPhy: predictive genomics for the bacterial pathogen Escherichia coli.
Whiteside, Matthew D; Laing, Chad R; Manji, Akiff; Kruczkiewicz, Peter; Taboada, Eduardo N; Gannon, Victor P J
2016-04-12
Predictive genomics is the translation of raw genome sequence data into a phenotypic assessment of the organism. For bacterial pathogens, these phenotypes can range from environmental survivability to the severity of human disease. Significant progress has been made in the development of generic tools for genomic analyses that are broadly applicable to all microorganisms; however, a fundamental missing component is the ability to analyze genomic data in the context of organism-specific phenotypic knowledge, which has been accumulated from decades of research and can provide a meaningful interpretation of genome sequence data. In this study, we present SuperPhy, an online predictive genomics platform (http://lfz.corefacility.ca/superphy/) for Escherichia coli. The platform integrates the analytical tools and genome sequence data for all publicly available E. coli genomes and facilitates the upload of new genome sequences from users under public or private settings. SuperPhy provides real-time analyses of thousands of genome sequences with results that are understandable and useful to a wide community, including those in the fields of clinical medicine, epidemiology, ecology, and evolution. SuperPhy includes: 1) identification of virulence and antimicrobial resistance determinants; 2) statistical associations between genotypes, biomarkers, geospatial distribution, host, source, and phylogenetic clade; 3) identification of biomarkers for groups of genomes based on the presence/absence of specific genomic regions and single-nucleotide polymorphisms; and 4) in silico Shiga-toxin subtyping. SuperPhy is a predictive genomics platform that attempts to provide an essential link between the vast amounts of genome information currently being generated and phenotypic knowledge in an organism-specific context.
Lehmann, David S.; Ribaudo, Heather J.; Daar, Eric S.; Gulick, Roy M.; Haubrich, Richard H.; Robbins, Gregory K.; de Bakker, Paul I.W.; Haas, David W.; McLaren, Paul J.
2015-01-01
Background Efavirenz and abacavir are components of recommended first-line regimens for human immunodeficiency virus (HIV)-1 infection. We used genome-wide genotyping and clinical data to explore genetic associations with virologic failure among subjects randomized to efavirenz- or abacavir-containing regimens in AIDS Clinical Trials Group (ACTG) protocols. Methods Virologic response and genome-wide genotype data were available from treatment-naive subjects randomized to efavirenz-containing (n=1,596) or abacavir-containing (n=786) regimens in ACTG protocols 384, A5142, A5095, and A5202. Results Meta-analysis of association results across race/ethnic groups showed no genome-wide significant associations (p<5×10⁻⁸) with virologic response for either efavirenz or abacavir. Our sample size provided 80% power to detect a genotype relative risk of 1.8 for efavirenz, and 2.4 for abacavir. Analyses focused on CYP2B genotypes that define the lowest plasma efavirenz exposure stratum did not reveal associations, nor did analysis limited to gene sets predicted to be relevant to efavirenz and abacavir disposition. Conclusions No single polymorphism is strongly associated with virologic failure with efavirenz- or abacavir-containing regimens. Analyses to better consider context, and that minimize confounding by non-genetic factors, may reveal associations not apparent herein. PMID:25461247
Lehmann, David S; Ribaudo, Heather J; Daar, Eric S; Gulick, Roy M; Haubrich, Richard H; Robbins, Gregory K; de Bakker, Paul I W; Haas, David W; McLaren, Paul J
2015-02-01
Efavirenz and abacavir are components of recommended first-line regimens for HIV-1 infection. We used genome-wide genotyping and clinical data to explore genetic associations with virologic failure among patients randomized to efavirenz-containing or abacavir-containing regimens in AIDS Clinical Trials Group (ACTG) protocols. Virologic response and genome-wide genotype data were available from treatment-naive patients randomized to efavirenz-containing (n = 1596) or abacavir-containing (n = 786) regimens in ACTG protocols 384, A5142, A5095, and A5202. Meta-analysis of association results across race/ethnic groups showed no genome-wide significant associations (P < 5 × 10⁻⁸) with virologic response for either efavirenz or abacavir. Our sample size provided 80% power to detect a genotype relative risk of 1.8 for efavirenz and 2.4 for abacavir. Analyses focused on CYP2B genotypes that define the lowest plasma efavirenz exposure stratum did not show associations nor did analysis limited to gene sets predicted to be relevant to efavirenz and abacavir disposition. No single polymorphism is associated strongly with virologic failure with efavirenz-containing or abacavir-containing regimens. Analyses to better consider context, and that minimize confounding by nongenetic factors, may show associations not apparent here.
Cell Context Dependent p53 Genome-Wide Binding Patterns and Enrichment at Repeats
Botcheva, Krassimira; McCorkle, Sean R.
2014-11-21
The p53 ability to elicit stress specific and cell type specific responses is well recognized, but how that specificity is established remains to be defined. Whether upon activation p53 binds to its genomic targets in a cell type and stress type dependent manner is still an open question. Here we show that the p53 binding to the human genome is selective and cell context-dependent. We mapped the genomic binding sites for the endogenous wild type p53 protein in the human cancer cell line HCT116 and compared them to those we previously determined in the normal cell line IMR90. We report distinct p53 genome-wide binding landscapes in two different cell lines, analyzed under the same treatment and experimental conditions, using the same ChIP-seq approach. This is evidence for cell context dependent p53 genomic binding. The observed differences affect the p53 binding sites distribution with respect to major genomic and epigenomic elements (promoter regions, CpG islands and repeats). We correlated the high-confidence p53 ChIP-seq peaks positions with the annotated human repeats (UCSC Human Genome Browser) and observed both common and cell line specific trends. In HCT116, the p53 binding was specifically enriched at LINE repeats, compared to IMR90 cells. The p53 genome-wide binding patterns in HCT116 and IMR90 likely reflect the different epigenetic landscapes in these two cell lines, resulting from cancer-associated changes (accumulated in HCT116) superimposed on tissue specific differences (HCT116 has epithelial, while IMR90 has mesenchymal origin). In conclusion, our data support the model for p53 binding to the human genome in a highly selective manner, mobilizing distinct sets of genes, contributing to distinct pathways.
MultiMetEval: Comparative and Multi-Objective Analysis of Genome-Scale Metabolic Models
Gevorgyan, Albert; Kierzek, Andrzej M.; Breitling, Rainer; Takano, Eriko
2012-01-01
Comparative metabolic modelling is emerging as a novel field, supported by the development of reliable and standardized approaches for constructing genome-scale metabolic models in high throughput. New software solutions are needed to allow efficient comparative analysis of multiple models in the context of multiple cellular objectives. Here, we present the user-friendly software framework Multi-Metabolic Evaluator (MultiMetEval), built upon SurreyFBA, which allows the user to compose collections of metabolic models that together can be subjected to flux balance analysis. Additionally, MultiMetEval implements functionalities for multi-objective analysis by calculating the Pareto front between two cellular objectives. Using a previously generated dataset of 38 actinobacterial genome-scale metabolic models, we show how these approaches can lead to exciting novel insights. Firstly, after incorporating several pathways for the biosynthesis of natural products into each of these models, comparative flux balance analysis predicted that species like Streptomyces that harbour the highest diversity of secondary metabolite biosynthetic gene clusters in their genomes do not necessarily have the metabolic network topology most suitable for compound overproduction. Secondly, multi-objective analysis of biomass production and natural product biosynthesis in these actinobacteria shows that the well-studied occurrence of discrete metabolic switches during the change of cellular objectives is inherent to their metabolic network architecture. Comparative and multi-objective modelling can lead to insights that could not be obtained by normal flux balance analyses. MultiMetEval provides a powerful platform that makes these analyses straightforward for biologists. Sources and binaries of MultiMetEval are freely available from https://github.com/PiotrZakrzewski/MetEval/downloads. PMID:23272111
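The idea of a Pareto front between two cellular objectives can be illustrated with a small sketch. MultiMetEval derives the front from flux balance analysis on metabolic models; the code below does not do that and only filters a list of already computed (biomass, product) pairs down to the non-dominated ones. The function name and the assumption that both objectives are maximized are illustrative choices.

```python
def pareto_front(points):
    """Return the non-dominated (x, y) points when both values are maximized.

    A point is dominated if another point is at least as good in both
    objectives and strictly better in at least one.
    """
    front = []
    best_y = float("-inf")
    # Scan from the largest x downward; a point belongs to the front only
    # if its y exceeds every y already seen at an equal or larger x.
    for x, y in sorted(set(points), key=lambda p: (-p[0], -p[1])):
        if y > best_y:
            front.append((x, y))
            best_y = y
    return front
```

A sharp corner in such a front corresponds to an abrupt trade-off between the two objectives, which is the kind of feature the abstract refers to as a discrete metabolic switch.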
Jolley, Keith A.; Reed, Elizabeth; Martinez-Urtaza, Jaime
2017-01-01
Vibrio parahaemolyticus is an important human foodborne pathogen whose transmission is associated with the consumption of contaminated seafood, with a growing number of infections reported over recent years worldwide. A multilocus sequence typing (MLST) database for V. parahaemolyticus was created in 2008, and a large number of clones have been identified, causing severe outbreaks worldwide (sequence type 3 [ST3]), recurrent outbreaks in certain regions (e.g., ST36), or spreading to other regions where they are nonendemic (e.g., ST88 or ST189). The current MLST scheme uses sequences of 7 genes to generate an ST, which results in a powerful tool for inferring the population structure of this pathogen, although with limited resolution, especially compared to pulsed-field gel electrophoresis (PFGE). The application of whole-genome sequencing (WGS) has become routine for trace back investigations, with core genome MLST (cgMLST) analysis as one of the most straightforward ways to explore complex genomic data in an epidemiological context. Therefore, there is a need to generate a new, portable, standardized, and more advanced system that provides higher resolution and discriminatory power among V. parahaemolyticus strains using WGS data. We sequenced 92 V. parahaemolyticus genomes and used the genome of strain RIMD 2210633 as a reference (with a total of 4,832 genes) to determine which genes were suitable for establishing a V. parahaemolyticus cgMLST scheme. This analysis resulted in the identification of 2,254 suitable core genes for use in the cgMLST scheme. To evaluate the performance of this scheme, we performed a cgMLST analysis of 92 newly sequenced genomes, plus an additional 142 strains with genomes available at NCBI. cgMLST analysis was able to distinguish related and unrelated strains, including those with the same ST, clearly showing its enhanced resolution over conventional MLST analysis. 
It also distinguished outbreak-related from non-outbreak-related strains within the same ST. The sequences obtained from this work were deposited and are available in the public database (http://pubmlst.org/vparahaemolyticus). The application of this cgMLST scheme to the characterization of V. parahaemolyticus strains provided by different laboratories from around the world will reveal the global picture of the epidemiology, spread, and evolution of this pathogen and will become a powerful tool for outbreak investigations, allowing for the unambiguous comparison of strains with global coverage. PMID:28330888
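How a cgMLST scheme separates strains that share a conventional sequence type can be sketched with a pairwise allelic distance. The representation below (a dictionary from locus name to allele number, with None for an uncalled locus) and the decision to ignore loci missing in either strain are assumptions for illustration, not the scoring rules of the published scheme or of PubMLST.

```python
def allelic_distance(profile_a, profile_b):
    """Count core loci at which two strains carry different alleles.

    Profiles map locus name -> allele number, or None when the locus was
    not called. Loci missing or uncalled in either strain are skipped,
    which is an assumption of this sketch.
    """
    shared = profile_a.keys() & profile_b.keys()
    return sum(
        1
        for locus in shared
        if profile_a[locus] is not None
        and profile_b[locus] is not None
        and profile_a[locus] != profile_b[locus]
    )
```

With thousands of core loci instead of 7 housekeeping genes, two strains of the same ST can still differ at many loci, which is what gives the scheme its higher resolution.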
Context-specific metabolic networks are consistent with experiments.
Becker, Scott A; Palsson, Bernhard O
2008-05-16
Reconstructions of cellular metabolism are publicly available for a variety of different microorganisms and some mammalian genomes. To date, these reconstructions are "genome-scale" and strive to include all reactions implied by the genome annotation, as well as those with direct experimental evidence. Clearly, many of the reactions in a genome-scale reconstruction will not be active under particular conditions or in a particular cell type. Methods to tailor these comprehensive genome-scale reconstructions into context-specific networks will aid predictive in silico modeling for a particular situation. We present a method called Gene Inactivity Moderated by Metabolism and Expression (GIMME) to achieve this goal. The GIMME algorithm uses quantitative gene expression data and one or more presupposed metabolic objectives to produce the context-specific reconstruction that is most consistent with the available data. Furthermore, the algorithm provides a quantitative inconsistency score indicating how consistent a set of gene expression data is with a particular metabolic objective. We show that this algorithm produces results consistent with biological experiments and intuition for adaptive evolution of bacteria, rational design of metabolic engineering strains, and human skeletal muscle cells. This work represents progress towards producing constraint-based models of metabolism that are specific to the conditions where the expression profiling data is available.
The goal of this project is to identify key druggable regulators of glucocorticoid resistance in T-ALL. To this end, a reverse-engineered T-ALL context-specific regulatory interaction network was created from a phenotypically diverse T-ALL gene expression dataset, and then this network was interrogated using master regulator analysis to find drivers of glucocorticoid resistance.
Functional Genomics Assistant (FUGA): a toolbox for the analysis of complex biological networks
2011-01-01
Background: Cellular constituents such as proteins, DNA, and RNA form a complex web of interactions that regulate biochemical homeostasis and determine the dynamic cellular response to external stimuli. It follows that detailed understanding of these patterns is critical for the assessment of fundamental processes in cell biology and pathology. Representation and analysis of cellular constituents through network principles is a promising and popular analytical avenue towards a deeper understanding of molecular mechanisms in a system-wide context. Findings: We present Functional Genomics Assistant (FUGA) - an extensible and portable MATLAB toolbox for the inference of biological relationships, graph topology analysis, random network simulation, network clustering, and functional enrichment statistics. In contrast to conventional differential expression analysis of individual genes, FUGA offers a framework for the study of system-wide properties of biological networks and highlights putative molecular targets using concepts of systems biology. Conclusion: FUGA offers a simple and customizable framework for network analysis in a variety of systems biology applications. It is freely available for individual or academic use at http://code.google.com/p/fuga. PMID:22035155
Exploring and Harnessing Haplotype Diversity to Improve Yield Stability in Crops.
Qian, Lunwen; Hickey, Lee T; Stahl, Andreas; Werner, Christian R; Hayes, Ben; Snowdon, Rod J; Voss-Fels, Kai P
2017-01-01
In order to meet future food, feed, fiber, and bioenergy demands, global yields of all major crops need to be increased significantly. At the same time, the increasing frequency of extreme weather events such as heat and drought necessitates improvements in the environmental resilience of modern crop cultivars. Achieving sustainably increased yields implies rapid improvement of quantitative traits with a very complex genetic architecture and strong environmental interaction. The latest advances in genome analysis technologies provide molecular information at an ultrahigh resolution, revolutionizing crop genomic research, and paving the way for advanced quantitative genetic approaches. These include highly detailed assessment of population structure and genotypic diversity, facilitating the identification of selective sweeps and signatures of directional selection, dissection of genetic variants that underlie important agronomic traits, and genomic selection (GS) strategies that not only consider major-effect genes. Single-nucleotide polymorphism (SNP) markers today represent the genotyping system of choice for crop genetic studies because they occur abundantly in plant genomes and are easy to detect. SNPs are typically biallelic, however, hence their information content compared to multiallelic markers is low, limiting the resolution at which SNP-trait relationships can be delineated. An efficient way to overcome this limitation is to construct haplotypes based on linkage disequilibrium, one of the most important features influencing genetic analyses of crop genomes. Here, we give an overview of the latest advances in genomics-based haplotype analyses in crops, highlighting their importance in the context of polyploidy and genome evolution, linkage drag, and co-selection.
We provide examples of how haplotype analyses can complement well-established quantitative genetics frameworks, such as quantitative trait analysis and GS, ultimately providing an effective tool to equip modern crops with environment-tailored characteristics.
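Haplotype construction rests on pairwise linkage disequilibrium (LD) between markers. As a minimal sketch, the standard LD coefficient D and the squared correlation r² between two biallelic loci can be computed from haplotype and allele frequencies. The function name and input values below are illustrative and do not come from the review.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, r2) for two biallelic loci.

    p_ab: frequency of haplotypes carrying allele A at locus 1 and allele B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    d = p_ab - p_a * p_b
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denom


# Complete LD: allele A always travels with allele B.
d_full, r2_full = ld_stats(p_ab=0.5, p_a=0.5, p_b=0.5)
# Linkage equilibrium: the haplotype frequency equals the product of allele frequencies.
d_none, r2_none = ld_stats(p_ab=0.25, p_a=0.5, p_b=0.5)
```

Markers with high pairwise r² are the ones typically grouped into a single haplotype block; the threshold used for grouping is a study-specific choice.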
Bahreini, Amir; Li, Zheqi; Wang, Peilu; Levine, Kevin M; Tasdemir, Nilgun; Cao, Lan; Weir, Hazel M; Puhalla, Shannon L; Davidson, Nancy E; Stern, Andrew M; Chu, David; Park, Ben Ho; Lee, Adrian V; Oesterreich, Steffi
2017-05-23
Mutations in the estrogen receptor alpha (ERα) 1 gene (ESR1) are frequently detected in ER+ metastatic breast cancer, and there is increasing evidence that these mutations confer endocrine resistance in breast cancer patients with advanced disease. However, their functional role is not well-understood, at least in part due to a lack of ESR1 mutant models. Here, we describe the generation and characterization of genome-edited T47D and MCF7 breast cancer cell lines with the two most common ESR1 mutations, Y537S and D538G. Genome editing was performed using CRISPR and adeno-associated virus (AAV) technologies to knock-in ESR1 mutations into T47D and MCF7 cell lines, respectively. Various techniques were utilized to assess the activity of mutant ER, including transactivation, growth and chromatin-immunoprecipitation (ChIP) assays. The level of endocrine resistance was tested in mutant cells using a number of selective estrogen receptor modulators (SERMs) and degraders (SERDs). RNA sequencing (RNA-seq) was employed to study gene targets of mutant ER. Cells with ESR1 mutations displayed ligand-independent ER activity, and were resistant to several SERMs and SERDs, with cell line and mutation-specific differences with respect to magnitude of effect. The SERD AZ9496 showed increased efficacy compared to other drugs tested. Wild-type and mutant cell co-cultures demonstrated a unique evolution of mutant cells under estrogen deprivation and tamoxifen treatment. Transcriptome analysis confirmed ligand-independent regulation of ERα target genes by mutant ERα, but also identified novel target genes, some of which are involved in metastasis-associated phenotypes. Despite significant overlap in the ligand-independent genes between Y537S and D538G, the number of mutant ERα-target genes shared between the two cell lines was limited, suggesting context-dependent activity of the mutant receptor. 
Some genes and phenotypes were unique to one mutation within a given cell line, suggesting a mutation-specific effect. Taken together, ESR1 mutations in genome-edited breast cancer cell lines confer ligand-independent growth and endocrine resistance. These biologically relevant models can be used for further mechanistic and translational studies, including context-specific and mutation site-specific analysis of the ESR1 mutations.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Overbeek, Ross; Fonstein, Veronika; Osterman, Andrei
2005-02-15
The team of the Fellowship for Interpretation of Genomes (FIG), under the leadership of Ross Overbeek, began working on this Project in November 2003. During the previous year, the Project was performed at Integrated Genomics Inc. A transition from the industrial environment to the public domain prompted us to adjust some aspects of the Project. Notwithstanding the challenges, we believe that these adjustments had a strong positive impact on our deliverables. Most importantly, the work of the research team led by R. Overbeek resulted in the deployment of a new open source genomic platform, the SEED (Specific Aim 1). This platform provided a foundation for the development of CyanoSEED, a specialized portal to comparative analysis and metabolic reconstruction of all available cyanobacterial genomes (Specific Aim 3). The SEED represents a new generation of software for genome analysis. Briefly, it is a portable and extendable system, containing one of the largest and permanently growing collections of complete and partial genomes. The complete system with annotations and tools is freely available via browsing or via installation on a user's Mac or Linux computer. One of the important unique features of the SEED is the support of metabolic reconstruction and comparative genome analysis via encoding and projection of functional subsystems. During the project period, the FIG research team has validated the new software by developing a significant number of core subsystems, covering many aspects of central metabolism (Specific Aim 2), as well as metabolic areas specific for cyanobacteria and other photoautotrophic organisms (Specific Aim 3). In addition to providing a proof of technology and a starting point for further community-based efforts, these subsystems represent a valuable asset. An extensive coverage of central metabolism provides the bulk of information required for metabolic modeling in Synechocystis sp. PCC 6803.
Detailed analysis of several subsystems covering energy, carbon, and redox metabolism in the Synechocystis sp. PCC 6803 and other cyanobacteria has been performed (Specific Aim 4). The main objectives for this year (adjusted to reflect a new, public domain, setting of the Project research team) were: Aim 1. To develop, test, and deploy a new open source system, the SEED, for integrating community-based annotation, and comparative analysis of all publicly available microbial genomes. Develop a comprehensive genomic database by integrating within SEED all publicly available complete and nearly complete genome sequences with special emphasis on genomes of cyanobacteria, phototrophic eukaryotes, and anoxygenic phototrophic bacteria--invaluable for comparative genomic studies of energy and carbon metabolism in Synechocystis sp. PCC 6803. Aim 2. To develop the SEED's biological content in the form of a collection of encoded Subsystems largely covering the conserved cellular machinery in prokaryotes (and central metabolic machinery in eukaryotes). Aim 3. To develop, utilizing core SEED technology, the CyanoSEED--a specialized WEB portal for community-based annotation, and comparative analysis of all publicly available cyanobacterial genomes. Encode the set of additional subsystems representing key metabolic transformations in cyanobacteria and other photoautotrophs. We envisioned this resource as complementary to other public access databases for comparative genomic analysis currently available to the cyanobacterial research community. Aim 4. Perform in-depth analysis of several subsystems covering energy, carbon, and redox metabolism in the Synechocystis sp. PCC 6803 and all other cyanobacteria with available genome sequences. Reveal inconsistencies and gaps in the current knowledge of these subsystems. Use functional and genome context analysis tools in CyanoSEED to predict, whenever possible, candidate genes for inferred functional roles. 
Disseminate these conjectures and predictions freely by publishing them on CyanoSEED (http://cyanoseed.thefig.info/) and the Subsystems Forum (http://brucella.uchicago.edu/SubsystemForum/) in order to facilitate experimental analysis by our collaborator on this Project and by other experimentalists working in various fields of cyanobacterial physiology and biotechnology.
The Aquaporin Channel Repertoire of the Tardigrade Milnesium tardigradum
Grohme, Markus A.; Mali, Brahim; Wełnicz, Weronika; Michel, Stephanie; Schill, Ralph O.; Frohme, Marcus
2013-01-01
Limno-terrestrial tardigrades are small invertebrates that are subjected to periodic drought of their micro-environment. They have evolved to cope with these unfavorable conditions by anhydrobiosis, an ametabolic state of low cellular water. During drying and rehydration, tardigrades go through drastic changes in cellular water content. By our transcriptome sequencing effort of the limno-terrestrial tardigrade Milnesium tardigradum and by a combination of cloning and targeted sequence assembly, we identified transcripts encoding eleven putative aquaporins. Analysis of these sequences proposed 2 classical aquaporins, 8 aquaglyceroporins and a single potentially intracellular unorthodox aquaporin. Using quantitative real-time PCR we analyzed aquaporin transcript expression in the anhydrobiotic context. We have identified additional unorthodox aquaporins in various insect genomes and have identified a novel common conserved structural feature in these proteins. Analysis of the genomic organization of insect aquaporin genes revealed several conserved gene clusters. PMID:23761966
Rafehi, Haloom; Ververis, Katherine; Balcerczyk, Aneta; Ziemann, Mark; Ooi, Jenny; Hu, Sean; Kwa, Faith A. A.; Loveridge, Shanon J.; Georgiadis, George T.; El-Osta, Assam; Karagiannis, Tom C.
2012-01-01
The accumulating evidence of the beneficial effects of cinnamon (Cinnamomum burmanni) in type-2 diabetes, a chronic age-associated disease, has prompted the commercialisation of various supplemental forms of the spice. One such supplement, Cinnulin PF®, represents the water soluble fraction containing relatively high levels of the double-linked procyanidin type-A polymers of flavanoids. The overall aim of this study was to utilize genome-wide mRNA-Seq analysis to characterise the changes in gene expression caused by Cinnulin PF in immortalised human keratinocytes and microvascular endothelial cells, which are relevant with respect to diabetic complications. In summary, our findings provide insights into the mechanisms of action of Cinnulin PF in diabetes and diabetic complications. More generally, we identify relevant candidate genes which could provide the basis for further investigation. PMID:22953038
CNV-WebStore: online CNV analysis, storage and interpretation.
Vandeweyer, Geert; Reyniers, Edwin; Wuyts, Wim; Rooms, Liesbeth; Kooy, R Frank
2011-01-05
Microarray technology allows the analysis of genomic aberrations at an ever increasing resolution, making functional interpretation of these vast amounts of data the main bottleneck in routine implementation of high resolution array platforms, and emphasising the need for a centralised and easy to use CNV data management and interpretation system. We present CNV-WebStore, an online platform to streamline the processing and downstream interpretation of microarray data in a clinical context, tailored towards but not limited to the Illumina BeadArray platform. Provided analysis tools include CNV analysis, parent of origin and uniparental disomy detection. Interpretation tools include data visualisation, gene prioritisation, automated PubMed searching, linking data to several genome browsers and annotation of CNVs based on several public databases. Finally a module is provided for uniform reporting of results. CNV-WebStore is able to present copy number data in an intuitive way to both lab technicians and clinicians, making it a useful tool in daily clinical practice.
Efficient identification of context dependent subgroups of risk from genome wide association studies
Dyson, Greg; Sing, Charles F.
2014-01-01
We have developed a modified Patient Rule-Induction Method (PRIM) as an alternative strategy for analyzing representative samples of non-experimental human data to estimate and test the role of genomic variations as predictors of disease risk in etiologically heterogeneous sub-samples. A computational limit of the proposed strategy is encountered when the number of genomic variations (predictor variables) under study is large (> 500) because permutations are used to generate a null distribution to test the significance of a term (defined by values of particular variables) that characterizes a sub-sample of individuals through the peeling and pasting processes. As an alternative, in this paper we introduce a theoretical strategy that facilitates the quick calculation of Type I and Type II errors in the evaluation of terms in the peeling and pasting processes carried out in the execution of a PRIM analysis that are underestimated and non-existent, respectively, when a permutation-based hypothesis test is employed. The resultant savings in computational time makes possible the consideration of larger numbers of genomic variations (an example genome wide association study is given) in the selection of statistically significant terms in the formulation of PRIM prediction models. PMID:24570412
Su, Chang; Wang, Chao; He, Lin; Yang, Chuanping; Wang, Yucheng
2014-01-01
DNA methylation plays a critical role in the regulation of gene expression. Most studies of DNA methylation have been performed in herbaceous plants, and little is known about the methylation patterns in tree genomes. In the present study, we generated a map of methylated cytosines at single base pair resolution for Betula platyphylla (white birch) by bisulfite sequencing combined with transcriptomics to analyze DNA methylation and its effects on gene expression. We obtained a detailed view of the function of DNA methylation sequence composition and distribution in the genome of B. platyphylla. There are 34,460 genes in the whole genome of birch, and 31,297 genes are methylated. Conservatively, we estimated that 14.29% of genomic cytosines are methylcytosines in birch. Among the methylation sites, the CHH context accounts for 48.86%, and is the largest proportion. Combined transcriptome and methylation analysis showed that the genes with moderate methylation levels had higher expression levels than genes with high and low methylation. In addition, methylated genes are highly enriched for the GO subcategories of binding activities, catalytic activities, cellular processes, response to stimulus and cell death, suggesting that methylation mediates these pathways in birch trees. PMID:25514241
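The CG, CHG and CHH contexts referred to in this abstract are defined by the two bases downstream of a cytosine, where H stands for A, C or T. The following minimal sketch counts those contexts on a single strand. The function name and the example sequence are hypothetical, any non-G base (including N) is treated as H for simplicity, and the reverse strand is not considered.

```python
def cytosine_contexts(seq):
    """Count CG, CHG and CHH contexts for cytosines on the given strand.

    Cytosines without two downstream bases (the last two positions) are skipped.
    """
    seq = seq.upper()
    counts = {"CG": 0, "CHG": 0, "CHH": 0}
    for i in range(len(seq) - 2):
        if seq[i] != "C":
            continue
        if seq[i + 1] == "G":
            counts["CG"] += 1
        elif seq[i + 2] == "G":
            counts["CHG"] += 1
        else:
            counts["CHH"] += 1
    return counts


# Toy sequence with one cytosine in each context (the final C is skipped).
contexts = cytosine_contexts("CGACAGCTTC")
```

Dividing the number of methylated sites in each context by the total number of methylated sites would give context proportions of the kind reported for the CHH context above.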
Draft genome sequences of bacteria isolated from the Deschampsia antarctica phyllosphere.
Cid, Fernanda P; Maruyama, Fumito; Murase, Kazunori; Graether, Steffen P; Larama, Giovanni; Bravo, Leon A; Jorquera, Milko A
2018-05-01
Genome analyses are being used to characterize plant growth-promoting (PGP) bacteria living in different plant compartments. In this context, we have recently isolated bacteria from the phyllosphere of an Antarctic plant (Deschampsia antarctica) showing ice recrystallization inhibition (IRI), an activity related to the presence of antifreeze proteins (AFPs). In this study, the draft genomes of six phyllospheric bacteria showing IRI activity were sequenced and annotated according to their functional gene categories. Genome sizes ranged from 5.6 to 6.3 Mbp, and based on sequence analysis of the 16S rRNA genes, five strains were identified as Pseudomonas and one as Janthinobacterium. Interestingly, most strains showed genes associated with PGP traits, such as nutrient uptake (ammonia assimilation, nitrogen fixing, phosphatases, and organic acid production), bioactive metabolites (indole acetic acid and 1-aminocyclopropane-1-carboxylate deaminase), and antimicrobial compounds (hydrogen cyanide and pyoverdine). In relation to IRI activity, a search of putative AFPs using current bioinformatic tools was also carried out. Although genes associated with reported AFPs were not found in these genomes, genes connected to ice-nucleation proteins (InaA) were found in all Pseudomonas strains, but not in the Janthinobacterium strain.
Hong, Yanbin; Pandey, Manish K; Liu, Ying; Chen, Xiaoping; Liu, Hong; Varshney, Rajeev K; Liang, Xuanqiang; Huang, Shangzhi
2015-01-01
The cultivated peanut (Arachis hypogaea L.) is an allotetraploid (AABB) species derived from the A-genome (Arachis duranensis) and B-genome (Arachis ipaensis) progenitors. Presence of two versions of a DNA sequence based on the two progenitor genomes poses a serious technical and analytical problem during single nucleotide polymorphism (SNP) marker identification and analysis. In this context, we have analyzed 200 amplicons derived from expressed sequence tags (ESTs) and genome survey sequences (GSS) to identify SNPs in a panel of genotypes consisting of 12 cultivated peanut varieties and two diploid progenitors representing the ancestral genomes. A total of 18 EST-SNPs and 44 genomic-SNPs were identified in 12 peanut varieties by aligning the sequence of A. hypogaea with diploid progenitors. The average frequency of sequence polymorphism was higher for genomic-SNPs than for EST-SNPs, with one genomic-SNP every 1011 bp as compared to one EST-SNP every 2557 bp. In order to estimate the potential and further applicability of these identified SNPs, 96 peanut varieties were genotyped using the high resolution melting (HRM) method. Polymorphism information content (PIC) values for EST-SNPs ranged between 0.021 and 0.413 with a mean of 0.172 in the set of peanut varieties, while genomic-SNPs ranged between 0.080 and 0.478 with a mean of 0.249. A total of 33 SNPs were used for polymorphism detection among the parents and 10 selected lines from mapping population Y13Zh (Zhenzhuhei × Yueyou13). Of the total 33 SNPs, nine SNPs showed polymorphism in the mapping population Y13Zh, and seven SNPs were successfully mapped into five linkage groups. Our results showed that SNPs can be identified in allotetraploid peanut with high accuracy through amplicon sequencing and HRM assay. The identified SNPs were very informative and can be used for different genetic and breeding applications in peanut.
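The abstract does not state which PIC formula was used. The sketch below assumes the simplified form PIC = 1 - Σ p_i², which peaks at 0.5 for a biallelic marker; this seems plausible because the reported maximum of 0.478 exceeds the 0.375 ceiling that the full Botstein formula gives for two alleles. The function name and allele frequencies are illustrative only.

```python
def pic(allele_freqs):
    """Simplified polymorphism information content: 1 - sum of squared allele frequencies.

    Assumed formula; the abstract does not specify the one actually used.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


# A balanced biallelic SNP is maximally informative under this formula.
pic_balanced = pic([0.5, 0.5])
# A rare minor allele carries far less information.
pic_skewed = pic([0.9, 0.1])
```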
Three crocodilian genomes reveal ancestral patterns of evolution among archosaurs.
Green, Richard E; Braun, Edward L; Armstrong, Joel; Earl, Dent; Nguyen, Ngan; Hickey, Glenn; Vandewege, Michael W; St John, John A; Capella-Gutiérrez, Salvador; Castoe, Todd A; Kern, Colin; Fujita, Matthew K; Opazo, Juan C; Jurka, Jerzy; Kojima, Kenji K; Caballero, Juan; Hubley, Robert M; Smit, Arian F; Platt, Roy N; Lavoie, Christine A; Ramakodi, Meganathan P; Finger, John W; Suh, Alexander; Isberg, Sally R; Miles, Lee; Chong, Amanda Y; Jaratlerdsiri, Weerachai; Gongora, Jaime; Moran, Christopher; Iriarte, Andrés; McCormack, John; Burgess, Shane C; Edwards, Scott V; Lyons, Eric; Williams, Christina; Breen, Matthew; Howard, Jason T; Gresham, Cathy R; Peterson, Daniel G; Schmitz, Jürgen; Pollock, David D; Haussler, David; Triplett, Eric W; Zhang, Guojie; Irie, Naoki; Jarvis, Erich D; Brochu, Christopher A; Schmidt, Carl J; McCarthy, Fiona M; Faircloth, Brant C; Hoffmann, Federico G; Glenn, Travis C; Gabaldón, Toni; Paten, Benedict; Ray, David A
2014-12-12
To provide context for the diversification of archosaurs--the group that includes crocodilians, dinosaurs, and birds--we generated draft genomes of three crocodilians: Alligator mississippiensis (the American alligator), Crocodylus porosus (the saltwater crocodile), and Gavialis gangeticus (the Indian gharial). We observed an exceptionally slow rate of genome evolution within crocodilians at all levels, including nucleotide substitutions, indels, transposable element content and movement, gene family evolution, and chromosomal synteny. When placed within the context of related taxa including birds and turtles, this suggests that the common ancestor of all of these taxa also exhibited slow genome evolution and that the comparatively rapid evolution is derived in birds. The data also provided the opportunity to analyze heterozygosity in crocodilians, which indicates a likely reduction in population size for all three taxa through the Pleistocene. Finally, these data combined with newly published bird genomes allowed us to reconstruct the partial genome of the common ancestor of archosaurs, thereby providing a tool to investigate the genetic starting material of crocodilians, birds, and dinosaurs. Copyright © 2014, American Association for the Advancement of Science.
Phylogenetics of modern birds in the era of genomics
Edwards, Scott V; Bryan Jennings, W; Shedlock, Andrew M
2005-01-01
In the 14 years since the first higher-level bird phylogenies based on DNA sequence data, avian phylogenetics has witnessed the advent and maturation of the genomics era, the completion of the chicken genome and a suite of technologies that promise to add considerably to the agenda of avian phylogenetics. In this review, we summarize current approaches and data characteristics of recent higher-level bird studies and suggest a number of as yet untested molecular and analytical approaches for the unfolding tree of life for birds. A variety of comparative genomics strategies, including adoption of objective quality scores for sequence data, analysis of contiguous DNA sequences provided by large-insert genomic libraries, and the systematic use of retroposon insertions and other rare genomic changes all promise an integrated phylogenetics that is solidly grounded in genome evolution. The avian genome is an excellent testing ground for such approaches because of the more balanced representation of single-copy and repetitive DNA regions than in mammals. Although comparative genomics has a number of obvious uses in avian phylogenetics, its application to large numbers of taxa poses a number of methodological and infrastructural challenges, and can be greatly facilitated by a ‘community genomics’ approach in which the modest sequencing throughputs of single PI laboratories are pooled to produce larger, complementary datasets. Although the polymerase chain reaction era of avian phylogenetics is far from complete, the comparative genomics era—with its ability to vastly increase the number and type of molecular characters and to provide a genomic context for these characters—will usher in a host of new perspectives and opportunities for integrating genome evolution and avian phylogenetics. PMID:16024355
Liu, Guangjin; Zhang, Wei; Lu, Chengping
2013-11-11
Streptococcus agalactiae, also referred to as Group B Streptococcus (GBS), is a frequent resident of the rectovaginal tract in humans, and a major cause of neonatal infection. In addition, S. agalactiae is a known fish pathogen, which compromises food safety and represents a zoonotic hazard. The complete genome sequence of the piscine S. agalactiae isolate GD201008-001 was compared with 14 other piscine, human and bovine strains to explore their virulence determinants, evolutionary relationships and the genetic basis of host tropism in S. agalactiae. The pan-genome of S. agalactiae is open and its size increases with the addition of newly sequenced genomes. The core genes shared by all isolates account for 50–70% of any single genome. The Chinese piscine isolates GD201008-001 and ZQ0910 are phylogenetically distinct from the Latin American piscine isolates SA20-06 and STIR-CD-17, but are closely related to the human strain A909, in the context of the clustered regularly interspaced short palindromic repeats (CRISPRs), prophage, virulence-associated genes and phylogenetic relationships. We identified a unique 10 kb gene locus in Chinese piscine strains. Isolates from cultured tilapia in China have a close genomic relationship with the human strain A909. Our findings provide insight into the pathogenesis and host-associated genome content of piscine S. agalactiae isolated in China.
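The pan-genome and core-genome notions used in this abstract reduce to set union and intersection once genes have been grouped into families. The sketch below assumes that clustering has already been done upstream; the isolate labels and gene-family identifiers are hypothetical placeholders, not data from the study.

```python
def pan_and_core(genomes):
    """Return (pan, core) gene-family sets for a dict mapping genome name -> gene families."""
    if not genomes:
        raise ValueError("at least one genome is required")
    gene_sets = [set(families) for families in genomes.values()]
    pan = set().union(*gene_sets)          # families present in any genome
    core = set.intersection(*gene_sets)    # families present in every genome
    return pan, core


# Toy input: three isolates with partially overlapping gene families.
genomes = {
    "isolate_1": {"fam1", "fam2", "fam3"},
    "isolate_2": {"fam1", "fam2", "fam4"},
    "isolate_3": {"fam1", "fam5"},
}
pan, core = pan_and_core(genomes)
# Share of each genome that belongs to the core.
core_fraction = {name: len(core) / len(fams) for name, fams in genomes.items()}
```

A pan-genome is described as open when the size of the union keeps growing as genomes are added, which is the behaviour the abstract reports.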
Rangannan, Vetriselvi; Bansal, Manju
2009-12-01
The rapid increase in genome sequence information has necessitated the annotation of their functional elements, particularly those occurring in the non-coding regions, in the genomic context. The promoter region is the key regulatory region, which enables the gene to be transcribed or repressed, but it is difficult to determine experimentally. Hence in silico identification of promoters is crucial in order to guide experimental work and to pinpoint the key region that controls the transcription initiation of a gene. In this analysis, we demonstrate that while the promoter regions are in general less stable than the flanking regions, their average free energy varies depending on the GC composition of the flanking genomic sequence. We have therefore obtained a set of free energy threshold values for genomic DNA with varying GC content and used them as generic criteria for predicting promoter regions in several microbial genomes, using an in-house developed tool, PromPredict. On applying it to predict promoter regions corresponding to the 1144 and 612 experimentally validated TSSs in E. coli (50.8% GC) and B. subtilis (43.5% GC), sensitivity of 99% and 95% and precision values of 58% and 60%, respectively, were achieved. For the limited data set of 81 TSSs available for M. tuberculosis (65.6% GC), a sensitivity of 100% and precision of 49% were obtained.
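Sensitivity and precision, the two figures reported in this abstract, can be computed from counts of true positives, false positives and false negatives. The sketch below uses the standard definitions; how a prediction is matched to a TSS is tool-specific and not modelled here, and the counts are made-up toy values rather than results from the study.

```python
def sensitivity_precision(tp, fp, fn):
    """Return (sensitivity, precision) from prediction counts.

    sensitivity = TP / (TP + FN): fraction of known TSSs recovered
    precision   = TP / (TP + FP): fraction of predictions that are correct
    """
    if tp + fn == 0 or tp + fp == 0:
        raise ValueError("counts must allow both ratios to be defined")
    return tp / (tp + fn), tp / (tp + fp)


# Toy counts: 90 of 100 known sites recovered, with 60 spurious predictions.
sens, prec = sensitivity_precision(tp=90, fp=60, fn=10)
```

High sensitivity combined with moderate precision, as in the toy numbers, means that nearly all known sites are found at the cost of many extra predictions.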
van Hal, Sebastiaan J.; Steen, Jason A.; Espedido, Björn A.; Grimmond, Sean M.; Cooper, Matthew A.; Holden, Matthew T. G.; Bentley, Stephen D.; Gosbell, Iain B.; Jensen, Slade O.
2014-01-01
Objectives: To obtain an expanded understanding of antibiotic resistance evolution in vivo, particularly in the context of vancomycin exposure. Methods: The whole genomes of six consecutive methicillin-resistant Staphylococcus aureus blood culture isolates (ST239-MRSA-III) from a single patient exposed to various antimicrobials (over a 77 day period) were sequenced and analysed. Results: Variant analysis revealed the existence of non-susceptible sub-populations derived from a common susceptible ancestor, with the predominant circulating clone(s) selected for by type and duration of antimicrobial exposure. Conclusions: This study highlights the dynamic nature of bacterial evolution and that non-susceptible sub-populations can emerge from clouds of variation upon antimicrobial exposure. Diagnostically, this has direct implications for sample selection when using whole-genome sequencing as a tool to guide clinical therapy. In the context of bacteraemia, deep sequencing of bacterial DNA directly from patient blood samples would avoid culture ‘bias’ and identify mutations associated with circulating non-susceptible sub-populations, some of which may confer cross-resistance to alternate therapies. PMID:24047554
van Hal, Sebastiaan J; Steen, Jason A; Espedido, Björn A; Grimmond, Sean M; Cooper, Matthew A; Holden, Matthew T G; Bentley, Stephen D; Gosbell, Iain B; Jensen, Slade O
2014-02-01
To obtain an expanded understanding of antibiotic resistance evolution in vivo, particularly in the context of vancomycin exposure. The whole genomes of six consecutive methicillin-resistant Staphylococcus aureus blood culture isolates (ST239-MRSA-III) from a single patient exposed to various antimicrobials (over a 77 day period) were sequenced and analysed. Variant analysis revealed the existence of non-susceptible sub-populations derived from a common susceptible ancestor, with the predominant circulating clone(s) selected for by type and duration of antimicrobial exposure. This study highlights the dynamic nature of bacterial evolution and that non-susceptible sub-populations can emerge from clouds of variation upon antimicrobial exposure. Diagnostically, this has direct implications for sample selection when using whole-genome sequencing as a tool to guide clinical therapy. In the context of bacteraemia, deep sequencing of bacterial DNA directly from patient blood samples would avoid culture 'bias' and identify mutations associated with circulating non-susceptible sub-populations, some of which may confer cross-resistance to alternate therapies.
Barret, Matthieu; Egan, Frank; Fargier, Emilie; Morrissey, John P; O'Gara, Fergal
2011-06-01
Bacteria encode multiple protein secretion systems that are crucial for interaction with the environment and with hosts. In recent years, attention has focused on type VI secretion systems (T6SSs), which are specialized transporters widely encoded in Proteobacteria. The myriad of processes associated with these secretion systems could be explained by subclasses of T6SS, each involved in specialized functions. To assess diversity and predict function associated with different T6SSs, comparative genomic analysis of 34 Pseudomonas genomes was performed. This identified 70 T6SSs, with at least one locus in every strain, except for Pseudomonas stutzeri A1501. By comparing 11 core genes of the T6SS, it was possible to identify five main Pseudomonas phylogenetic clusters, with strains typically carrying T6SSs from more than one clade. In addition, most strains encode additional vgrG and hcp genes, which encode extracellular structural components of the secretion apparatus. Using a combination of phylogenetic and meta-analysis of transcriptome datasets it was possible to associate specific subsets of VgrG and Hcp proteins with each Pseudomonas T6SS clade. Moreover, a closer examination of the genomic context of vgrG genes in multiple strains highlights a number of additional genes associated with these regions. It is proposed that these genes may play a role in secretion or alternatively could be new T6S effectors.
Bioinformatics challenges for genome-wide association studies.
Moore, Jason H; Asselbergs, Folkert W; Williams, Scott M
2010-02-15
The sequencing of the human genome has made it possible to identify an informative set of >1 million single nucleotide polymorphisms (SNPs) across the genome that can be used to carry out genome-wide association studies (GWASs). The availability of massive amounts of GWAS data has necessitated the development of new biostatistical methods for quality control, imputation and analysis issues including multiple testing. This work has been successful and has enabled the discovery of new associations that have been replicated in multiple studies. However, it is now recognized that most SNPs discovered via GWAS have small effects on disease susceptibility and thus may not be suitable for improving health care through genetic testing. One likely explanation for the mixed results of GWAS is that the current biostatistical analysis paradigm is by design agnostic or unbiased in that it ignores all prior knowledge about disease pathobiology. Further, the linear modeling framework that is employed in GWAS often considers only one SNP at a time thus ignoring their genomic and environmental context. There is now a shift away from the biostatistical approach toward a more holistic approach that recognizes the complexity of the genotype-phenotype relationship that is characterized by significant heterogeneity and gene-gene and gene-environment interaction. We argue here that bioinformatics has an important role to play in addressing the complexity of the underlying genetic basis of common human diseases. The goal of this review is to identify and discuss those GWAS challenges that will require computational methods.
Impacts of Genome-Wide Analyses on Our Understanding of Human Herpesvirus Diversity and Evolution.
Renner, Daniel W; Szpara, Moriah L
2018-01-01
Until fairly recently, genome-wide evolutionary dynamics and within-host diversity were more commonly examined in the context of small viruses than in the context of large double-stranded DNA viruses such as herpesviruses. The high mutation rates and more compact genomes of RNA viruses have inspired the investigation of population dynamics for these species, and recent data now suggest that herpesviruses might also be considered candidates for population modeling. High-throughput sequencing (HTS) and bioinformatics have expanded our understanding of herpesviruses through genome-wide comparisons of sequence diversity, recombination, allele frequency, and selective pressures. Here we discuss recent data on the mechanisms that generate herpesvirus genomic diversity and underlie the evolution of these virus families. We focus on human herpesviruses, with key insights drawn from veterinary herpesviruses and other large DNA virus families. We consider the impacts of cell culture on herpesvirus genomes and how to accurately describe the viral populations under study. The need for a strong foundation of high-quality genomes is also discussed, since it underlies all secondary genomic analyses such as RNA sequencing (RNA-Seq), chromatin immunoprecipitation, and ribosome profiling. Areas where we foresee future progress, such as the linking of viral genetic differences to phenotypic or clinical outcomes, are highlighted as well. Copyright © 2017 Renner and Szpara.
Tam, Annie S; Chu, Jeffrey S C; Rose, Ann M
2015-11-12
Cancer therapy largely depends on chemotherapeutic agents that generate DNA lesions. However, our understanding of the nature of the resulting lesions as well as the mutational profiles of these chemotherapeutic agents is limited. Among these lesions, DNA interstrand crosslinks are among the more toxic types of DNA damage. Here, we have characterized the mutational spectrum of the commonly used DNA interstrand crosslinking agent mitomycin C (MMC). Using a combination of genetic mapping, whole genome sequencing, and genomic analysis, we have identified and confirmed several genomic lesions linked to MMC-induced DNA damage in Caenorhabditis elegans. Our data indicate that MMC predominantly causes deletions, with a 5'-CpG-3' sequence context prevalent in the deleted regions of DNA. Furthermore, we identified microhomology flanking the deletion junctions, indicative of DNA repair via nonhomologous end joining. Based on these results, we propose a general repair mechanism that is likely to be involved in the biological response to this highly toxic agent. In conclusion, the systematic study we have described provides insight into potential sequence specificity of MMC with DNA. Copyright © 2016 Tam et al.
The UCSC Genome Browser: What Every Molecular Biologist Should Know.
Mangan, Mary E; Williams, Jennifer M; Kuhn, Robert M; Lathe, Warren C
2014-07-01
Electronic data resources can enable molecular biologists to quickly get information from around the world that a decade ago would have been buried in papers scattered throughout the library. The ability to access, query, and display these data makes benchwork much more efficient and drives new discoveries. Increasingly, mastery of software resources and corresponding data repositories is required to fully explore the volume of data generated in biomedical and agricultural research, because only small amounts of data are actually found in traditional publications. The UCSC Genome Browser provides a wealth of data and tools that advance understanding of genomic context for many species, enable detailed analysis of data, and provide the ability to interrogate regions of interest across disparate data sets from a wide variety of sources. Researchers can also supplement the standard display with their own data to query and share this with others. Effective use of these resources has become crucial to biological research today, and this unit describes some practical applications of the UCSC Genome Browser. Copyright © 2014 John Wiley & Sons, Inc.
Social networks to biological networks: systems biology of Mycobacterium tuberculosis.
Vashisht, Rohit; Bhardwaj, Anshu; Osdd Consortium; Brahmachari, Samir K
2013-07-01
Contextualizing relevant information to construct a network that represents a given biological process presents a fundamental challenge in the network science of biology. The quality of the network for the organism of interest is critically dependent on the extent of functional annotation of its genome. Most automated annotation pipelines do not account for unstructured information present in volumes of literature, and hence a large fraction of the genome remains poorly annotated. However, if used, this information could substantially enhance the functional annotation of a genome, aiding the development of a more comprehensive network. Mining unstructured information buried in volumes of literature often requires extensive manual intervention and thus becomes a bottleneck for most automated pipelines. In this review, we discuss the potential of scientific social networking as a solution for systematic manual mining of data. Focusing on Mycobacterium tuberculosis as a case study, we discuss our open innovative approach for the functional annotation of its genome. Furthermore, we highlight the strength of such collated structured data in the context of drug target prediction based on systems-level analysis of the pathogen.
Zhang, Liangzhi; Jia, Shangang; Plath, Martin; Huang, Yongzhen; Li, Congjun; Lei, Chuzhao; Zhao, Xin; Chen, Hong
2015-01-01
Copy number variation (CNV) is an important component of genomic structural variation and plays a role not only in evolutionary diversification but also in domestication. Chinese cattle were derived from Bos taurus and Bos indicus, and several breeds presumably are of hybrid origin, but the evolution of CNV regions (CNVRs) has not yet been examined in this context. Here, we used analyses of CNVRs, mtDNA D-loop sequence variation, and Y-chromosomal single nucleotide polymorphisms to assess the impact of maternal and paternal B. taurus and B. indicus origins on the distribution of CNVRs in 24 Chinese domesticated bulls. We discovered 470 genome-wide CNVRs, only 72 of which were shared by all three Y-lineages (B. taurus: Y1, Y2; B. indicus: Y3), whereas 265 were shared by inferred taurine or indicine paternal lineages, and 228 when considering their maternal taurine or indicine origins. Phylogenetic analysis uncovered eight taurine/indicine hybrids, and principal component analysis on CNVs corroborated genomic exchange during hybridization. The distribution patterns of CNVRs tended to be lineage-specific, and correlation analysis revealed significant positive or negative co-occurrences of CNVRs across lineages. Our study suggests that CNVs in Chinese cattle partly result from selective breeding during domestication, but also from hybridization and introgression. PMID:26260653
UCSC Xena | Informatics Technology for Cancer Research (ITCR)
UCSC Xena securely analyzes and visualizes your private functional genomics data set in the context of public and shared genomic/phenotypic data sets such as TCGA, ICGC, TARGET, GTEx, and GA4GH (TOIL).
Cestaro, Alessandro; Sterck, Lieven; Fontana, Paolo; Van de Peer, Yves; Viola, Roberto; Velasco, Riccardo; Salamini, Francesco
2012-01-01
Plants have followed a reticulate type of evolution and taxa have frequently merged via allopolyploidization. A polyploid structure of sequenced genomes has often been proposed, but the chromosomes belonging to putative component genomes are difficult to identify. The 19 grapevine chromosomes are evolutionary stable structures: their homologous triplets have strongly conserved gene order, interrupted by rare translocations. The aim of this study is to examine how the grapevine nucleotide-binding site (NBS)-encoding resistance (NBS-R) genes have evolved in the genomic context and to understand mechanisms for the genome evolution. We show that, in grapevine, i) helitrons have significantly contributed to transposition of NBS-R genes, and ii) NBS-R gene cluster similarity indicates the existence of two groups of chromosomes (named as Va and Vc) that may have evolved independently. Chromosome triplets consist of two Va and one Vc chromosomes, as expected from the tetraploid and diploid conditions of the two component genomes. The hexaploid state could have been derived from either allopolyploidy or the separation of the Va and Vc component genomes in the same nucleus before fusion, as known for Rosaceae species. Time estimation indicates that grapevine component genomes may have fused about 60 mya, having had at least 40–60 mya to evolve independently. Chromosome number variation in the Vitaceae and related families, and the gap between the time of eudicot radiation and the age of Vitaceae fossils, are accounted for by our hypothesis. PMID:22253773
Dankar, Fida K; Ptitsyn, Andrey; Dankar, Samar K
2018-04-10
Contemporary biomedical databases include a wide range of information types from various observational and instrumental sources. Among the most important features that unite biomedical databases across the field are high volume of information and high potential to cause damage through data corruption, loss of performance, and loss of patient privacy. Thus, issues of data governance and privacy protection are essential for the construction of data depositories for biomedical research and healthcare. In this paper, we discuss various challenges of data governance in the context of population genome projects. The various challenges along with best practices and current research efforts are discussed through the steps of data collection, storage, sharing, analysis, and knowledge dissemination.
Informatics and computational strategies for the study of lipids.
Yetukuri, Laxman; Ekroos, Kim; Vidal-Puig, Antonio; Oresic, Matej
2008-02-01
Recent advances in mass spectrometry (MS)-based techniques for lipidomic analysis have empowered us with the tools that afford studies of lipidomes at the systems level. However, these techniques pose a number of challenges for lipidomic raw data processing, lipid informatics, and the interpretation of lipidomic data in the context of lipid function and structure. Integration of lipidomic data with other systemic levels, such as genomic or proteomic, in the context of molecular pathways and biophysical processes provides a basis for the understanding of lipid function at the systems level. The present report, based on the limited literature, is an update on a young but rapidly emerging field of lipid informatics and related pathway reconstruction strategies.
Liang, Winnie S.; Fonseca, Rafael; Bryce, Alan H.; McCullough, Ann E.; Barrett, Michael T.; Hunt, Katherine; Patel, Maitray D.; Young, Scott W.; Collins, Joseph M.; Silva, Alvin C.; Condjella, Rachel M.; Block, Matthew; McWilliams, Robert R.; Lazaridis, Konstantinos N.; Klee, Eric W.; Bible, Keith C.; Harris, Pamela; Oliver, Gavin R.; Bhavsar, Jaysheel D.; Nair, Asha A.; Middha, Sumit; Asmann, Yan; Kocher, Jean-Pierre; Schahl, Kimberly; Kipp, Benjamin R.; Barr Fritcher, Emily G.; Baker, Angela; Aldrich, Jessica; Kurdoglu, Ahmet; Izatt, Tyler; Christoforides, Alexis; Cherni, Irene; Nasser, Sara; Reiman, Rebecca; Phillips, Lori; McDonald, Jackie; Adkins, Jonathan; Mastrian, Stephen D.; Placek, Pamela; Watanabe, Aprill T.; LoBello, Janine; Han, Haiyong; Von Hoff, Daniel; Craig, David W.; Stewart, A. Keith; Carpten, John D.
2014-01-01
Advanced cholangiocarcinoma continues to harbor a difficult prognosis and therapeutic options have been limited. During the course of a clinical trial of whole genomic sequencing seeking druggable targets, we examined six patients with advanced cholangiocarcinoma. Integrated genome-wide and whole transcriptome sequence analyses were performed on tumors from six patients with advanced, sporadic intrahepatic cholangiocarcinoma (SIC) to identify potential therapeutically actionable events. Among the somatic events captured in our analysis, we uncovered two novel therapeutically relevant genomic contexts that when acted upon, resulted in preliminary evidence of anti-tumor activity. Genome-wide structural analysis of sequence data revealed recurrent translocation events involving the FGFR2 locus in three of six assessed patients. These observations and supporting evidence triggered the use of FGFR inhibitors in these patients. In one example, preliminary anti-tumor activity of pazopanib (in vitro FGFR2 IC50≈350 nM) was noted in a patient with an FGFR2-TACC3 fusion. After progression on pazopanib, the same patient also had stable disease on ponatinib, a pan-FGFR inhibitor (in vitro, FGFR2 IC50≈8 nM). In an independent non-FGFR2 translocation patient, exome and transcriptome analysis revealed an allele specific somatic nonsense mutation (E384X) in ERRFI1, a direct negative regulator of EGFR activation. Rapid and robust disease regression was noted in this ERRFI1 inactivated tumor when treated with erlotinib, an EGFR kinase inhibitor. FGFR2 fusions and ERRFI mutations may represent novel targets in sporadic intrahepatic cholangiocarcinoma and trials should be characterized in larger cohorts of patients with these aberrations. PMID:24550739
Next Generation Sequence Analysis and Computational Genomics Using Graphical Pipeline Workflows
Torri, Federica; Dinov, Ivo D.; Zamanyan, Alen; Hobel, Sam; Genco, Alex; Petrosyan, Petros; Clark, Andrew P.; Liu, Zhizhong; Eggert, Paul; Pierce, Jonathan; Knowles, James A.; Ames, Joseph; Kesselman, Carl; Toga, Arthur W.; Potkin, Steven G.; Vawter, Marquis P.; Macciardi, Fabio
2012-01-01
Whole-genome and exome sequencing have already proven to be essential and powerful methods to identify genes responsible for simple Mendelian inherited disorders. These methods can be applied to complex disorders as well, and have been adopted as one of the current mainstream approaches in population genetics. These achievements have been made possible by next generation sequencing (NGS) technologies, which require substantial bioinformatics resources to analyze the dense and complex sequence data. The huge analytical burden of data from genome sequencing might be seen as a bottleneck slowing the publication of NGS papers at this time, especially in psychiatric genetics. We review the existing methods for processing NGS data, to place into context the rationale for the design of a computational resource. We describe our method, the Graphical Pipeline for Computational Genomics (GPCG), to perform the computational steps required to analyze NGS data. The GPCG implements flexible workflows for basic sequence alignment, sequence data quality control, single nucleotide polymorphism analysis, copy number variant identification, annotation, and visualization of results. These workflows cover all the analytical steps required for NGS data, from processing the raw reads to variant calling and annotation. The current version of the pipeline is freely available at http://pipeline.loni.ucla.edu. These applications of NGS analysis may gain clinical utility in the near future (e.g., identifying miRNA signatures in diseases) when the bioinformatics approach is made feasible. Taken together, the annotation tools and strategies that have been developed to retrieve information and test hypotheses about the functional role of variants present in the human genome will help to pinpoint the genetic risk factors for psychiatric disorders. PMID:23139896
2011-01-01
Background Because biotechnological uses of bacteriophage gene products as alternatives to conventional antibiotics will require a thorough understanding of their genomic context, we sequenced and analyzed the genomes of four closely related phages isolated from Clostridium perfringens, an important agricultural and human pathogen. Results Phage whole-genome tetra-nucleotide signatures and proteomic tree topologies correlated closely with host phylogeny. Comparisons of our phage genomes to 26 others revealed three shared COGs; of particular interest within this core genome was an endolysin (PF01520, an N-acetylmuramoyl-L-alanine amidase) and a holin (PF04531). Comparative analyses of the evolutionary history and genomic context of these common phage proteins revealed two important results: 1) strongly significant host-specific sequence variation within the endolysin, and 2) a protein domain architecture apparently unique to our phage genomes in which the endolysin is located upstream of its associated holin. Endolysin sequences from our phages were one of two very distinct genotypes distinguished by variability within the putative enzymatically-active domain. The shared or core genome was comprised of genes with multiple sequence types belonging to five pfam families, and genes belonging to 12 pfam families, including the holin genes, which were nearly identical. Conclusions Significant genomic diversity exists even among closely-related bacteriophages. Holins and endolysins represent conserved functions across divergent phage genomes and, as we demonstrate here, endolysins can have significant variability and host-specificity even among closely-related genomes. Endolysins in our phage genomes may be subject to different selective pressures than the rest of the genome. These findings may have important implications for potential biotechnological applications of phage gene products. PMID:21631945
Disclosure of Incidental Findings From Next-Generation Sequencing in Pediatric Genomic Research
Abdul-Karim, Ruqayyah; Berkman, Benjamin E.; Wendler, David; Rid, Annette; Khan, Javed; Badgett, Tom
2013-01-01
Next-generation sequencing technologies will likely be used with increasing frequency in pediatric research. One consequence will be the increased identification of individual genomic research findings that are incidental to the aims of the research. Although researchers and ethicists have raised theoretical concerns about incidental findings in the context of genetic research, next-generation sequencing will make this once largely hypothetical concern an increasing reality. Most commentators have begun to accept the notion that there is some duty to disclose individual genetic research results to research subjects; however, the scope of that duty remains unclear. These issues are especially complicated in the pediatric setting, where subjects cannot currently but typically will eventually be able to make their own medical decisions at the age of adulthood. This article discusses the management of incidental findings in the context of pediatric genomic research. We provide an overview of the current literature and propose a framework to manage incidental findings in this unique context, based on what we believe is a limited responsibility to disclose. We hope this will be a useful source of guidance for investigators, institutional review boards, and bioethicists that anticipates the complicated ethical issues raised by advances in genomic technology. PMID:23400601
Computing and Applying Atomic Regulons to Understand Gene Expression and Regulation
Faria, José P.; Davis, James J.; Edirisinghe, Janaka N.; ...
2016-11-24
Understanding gene function and regulation is essential for the interpretation, prediction, and ultimate design of cell responses to changes in the environment. A multitude of technologies, abstractions, and interpretive frameworks have emerged to answer the challenges presented by genome function and regulatory network inference. Here, we propose a new approach for producing biologically meaningful clusters of coexpressed genes, called Atomic Regulons (ARs), based on expression data, gene context, and functional relationships. We demonstrate this new approach by computing ARs for Escherichia coli, which we compare with the coexpressed gene clusters predicted by two prevalent existing methods: hierarchical clustering and k-means clustering. We test the consistency of ARs predicted by all methods against expected interactions predicted by the Context Likelihood of Relatedness (CLR) mutual information based method, finding that the ARs produced by our approach show better agreement with CLR interactions. We then apply our method to compute ARs for four other genomes: Shewanella oneidensis, Pseudomonas aeruginosa, Thermus thermophilus, and Staphylococcus aureus. We compare the AR clusters from all genomes to study the similarity of coexpression among a phylogenetically diverse set of species, identifying subsystems that show remarkable similarity over wide phylogenetic distances. We also study the sensitivity of our method for computing ARs to the expression data used in the computation, showing that our new approach requires less data than competing approaches to converge to a near final configuration of ARs. We go on to use our sensitivity analysis to identify the specific experiments that lead most rapidly to the final set of ARs for E. coli. As a result, this analysis produces insights into improving the design of gene expression experiments.
Computing and Applying Atomic Regulons to Understand Gene Expression and Regulation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Faria, José P.; Davis, James J.; Edirisinghe, Janaka N.
Understanding gene function and regulation is essential for the interpretation, prediction, and ultimate design of cell responses to changes in the environment. A multitude of technologies, abstractions, and interpretive frameworks have emerged to answer the challenges presented by genome function and regulatory network inference. Here, we propose a new approach for producing biologically meaningful clusters of coexpressed genes, called Atomic Regulons (ARs), based on expression data, gene context, and functional relationships. We demonstrate this new approach by computing ARs for Escherichia coli, which we compare with the coexpressed gene clusters predicted by two prevalent existing methods: hierarchical clustering and k-means clustering. We test the consistency of ARs predicted by all methods against expected interactions predicted by the Context Likelihood of Relatedness (CLR) mutual information based method, finding that the ARs produced by our approach show better agreement with CLR interactions. We then apply our method to compute ARs for four other genomes: Shewanella oneidensis, Pseudomonas aeruginosa, Thermus thermophilus, and Staphylococcus aureus. We compare the AR clusters from all genomes to study the similarity of coexpression among a phylogenetically diverse set of species, identifying subsystems that show remarkable similarity over wide phylogenetic distances. We also study the sensitivity of our method for computing ARs to the expression data used in the computation, showing that our new approach requires less data than competing approaches to converge to a near final configuration of ARs. We go on to use our sensitivity analysis to identify the specific experiments that lead most rapidly to the final set of ARs for E. coli. As a result, this analysis produces insights into improving the design of gene expression experiments.
Xiang, Yezi; Huang, Chien-Hsun; Hu, Yi; Wen, Jun; Li, Shisheng; Yi, Tingshuang; Chen, Hongyi; Xiang, Jun; Ma, Hong
2017-02-01
Fruits are the defining feature of angiosperms, likely have contributed to angiosperm successes by protecting and dispersing seeds, and provide foods to humans and other animals, with many morphological types and important ecological and agricultural implications. Rosaceae is a family with ∼3000 species and an extraordinary spectrum of distinct fruits, including fleshy peach, apple, and strawberry prized by their consumers, as well as dry achenetum and follicetum with features facilitating seed dispersal, excellent for studying fruit evolution. To address Rosaceae fruit evolution and other questions, we generated 125 new transcriptomic and genomic datasets and identified hundreds of nuclear genes to reconstruct a well-resolved Rosaceae phylogeny with highly supported monophyly of all subfamilies and tribes. Molecular clock analysis revealed an estimated age of ∼101.6 Ma for crown Rosaceae and divergence times of tribes and genera, providing a geological and climate context for fruit evolution. Phylogenomic analysis yielded strong evidence for numerous whole genome duplications (WGDs), supporting the hypothesis that the apple tribe had a WGD and revealing another one shared by fleshy fruit-bearing members of this tribe, with moderate support for WGDs in the peach tribe and other groups. Ancestral character reconstruction for fruit types supports independent origins of fleshy fruits from dry-fruit ancestors, including the evolution of drupes (e.g., peach) and pomes (e.g., apple) from follicetum, and drupetum (raspberry and blackberry) from achenetum. We propose that WGDs and environmental factors, including animals, contributed to the evolution of the many fruits in Rosaceae, which provide a foundation for understanding fruit evolution. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Understanding the Human Genome Project: Using Stations to Provide a Comprehensive Overview
ERIC Educational Resources Information Center
Soto, Julio G.
2005-01-01
A lesson was designed for lower division general education, non-major biology lecture-only course that included the historical and scientific context, some of the skills used to study the human genome, results, conclusions and ethical consideration. Students learn to examine and compare the published Human Genome maps, and employ the strategies…
USDA-ARS's Scientific Manuscript database
The process of speciation is impacted by the interaction between the genomic architecture of diverging lineages and the environmental context they occupy. Yet, while climate can have a significant impact on this interaction, its role in determining the patterns of geographic and genomic divergence i...
Satellite DNA: An Evolving Topic
Garrido-Ramos, Manuel A.
2017-01-01
Satellite DNA represents one of the most fascinating parts of the repetitive fraction of the eukaryotic genome. Since the discovery of highly repetitive tandem DNA in the 1960s, a lot of literature has extensively covered various topics related to the structure, organization, function, and evolution of such sequences. Today, with the advent of genomic tools, the study of satellite DNA has regained a great interest. Thus, Next-Generation Sequencing (NGS), together with high-throughput in silico analysis of the information contained in NGS reads, has revolutionized the analysis of the repetitive fraction of the eukaryotic genomes. The whole of the historical and current approaches to the topic gives us a broad view of the function and evolution of satellite DNA and its role in chromosomal evolution. Currently, we have extensive information on the molecular, chromosomal, biological, and population factors that affect the evolutionary fate of satellite DNA, knowledge that gives rise to a series of hypotheses that get on well with each other about the origin, spreading, and evolution of satellite DNA. In this paper, I review these hypotheses from a methodological, conceptual, and historical perspective and frame them in the context of chromosomal organization and evolution. PMID:28926993
SD-MSAEs: Promoter recognition in human genome based on deep feature extraction.
Xu, Wenxuan; Zhang, Li; Lu, Yaping
2016-06-01
The prediction and recognition of promoters in the human genome play an important role in DNA sequence analysis. Entropy, in the Shannon sense of information theory, has many uses in bioinformatic analysis. Relative entropy estimator methods based on statistical divergence (SD) are used to extract meaningful features that distinguish different regions of DNA sequences. In this paper, we choose context features and use a set of SD methods to select the most effective n-mers for distinguishing promoter regions from other DNA regions in the human genome. From the total possible combinations of n-mers, we obtain four sparse distributions based on promoter and non-promoter training samples. The informative n-mers are selected by optimizing the degree of differentiation between these distributions. Specifically, we combine the advantages of statistical divergence and multiple sparse auto-encoders (MSAEs) in deep learning to extract deep features for promoter recognition. We then apply multiple SVMs and a decision model to construct a human promoter recognition method called SD-MSAEs. The framework is flexible in that it can freely integrate new feature extraction or classification models. Experimental results show that our method has high sensitivity and specificity. Copyright © 2016 Elsevier Inc. All rights reserved.
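The n-mer selection idea in the abstract above can be illustrated with a minimal sketch. It is not the SD-MSAEs implementation: the abstract does not specify which divergence estimators are used, so the sketch assumes plain Kullback-Leibler (relative entropy) divergence with add-one smoothing, and the function names (`nmer_distribution`, `rank_nmers_by_divergence`) and the toy sequences are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def nmer_distribution(seqs, n, pseudocount=1.0):
    """Return smoothed relative frequencies of all 4**n DNA n-mers in seqs."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - n + 1):
            counts[seq[i:i + n]] += 1
    # Only n-mers over A/C/G/T are kept; windows with other symbols are ignored.
    alphabet = ["".join(p) for p in product("ACGT", repeat=n)]
    total = sum(counts[m] for m in alphabet) + pseudocount * len(alphabet)
    return {m: (counts[m] + pseudocount) / total for m in alphabet}


def rank_nmers_by_divergence(promoters, background, n=2, top=5):
    """Rank n-mers by their contribution p*log(p/q) to KL(promoter || background).

    Assumption: KL divergence stands in for the paper's unspecified SD measures.
    """
    p = nmer_distribution(promoters, n)
    q = nmer_distribution(background, n)
    contrib = {m: p[m] * math.log(p[m] / q[m]) for m in p}
    kl = sum(contrib.values())
    ranked = sorted(contrib, key=contrib.get, reverse=True)[:top]
    return kl, ranked


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    promoters = ["TATAAAGGCGCG", "CGCGTATAAACG"]
    background = ["ATTGACCTGAAT", "GATTCAAGTCCA"]
    kl, ranked = rank_nmers_by_divergence(promoters, background, n=2, top=5)
    print(f"KL divergence: {kl:.4f}")
    print("Most promoter-enriched 2-mers:", ranked)
```

The n-mers with the largest positive contributions are those over-represented in the promoter set relative to the background, which is one simple way to pick a sparse, informative feature subset before passing it to a downstream classifier.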
Gene Fusion: A Genome Wide Survey
NASA Technical Reports Server (NTRS)
Liang, Ping; Riley, Monica
2001-01-01
It is well known that organisms form larger, complex, multimodular (composite or chimeric) and mostly multi-functional proteins through the fusion of two or more individual genes that have independent evolutionary histories and functions. We call each of these components a module. The existence of multimodular proteins may improve the efficiency of gene regulation and of cellular functions, and thus may give the host organism advantages in adapting to its environment. Analysis of all gene fusions in present-day organisms should allow us to examine the patterns of gene fusion in the context of cellular functions, to trace the evolutionary process from the ancient, smaller, uni-functional proteins to the present-day larger and complex multi-functional proteins, and to estimate the minimal number of ancestor proteins that existed in the last common ancestor of all life on earth. Although many multimodular proteins are known experimentally, systematic identification of gene fusion events at genome scale was not possible until recently, when a large number of completed genome sequences became available. In addition, such analysis faces technical difficulties due to the complexity of this biological and evolutionary process. We report from this study a new strategy to computationally identify multimodular proteins using completed genome sequences, along with the results of a survey of 22 organisms; data from over 40 organisms will be presented during the meeting. Additional information is contained in the original extended abstract.
Clustering analysis of proteins from microbial genomes at multiple levels of resolution.
Zaslavsky, Leonid; Ciufo, Stacy; Fedorov, Boris; Tatusova, Tatiana
2016-08-31
Microbial genomes at the National Center for Biotechnology Information (NCBI) represent a large collection of more than 35,000 assemblies. There are several complexities associated with the data: a great variation in sampling density since human pathogens are densely sampled while other bacteria are less represented; different protein families occur in annotations with different frequencies; and the quality of genome annotation varies greatly. In order to extract useful information from these sophisticated data, the analysis needs to be performed at multiple levels of phylogenomic resolution and protein similarity, with an adequate sampling strategy. Protein clustering is used to construct meaningful and stable groups of similar proteins to be used for analysis and functional annotation. Our approach is to create protein clusters at three levels. First, tight clusters in groups of closely-related genomes (species-level clades) are constructed using a combined approach that takes into account both sequence similarity and genome context. Second, clustroids of conservative in-clade clusters are organized into seed global clusters. Finally, global protein clusters are built around the seed clusters. We propose filtering strategies that allow limiting the protein set included in global clustering. The in-clade clustering procedure, subsequent selection of clustroids and organization into seed global clusters provide a robust representation and a high rate of compression. Seed protein clusters are further extended by adding related proteins. Extended seed clusters include a significant part of the data and represent all major known cell machinery. The remaining part, coming from either non-conservative (unique) or rapidly evolving proteins, from rare genomes, or resulting from low-quality annotation, does not group together well. Processing these proteins requires significant computational resources and results in a large number of questionable clusters.
The developed filtering strategies make it possible to identify and exclude such peripheral proteins, limiting the protein dataset in global clustering. Overall, the proposed methodology allows the relevant data to be obtained at different levels of detail and data redundancy to be eliminated while keeping biologically interesting variations.
Analysis of the Genome Structure of the Nonpathogenic Probiotic Escherichia coli Strain Nissle 1917
Grozdanov, Lubomir; Raasch, Carsten; Schulze, Jürgen; Sonnenborn, Ulrich; Gottschalk, Gerhard; Hacker, Jörg; Dobrindt, Ulrich
2004-01-01
Nonpathogenic Escherichia coli strain Nissle 1917 (O6:K5:H1) is used as a probiotic agent in medicine, mainly for the treatment of various gastroenterological diseases. To gain insight on the genetic level into its properties of colonization and commensalism, this strain's genome structure has been analyzed by three approaches: (i) sequence context screening of tRNA genes as a potential indication of chromosomal integration of horizontally acquired DNA, (ii) sequence analysis of 280 kb of genomic islands (GEIs) coding for important fitness factors, and (iii) comparison of Nissle 1917 genome content with that of other E. coli strains by DNA-DNA hybridization. PCR-based screening of 324 nonpathogenic and pathogenic E. coli isolates of different origins revealed that some chromosomal regions are frequently detectable in nonpathogenic E. coli and also among extraintestinal and intestinal pathogenic strains. Many known fitness factor determinants of strain Nissle 1917 are localized on four GEIs which have been partially sequenced and analyzed. Comparison of these data with the available knowledge of the genome structure of E. coli K-12 strain MG1655 and of uropathogenic E. coli O6 strains CFT073 and 536 revealed structural similarities on the genomic level, especially between the E. coli O6 strains. The lack of defined virulence factors (i.e., alpha-hemolysin, P-fimbrial adhesins, and the semirough lipopolysaccharide phenotype) combined with the expression of fitness factors such as microcins, different iron uptake systems, adhesins, and proteases, which may support its survival and successful colonization of the human gut, most likely contributes to the probiotic character of E. coli strain Nissle 1917. PMID:15292145
Quail, Michael A; Smith, Miriam; Coupland, Paul; Otto, Thomas D; Harris, Simon R; Connor, Thomas R; Bertoni, Anna; Swerdlow, Harold P; Gu, Yong
2012-07-24
Next generation sequencing (NGS) technology has revolutionized genomic and genetic research. The pace of change in this area is rapid with three major new sequencing platforms having been released in 2011: Ion Torrent's PGM, Pacific Biosciences' RS and the Illumina MiSeq. Here we compare the results obtained with those platforms to the performance of the Illumina HiSeq, the current market leader. In order to compare these platforms, and get sufficient coverage depth to allow meaningful analysis, we have sequenced a set of 4 microbial genomes with mean GC content ranging from 19.3 to 67.7%. Together, these represent a comprehensive range of genome content. Here we report our analysis of that sequence data in terms of coverage distribution, bias, GC distribution, variant detection and accuracy. Sequence generated by Ion Torrent, MiSeq and Pacific Biosciences technologies displays near perfect coverage behaviour on GC-rich, neutral and moderately AT-rich genomes, but a profound bias was observed upon sequencing the extremely AT-rich genome of Plasmodium falciparum on the PGM, resulting in no coverage for approximately 30% of the genome. We analysed the ability to call variants from each platform and found that we could call slightly more variants from Ion Torrent data compared to MiSeq data, but at the expense of a higher false positive rate. Variant calling from Pacific Biosciences data was possible but higher coverage depth was required. Context specific errors were observed in both PGM and MiSeq data, but not in that from the Pacific Biosciences platform. All three fast turnaround sequencers evaluated here were able to generate usable sequence. However there are key differences between the quality of that data and the applications it will support.
Chowdhary, Nupoor; Selvaraj, Ashok; KrishnaKumaar, Lakshmi; Kumar, Gopal Ramesh
2015-01-01
Caldicellulosiruptor saccharolyticus has proven itself to be an excellent candidate for biological hydrogen (H2) production, but it still has major drawbacks, such as sensitivity to high osmotic pressure and low volumetric H2 productivity, which should be considered before it can be used industrially. A whole genome re-annotation has been carried out in an attempt to update the incomplete genome information that causes gaps in knowledge, especially in the area of metabolic engineering, in order to improve the H2-producing capabilities of C. saccharolyticus. Whole genome re-annotation was performed manually for 2,682 Coding Sequences (CDSs). Bioinformatics tools based on sequence similarity, motif search, phylogenetic analysis and fold recognition were employed for re-annotation. Our methodology successfully added functions for 409 hypothetical proteins (HPs) and 46 proteins previously annotated as putative, and assigned more accurate functions to the known protein sequences. Homology based gene annotation has been used as a standard method for assigning function to novel proteins, but over the past few years many non-homology based methods, such as genomic context approaches for protein function prediction, have been developed. Using non-homology based functional prediction methods, we were able to assign cellular processes or physical complexes for 249 hypothetical sequences. Our re-annotation pipeline highlights the addition of 231 new CDSs generated from the MicroScope Platform to the original genome, with functional prediction for 49 of them. The re-annotation of HPs and new CDSs is stored in the relational database that is available on the MicroScope web-based platform. In parallel, comparative genome analyses were performed among the members of the genus Caldicellulosiruptor to understand their function and evolutionary processes.
Further, with results from integrated re-annotation studies (homology and genomic context approach), we strongly suggest that Csac_0437 and Csac_0424 encode for glycoside hydrolases (GH) and are proposed to be involved in the decomposition of recalcitrant plant polysaccharides. Similarly, HPs: Csac_0732, Csac_1862, Csac_1294 and Csac_0668 are suggested to play a significant role in biohydrogen production. Function prediction of these HPs by using our integrated approach will considerably enhance the interpretation of large-scale experiments targeting this industrially important organism. PMID:26196387
Jo, Yeong Deuk; Choi, Yoomi; Kim, Dong-Hwan; Kim, Byung-Dong; Kang, Byoung-Cheorl
2014-07-04
Cytoplasmic male sterility (CMS) is an inability to produce functional pollen that is caused by mutation of the mitochondrial genome. Comparative analyses of mitochondrial genomes of lines with and without CMS in several species have revealed structural differences between genomes, including extensive rearrangements caused by recombination. However, the mitochondrial genome structure and the DNA rearrangements that may be related to CMS have not been characterized in Capsicum spp. We obtained the complete mitochondrial genome sequences of the pepper CMS line FS4401 (507,452 bp) and the fertile line Jeju (511,530 bp). Comparative analysis between mitochondrial genomes of peppers and tobacco that are included in Solanaceae revealed extensive DNA rearrangements and poor conservation in non-coding DNA. In comparison between pepper lines, FS4401 and Jeju mitochondrial DNAs contained the same complement of protein coding genes except for one additional copy of an atp6 gene (ψatp6-2) in FS4401. In terms of genome structure, we found eighteen syntenic blocks in the two mitochondrial genomes, which have been rearranged in each genome. By contrast, sequences between syntenic blocks, which were specific to each line, accounted for 30,380 and 17,847 bp in FS4401 and Jeju, respectively. The previously-reported CMS candidate genes, orf507 and ψatp6-2, were located on the edges of the largest sequence segments that were specific to FS4401. In this region, large number of small sequence segments which were absent or found on different locations in Jeju mitochondrial genome were combined together. The incorporation of repeats and overlapping of connected sequence segments by a few nucleotides implied that extensive rearrangements by homologous recombination might be involved in evolution of this region. Further analysis using mtDNA pairs from other plant species revealed common features of DNA regions around CMS-associated genes. 
Although a large portion of sequence context was shared by the mitochondrial genomes of CMS and male-fertile pepper lines, extensive genome rearrangements were detected. CMS candidate genes were located on the edges of highly rearranged CMS-specific DNA regions and near repeat sequences. These characteristics were also detected among CMS-associated genes in other species, implying that a common mechanism might be involved in the evolution of CMS-associated genes.
Soh, Jung; Gordon, Paul MK; Taschuk, Morgan L; Dong, Anguo; Ah-Seng, Andrew C; Turinsky, Andrei L; Sensen, Christoph W
2008-01-01
Background: The Bluejay genome browser has been developed over several years to address the challenges posed by the ever increasing number of data types as well as the increasing volume of data in genome research. Beginning with a browser capable of rendering views of XML-based genomic information and providing scalable vector graphics output, we have now completed version 1.0 of the system with many additional features. Our development efforts were guided by our observation that biologists who use both gene expression profiling and comparative genomics gain functional insights above and beyond those provided by traditional per-gene analyses. Results: Bluejay 1.0 is a genome viewer integrating genome annotation with: (i) gene expression information; and (ii) comparative analysis with an unlimited number of other genomes in the same view. This allows the biologist to see a gene not just in the context of its genome, but also its regulation and its evolution. Bluejay now has rich provision for personalization by users: (i) numerous display customization features; (ii) the availability of waypoints for marking multiple points of interest on a genome and subsequently utilizing them; and (iii) the ability to take user relevance feedback of annotated genes or textual items to offer personalized recommendations. Bluejay 1.0 also embeds the Seahawk browser for the Moby protocol, enabling users to seamlessly invoke hundreds of Web Services on genomic data of interest without any hard-coding. Conclusion: Bluejay offers a unique set of customizable genome-browsing features, with the goal of allowing biologists to quickly focus on, analyze, compare, and retrieve related information on the parts of the genomic data they are most interested in. We expect these capabilities of Bluejay to benefit the many biologists who want to answer complex questions using the information available from completely sequenced genomes. PMID:18940007
Roach, David J.; Burton, Joshua N.; Lee, Choli; Stackhouse, Bethany; Butler-Wu, Susan M.; Cookson, Brad T.
2015-01-01
Bacterial whole genome sequencing holds promise as a disruptive technology in clinical microbiology, but it has not yet been applied systematically or comprehensively within a clinical context. Here, over the course of one year, we performed prospective collection and whole genome sequencing of nearly all bacterial isolates obtained from a tertiary care hospital’s intensive care units (ICUs). This unbiased collection of 1,229 bacterial genomes from 391 patients enables detailed exploration of several features of clinical pathogens. A sizable fraction of isolates identified as clinically relevant corresponded to previously undescribed species: 12% of isolates assigned a species-level classification by conventional methods actually qualified as distinct, novel genomospecies on the basis of genomic similarity. Pan-genome analysis of the most frequently encountered pathogens in the collection revealed substantial variation in pan-genome size (1,420 to 20,432 genes) and the rate of gene discovery (1 to 152 genes per isolate sequenced). Surprisingly, although potential nosocomial transmission of actively surveilled pathogens was rare, 8.7% of isolates belonged to genomically related clonal lineages that were present among multiple patients, usually with overlapping hospital admissions, and were associated with clinically significant infection in 62% of patients from which they were recovered. Multi-patient clonal lineages were particularly evident in the neonatal care unit, where seven separate Staphylococcus epidermidis clonal lineages were identified, including one lineage associated with bacteremia in 5/9 neonates. Our study highlights key differences in the information made available by conventional microbiological practices versus whole genome sequencing, and motivates the further integration of microbial genome sequencing into routine clinical care. PMID:26230489
Genome projects and the functional-genomic era.
Sauer, Sascha; Konthur, Zoltán; Lehrach, Hans
2005-12-01
The problems we face today in public health as a result of the -- fortunately -- increasing age of people and the requirements of developing countries create an urgent need for new and innovative approaches in medicine and in agronomics. Genomic and functional genomic approaches have a great potential to at least partially solve these problems in the future. Important progress has been made by procedures to decode genomic information of humans, but also of other key organisms. The basic comprehension of genomic information (and its transfer) should now give us the possibility to pursue the next important step in life science eventually leading to a basic understanding of biological information flow; the elucidation of the function of all genes and correlative products encoded in the genome, as well as the discovery of their interactions in a molecular context and the response to environmental factors. As a result of the sequencing projects, we are now able to ask important questions about sequence variation and can start to comprehensively study the function of expressed genes on different levels such as RNA, protein or the cell in a systematic context including underlying networks. In this article we review and comment on current trends in large-scale systematic biological research. A particular emphasis is put on technology developments that can provide means to accomplish the tasks of future lines of functional genomics.
Gonzalez-Escalona, Narjol; Jolley, Keith A; Reed, Elizabeth; Martinez-Urtaza, Jaime
2017-06-01
Vibrio parahaemolyticus is an important human foodborne pathogen whose transmission is associated with the consumption of contaminated seafood, with a growing number of infections reported over recent years worldwide. A multilocus sequence typing (MLST) database for V. parahaemolyticus was created in 2008, and a large number of clones have been identified, causing severe outbreaks worldwide (sequence type 3 [ST3]), recurrent outbreaks in certain regions (e.g., ST36), or spreading to other regions where they are nonendemic (e.g., ST88 or ST189). The current MLST scheme uses sequences of 7 genes to generate an ST, which results in a powerful tool for inferring the population structure of this pathogen, although with limited resolution, especially compared to pulsed-field gel electrophoresis (PFGE). The application of whole-genome sequencing (WGS) has become routine for trace back investigations, with core genome MLST (cgMLST) analysis as one of the most straightforward ways to explore complex genomic data in an epidemiological context. Therefore, there is a need to generate a new, portable, standardized, and more advanced system that provides higher resolution and discriminatory power among V. parahaemolyticus strains using WGS data. We sequenced 92 V. parahaemolyticus genomes and used the genome of strain RIMD 2210633 as a reference (with a total of 4,832 genes) to determine which genes were suitable for establishing a V. parahaemolyticus cgMLST scheme. This analysis resulted in the identification of 2,254 suitable core genes for use in the cgMLST scheme. To evaluate the performance of this scheme, we performed a cgMLST analysis of 92 newly sequenced genomes, plus an additional 142 strains with genomes available at NCBI. cgMLST analysis was able to distinguish related and unrelated strains, including those with the same ST, clearly showing its enhanced resolution over conventional MLST analysis. 
It also distinguished outbreak-related from non-outbreak-related strains within the same ST. The sequences obtained from this work were deposited and are available in the public database (http://pubmlst.org/vparahaemolyticus). The application of this cgMLST scheme to the characterization of V. parahaemolyticus strains provided by different laboratories from around the world will reveal the global picture of the epidemiology, spread, and evolution of this pathogen and will become a powerful tool for outbreak investigations, allowing for the unambiguous comparison of strains with global coverage. Copyright © 2017 Gonzalez-Escalona et al.
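To make the idea of comparing strains by cgMLST profile concrete, here is a minimal sketch of an allelic distance between profiles. It assumes each strain is represented as a mapping from core locus name to allele number, with `None` for an uncalled locus; the locus names, strain names, and allele numbers are invented for illustration and are not taken from the published scheme or from pubmlst.org.

```python
from itertools import combinations


def allelic_distance(profile_a, profile_b):
    """Return (differing loci, loci compared) for two cgMLST allele profiles.

    Only loci called (not None) in both profiles are compared.
    """
    shared = [
        locus for locus in profile_a
        if profile_a[locus] is not None and profile_b.get(locus) is not None
    ]
    differing = sum(1 for locus in shared if profile_a[locus] != profile_b[locus])
    return differing, len(shared)


def pairwise_distances(profiles):
    """Allelic distance for every pair of strains, keyed by sorted strain names."""
    return {
        (a, b): allelic_distance(profiles[a], profiles[b])[0]
        for a, b in combinations(sorted(profiles), 2)
    }


# Hypothetical profiles over four made-up core loci.
profiles = {
    "strain_A": {"locus1": 1, "locus2": 4, "locus3": 2, "locus4": 7},
    "strain_B": {"locus1": 1, "locus2": 4, "locus3": 2, "locus4": 8},
    "strain_C": {"locus1": 3, "locus2": 5, "locus3": None, "locus4": 9},
}
distances = pairwise_distances(profiles)
```

With thousands of core loci instead of four, two strains that share a conventional seven-gene ST can still differ at many core loci, which is the source of the extra resolution the abstract reports. Skipping uncalled loci is one common convention; treating them as mismatches is an alternative design choice.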
Experimental Induction of Genome Chaos.
Ye, Christine J; Liu, Guo; Heng, Henry H
2018-01-01
Genome chaos, or karyotype chaos, represents a powerful survival strategy for somatic cells under high levels of stress/selection. Since the genome context, not the gene content, encodes the genomic blueprint of the cell, stress-induced rapid and massive reorganization of genome topology functions as a very important mechanism for genome (karyotype) evolution. In recent years, the phenomenon of genome chaos has been confirmed by various sequencing efforts, and many different terms have been coined to describe different subtypes of the chaotic genome including "chromothripsis," "chromoplexy," and "structural mutations." To advance this exciting field, we need an effective experimental system to induce and characterize the karyotype reorganization process. In this chapter, an experimental protocol to induce chaotic genomes is described, following a brief discussion of the mechanism and implication of genome chaos in cancer evolution.
PROGRESS IN ACUTE MYELOID LEUKEMIA
Kadia, Tapan M.; Ravandi, Farhad; O’Brien, Susan; Cortes, Jorge; Kantarjian, Hagop M.
2014-01-01
Significant progress has been made in the treatment of acute myeloid leukemia (AML). Steady gains in clinical research and a renaissance of genomics in leukemia have led to improved outcomes. The recognition of tremendous heterogeneity in AML has allowed individualized treatments of specific disease entities within the context of patient age, cytogenetics, and mutational analysis. The following is a comprehensive review of the current state of AML therapy and a roadmap of our approach to these distinct disease entities. PMID:25441110
[The pathogenesis and regulation of autoimmunity].
Miyake, Sachiko
2008-06-01
The pathogenesis of autoimmunity has been studied extensively using animal models and genome-wide genetic analysis. Moreover, recent advances in the therapy of autoimmune diseases using molecular-targeted drugs have provided a great deal of information on the pathogenesis of human autoimmune diseases. In this review, we give an overview of recent progress in the study of autoimmunity, including central tolerance, regulatory cells and cytokines. Finally, we discuss the relationship between innate immunity and adaptive immunity in the context of autoimmunity.
Transposable Element Dynamics among Asymbiotic and Ectomycorrhizal Amanita Fungi
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hess, Jaqueline; Skrede, Inger; Wolfe, Benjamin E.; ...
2014-06-12
Transposable elements (TEs) are ubiquitous inhabitants of eukaryotic genomes and their proliferation and dispersal shape genome architectures and diversity. Nevertheless, TE dynamics are often explored for one species at a time and are rarely considered in ecological contexts. Recent work with plant pathogens suggests a link between symbiosis and TE abundance. The genomes of pathogenic fungi appear to house an increased abundance of TEs, and TEs are frequently associated with the genes involved in symbiosis. To investigate whether this pattern is general, and relevant to mutualistic plant-fungal symbioses, we sequenced the genomes of related asymbiotic (AS) and ectomycorrhizal (ECM) Amanita fungi. We used methods developed to interrogate both assembled and unassembled sequences, and characterized and quantified TEs across three AS and three ECM species, including the AS outgroup Volvariella volvacea. The ECM genomes are characterized by abundant numbers of TEs, an especially prominent feature of unassembled sequencing libraries. Increased TE activity in ECM species is also supported by phylogenetic analysis of the three most abundant TE superfamilies; phylogenies revealed many radiations within contemporary ECM species. However, the AS species Amanita thiersii also houses extensive amplifications of elements, highlighting the influence of additional evolutionary parameters on TE abundance. Our analyses provide further evidence for a link between symbiotic associations among plants and fungi, and increased TE activity, while highlighting the importance individual species’ natural histories may have in shaping genome architecture.
Nuclear Mitochondrial DNA Activates Replication in Saccharomyces cerevisiae
Chatre, Laurent; Ricchetti, Miria
2011-03-08
The nuclear genome of eukaryotes is colonized by DNA fragments of mitochondrial origin, called NUMTs. These insertions have been associated with a variety of germ-line diseases in humans. The significance of this uptake of potentially dangerous sequences into the nuclear genome is unclear. Here we provide functional evidence that sequences of mitochondrial origin promote nuclear DNA replication in Saccharomyces cerevisiae. We show that NUMTs are rich in key autonomously replicating sequence (ARS) consensus motifs, whose mutation results in the reduction or loss of DNA replication activity. Furthermore, 2D-gel analysis of the mrc1 mutant exposed to hydroxyurea shows that several NUMTs function as late chromosomal origins. We also show that NUMTs located close to or within ARS provide key sequence elements for replication. Thus NUMTs can act as independent origins, when inserted in an appropriate genomic context or affect the efficiency of pre-existing origins. These findings show that migratory mitochondrial DNAs can impact on the replication of the nuclear region they are inserted in. PMID:21408151
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vamathevan, Jessica J., E-mail: jessica.j.vamathevan@gsk.com; Hall, Matthew D.; Hasan, Samiul
Improving drug attrition remains a challenge in pharmaceutical discovery and development. A major cause of early attrition is the demonstration of safety signals which can negate any therapeutic index previously established. Safety attrition needs to be put in context of clinical translation (i.e. human relevance) and is negatively impacted by differences between animal models and human. In order to minimize such an impact, an earlier assessment of pharmacological target homology across animal model species will enhance understanding of the context of animal safety signals and aid species selection during later regulatory toxicology studies. Here we sequenced the genomes of the Sus scrofa Göttingen minipig and the Canis familiaris beagle, two widely used animal species in regulatory safety studies. Comparative analyses of these new genomes with other key model organisms, namely mouse, rat, cynomolgus macaque, rhesus macaque, two related breeds (S. scrofa Duroc and C. familiaris boxer) and human reveal considerable variation in gene content. Key genes in toxicology and metabolism studies, such as the UGT2 family, CYP2D6, and SLCO1A2, displayed unique duplication patterns. Comparisons of 317 known human drug targets revealed surprising variation such as species-specific positive selection, duplication and higher occurrences of pseudogenized targets in beagle (41 genes) relative to minipig (19 genes). These data will facilitate the more effective use of animals in biomedical research. - Highlights: • Genomes of the minipig and beagle dog, two species used in pharmaceutical studies. • First systematic comparative genome analysis of human and six experimental animals. • Key drug toxicology genes display unique duplication patterns across species. • Comparison of 317 drug targets show species-specific evolutionary patterns.
Cerveau, Nicolas; Gilbert, Clément; Liu, Chao; Garrett, Roger A; Grève, Pierre; Bouchon, Didier; Cordaux, Richard
2015-06-10
Transposable elements (TEs) are DNA pieces that are present in almost all the living world at variable genomic density. Due to their mobility and density, TEs are involved in a large array of genomic modifications. In eukaryotes, TE expression has been studied in detail in several species. In prokaryotes, studies of IS expression are generally linked to particular copies that induce a modification of neighboring gene expression. Here we investigated global patterns of IS transcription in the Alphaproteobacterial endosymbiont Wolbachia wVulC, using both RT-PCR and bioinformatic analyses. We detected several transcriptional promoters in all IS groups. Nevertheless, only one of the potentially functional IS groups possesses a promoter located upstream of the transposase gene, that could lead up to the production of a functional protein. We found that the majority of IS groups are expressed whatever their functional status. RT-PCR analyses indicate that the transcription of two IS groups lacking internal promoters upstream of the transposase start codon may be driven by the genomic environment. We confirmed this observation with the transcription analysis of individual copies of one IS group. These results suggest that the genomic environment is important for IS expression and it could explain, at least partly, copy number variability of the various IS groups present in the wVulC genome and, more generally, in bacterial genomes. Copyright © 2015 Elsevier B.V. All rights reserved.
2013-01-01
Background Streptococcus agalactiae, also referred to as Group B Streptococcus (GBS), is a frequent resident of the rectovaginal tract in humans, and a major cause of neonatal infection. In addition, S. agalactiae is a known fish pathogen, which compromises food safety and represents a zoonotic hazard. The complete genome sequence of the piscine S. agalactiae isolate GD201008-001 was compared with 14 other piscine, human and bovine strains to explore their virulence determinants, evolutionary relationships and the genetic basis of host tropism in S. agalactiae. Results The pan-genome of S. agalactiae is open and its size increases with the addition of newly sequenced genomes. The core genes shared by all isolates account for 50 ~ 70% of any single genome. The Chinese piscine isolates GD201008-001 and ZQ0910 are phylogenetically distinct from the Latin American piscine isolates SA20-06 and STIR-CD-17, but are closely related to the human strain A909, in the context of the clustered regularly interspaced short palindromic repeats (CRISPRs), prophage, virulence-associated genes and phylogenetic relationships. We identified a unique 10 kb gene locus in Chinese piscine strains. Conclusions Isolates from cultured tilapia in China have a close genomic relationship with the human strain A909. Our findings provide insight into the pathogenesis and host-associated genome content of piscine S. agalactiae isolated in China. PMID:24215651
Social and Communicative Functions of Informed Consent Forms in East Asia and Beyond
Yoshizawa, Go; Sasongko, Teguh H.; Ho, Chih-Hsing; Kato, Kazuto
2017-01-01
The recent research and technology development in medical genomics has raised new issues that are profoundly different from those encountered in traditional clinical research for which informed consent was developed. Global initiatives for international collaboration and public participation in genomics research now face an increasing demand for new forms of informed consent which reflect local contexts. This article analyzes informed consent forms (ICFs) for genomic research formulated by four selected research programs and institutes in East Asia – the Medical Genome Science Program in Japan, Universiti Sains Malaysia Human Research Ethics Committee in Malaysia, and the Taiwan Biobank and the Taipei Medical University- Joint Institutional Review Board in Taiwan. The comparative text analysis highlights East Asian contexts as distinct from other regions by identifying communicative and social functions of consent forms. The communicative functions include re-contact options and offering interactive support for research participants, and setting opportunities for family or community engagement in the consent process. This implies that informed consent cannot be validated solely with the completion of a consent form at the initial stage of the research, and informed consent templates can facilitate interactions between researchers and participants through (even before and after) the research process. The social functions consist of informing participants of possible social risks that include genetic discrimination, sample and data sharing, and highlighting the role of ethics committees. Although international ethics harmonization and the subsequent coordination of consent forms may be necessary to maintain the quality and consistency of consent process for data-intensive international research, it is also worth paying more attention to the local values and different settings that exist where research participants are situated for research in medical genomics. 
More than simply tools to gain consent from research participants, ICFs function rather as a device of social communication between research communities and civic communities in liaison with intermediary agents like ethics committees, genetic counselors, and public biobanks and databases. PMID:28775738
Social and Communicative Functions of Informed Consent Forms in East Asia and Beyond.
Yoshizawa, Go; Sasongko, Teguh H; Ho, Chih-Hsing; Kato, Kazuto
2017-01-01
The recent research and technology development in medical genomics has raised new issues that are profoundly different from those encountered in traditional clinical research for which informed consent was developed. Global initiatives for international collaboration and public participation in genomics research now face an increasing demand for new forms of informed consent which reflect local contexts. This article analyzes informed consent forms (ICFs) for genomic research formulated by four selected research programs and institutes in East Asia - the Medical Genome Science Program in Japan, Universiti Sains Malaysia Human Research Ethics Committee in Malaysia, and the Taiwan Biobank and the Taipei Medical University- Joint Institutional Review Board in Taiwan. The comparative text analysis highlights East Asian contexts as distinct from other regions by identifying communicative and social functions of consent forms. The communicative functions include re-contact options and offering interactive support for research participants, and setting opportunities for family or community engagement in the consent process. This implies that informed consent cannot be validated solely with the completion of a consent form at the initial stage of the research, and informed consent templates can facilitate interactions between researchers and participants through (even before and after) the research process. The social functions consist of informing participants of possible social risks that include genetic discrimination, sample and data sharing, and highlighting the role of ethics committees. Although international ethics harmonization and the subsequent coordination of consent forms may be necessary to maintain the quality and consistency of consent process for data-intensive international research, it is also worth paying more attention to the local values and different settings that exist where research participants are situated for research in medical genomics. 
More than simply tools to gain consent from research participants, ICFs function rather as a device of social communication between research communities and civic communities in liaison with intermediary agents like ethics committees, genetic counselors, and public biobanks and databases.
Derived Immune and Ancestral Pigmentation Alleles in a 7,000-Year-old Mesolithic European
Olalde, Iñigo; Allentoft, Morten E.; Sánchez-Quinto, Federico; Santpere, Gabriel; Chiang, Charleston W. K.; DeGiorgio, Michael; Prado-Martínez, Javier; Rodríguez, Juan Antonio; Rasmussen, Simon; Quilez, Javier; Ramírez, Oscar; Marigorta, Urko M.; Fernández-Callejo, Marcos; Prada, María Encina; Encinas, Julio Manuel Vidal; Nielsen, Rasmus; Netea, Mihai G.; Novembre, John; Sturm, Richard A.; Sabeti, Pardis; Marquès-Bonet, Tomàs; Navarro, Arcadi; Willerslev, Eske; Lalueza-Fox, Carles
2014-01-01
Ancient genomic sequences have started revealing the origin and the demographic impact of Neolithic farmers spreading into Europe [1–3]. The adoption of farming, stock breeding and sedentary societies during the Neolithic may have resulted in adaptive changes in genes associated with immunity and diet [4]. However, the limited data available from earlier hunter-gatherers precludes an understanding of the selective processes associated with this crucial transition to agriculture in recent human evolution. By sequencing a ~7,000-year-old Mesolithic skeleton discovered at the La Braña-Arintero site in León (Spain), we retrieved the first complete pre-agricultural European human genome. Analysis of this genome in the context of other ancient samples suggests the existence of a common ancient genomic signature across Western and Central Eurasia from the Upper Paleolithic to the Mesolithic. The La Braña individual carries ancestral alleles in several skin pigmentation genes, suggesting that the light skin of modern Europeans was not yet ubiquitous in Mesolithic times. Moreover, we provide evidence that a significant number of derived, putatively adaptive variants associated with pathogen resistance in modern Europeans were already present in this hunter-gatherer. Hence, these genomic variants cannot represent novel mutations that occurred during the adaptation to the farming lifestyle. PMID:24463515
Saini, Vikram; Raghuvanshi, Saurabh; Khurana, Jitendra P.; Ahmed, Niyaz; Hasnain, Seyed E.; Tyagi, Akhilesh K.; Tyagi, Anil K.
2012-01-01
Understanding the evolutionary and genomic mechanisms responsible for turning the soil-derived saprophytic mycobacteria into lethal intracellular pathogens is a critical step towards the development of strategies for the control of mycobacterial diseases. In this context, Mycobacterium indicus pranii (MIP) is of specific interest because of its unique immunological and evolutionary significance. Evolutionarily, it is the progenitor of opportunistic pathogens belonging to M. avium complex and is endowed with features that place it between saprophytic and pathogenic species. Herein, we have sequenced the complete MIP genome to understand its unique life style, basis of immunomodulation and habitat diversification in mycobacteria. As a case of massive gene acquisitions, 50.5% of MIP open reading frames (ORFs) are laterally acquired. We show, for the first time for Mycobacterium, that MIP genome has mosaic architecture. These gene acquisitions have led to the enrichment of selected gene families critical to MIP physiology. Comparative genomic analysis indicates a higher antigenic potential of MIP imparting it a unique ability for immunomodulation. Besides, it also suggests an important role of genomic fluidity in habitat diversification within mycobacteria and provides a unique view of evolutionary divergence and putative bottlenecks that might have eventually led to intracellular survival and pathogenic attributes in mycobacteria. PMID:22965120
Goldstone, Robert J.; McLuckie, Joyce; Smith, David G. E.
2015-01-01
Typing of Mycobacterium avium subspecies paratuberculosis strains presents a challenge, since they are genetically monomorphic and traditional molecular techniques have limited discriminatory power. The recent advances and availability of whole-genome sequencing have extended possibilities for the characterization of Mycobacterium avium subspecies paratuberculosis, and whole-genome sequencing can provide a phylogenetic context to facilitate global epidemiology studies. In this study, we developed a single nucleotide polymorphism (SNP) assay based on PCR and restriction enzyme digestion or sequencing of the amplified product. The SNP analysis was performed using genome sequence data from 133 Mycobacterium avium subspecies paratuberculosis isolates with different genotypes from 8 different host species and 17 distinct geographic regions around the world. A total of 28,402 SNPs were identified among all of the isolates. The minimum number of SNPs required to distinguish between all of the 133 genomes was 93 and between only the type C isolates was 41. To reduce the number of SNPs and PCRs required, we adopted an approach based on sequential detection of SNPs and a decision tree. By the analysis of 14 SNPs Mycobacterium avium subspecies paratuberculosis isolates can be characterized within 14 phylogenetic groups with a higher discriminatory power than mycobacterial interspersed repetitive unit–variable number tandem repeat assay and other typing methods. Continuous updating of genome sequences is needed in order to better characterize new phylogenetic groups and SNP profiles. The novel SNP assay is a discriminative, simple, reproducible method and requires only basic laboratory equipment for the large-scale global typing of Mycobacterium avium subspecies paratuberculosis isolates. PMID:26677250
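The idea of reducing a large SNP set to a small discriminating panel can be illustrated with a minimal sketch. It is not the procedure used in the study above: it is a simple greedy heuristic, which does not guarantee the true minimum, and the function name `greedy_discriminating_snps` and the toy allele strings are assumptions made purely for illustration.

```python
from itertools import combinations


def greedy_discriminating_snps(profiles):
    """Pick SNP indices greedily until every pair of distinct profiles is separated.

    profiles: dict mapping isolate name -> string of alleles, one character
    per SNP site (all strings the same length). Hypothetical input format.
    Returns the chosen site indices in the order they were selected.
    """
    names = sorted(profiles)
    n_sites = len(next(iter(profiles.values())))
    # Pairs of isolates that still look identical on the chosen sites.
    unresolved = {
        (a, b) for a, b in combinations(names, 2) if profiles[a] != profiles[b]
    }
    chosen = []
    while unresolved:
        # Take the site that separates the most still-unresolved pairs.
        best = max(
            range(n_sites),
            key=lambda i: sum(profiles[a][i] != profiles[b][i] for a, b in unresolved),
        )
        chosen.append(best)
        unresolved = {
            (a, b) for a, b in unresolved if profiles[a][best] == profiles[b][best]
        }
    return chosen


# Toy data (invented): four isolates typed at three SNP sites.
toy = {"iso1": "AAG", "iso2": "ATG", "iso3": "ATC", "iso4": "AAC"}
panel = greedy_discriminating_snps(toy)
reduced = {name: "".join(seq[i] for i in panel) for name, seq in toy.items()}
```

In the toy data the first site is invariant and is never selected; the reduced profiles built from the chosen sites remain unique per isolate. A sequential decision tree, as described in the abstract, would go further by choosing the next SNP conditionally on the alleles already observed.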
2011-01-01
Background The generation and analysis of high-throughput sequencing data are becoming a major component of many studies in molecular biology and medical research. Illumina's Genome Analyzer (GA) and HiSeq instruments are currently the most widely used sequencing devices. Here, we comprehensively evaluate properties of genomic HiSeq and GAIIx data derived from two plant genomes and one virus, with read lengths of 95 to 150 bases. Results We provide quantifications and evidence for GC bias, error rates, error sequence context, effects of quality filtering, and the reliability of quality values. By combining different filtering criteria we reduced error rates 7-fold at the expense of discarding 12.5% of alignable bases. While overall error rates are low in HiSeq data we observed regions of accumulated wrong base calls. Only 3% of all error positions accounted for 24.7% of all substitution errors. Analyzing the forward and reverse strands separately revealed error rates of up to 18.7%. Insertions and deletions occurred at very low rates on average but increased to up to 2% in homopolymers. A positive correlation between read coverage and GC content was found depending on the GC content range. Conclusions The errors and biases we report have implications for the use and the interpretation of Illumina sequencing data. GAIIx and HiSeq data sets show slightly different error profiles. Quality filtering is essential to minimize downstream analysis artifacts. Supporting previous recommendations, the strand-specificity provides a criterion to distinguish sequencing errors from low abundance polymorphisms. PMID:22067484
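A relationship between read coverage and GC content of the kind reported above can be checked with a short sketch. It assumes per-window coverage values are already available from an alignment; the helper names (`gc_fraction`, `windows`, `pearson`) and the example numbers are illustrative assumptions, not taken from the study.

```python
from math import sqrt


def gc_fraction(seq):
    """Fraction of G/C among unambiguous bases (A, C, G, T) in seq."""
    called = [b for b in seq.upper() if b in "ACGT"]
    if not called:
        return 0.0
    return sum(b in "GC" for b in called) / len(called)


def windows(seq, size):
    """Split seq into consecutive non-overlapping windows of the given size."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Invented example: a 12-base reference in 4-base windows, with made-up coverage.
reference = "AAAAATGCGGCC"
gc_per_window = [gc_fraction(w) for w in windows(reference, 4)]
coverage_per_window = [10, 20, 30]  # hypothetical mean read depth per window
r = pearson(gc_per_window, coverage_per_window)
```

Because the abstract notes that the correlation depends on the GC content range, a real analysis would likely compute the coefficient separately for bins of GC content rather than once over all windows.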
GoIFISH: a system for the quantification of single cell heterogeneity from IFISH images.
Trinh, Anne; Rye, Inga H; Almendro, Vanessa; Helland, Aslaug; Russnes, Hege G; Markowetz, Florian
2014-08-26
Molecular analysis has revealed extensive intra-tumor heterogeneity in human cancer samples, but cannot identify cell-to-cell variations within the tissue microenvironment. In contrast, in situ analysis can identify genetic aberrations in phenotypically defined cell subpopulations while preserving tissue-context specificity. GoIFISH is a widely applicable, user-friendly system tailored for the objective and semi-automated visualization, detection and quantification of genomic alterations and protein expression obtained from fluorescence in situ analysis. In a sample set of HER2-positive breast cancers GoIFISH is highly robust in visual analysis and its accuracy compares favorably to other leading image analysis methods. GoIFISH is freely available at www.sourceforge.net/projects/goifish/.
Bredenoord, Annelien L; Mostert, Menno; Isasi, Rosario; Knoppers, Bartha M
2015-01-01
Data and sample sharing constitute a scientific and ethical imperative but need to be conducted in a responsible manner in order to protect individual interests as well as maintain public trust. In 2014, the Global Alliance for Genomics and Health (GA4GH) adopted a common Framework for Responsible Sharing of Genomic and Health-Related Data. The GA4GH Framework is applicable to data sharing in the stem cell field; however, interpretation is required so as to provide guidance for this specific context. In this paper, the International Stem Cell Forum Ethics Working Party discusses those principles that are specific to translational stem cell science, including engagement, data quality and safety, privacy, security and confidentiality, risk-benefit analysis and sustainability.
Finding Our Way through Phenotypes
Deans, Andrew R.; Lewis, Suzanna E.; Huala, Eva; Anzaldo, Salvatore S.; Ashburner, Michael; Balhoff, James P.; Blackburn, David C.; Blake, Judith A.; Burleigh, J. Gordon; Chanet, Bruno; Cooper, Laurel D.; Courtot, Mélanie; Csösz, Sándor; Cui, Hong; Dahdul, Wasila; Das, Sandip; Dececchi, T. Alexander; Dettai, Agnes; Diogo, Rui; Druzinsky, Robert E.; Dumontier, Michel; Franz, Nico M.; Friedrich, Frank; Gkoutos, George V.; Haendel, Melissa; Harmon, Luke J.; Hayamizu, Terry F.; He, Yongqun; Hines, Heather M.; Ibrahim, Nizar; Jackson, Laura M.; Jaiswal, Pankaj; James-Zorn, Christina; Köhler, Sebastian; Lecointre, Guillaume; Lapp, Hilmar; Lawrence, Carolyn J.; Le Novère, Nicolas; Lundberg, John G.; Macklin, James; Mast, Austin R.; Midford, Peter E.; Mikó, István; Mungall, Christopher J.; Oellrich, Anika; Osumi-Sutherland, David; Parkinson, Helen; Ramírez, Martín J.; Richter, Stefan; Robinson, Peter N.; Ruttenberg, Alan; Schulz, Katja S.; Segerdell, Erik; Seltmann, Katja C.; Sharkey, Michael J.; Smith, Aaron D.; Smith, Barry; Specht, Chelsea D.; Squires, R. Burke; Thacker, Robert W.; Thessen, Anne; Fernandez-Triana, Jose; Vihinen, Mauno; Vize, Peter D.; Vogt, Lars; Wall, Christine E.; Walls, Ramona L.; Westerfeld, Monte; Wharton, Robert A.; Wirkner, Christian S.; Woolley, James B.; Yoder, Matthew J.; Zorn, Aaron M.; Mabee, Paula
2015-01-01
Despite a large and multifaceted effort to understand the vast landscape of phenotypic data, their current form inhibits productive data analysis. The lack of a community-wide, consensus-based, human- and machine-interpretable language for describing phenotypes and their genomic and environmental contexts is perhaps the most pressing scientific bottleneck to integration across many key fields in biology, including genomics, systems biology, development, medicine, evolution, ecology, and systematics. Here we survey the current phenomics landscape, including data resources and handling, and the progress that has been made to accurately capture relevant data descriptions for phenotypes. We present an example of the kind of integration across domains that computable phenotypes would enable, and we call upon the broader biology community, publishers, and relevant funding agencies to support efforts to surmount today's data barriers and facilitate analytical reproducibility. PMID:25562316
Finding our way through phenotypes.
Deans, Andrew R; Lewis, Suzanna E; Huala, Eva; Anzaldo, Salvatore S; Ashburner, Michael; Balhoff, James P; Blackburn, David C; Blake, Judith A; Burleigh, J Gordon; Chanet, Bruno; Cooper, Laurel D; Courtot, Mélanie; Csösz, Sándor; Cui, Hong; Dahdul, Wasila; Das, Sandip; Dececchi, T Alexander; Dettai, Agnes; Diogo, Rui; Druzinsky, Robert E; Dumontier, Michel; Franz, Nico M; Friedrich, Frank; Gkoutos, George V; Haendel, Melissa; Harmon, Luke J; Hayamizu, Terry F; He, Yongqun; Hines, Heather M; Ibrahim, Nizar; Jackson, Laura M; Jaiswal, Pankaj; James-Zorn, Christina; Köhler, Sebastian; Lecointre, Guillaume; Lapp, Hilmar; Lawrence, Carolyn J; Le Novère, Nicolas; Lundberg, John G; Macklin, James; Mast, Austin R; Midford, Peter E; Mikó, István; Mungall, Christopher J; Oellrich, Anika; Osumi-Sutherland, David; Parkinson, Helen; Ramírez, Martín J; Richter, Stefan; Robinson, Peter N; Ruttenberg, Alan; Schulz, Katja S; Segerdell, Erik; Seltmann, Katja C; Sharkey, Michael J; Smith, Aaron D; Smith, Barry; Specht, Chelsea D; Squires, R Burke; Thacker, Robert W; Thessen, Anne; Fernandez-Triana, Jose; Vihinen, Mauno; Vize, Peter D; Vogt, Lars; Wall, Christine E; Walls, Ramona L; Westerfeld, Monte; Wharton, Robert A; Wirkner, Christian S; Woolley, James B; Yoder, Matthew J; Zorn, Aaron M; Mabee, Paula
2015-01-01
Despite a large and multifaceted effort to understand the vast landscape of phenotypic data, their current form inhibits productive data analysis. The lack of a community-wide, consensus-based, human- and machine-interpretable language for describing phenotypes and their genomic and environmental contexts is perhaps the most pressing scientific bottleneck to integration across many key fields in biology, including genomics, systems biology, development, medicine, evolution, ecology, and systematics. Here we survey the current phenomics landscape, including data resources and handling, and the progress that has been made to accurately capture relevant data descriptions for phenotypes. We present an example of the kind of integration across domains that computable phenotypes would enable, and we call upon the broader biology community, publishers, and relevant funding agencies to support efforts to surmount today's data barriers and facilitate analytical reproducibility.
Voting with their mice: personal genome testing and the "participatory turn" in disease research.
Prainsack, Barbara
2011-05-01
While the availability of genome tests on the internet has given rise to heated debates about the likely impact of personal genome information on test-takers, on insurance, and on healthcare systems, in this article I argue that a more tangible effect of personal genomics is that it has started to change how participation in disease research is conceived and enacted. I examine three models of research participation that personal genomics customers are encouraged to engage in. I conclude with an evaluation of the pitfalls and benefits of "crowdsourcing" genetic disease research in the context of personal genomics.
Tacão, Marta; Araújo, Susana; Vendas, Maria; Alves, Artur; Henriques, Isabel
2018-03-01
Chromosome-encoded beta-lactamases of Shewanella spp. have been indicated as probable progenitors of blaOXA-48-like genes. However, these have been detected in few Shewanella spp. and dissemination mechanisms are unclear. Thus, our main objective was to confirm the role of Shewanella species as progenitors of blaOXA-48-like genes. In silico analysis of Shewanella genomes was performed to detect blaOXA-48-like genes and context, and 43 environmental Shewanella spp. were characterised. Clonal relatedness was determined by BOX-PCR. Phylogenetic affiliation was assessed by 16S rDNA and gyrB sequencing. Antibiotic susceptibility phenotypes were determined. The blaOXA-48-like genes and genetic context were inspected by PCR, hybridisation and sequence analysis. Gene variants were cloned in Escherichia coli and MICs were determined. Shewanella isolates were screened for integrons, plasmids and insertion sequences. Analysis of Shewanella spp. genomes showed that putative blaOXA-48-like is present in the majority and in an identical context. Isolates presenting unique BOX profiles affiliated with 11 Shewanella spp. blaOXA-48-like genes were detected in 22 isolates from 6 species. Genes encoded enzymes identical to OXA-48, OXA-204, OXA-181, and 7 new variants differing from OXA-48 by 2 to 82 amino acids. IS1999 was detected in 24 isolates, although not in the vicinity of blaOXA-48 genes. Recombinant E. coli strains presented altered MICs. The presence/absence of blaOXA-48-like genes was species-related. Gene variants encoded enzymes with hydrolytic spectra similar to OXA-48-like from non-shewanellae. Of the mobile elements previously described in association with blaOXA-48-like genes, only IS1999 was found in Shewanella, which indicates its relevance in the transfer of blaOXA-48-like genes to other hosts. Copyright © 2017 Elsevier B.V. and International Society of Chemotherapy. All rights reserved.
ERIC Educational Resources Information Center
Temple, Louise; Cresawn, Steven G.; Monroe, Jonathan D.
2010-01-01
Emerging interest in genomics in the scientific community prompted biologists at James Madison University to create two courses at different levels to modernize the biology curriculum. The courses are hybrids of classroom and laboratory experiences. An upper level class uses raw sequence of a genome (plasmid or virus) as the subject on which to…
2011-01-01
Background The heterotrophic dinoflagellate Oxyrrhis marina is increasingly studied in experimental, ecological and evolutionary contexts. Its basal phylogenetic position within the dinoflagellates makes O. marina useful for understanding the origin of numerous unusual features of the dinoflagellate lineage; its broad distribution has lent O. marina to the study of protist biogeography; and nutritive flexibility and eurytopy have made it a common lab rat for the investigation of physiological responses of marine heterotrophic flagellates. Nevertheless, genome-scale resources for O. marina are scarce. Here we present a 454-based transcriptome survey for this organism. In addition, we assess sequence read abundance, as a proxy for gene expression, in response to salinity, an environmental factor potentially important in determining O. marina spatial distributions. Results Sequencing generated ~57 Mbp of data which assembled into 7,398 contigs. Approximately 24% of contigs were nominally identified by BLAST. A further clustering of contigs (at ≥ 90% identity) revealed 164 transcript variant clusters, the largest of which (Phosphoribosylaminoimidazole-succinocarboxamide synthase) was composed of 28 variants displaying predominately synonymous variation. In a genomic context, a sample of 5 different genes were demonstrated to occur as tandem repeats, separated by short (~200-340 bp) inter-genic regions. For HSP90 several intergenic variants were detected suggesting a potentially complex genomic arrangement. In response to salinity, analysis of 454 read abundance highlighted 9 and 20 genes over or under expressed at 50 PSU, respectively. However, 454 read abundance and subsequent qPCR validation did not correlate well - suggesting that measures of gene expression via ad hoc analysis of sequence read abundance require careful interpretation. 
Conclusion Here we indicate that tandem gene arrangements and the occurrence of multiple transcribed gene variants are common and indicate potentially complex genomic arrangements in O. marina. Comparison of the reported data set with existing O. marina and other dinoflagellates ESTs indicates little sequence overlap likely as a result of the relatively limited extent of genome scale sequence data currently available for the dinoflagellates. This is one of the first 454-based transcriptome surveys of an ancestral dinoflagellate taxon and will undoubtedly prove useful for future comparative studies aimed at reconstructing the origin of novel features of the dinoflagellates. PMID:22014029
Structured populations of Sulfolobus acidocaldarius with susceptibility to mobile genetic elements
Anderson, Rika E.; Kouris, Angela; Seward, Christopher H.; Campbell, Kate M.; Whitaker, Rachel J.
2017-01-01
The impact of a structured environment on genome evolution can be determined through comparative population genomics of species that live in the same habitat. Recent work comparing three genome sequences of Sulfolobus acidocaldarius suggested that highly structured, extreme, hot spring environments do not limit dispersal of this thermoacidophile, in contrast to other co-occurring Sulfolobus species. Instead, a high level of conservation among these three S. acidocaldarius genomes was hypothesized to result from rapid, global-scale dispersal promoted by low susceptibility to viruses that sets S. acidocaldarius apart from its sister Sulfolobus species. To test this hypothesis, we conducted a comparative analysis of 47 genomes of S. acidocaldarius from spatial and temporal sampling of two hot springs in Yellowstone National Park. While we confirm the low diversity in the core genome, we observe differentiation among S. acidocaldarius populations, likely resulting from low migration among hot spring “islands” in Yellowstone National Park. Patterns of genomic variation indicate that differing geological contexts result in the elimination or preservation of diversity among differentiated populations. We observe multiple deletions associated with a large genomic island rich in glycosyltransferases, differential integrations of the Sulfolobus turreted icosahedral virus, as well as two different plasmid elements. These data demonstrate that neither rapid dispersal nor lack of mobile genetic elements result in low diversity in the S. acidocaldarius genomes. We suggest instead that significant differences in the recent evolutionary history, or the intrinsic evolutionary rates, of sister Sulfolobus species result in the relatively low diversity of the S. acidocaldarius genome.
Crandall, Eric D.; Liggins, Libby; Bongaerts, Pim; Treml, Eric A.
2016-01-01
Population genomic approaches are making rapid inroads in the study of non-model organisms, including marine taxa. To date, these marine studies have predominantly focused on rudimentary metrics describing the spatial and environmental context of their study region (e.g., geographical distance, average sea surface temperature, average salinity). We contend that a more nuanced and considered approach to quantifying seascape dynamics and patterns can strengthen population genomic investigations and help identify spatial, temporal, and environmental factors associated with differing selective regimes or demographic histories. Nevertheless, approaches for quantifying marine landscapes are complicated. Characteristic features of the marine environment, including pelagic living in flowing water (experienced by most marine taxa at some point in their life cycle), require a well-designed spatial-temporal sampling strategy and analysis. Many genetic summary statistics used to describe populations may be inappropriate for marine species with large population sizes, large species ranges, stochastic recruitment, and asymmetrical gene flow. Finally, statistical approaches for testing associations between seascapes and population genomic patterns are still maturing with no single approach able to capture all relevant considerations. None of these issues are completely unique to marine systems and therefore similar issues and solutions will be shared for many organisms regardless of habitat. Here, we outline goals and spatial approaches for landscape genomics with an emphasis on marine systems and review the growing empirical literature on seascape genomics. We review established tools and approaches and highlight promising new strategies to overcome select issues including a strategy to spatially optimize sampling. 
Despite the many challenges, we argue that marine systems may be especially well suited for identifying candidate genomic regions under environmentally mediated selection and that seascape genomic approaches are especially useful for identifying robust locus-by-environment associations. PMID:29491947
Riginos, Cynthia; Crandall, Eric D; Liggins, Libby; Bongaerts, Pim; Treml, Eric A
2016-12-01
Population genomic approaches are making rapid inroads in the study of non-model organisms, including marine taxa. To date, these marine studies have predominantly focused on rudimentary metrics describing the spatial and environmental context of their study region (e.g., geographical distance, average sea surface temperature, average salinity). We contend that a more nuanced and considered approach to quantifying seascape dynamics and patterns can strengthen population genomic investigations and help identify spatial, temporal, and environmental factors associated with differing selective regimes or demographic histories. Nevertheless, approaches for quantifying marine landscapes are complicated. Characteristic features of the marine environment, including pelagic living in flowing water (experienced by most marine taxa at some point in their life cycle), require a well-designed spatial-temporal sampling strategy and analysis. Many genetic summary statistics used to describe populations may be inappropriate for marine species with large population sizes, large species ranges, stochastic recruitment, and asymmetrical gene flow. Finally, statistical approaches for testing associations between seascapes and population genomic patterns are still maturing with no single approach able to capture all relevant considerations. None of these issues are completely unique to marine systems and therefore similar issues and solutions will be shared for many organisms regardless of habitat. Here, we outline goals and spatial approaches for landscape genomics with an emphasis on marine systems and review the growing empirical literature on seascape genomics. We review established tools and approaches and highlight promising new strategies to overcome select issues including a strategy to spatially optimize sampling. 
Despite the many challenges, we argue that marine systems may be especially well suited for identifying candidate genomic regions under environmentally mediated selection and that seascape genomic approaches are especially useful for identifying robust locus-by-environment associations.
Kraus, Christopher; Schiffer, Philipp H; Kagoshima, Hiroshi; Hiraki, Hideaki; Vogt, Theresa; Kroiher, Michael; Kohara, Yuji; Schierenberg, Einhard
2017-01-01
The free-living nematode Diploscapter coronatus is the closest known relative of Caenorhabditis elegans with parthenogenetic reproduction. It shows several developmental idiosyncrasies, for example concerning the mode of reproduction, embryonic axis formation and early cleavage pattern (Lahl et al. in Int J Dev Biol 50:393-397, 2006). Our recent genome analysis (Hiraki et al. in BMC Genomics 18:478, 2017) provides a solid foundation to better understand the molecular basis of developmental idiosyncrasies in this species in an evolutionary context by comparison with selected other nematodes. Our genomic data also yielded indications for the view that D. coronatus is a product of interspecies hybridization. In a genomic comparison between D. coronatus, C. elegans, other representatives of the genus Caenorhabditis and the more distantly related Pristionchus pacificus and Panagrellus redivivus, certain genes required for central developmental processes in C. elegans like control of meiosis and establishment of embryonic polarity were found to be restricted to the genus Caenorhabditis. The mRNA content of early D. coronatus embryos was sequenced and compared with similar stages in C. elegans and Ascaris suum. We identified 350 gene families transcribed in the early embryo of D. coronatus but not in the other two nematodes. Looking at individual genes transcribed early in D. coronatus but not in C. elegans and A. suum, we found that orthologs of most of these are present in the genomes of the latter species as well, suggesting heterochronic shifts with respect to expression behavior. Considerable genomic heterozygosity and allelic divergence lend further support to the view that D. coronatus may be the result of an interspecies hybridization. Expression analysis of early acting single-copy genes yields no indication for silencing of one parental genome.
Our comparative cellular and molecular studies support the view that the genus Caenorhabditis differs considerably from the other studied nematodes in its control of development and reproduction. The easy-to-culture parthenogenetic D. coronatus, with its high-quality draft genome and only a single chromosome when haploid, offers many new starting points on the cellular, molecular and genomic level to explore alternative routes of nematode development and reproduction.
Zhang, Liangzhi; Jia, Shangang; Plath, Martin; Huang, Yongzhen; Li, Congjun; Lei, Chuzhao; Zhao, Xin; Chen, Hong
2015-08-10
Copy number variation (CNV) is an important component of genomic structural variation and plays a role not only in evolutionary diversification but also in domestication. Chinese cattle were derived from Bos taurus and Bos indicus, and several breeds presumably are of hybrid origin, but the evolution of CNV regions (CNVRs) has not yet been examined in this context. Here, we combined data on CNVRs, mtDNA D-loop sequence variation, and Y-chromosomal single nucleotide polymorphisms to assess the impact of maternal and paternal B. taurus and B. indicus origins on the distribution of CNVRs in 24 Chinese domesticated bulls. We discovered 470 genome-wide CNVRs, only 72 of which were shared by all three Y-lineages (B. taurus: Y1, Y2; B. indicus: Y3), whereas 265 were shared by inferred taurine or indicine paternal lineages, and 228 when considering their maternal taurine or indicine origins. Phylogenetic analysis uncovered eight taurine/indicine hybrids, and principal component analysis on CNVs corroborated genomic exchange during hybridization. The distribution patterns of CNVRs tended to be lineage-specific, and correlation analysis revealed significant positive or negative co-occurrences of CNVRs across lineages. Our study suggests that CNVs in Chinese cattle partly result from selective breeding during domestication, but also from hybridization and introgression. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Cohen, Clemens D; Klingenhoff, Andreas; Boucherot, Anissa; Nitsche, Almut; Henger, Anna; Brunner, Bodo; Schmid, Holger; Merkle, Monika; Saleem, Moin A; Koller, Klaus-Peter; Werner, Thomas; Gröne, Hermann-Josef; Nelson, Peter J; Kretzler, Matthias
2006-04-11
Shared transcription factor binding sites that are conserved in distance and orientation help control the expression of gene products that act together in the same biological context. New bioinformatics approaches allow the rapid characterization of shared promoter structures and can be used to find novel interacting molecules. Here, these principles are demonstrated by using molecules linked to the unique functional unit of the glomerular slit diaphragm. An evolutionarily conserved promoter model was generated by comparative genomics in the proximal promoter regions of the slit diaphragm-associated molecule nephrin. Phylogenetic promoter fingerprints of known elements of the slit diaphragm complex identified the nephrin model in the promoter region of zonula occludens-1 (ZO-1). Genome-wide scans using this promoter model effectively predicted a previously unrecognized slit diaphragm molecule, cadherin-5. Nephrin, ZO-1, and cadherin-5 mRNA showed stringent coexpression across a diverse set of human glomerular diseases. Comparative promoter analysis can identify regulatory pathways at work in tissue homeostasis and disease processes.
Waks, Zeev; Weissbrod, Omer; Carmeli, Boaz; Norel, Raquel; Utro, Filippo; Goldschmidt, Yaara
2016-12-23
Compiling a comprehensive list of cancer driver genes is imperative for oncology diagnostics and drug development. While driver genes are typically discovered by analysis of tumor genomes, infrequently mutated driver genes often evade detection due to limited sample sizes. Here, we address sample size limitations by integrating tumor genomics data with a wide spectrum of gene-specific properties to search for rare drivers, functionally classify them, and detect features characteristic of driver genes. We show that our approach, CAnceR geNe similarity-based Annotator and Finder (CARNAF), enables detection of potentially novel drivers that eluded over a dozen pan-cancer/multi-tumor type studies. In particular, feature analysis reveals a highly concentrated pool of known and putative tumor suppressors among the <1% of genes that encode very large, chromatin-regulating proteins. Thus, our study highlights the need for deeper characterization of very large, epigenetic regulators in the context of cancer causality.
Zhou, Jizhong; He, Zhili; Yang, Yunfeng; Deng, Ye; Tringe, Susannah G; Alvarez-Cohen, Lisa
2015-01-27
Understanding the structure, functions, activities and dynamics of microbial communities in natural environments is one of the grand challenges of 21st century science. To address this challenge, over the past decade, numerous technologies have been developed for interrogating microbial communities, of which some are amenable to exploratory work (e.g., high-throughput sequencing and phenotypic screening) and others depend on reference genes or genomes (e.g., phylogenetic and functional gene arrays). Here, we provide a critical review and synthesis of the most commonly applied "open-format" and "closed-format" detection technologies. We discuss their characteristics, advantages, and disadvantages within the context of environmental applications and focus on analysis of complex microbial systems, such as those in soils, in which diversity is high and reference genomes are few. In addition, we discuss crucial issues and considerations associated with applying complementary high-throughput molecular technologies to address important ecological questions. Copyright © 2015 Zhou et al.
He, Zhili; Yang, Yunfeng; Deng, Ye; Tringe, Susannah G.; Alvarez-Cohen, Lisa
2015-01-01
Understanding the structure, functions, activities and dynamics of microbial communities in natural environments is one of the grand challenges of 21st century science. To address this challenge, over the past decade, numerous technologies have been developed for interrogating microbial communities, of which some are amenable to exploratory work (e.g., high-throughput sequencing and phenotypic screening) and others depend on reference genes or genomes (e.g., phylogenetic and functional gene arrays). Here, we provide a critical review and synthesis of the most commonly applied “open-format” and “closed-format” detection technologies. We discuss their characteristics, advantages, and disadvantages within the context of environmental applications and focus on analysis of complex microbial systems, such as those in soils, in which diversity is high and reference genomes are few. In addition, we discuss crucial issues and considerations associated with applying complementary high-throughput molecular technologies to address important ecological questions. PMID:25626903
Circulating Tumor Cell and Cell-free Circulating Tumor DNA in Lung Cancer.
Nurwidya, Fariz; Zaini, Jamal; Putra, Andika Chandra; Andarini, Sita; Hudoyo, Achmad; Syahruddin, Elisna; Yunus, Faisal
2016-09-01
Circulating tumor cells (CTCs) are tumor cells that are separated from the primary site or metastatic lesion and disseminate in blood circulation. CTCs are considered to be part of the long process of cancer metastasis. As a 'liquid biopsy', CTC molecular examination and investigation of single cancer cells create an important opportunity for providing an understanding of cancer biology and the process of metastasis. In the last decade, we have seen dramatic development in defining the role of CTCs in lung cancer in terms of diagnosis, genomic alteration determination, treatment response and, finally, prognosis prediction. The aims of this review are to understand the basic biology and to review methods of detection of CTCs that apply to the various types of solid tumor. Furthermore, we explored clinical applications, including treatment monitoring to anticipate therapy resistance as well as biomarker analysis, in the context of lung cancer. We also explored the potential use of cell-free circulating tumor DNA (ctDNA) in the genomic alteration analysis of lung cancer.
Zhou, Jizhong; He, Zhili; Yang, Yunfeng; ...
2015-01-27
Understanding the structure, functions, activities and dynamics of microbial communities in natural environments is one of the grand challenges of 21st century science. To address this challenge, over the past decade, numerous technologies have been developed for interrogating microbial communities, of which some are amenable to exploratory work (e.g., high-throughput sequencing and phenotypic screening) and others depend on reference genes or genomes (e.g., phylogenetic and functional gene arrays). Here, we provide a critical review and synthesis of the most commonly applied “open-format” and “closed-format” detection technologies. We discuss their characteristics, advantages, and disadvantages within the context of environmental applications and focus on analysis of complex microbial systems, such as those in soils, in which diversity is high and reference genomes are few. In addition, we discuss crucial issues and considerations associated with applying complementary high-throughput molecular technologies to address important ecological questions.
Chipman, Ariel D; Ferrier, David E K; Brena, Carlo; Qu, Jiaxin; Hughes, Daniel S T; Schröder, Reinhard; Torres-Oliva, Montserrat; Znassi, Nadia; Jiang, Huaiyang; Almeida, Francisca C; Alonso, Claudio R; Apostolou, Zivkos; Aqrawi, Peshtewani; Arthur, Wallace; Barna, Jennifer C J; Blankenburg, Kerstin P; Brites, Daniela; Capella-Gutiérrez, Salvador; Coyle, Marcus; Dearden, Peter K; Du Pasquier, Louis; Duncan, Elizabeth J; Ebert, Dieter; Eibner, Cornelius; Erikson, Galina; Evans, Peter D; Extavour, Cassandra G; Francisco, Liezl; Gabaldón, Toni; Gillis, William J; Goodwin-Horn, Elizabeth A; Green, Jack E; Griffiths-Jones, Sam; Grimmelikhuijzen, Cornelis J P; Gubbala, Sai; Guigó, Roderic; Han, Yi; Hauser, Frank; Havlak, Paul; Hayden, Luke; Helbing, Sophie; Holder, Michael; Hui, Jerome H L; Hunn, Julia P; Hunnekuhl, Vera S; Jackson, LaRonda; Javaid, Mehwish; Jhangiani, Shalini N; Jiggins, Francis M; Jones, Tamsin E; Kaiser, Tobias S; Kalra, Divya; Kenny, Nathan J; Korchina, Viktoriya; Kovar, Christie L; Kraus, F Bernhard; Lapraz, François; Lee, Sandra L; Lv, Jie; Mandapat, Christigale; Manning, Gerard; Mariotti, Marco; Mata, Robert; Mathew, Tittu; Neumann, Tobias; Newsham, Irene; Ngo, Dinh N; Ninova, Maria; Okwuonu, Geoffrey; Ongeri, Fiona; Palmer, William J; Patil, Shobha; Patraquim, Pedro; Pham, Christopher; Pu, Ling-Ling; Putman, Nicholas H; Rabouille, Catherine; Ramos, Olivia Mendivil; Rhodes, Adelaide C; Robertson, Helen E; Robertson, Hugh M; Ronshaugen, Matthew; Rozas, Julio; Saada, Nehad; Sánchez-Gracia, Alejandro; Scherer, Steven E; Schurko, Andrew M; Siggens, Kenneth W; Simmons, DeNard; Stief, Anna; Stolle, Eckart; Telford, Maximilian J; Tessmar-Raible, Kristin; Thornton, Rebecca; van der Zee, Maurijn; von Haeseler, Arndt; Williams, James M; Willis, Judith H; Wu, Yuanqing; Zou, Xiaoyan; Lawson, Daniel; Muzny, Donna M; Worley, Kim C; Gibbs, Richard A; Akam, Michael; Richards, Stephen
2014-11-01
Myriapods (e.g., centipedes and millipedes) display a simple homonomous body plan relative to other arthropods. All members of the class are terrestrial, but they attained terrestriality independently of insects. Myriapoda is the only arthropod class not represented by a sequenced genome. We present an analysis of the genome of the centipede Strigamia maritima. It retains a compact genome that has undergone less gene loss and shuffling than previously sequenced arthropods, and many orthologues of genes conserved from the bilaterian ancestor that have been lost in insects. Our analysis locates many genes in conserved macro-synteny contexts, and many small-scale examples of gene clustering. We describe several examples where S. maritima shows different solutions from insects to similar problems. The insect olfactory receptor gene family is absent from S. maritima, and olfaction in air is likely effected by expansion of other receptor gene families. For some genes S. maritima has evolved paralogues to generate coding sequence diversity, where insects use alternate splicing. This is most striking for the Dscam gene, which in Drosophila generates more than 100,000 alternate splice forms, but in S. maritima is encoded by over 100 paralogues. We see an intriguing linkage between the absence of any known photosensory proteins in a blind organism and the additional absence of canonical circadian clock genes. The phylogenetic position of myriapods allows us to identify where in arthropod phylogeny several particular molecular mechanisms and traits emerged. For example, we conclude that juvenile hormone signalling evolved with the emergence of the exoskeleton in the arthropods and that RR-1 containing cuticle proteins evolved in the lineage leading to Mandibulata. We also identify when various gene expansions and losses occurred. The genome of S. maritima offers us a unique glimpse into the ancestral arthropod genome, while also displaying many adaptations to its specific life history.
Chipman, Ariel D.; Ferrier, David E. K.; Brena, Carlo; Qu, Jiaxin; Hughes, Daniel S. T.; Schröder, Reinhard; Torres-Oliva, Montserrat; Znassi, Nadia; Jiang, Huaiyang; Almeida, Francisca C.; Alonso, Claudio R.; Apostolou, Zivkos; Aqrawi, Peshtewani; Arthur, Wallace; Barna, Jennifer C. J.; Blankenburg, Kerstin P.; Brites, Daniela; Capella-Gutiérrez, Salvador; Coyle, Marcus; Dearden, Peter K.; Du Pasquier, Louis; Duncan, Elizabeth J.; Ebert, Dieter; Eibner, Cornelius; Erikson, Galina; Evans, Peter D.; Extavour, Cassandra G.; Francisco, Liezl; Gabaldón, Toni; Gillis, William J.; Goodwin-Horn, Elizabeth A.; Green, Jack E.; Griffiths-Jones, Sam; Grimmelikhuijzen, Cornelis J. P.; Gubbala, Sai; Guigó, Roderic; Han, Yi; Hauser, Frank; Havlak, Paul; Hayden, Luke; Helbing, Sophie; Holder, Michael; Hui, Jerome H. L.; Hunn, Julia P.; Hunnekuhl, Vera S.; Jackson, LaRonda; Javaid, Mehwish; Jhangiani, Shalini N.; Jiggins, Francis M.; Jones, Tamsin E.; Kaiser, Tobias S.; Kalra, Divya; Kenny, Nathan J.; Korchina, Viktoriya; Kovar, Christie L.; Kraus, F. Bernhard; Lapraz, François; Lee, Sandra L.; Lv, Jie; Mandapat, Christigale; Manning, Gerard; Mariotti, Marco; Mata, Robert; Mathew, Tittu; Neumann, Tobias; Newsham, Irene; Ngo, Dinh N.; Ninova, Maria; Okwuonu, Geoffrey; Ongeri, Fiona; Palmer, William J.; Patil, Shobha; Patraquim, Pedro; Pham, Christopher; Pu, Ling-Ling; Putman, Nicholas H.; Rabouille, Catherine; Ramos, Olivia Mendivil; Rhodes, Adelaide C.; Robertson, Helen E.; Robertson, Hugh M.; Ronshaugen, Matthew; Rozas, Julio; Saada, Nehad; Sánchez-Gracia, Alejandro; Scherer, Steven E.; Schurko, Andrew M.; Siggens, Kenneth W.; Simmons, DeNard; Stief, Anna; Stolle, Eckart; Telford, Maximilian J.; Tessmar-Raible, Kristin; Thornton, Rebecca; van der Zee, Maurijn; von Haeseler, Arndt; Williams, James M.; Willis, Judith H.; Wu, Yuanqing; Zou, Xiaoyan; Lawson, Daniel; Muzny, Donna M.; Worley, Kim C.; Gibbs, Richard A.; Akam, Michael; Richards, Stephen
2014-01-01
Myriapods (e.g., centipedes and millipedes) display a simple homonomous body plan relative to other arthropods. All members of the class are terrestrial, but they attained terrestriality independently of insects. Myriapoda is the only arthropod class not represented by a sequenced genome. We present an analysis of the genome of the centipede Strigamia maritima. It retains a compact genome that has undergone less gene loss and shuffling than previously sequenced arthropods, and many orthologues of genes conserved from the bilaterian ancestor that have been lost in insects. Our analysis locates many genes in conserved macro-synteny contexts, and many small-scale examples of gene clustering. We describe several examples where S. maritima shows different solutions from insects to similar problems. The insect olfactory receptor gene family is absent from S. maritima, and olfaction in air is likely effected by expansion of other receptor gene families. For some genes S. maritima has evolved paralogues to generate coding sequence diversity, where insects use alternate splicing. This is most striking for the Dscam gene, which in Drosophila generates more than 100,000 alternate splice forms, but in S. maritima is encoded by over 100 paralogues. We see an intriguing linkage between the absence of any known photosensory proteins in a blind organism and the additional absence of canonical circadian clock genes. The phylogenetic position of myriapods allows us to identify where in arthropod phylogeny several particular molecular mechanisms and traits emerged. For example, we conclude that juvenile hormone signalling evolved with the emergence of the exoskeleton in the arthropods and that RR-1 containing cuticle proteins evolved in the lineage leading to Mandibulata. We also identify when various gene expansions and losses occurred. The genome of S. maritima offers us a unique glimpse into the ancestral arthropod genome, while also displaying many adaptations to its specific life history. PMID:25423365
Evaluation of microRNA alignment techniques
Kaspi, Antony; El-Osta, Assam
2016-01-01
Genomic alignment of small RNA (smRNA) sequences such as microRNAs poses considerable challenges due to their short length (∼21 nucleotides [nt]) as well as the large size and complexity of plant and animal genomes. While several tools have been developed for high-throughput mapping of longer mRNA-seq reads (>30 nt), there are few that are specifically designed for mapping of smRNA reads including microRNAs. The accuracy of these mappers has not been systematically determined in the case of smRNA-seq. In addition, it is unknown whether these aligners accurately map smRNA reads containing sequence errors and polymorphisms. By using simulated read sets, we determine the alignment sensitivity and accuracy of 16 short-read mappers and quantify their robustness to mismatches, indels, and nontemplated nucleotide additions. These were explored in the context of a plant genome (Oryza sativa, ∼500 Mbp) and a mammalian genome (Homo sapiens, ∼3.1 Gbp). Analysis of simulated and real smRNA-seq data demonstrates that mapper selection impacts differential expression results and interpretation. These results will inform on best practice for smRNA mapping and enable more accurate smRNA detection and quantification of expression and RNA editing. PMID:27284164
Cohn, Elizabeth Gross; Husamudeen, Maryam; Larson, Elaine L.; Williams, Janet K.
2016-01-01
Achieving equitable minority representation in genomic biobanking is one of the most difficult challenges faced by researchers today. Capacity building, a framework for research that includes collaborations and ongoing engagement, can be used to help researchers, clinicians and communities better understand the process, utility, and clinical application of genomic science. The purpose of this exploratory descriptive study was to examine factors that influence the decision to participate in genomic research, and identify essential components of capacity building with a community at risk of being under-represented in biobanks. Results of focus groups conducted in Central Harlem with 46 participants were analyzed by a collaborative team of community and academic investigators using content analysis and ATLAS.ti. Key themes identified were: (1) the potential contribution of biobanking to individual and community health, for example the effect of the environment on health, (2) the societal context of the science, such as DNA criminal databases and paternity testing, that may affect the decision to participate, and (3) the researchers’ commitment to community health as an outcome of capacity building. These key factors can contribute to achieving equity in biobank participation, and guide genetic specialists in biobank planning and implementation. PMID:25228357
The human genome: Some assembly required. Final report
DOE Office of Scientific and Technical Information (OSTI.GOV)
NONE
1994-12-31
The Human Genome Project promises to be one of the most rewarding endeavors in modern biology. The cost and the ethical and social implications, however, have made this project the source of considerable debate both in the scientific community and in the public at large. The 1994 Graduate Student Symposium addresses the scientific merits of the project, the technical issues involved in accomplishing the task, as well as the medical and social issues which stem from the wealth of knowledge which the Human Genome Project will help create. To this end, speakers were brought together who represent the diverse areas of expertise characteristic of this multidisciplinary project. The keynote speaker addresses the project's motivations and goals in the larger context of biological and medical sciences. The first two sessions address relevant technical issues, data collection with a focus on high-throughput sequencing methods and data analysis with an emphasis on identification of coding sequences. The third session explores recent advances in the understanding of genetic diseases and possible routes to treatment. Finally, the last session addresses some of the ethical, social and legal issues which will undoubtedly arise from having a detailed knowledge of the human genome.
Deng, Yangqing; Pan, Wei
2018-06-01
Due to issues of practicality and confidentiality of genomic data sharing on a large scale, typically only meta- or mega-analyzed genome-wide association study (GWAS) summary data, not individual-level data, are publicly available. Reanalyses of such GWAS summary data for a wide range of applications have become more and more common and useful, which often require the use of an external reference panel with individual-level genotypic data to infer linkage disequilibrium (LD) among genetic variants. However, with a small sample size in only hundreds, as for the most popular 1000 Genomes Project European sample, estimation errors for LD are not negligible, leading to often dramatically increased numbers of false positives in subsequent analyses of GWAS summary data. To alleviate the problem in the context of association testing for a group of SNPs, we propose an alternative estimator of the covariance matrix with an idea similar to multiple imputation. We use numerical examples based on both simulated and real data to demonstrate the severe problem with the use of the 1000 Genomes Project reference panels, and the improved performance of our new approach. Copyright © 2018 by the Genetics Society of America.
Derived immune and ancestral pigmentation alleles in a 7,000-year-old Mesolithic European.
Olalde, Iñigo; Allentoft, Morten E; Sánchez-Quinto, Federico; Santpere, Gabriel; Chiang, Charleston W K; DeGiorgio, Michael; Prado-Martinez, Javier; Rodríguez, Juan Antonio; Rasmussen, Simon; Quilez, Javier; Ramírez, Oscar; Marigorta, Urko M; Fernández-Callejo, Marcos; Prada, María Encina; Encinas, Julio Manuel Vidal; Nielsen, Rasmus; Netea, Mihai G; Novembre, John; Sturm, Richard A; Sabeti, Pardis; Marquès-Bonet, Tomàs; Navarro, Arcadi; Willerslev, Eske; Lalueza-Fox, Carles
2014-03-13
Ancient genomic sequences have started to reveal the origin and the demographic impact of farmers from the Neolithic period spreading into Europe. The adoption of farming, stock breeding and sedentary societies during the Neolithic may have resulted in adaptive changes in genes associated with immunity and diet. However, the limited data available from earlier hunter-gatherers preclude an understanding of the selective processes associated with this crucial transition to agriculture in recent human evolution. Here we sequence an approximately 7,000-year-old Mesolithic skeleton discovered at the La Braña-Arintero site in León, Spain, to retrieve a complete pre-agricultural European human genome. Analysis of this genome in the context of other ancient samples suggests the existence of a common ancient genomic signature across western and central Eurasia from the Upper Paleolithic to the Mesolithic. The La Braña individual carries ancestral alleles in several skin pigmentation genes, suggesting that the light skin of modern Europeans was not yet ubiquitous in Mesolithic times. Moreover, we provide evidence that a significant number of derived, putatively adaptive variants associated with pathogen resistance in modern Europeans were already present in this hunter-gatherer.
Temple, Louise; Cresawn, Steven G; Monroe, Jonathan D
2010-01-01
Emerging interest in genomics in the scientific community prompted biologists at James Madison University to create two courses at different levels to modernize the biology curriculum. The courses are hybrids of classroom and laboratory experiences. An upper level class uses raw sequence of a genome (plasmid or virus) as the subject on which to base the experience of genomic analysis. Students also learn bioinformatics and software programs needed to support a project linking structure and function in proteins and showing evolutionary relatedness of similar genes. An optional entry-level course taken in addition to the required first-year curriculum and sponsored in part by the Howard Hughes Medical Institute, engages first year students in a primary research project. In the first semester, they isolate and characterize novel bacteriophages that infect soil bacteria. In the second semester, these young scientists annotate the genes on one or more of the unique viruses they discovered. These courses are demanding but exciting for both faculty and students and should be accessible to any interested faculty member. Copyright © 2010 International Union of Biochemistry and Molecular Biology, Inc.
Ingouff, Mathieu; Selles, Benjamin; Michaud, Caroline; Vu, Thiet M; Berger, Frédéric; Schorn, Andrea J; Autran, Daphné; Van Durme, Matthias; Nowack, Moritz K; Martienssen, Robert A; Grimanelli, Daniel
2017-01-01
Cytosine methylation is a key epigenetic mark in many organisms, important for both transcriptional control and genome integrity. While relatively stable during somatic growth, DNA methylation is reprogrammed genome-wide during mammalian reproduction. Reprogramming is essential for zygotic totipotency and to prevent transgenerational inheritance of epimutations. However, the extent of DNA methylation reprogramming in plants remains unclear. Here, we developed sensors reporting with single-cell resolution CG and non-CG methylation in Arabidopsis. Live imaging during reproduction revealed distinct and sex-specific dynamics for both contexts. We found that CHH methylation in the egg cell depends on DOMAINS REARRANGED METHYLASE 2 (DRM2) and RNA polymerase V (Pol V), two main actors of RNA-directed DNA methylation, but does not depend on Pol IV. Our sensors provide insight into global DNA methylation dynamics at the single-cell level with high temporal resolution and offer a powerful tool to track CG and non-CG methylation both during development and in response to environmental cues in all organisms with methylated DNA, as we illustrate in mouse embryonic stem cells. © 2017 Ingouff et al.; Published by Cold Spring Harbor Laboratory Press.
The role of DNA methylation in directing the functional organization of the cancer epigenome.
Lay, Fides D; Liu, Yaping; Kelly, Theresa K; Witt, Heather; Farnham, Peggy J; Jones, Peter A; Berman, Benjamin P
2015-04-01
The holistic role of DNA methylation in the organization of the cancer epigenome is not well understood. Here we perform a comprehensive, high-resolution analysis of chromatin structure to compare the landscapes of HCT116 colon cancer cells and a DNA methylation-deficient derivative. The NOMe-seq accessibility assay unexpectedly revealed symmetrical and transcription-independent nucleosomal phasing across active, poised, and inactive genomic elements. DNA methylation abolished this phasing primarily at enhancers and CpG island (CGI) promoters, with little effect on insulators and non-CGI promoters. Abolishment of DNA methylation led to the context-specific reestablishment of the poised and active states of normal colon cells, which were marked in methylation-deficient cells by distinct H3K27 modifications and the presence of either well-phased nucleosomes or nucleosome-depleted regions, respectively. At higher-order genomic scales, we found that long, H3K9me3-marked domains had lower accessibility, consistent with a more compact chromatin structure. Taken together, our results demonstrate the nuanced and context-dependent role of DNA methylation in the functional, multiscale organization of cancer epigenomes. © 2015 Lay et al.; Published by Cold Spring Harbor Laboratory Press.
Ingouff, Mathieu; Selles, Benjamin; Michaud, Caroline; Vu, Thiet M.; Berger, Frédéric; Schorn, Andrea J.; Autran, Daphné; Van Durme, Matthias; Nowack, Moritz K.; Martienssen, Robert A.; Grimanelli, Daniel
2017-01-01
Cytosine methylation is a key epigenetic mark in many organisms, important for both transcriptional control and genome integrity. While relatively stable during somatic growth, DNA methylation is reprogrammed genome-wide during mammalian reproduction. Reprogramming is essential for zygotic totipotency and to prevent transgenerational inheritance of epimutations. However, the extent of DNA methylation reprogramming in plants remains unclear. Here, we developed sensors reporting with single-cell resolution CG and non-CG methylation in Arabidopsis. Live imaging during reproduction revealed distinct and sex-specific dynamics for both contexts. We found that CHH methylation in the egg cell depends on DOMAINS REARRANGED METHYLASE 2 (DRM2) and RNA polymerase V (Pol V), two main actors of RNA-directed DNA methylation, but does not depend on Pol IV. Our sensors provide insight into global DNA methylation dynamics at the single-cell level with high temporal resolution and offer a powerful tool to track CG and non-CG methylation both during development and in response to environmental cues in all organisms with methylated DNA, as we illustrate in mouse embryonic stem cells. PMID:28115468
2010-01-01
One of the important challenges to post-genomic biology is relating observed phenotypic alterations to the underlying collective alterations in genes. Current inferential methods, however, invariably omit large bodies of information on the relationships between genes. We present a method that takes account of such information - expressed in terms of the topology of a correlation network - and we apply the method in the context of current procedures for gene set enrichment analysis. PMID:20187943
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schadt, Christopher
2013-03-01
Christopher Schadt of Oak Ridge National Laboratory on Plant-Microbe Interactions in the context of poplar trees at the 8th Annual Genomics of Energy Environment Meeting on March 27, 2013 held in Walnut Creek, CA.
Genomic-Enabled Prediction of Ordinal Data with Bayesian Logistic Ordinal Regression.
Montesinos-López, Osval A; Montesinos-López, Abelardo; Crossa, José; Burgueño, Juan; Eskridge, Kent
2015-08-18
Most genomic-enabled prediction models developed so far assume that the response variable is continuous and normally distributed. The exception is the probit model, developed for ordered categorical phenotypes. In statistical applications, because of the easy implementation of the Bayesian probit ordinal regression (BPOR) model, Bayesian logistic ordinal regression (BLOR) is implemented rarely in the context of genomic-enabled prediction [sample size (n) is much smaller than the number of parameters (p)]. For this reason, in this paper we propose a BLOR model using the Pólya-Gamma data augmentation approach that produces a Gibbs sampler with similar full conditional distributions of the BPOR model and with the advantage that the BPOR model is a particular case of the BLOR model. We evaluated the proposed model by using simulation and two real data sets. Results indicate that our BLOR model is a good alternative for analyzing ordinal data in the context of genomic-enabled prediction with the probit or logit link. Copyright © 2015 Montesinos-López et al.
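For orientation, the logit link that separates BLOR from the probit model can be sketched with the standard cumulative-logit formulation, P(y <= c) = 1 / (1 + exp(-(gamma_c - eta))). The snippet below only illustrates that link. It is not the authors' Pólya-Gamma Gibbs sampler, and the function name `ordinal_logit_probs`, the threshold values and the linear predictor `eta` are assumptions made for the example.

```python
import math


def ordinal_logit_probs(eta, thresholds):
    """Category probabilities P(y = c), c = 1..C, under a cumulative logit link.

    eta        -- linear predictor for one individual (hypothetical value)
    thresholds -- increasing cut points gamma_1 < ... < gamma_{C-1}
    """
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative probabilities P(y <= c); the last category closes at 1.
    cdf = [1.0 / (1.0 + math.exp(-(g - eta))) for g in thresholds] + [1.0]
    # Differences of consecutive cumulative probabilities give P(y = c).
    return [cdf[0]] + [cdf[k] - cdf[k - 1] for k in range(1, len(cdf))]


# Four ordered categories, predictor centred between the cut points.
probs = ordinal_logit_probs(0.0, [-1.0, 0.0, 1.0])
```

Replacing the logistic function with the standard normal CDF in the same structure would give the probit counterpart.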
Joost, Stéphane; Kalbermatten, Michael; Bezault, Etienne; Seehausen, Ole
2012-01-01
When searching for loci possibly under selection in the genome, an alternative to population genetics theoretical models is to establish allele distribution models (ADM) for each locus to directly correlate allelic frequencies and environmental variables such as precipitation, temperature, or sun radiation. Such an approach, which runs multiple logistic regression models in parallel, was implemented in a computer program named MATSAM. Recently, this application was improved to support qualitative environmental predictors and to permit the identification of associations between genomic variation and individual phenotypes, allowing the detection of loci involved in the genetic architecture of polymorphic characters. Here, we present the corresponding methodological developments and compare the results produced by software implementing population genetics theoretical models (DFDIST and BAYESCAN) and ADM (MATSAM) in an empirical context to detect signatures of genomic divergence associated with speciation in Lake Victoria cichlid fishes.
NIBBS-search for fast and accurate prediction of phenotype-biased metabolic systems.
Schmidt, Matthew C; Rocha, Andrea M; Padmanabhan, Kanchana; Shpanskaya, Yekaterina; Banfield, Jill; Scott, Kathleen; Mihelcic, James R; Samatova, Nagiza F
2012-01-01
Understanding of genotype-phenotype associations is important not only for furthering our knowledge on internal cellular processes, but also essential for providing the foundation necessary for genetic engineering of microorganisms for industrial use (e.g., production of bioenergy or biofuels). However, genotype-phenotype associations alone do not provide enough information to alter an organism's genome to either suppress or exhibit a phenotype. It is important to look at the phenotype-related genes in the context of the genome-scale network to understand how the genes interact with other genes in the organism. Identification of metabolic subsystems involved in the expression of the phenotype is one way of placing the phenotype-related genes in the context of the entire network. A metabolic system refers to a metabolic network subgraph; nodes are compounds and edge labels are the enzymes that catalyze the reaction. The metabolic subsystem could be part of a single metabolic pathway or span parts of multiple pathways. Arguably, comparative genome-scale metabolic network analysis is a promising strategy to identify these phenotype-related metabolic subsystems. Network Instance-Based Biased Subgraph Search (NIBBS) is a graph-theoretic method for genome-scale metabolic network comparative analysis that can identify metabolic systems that are statistically biased toward phenotype-expressing organismal networks. We set up experiments with target phenotypes like hydrogen production, TCA expression, and acid-tolerance. We show via extensive literature search that some of the resulting metabolic subsystems are indeed phenotype-related and formulate hypotheses for other systems in terms of their role in phenotype expression. NIBBS is also orders of magnitude faster than MULE, one of the most efficient maximal frequent subgraph mining algorithms that could be adjusted for this problem.
Also, the set of phenotype-biased metabolic systems output by NIBBS comes very close to the set of phenotype-biased subgraphs output by an exact maximally-biased subgraph enumeration algorithm ( MBS-Enum ). The code (NIBBS and the module to visualize the identified subsystems) is available at http://freescience.org/cs/NIBBS.
NIBBS-Search for Fast and Accurate Prediction of Phenotype-Biased Metabolic Systems
Padmanabhan, Kanchana; Shpanskaya, Yekaterina; Banfield, Jill; Scott, Kathleen; Mihelcic, James R.; Samatova, Nagiza F.
2012-01-01
Understanding of genotype-phenotype associations is important not only for furthering our knowledge on internal cellular processes, but also essential for providing the foundation necessary for genetic engineering of microorganisms for industrial use (e.g., production of bioenergy or biofuels). However, genotype-phenotype associations alone do not provide enough information to alter an organism's genome to either suppress or exhibit a phenotype. It is important to look at the phenotype-related genes in the context of the genome-scale network to understand how the genes interact with other genes in the organism. Identification of metabolic subsystems involved in the expression of the phenotype is one way of placing the phenotype-related genes in the context of the entire network. A metabolic system refers to a metabolic network subgraph; nodes are compounds and edge labels are the enzymes that catalyze the reaction. The metabolic subsystem could be part of a single metabolic pathway or span parts of multiple pathways. Arguably, comparative genome-scale metabolic network analysis is a promising strategy to identify these phenotype-related metabolic subsystems. Network Instance-Based Biased Subgraph Search (NIBBS) is a graph-theoretic method for genome-scale metabolic network comparative analysis that can identify metabolic systems that are statistically biased toward phenotype-expressing organismal networks. We set up experiments with target phenotypes like hydrogen production, TCA expression, and acid-tolerance. We show via extensive literature search that some of the resulting metabolic subsystems are indeed phenotype-related and formulate hypotheses for other systems in terms of their role in phenotype expression. NIBBS is also orders of magnitude faster than MULE, one of the most efficient maximal frequent subgraph mining algorithms that could be adjusted for this problem.
Also, the set of phenotype-biased metabolic systems output by NIBBS comes very close to the set of phenotype-biased subgraphs output by an exact maximally-biased subgraph enumeration algorithm ( MBS-Enum ). The code (NIBBS and the module to visualize the identified subsystems) is available at http://freescience.org/cs/NIBBS. PMID:22589706
Gerlt, John A
2017-08-22
The exponentially increasing number of protein and nucleic acid sequences provides opportunities to discover novel enzymes, metabolic pathways, and metabolites/natural products, thereby adding to our knowledge of biochemistry and biology. The challenge has evolved from generating sequence information to mining the databases to integrating and leveraging the available information, i.e., the availability of "genomic enzymology" web tools. Web tools that allow identification of biosynthetic gene clusters are widely used by the natural products/synthetic biology community, thereby facilitating the discovery of novel natural products and the enzymes responsible for their biosynthesis. However, many novel enzymes with interesting mechanisms participate in uncharacterized small-molecule metabolic pathways; their discovery and functional characterization also can be accomplished by leveraging information in protein and nucleic acid databases. This Perspective focuses on two genomic enzymology web tools that assist the discovery of novel metabolic pathways: (1) Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST) for generating sequence similarity networks to visualize and analyze sequence-function space in protein families and (2) Enzyme Function Initiative-Genome Neighborhood Tool (EFI-GNT) for generating genome neighborhood networks to visualize and analyze the genome context in microbial and fungal genomes. Both tools have been adapted to other applications to facilitate target selection for enzyme discovery and functional characterization. As the natural products community has demonstrated, the enzymology community needs to embrace the essential role of web tools that allow the protein and genome sequence databases to be leveraged for novel insights into enzymological problems.
2017-01-01
The exponentially increasing number of protein and nucleic acid sequences provides opportunities to discover novel enzymes, metabolic pathways, and metabolites/natural products, thereby adding to our knowledge of biochemistry and biology. The challenge has evolved from generating sequence information to mining the databases to integrating and leveraging the available information, i.e., the availability of “genomic enzymology” web tools. Web tools that allow identification of biosynthetic gene clusters are widely used by the natural products/synthetic biology community, thereby facilitating the discovery of novel natural products and the enzymes responsible for their biosynthesis. However, many novel enzymes with interesting mechanisms participate in uncharacterized small-molecule metabolic pathways; their discovery and functional characterization also can be accomplished by leveraging information in protein and nucleic acid databases. This Perspective focuses on two genomic enzymology web tools that assist the discovery of novel metabolic pathways: (1) Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST) for generating sequence similarity networks to visualize and analyze sequence–function space in protein families and (2) Enzyme Function Initiative-Genome Neighborhood Tool (EFI-GNT) for generating genome neighborhood networks to visualize and analyze the genome context in microbial and fungal genomes. Both tools have been adapted to other applications to facilitate target selection for enzyme discovery and functional characterization. As the natural products community has demonstrated, the enzymology community needs to embrace the essential role of web tools that allow the protein and genome sequence databases to be leveraged for novel insights into enzymological problems. PMID:28826221
Kang, Yu; Gu, Chaohao; Yuan, Lina; Wang, Yue; Zhu, Yanmin; Li, Xinna; Luo, Qibin; Xiao, Jingfa; Jiang, Daquan; Qian, Minping; Ahmed Khan, Aftab; Chen, Fei; Zhang, Zhang; Yu, Jun
2014-11-25
The prokaryotic pangenome partitions genes into core and dispensable genes. The order of core genes, albeit assumed to be stable under selection in general, is frequently interrupted by horizontal gene transfer and rearrangement, but how a core-gene-defined genome maintains its stability or flexibility remains to be investigated. Based on data from 30 species, including 425 genomes from six phyla, we grouped core genes into syntenic blocks in the context of a pangenome according to their stability across multiple isolates. A subset of the core genes, often species specific and lineage associated, formed a core-gene-defined genome organizational framework (cGOF). Such cGOFs are either single segmental (one-third of the species analyzed) or multisegmental (the rest). Multisegment cGOFs were further classified into symmetric or asymmetric according to segment orientations toward the origin-terminus axis. The cGOFs in Gram-positive species are exclusively symmetric and often reversible in orientation, as opposed to those of the Gram-negative bacteria, which are all asymmetric and irreversible. Meanwhile, all species showing strong strand-biased gene distribution contain symmetric cGOFs and often specific DnaE (α subunit of DNA polymerase III) isoforms. Furthermore, functional evaluations revealed that cGOF genes are hub associated with regard to cellular activities, and the stability of cGOF provides efficient indexes for scaffold orientation as demonstrated by assembling virtual and empirical genome drafts. cGOFs show species specificity, and the symmetry of multisegmental cGOFs is conserved among taxa and constrained by DNA polymerase-centric strand-biased gene distribution. The definition of species-specific cGOFs provides powerful guidance for genome assembly and other structure-based analysis. Prokaryotic genomes are frequently interrupted by horizontal gene transfer (HGT) and rearrangement. 
To know whether there is a set of genes not only conserved in position among isolates but also functionally essential for a given species and to further evaluate the stability or flexibility of such genome structures across lineages are of importance. Based on a large number of multi-isolate pangenomic data, our analysis reveals that a subset of core genes is organized into a core-gene-defined genome organizational framework, or cGOF. Furthermore, the lineage-associated cGOFs among Gram-positive and Gram-negative bacteria behave differently: the former, composed of 2 to 4 segments, have their fragments symmetrically rearranged around the origin-terminus axis, whereas the latter show more complex segmentation and are partitioned asymmetrically into chromosomal structures. The definition of cGOFs provides new insights into prokaryotic genome organization and efficient guidance for genome assembly and analysis. Copyright © 2014 Kang et al.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fliedner Theodor M.; Feinendegen Ludwig E.; Meineke Viktor
2005-02-28
First results of this feasibility study showed that evaluation of the stored material of the chronically irradiated dogs with modern molecular biological techniques proved to be successful and extremely promising. Therefore an in-depth analysis of at least part of the huge amount of remaining material is of utmost interest. The methods applied in this feasibility study were pathological evaluation with different staining methods, protein analysis by means of immunohistochemistry, strand break analysis with the TdT-assay, DNA- and RNA-analysis as well as genomic examination by gene array. Overall more than 50% of the investigated material could be used. In particular, the finding of an increased stimulation of the immune system in the dogs of the 3 mSv group, compared with both the control and the higher dose groups, has implications for the in-depth study of the cellular events occurring in the context of low dose radiation. Based on the findings of this study, a further evaluation and statistical analysis of more material can help to identify promising biomarkers for low dose radiation. A systematic evaluation of a correlation of dose rates and strand breaks within the dog tissue might moreover help to explain mechanisms of tolerance to IR. One central problem is that most sequences for dog-specific primers are not known yet. The discovery of the dog genome is still in progress. In this study the isolation of RNA within the dog tissue was successful. But up to now there are no gene arrays or gene chips commercially available, tested and adapted for canine tissue. The uncritical use of untested genomic test systems for canine tissue seems at the moment to be time consuming and ineffective. Next steps in the investigation of genomic changes after IR within the stored dog tissue should be limited to quantitative RT-PCR of tested primer sequences for the dog.
A collaboration with institutions working in the field of the discovery of the dog genome could have synergistic effects.
Garita-Cambronero, Jerson; Palacio-Bielsa, Ana; López, María M.; Cubero, Jaime
2017-01-01
Xanthomonas arboricola is a plant-associated bacterial species that causes diseases on several plant hosts. One of the most virulent pathovars within this species is X. arboricola pv. pruni (Xap), the causal agent of bacterial spot disease of stone fruit trees and almond. Recently, a non-virulent Xap-look-a-like strain isolated from Prunus was characterized and its genome compared to pathogenic strains of Xap, revealing differences in the profile of virulence factors, such as the genes related to the type III secretion system (T3SS) and type III effectors (T3Es). The existence of this atypical strain raises several questions associated with the abundance, the pathogenicity, and the evolutionary context of X. arboricola on Prunus hosts. After an initial characterization of a collection of Xanthomonas strains isolated from Prunus bacterial spot outbreaks in Spain during the past decade, six Xap-look-a-like strains that did not cluster with the pathogenic strains of Xap according to a multi locus sequence analysis were identified. Pathogenicity of these strains was analyzed and the genome sequences of two Xap-look-a-like strains, CITA 14 and CITA 124, non-virulent to Prunus spp., were obtained and compared to the available genomes of X. arboricola associated with this host plant. Differences were found among the genomes of the virulent and the Prunus non-virulent strains in several characters related to the pathogenesis process. Additionally, a pan-genomic analysis that included the available genomes of X. arboricola revealed that the atypical strains associated with Prunus were related to a group of non-virulent or low virulent strains isolated from a wide host range. The repertoire of the genes related to T3SS and T3Es varied among the strains of this cluster and those strains related to the most virulent pathovars of the species, corylina, juglandis, and pruni.
This variability provides information about the potential evolutionary process associated with the acquisition of pathogenicity and host specificity in X. arboricola. Finally, based on the genomic differences observed between the virulent and the non-virulent strains isolated from Prunus, a sensitive and specific real-time PCR protocol was designed to detect and identify Xap strains. This method avoids misidentifications due to atypical strains of X. arboricola that can cohabit Prunus. PMID:28450852
DEFINING THE MANDATE OF PROTEOMICS IN THE POST-GENOMIC ERA: WORKSHOP REPORT
Research in proteomics is the next step after genomics in understanding life processes at the molecular level. In the largest sense proteomics encompasses knowledge of the structure, function and expression of all proteins in the biochemical or biological contexts of all organism...
The 3D genome in transcriptional regulation and pluripotency.
Gorkin, David U; Leung, Danny; Ren, Bing
2014-06-05
It can be convenient to think of the genome as simply a string of nucleotides, the linear order of which encodes an organism's genetic blueprint. However, the genome does not exist as a linear entity within cells where this blueprint is actually utilized. Inside the nucleus, the genome is organized in three-dimensional (3D) space, and lineage-specific transcriptional programs that direct stem cell fate are implemented in this native 3D context. Here, we review principles of 3D genome organization in mammalian cells. We focus on the emerging relationship between genome organization and lineage-specific transcriptional regulation, which we argue are inextricably linked. Copyright © 2014 Elsevier Inc. All rights reserved.
Complementation of a red-light-indifferent cyanobacterial mutant.
Chiang, G G; Schaefer, M R; Grossman, A R
1992-01-01
Many cyanobacteria alter their phycobilisome composition in response to changes in light wavelength in a process termed complementary chromatic adaptation. Mutant strains FdR1 and FdR2 of the filamentous cyanobacterium Fremyella diplosiphon are characterized by aberrant chromatic adaptation. Instead of adjusting to different wavelengths of light, FdR1 and FdR2 behave as if they are always in green light; they do not respond to red light. We have previously reported complementation of FdR1 by conjugal transfer of a wild-type genomic library. The complementing DNA has now been localized by genetic analysis to a region on the rescued genomic subclone that contains a gene designated rcaC. This region of DNA is also able to complement FdR2. Southern blot analysis of genomic DNA from FdR1 and FdR2 indicates that these strains harbor DNA insertions within the rcaC sequence that may have resulted from the activity of transposable genetic elements. The predicted amino acid sequence of RcaC shares strong identity to response regulators of bacterial two-component regulatory systems. This relationship is discussed in the context of the signal-transduction pathway mediating regulation of genes encoding phycobilisome polypeptides during chromatic adaptation. PMID:1409650
Steichen, Clara; Luce, Eléanor; Maluenda, Jérôme; Tosca, Lucie; Moreno-Gimeno, Inmaculada; Desterke, Christophe; Dianat, Noushin; Goulinet-Mainot, Sylvie; Awan-Toor, Sarah; Burks, Deborah; Marie, Joëlle; Weber, Anne; Tachdjian, Gérard; Melki, Judith; Dubart-Kupperschmitt, Anne
2014-06-01
The use of synthetic messenger RNAs to generate human induced pluripotent stem cells (iPSCs) is particularly appealing for potential regenerative medicine applications, because it overcomes the common drawbacks of DNA-based or virus-based reprogramming strategies, including transgene integration in particular. We compared the genomic integrity of mRNA-derived iPSCs with that of retrovirus-derived iPSCs generated in strictly comparable conditions, by single-nucleotide polymorphism (SNP) and copy number variation (CNV) analyses. We showed that mRNA-derived iPSCs do not differ significantly from the parental fibroblasts in SNP analysis, whereas retrovirus-derived iPSCs do. We found that the number of CNVs seemed independent of the reprogramming method, instead appearing to be clone-dependent. Furthermore, differentiation studies indicated that mRNA-derived iPSCs differentiated efficiently into hepatoblasts and that these cells did not load additional CNVs during differentiation. The integration-free hepatoblasts that were generated constitute a new tool for the study of diseased hepatocytes derived from patients' iPSCs and their use in the context of stem cell-derived hepatocyte transplantation. Our findings also highlight the need to conduct careful studies on genome integrity for the selection of iPSC lines before using them for further applications. ©AlphaMed Press.
Grenville-Briggs, Laura J; Stansfield, Ian
2011-01-01
This report describes a linked series of Masters-level computer practical workshops. They comprise an advanced functional genomics investigation, based upon analysis of a microarray dataset probing yeast DNA damage responses. The workshops require the students to analyse highly complex transcriptomics datasets, and were designed to stimulate active learning through experience of current research methods in bioinformatics and functional genomics. They seek to closely mimic a realistic research environment, and require the students first to propose research hypotheses, then test those hypotheses using specific sections of the microarray dataset. The complexity of the microarray data provides students with the freedom to propose their own unique hypotheses, tested using appropriate sections of the microarray data. This research latitude was highly regarded by students and is a strength of this practical. In addition, the focus on DNA damage by radiation and mutagenic chemicals allows them to place their results in a human medical context, and successfully sparks broad interest in the subject material. In evaluation, 79% of students scored the practical workshops on a five-point scale as 4 or 5 (totally effective) for student learning. More broadly, the general use of microarray data as a "student research playground" is also discussed. Copyright © 2011 Wiley Periodicals, Inc.
Reverse engineering and analysis of large genome-scale gene networks
Aluru, Maneesha; Zola, Jaroslaw; Nettleton, Dan; Aluru, Srinivas
2013-01-01
Reverse engineering the whole-genome networks of complex multicellular organisms continues to remain a challenge. While simpler models easily scale to large number of genes and gene expression datasets, more accurate models are compute intensive limiting their scale of applicability. To enable fast and accurate reconstruction of large networks, we developed Tool for Inferring Network of Genes (TINGe), a parallel mutual information (MI)-based program. The novel features of our approach include: (i) B-spline-based formulation for linear-time computation of MI, (ii) a novel algorithm for direct permutation testing and (iii) development of parallel algorithms to reduce run-time and facilitate construction of large networks. We assess the quality of our method by comparison with ARACNe (Algorithm for the Reconstruction of Accurate Cellular Networks) and GeneNet and demonstrate its unique capability by reverse engineering the whole-genome network of Arabidopsis thaliana from 3137 Affymetrix ATH1 GeneChips in just 9 min on a 1024-core cluster. We further report on the development of a new software Gene Network Analyzer (GeNA) for extracting context-specific subnetworks from a given set of seed genes. Using TINGe and GeNA, we performed analysis of 241 Arabidopsis AraCyc 8.0 pathways, and the results are made available through the web. PMID:23042249
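The core idea of MI-based network inference with permutation testing can be sketched as follows. This is a minimal illustration under stated assumptions, not TINGe itself: it swaps the B-spline estimator described above for simple equal-width histogram binning, runs serially rather than in parallel, and the function names `mutual_information` and `permutation_p_value`, the bin count and the number of permutations are all hypothetical choices.

```python
import math
import random
from collections import Counter


def mutual_information(x, y, bins=4):
    """Histogram estimate of MI (in nats) between two equal-length profiles."""
    def discretize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # constant profile: avoid zero width
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = discretize(x), discretize(y)
    n = len(x)
    px, py, pxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    # Sum over occupied cells of p(i, j) * log(p(i, j) / (p(i) * p(j))).
    return sum(
        (c / n) * math.log(c * n / (px[i] * py[j]))
        for (i, j), c in pxy.items()
    )


def permutation_p_value(x, y, n_perm=200, seed=0):
    """Share of shuffled profiles whose MI reaches the observed MI."""
    rng = random.Random(seed)
    observed = mutual_information(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any dependence between x and y
        if mutual_information(x, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0
```

In a network setting, an edge between two genes would be kept only when this p-value falls below a chosen threshold. Running it over all gene pairs is the quadratic step that parallel implementations are designed to speed up.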
Draye, Xavier; Lin, Yann-Rong; Qian, Xiao-yin; Bowers, John E.; Burow, Gloria B.; Morrell, Peter L.; Peterson, Daniel G.; Presting, Gernot G.; Ren, Shu-xin; Wing, Rod A.; Paterson, Andrew H.
2001-01-01
The small genome of sorghum (Sorghum bicolor L. Moench.) provides an important template for study of closely related large-genome crops such as maize (Zea mays) and sugarcane (Saccharum spp.), and is a logical complement to distantly related rice (Oryza sativa) as a “grass genome model.” Using a high-density RFLP map as a framework, a robust physical map of sorghum is being assembled by integrating hybridization and fingerprint data with comparative data from related taxa such as rice and using new methods to resolve genomic duplications into locus-specific groups. By taking advantage of allelic variation revealed by heterologous probes, the positions of corresponding loci on the wheat (Triticum aestivum), rice, maize, sugarcane, and Arabidopsis genomes are being interpolated on the sorghum physical map. Bacterial artificial chromosomes for the small genome of rice are shown to close several gaps in the sorghum contigs; the emerging rice physical map and assembled sequence will further accelerate progress. An important motivation for developing genomic tools is to relate molecular level variation to phenotypic diversity. “Diversity maps,” which depict the levels and patterns of variation in different gene pools, shed light on relationships of allelic diversity with chromosome organization, and suggest possible locations of genomic regions that are under selection due to major gene effects (some of which may be revealed by quantitative trait locus mapping). Both physical maps and diversity maps suggest interesting features that may be integrally related to the chromosomal context of DNA—progress in cytology promises to provide a means to elucidate such relationships. 
We seek to provide a detailed picture of the structure, function, and evolution of the genome of sorghum and its relatives, together with molecular tools such as locus-specific sequence-tagged site DNA markers and bacterial artificial chromosome contigs that will have enduring value for many aspects of genome analysis. PMID:11244113
Genomic diversity of necrotic enteritis-associated strains of Clostridium perfringens: a review.
Lacey, Jake A; Johanesen, Priscilla A; Lyras, Dena; Moore, Robert J
2016-06-01
The investigation of genomic variation between Clostridium perfringens isolates from poultry has been an important tool to enhance our understanding of the genetic basis of strain pathogenicity and the epidemiology of virulent and avirulent strains within the context of necrotic enteritis (NE). The earliest studies used whole genome profiling techniques such as pulsed-field gel electrophoresis to differentiate isolates and determine their relative levels of relatedness. DNA sequencing has been used to investigate genetic variation in (a) individual genes, such as those encoding the alpha and NetB toxins; (b) panels of housekeeping genes for multi-locus sequence typing and (c) most recently whole genome sequencing to build a more complete picture of genomic differences between isolates. Conclusions drawn from these studies include: differential carriage of large conjugative plasmids accounts for a large proportion of inter-strain differences; plasmid-encoded genes are more highly conserved than chromosomal genes, perhaps indicating a relatively recent origin for the plasmids; isolates from NE-affected birds fall into three distinct sequence-based clades while non-pathogenic isolates from healthy birds tend to be more genomically diverse. Overall, the NE causing strains are closely related to C. perfringens isolates from other birds and other diseases whereas the non-pathogenic poultry strains are generally more remotely related to either the pathogenic strains or the strains from other birds. Genomic analysis has indicated that genes in addition to netB are associated with NE pathogenic isolates. Collectively, this work has resulted in a deeper understanding of the pathogenesis of this important poultry disease.
Genomics and Public Health Research: Can the State Allow Access to Genomic Databases?
Cousineau, J; Girard, N; Monardes, C; Leroux, T; Jean, M Stanton
2012-01-01
Because many diseases are multifactorial disorders, the scientific progress in genomics and genetics should be taken into consideration in public health research. In this context, genomic databases will constitute an important source of information. Consequently, it is important to identify and characterize the State’s role and authority on matters related to public health, in order to verify whether it has access to such databases while engaging in public health genomic research. We first consider the evolution of the concept of public health, as well as its core functions, using a comparative approach (e.g. WHO, PAHO, CDC and the Canadian province of Quebec). Following an analysis of relevant Quebec legislation, the precautionary principle is examined as a possible avenue to justify State access to and use of genomic databases for research purposes. Finally, we consider the influenza pandemic plans developed by WHO, Canada, and Quebec as examples of key tools framing the public health decision-making process. We observed that State powers in public health are not, in Quebec, well adapted to the expansion of genomics research. We propose that the scope of the concept of research in public health should be clear and include the following characteristics: a commitment to the health and well-being of the population and to their determinants; the inclusion of both applied research and basic research; and an appropriate model of governance (authorization, follow-up, consent, etc.). We also suggest that the strategic approach version of the precautionary principle could guide collective choices in these matters. PMID:23113174
Issues surrounding the health economic evaluation of genomic technologies
Buchanan, James; Wordsworth, Sarah; Schuh, Anna
2014-01-01
Aim Genomic interventions could enable improved disease stratification and individually tailored therapies. However, they have had a limited impact on clinical practice to date due to a lack of evidence, particularly economic evidence. This is partly because health economists are yet to reach consensus on whether existing methods are sufficient to evaluate genomic technologies. As different approaches may produce conflicting adoption decisions, clarification is urgently required. This article summarizes the methodological issues associated with conducting economic evaluations of genomic interventions. Materials & methods A structured literature review was conducted to identify references that considered the methodological challenges faced when conducting economic evaluations of genomic interventions. Results Methodological challenges related to the analytical approach included the choice of comparator, perspective and timeframe. Challenges in costing centered around the need to collect a broad range of costs, frequently in a data-limited environment. Measuring outcomes is problematic because standard measures have limited applicability; however, alternative metrics (e.g., personal utility) are underdeveloped and alternative approaches (e.g., cost–benefit analysis) underused. Effectiveness data quality is weak and challenging to incorporate into standard economic analyses, while little is known about patient and clinician behavior in this context. Comprehensive value of information analyses are likely to be helpful. Conclusion Economic evaluations of genomic technologies present a particular challenge for health economists. New methods may be required to resolve these issues, but the evidence to justify alternative approaches is yet to be produced. This should be the focus of future work in this field. PMID:24236483
WordCluster: detecting clusters of DNA words and genomic elements
2011-01-01
Background Many k-mers (or DNA words) and genomic elements are known to be spatially clustered in the genome. Well established examples are the genes, TFBSs, CpG dinucleotides, microRNA genes and ultra-conserved non-coding regions. Currently, no algorithm exists to find these clusters in a statistically comprehensible way. The detection of clustering often relies on densities and sliding-window approaches or arbitrarily chosen distance thresholds. Results We introduce here an algorithm to detect clusters of DNA words (k-mers), or any other genomic element, based on the distance between consecutive copies and an assigned statistical significance. We implemented the method as a web server connected to a MySQL backend, which also determines the co-localization with gene annotations. We demonstrate the usefulness of this approach by detecting the clusters of CAG/CTG (cytosine contexts that can be methylated in undifferentiated cells), showing that the degree of methylation varies drastically between the inside and the outside of the clusters. As another example, we used WordCluster to search for statistically significant clusters of olfactory receptor (OR) genes in the human genome. Conclusions WordCluster seems to predict biologically meaningful clusters of DNA words (k-mers) and genomic entities. The web server implementation of the method is available at http://bioinfo2.ugr.es/wordCluster/wordCluster.php and includes additional features such as the detection of co-localization with gene regions and an annotation enrichment tool for functional analysis of overlapped genes. PMID:21261981
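The idea of grouping copies of a word by the distance between consecutive copies can be illustrated with a minimal Python sketch. This is not the WordCluster implementation: the function names, the toy sequence, the fixed `max_gap` threshold and the `min_copies` cutoff are all assumptions made for illustration, and the statistical significance step described in the abstract is deliberately left out.

```python
def find_word_positions(sequence, word):
    """Return 0-based start positions of every copy of word (overlaps allowed)."""
    positions = []
    start = sequence.find(word)
    while start != -1:
        positions.append(start)
        start = sequence.find(word, start + 1)
    return positions


def chain_clusters(positions, max_gap, min_copies=2):
    """Chain sorted positions into clusters.

    Consecutive copies whose start-to-start distance is <= max_gap join the
    same cluster. Each cluster is reported as (first_start, last_start, copies).
    The fixed max_gap is a hypothetical stand-in for a statistically derived
    distance threshold.
    """
    clusters = []
    current = []
    for pos in positions:
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_copies:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    if len(current) >= min_copies:
        clusters.append((current[0], current[-1], len(current)))
    return clusters


# Toy sequence (made up): two dense runs of CAG separated by an isolated copy.
toy_sequence = "CAGTTCAGACAG" + "T" * 30 + "CAG" + "T" * 30 + "CAGCAG"
positions = find_word_positions(toy_sequence, "CAG")
clusters = chain_clusters(positions, max_gap=10, min_copies=2)
print(positions)  # [0, 5, 9, 42, 75, 78]
print(clusters)   # [(0, 9, 3), (75, 78, 2)]
```

The isolated copy at position 42 is dropped because it has no neighbor within the gap threshold, which is the behavior a distance-based method needs before any significance is assigned.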
Genomics in the land of regulatory science.
Tong, Weida; Ostroff, Stephen; Blais, Burton; Silva, Primal; Dubuc, Martine; Healy, Marion; Slikker, William
2015-06-01
Genomics science has played a major role in the generation of new knowledge in the basic research arena, and the question now arises as to its potential to support regulatory processes. However, the integration of genomics in the regulatory decision-making process requires rigorous assessment and would benefit from consensus amongst international partners and research communities. To that end, the Global Coalition for Regulatory Science Research (GCRSR) hosted the fourth Global Summit on Regulatory Science (GSRS2014) to discuss the role of genomics in regulatory decision making, with a specific emphasis on applications in food safety and medical product development. Challenges and issues were discussed in the context of developing an international consensus for objective criteria in the analysis, interpretation and reporting of genomics data with an emphasis on transparency, traceability and "fitness for purpose" for the intended application. It was recognized that there is a need for a global path in the establishment of a regulatory bioinformatics framework for the development of transparent, reliable, reproducible and auditable processes in the management of food and medical product safety risks. It was also recognized that training is an important mechanism in achieving internationally consistent outcomes. GSRS2014 provided an effective venue for regulators and researchers to meet, discuss common issues, and develop collaborations to address the challenges posed by the application of genomics to regulatory science, with the ultimate goal of wisely integrating novel technical innovations into regulatory decision-making. Published by Elsevier Inc.
Kuo, Kevin H M
2017-01-01
The issue of multiple testing, also termed multiplicity, is ubiquitous in studies where multiple hypotheses are tested simultaneously. Genome-wide association study (GWAS), a type of genetic association study that has gained popularity in the past decade, is most susceptible to the issue of multiple testing. Different methodologies have been employed to address the issue of multiple testing in GWAS. The purpose of the review is to examine the methodologies employed in dealing with multiple testing in the context of gene discovery using GWAS in sickle cell disease complications.
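To make the multiplicity problem concrete, here is a minimal Python sketch of two widely used corrections, Bonferroni and Benjamini-Hochberg. The abstract does not say which methods the review covers, so the choice of these two is an assumption, and the p-values are invented toy numbers rather than GWAS results.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is <= alpha / number_of_tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up procedure controlling the false discovery rate at level alpha.

    Find the largest rank k such that the k-th smallest p-value is
    <= (k / m) * alpha, then reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            largest_k = rank
    rejected = [False] * m
    for idx in order[:largest_k]:
        rejected[idx] = True
    return rejected


# Toy p-values (made up for illustration only).
toy_p = [0.001, 0.012, 0.025, 0.039, 0.6]
bonferroni_calls = bonferroni(toy_p)
bh_calls = benjamini_hochberg(toy_p)
print(bonferroni_calls)  # [True, False, False, False, False]
print(bh_calls)          # [True, True, True, True, False]
```

On the same inputs the family-wise correction keeps one finding while the false discovery rate procedure keeps four, which is the kind of trade-off between stringency and power that makes the choice of method matter in a GWAS.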
Dynamics and Context-Dependent Roles of DNA Methylation.
Ambrosi, Christina; Manzo, Massimiliano; Baubec, Tuncay
2017-05-19
DNA methylation is one of the most extensively studied epigenetic marks. It is involved in transcriptional gene silencing and plays important roles during mammalian development. Its perturbation is often associated with human diseases. In mammalian genomes, DNA methylation is a prevalent modification that decorates the majority of cytosines. It is found at the promoters and enhancers of inactive genes, at repetitive elements, and within transcribed gene bodies. Its presence at promoters is dynamically linked to gene activity, suggesting that it could directly influence gene expression patterns and cellular identity. The genome-wide distribution and dynamic behaviour of this mark have been studied in great detail in a variety of tissues and cell lines, including early embryonic development and in embryonic stem cells. In combination with functional studies, these genome-wide maps of DNA methylation revealed interesting features of this mark and provided important insights into its dynamic nature and potential functional role in genome regulation. In this review, we discuss how these recent observations, in combination with insights obtained from biochemical and functional genetics studies, have expanded our current knowledge about the regulation and context-dependent roles of DNA methylation in mammalian genomes. Copyright © 2017 Elsevier Ltd. All rights reserved.
Development of biosensors and their application in metabolic engineering.
Zhang, Jie; Jensen, Michael K; Keasling, Jay D
2015-10-01
In a sustainable bioeconomy, many commodities and high value chemicals, including pharmaceuticals, will be manufactured using microbial cell factories from renewable feedstocks. These cell factories can be efficiently generated by constructing libraries of diversified genomes followed by screening for the desired phenotypes. However, methods available for microbial genome diversification far exceed our ability to screen and select for those variants with optimal performance. Genetically encoded biosensors have shown the potential to address this gap, given their ability to respond to small molecule binding and ease of implementation with high-throughput analysis. Here we describe recent progress in biosensor development and their applications in a metabolic engineering context. We also highlight examples of how biosensors can be integrated with synthetic circuits to exert feedback regulation on the metabolism for improved performance of cell factories. Copyright © 2015 Elsevier Ltd. All rights reserved.
Resources, challenges and way forward in rare mitochondrial diseases research.
Rajput, Neeraj Kumar; Singh, Vipin; Bhardwaj, Anshu
2015-01-01
Over 300 million people are affected by about 7000 rare diseases globally. There are tremendous resource limitations and challenges in driving research and drug development for rare diseases. Hence, innovative approaches are needed to identify potential solutions. This review focuses on the resources developed over the past years for analysis of genome data towards understanding disease biology, especially in the context of mitochondrial diseases, given that mitochondria are central to major cellular pathways and their dysfunction leads to a broad spectrum of diseases. Platforms for collaboration of research groups, clinicians and patients and the advantages of community collaborative efforts in addressing rare diseases are also discussed. The review also describes crowdsourcing and crowdfunding efforts in rare diseases research and how the upcoming initiatives for understanding disease biology, including analyses of large numbers of genomes, are also applicable to rare diseases.
Beyond sequencing: optical mapping of DNA in the age of nanotechnology and nanoscopy.
Levy-Sakin, Michal; Ebenstein, Yuval
2013-08-01
Next generation sequencing (NGS) is revolutionizing all fields of biological research but it fails to extract the full range of information associated with genetic material. Optical mapping of DNA grants access to genetic and epigenetic information on individual DNA molecules up to ∼1 Mbp in length. Fluorescent labeling of specific sequence motifs, epigenetic marks and other genomic information on individual DNA molecules generates a high content optical barcode along the DNA. By stretching the DNA to a linear configuration this barcode may be directly visualized by fluorescence microscopy. We discuss the advances of these methods in light of recent developments in nano-fabrication and super-resolution optical imaging (nanoscopy) and review the latest achievements of optical mapping in the context of genomic analysis. Copyright © 2013 Elsevier Ltd. All rights reserved.
Drost, Derek R; Novaes, Evandro; Boaventura-Novaes, Carolina; Benedict, Catherine I; Brown, Ryan S; Yin, Tongming; Tuskan, Gerald A; Kirst, Matias
2009-06-01
Microarrays have demonstrated significant power for genome-wide analyses of gene expression, and recently have also revolutionized the genetic analysis of segregating populations by genotyping thousands of loci in a single assay. Although microarray-based genotyping approaches have been successfully applied in yeast and several inbred plant species, their power has not been proven in an outcrossing species with extensive genetic diversity. Here we have developed methods for high-throughput microarray-based genotyping in such species using a pseudo-backcross progeny of 154 individuals of Populus trichocarpa and P. deltoides analyzed with long-oligonucleotide in situ-synthesized microarray probes. Our analysis resulted in high-confidence genotypes for 719 single-feature polymorphism (SFP) and 1014 gene expression marker (GEM) candidates. Using these genotypes and an established microsatellite (SSR) framework map, we produced a high-density genetic map comprising over 600 SFPs, GEMs and SSRs. The abundance of gene-based markers allowed us to localize over 35 million base pairs of previously unplaced whole-genome shotgun (WGS) scaffold sequence to putative locations in the genome of P. trichocarpa. A high proportion of sampled scaffolds could be verified for their placement with independently mapped SSRs, demonstrating the previously un-utilized power that high-density genotyping can provide in the context of map-based WGS sequence reassembly. Our results provide a substantial contribution to the continued improvement of the Populus genome assembly, while demonstrating the feasibility of microarray-based genotyping in a highly heterozygous population. The strategies presented are applicable to genetic mapping efforts in all plant species with similarly high levels of genetic diversity.
Identifying elemental genomic track types and representing them uniformly
2011-01-01
Background With the recent advances and availability of various high-throughput sequencing technologies, data on many molecular aspects, such as gene regulation, chromatin dynamics, and the three-dimensional organization of DNA, are rapidly being generated in an increasing number of laboratories. The variation in biological context, and the increasingly dispersed mode of data generation, imply a need for precise, interoperable and flexible representations of genomic features through formats that are easy to parse. A host of alternative formats are currently available and in use, complicating analysis and tool development. The issue of whether and how the multitude of formats reflects varying underlying characteristics of data has to our knowledge not previously been systematically treated. Results We here identify intrinsic distinctions between genomic features, and argue that the distinctions imply that a certain variation in the representation of features as genomic tracks is warranted. Four core informational properties of tracks are discussed: gaps, lengths, values and interconnections. From this we delineate fifteen generic track types. Based on the track type distinctions, we characterize major existing representational formats and find that the track types are not adequately supported by any single format. We also find, in contrast to the XML formats, that none of the existing tabular formats are conveniently extendable to support all track types. We thus propose two unified formats for track data, an improved XML format, BioXSD 1.1, and a new tabular format, GTrack 1.0. Conclusions The defined track types are shown to capture relevant distinctions between genomic annotation tracks, resulting in varying representational needs and analysis possibilities. The proposed formats, GTrack 1.0 and BioXSD 1.1, cater to the identified track distinctions and emphasize preciseness, flexibility and parsing convenience. PMID:22208806
Comparative genomics of pyridoxal 5′-phosphate-dependent transcription factor regulons in Bacteria
Suvorova, Inna A.
2016-01-01
The MocR-subfamily transcription factors (MocR-TFs) characterized by the GntR-family DNA-binding domain and aminotransferase-like sensory domain are broadly distributed among certain lineages of Bacteria. Characterized MocR-TFs bind pyridoxal 5′-phosphate (PLP) and control transcription of genes involved in PLP, gamma aminobutyric acid (GABA) and taurine metabolism via binding specific DNA operator sites. To identify putative target genes and DNA binding motifs of MocR-TFs, we performed comparative genomics analysis of over 250 bacterial genomes. The reconstructed regulons for 825 MocR-TFs comprise structural genes from over 200 protein families involved in diverse biological processes. Using the genome context and metabolic subsystem analysis we tentatively assigned functional roles for 38 out of 86 orthologous groups of studied regulators. Most of these MocR-TF regulons are involved in PLP metabolism, as well as utilization of GABA, taurine and ectoine. The remaining studied MocR-TF regulators presumably control genes encoding enzymes involved in reduction/oxidation processes, various transporters and PLP-dependent enzymes, for example aminotransferases. Predicted DNA binding motifs of MocR-TFs are generally similar in each orthologous group and are characterized by two to four repeated sequences. Identified motifs were classified according to their structures. Motifs with direct and/or inverted repeat symmetry constitute the majority of inferred DNA motifs, suggesting preferable TF dimerization in head-to-tail or head-to-head configuration. The obtained genomic collection of in silico reconstructed MocR-TF motifs and regulons in Bacteria provides a basis for future experimental characterization of molecular mechanisms for various regulators in this family. PMID:28348826
Goswami, Sathi; Sanyal, Sulagna; Chakraborty, Payal; Das, Chandrima; Sarkar, Munna
2017-08-01
NSAIDs are the most common class of painkillers and anti-inflammatory agents. They also show other functions like chemoprevention and chemosuppression, for which they act at the protein but not at the genome level since they are mostly anions at physiological pH, which prohibits their approach to the poly-anionic DNA. Complexing the drugs with a bioactive metal obliterates their negative charge and allows them to bind to the DNA, thereby opening the possibility of genome level interaction. To test this hypothesis, we present the interaction of a traditional NSAID, Piroxicam, and its copper complex with core histone and chromatin. Spectroscopy, DLS, and SEM studies were applied to see the effect of the interaction on the structure of histone/chromatin. This was coupled with MTT assay, immunoblot analysis, confocal microscopy, microarray analysis and qRT-PCR. The interaction of Piroxicam and its copper complex with histone/chromatin results in structural alterations. Such structural alterations can have different biological manifestations, but to test our hypothesis, we have focused only on the accompanying modulations at the epigenomic/genomic level. The complex showed alteration of key epigenetic signatures implicated in transcription in the global context, although Piroxicam caused no significant changes. We have correlated such alterations caused by the complex with the changes in global gene expression and validated the candidate gene expression alterations. Our results provide the proof of concept that the DNA binding ability of the copper complexes of a traditional NSAID opens up the possibility of modulations at the epigenomic/genomic level. Copyright © 2017 Elsevier B.V. All rights reserved.
Whole genome sequencing of one complex pedigree illustrates challenges with genomic medicine.
Fang, Han; Wu, Yiyang; Yang, Hui; Yoon, Margaret; Jiménez-Barrón, Laura T; Mittelman, David; Robison, Reid; Wang, Kai; Lyon, Gholson J
2017-02-23
Human Phenotype Ontology (HPO) has risen as a useful tool for precision medicine by providing a standardized vocabulary of phenotypic abnormalities to describe presentations of human pathologies; however, there have been relatively few reports combining whole genome sequencing (WGS) and HPO, especially in the context of structural variants. We illustrate an integrative analysis of WGS and HPO using an extended pedigree, which involves Prader-Willi Syndrome (PWS), hereditary hemochromatosis (HH), and dysautonomia-like symptoms. A comprehensive WGS pipeline was used to ensure reliable detection of genomic variants. Beyond variant filtering, we pursued phenotypic prioritization of candidate genes using Phenolyzer. Regarding PWS, WGS confirmed a 5.5 Mb de novo deletion of the parental allele at 15q11.2 to 15q13.1. Phenolyzer successfully returned the diagnosis of PWS, and pinpointed clinically relevant genes in the deletion. Further, Phenolyzer revealed how each of the genes is linked with the phenotypes represented by HPO terms. For HH, WGS identified a known disease variant (p.C282Y) in HFE of an affected female. Analysis of HPO terms alone fails to provide a correct diagnosis, but Phenolyzer successfully revealed the phenotype-genotype relationship using a disease-centric approach. Finally, Phenolyzer also revealed the complexity behind dysautonomia-like symptoms, and seven variants that might be associated with the phenotypes were identified by manual filtering based on a dominant inheritance model. The integration of WGS and HPO can inform comprehensive molecular diagnosis for patients, eliminate false positives and reveal novel insights into undiagnosed diseases. Due to extreme heterogeneity and insufficient knowledge of human diseases, it is also important that phenotypic and genomic data are standardized and shared simultaneously.
Use of causative variants and SNP weighting in a single-step GBLUP context
USDA-ARS's Scientific Manuscript database
Much effort has recently been put into identifying causative quantitative trait nucleotides (QTN) in animal breeding, aimed at genomic prediction. Among the genomic methods available, single-step GBLUP (ssGBLUP) became the choice because of its simplicity and potentially higher accuracy. When QTN are ...
Woodbury-Smith, M; Bilder, D A; Morgan, J; Jerominski, L; Darlington, T; Dyer, T; Paterson, A D; Coon, H
2017-01-01
It has long been recognized that there is an association between enlarged head circumference (HC) and autism spectrum disorder (ASD), but the genetics of HC in ASD is not well understood. In order to investigate the genetic underpinning of HC in ASD, we undertook a genome-wide linkage study of HC followed by linkage signal targeted association among a sample of 67 extended pedigrees with ASD. HC measurements on members of 67 multiplex ASD extended pedigrees were used as a quantitative trait in a genome-wide linkage analysis. The Illumina 6K SNP linkage panel was used, and analyses were carried out using the SOLAR implemented variance components model. Loci identified in this way formed the target for subsequent association analysis using the Illumina OmniExpress chip and imputed genotypes. A modification of the qTDT was used as implemented in SOLAR. We identified a linkage signal spanning 6p21.31 to 6p22.2 (maximum LOD = 3.4). Although targeted association did not find evidence of association with any SNP overall, in one family with the strongest evidence of linkage, there was evidence for association (rs17586672, p = 1.72E-07). Although this region does not overlap with ASD linkage signals in these same samples, it has been associated with other psychiatric risk, including ADHD, developmental dyslexia, schizophrenia, specific language impairment, and juvenile bipolar disorder. The genome-wide significant linkage signal represents the first reported observation of a potential quantitative trait locus for HC in ASD and may be relevant in the context of complex multivariate risk likely leading to ASD.
Krauss, Scott; Stucker, Karla M; Schobel, Seth A; Danner, Angela; Friedman, Kimberly; Knowles, James P; Kayali, Ghazi; Niles, Lawrence J; Dey, Amanda D; Raven, Garnet; Pryor, Paul; Lin, Xudong; Das, Suman R; Stockwell, Timothy B; Wentworth, David E; Webster, Robert G
2015-01-01
The emergence of influenza A virus (IAV) in domestic avian species and associated transmissions to mammals is unpredictable. In the Americas, the H7 IAVs are of particular concern, and there have been four separate outbreaks of highly pathogenic (HP) H7N3 in domestic poultry in North and South America between 2002 and 2012, with occasional spillover into humans. Here, we use long-term IAV surveillance in North American shorebirds at Delaware Bay, USA, from 1985 to 2012 and in ducks in Alberta, Canada, from 1976 to 2012 to determine which hemagglutinin (HA)–neuraminidase (NA) combinations predominated in Anseriformes (ducks) and Charadriiformes (shorebirds) and whether there is concordance between peaks of H7 prevalence and transmission in wild aquatic birds and the emergence of H7 IAVs in poultry and humans. Whole-genome sequencing supported phylogenetic and genomic constellation analyses to determine whether HP IAVs emerge in the context of specific internal gene segment sequences. Phylogenetic analysis of whole-genome sequences of the H7N3 influenza viruses from wild birds and HP H7N3 outbreaks in the Americas indicates that each HP outbreak was an independent emergence event and that the low pathogenic (LP) avian influenza precursors were most likely from dabbling ducks. The different polybasic cleavage sites in the four HP outbreaks support independent origins. At the 95% nucleotide identity level, phylogenetic analysis showed that the wild duck HA, PB1, and M sequences clustered with the poultry and human outbreak sequences. The genomic constellation analysis strongly suggests that gene segments/virus flow from wild birds to domestic poultry. PMID:26954883
Optimizing complex phenotypes through model-guided multiplex genome engineering
Kuznetsov, Gleb; Goodman, Daniel B.; Filsinger, Gabriel T.; ...
2017-05-25
Here, we present a method for identifying genomic modifications that optimize a complex phenotype through multiplex genome engineering and predictive modeling. We apply our method to identify six single nucleotide mutations that recover 59% of the fitness defect exhibited by the 63-codon E. coli strain C321.ΔA. By introducing targeted combinations of changes in multiplex we generate rich genotypic and phenotypic diversity and characterize clones using whole-genome sequencing and doubling time measurements. Regularized multivariate linear regression accurately quantifies individual allelic effects and overcomes bias from hitchhiking mutations and context-dependence of genome editing efficiency that would confound other strategies.
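As a rough illustration of how regularized linear regression can separate individual allelic effects from combinatorial genotype data, the following Python sketch fits a ridge (L2-penalized) model with the normal equations. The abstract does not state which regularizer the authors used, so ridge is an assumption here, and the genotypes, effect sizes and penalty value are invented toy numbers. The third allele is given a true effect of zero to play the role of a hitchhiking mutation.

```python
from itertools import product


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def ridge_fit(x_rows, y, lam):
    """Return coefficients solving (X^T X + lam * I) beta = X^T y."""
    p = len(x_rows[0])
    xtx = [
        [sum(row[i] * row[j] for row in x_rows) + (lam if i == j else 0.0)
         for j in range(p)]
        for i in range(p)
    ]
    xty = [sum(row[i] * y_k for row, y_k in zip(x_rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Toy data (assumed): every 0/1 combination of three alleles, with the
# phenotype expressed as change in doubling time relative to the parent strain.
genotypes = [list(g) for g in product([0, 1], repeat=3)]
true_effects = [-5.0, -2.0, 0.0]  # third allele is a neutral "hitchhiker"
doubling_time_change = [
    sum(effect * allele for effect, allele in zip(true_effects, g))
    for g in genotypes
]
estimated = ridge_fit(genotypes, doubling_time_change, lam=0.001)
print([round(e, 2) for e in estimated])
```

Because the alleles appear in many different combinations, the fit attributes the phenotype change to the first two alleles and assigns the neutral one an effect near zero, even though it co-occurs with beneficial alleles in several clones.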
Optimizing complex phenotypes through model-guided multiplex genome engineering
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kuznetsov, Gleb; Goodman, Daniel B.; Filsinger, Gabriel T.
Here, we present a method for identifying genomic modifications that optimize a complex phenotype through multiplex genome engineering and predictive modeling. We apply our method to identify six single nucleotide mutations that recover 59% of the fitness defect exhibited by the 63-codon E. coli strain C321.ΔA. By introducing targeted combinations of changes in multiplex we generate rich genotypic and phenotypic diversity and characterize clones using whole-genome sequencing and doubling time measurements. Regularized multivariate linear regression accurately quantifies individual allelic effects and overcomes bias from hitchhiking mutations and context-dependence of genome editing efficiency that would confound other strategies.
methylPipe and compEpiTools: a suite of R packages for the integrative analysis of epigenomics data.
Kishore, Kamal; de Pretis, Stefano; Lister, Ryan; Morelli, Marco J; Bianchi, Valerio; Amati, Bruno; Ecker, Joseph R; Pelizzola, Mattia
2015-09-29
Numerous methods are available to profile several epigenetic marks, providing data with different genome coverage and resolution. Large epigenomic datasets are then generated, and often combined with other high-throughput data, including RNA-seq, ChIP-seq for transcription factors (TFs) binding and DNase-seq experiments. Despite the numerous computational tools covering specific steps in the analysis of large-scale epigenomics data, comprehensive software solutions for their integrative analysis are still missing. Multiple tools must be identified and combined to jointly analyze histone marks, TFs binding and other -omics data together with DNA methylation data, complicating the analysis of these data and their integration with publicly available datasets. To overcome the burden of integrating various data types with multiple tools, we developed two companion R/Bioconductor packages. The former, methylPipe, is tailored to the analysis of high- or low-resolution DNA methylomes in several species, accommodating (hydroxy-)methyl-cytosines in both CpG and non-CpG sequence context. The analysis of multiple whole-genome bisulfite sequencing experiments is supported, while maintaining the ability of integrating targeted genomic data. The latter, compEpiTools, seamlessly incorporates the results obtained with methylPipe and supports their integration with other epigenomics data. It provides a number of methods to score these data in regions of interest, leading to the identification of enhancers, lncRNAs, and RNAPII stalling/elongation dynamics. Moreover, it allows a fast and comprehensive annotation of the resulting genomic regions, and the association of the corresponding genes with non-redundant GeneOntology terms. Finally, the package includes a flexible method based on heatmaps for the integration of various data types, combining annotation tracks with continuous or categorical data tracks. 
methylPipe and compEpiTools provide a comprehensive Bioconductor-compliant solution for the integrative analysis of heterogeneous epigenomics data. These packages are instrumental in providing biologists with minimal R skills a complete toolkit facilitating the analysis of their own data, or in accelerating the analyses performed by more experienced bioinformaticians.
Hematopoietic transcriptional mechanisms: from locus-specific to genome-wide vantage points.
DeVilbiss, Andrew W; Sanalkumar, Rajendran; Johnson, Kirby D; Keles, Sunduz; Bresnick, Emery H
2014-08-01
Hematopoiesis is an exquisitely regulated process in which stem cells in the developing embryo and the adult generate progenitor cells that give rise to all blood lineages. Master regulatory transcription factors control hematopoiesis by integrating signals from the microenvironment and dynamically establishing and maintaining genetic networks. One of the most rudimentary aspects of cell type-specific transcription factor function, how they occupy a highly restricted cohort of cis-elements in chromatin, remains poorly understood. Transformative technologic advances involving the coupling of next-generation DNA sequencing technology with the chromatin immunoprecipitation assay (ChIP-seq) have enabled genome-wide mapping of factor occupancy patterns. However, formidable problems remain; notably, ChIP-seq analysis yields hundreds to thousands of chromatin sites occupied by a given transcription factor, and only a fraction of the sites appear to be endowed with critical, non-redundant function. It has become en vogue to map transcription factor occupancy patterns genome-wide, while using powerful statistical tools to establish correlations to inform biology and mechanisms. With the advent of revolutionary genome editing technologies, one can now reach beyond correlations to conduct definitive hypothesis testing. This review focuses on key discoveries that have emerged during the path from single loci to genome-wide analyses, specifically in the context of hematopoietic transcriptional mechanisms. Copyright © 2014 ISEH - International Society for Experimental Hematology. Published by Elsevier Inc. All rights reserved.
Ho, Wai Kuan; Muchugi, Alice; Muthemba, Samuel; Kariba, Robert; Mavenkeni, Busiso Olga; Hendre, Prasad; Song, Bo; Van Deynze, Allen; Massawe, Festo; Mayes, Sean
2016-06-01
Maximizing the research output from a limited investment is often the major challenge for minor and underutilized crops. However, such crops may be tolerant to biotic and abiotic stresses and are adapted to local, marginal, and low-input environments. Their development through breeding will provide an important resource for future agricultural system resilience and diversification in the context of changing climates and the need to achieve food security. The African Orphan Crops Consortium recognizes the values of genomic resources in facilitating the improvement of such crops. Prior to beginning genome sequencing there is a need for an assessment of line varietal purity and to estimate any residual heterozygosity. Here we present an example from bambara groundnut (Vigna subterranea (L.) Verdc.), an underutilized drought tolerant African legume. Two released varieties from Zimbabwe, identified as potential genotypes for whole genome sequencing (WGS), were genotyped with 20 species-specific SSR markers. The results indicate that the cultivars are actually a mix of related inbred genotypes, and the analysis allowed a strategy of single plant selection to be used to generate non-heterogeneous DNA for WGS. The markers also confirmed very low levels of heterozygosity within individual plants. The application of a pre-screen using co-dominant microsatellite markers is expected to substantially improve the genome assembly, compared to a cultivar bulking approach that could have been adopted.
Non-B DB: a database of predicted non-B DNA-forming motifs in mammalian genomes.
Cer, Regina Z; Bruce, Kevin H; Mudunuri, Uma S; Yi, Ming; Volfovsky, Natalia; Luke, Brian T; Bacolla, Albino; Collins, Jack R; Stephens, Robert M
2011-01-01
Although the capability of DNA to form a variety of non-canonical (non-B) structures has long been recognized, the overall significance of these alternate conformations in biology has only recently become accepted en masse. In order to provide access to genome-wide locations of these classes of predicted structures, we have developed non-B DB, a database integrating annotations and analysis of non-B DNA-forming sequence motifs. The database provides the most complete list of alternative DNA structure predictions available, including Z-DNA motifs, quadruplex-forming motifs, inverted repeats, mirror repeats and direct repeats and their associated subsets of cruciforms, triplex and slipped structures, respectively. The database also contains motifs predicted to form static DNA bends, short tandem repeats and homo(purine•pyrimidine) tracts that have been associated with disease. The database has been built using the latest releases of the human, chimp, dog, macaque and mouse genomes, so that the results can be compared directly with other data sources. In order to make the data interpretable in a genomic context, features such as genes, single-nucleotide polymorphisms and repetitive elements (SINE, LINE, etc.) have also been incorporated. The database is accessed through query pages that produce results with links to the UCSC browser and a GBrowse-based genomic viewer. It is freely accessible at http://nonb.abcc.ncifcrf.gov.
Nielsen, Tue Kjærgaard; Rasmussen, Morten; Demanèche, Sandrine; Cecillon, Sébastien; Vogel, Timothy M.
2017-01-01
Bacterial degraders of chlorophenoxy herbicides have been isolated from various ecosystems, including pristine environments. Among these degraders, the sphingomonads constitute a prominent group that displays versatile xenobiotic-degradation capabilities. Four separate sequencing strategies were required to provide the complete sequence of the complex and plastic genome of the canonical chlorophenoxy herbicide-degrading Sphingobium herbicidovorans MH. The genome has an intricate organization of the chlorophenoxy-herbicide catabolic genes sdpA, rdpA, and cadABCD that encode the (R)- and (S)-enantiomer-specific 2,4-dichlorophenoxypropionate dioxygenases and four subunits of a Rieske non-heme iron oxygenase involved in 2-methyl-chlorophenoxyacetic acid degradation, respectively. Several major genomic rearrangements are proposed to help understand the evolution and mobility of these important genes and their genetic context. Single-strain mobilomic sequence analysis uncovered plasmids and insertion sequence-associated circular intermediates in this environmentally important bacterium and enabled the description of evolutionary models for pesticide degradation in strain MH and related organisms. The mobilome presented a complex mosaic of mobile genetic elements including four plasmids and several circular intermediate DNA molecules of insertion-sequence elements and transposons that are central to the evolution of xenobiotics degradation. Furthermore, two individual chromosomally integrated prophages were shown to excise and form free circular DNA molecules. This approach holds great potential for improving the understanding of genome plasticity, evolution, and microbial ecology. PMID:28961970
Ebner, Hubert; Hayn, Dieter; Falgenhauer, Markus; Nitzlnader, Michael; Schleiermacher, Gudrun; Haupt, Riccardo; Erminio, Giovanni; Defferrari, Raffaella; Mazzocco, Katia; Kohler, Jan; Tonini, Gian Paolo; Ladenstein, Ruth; Schreier, Guenter
2016-01-01
Data from two contexts, i.e. the European Unresectable Neuroblastoma (EUNB) clinical trial and results from comparative genomic hybridisation (CGH) analyses of the corresponding tumour samples, are to be provided to existing repositories for secondary use. Utilizing the European Unified Patient IDentity Management (EUPID) as developed in the course of the ENCCA project, the following processes were applied to the data: standardization (providing interoperability), pseudonymization (generating distinct but linkable pseudonyms for both contexts), and linking of both data sources. The applied procedures resulted in a joined dataset that did not contain any identifiers that would allow the records to be traced back to either data source. This provided a high degree of privacy to the involved patients as required by data protection regulations, without preventing proper analysis.
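The idea of "distinct but linkable pseudonyms" can be illustrated with a minimal sketch. This is not the EUPID algorithm, which the abstract does not describe; it assumes a trusted party holding a secret key and derives context-specific pseudonyms with a keyed hash (HMAC-SHA256). The function names, context labels and record fields are hypothetical placeholders.

```python
import hashlib
import hmac


def pseudonym(secret: bytes, context: str, patient_id: str) -> str:
    """Derive a context-specific pseudonym (assumed scheme, not EUPID's)."""
    msg = f"{context}:{patient_id}".encode("utf-8")
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()[:16]


def link_records(secret, patient_ids, trial_records, cgh_records):
    """Join both sources; only the holder of the secret can do this."""
    joined = []
    for pid in patient_ids:
        p_trial = pseudonym(secret, "trial", pid)
        p_cgh = pseudonym(secret, "cgh", pid)
        if p_trial in trial_records and p_cgh in cgh_records:
            # The joined row carries payload fields only, no identifier.
            joined.append({**trial_records[p_trial], **cgh_records[p_cgh]})
    return joined


SECRET = b"demo-only-secret"  # placeholder key for illustration
trial = {pseudonym(SECRET, "trial", "patient-001"): {"trial_field": "value-a"}}
cgh = {pseudonym(SECRET, "cgh", "patient-001"): {"cgh_field": "value-b"}}
joined = link_records(SECRET, ["patient-001"], trial, cgh)
```

Because the context label is part of the hashed message, the same patient receives unrelated-looking pseudonyms in the two sources, and the joined rows contain neither the original identifier nor either pseudonym.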
The amphioxus genome and the evolution of the chordate karyotype
DOE Office of Scientific and Technical Information (OSTI.GOV)
Putnam, Nicholas H.; Butts, Thomas; Ferrier, David E.K.
2008-04-01
Lancelets ('amphioxus') are the modern survivors of an ancient chordate lineage with a fossil record dating back to the Cambrian. We describe the structure and gene content of the highly polymorphic ~520 million base pair genome of the Florida lancelet Branchiostoma floridae, and analyze it in the context of chordate evolution. Whole genome comparisons illuminate the murky relationships among the three chordate groups (tunicates, lancelets, and vertebrates), and allow reconstruction of not only the gene complement of the last common chordate ancestor, but also a partial reconstruction of its genomic organization, as well as a description of two genome-wide duplications and subsequent reorganizations in the vertebrate lineage. These genome-scale events shaped the vertebrate genome and provided additional genetic variation for exploitation during vertebrate evolution.
Hazin, Ribhi; Brothers, Kyle B; Malin, Bradley A; Koenig, Barbara A; Sanderson, Saskia C; Rothstein, Mark A; Williams, Marc S; Clayton, Ellen W; Kullo, Iftikhar J
2013-10-01
The inclusion of genomic data in the electronic health record raises important ethical, legal, and social issues. In this article, we highlight these challenges and discuss potential solutions. We provide a brief background on the current state of electronic health records in the context of genomic medicine, discuss the importance of equitable access to genome-enabled electronic health records, and consider the potential use of electronic health records for improving genomic literacy in patients and providers. We highlight the importance of privacy, access, and security, and of determining which genomic information is included in the electronic health record. Finally, we discuss the challenges of reporting incidental findings, storing and reinterpreting genomic data, and nondocumentation and duty to warn family members at potential genetic risk.
Ali, S; Azfer, M A; Bashamboo, A; Mathur, P K; Malik, P K; Mathur, V B; Raha, A K; Ansari, S
1999-03-04
We have cloned and sequenced a 906bp EcoRI repeat DNA fraction from Rhinoceros unicornis genome. The contig pSS(R)2 is AT rich with 340 A (37.53%), 187 C (20.64%), 173 G (19.09%) and 206 T (22.74%). The sequence contains MALT box, NF-E1, Poly-A signal, lariat consensus sequences, TATA box, translational initiation sequences and several stop codons. Translation of the contig showed seven different types of protein motifs, among which, EGF-like domain cysteine pattern signatures and Bowman-Birk serine protease inhibitor family signatures were prominent. The presence of eukaryotic transcriptional elements, protein signatures and analysis of subset sequences in the 5' region from 1 to 165nt indicating coding potential (test code value=0.97) suggest possible regulatory and/or functional role(s) of these sequences in the rhino genome. Translation of the complementary strand from 906 to 706nt and 190 to 2nt showed proteins of more than 7kDa rich in non-polar residues. This suggests that pSS(R)2 is either a part of, or adjacent to, a functional gene. The contig contains mostly non-consecutive simple repeat units from 2 to 17nt with varying frequencies, of which four base motifs were found to be predominant. Zoo-blot hybridization revealed that pSS(R)2 sequences are unique to R. unicornis genome because they do not cross-hybridize, even with the genomic DNA of South African black rhino Diceros bicornis. Southern blot analysis of R. unicornis genomic DNA with pSS(R)2 and other synthetic oligo probes revealed a high level of genetic homogeneity, which was also substantiated by microsatellite associated sequence amplification (MASA). Owing to its uniqueness, the pSS(R)2 probe has a potential application in the area of conservation biology for unequivocal identification of horn or other body tissues of R. unicornis. The evolutionary aspect of this repeat fraction in the context of comparative genome analysis is discussed.
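The base composition reported for the 906 bp contig can be checked with simple arithmetic. The sketch below uses only the four counts stated in the abstract; the function name is illustrative.

```python
def base_composition(counts):
    """Return each base's share of the total, in percent (2 decimals)."""
    total = sum(counts.values())
    return {base: round(100 * n / total, 2) for base, n in counts.items()}


# Counts as stated in the abstract for contig pSS(R)2.
counts = {"A": 340, "C": 187, "G": 173, "T": 206}

length = sum(counts.values())          # 906
composition = base_composition(counts)  # A 37.53, C 20.64, G 19.09, T 22.74
at_content = round(100 * (counts["A"] + counts["T"]) / length, 2)  # 60.26
```

The counts sum to the stated contig length, the percentages match those in the abstract, and the combined A+T share of about 60% is consistent with the description of the contig as AT rich.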
From Biophysics to Evolutionary Genetics: Statistical Aspects of Gene Regulation
NASA Astrophysics Data System (ADS)
Lässig, Michael
Genomic functions often cannot be understood at the level of single genes but require the study of gene networks. This systems biology credo is nearly commonplace by now. Evidence comes from the comparative analysis of entire genomes: current estimates put, for example, the number of human genes at around 22,000, hardly more than the 14,000 of the fruit fly, and not even an order of magnitude higher than the 6,000 of baker's yeast. The complexity and diversity of higher animals, therefore, cannot be explained in terms of their gene numbers. If, however, a biological function requires the concerted action of several genes, and conversely, a gene takes part in several functional contexts, an organism may be defined less by its individual genes but by their interactions. The emerging picture of the genome as a strongly interacting system with many degrees of freedom brings new challenges for experiment and theory, many of which are of a statistical nature. And indeed, this picture continues to make the subject attractive to a growing number of statistical physicists.
Genome editing reveals a role for OCT4 in human embryogenesis.
Fogarty, Norah M E; McCarthy, Afshan; Snijders, Kirsten E; Powell, Benjamin E; Kubikova, Nada; Blakeley, Paul; Lea, Rebecca; Elder, Kay; Wamaitha, Sissy E; Kim, Daesik; Maciulyte, Valdone; Kleinjung, Jens; Kim, Jin-Soo; Wells, Dagan; Vallier, Ludovic; Bertero, Alessandro; Turner, James M A; Niakan, Kathy K
2017-10-05
Despite their fundamental biological and clinical importance, the molecular mechanisms that regulate the first cell fate decisions in the human embryo are not well understood. Here we use CRISPR-Cas9-mediated genome editing to investigate the function of the pluripotency transcription factor OCT4 during human embryogenesis. We identified an efficient OCT4-targeting guide RNA using an inducible human embryonic stem cell-based system and microinjection of mouse zygotes. Using these refined methods, we efficiently and specifically targeted the gene encoding OCT4 (POU5F1) in diploid human zygotes and found that blastocyst development was compromised. Transcriptomics analysis revealed that, in POU5F1-null cells, gene expression was downregulated not only for extra-embryonic trophectoderm genes, such as CDX2, but also for regulators of the pluripotent epiblast, including NANOG. By contrast, Pou5f1-null mouse embryos maintained the expression of orthologous genes, and blastocyst development was established, but maintenance was compromised. We conclude that CRISPR-Cas9-mediated genome editing is a powerful method for investigating gene function in the context of human development.
Biocuration at the Saccharomyces genome database.
Skrzypek, Marek S; Nash, Robert S
2015-08-01
Saccharomyces Genome Database is an online resource dedicated to managing information about the biology and genetics of the model organism, yeast (Saccharomyces cerevisiae). This information is derived primarily from scientific publications through a process of human curation that involves manual extraction of data and their organization into a comprehensive system of knowledge. This system provides a foundation for further analysis of experimental data coming from research on yeast as well as other organisms. In this review we will demonstrate how biocuration and biocurators add a key component, the biological context, to our understanding of how genes, proteins, genomes and cells function and interact. We will explain the role biocurators play in sifting through the wealth of biological data to incorporate and connect key information. We will also discuss the many ways we assist researchers with their various research needs. We hope to convince the reader that manual curation is vital in converting the flood of data into organized and interconnected knowledge, and that biocurators play an essential role in the integration of scientific information into a coherent model of the cell. © 2015 Wiley Periodicals, Inc.
Biocuration at the Saccharomyces Genome Database
Skrzypek, Marek S.; Nash, Robert S.
2015-01-01
Saccharomyces Genome Database is an online resource dedicated to managing information about the biology and genetics of the model organism, yeast (Saccharomyces cerevisiae). This information is derived primarily from scientific publications through a process of human curation that involves manual extraction of data and their organization into a comprehensive system of knowledge. This system provides a foundation for further analysis of experimental data coming from research on yeast as well as other organisms. In this review we will demonstrate how biocuration and biocurators add a key component, the biological context, to our understanding of how genes, proteins, genomes and cells function and interact. We will explain the role biocurators play in sifting through the wealth of biological data to incorporate and connect key information. We will also discuss the many ways we assist researchers with their various research needs. We hope to convince the reader that manual curation is vital in converting the flood of data into organized and interconnected knowledge, and that biocurators play an essential role in the integration of scientific information into a coherent model of the cell. PMID:25997651
Closing the gap between knowledge and clinical application: challenges for genomic translation.
Burke, Wylie; Korngiebel, Diane M
2015-01-01
Despite early predictions and rapid progress in research, the introduction of personal genomics into clinical practice has been slow. Several factors contribute to this translational gap between knowledge and clinical application. The evidence available to support genetic test use is often limited, and implementation of new testing programs can be challenging. In addition, the heterogeneity of genomic risk information points to the need for strategies to select and deliver the information most appropriate for particular clinical needs. Accomplishing these tasks also requires recognition that some expectations for personal genomics are unrealistic, notably expectations concerning the clinical utility of genomic risk assessment for common complex diseases. Efforts are needed to improve the body of evidence addressing clinical outcomes for genomics, apply implementation science to personal genomics, and develop realistic goals for genomic risk assessment. In addition, translational research should emphasize the broader benefits of genomic knowledge, including applications of genomic research that provide clinical benefit outside the context of personal genomic risk.
A site specific model and analysis of the neutral somatic mutation rate in whole-genome cancer data.
Bertl, Johanna; Guo, Qianyun; Juul, Malene; Besenbacher, Søren; Nielsen, Morten Muhlig; Hornshøj, Henrik; Pedersen, Jakob Skou; Hobolth, Asger
2018-04-19
Detailed modelling of the neutral mutational process in cancer cells is crucial for identifying driver mutations and understanding the mutational mechanisms that act during cancer development. The neutral mutational process is very complex: whole-genome analyses have revealed that the mutation rate differs between cancer types, between patients and along the genome depending on the genetic and epigenetic context. Therefore, methods that predict the number of different types of mutations in regions or specific genomic elements must consider local genomic explanatory variables. A major drawback of most methods is the need to average the explanatory variables across the entire region or genomic element. This procedure is particularly problematic if the explanatory variable varies dramatically in the element under consideration. To take into account the fine scale of the explanatory variables, we model the probabilities of different types of mutations for each position in the genome by multinomial logistic regression. We analyse 505 cancer genomes from 14 different cancer types and compare the performance in predicting mutation rate for both regional based models and site-specific models. We show that for 1000 randomly selected genomic positions, the site-specific model predicts the mutation rate much better than regional based models. We use a forward selection procedure to identify the most important explanatory variables. The procedure identifies site-specific conservation (phyloP), replication timing, and expression level as the best predictors for the mutation rate. Finally, our model confirms and quantifies certain well-known mutational signatures. We find that our site-specific multinomial regression model outperforms the regional based models. The possibility of including genomic variables on different scales and patient specific variables makes it a versatile framework for studying different mutational mechanisms. 
Our model can serve as the neutral null model for the mutational process; regions that deviate from the null model are candidates for elements that drive cancer development.
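A site-specific multinomial logistic regression of this kind can be sketched as follows. This is a minimal illustration, not the authors' implementation: the outcome classes, the covariate order (conservation, replication timing, expression) and all coefficient values are hypothetical, and "no mutation" is taken as the reference class with a logit of zero.

```python
import math


def site_mutation_probs(covariates, coefs):
    """Multinomial-logit probabilities for one genomic position.

    covariates: list of site-level explanatory variables.
    coefs: {mutation_class: (intercept, [weights])}; the reference
    class "none" has logit 0 by construction.
    """
    logits = {"none": 0.0}
    for cls, (intercept, weights) in coefs.items():
        logits[cls] = intercept + sum(w * x for w, x in zip(weights, covariates))
    top = max(logits.values())  # subtract the max for numerical stability
    expd = {cls: math.exp(v - top) for cls, v in logits.items()}
    norm = sum(expd.values())
    return {cls: e / norm for cls, e in expd.items()}


# Hypothetical coefficients for two mutation classes.
coefs = {
    "transition": (-6.0, [-0.3, 0.5, -0.2]),
    "transversion": (-7.0, [-0.2, 0.4, -0.1]),
}
# Hypothetical covariates: [conservation score, replication timing, expression].
probs = site_mutation_probs([1.2, 0.8, 0.5], coefs)
```

Evaluating the covariates at each position, rather than averaging them over a region, is what distinguishes the site-specific model from the regional models described in the abstract.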
Haspel, Richard L.; Olsen, Randall J.; Berry, Anna; Hill, Charles E.; Pfeifer, John D.; Schrijver, Iris; Kaul, Karen L.
2014-01-01
Context Genomic medicine is revolutionizing patient care. Physicians in areas as diverse as oncology, obstetrics, and infectious disease have begun using next-generation sequencing assays as standard diagnostic tools. Objective To review the role of pathologists in genomic testing as well as current educational programs and future training needs in genomic pathology. Data Sources Published literature as well as personal experience based on committee membership and genomic pathology curricular design. Conclusion Pathologists, as the directors of the clinical laboratories, must be prepared to integrate genomic testing into their practice. The pathology community has made significant progress in genomics-related education. A continued coordinated and proactive effort will ensure a future vital role for pathologists in the evolving health care system and also the best possible patient care. PMID:24678680
The interface of genomic technologies and nursing.
Loescher, Lois J; Merkle, Carrie J
2005-01-01
The aims of this article are (a) to summarize views of the interface of technology, genomic technology, and nursing; (b) to provide an overview of current and emerging genomic technologies; (c) to present clinical exemplars of uses of genomic technology in two disease conditions; and (d) to list genomic-focused nursing research on genomic technologies. The article discusses genomic technology in the context of nurses' views of technology, the importance of genomic technology for nurses, the link between the central dogma of molecular biology and state-of-the-art tests and assays, and nurses' current use of technologies. Human genome discoveries will continue to be an integral part of disease prevention, diagnosis, treatment, and management. These discoveries also have the potential for being integrated into nursing science. Genomic technologies are becoming a driving force in patient management, so that nurses will be unable to provide quality care without knowledge of the types of genomic technologies, the rationale for their use, and the possible sequelae that can result from genetic diagnosis or treatment. Many nurses already are using genomic technologies to conduct genomic-focused nursing research. The biobehavioral nature of much of this research further indicates the important contributions of nurses in genomics.
Wu, Xiao-Lin; Sun, Chuanyu; Beissinger, Timothy M; Rosa, Guilherme Jm; Weigel, Kent A; Gatti, Natalia de Leon; Gianola, Daniel
2012-09-25
Most Bayesian models for the analysis of complex traits are not analytically tractable and inferences are based on computationally intensive techniques. This is true of Bayesian models for genome-enabled selection, which uses whole-genome molecular data to predict the genetic merit of candidate animals for breeding purposes. In this regard, parallel computing can overcome the bottlenecks that can arise from series computing. Hence, a major goal of the present study is to bridge the gap to high-performance Bayesian computation in the context of animal breeding and genetics. Parallel Monte Carlo Markov chain algorithms and strategies are described in the context of animal breeding and genetics. Parallel Monte Carlo algorithms are introduced as a starting point including their applications to computing single-parameter and certain multiple-parameter models. Then, two basic approaches for parallel Markov chain Monte Carlo are described: one aims at parallelization within a single chain; the other is based on running multiple chains, yet some variants are discussed as well. Features and strategies of the parallel Markov chain Monte Carlo are illustrated using real data, including a large beef cattle dataset with 50K SNP genotypes. Parallel Markov chain Monte Carlo algorithms are useful for computing complex Bayesian models, which does not only lead to a dramatic speedup in computing but can also be used to optimize model parameters in complex Bayesian models. Hence, we anticipate that use of parallel Markov chain Monte Carlo will have a profound impact on revolutionizing the computational tools for genomic selection programs.
2012-01-01
Background Most Bayesian models for the analysis of complex traits are not analytically tractable and inferences are based on computationally intensive techniques. This is true of Bayesian models for genome-enabled selection, which uses whole-genome molecular data to predict the genetic merit of candidate animals for breeding purposes. In this regard, parallel computing can overcome the bottlenecks that can arise from series computing. Hence, a major goal of the present study is to bridge the gap to high-performance Bayesian computation in the context of animal breeding and genetics. Results Parallel Monte Carlo Markov chain algorithms and strategies are described in the context of animal breeding and genetics. Parallel Monte Carlo algorithms are introduced as a starting point including their applications to computing single-parameter and certain multiple-parameter models. Then, two basic approaches for parallel Markov chain Monte Carlo are described: one aims at parallelization within a single chain; the other is based on running multiple chains, yet some variants are discussed as well. Features and strategies of the parallel Markov chain Monte Carlo are illustrated using real data, including a large beef cattle dataset with 50K SNP genotypes. Conclusions Parallel Markov chain Monte Carlo algorithms are useful for computing complex Bayesian models, which does not only lead to a dramatic speedup in computing but can also be used to optimize model parameters in complex Bayesian models. Hence, we anticipate that use of parallel Markov chain Monte Carlo will have a profound impact on revolutionizing the computational tools for genomic selection programs. PMID:23009363
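The multiple-chain approach mentioned in the abstract can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' code: the target is a standard normal density, the sampler is a random-walk Metropolis algorithm, and a thread pool stands in for the parallel workers. In CPython a process pool or a cluster scheduler would be needed for a real speedup; threads are used here only to keep the sketch self-contained.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def log_target(x):
    """Log density of the toy target (standard normal, up to a constant)."""
    return -0.5 * x * x


def run_chain(seed, n_samples=5000, burn_in=500, step=1.0):
    """One independent random-walk Metropolis chain with its own RNG."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for i in range(n_samples + burn_in):
        proposal = x + rng.gauss(0.0, step)
        log_ratio = log_target(proposal) - log_target(x)
        # Accept with probability min(1, target(proposal) / target(x)).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        if i >= burn_in:
            samples.append(x)
    return samples


def run_parallel_chains(seeds):
    """Run one chain per seed concurrently and return all chains."""
    with ThreadPoolExecutor(max_workers=len(seeds)) as pool:
        return list(pool.map(run_chain, seeds))


chains = run_parallel_chains([1, 2, 3, 4])
pooled = [x for chain in chains for x in chain]
pooled_mean = sum(pooled) / len(pooled)
pooled_var = sum((x - pooled_mean) ** 2 for x in pooled) / len(pooled)
```

Each chain must pay its own burn-in cost, which is the main overhead of this strategy compared with parallelizing the computations within a single chain.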
2013-01-01
Background Sympatric species pairs are particularly common in freshwater fishes associated with postglacial lakes in northern temperate environments. The nature of divergences between co-occurring sympatric species, factors contributing to reproductive isolation and modes of genome evolution is a much debated topic in evolutionary biology addressed by various experimental tools. To the best of our knowledge, nobody has approached this field using molecular cytogenetics. We examined chromosomes and genomes of one postglacial species pair, sympatric European winter-spawning Coregonus albula and the local endemic dwarf-sized spring-spawning C. fontanae, both originating in Lake Stechlin. We have employed molecular cytogenetic tools to identify the genomic differences between the two species of the sympatric pair on the sub-chromosomal level of resolution. Results Fluorescence in situ hybridization (FISH) experiments consistently revealed a distinct variation in the copy number of loci of the major ribosomal DNA (the 45S unit) between C. albula and C. fontanae genomes. In C. fontanae, up to 40 chromosomes were identified to bear a part of the major ribosomal DNA, while in C. albula only 8–10 chromosomes possessed these genes. To determine how such extensive genome alteration might have arisen, a PCR screening for retrotransposons from genomic DNA of both species was performed. The amplified retrotransposon Rex1 was used as a probe for FISH mapping onto chromosomes of both species. These experiments showed a clear co-localization of the ribosomal DNA and the retrotransposon Rex1 in a pericentromeric region of one or two acrocentric chromosomes in both species. Conclusion We demonstrated genomic consequences of a rapid ecological speciation at a level undetectable by either sequence or karyotype analysis. 
We provide indirect evidence that ribosomal DNA probably utilized the spreading mechanism of retrotransposons, subsequently affecting recombination rates in both genomes and thus leading to rapid genome divergence. We attribute these extensive genome re-arrangements associated with the speciation event to stress-induced retrotransposon (re)activation. Such causal interplay between genome differentiation, retrotransposon (re)activation and environmental conditions may become a topic to be explored in a broader genomic context in future evolutionary studies. PMID:23410024
Tang, Guo-Qing; Maxwell, E. Stuart
2008-01-01
The amphibian Xenopus provides a model organism for investigating microRNA expression during vertebrate embryogenesis and development. Searching available Xenopus genome databases using known human pre-miRNAs as query sequences, more than 300 genes encoding 142 Xenopus tropicalis miRNAs were identified. Analysis of Xenopus tropicalis miRNA genes revealed a predominate positioning within introns of protein-coding and nonprotein-coding RNA Pol II-transcribed genes. MiRNA genes were also located in pre-mRNA exons and positioned intergenically between known protein-coding genes. Many miRNA species were found in multiple locations and in more than one genomic context. MiRNA genes were also clustered throughout the genome, indicating the potential for the cotranscription and coordinate expression of miRNAs located in a given cluster. Northern blot analysis confirmed the expression of many identified miRNAs in both X. tropicalis and X. laevis. Comparison of X. tropicalis and X. laevis blots revealed comparable expression profiles, although several miRNAs exhibited species-specific expression in different tissues. More detailed analysis revealed that for some miRNAs, the tissue-specific expression profile of the pri-miRNA precursor was distinctly different from that of the mature miRNA profile. Differential miRNA precursor processing in both the nucleus and cytoplasm was implicated in the observed tissue-specific differences. These observations indicated that post-transcriptional processing plays an important role in regulating miRNA expression in the amphibian Xenopus. PMID:18032731
Panczyk, Mariusz
2013-01-01
Nutrigenetics and nutrigenomics are now regarded as among the most important research areas for better understanding the impact of nutrition on human health. Because such research is interdisciplinary, its widespread acceptance and the practical clinical application of its results remain problematic. Understanding the new ideas and hypotheses published in nutrigenetics/nutrigenomics research requires some knowledge of genetics, biochemistry and molecular biology, of the capabilities and limitations associated with statistical and bioinformatic analysis, and above all of omics research technologies (genomics, transcriptomics, proteomics, metabolomics). Highly efficient genome and proteome analysis techniques make it possible to obtain the data necessary for profiling an individual patient. The main problem is still our insufficient knowledge of cell physiology and biochemistry. The vast amount of information obtained with omics technologies makes interpretation and inference difficult. An unquestionable advantage of this type of research is the possibility of using systems analysis (systems biology), which is important for a holistic interpretation of biological phenomena. This review is an attempt to present the main hypotheses and objectives pursued by researchers in nutrigenetics/nutrigenomics. This article describes the most important directions of research and the anticipated results related to the practical use of nutritional genomics, as well as a critical assessment of the possible impact of future developments on public health.
Opinion piece: genomics and crop plant science in Europe.
Hughes, Steve
2006-01-01
Recent report reviews and funding initiatives in the field of plant genomic research are considered in the context of their translation into practical and economic value via plant breeding. It is concluded that there is a deficit in investment and that a change in working styles towards knowledge sharing and connectivity is required.
Fast neutron mutants database and web displays at SoyBase
USDA-ARS's Scientific Manuscript database
SoyBase, the USDA-ARS soybean genetics and genomics database, has been expanded to include data for the fast neutron mutants produced by Bolon, Vance, et al. In addition to the expected text and sequence homology searches and visualization of the indels in the context of the genome sequence viewer, ...
The UCSC genome browser: what every molecular biologist should know.
Mangan, Mary E; Williams, Jennifer M; Kuhn, Robert M; Lathe, Warren C
2009-10-01
Electronic data resources can enable molecular biologists to query and display many useful features that make benchwork more efficient and drive new discoveries. The UCSC Genome Browser provides a wealth of data and tools that advance one's understanding of genomic context for many species, enable detailed understanding of data, and provide the ability to interrogate regions of interest. Researchers can also supplement the standard display with their own data to query and share with others. Effective use of these resources has become crucial to biological research today, and this unit describes some practical applications of the UCSC Genome Browser.
A decision tool to guide the ethics review of a challenging breed of emerging genomic projects.
Joly, Yann; So, Derek; Osien, Gladys; Crimi, Laura; Bobrow, Martin; Chalmers, Don; Wallace, Susan E; Zeps, Nikolajs; Knoppers, Bartha
2016-08-01
Recent projects conducted by the International Cancer Genome Consortium (ICGC) have raised the important issue of distinguishing quality assurance (QA) activities from research in the context of genomics. Research was historically defined as a systematic effort to expand a shared body of knowledge, whereas QA was defined as an effort to ascertain whether a specific project met desired standards. However, the two categories increasingly overlap due to advances in bioinformatics and the shift toward open science. As few ethics review policies take these changes into account, it is often difficult to determine the appropriate level of review. Mislabeling can result in unnecessary burdens for the investigators or, conversely, in underestimation of the risks to participants. Therefore, it is important to develop a consistent method of selecting the review process for genomics and bioinformatics projects. This paper begins by discussing two case studies from the ICGC, followed by a literature review on the distinction between QA and research and a comparative analysis of ethics review policies from Canada, the United States, the United Kingdom, and Australia. These results are synthesized into a novel two-step decision tool for researchers and policymakers, which uses traditional criteria to sort clearly defined activities while requiring the use of actual risk levels to decide more complex cases.
Abebe-Akele, Feseha; Tisa, Louis S; Cooper, Vaughn S; Hatcher, Philip J; Abebe, Eyualem; Thomas, W Kelley
2015-07-18
Entomopathogenic associations between nematodes in the genera Steinernema and Heterorhabditis with their cognate bacteria from the bacterial genera Xenorhabdus and Photorhabdus, respectively, are extensively studied for their potential as biological control agents against invasive insect species. These two highly coevolved associations are the result of convergent evolution. Given the natural abundance of bacteria, nematodes and insects, it is surprising that only these two associations with no intermediate forms are widely studied in the entomopathogenic context. Discovering analogous systems involving novel bacterial and nematode species would shed light on the evolutionary processes involved in the transition from free living organisms to obligatory partners in entomopathogenicity. We report the complete genome sequence of a new member of the enterobacterial genus Serratia that forms a putative entomopathogenic complex with Caenorhabditis briggsae. Analysis of the 5.04 MB chromosomal genome predicts 4599 protein coding genes, seven sets of ribosomal RNA genes, 84 tRNA genes and a 64.8 KB plasmid encoding 74 genes. Comparative genomic analysis with three of the previously sequenced Serratia species, S. marcescens DB11 and S. proteamaculans 568, and Serratia sp. AS12, revealed that these four representatives of the genus share a core set of ~3100 genes and extensive structural conservation. The newly identified species shares a more recent common ancestor with S. marcescens with 99% sequence identity in rDNA sequence and orthology across 85.6% of predicted genes. Of the 39 genes/operons implicated in virulence, symbiosis, recolonization, immune evasion and bioconversion, 21 (53.8%) were present in Serratia while 33 (84.6%) and 35 (89%) were present in Xenorhabdus and Photorhabdus EPN bacteria respectively. The majority of unique sequences in Serratia sp. SCBI (South African Caenorhabditis briggsae Isolate) are found in ~29 genomic islands of 5 to 65 genes and are enriched in putative functions that are biologically relevant to an entomopathogenic lifestyle, including non-ribosomal peptide synthetases, bacteriocins, fimbrial biogenesis, ushering proteins, toxins, secondary metabolite secretion and multiple drug resistance/efflux systems. By revealing the early stages of adaptation to this lifestyle, the Serratia sp. SCBI genome underscores the fact that in EPN formation the composite end result (killing, bioconversion, cadaver protection and recolonization) can be achieved by dissimilar mechanisms. This genome sequence will enable further study of the evolution of entomopathogenic nematode-bacteria complexes.
Pathway Analysis in Attention Deficit Hyperactivity Disorder: An Ensemble Approach
Mooney, Michael A.; McWeeney, Shannon K.; Faraone, Stephen V.; Hinney, Anke; Hebebrand, Johannes; Nigg, Joel T.; Wilmot, Beth
2016-01-01
Despite a wealth of evidence for the role of genetics in attention deficit hyperactivity disorder (ADHD), specific and definitive genetic mechanisms have not been identified. Pathway analyses, a subset of gene-set analyses, extend the knowledge gained from genome-wide association studies (GWAS) by providing functional context for genetic associations. However, there are numerous methods for association testing of gene sets and no real consensus regarding the best approach. The present study applied six pathway analysis methods to identify pathways associated with ADHD in two GWAS datasets from the Psychiatric Genomics Consortium. Methods that utilize genotypes to model pathway-level effects identified more replicable pathway associations than methods using summary statistics. In addition, pathways implicated by more than one method were significantly more likely to replicate. A number of brain-relevant pathways, such as RhoA signaling, glycosaminoglycan biosynthesis, fibroblast growth factor receptor activity, and pathways containing potassium channel genes, were nominally significant by multiple methods in both datasets. These results support previous hypotheses about the role of regulation of neurotransmitter release, neurite outgrowth and axon guidance in contributing to the ADHD phenotype and suggest the value of cross-method convergence in evaluating pathway analysis results. PMID:27004716
Aberration hubs in protein interaction networks highlight actionable targets in cancer.
Karimzadeh, Mehran; Jandaghi, Pouria; Papadakis, Andreas I; Trainor, Sebastian; Rung, Johan; Gonzàlez-Porta, Mar; Scelo, Ghislaine; Vasudev, Naveen S; Brazma, Alvis; Huang, Sidong; Banks, Rosamonde E; Lathrop, Mark; Najafabadi, Hamed S; Riazalhosseini, Yasser
2018-05-18
Despite efforts for extensive molecular characterization of cancer patients, such as the international cancer genome consortium (ICGC) and the cancer genome atlas (TCGA), the heterogeneous nature of cancer and our limited knowledge of the contextual function of proteins have complicated the identification of targetable genes. Here, we present Aberration Hub Analysis for Cancer (AbHAC) as a novel integrative approach to pinpoint aberration hubs, i.e. individual proteins that interact extensively with genes that show aberrant mutation or expression. Our analysis of the breast cancer data of the TCGA and the renal cancer data from the ICGC shows that aberration hubs are involved in relevant cancer pathways, including factors promoting cell cycle and DNA replication in basal-like breast tumors, and Src kinase and VEGF signaling in renal carcinoma. Moreover, our analysis uncovers novel functionally relevant and actionable targets, among which we have experimentally validated abnormal splicing of spleen tyrosine kinase as a key factor for cell proliferation in renal cancer. Thus, AbHAC provides an effective strategy to uncover novel disease factors that are only identifiable by examining mutational and expression data in the context of biological networks.
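The abstract above defines an aberration hub as a protein that interacts extensively with aberrant genes, but it does not say how "extensively" is scored. A minimal sketch, assuming a one-sided hypergeometric test on each protein's direct neighbours, could look as follows. The function names, the dictionary-based network representation, and the choice of test are all illustrative assumptions and are not taken from AbHAC itself.

```python
from math import comb
from typing import Dict, Set


def hypergeom_tail(k: int, n_draws: int, n_success: int, n_total: int) -> float:
    """P(X >= k) when drawing n_draws items without replacement from
    n_total items, of which n_success are 'successes'."""
    upper = min(n_draws, n_success)
    hits = sum(
        comb(n_success, x) * comb(n_total - n_success, n_draws - x)
        for x in range(k, upper + 1)
    )
    return hits / comb(n_total, n_draws)


def hub_scores(network: Dict[str, Set[str]], aberrant: Set[str]) -> Dict[str, float]:
    """For each protein (hypothetical input: protein -> set of interactors),
    return the p-value for seeing at least as many aberrant neighbours
    as observed, under random sampling of neighbours."""
    # Background: every gene that appears anywhere in the network.
    genes = set(network)
    for nbrs in network.values():
        genes |= nbrs

    scores = {}
    for protein, nbrs in network.items():
        nbrs = nbrs - {protein}              # ignore self-interactions
        background = genes - {protein}       # candidate neighbours
        k = len(nbrs & aberrant)             # observed aberrant neighbours
        scores[protein] = hypergeom_tail(
            k, len(nbrs), len(background & aberrant), len(background)
        )
    return scores
```

A small p-value marks a protein whose neighbourhood holds more aberrant genes than expected by chance. In practice the scores would also need multiple-testing correction, which this sketch omits.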
Precision Medicine: Functional Advancements.
Caskey, Thomas
2018-01-29
Precision medicine was conceptualized on the strength of genomic sequence analysis. High-throughput functional metrics have enhanced sequence interpretation and clinical precision. These technologies include metabolomics, magnetic resonance imaging, and I rhythm (cardiac monitoring), among others. These technologies are discussed and placed in clinical context for the medical specialties of internal medicine, pediatrics, obstetrics, and gynecology. Publications in these fields support the concept of a higher level of precision in identifying disease risk. Precise disease risk identification has the potential to enable intervention with greater specificity, resulting in disease prevention-an important goal of precision medicine.
Evolution and genome architecture in fungal plant pathogens.
Möller, Mareike; Stukenbrock, Eva H
2017-12-01
The fungal kingdom comprises some of the most devastating plant pathogens. Sequencing the genomes of fungal pathogens has shown a remarkable variability in genome size and architecture. Population genomic data enable us to understand the mechanisms and the history of changes in genome size and adaptive evolution in plant pathogens. Although transposable elements predominantly have negative effects on their host, fungal pathogens provide prominent examples of advantageous associations between rapidly evolving transposable elements and virulence genes that cause variation in virulence phenotypes. By providing homogeneous environments at large regional scales, managed ecosystems, such as modern agriculture, can be conducive for the rapid evolution and dispersal of pathogens. In this Review, we summarize key examples from fungal plant pathogen genomics and discuss evolutionary processes in pathogenic fungi in the context of molecular evolution, population genomics and agriculture.
Personal Genome Sequencing in Ostensibly Healthy Individuals and the PeopleSeq Consortium
Linderman, Michael D.; Nielsen, Daiva E.; Green, Robert C.
2016-01-01
Thousands of ostensibly healthy individuals have had their exome or genome sequenced, but a much smaller number of these individuals have received any personal genomic results from that sequencing. We term those projects in which ostensibly healthy participants can receive sequencing-derived genetic findings and may also have access to their genomic data as participatory predispositional personal genome sequencing (PPGS). Here we are focused on genome sequencing applied in a pre-symptomatic context and so define PPGS to exclude diagnostic genome sequencing intended to identify the molecular cause of suspected or diagnosed genetic disease. In this report we describe the design of completed and underway PPGS projects, briefly summarize the results reported to date and introduce the PeopleSeq Consortium, a newly formed collaboration of PPGS projects designed to collect much-needed longitudinal outcome data. PMID:27023617
Harnessing CRISPR-Cas systems for bacterial genome editing.
Selle, Kurt; Barrangou, Rodolphe
2015-04-01
Manipulation of genomic sequences facilitates the identification and characterization of key genetic determinants in the investigation of biological processes. Genome editing via clustered regularly interspaced short palindromic repeats (CRISPR)-CRISPR-associated (Cas) constitutes a next-generation method for programmable and high-throughput functional genomics. CRISPR-Cas systems are readily reprogrammed to induce sequence-specific DNA breaks at target loci, resulting in fixed mutations via host-dependent DNA repair mechanisms. Although bacterial genome editing is a relatively unexplored and underrepresented application of CRISPR-Cas systems, recent studies provide valuable insights for the widespread future implementation of this technology. This review summarizes recent progress in bacterial genome editing and identifies fundamental genetic and phenotypic outcomes of CRISPR targeting in bacteria, in the context of tool development, genome homeostasis, and DNA repair.
40 years of bovine IVF in the new genomic selection context.
Sirard, Marc-Andre
2018-04-10
The development of a complex technology such as in vitro fertilization (IVF) requires years of experimentation, sometimes comparing several species to learn how to create the right in vitro environment for oocytes, spermatozoa, and early embryos. At the same time, individual species characteristics such as gamete physiology and gamete interaction are recently evolved traits and must be analysed within the context of each species. In the last 40 years since the birth of Louise Brown, IVF techniques progressed and are now used in multiple domestic and non-domestic animal species around the world. This does not mean that the technology is completely matured or satisfactory; a number of problems remain to be solved and several procedures still need to be optimized. The development of IVF in cattle is particularly interesting since agriculture practices permitted the commercial development of the procedure and it is now used at a scale comparable to human IVF (millions of newborns). The genomic selection of young animals or even embryos combined with sexing and freezing technologies is driving a new era of IVF in the Dairy sector. The time has come for a retrospective analysis of the success and pitfalls of the last 40 years of bovine IVF and for the description of the challenges to overcome in the years to come.
Huang, Dandan; Yi, Xianfu; Zhang, Shijie; Zheng, Zhanye; Wang, Panwen; Xuan, Chenghao; Sham, Pak Chung; Wang, Junwen; Li, Mulin Jun
2018-05-16
Genome-wide association studies have generated thousands of susceptibility loci for many human complex traits, and yet for most of these associations the true causal variants remain unknown. Tissue/cell type-specific prediction and prioritization of non-coding regulatory variants will facilitate the identification of causal variants and underlying pathogenic mechanisms for particular complex diseases and traits. By leveraging recent large-scale functional genomics/epigenomics data, we develop an intuitive web server, GWAS4D (http://mulinlab.tmu.edu.cn/gwas4d or http://mulinlab.org/gwas4d), that systematically evaluates GWAS signals and identifies context-specific regulatory variants. The updated web server includes six major features: (i) updates the regulatory variant prioritization method with our new algorithm; (ii) incorporates 127 tissue/cell type-specific epigenomes; (iii) integrates motifs of 1480 transcriptional regulators from 13 public resources; (iv) uniformly processes Hi-C data and generates significant interactions at 5 kb resolution across 60 tissues/cell types; (v) adds comprehensive non-coding variant functional annotations; (vi) provides a highly interactive visualization function for SNP-target interaction. Using a GWAS fine-mapped set for 161 coronary artery disease risk loci, we demonstrate that GWAS4D is able to efficiently prioritize disease-causal regulatory variants.
Glinsky, Gennadi V
2018-03-01
Transposable elements have made major evolutionary impacts on creation of primate-specific and human-specific genomic regulatory loci and species-specific genomic regulatory networks (GRNs). Molecular and genetic definitions of human-specific changes to GRNs contributing to development of unique to human phenotypes remain a highly significant challenge. Genome-wide proximity placement analysis of diverse families of human-specific genomic regulatory loci (HSGRL) identified topologically associating domains (TADs) that are significantly enriched for HSGRL and designated rapidly evolving in human TADs. Here, the analysis of HSGRL, hESC-enriched enhancers, super-enhancers (SEs), and specific sub-TAD structures termed super-enhancer domains (SEDs) has been performed. In the hESC genome, 331 of 504 (66%) of SED-harboring TADs contain HSGRL and 68% of SEDs co-localize with HSGRL, suggesting that emergence of HSGRL may have rewired SED-associated GRNs within specific TADs by inserting novel and/or erasing existing non-coding regulatory sequences. Consequently, markedly distinct features of the principal regulatory structures of interphase chromatin evolved in the hESC genome compared to mouse: the SED quantity is 3-fold higher and the median SED size is significantly larger. Concomitantly, the overall TAD quantity is increased by 42% while the median TAD size is significantly decreased (p = 9.11E-37) in the hESC genome. Present analyses illustrate a putative global role for transposable elements and HSGRL in shaping the human-specific features of the interphase chromatin organization and functions, which are facilitated by accelerated creation of novel transcription factor binding sites and new enhancers driven by targeted placement of HSGRL at defined genomic coordinates. 
A trend toward the convergence of TAD and SED architectures of interphase chromatin in the hESC genome may reflect changes of 3D-folding patterns of linear chromatin fibers designed to enhance both regulatory complexity and functional precision of GRNs by creating predominantly a single gene (or a set of functionally linked genes) per regulatory domain structures. Collectively, present analyses reveal critical evolutionary contributions of transposable elements and distal enhancers to creation of thousands of primate- and human-specific elements of a chromatin folding code, which defines the 3D context of interphase chromatin, both restricting and facilitating biological functions of GRNs.
de Vries, Jantina; Munung, Syntia Nchangwi; Matimba, Alice; McCurdy, Sheryl; Ouwe Missi Oukem-Boyer, Odile; Staunton, Ciara; Yakubu, Aminu; Tindana, Paulina
2017-02-02
The introduction of genomics and biobanking methodologies to the African research context has also introduced novel ways of doing science, based on values of sharing and reuse of data and samples. This shift raises ethical challenges that need to be considered when research is reviewed by ethics committees, relating for instance to broad consent, the feedback of individual genetic findings, and regulation of secondary sample access and use. Yet existing ethics guidelines and regulations in Africa do not successfully regulate research based on sharing, causing confusion about what is allowed, where and when. In order to understand better the ethics regulatory landscape around genomic research and biobanking, we conducted a comprehensive analysis of existing ethics guidelines, policies and other similar sources. We sourced 30 ethics regulatory documents from 22 African countries. We used software that assists with qualitative data analysis to conduct a thematic analysis of these documents. Surprisingly considering how contentious broad consent is in Africa, we found that most countries allow the use of this consent model, with its use banned in only three of the countries we investigated. In a likely response to fears about exploitation, the export of samples outside of the continent is strictly regulated, sometimes in conjunction with regulations around international collaboration. We also found that whilst an essential and critical component of ensuring ethical best practice in genomics research relates to the governance framework that accompanies sample and data sharing, this was most sparingly covered in the guidelines. There is a need for ethics guidelines in African countries to be adapted to the changing science policy landscape, which increasingly supports principles of openness, storage, sharing and secondary use. 
Current guidelines are not pertinent to the ethical challenges that such a new orientation raises, and therefore fail to provide accurate guidance to ethics committees and researchers.
Divergent genome evolution caused by regional variation in DNA gain and loss between human and mouse
Kortschak, R. Daniel
2018-01-01
The forces driving the accumulation and removal of non-coding DNA and ultimately the evolution of genome size in complex organisms are intimately linked to genome structure and organisation. Our analysis provides a novel method for capturing the regional variation of lineage-specific DNA gain and loss events in their respective genomic contexts. To further understand this connection we used comparative genomics to identify genome-wide individual DNA gain and loss events in the human and mouse genomes. Focusing on the distribution of DNA gains and losses, relationships to important structural features and potential impact on biological processes, we found that in autosomes, DNA gains and losses both followed separate lineage-specific accumulation patterns. However, in both species chromosome X was particularly enriched for DNA gain, consistent with its high L1 retrotransposon content required for X inactivation. We found that DNA loss was associated with gene-rich open chromatin regions and DNA gain events with gene-poor closed chromatin regions. Additionally, we found that DNA loss events tended to be smaller than DNA gain events suggesting that they were able to accumulate in gene-rich open chromatin regions due to their reduced capacity to interrupt gene regulatory architecture. GO term enrichment showed that mouse loss hotspots were strongly enriched for terms related to developmental processes. However, these genes were also located in regions with a high density of conserved elements, suggesting that despite high levels of DNA loss, gene regulatory architecture remained conserved. This is consistent with a model in which DNA gain and loss results in turnover or “churning” in regulatory element dense regions of open chromatin, where interruption of regulatory elements is selected against. PMID:29677183
Inferring causal genomic alterations in breast cancer using gene expression data
2011-01-01
Background: One of the primary objectives in cancer research is to identify causal genomic alterations, such as somatic copy number variation (CNV) and somatic mutations, during tumor development. Many valuable studies lack genomic data to detect CNV; therefore, methods that are able to infer CNVs from gene expression data would help maximize the value of these studies. Results: We developed a framework for identifying recurrent regions of CNV and distinguishing the cancer driver genes from the passenger genes in the regions. By inferring CNV regions across many datasets we were able to identify 109 recurrent amplified/deleted CNV regions. Many of these regions are enriched for genes involved in important processes associated with tumorigenesis and cancer progression. Genes in these recurrent CNV regions were then examined in the context of gene regulatory networks to prioritize putative cancer driver genes. The cancer driver genes uncovered by the framework include not only well-known oncogenes but also a number of novel cancer susceptibility genes validated via siRNA experiments. Conclusions: To our knowledge, this is the first effort to systematically identify and validate drivers for expression-based CNV regions in breast cancer. The framework, in which wavelet analysis of expression-based copy number alteration is coupled with gene regulatory network analysis, provides a blueprint for leveraging genomic data to identify key regulatory components and gene targets. This integrative approach can be applied to many other large-scale gene expression studies and other novel types of cancer data such as next-generation sequencing based expression (RNA-Seq) as well as CNV data. PMID:21806811
Quan, Phenix-Lan; Junglen, Sandra; Tashmukhamedova, Alla; Conlan, Sean; Hutchison, Stephen K.; Kurth, Andreas; Ellerbrok, Heinz; Egholm, Michael; Briese, Thomas; Leendertz, Fabian H.; Ian Lipkin, W
2009-01-01
Characterization of arboviruses at the interface of pristine habitats and anthropogenic landscapes is crucial to comprehensive emergent disease surveillance and forecasting efforts. In the context of a surveillance campaign in and around a West African rainforest, particles morphologically consistent with rhabdoviruses were identified in cell cultures infected with homogenates of trapped mosquitoes. RNA recovered from these cultures was used to derive the first complete genome sequence of a rhabdovirus isolated from Culex decens mosquitoes in Côte d’Ivoire, tentatively named Moussa virus (MOUV). MOUV shows the classical genome organization of rhabdoviruses, with five open reading frames (ORF) in a linear order. However, sequences show only limited conservation (12–33% identity at amino acid level), and ORF2 and ORF3 have no significant similarity to sequences deposited in GenBank. Phylogenetic analysis indicates a potential new species with distant relationship to Tupaia and Tibrogargan virus. PMID:19804801
Resources, challenges and way forward in rare mitochondrial diseases research
Rajput, Neeraj Kumar; Singh, Vipin; Bhardwaj, Anshu
2015-01-01
Over 300 million people are affected by about 7000 rare diseases globally. There are tremendous resource limitations and challenges in driving research and drug development for rare diseases. Hence, innovative approaches are needed to identify potential solutions. This review focuses on the resources developed over the past years for analysis of genome data towards understanding disease biology especially in the context of mitochondrial diseases, given that mitochondria are central to major cellular pathways and their dysfunction leads to a broad spectrum of diseases. Platforms for collaboration of research groups, clinicians and patients and the advantages of community collaborative efforts in addressing rare diseases are also discussed. The review also describes crowdsourcing and crowdfunding efforts in rare diseases research and how the upcoming initiatives for understanding disease biology including analyses of large number of genomes are also applicable to rare diseases. PMID:26180633
Quan, Phenix-Lan; Junglen, Sandra; Tashmukhamedova, Alla; Conlan, Sean; Hutchison, Stephen K; Kurth, Andreas; Ellerbrok, Heinz; Egholm, Michael; Briese, Thomas; Leendertz, Fabian H; Lipkin, W Ian
2010-01-01
Characterization of arboviruses at the interface of pristine habitats and anthropogenic landscapes is crucial to comprehensive emergent disease surveillance and forecasting efforts. In the context of a surveillance campaign in and around a West African rainforest, particles morphologically consistent with rhabdoviruses were identified in cell cultures infected with homogenates of trapped mosquitoes. RNA recovered from these cultures was used to derive the first complete genome sequence of a rhabdovirus isolated from Culex decens mosquitoes in Côte d'Ivoire, tentatively named Moussa virus (MOUV). MOUV shows the classical genome organization of rhabdoviruses, with five open reading frames (ORF) in a linear order. However, sequences show only limited conservation (12-33% identity at amino acid level), and ORF2 and ORF3 have no significant similarity to sequences deposited in GenBank. Phylogenetic analysis indicates a potential new species with distant relationship to Tupaia and Tibrogargan virus.
Postdoctoral Fellow | Center for Cancer Research
A postdoctoral position is available in Dr. Efsun Arda’s Developmental Genomics Group within the Laboratory of Receptor Biology and Gene Expression Branch at the National Cancer Institute (NCI), National Institutes of Health (NIH). Our research is focused on understanding the regulatory networks that govern pancreas cell identity and function in the context of diabetes and cancer. The lab is highly interdisciplinary and uses state-of-the-art technologies to address outstanding questions in human pancreas biology. The appointment is renewed annually upon performance evaluation for a maximum of five years. The candidate will be fully funded by a competitive intramural Center for Cancer Research (CCR) fellowship. Other fellowship opportunities outside NIH are also available and applications will be supported. CCR provides a highly collaborative, enabling environment for research fellows with more than 40 core facilities ranging from bioinformatics and computing, chemistry and structural biology, flow cytometry, genomics, imaging and microscopy, pharmacology, proteomics and single cell analysis.
Douzery, Emmanuel J P; Scornavacca, Celine; Romiguier, Jonathan; Belkhir, Khalid; Galtier, Nicolas; Delsuc, Frédéric; Ranwez, Vincent
2014-07-01
Comparative genomic studies rely extensively on alignments of orthologous sequences. Yet, selecting, gathering, and aligning orthologous exons and protein-coding sequences (CDS) that are relevant for a given evolutionary analysis can be a difficult and time-consuming task. In this context, we developed OrthoMaM, a database of ORTHOlogous MAmmalian Markers describing the evolutionary dynamics of orthologous genes in mammalian genomes using a phylogenetic framework. Since its first release in 2007, OrthoMaM has regularly evolved, not only to include newly available genomes but also to incorporate up-to-date software in its analytic pipeline. This eighth release integrates the 40 complete mammalian genomes available in Ensembl v73 and provides alignments, phylogenies, evolutionary descriptor information, and functional annotations for 13,404 single-copy orthologous CDS and 6,953 long exons. The graphical interface makes it easy to explore OrthoMaM and identify markers with specific characteristics (e.g., taxa availability, alignment size, %G+C, evolutionary rate, chromosome location). It hence provides an efficient solution to sample preprocessed markers adapted to user-specific needs. OrthoMaM has proven to be a valuable resource for researchers interested in mammalian phylogenomics and evolutionary genomics, and has served as a source of benchmark empirical data sets in several methodological studies. OrthoMaM is available for browsing, query and complete or filtered downloads at http://www.orthomam.univ-montp2.fr/.
Grandi, Nicole; Cadeddu, Marta; Blomberg, Jonas; Tramontano, Enzo
2016-09-09
Human endogenous retroviruses (HERVs) are ancient sequences integrated in the germ line cells and vertically transmitted through the offspring, constituting about 8% of our genome. In time, HERVs accumulated mutations that compromised their coding capacity. A prominent exception is HERV-W locus 7q21.2, producing a functional Env protein (Syncytin-1) coopted for placental syncytiotrophoblast formation. While expression of HERV-W sequences has been investigated for their correlation to disease, an exhaustive description of the group composition and characteristics is still not available, and current HERV-W group information derives from studies published a few years ago that relied on the rough assemblies of the human genome available at that time. This hampers the comparison and correlation with current human genome assemblies. In the present work we identified and described in detail the distribution and genetic composition of 213 HERV-W elements. The bioinformatics analysis led to the characterization of several previously unreported features and provided a phylogenetic classification of two main subgroups with different age and structural characteristics. New facts on HERV-W genomic context of insertion and co-localization with sequences putatively involved in disease development are also reported. The present work is a detailed overview of the HERV-W contribution to the human genome and provides a robust genetic background useful to clarify the role of HERV-W in pathologies with poorly understood etiology, representing, to our knowledge, the most complete and exhaustive HERV-W dataset to date.
A segmentation/clustering model for the analysis of array CGH data.
Picard, F; Robin, S; Lebarbier, E; Daudin, J-J
2007-09-01
Microarray-CGH (comparative genomic hybridization) experiments are used to detect and map chromosomal imbalances. A CGH profile can be viewed as a succession of segments that represent homogeneous regions in the genome whose representative sequences share the same relative copy number on average. Segmentation methods constitute a natural framework for the analysis, but they do not provide a biological status for the detected segments. We propose a new model for this segmentation/clustering problem, combining a segmentation model with a mixture model. We present a new hybrid algorithm called dynamic programming-expectation maximization (DP-EM) to estimate the parameters of the model by maximum likelihood. This algorithm combines DP and the EM algorithm. We also propose a model selection heuristic to select the number of clusters and the number of segments. An example of our procedure is presented, based on publicly available data sets. We compare our method to segmentation methods and to hidden Markov models, and we show that the new segmentation/clustering model is a promising alternative that can be applied in the more general context of signal processing.
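As an illustrative aside to the record above, the dynamic-programming half of a segmentation approach can be sketched in a few lines. This is only a minimal sketch, assuming a plain least-squares cost per segment; it is not the DP-EM algorithm of the paper (the mixture model and the EM step that assign a biological status to segments are omitted), and the function name `segment_profile` is hypothetical.

```python
def segment_profile(values, n_segments):
    """Split `values` into `n_segments` contiguous segments that minimise
    the total within-segment sum of squared deviations from the segment mean.

    Returns a list of (start, end) index pairs, with `end` exclusive.
    Hypothetical helper for illustration; not the published DP-EM method.
    """
    n = len(values)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(values)")

    # Prefix sums of x and x^2 give the cost of any segment in O(1).
    s1, s2 = [0.0], [0.0]
    for v in values:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(i, j):
        """Sum of squared deviations from the mean for values[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[k][j]: minimal cost of splitting values[:j] into k segments.
    best = [[inf] * (n + 1) for _ in range(n_segments + 1)]
    back = [[0] * (n + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = best[k - 1][i] + cost(i, j)
                if candidate < best[k][j]:
                    best[k][j] = candidate
                    back[k][j] = i

    # Walk the back-pointers from the end to recover the breakpoints.
    bounds = []
    j = n
    for k in range(n_segments, 0, -1):
        i = back[k][j]
        bounds.append((i, j))
        j = i
    bounds.reverse()
    return bounds
```

For a toy log-ratio profile such as `[0.0, 0.1, -0.1, 1.0, 1.1, 0.9, 0.0, 0.05]` with three segments, the sketch separates the raised middle stretch from its flanks. The cubic-time loop is kept for readability; it is adequate for short profiles only.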
Xu, Xiang-Ru Shannon; Gantz, Valentino Matteo; Siomava, Natalia; Bier, Ethan
2017-12-23
The knirps (kni) locus encodes transcription factors required for induction of the L2 wing vein in Drosophila. Here, we employ diverse CRISPR/Cas9 genome editing tools to generate a series of targeted lesions within the endogenous cis-regulatory module (CRM) required for kni expression in the L2 vein primordium. Phenotypic analysis of these 'in locus' mutations based on both expression of Kni protein and adult wing phenotypes, reveals novel unexpected features of L2-CRM function including evidence for a chromosome pairing-dependent process that promotes transcription. We also demonstrate that self-propagating active genetic elements (CopyCat elements) can efficiently delete and replace the L2-CRM with orthologous sequences from other divergent fly species. Wing vein phenotypes resulting from these trans-species enhancer replacements parallel features of the respective donor fly species. This highly sensitive phenotypic readout of enhancer function in a native genomic context reveals novel features of CRM function undetected by traditional reporter gene analysis. © 2017, Xu et al.
Siomava, Natalia
2017-01-01
The knirps (kni) locus encodes transcription factors required for induction of the L2 wing vein in Drosophila. Here, we employ diverse CRISPR/Cas9 genome editing tools to generate a series of targeted lesions within the endogenous cis-regulatory module (CRM) required for kni expression in the L2 vein primordium. Phenotypic analysis of these ‘in locus’ mutations based on both expression of Kni protein and adult wing phenotypes, reveals novel unexpected features of L2-CRM function including evidence for a chromosome pairing-dependent process that promotes transcription. We also demonstrate that self-propagating active genetic elements (CopyCat elements) can efficiently delete and replace the L2-CRM with orthologous sequences from other divergent fly species. Wing vein phenotypes resulting from these trans-species enhancer replacements parallel features of the respective donor fly species. This highly sensitive phenotypic readout of enhancer function in a native genomic context reveals novel features of CRM function undetected by traditional reporter gene analysis. PMID:29274230
A comparative cellular and molecular biology of longevity database.
Stuart, Jeffrey A; Liang, Ping; Luo, Xuemei; Page, Melissa M; Gallagher, Emily J; Christoff, Casey A; Robb, Ellen L
2013-10-01
Discovering key cellular and molecular traits that promote longevity is a major goal of aging and longevity research. One experimental strategy is to determine which traits have been selected during the evolution of longevity in naturally long-lived animal species. This comparative approach has been applied to lifespan research for nearly four decades, yielding hundreds of datasets describing aspects of cell and molecular biology hypothesized to relate to animal longevity. Here, we introduce a Comparative Cellular and Molecular Biology of Longevity Database, available at ( http://genomics.brocku.ca/ccmbl/ ), as a compendium of comparative cell and molecular data presented in the context of longevity. This open access database will facilitate the meta-analysis of amalgamated datasets using standardized maximum lifespan (MLSP) data (from AnAge). The first edition contains over 800 data records describing experimental measurements of cellular stress resistance, reactive oxygen species metabolism, membrane composition, protein homeostasis, and genome homeostasis as they relate to vertebrate species MLSP. The purpose of this review is to introduce the database and briefly demonstrate its use in the meta-analysis of combined datasets.
Comparative genomics approaches to understanding and manipulating plant metabolism.
Bradbury, Louis M T; Niehaus, Tom D; Hanson, Andrew D
2013-04-01
Over 3000 genomes, including numerous plant genomes, are now sequenced. However, their annotation remains problematic as illustrated by the many conserved genes with no assigned function, vague annotations such as 'kinase', or even wrong ones. Around 40% of genes of unknown function that are conserved between plants and microbes are probably metabolic enzymes or transporters; finding functions for these genes is a major challenge. Comparative genomics has correctly predicted functions for many such genes by analyzing genomic context, and gene fusions, distributions and co-expression. Comparative genomics complements genetic and biochemical approaches to dissect metabolism, continues to increase in power and decrease in cost, and has a pivotal role in modeling and engineering by helping identify functions for all metabolic genes. Copyright © 2012 Elsevier Ltd. All rights reserved.
Advances in targeted genome editing.
Perez-Pinera, Pablo; Ousterout, David G; Gersbach, Charles A
2012-08-01
New technologies have recently emerged that enable targeted editing of genomes in diverse systems. This includes precise manipulation of gene sequences in their natural chromosomal context and addition of transgenes to specific genomic loci. This progress has been facilitated by advances in engineering targeted nucleases with programmable, site-specific DNA-binding domains, including zinc finger proteins and transcription activator-like effectors (TALEs). Recent improvements have enhanced nuclease performance, accelerated nuclease assembly, and lowered the cost of genome editing. These advances are driving new approaches to many areas of biotechnology, including biopharmaceutical production, agriculture, creation of transgenic organisms and cell lines, and studies of genome structure, regulation, and function. Genome editing is also being investigated in preclinical and clinical gene therapies for many diseases. Copyright © 2012 Elsevier Ltd. All rights reserved.
A mechanistic link between gene regulation and genome architecture in mammalian development.
Bonora, Giancarlo; Plath, Kathrin; Denholtz, Matthew
2014-08-01
The organization of chromatin within the nucleus and the regulation of transcription are tightly linked. Recently, mechanisms underlying this relationship have been uncovered. By defining the organizational hierarchy of the genome, determining changes in chromatin organization associated with changes in cell identity, and describing chromatin organization within the context of linear genomic features (such as chromatin modifications and transcription factor binding) and architectural proteins (including Cohesin, CTCF, and Mediator), a new paradigm in genome biology was established wherein genomes are organized around gene regulatory factors that govern cell identity. As such, chromatin organization plays a central role in establishing and maintaining cell state during development, with gene regulation and genome organization being mutually dependent effectors of cell identity. Copyright © 2014 Elsevier Ltd. All rights reserved.
Longworth, Michelle S; Laimins, Laimonis A
2004-04-01
The E7 oncoprotein of high-risk human papillomaviruses (HPVs) binds to and alters the action of cell cycle regulatory proteins such as members of the retinoblastoma (Rb) family of proteins as well as the histone deacetylases (HDACs). To examine the significance of the binding of E7 to HDACs in the viral life cycle, a mutational analysis of the E7 open reading frame was performed in the context of the complete HPV type 31 (HPV-31) genome. Human foreskin keratinocytes were transfected with wild-type HPV-31 genomes or HPV-31 genomes containing mutations in HDAC binding sequences as well as in the C-terminal zinc finger-like domain, and stable cell lines were isolated. All mutant genomes, except those with E7 mutations in the HDAC binding site, were found to be stably maintained extrachromosomally at an early passage following transfection. Upon further passage in culture, genomes containing mutations to the Rb binding domain as well as the zinc finger-like region quickly lost the ability to maintain episomal genomes. Genomes containing mutations abolishing E7 binding to HDACs or to Rb or mutations to the zinc finger-like motifs failed to extend the life span of transfected keratinocytes and caused cells to arrest at the same time as the untransfected keratinocytes. When induced to differentiate by suspension in methylcellulose, cells maintaining genomes with mutations in the Rb binding domain or the zinc finger-like motifs were impaired in their abilities to activate late viral functions. This study demonstrates that the interaction of E7 with HDACs and the integrity of the zinc finger-like motifs are essential for extending the life span of keratinocytes and for stable maintenance of viral genomes.
Ghosh, Sujoy; Vivar, Juan; Nelson, Christopher P; Willenborg, Christina; Segrè, Ayellet V; Mäkinen, Ville-Petteri; Nikpay, Majid; Erdmann, Jeannette; Blankenberg, Stefan; O'Donnell, Christopher; März, Winfried; Laaksonen, Reijo; Stewart, Alexandre F R; Epstein, Stephen E; Shah, Svati H; Granger, Christopher B; Hazen, Stanley L; Kathiresan, Sekar; Reilly, Muredach P; Yang, Xia; Quertermous, Thomas; Samani, Nilesh J; Schunkert, Heribert; Assimes, Themistocles L; McPherson, Ruth
2015-07-01
Genome-wide association studies have identified multiple genetic variants affecting the risk of coronary artery disease (CAD). However, individually these explain only a small fraction of the heritability of CAD and for most, the causal biological mechanisms remain unclear. We sought to obtain further insights into potential causal processes of CAD by integrating large-scale GWA data with expertly curated databases of core human pathways and functional networks. Using pathways (gene sets) from Reactome, we carried out a 2-stage gene set enrichment analysis strategy. From a meta-analyzed discovery cohort of 7 CAD genome-wide association study data sets (9889 cases/11 089 controls), nominally significant gene sets were tested for replication in a meta-analysis of 9 additional studies (15 502 cases/55 730 controls) from the Coronary ARtery DIsease Genome wide Replication and Meta-analysis (CARDIoGRAM) Consortium. A total of 32 of 639 Reactome pathways tested showed convincing association with CAD (replication P<0.05). These pathways resided in 9 of 21 core biological processes represented in Reactome, and included pathways relevant to extracellular matrix (ECM) integrity, innate immunity, axon guidance, and signaling by PDGF (platelet-derived growth factor), NOTCH, and the transforming growth factor-β/SMAD receptor complex. Many of these pathways had strengths of association comparable to those observed in lipid transport pathways. Network analysis of unique genes within the replicated pathways further revealed several interconnected functional and topologically interacting modules representing novel associations (eg, semaphorin-regulated axonal guidance pathway) besides confirming known processes (lipid metabolism). The connectivity in the observed networks was statistically significant compared with random networks (P<0.001).
Network centrality analysis (degree and betweenness) further identified genes (eg, NCAM1, FYN, FURIN, etc) likely to play critical roles in the maintenance and functioning of several of the replicated pathways. These findings provide novel insights into how genetic variation, interpreted in the context of biological processes and functional interactions among genes, may help define the genetic architecture of CAD. © 2015 American Heart Association, Inc.
Molecular biology of Group A Streptococcus and its implications in vaccine strategies.
Brahmadathan, N K
2017-01-01
Infections due to Streptococcus pyogenes and their complications are a problem of major concern in many countries, including India. Primary prophylaxis with benzathine penicillin is the key to control and prevent sequelae such as acute rheumatic fever and rheumatic heart disease (RF/RHD) or post-streptococcal glomerulonephritis (PSGN). Non-compliance with prophylaxis due to fear of injection and anaphylaxis is a major issue in RF/RHD control in India and leads to continued high prevalence of infection and post-streptococcal sequelae. Differing reports on the efficacy of two weekly, three weekly or monthly injections raise questions on the actual dosages to be administered. Availability of more effective antibiotics with better dosages has replaced the use of penicillin; hence, companies are reluctant to manufacture penicillin preparations in India. It is in this context that a concept of a Group A streptococci vaccine is looked at and whether or not a globally designed vaccine will be useful in the Indian context. Modern molecular techniques and genomic analysis of S. pyogenes have identified many molecules as vaccine candidates among which the M-protein has attracted the most attention. High diversity of M (emm) types in endemic regions raises questions about the efficacy of such a vaccine. A recent 30-valent M-protein-based vaccine that elicits antibodies to homologous as well as non-vaccine M types looks promising. This review will discuss the genomics of S. pyogenes, the various candidate vaccine molecules and highlight their efficacy in the Indian context where control of post-streptococcal sequelae remains a challenge.
Publics and vaccinomics: beyond public understanding of science.
Einsiedel, Edna F
2011-09-01
Vaccines have been among the most effective tools for addressing global public health challenges. With the advent of genomics, novel approaches for vaccine discovery are opening up new opportunities for vaccine development and applications, particularly with the expectation of personalized vaccines and the possibility of addressing a broader range of infectious diseases. In this context, it is useful to reflect on the social contexts of vaccine development as these have been influenced by social, ethical, political challenges. This article discusses the historical context of vaccine controversies and factors that help explain public acceptance and resistance, illustrating that these challenges go well beyond simple public misunderstandings. The broader vaccine challenges evident along the innovation trajectory, from development to commercialization and implementation include problems in research and development, organizational issues, and legal and regulatory challenges that may collectively contribute to public resistance or confidence. The recent history of genomics provides further lessons that the developing field of vaccinomics can learn from.
A Genome-Wide Linkage Scan for Age at Menarche in Three Populations of European Descent
Anderson, Carl A.; Zhu, Gu; Falchi, Mario; van den Berg, Stéphanie M.; Treloar, Susan A.; Spector, Timothy D.; Martin, Nicholas G.; Boomsma, Dorret I.; Visscher, Peter M.; Montgomery, Grant W.
2008-01-01
Context: Age at menarche (AAM) is an important trait both biologically and socially, a clearly defined event in female pubertal development, and has been associated with many clinically significant phenotypes. Objective: The objective of the study was to identify genetic loci influencing variation in AAM in large population-based samples from three countries. Design/Participants: Recalled AAM data were collected from 13,697 individuals and 4,899 pseudoindependent sister-pairs from three different populations (Australia, The Netherlands, and the United Kingdom) by mailed questionnaire or interview. Genome-wide variance components linkage analysis was implemented on each sample individually and in combination. Results: The mean, sd, and heritability of AAM across the three samples was 13.1 yr, 1.5 yr, and 0.69, respectively. No loci were detected that reached genome-wide significance in the combined analysis, but a suggestive locus was detected on chromosome 12 (logarithm of the odds = 2.0). Three loci of suggestive significance were seen in the U.K. sample on chromosomes 1, 4, and 18 (logarithm of the odds = 2.4, 2.2 and 3.2, respectively). Conclusions: There was no evidence for common highly penetrant variants influencing AAM. Linkage and association suggest that one trait locus for AAM is located on chromosome 12, but further studies are required to replicate these results. PMID:18647812
DOE Office of Scientific and Technical Information (OSTI.GOV)
Krishnakumar, Raga; Sinha, Anupama; Bird, Sara W.; ...
2018-02-16
Emerging sequencing technologies are allowing us to characterize environmental, clinical and laboratory samples with increasing speed and detail, including real-time analysis and interpretation of data. One example of this is being able to rapidly and accurately detect a wide range of pathogenic organisms, both in the clinic and the field. Genomes can have radically different GC content, however, such that accurate sequence analysis can be challenging depending upon the technology used. Here, we have characterized the performance of the Oxford MinION nanopore sequencer for detection and evaluation of organisms with a range of genomic nucleotide bias. We have diagnosed the quality of base-calling across individual reads and discovered that the position within the read affects base-calling and quality scores. Finally, we have evaluated the performance of the current state-of-the-art neural network-based MinION basecaller, characterizing its behavior with respect to systemic errors as well as context- and sequence-specific errors. Overall, we present a detailed characterization of the capabilities of the MinION in terms of generating high-accuracy sequence data from genomes with a wide range of nucleotide content. This study provides a framework for designing the appropriate experiments that are likely to lead to accurate and rapid field-forward diagnostics.
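The nucleotide bias discussed in the record above is usually summarised as GC content. The following is a minimal sketch of how it might be computed for a read or genome fragment, both overall and in fixed-size windows along the sequence; the function names `gc_fraction` and `windowed_gc` are hypothetical and are not taken from the study or from any MinION software.

```python
def gc_fraction(sequence):
    """Return the fraction of G and C among unambiguous bases (A, C, G, T).

    Ambiguous symbols such as N are ignored. Returns None when the
    sequence contains no unambiguous base at all.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        return None
    return gc / (gc + at)


def windowed_gc(sequence, window):
    """GC fraction for consecutive, non-overlapping windows of `window` bases.

    The last window may be shorter than `window`.
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")
    return [
        gc_fraction(sequence[start:start + window])
        for start in range(0, len(sequence), window)
    ]
```

Windowed values make it possible to relate local base composition to position within a read, which is the kind of positional effect the abstract reports for base-calling quality.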
The UK’s 100,000 Genomes Project: manifesting policymakers’ expectations
Samuel, Gabrielle Natalie; Farsides, Bobbie
2017-01-01
The UK’s 100,000 Genomes Project has the aim of sequencing 100,000 genomes from UK National Health Service (NHS) patients while concomitantly transforming clinical care such that whole genome sequencing becomes routine clinical practice in the UK. Policymakers claim that the project will revolutionize NHS care. We wished to explore the 100,000 Genomes Project, and in particular, the extent to which policymaker claims have helped or hindered the work of those associated with Genomics England – the company established by the Department of Health to deliver the project. We interviewed 20 individuals linked to, or working for Genomics England. Interviewees had double-edged views about the context within which they were working. On the one hand, policymakers’ expectations attached to the venture were considered vacuous “genohype”; on the other hand, they were considered the impetus needed for those trying to advance genomic research into clinical practice. Findings should be considered for future genomes projects. PMID:29238265
Evolution and Diversity of Transposable Elements in Vertebrate Genomes.
Sotero-Caio, Cibele G; Platt, Roy N; Suh, Alexander; Ray, David A
2017-01-01
Transposable elements (TEs) are selfish genetic elements that mobilize in genomes via transposition or retrotransposition and often make up large fractions of vertebrate genomes. Here, we review the current understanding of vertebrate TE diversity and evolution in the context of recent advances in genome sequencing and assembly techniques. TEs make up 4-60% of assembled vertebrate genomes, and deeply branching lineages such as ray-finned fishes and amphibians generally exhibit a higher TE diversity than the more recent radiations of birds and mammals. Furthermore, the list of taxa with exceptional TE landscapes is growing. We emphasize that the current bottleneck in genome analyses lies in the proper annotation of TEs and provide examples where superficial analyses led to misleading conclusions about genome evolution. Finally, recent advances in long-read sequencing will soon permit access to TE-rich genomic regions that previously resisted assembly including the gigantic, TE-rich genomes of salamanders and lungfishes. © The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Nielsen, Tue Kjærgaard; Rasmussen, Morten; Demanèche, Sandrine; Cecillon, Sébastien; Vogel, Timothy M; Hansen, Lars Hestbjerg
2017-09-01
Bacterial degraders of chlorophenoxy herbicides have been isolated from various ecosystems, including pristine environments. Among these degraders, the sphingomonads constitute a prominent group that displays versatile xenobiotic-degradation capabilities. Four separate sequencing strategies were required to provide the complete sequence of the complex and plastic genome of the canonical chlorophenoxy herbicide-degrading Sphingobium herbicidovorans MH. The genome has an intricate organization of the chlorophenoxy-herbicide catabolic genes sdpA, rdpA, and cadABCD that encode the (R)- and (S)-enantiomer-specific 2,4-dichlorophenoxypropionate dioxygenases and four subunits of a Rieske non-heme iron oxygenase involved in 2-methyl-chlorophenoxyacetic acid degradation, respectively. Several major genomic rearrangements are proposed to help understand the evolution and mobility of these important genes and their genetic context. Single-strain mobilomic sequence analysis uncovered plasmids and insertion sequence-associated circular intermediates in this environmentally important bacterium and enabled the description of evolutionary models for pesticide degradation in strain MH and related organisms. The mobilome presented a complex mosaic of mobile genetic elements including four plasmids and several circular intermediate DNA molecules of insertion-sequence elements and transposons that are central to the evolution of xenobiotics degradation. Furthermore, two individual chromosomally integrated prophages were shown to excise and form free circular DNA molecules. This approach holds great potential for improving the understanding of genome plasticity, evolution, and microbial ecology. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Analyzing the Role of MicroRNAs in Schizophrenia in the Context of Common Genetic Risk Variants.
Hauberg, Mads Engel; Roussos, Panos; Grove, Jakob; Børglum, Anders Dupont; Mattheisen, Manuel
2016-04-01
The recent implication of 108 genomic loci in schizophrenia marked a great advancement in our understanding of the disease. Against the background of its polygenic nature there is a necessity to identify how schizophrenia risk genes interplay. As regulators of gene expression, microRNAs (miRNAs) have repeatedly been implicated in schizophrenia etiology. It is therefore of interest to establish their role in the regulation of schizophrenia risk genes in disease-relevant biological processes. The objective was to examine the role of miRNAs in schizophrenia in the context of disease-associated genetic variation. The basis of this study was summary statistics from the largest schizophrenia genome-wide association study meta-analysis to date (83 550 individuals in a meta-analysis of 52 genome-wide association studies) completed in 2014 along with publicly available data for predicted miRNA targets. We examined whether schizophrenia risk genes were more likely to be regulated by miRNA. Further, we used gene set analyses to identify miRNAs that are regulators of schizophrenia risk genes. The main outcomes were results from association tests for miRNA targetomes and related analyses. In line with previous studies, we found that similar to other complex traits, schizophrenia risk genes were more likely to be regulated by miRNAs (P < 2 × 10⁻¹⁶). Further, the gene set analyses revealed several miRNAs regulating schizophrenia risk genes, with the strongest enrichment for targets of miR-9-5p (P = .0056 for enrichment among the top 1% most-associated single-nucleotide polymorphisms, corrected for multiple testing). It is further of note that MIR9-2 is located in a genomic region showing strong evidence for association with schizophrenia (P = 7.1 × 10⁻⁸). The second and third strongest gene set signals were seen for the targets of miR-485-5p and miR-137, respectively. This study provides evidence for a role of miR-9-5p in the etiology of schizophrenia.
Its implication is of particular interest as the functions of this neurodevelopmental miRNA tie in with established disease biology: it has a regulatory loop with the fragile X mental retardation homologue FXR1 and regulates dopamine D2 receptor density.
Solutions for data integration in functional genomics: a critical assessment and case study.
Smedley, Damian; Swertz, Morris A; Wolstencroft, Katy; Proctor, Glenn; Zouberakis, Michael; Bard, Jonathan; Hancock, John M; Schofield, Paul
2008-11-01
The torrent of data emerging from the application of new technologies to functional genomics and systems biology can no longer be contained within the traditional modes of data sharing and publication with the consequence that data is being deposited in, distributed across and disseminated through an increasing number of databases. The resulting fragmentation poses serious problems for the model organism community which increasingly rely on data mining and computational approaches that require gathering of data from a range of sources. In the light of these problems, the European Commission has funded a coordination action, CASIMIR (coordination and sustainability of international mouse informatics resources), with a remit to assess the technical and social aspects of database interoperability that currently prevent the full realization of the potential of data integration in mouse functional genomics. In this article, we assess the current problems with interoperability, with particular reference to mouse functional genomics, and critically review the technologies that can be deployed to overcome them. We describe a typical use-case where an investigator wishes to gather data on variation, genomic context and metabolic pathway involvement for genes discovered in a genome-wide screen. We go on to develop an automated approach involving an in silico experimental workflow tool, Taverna, using web services, BioMart and MOLGENIS technologies for data retrieval. Finally, we focus on the current impediments to adopting such an approach in a wider context, and strategies to overcome them.
Lopes, Anne; Amarir-Bouhram, Jihane; Faure, Guilhem; Petit, Marie-Agnès; Guerois, Raphaël
2010-01-01
Homologous recombination is a key contributor to bacteriophage genome repair, circularization and replication. No less than six kinds of recombinase genes have been reported so far in bacteriophage genomes, two (UvsX and Gp2.5) from virulent, and four (Sak, Redβ, Erf and Sak4) from temperate phages. Using profile–profile comparisons, structure-based modelling and gene-context analyses, we provide new views on the global landscape of recombinases in 465 bacteriophages. We show that Sak, Redβ and Erf belong to a common large superfamily adopting a shortcut Rad52-like fold. Remote homologs of Sak4 are predicted to adopt a shortcut Rad51/RecA fold and are found to be widespread among phage genomes. Unexpectedly, within temperate phages, gene-context analyses also pinpointed the presence of distant Gp2.5 homologs, believed to be restricted to virulent phages. All in all, three major superfamilies of phage recombinases emerged, related to either Rad52-like, Rad51-like or Gp2.5-like proteins. For two newly detected recombinases belonging to the Sak4 and Gp2.5 families, we provide experimental evidence of their recombination activity in vivo. Temperate versus virulent lifestyle together with the importance of genome mosaicism is discussed in the light of these novel recombinases. Screening for these recombinases in genomes can be performed at http://biodev.extra.cea.fr/virfam. PMID:20194117
The UCSC Genome Browser: What Every Molecular Biologist Should Know
Mangan, Mary E.; Williams, Jennifer M.; Kuhn, Robert M.; Lathe, Warren C.
2016-01-01
Electronic data resources can enable molecular biologists to query and display many useful features that make benchwork more efficient and drive new discoveries. The UCSC Genome Browser provides a wealth of data and tools that advance one’s understanding of genomic context for many species, enable detailed understanding of data, and provide the ability to interrogate regions of interest. Researchers can also supplement the standard display with their own data to query and share with others. Effective use of these resources has become crucial to biological research today, and this unit describes some practical applications of the UCSC Genome Browser. PMID:19816931
Ethical, legal, social, and policy issues in the use of genomic technology by the U.S. Military
Mehlman, Maxwell J.; Li, Tracy Yeheng
2014-01-01
Advances in genomic science are attracting the interest of the U.S. military for their potential to improve medical care for members of the military and to aid in military recruitment, training, specialization, and mission accomplishment. While researchers have explored the ethical, legal, and social issues raised by the use of genomic science in a wide variety of contexts, there has been virtually no examination of these issues in connection with the use of genomics by the military. This article identifies potential uses of genomic science by the military, proposes an applicable ethical and legal framework, and applies the framework to provide ethical and legal guidance for military decision-makers. PMID:25937933
Regulatory RNA-assisted genome engineering in microorganisms.
Si, Tong; HamediRad, Mohammad; Zhao, Huimin
2015-12-01
Regulatory RNAs are increasingly recognized and utilized as key modulators of gene expression in diverse organisms. Thanks to their modular and programmable nature, trans-acting regulatory RNAs are especially attractive in genome-scale applications. Here we discuss recent examples in microbial genome engineering implementing various trans-acting RNA platforms, including sRNA, RNAi, asRNA and CRISPR-Cas. In particular, we focus on how the scalable and multiplex nature of trans-acting RNAs has been used to tackle the challenges in creating genome-wide and combinatorial diversity for functional genomics and metabolic engineering applications. Advances in computational design and context-dependent regulation are also discussed for their contribution in improving fine-tuning capabilities of trans-acting RNAs. Copyright © 2015 Elsevier Ltd. All rights reserved.
A lncRNA Perspective into (Re)Building the Heart.
Frank, Stefan; Aguirre, Aitor; Hescheler, Juergen; Kurian, Leo
2016-01-01
Our conception of the human genome, long focused on the 2% that codes for proteins, has profoundly changed since its first draft assembly in 2001. Since then, an unanticipatedly expansive functionality and convolution has been attributed to the majority of the genome that is transcribed in a cell-type/context-specific manner into transcripts with no apparent protein coding ability. While the majority of these transcripts, currently annotated as long non-coding RNAs (lncRNAs), are functionally uncharacterized, their prominent role in embryonic development and tissue homeostasis, especially in the context of the heart, is emerging. In this review, we summarize and discuss the latest advances in understanding the relevance of lncRNAs in (re)building the heart.
Expertise for Teaching Biology Situated in the Context of Genetic Testing
ERIC Educational Resources Information Center
van der Zande, Paul; Akkerman, Sanne F.; Brekelmans, Mieke; Waarlo, Arend Jan; Vermunt, Jan D.
2012-01-01
Contemporary genomics research will impact the daily practice of biology teachers who want to teach up-to-date genetics in secondary education. This article reports on a research project aimed at enhancing biology teachers' expertise for teaching genetics situated in the context of genetic testing. The increasing body of scientific knowledge…
Context-based retrieval of functional modules in protein-protein interaction networks.
Dobay, Maria Pamela; Stertz, Silke; Delorenzi, Mauro
2017-03-27
Various techniques have been developed for identifying the most probable interactants of a protein under a given biological context. In this article, we dissect the effects of the choice of the protein-protein interaction network (PPI) and the manipulation of PPI settings on the network neighborhood of the influenza A virus (IAV) network, as well as hits in genome-wide small interfering RNA screen results for IAV host factors. We investigate the potential of context filtering, which uses text mining evidence linked to PPI edges, as a complement to the edge confidence scores typically provided in PPIs for filtering, for obtaining more biologically relevant network neighborhoods. Here, we estimate the maximum performance of context filtering to isolate a Kyoto Encyclopedia of Genes and Genomes (KEGG) network Ki from a union of KEGG networks and its network neighborhood. The work gives insights on the use of human PPIs in network neighborhood approaches for functional inference. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Wasabi: An Integrated Platform for Evolutionary Sequence Analysis and Data Visualization.
Veidenberg, Andres; Medlar, Alan; Löytynoja, Ari
2016-04-01
Wasabi is an open source, web-based environment for evolutionary sequence analysis. Wasabi visualizes sequence data together with a phylogenetic tree within a modern, user-friendly interface: The interface hides extraneous options, supports context sensitive menus, drag-and-drop editing, and displays additional information, such as ancestral sequences, associated with specific tree nodes. The Wasabi environment supports reproducibility by automatically storing intermediate analysis steps and includes built-in functions to share data between users and publish analysis results. For computational analysis, Wasabi supports PRANK and PAGAN for phylogeny-aware alignment and alignment extension, and it can be easily extended with other tools. Along with drag-and-drop import of local files, Wasabi can access remote data through URL and import sequence data, GeneTrees and EPO alignments directly from Ensembl. To demonstrate a typical workflow using Wasabi, we reproduce key findings from recent comparative genomics studies, including a reanalysis of the EGLN1 gene from the tiger genome study: These case studies can be browsed within Wasabi at http://wasabiapp.org:8000?id=usecases. Wasabi runs inside a web browser and does not require any installation. One can start using it at http://wasabiapp.org. All source code is licensed under the AGPLv3. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
The path to enlightenment: making sense of genomic and proteomic information.
Maurer, Martin H
2004-05-01
Whereas genomics describes the study of the genome, mainly represented by its gene expression on the DNA or RNA level, the term proteomics denotes the study of the proteome, which is the protein complement encoded by the genome. In recent years, the number of proteomic experiments has increased tremendously. While all fields of proteomics have made major technological advances, the biggest step was seen in bioinformatics. Biological information management relies on sequence and structure databases and powerful software tools to translate experimental results into meaningful biological hypotheses and answers. In this resource article, I provide a collection of databases and software available on the Internet that are useful to interpret genomic and proteomic data. The article is a toolbox for researchers who have genomic or proteomic datasets and need to put their findings into a biological context.
Reyes-Velasco, Jacobo; Card, Daren C; Andrew, Audra L; Shaney, Kyle J; Adams, Richard H; Schield, Drew R; Casewell, Nicholas R; Mackessy, Stephen P; Castoe, Todd A
2015-01-01
Snake venom gene evolution has been studied intensively over the past several decades, yet most previous studies have lacked the context of complete snake genomes and the full context of gene expression across diverse snake tissues. We took a novel approach to studying snake venom evolution by leveraging the complete genome of the Burmese python, including information from tissue-specific patterns of gene expression. We identified the orthologs of snake venom genes in the python genome, and conducted detailed analysis of gene expression of these venom homologs to identify patterns that differ between snake venom gene families and all other genes. We found that venom gene homologs in the python are expressed in many different tissues outside of oral glands, which illustrates the pitfalls of using transcriptomic data alone to define "venom toxins." We hypothesize that the python may represent an ancestral state prior to major venom development, which is supported by our finding that the expansion of venom gene families is largely restricted to highly venomous caenophidian snakes. Therefore, the python provides insight into biases in which genes were recruited for snake venom systems. Python venom homologs are generally expressed at lower levels, have higher variance among tissues, and are expressed in fewer organs compared with all other python genes. We propose a model for the evolution of snake venoms in which venom genes are recruited preferentially from genes with particular expression profile characteristics, which facilitate a nearly neutral transition toward specialized venom system expression. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Shinkai, Yoichi; Kuramochi, Masahiro; Doi, Motomichi
2018-05-03
Recently, advances in next-generation sequencing technologies have enabled genome-wide analyses of epigenetic modifications; however, it remains difficult to analyze the states of histone modifications at a single-cell resolution in living multicellular organisms because of the heterogeneity within cellular populations. Here we describe a simple method to visualize histone modifications on the specific sequence of target locus at a single-cell resolution in living Caenorhabditis elegans, by combining the LacO/LacI system and a genetically-encoded H4K20me1-specific probe, "mintbody". We demonstrate that Venus-labeled mintbody and mTurquoise2-labeled LacI can co-localize on an artificial chromosome carrying both the target locus and LacO sequences, where H4K20me1 marks the target locus. We demonstrate that our visualization method can precisely detect H4K20me1 depositions on the her-1 gene sequences on the artificial chromosome, to which the dosage compensation complex binds to regulate sex determination. The degree of H4K20me1 deposition on the her-1 sequences on the artificial chromosome correlated strongly with sex, suggesting that, using the artificial chromosome, this method can reflect context-dependent changes of H4K20me1 on endogenous genomes. Furthermore, we demonstrate live imaging of H4K20me1 depositions on the artificial chromosome. Combined with ChIP assays, this mintbody-LacO/LacI visualization method will enable analysis of developmental and context-dependent alterations of locus-specific histone modifications in specific cells and elucidation of the underlying molecular mechanisms. Copyright © 2018, G3: Genes, Genomes, Genetics.
Insights into structural variations and genome rearrangements in prokaryotic genomes.
Periwal, Vinita; Scaria, Vinod
2015-01-01
Structural variations (SVs) are genomic rearrangements that affect fairly large fragments of DNA. Most of the SVs such as inversions, deletions and translocations have been largely studied in context of genetic diseases in eukaryotes. However, recent studies demonstrate that genome rearrangements can also have profound impact on prokaryotic genomes, leading to altered cell phenotype. In contrast to single-nucleotide variations, SVs provide a much deeper insight into organization of bacterial genomes at a much better resolution. SVs can confer change in gene copy number, creation of new genes, altered gene expression and many other functional consequences. High-throughput technologies have now made it possible to explore SVs at a much refined resolution in bacterial genomes. Through this review, we aim to highlight the importance of the less explored field of SVs in prokaryotic genomes and their impact. We also discuss its potential applicability in the emerging fields of synthetic biology and genome engineering where targeted SVs could serve to create sophisticated and accurate genome editing. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Biogeography of the Sulfolobus islandicus pan-genome
Reno, Michael L.; Held, Nicole L.; Fields, Christopher J.; Burke, Patricia V.; Whitaker, Rachel J.
2009-01-01
Variation in gene content has been hypothesized to be the primary mode of adaptive evolution in microorganisms; however, very little is known about the spatial and temporal distribution of variable genes. Through population-scale comparative genomics of 7 Sulfolobus islandicus genomes from 3 locations, we demonstrate the biogeographical structure of the pan-genome of this species, with no evidence of gene flow between geographically isolated populations. The evolutionary independence of each population allowed us to assess genome dynamics over very recent evolutionary time, beginning ≈910,000 years ago. On this time scale, genome variation largely consists of recent strain-specific integration of mobile elements. Localized sectors of parallel gene loss are identified; however, the balance between the gain and loss of genetic material suggests that S. islandicus genomes acquire material slowly over time, primarily from closely related Sulfolobus species. Examination of the genome dynamics through population genomics in S. islandicus exposes the process of allopatric speciation in thermophilic Archaea and brings us closer to a generalized framework for understanding microbial genome evolution in a spatial context. PMID:19435847
Wei, Yingying; Wu, George; Ji, Hongkai
2013-05-01
Mapping genome-wide binding sites of all transcription factors (TFs) in all biological contexts is a critical step toward understanding gene regulation. The state-of-the-art technologies for mapping transcription factor binding sites (TFBSs) couple chromatin immunoprecipitation (ChIP) with high-throughput sequencing (ChIP-seq) or tiling array hybridization (ChIP-chip). These technologies have limitations: they are low-throughput with respect to surveying many TFs. Recent advances in genome-wide chromatin profiling, including development of technologies such as DNase-seq, FAIRE-seq and ChIP-seq for histone modifications, make it possible to predict in vivo TFBSs by analyzing chromatin features at computationally determined DNA motif sites. This promising new approach may allow researchers to monitor the genome-wide binding sites of many TFs simultaneously. In this article, we discuss various experimental design and data analysis issues that arise when applying this approach. Through a systematic analysis of the data from the Encyclopedia Of DNA Elements (ENCODE) project, we compare the predictive power of individual and combinations of chromatin marks using supervised and unsupervised learning methods, and evaluate the value of integrating information from public ChIP and gene expression data. We also highlight the challenges and opportunities for developing novel analytical methods, such as resolving the one-motif-multiple-TF ambiguity and distinguishing functional and non-functional TF binding targets from the predicted binding sites. The online version of this article (doi:10.1007/s12561-012-9066-5) contains supplementary material, which is available to authorized users.
Identification of 15 candidate structured noncoding RNA motifs in fungi by comparative genomics.
Li, Sanshu; Breaker, Ronald R
2017-10-13
With the development of rapid and inexpensive DNA sequencing, the genome sequences of more than 100 fungal species have been made available. This dataset provides an excellent resource for comparative genomics analyses, which can be used to discover genetic elements, including noncoding RNAs (ncRNAs). Bioinformatics tools similar to those used to uncover novel ncRNAs in bacteria, likewise, should be useful for searching fungal genomic sequences, and the relative ease of genetic experiments with some model fungal species could facilitate experimental validation studies. We have adapted a bioinformatics pipeline for discovering bacterial ncRNAs to systematically analyze many fungal genomes. This comparative genomics pipeline integrates information on conserved RNA sequence and structural features with alternative splicing information to reveal fungal RNA motifs that are candidate regulatory domains, or that might have other possible functions. A total of 15 prominent classes of structured ncRNA candidates were identified, including variant HDV self-cleaving ribozyme representatives, atypical snoRNA candidates, and possible structured antisense RNA motifs. Candidate regulatory motifs were also found associated with genes for ribosomal proteins, S-adenosylmethionine decarboxylase (SDC), amidase, and HexA protein involved in Woronin body formation. We experimentally confirm that the variant HDV ribozymes undergo rapid self-cleavage, and we demonstrate that the SDC RNA motif reduces the expression of SAM decarboxylase by translational repression. Furthermore, we provide evidence that several other motifs discovered in this study are likely to be functional ncRNA elements. Systematic screening of fungal genomes using a computational discovery pipeline has revealed the existence of a variety of novel structured ncRNAs. 
Genome contexts and similarities to known ncRNA motifs provide strong evidence for the biological and biochemical functions of some newly found ncRNA motifs. Although initial examinations of several motifs provide evidence for their likely functions, other motifs will require more in-depth analysis to reveal their functions.
CRISPR/Cas9: From Genome Engineering to Cancer Drug Discovery
Luo, Ji
2016-01-01
Advances in translational research are often driven by new technologies. The advent of microarrays, next-generation sequencing, proteomics and RNA interference (RNAi) have led to breakthroughs in our understanding of the mechanisms of cancer and the discovery of new cancer drug targets. The discovery of the bacterial clustered regularly interspaced palindromic repeat (CRISPR) system and its subsequent adaptation as a tool for mammalian genome engineering has opened up new avenues for functional genomics studies. This review will focus on the utility of CRISPR in the context of cancer drug target discovery. PMID:28603775
Ai, Yuncan; Ai, Hannan; Meng, Fanmei; Zhao, Lei
2013-01-01
No attention has been paid to comparing sets of genome sequences that cross genetic components and biological categories, with great divergence over a large size range. We define this as systematic comparative genomics and aim to develop the methodology. First, we create a method, GenomeFingerprinter, to unambiguously produce a set of three-dimensional coordinates from a sequence, followed by one three-dimensional plot and six two-dimensional trajectory projections, to illustrate the genome fingerprint of a given genome sequence. Second, we develop a set of concepts and tools, and thereby establish a method called the universal genome fingerprint analysis (UGFA). Particularly, we define the total genetic component configuration (TGCC) (including chromosome, plasmid, and phage) for describing a strain as a systematic unit, the universal genome fingerprint map (UGFM) of TGCC for differentiating strains as a universal system, and the systematic comparative genomics (SCG) for comparing a set of genomes crossing genetic components and biological categories. Third, we construct a method of quantitative analysis to compare two genomes by using the outcome dataset of genome fingerprint analysis. Specifically, we define the geometric center and its geometric mean for a given genome fingerprint map, followed by the Euclidean distance, the differentiate rate, and the weighted differentiate rate to quantitatively describe the difference between the two genomes being compared. Moreover, we demonstrate the applications through case studies on various genome sequences, giving tremendous insights into the critical issues in microbial genomics and taxonomy. We have created a method, GenomeFingerprinter, for rapidly computing, geometrically visualizing, and intuitively comparing a set of genomes at the genome fingerprint level, and hence established a method called the universal genome fingerprint analysis, as well as developed a method of quantitative analysis of the outcome dataset. These have set up the methodology of systematic comparative genomics based on the genome fingerprint analysis.
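The quantitative comparison step in the abstract above (a geometric center per fingerprint map, then a Euclidean distance between two genomes) can be illustrated with a minimal sketch. It assumes each genome fingerprint is already available as a list of (x, y, z) coordinates, and it takes the geometric center to be the arithmetic mean of those coordinates; both are assumptions made for illustration. The function names are hypothetical and this is not the GenomeFingerprinter implementation. The abstract does not say how the coordinates, the differentiate rate or the weighted differentiate rate are computed, so none of those is attempted here.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def geometric_center(points):
    """Arithmetic mean of a list of (x, y, z) points (assumed definition)."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def center_distance(fingerprint_a, fingerprint_b):
    """Euclidean distance between the geometric centers of two fingerprints."""
    return dist(geometric_center(fingerprint_a), geometric_center(fingerprint_b))


# Hypothetical toy coordinates, not derived from any real sequence.
genome_a = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
genome_b = [(1.0, 3.0, 4.0), (1.0, 3.0, 4.0)]
print(center_distance(genome_a, genome_b))
```

Reducing each fingerprint to a single center makes the comparison cheap but discards the shape of the trajectory, which is presumably why the authors define further measures on top of the distance.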
Funding Opportunity: Genomic Data Centers
Usability study of clinical exome analysis software: top lessons learned and recommendations.
Shyr, Casper; Kushniruk, Andre; Wasserman, Wyeth W
2014-10-01
New DNA sequencing technologies have revolutionized the search for genetic disruptions. Targeted sequencing of all protein coding regions of the genome, called exome analysis, is actively used in research-oriented genetics clinics, with the transition to exomes as a standard procedure underway. This transition is challenging; identification of potentially causal mutation(s) amongst ∼10^6 variants requires specialized computation in combination with expert assessment. This study analyzes the usability of user interfaces for clinical exome analysis software. There are two study objectives: (1) To ascertain the key features of successful user interfaces for clinical exome analysis software based on the perspective of expert clinical geneticists, (2) To assess user-system interactions in order to reveal strengths and weaknesses of existing software, inform future design, and accelerate the clinical uptake of exome analysis. Surveys, interviews, and cognitive task analysis were performed for the assessment of two next-generation exome sequence analysis software packages. The subjects included ten clinical geneticists who interacted with the software packages using the "think aloud" method. Subjects' interactions with the software were recorded in their clinical office within an urban research and teaching hospital. All major user interface events (from the user interactions with the packages) were time-stamped and annotated with coding categories to identify usability issues in order to characterize desired features and deficiencies in the user experience. We detected 193 usability issues, the majority of which concern interface layout and navigation, and the resolution of reports. Our study highlights gaps in specific software features typical within exome analysis. The clinicians perform best when the flow of the system is structured into well-defined yet customizable layers for incorporation within the clinical workflow. 
The results highlight opportunities to dramatically accelerate clinician analysis and interpretation of patient genomic data. We present the first application of usability methods to evaluate software interfaces in the context of exome analysis. Our results highlight how the study of user responses can lead to identification of usability issues and challenges and reveal software reengineering opportunities for improving clinical next-generation sequencing analysis. While the evaluation focused on two distinctive software tools, the results are general and should inform active and future software development for genome analysis software. As large-scale genome analysis becomes increasingly common in healthcare, it is critical that efficient and effective software interfaces are provided to accelerate clinical adoption of the technology. Implications for improved design of such applications are discussed. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
Human body epigenome maps reveal noncanonical DNA methylation variation.
Schultz, Matthew D; He, Yupeng; Whitaker, John W; Hariharan, Manoj; Mukamel, Eran A; Leung, Danny; Rajagopal, Nisha; Nery, Joseph R; Urich, Mark A; Chen, Huaming; Lin, Shin; Lin, Yiing; Jung, Inkyung; Schmitt, Anthony D; Selvaraj, Siddarth; Ren, Bing; Sejnowski, Terrence J; Wang, Wei; Ecker, Joseph R
2015-07-09
Understanding the diversity of human tissues is fundamental to disease and requires linking genetic information, which is identical in most of an individual's cells, with epigenetic mechanisms that could have tissue-specific roles. Surveys of DNA methylation in human tissues have established a complex landscape including both tissue-specific and invariant methylation patterns. Here we report high coverage methylomes that catalogue cytosine methylation in all contexts for the major human organ systems, integrated with matched transcriptomes and genomic sequence. By combining these diverse data types with each individual's phased genome, we identified widespread tissue-specific differential CG methylation (mCG), partially methylated domains, allele-specific methylation and transcription, and the unexpected presence of non-CG methylation (mCH) in almost all human tissues. mCH correlated with tissue-specific functions, and using this mark, we made novel predictions of genes that escape X-chromosome inactivation in specific tissues. Overall, DNA methylation in several genomic contexts varies substantially among human tissues.
FISH Oracle: a web server for flexible visualization of DNA copy number data in a genomic context.
Mader, Malte; Simon, Ronald; Steinbiss, Sascha; Kurtz, Stefan
2011-07-28
The rapidly growing amount of array CGH data requires improved visualization software supporting the process of identifying candidate cancer genes. Optimally, such software should work across multiple microarray platforms, should be able to cope with data from different sources and should be easy to operate. We have developed a web-based software FISH Oracle to visualize data from multiple array CGH experiments in a genomic context. Its fast visualization engine and advanced web and database technology support highly interactive use. FISH Oracle comes with a convenient data import mechanism, powerful search options for genomic elements (e.g. gene names or karyobands), quick navigation and zooming into interesting regions, and mechanisms to export the visualization into different high quality formats. These features make the software especially suitable for the needs of life scientists. FISH Oracle offers a fast and easy to use visualization tool for array CGH and SNP array data. It allows for the identification of genomic regions representing minimal common changes based on data from one or more experiments. FISH Oracle will be instrumental to identify candidate onco and tumor suppressor genes based on the frequency and genomic position of DNA copy number changes. The FISH Oracle application and an installed demo web server are available at http://www.zbh.uni-hamburg.de/fishoracle. PMID:21884636
The evolution of early cellular systems viewed through the lens of biological interactions.
Poole, Anthony M; Lundin, Daniel; Rytkönen, Kalle T
2015-01-01
The minimal cell concept represents a pragmatic approach to the question of how few genes are required to run a cell. This is a helpful way to build a parts-list, and has been more successful than attempts to deduce a minimal gene set for life by inferring the gene repertoire of the last universal common ancestor, as few genes trace back to this hypothetical ancestral state. However, the study of minimal cellular systems is the study of biological outliers where, by practical necessity, coevolutionary interactions are minimized or ignored. In this paper, we consider the biological context from which minimal genomes have been removed. For instance, some of the most reduced genomes are from endosymbionts and are the result of coevolutionary interactions with a host; few such organisms are "free-living." As few, if any, biological systems exist in complete isolation, we expect that, as with modern life, early biological systems were part of an ecosystem, replete with organismal interactions. We favor refocusing discussions of the evolution of cellular systems on processes rather than gene counts. We therefore draw a distinction between a pragmatic minimal cell (an interesting engineering problem), a distributed genome (a system resulting from an evolutionary transition involving more than one cell) and the looser coevolutionary interactions that are ubiquitous in ecosystems. Finally, we consider the distributed genome and coevolutionary interactions between genomic entities in the context of early evolution.
Rudd, Stephen
2005-01-01
The public expressed sequence tag collections are continually being enriched with high-quality sequences that represent an ever-expanding range of taxonomically diverse plant species. While these sequence collections provide biased insight into the populations of expressed genes available within individual species and their associated tissues, the information is conceivably of wider relevance in a comparative context. When we consider the available expressed sequence tag (EST) collections of summer 2004, most of the major plant taxonomic clades are at least superficially represented. Investigation of the five million available plant ESTs provides a wealth of information that has applications in modelling the routes of plant genome evolution and the identification of lineage-specific genes and gene families. Over four million ESTs from over 50 distinct plant species have been collated within an EST analysis pipeline called openSputnik. The ESTs were resolved down into approximately one million unigene sequences. These have been annotated using orthology-based annotation transfer from reference plant genomes and using a variety of contemporary bioinformatics methods to assign peptide, structural and functional attributes. The openSputnik database is available at http://sputnik.btk.fi.
AnnotCompute: annotation-based exploration and meta-analysis of genomics experiments
Zheng, Jie; Stoyanovich, Julia; Manduchi, Elisabetta; Liu, Junmin; Stoeckert, Christian J.
2011-01-01
The ever-increasing scale of biological data sets, particularly those arising in the context of high-throughput technologies, requires the development of rich data exploration tools. In this article, we present AnnotCompute, an information discovery platform for repositories of functional genomics experiments such as ArrayExpress. Our system leverages semantic annotations of functional genomics experiments with controlled vocabulary and ontology terms, such as those from the MGED Ontology, to compute conceptual dissimilarities between pairs of experiments. These dissimilarities are then used to support two types of exploratory analysis—clustering and query-by-example. We show that our proposed dissimilarity measures correspond to a user's intuition about conceptual dissimilarity, and can be used to support effective query-by-example. We also evaluate the quality of clustering based on these measures. While AnnotCompute can support a richer data exploration experience, its effectiveness is limited in some cases, due to the quality of available annotations. Nonetheless, tools such as AnnotCompute may provide an incentive for richer annotations of experiments. Code is available for download at http://www.cbil.upenn.edu/downloads/AnnotCompute. Database URL: http://www.cbil.upenn.edu/annotCompute/ PMID:22190598
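The idea of computing dissimilarities between annotated experiments and using them for query-by-example can be sketched as follows. This is only an illustration under stated assumptions: it uses the Jaccard distance between sets of annotation terms as a stand-in, which is not claimed to be the dissimilarity measure defined by AnnotCompute, and the experiment identifiers, terms and function names are all hypothetical.

```python
def jaccard_dissimilarity(terms_a, terms_b):
    """1 - |A & B| / |A | B| for two collections of annotation terms."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 0.0  # two empty annotation sets are treated as identical
    return 1.0 - len(a & b) / len(a | b)


def query_by_example(query_terms, experiments, k=3):
    """Return the ids of the k experiments least dissimilar to the query.

    `experiments` maps an experiment id to its set of annotation terms.
    Ties are broken by experiment id so the ranking is deterministic.
    """
    ranked = sorted(
        experiments.items(),
        key=lambda item: (jaccard_dissimilarity(query_terms, item[1]), item[0]),
    )
    return [exp_id for exp_id, _ in ranked[:k]]


# Hypothetical annotated experiments.
experiments = {
    "E1": {"liver", "mouse", "time_series"},
    "E2": {"liver", "human"},
    "E3": {"brain", "rat"},
}
print(query_by_example({"liver", "mouse"}, experiments, k=2))
```

A set-based measure like this ignores the ontology structure (for example, that two distinct terms may share a parent), which is one way the quality and granularity of annotations limits what such a tool can retrieve.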
Dereeper, Alexis; Nicolas, Stéphane; Le Cunff, Loïc; Bacilieri, Roberto; Doligez, Agnès; Peros, Jean-Pierre; Ruiz, Manuel; This, Patrice
2011-05-05
High-throughput re-sequencing, new genotyping technologies and the availability of reference genomes allow the extensive characterization of Single Nucleotide Polymorphisms (SNPs) and insertion/deletion events (indels) in many plant species. The rapidly increasing amount of re-sequencing and genotyping data generated by large-scale genetic diversity projects requires the development of integrated bioinformatics tools able to efficiently manage, analyze, and combine these genetic data with genome structure and external data. In this context, we developed SNiPlay, a flexible, user-friendly and integrative web-based tool dedicated to polymorphism discovery and analysis. It integrates: 1) a pipeline, freely accessible through the internet, combining existing software with new tools to detect SNPs and to compute different types of statistical indices and graphical layouts for SNP data. From standard sequence alignments, genotyping data or Sanger sequencing traces given as input, SNiPlay detects SNPs and indel events and outputs submission files for the design of Illumina's SNP chips. Subsequently, it sends sequences and genotyping data into a series of modules in charge of various processes: physical mapping to a reference genome, annotation (genomic position, intron/exon location, synonymous/non-synonymous substitutions), SNP frequency determination in user-defined groups, haplotype reconstruction and network, linkage disequilibrium evaluation, and diversity analysis (Pi, Watterson's Theta, Tajima's D). Furthermore, the pipeline allows the use of external data (such as phenotype, geographic origin, taxa, stratification) to define groups and compare statistical indices. 2) a database storing polymorphisms, genotyping data and grapevine sequences released by public and private projects. It allows the user to retrieve SNPs using various filters (such as genomic position, missing data, polymorphism type, allele frequency), to compare SNP patterns between populations, and to export genotyping data or sequences in various formats. Our experiments on grapevine genetic projects showed that SNiPlay allows geneticists to rapidly obtain advanced results in several key research areas of plant genetic diversity. Both the management and treatment of large amounts of SNP data are rendered considerably easier for end-users through automation and integration. Current developments are taking into account new advances in high-throughput technologies. SNiPlay is available at: http://sniplay.cirad.fr/.
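Two of the diversity indices named above, Pi and Watterson's Theta, can be illustrated with a short sketch using their textbook definitions. This is not SNiPlay's code; the function names and the toy alignment are hypothetical, and both statistics are reported per sequence rather than per site.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Count alignment columns that carry more than one allele."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def nucleotide_diversity(seqs):
    """Pi: mean number of differences over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(x != y for x, y in zip(s, t)) for s, t in pairs)
    return diffs / len(pairs)


def watterson_theta(seqs):
    """Theta_W = S / a1, where a1 = sum of 1/i for i = 1 .. n-1."""
    a1 = sum(1.0 / i for i in range(1, len(seqs)))
    return segregating_sites(seqs) / a1


# Hypothetical aligned sequences of equal length, one per sampled individual.
seqs = ["ACGTA", "ACGTT", "ATGTA", "ACGTA"]
pi = nucleotide_diversity(seqs)
theta_w = watterson_theta(seqs)
print(segregating_sites(seqs), pi, round(theta_w, 4))
```

Tajima's D is built on the difference between these two estimates (Pi minus Theta_W) divided by an estimate of its standard deviation; the variance term is omitted here to keep the sketch short.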
Smith, Adam Alexander Thil; Belda, Eugeni; Viari, Alain; Medigue, Claudine; Vallenet, David
2012-05-01
Of all biochemically characterized metabolic reactions formalized by the IUBMB, over one out of four have yet to be associated with a nucleic or protein sequence, i.e. are sequence-orphan enzymatic activities. Few bioinformatics annotation tools are able to propose candidate genes for such activities by exploiting context-dependent rather than sequence-dependent data, and none are readily accessible and propose result integration across multiple genomes. Here, we present CanOE (Candidate genes for Orphan Enzymes), a four-step bioinformatics strategy that proposes ranked candidate genes for sequence-orphan enzymatic activities (or orphan enzymes for short). The first step locates "genomic metabolons", i.e. groups of co-localized genes coding proteins catalyzing reactions linked by shared metabolites, in one genome at a time. These metabolons can be particularly helpful for aiding bioanalysts to visualize relevant metabolic data. In the second step, they are used to generate candidate associations between un-annotated genes and gene-less reactions. The third step integrates these gene-reaction associations over several genomes using gene families, and summarizes the strength of family-reaction associations by several scores. In the final step, these scores are used to rank members of gene families which are proposed for metabolic reactions. These associations are of particular interest when the metabolic reaction is a sequence-orphan enzymatic activity. Our strategy found over 60,000 genomic metabolons in more than 1,000 prokaryote organisms from the MicroScope platform, generating candidate genes for many metabolic reactions, including more than 70 distinct orphan reactions. A computational validation of the approach is discussed. Finally, we present a case study on the anaerobic allantoin degradation pathway in Escherichia coli K-12.
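The first step described above, grouping co-localized genes whose reactions share metabolites, can be sketched as a connected-components search. This is a simplified illustration under stated assumptions, not the CanOE algorithm: the `max_gap` parameter, the input layout, and the gene and metabolite names are all hypothetical.

```python
def find_metabolons(genes, max_gap=2):
    """Group genes that lie close together and share at least one metabolite.

    genes: list of (gene_id, position_index, set_of_metabolites).
    Returns sorted lists of gene ids, one per group of two or more genes.
    """
    parent = {gene_id: gene_id for gene_id, _, _ in genes}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, (gene_i, pos_i, mets_i) in enumerate(genes):
        for gene_j, pos_j, mets_j in genes[i + 1:]:
            # Link two genes if they are neighbours and share a metabolite.
            if abs(pos_i - pos_j) <= max_gap and mets_i & mets_j:
                parent[find(gene_i)] = find(gene_j)

    groups = {}
    for gene_id, _, _ in genes:
        groups.setdefault(find(gene_id), []).append(gene_id)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical genome fragment: positions are gene order indices.
genes = [
    ("geneA", 1, {"m1", "m2"}),
    ("geneB", 2, {"m2", "m3"}),
    ("geneC", 3, {"m3", "m4"}),
    ("geneD", 40, {"m1"}),   # shares m1 with geneA but is far away
    ("geneE", 41, {"m9"}),   # adjacent to geneD but no shared metabolite
]
print(find_metabolons(genes))
```

In this toy input, geneA, geneB and geneC form one group because each consecutive pair is adjacent and linked by a shared metabolite, while geneD and geneE remain ungrouped.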
Genomic characterization of two large Alu-mediated rearrangements of the BRCA1 gene.
Peixoto, Ana; Pinheiro, Manuela; Massena, Lígia; Santos, Catarina; Pinto, Pedro; Rocha, Patrícia; Pinto, Carla; Teixeira, Manuel R
2013-02-01
To determine whether a large genomic rearrangement is actually novel and to gain insight about the mutational mechanism responsible for its occurrence, molecular characterization with breakpoint identification is mandatory. We here report the characterization of two large deletions involving the BRCA1 gene. The first rearrangement harbored an 89,664-bp deletion comprising exon 7 of the BRCA1 gene to exon 11 of the NBR1 gene (c.441+1724_oNBR1:c.1073+480del). Two highly homologous Alu elements were found in the genomic sequences flanking the deletion breakpoints. Furthermore, a 20-bp overlapping sequence at the breakpoint junction was observed, suggesting that the most likely mechanism for the occurrence of this rearrangement was nonallelic homologous recombination. The second rearrangement fully characterized at the nucleotide level was a BRCA1 exons 11-15 deletion (c.671-319_4677-578delinsAlu). The case harbored a 23,363-bp deletion with an Alu element inserted at the breakpoints of the deleted region. As the inserted Alu element belongs to a still active AluY family, the observed rearrangement could be due to an insertion-mediated deletion mechanism caused by Alu retrotransposition. To conclude, we describe the breakpoints of two novel large deletions involving the BRCA1 gene, and analysis of their genomic context allowed us to gain insight about the respective mutational mechanisms.
Ludwig, A; Belfiore, N M; Pitra, C; Svirsky, V; Jenneckens, I
2001-07-01
Sturgeon (order Acipenseriformes) provide an ideal taxonomic context for examination of genome duplication events. Multiple levels of ploidy exist among these fish. In a novel microsatellite approach, data from 962 fish from 20 sturgeon species were used for analysis of ploidy in sturgeon. Allele numbers in a sample of individuals were assessed at six microsatellite loci. Species with approximately 120 chromosomes are classified as functional diploid species, species with approximately 250 chromosomes as functional tetraploid species, and species with approximately 500 chromosomes as functional octaploids. A molecular phylogeny of the sturgeon was determined on the basis of sequences of the entire mitochondrial cytochrome b gene. By mapping the estimated levels of ploidy on this proposed phylogeny we demonstrate that (I) polyploidization events independently occurred in the acipenseriform radiation; (II) the process of functional genome reduction is nearly finished in species with approximately 120 chromosomes and more active in species with approximately 250 chromosomes and approximately 500 chromosomes; and (III) species with approximately 250 and approximately 500 chromosomes arose more recently than those with approximately 120 chromosomes. These results suggest that gene silencing, chromosomal rearrangements, and transposition events played an important role in the acipenseriform genome formation. Furthermore, this phylogeny is broadly consistent with previous hypotheses but reveals a highly supported oceanic (Atlantic-Pacific) subdivision within the Acipenser/Huso complex. PMID:11454768
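One way to picture how allele counts at microsatellite loci can indicate functional ploidy is sketched below. The abstract does not give the classification rule, so the thresholds here (at most 2, 4 or 8 alleles per individual per locus) are an assumption made for illustration, and the genotype data are hypothetical.

```python
def max_alleles_per_individual(genotypes):
    """Largest number of distinct alleles seen in one individual at one locus.

    genotypes: {individual: {locus: set of allele sizes}}
    """
    return max(
        len(alleles)
        for loci in genotypes.values()
        for alleles in loci.values()
    )


def classify_ploidy(max_alleles):
    """Map a maximum allele count to a ploidy class (assumed thresholds)."""
    if max_alleles <= 2:
        return "functional diploid"
    if max_alleles <= 4:
        return "functional tetraploid"
    if max_alleles <= 8:
        return "functional octaploid"
    return "unclassified"


# Hypothetical genotypes: allele sizes in base pairs at two loci.
species_a = {
    "fish1": {"locus1": {150, 154}, "locus2": {200}},
    "fish2": {"locus1": {150}, "locus2": {200, 204}},
}
species_b = {
    "fish1": {"locus1": {150, 154, 158, 162}, "locus2": {200, 204, 208}},
}
print(classify_ploidy(max_alleles_per_individual(species_a)))
print(classify_ploidy(max_alleles_per_individual(species_b)))
```

A maximum-based rule of this kind gives only a lower bound on ploidy, since an individual can carry fewer distinct alleles than its ploidy allows; sampling many individuals and several loci reduces that risk.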
Patel, Isha R.; Gangiredla, Jayanthi; Lacher, David W.; Mammel, Mark K.; Jackson, Scott A.; Lampel, Keith A.
2016-01-01
ABSTRACT Most Escherichia coli strains are nonpathogenic. However, for clinical diagnosis and food safety analysis, current identification methods for pathogenic E. coli either are time-consuming and/or provide limited information. Here, we utilized a custom DNA microarray with informative genetic features extracted from 368 sequence sets for rapid and high-throughput pathogen identification. The FDA Escherichia coli Identification (FDA-ECID) platform contains three sets of molecularly informative features that together stratify strain identification and relatedness. First, 53 known flagellin alleles, 103 alleles of wzx and wzy, and 5 alleles of wzm provide molecular serotyping utility. Second, 41,932 probe sets representing the pan-genome of E. coli provide strain-level gene content information. Third, approximately 125,000 single nucleotide polymorphisms (SNPs) of available whole-genome sequences (WGS) were distilled to 9,984 SNPs capable of recapitulating the E. coli phylogeny. We analyzed 103 diverse E. coli strains with available WGS data, including those associated with past foodborne illnesses, to determine robustness and accuracy. The array was able to accurately identify the molecular O and H serotypes, potentially correcting serological failures and providing better resolution for H-nontypeable/nonmotile phenotypes. In addition, molecular risk assessment was possible with key virulence marker identifications. Epidemiologically, each strain had a unique comparative genomic fingerprint that was extended to an additional 507 food and clinical isolates. Finally, a 99.7% phylogenetic concordance was established between microarray analysis and WGS using SNP-level data for advanced genome typing. Our study demonstrates FDA-ECID as a powerful tool for epidemiology and molecular risk assessment with the capacity to profile the global landscape and diversity of E. coli. 
IMPORTANCE This study describes a robust, state-of-the-art platform developed from available whole-genome sequences of E. coli and Shigella spp. by distilling useful signatures for epidemiology and molecular risk assessment into one assay. The FDA-ECID microarray contains features that enable comprehensive molecular serotyping and virulence profiling along with genome-scale genotyping and SNP analysis. Hence, it is a molecular toolbox that stratifies strain identification and pathogenic potential in the contexts of epidemiology and phylogeny. We applied this tool to strains from food, environmental, and clinical sources, resulting in significantly greater phylogenetic and strain-specific resolution than previously reported for available typing methods. PMID:27037122
Fuller, Zachary L; Niño, Elina L; Patch, Harland M; Bedoya-Reina, Oscar C; Baumgarten, Tracey; Muli, Elliud; Mumoki, Fiona; Ratan, Aakrosh; McGraw, John; Frazier, Maryann; Masiga, Daniel; Schuster, Stephen; Grozinger, Christina M; Miller, Webb
2015-07-10
With the development of inexpensive, high-throughput sequencing technologies, it has become feasible to examine questions related to population genetics and molecular evolution of non-model species in their ecological contexts on a genome-wide scale. Here, we employed a newly developed suite of integrated, web-based programs to examine population dynamics and signatures of selection across the genome using several well-established tests, including FST, pN/pS, and McDonald-Kreitman. We applied these techniques to study populations of honey bees (Apis mellifera) in East Africa. In Kenya, there are several described A. mellifera subspecies, which are thought to be localized to distinct ecological regions. We performed whole genome sequencing of 11 worker honey bees from apiaries distributed throughout Kenya and identified 3.6 million putative single-nucleotide polymorphisms. The dense coverage allowed us to apply several computational procedures to study population structure and the evolutionary relationships among the populations, and to detect signs of adaptive evolution across the genome. While there is considerable gene flow among the sampled populations, there are clear distinctions between populations from the northern desert region and those from the temperate, savannah region. We identified several genes showing population genetic patterns consistent with positive selection within African bee populations, and between these populations and European A. mellifera or Asian Apis florea. These results lay the groundwork for future studies of adaptive ecological evolution in honey bees, and demonstrate the use of new, freely available web-based tools and workflows (http://usegalaxy.org/r/kenyanbee) that can be applied to any model system with genomic information.
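The McDonald-Kreitman test mentioned above compares polymorphism within a species with fixed divergence between species, split into nonsynonymous and synonymous changes. The sketch below computes the standard neutrality index and the derived proportion of adaptive substitutions. It is not the Galaxy workflow from the paper; the function names and the counts are hypothetical, and the significance test (usually Fisher's exact test on the 2x2 table) is left out.

```python
def neutrality_index(pn, ps, dn, ds):
    """NI = (Pn / Ps) / (Dn / Ds).

    pn, ps: nonsynonymous and synonymous polymorphisms within a species.
    dn, ds: nonsynonymous and synonymous fixed differences between species.
    """
    if ps == 0 or dn == 0 or ds == 0:
        raise ValueError("NI is undefined when Ps, Dn or Ds is zero")
    return (pn / ps) / (dn / ds)


def alpha(pn, ps, dn, ds):
    """Estimated proportion of adaptive substitutions: alpha = 1 - NI."""
    return 1.0 - neutrality_index(pn, ps, dn, ds)


# Hypothetical counts for a single gene.
pn, ps, dn, ds = 10, 40, 20, 40
ni = neutrality_index(pn, ps, dn, ds)
print(ni, alpha(pn, ps, dn, ds))
```

An NI below 1 (alpha above 0) means an excess of nonsynonymous fixed differences relative to polymorphism, the pattern usually read as consistent with positive selection; an NI above 1 points instead to an excess of nonsynonymous polymorphism.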
Baynam, Gareth; Pachter, Nicholas; McKenzie, Fiona; Townshend, Sharon; Slee, Jennie; Kiraly-Borri, Cathy; Vasudevan, Anand; Hawkins, Anne; Broley, Stephanie; Schofield, Lyn; Verhoef, Hedwig; Walker, Caroline E; Molster, Caron; Blackwell, Jenefer M; Jamieson, Sarra; Tang, Dave; Lassmann, Timo; Mina, Kym; Beilby, John; Davis, Mark; Laing, Nigel; Murphy, Lesley; Weeramanthri, Tarun; Dawkins, Hugh; Goldblatt, Jack
2016-06-11
The Rare and Undiagnosed Diseases Diagnostic Service (RUDDS) refers to a genomic diagnostic platform operating within the Western Australian Government clinical services delivered through Genetic Services of Western Australia (GSWA). GSWA has provided a state-wide service for clinical genetic care for 28 years and it serves a population of 2.5 million people across a geographical area of 2.5 million km². Within this context, GSWA has established a clinically integrated genomic diagnostic platform in partnership with other public health system managers and service providers, including but not limited to the Office of Population Health Genomics, Diagnostic Genomics (PathWest Laboratories) and with executive level support from the Department of Health. Herein we describe the components of this service that are most relevant to the heterogeneity of paediatric clinical genetic care. Briefly, the platform: i) offers multiple options including non-genetic testing; monogenic and genomic (targeted in silico filtered and whole exome) analysis; and matchmaking; ii) is delivered in a patient-centric manner that is resonant with the patient journey, with multiple points for entry, exit and re-entry to allow people access to information they can use, when they want to receive it; iii) is synchronous with precision phenotyping methods; iv) captures new knowledge, including multiple expert review; v) is integrated with current translational genomic research activities and best practice; and vi) is designed for flexibility for interactive generation of, and integration with, clinical research for diagnostics, community engagement, policy and models of care. The RUDDS has been established as part of routine clinical genetic services and is thus sustainable, equitably managed and seeks to translate new knowledge into efficient diagnostics and improved health for the whole community.
Gujaria-Verma, Neha; Ramsay, Larissa; Sharpe, Andrew G; Sanderson, Lacey-Anne; Debouck, Daniel G; Tar'an, Bunyamin; Bett, Kirstin E
2016-03-15
Common bean (Phaseolus vulgaris) is an important grain legume and there has been a recent resurgence in interest in its relative, tepary bean (P. acutifolius), owing to this species' ability to better withstand abiotic stresses. Genomic resources are scarce for this minor crop species and a better knowledge of the genome-level relationship between these two species would facilitate improvement in both. High-throughput genotyping has facilitated large-scale single nucleotide polymorphism (SNP) identification leading to the development of molecular markers with associated sequence information that can be used to place them in the context of a full genome assembly. Transcript-based SNPs were identified from six common bean and two tepary bean accessions and a subset were used to generate a 768-SNP Illumina GoldenGate assay for each species. The tepary bean assay was used to assess diversity in wild and cultivated tepary bean and to generate the first gene-based map of the tepary bean genome. Genotypic analyses of the diversity panel showed a clear separation between domesticated and cultivated tepary beans, two distinct groups within the domesticated types, and P. parvifolius was confirmed to be distinct. The genetic map of tepary bean was compared to the common bean genome assembly to demonstrate high levels of collinearity between the two species with differences limited to a few intra-chromosomal rearrangements. The development of the first set of genomic resources specifically for tepary bean has allowed for greater insight into the structure of this species and its relationship to its agriculturally more prominent relative, common bean. These resources will be helpful in the development of efficient breeding strategies for both species and will facilitate the introgression of agriculturally important traits from one crop into the other.
TreeQ-VISTA: An Interactive Tree Visualization Tool with Functional Annotation Query Capabilities
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gu, Shengyin; Anderson, Iain; Kunin, Victor
2007-05-07
Summary: We describe a general multiplatform exploratory tool called TreeQ-Vista, designed for presenting functional annotations in a phylogenetic context. Traits, such as phenotypic and genomic properties, are interactively queried from a relational database with a user-friendly interface which provides a set of tools for users with or without SQL knowledge. The query results are projected onto a phylogenetic tree and can be displayed in multiple color groups. A rich set of browsing, grouping and query tools are provided to facilitate trait exploration, comparison and analysis. Availability: The program, detailed tutorial and examples are available online at http://genome-test.lbl.gov/vista/TreeQVista.
2013-01-01
Background Cytokine-activated transcription factors from the STAT (Signal Transducers and Activators of Transcription) family control common and context-specific genetic programs. It is not clear to what extent cell-specific features determine the binding capacity of seven STAT members and to what degree they share genetic targets. Molecular insight into the biology of STATs was gained from a meta-analysis of 29 available ChIP-seq data sets covering genome-wide occupancy of STATs 1, 3, 4, 5A, 5B and 6 in several cell types. Results We determined that the genomic binding capacity of STATs is primarily defined by the cell type and to a lesser extent by individual family members. For example, the overlap of shared binding sites between STATs 3 and 5 in T cells is greater than that between STAT5 in T cells and non-T cells. Even for the top 1,000 highly enriched STAT binding sites, ~15% of STAT5 binding sites in mouse female liver are shared by other STATs in different cell types while in T cells ~90% of STAT5 binding sites are co-occupied by STAT3, STAT4 and STAT6. In addition, we identified 116 cis-regulatory modules (CRM), which are recognized by all STAT members across cell types defining a common JAK-STAT signature. Lastly, in liver STAT5 binding significantly coincides with binding of the cell-specific transcription factors HNF4A, FOXA1 and FOXA2 and is associated with cell-type specific gene transcription. Conclusions Our results suggest that genomic binding of STATs is primarily determined by the cell type and further specificity is achieved in part by juxtaposed binding of cell-specific transcription factors. PMID:23324445
Lebeko, Kamogelo; Manyisa, Noluthando; Chimusa, Emile R; Mulder, Nicola; Dandara, Collet; Wonkam, Ambroise
2017-02-01
Hearing impairment (HI) is one of the leading causes of disability in the world, impacting the social, economic, and psychological well-being of the affected individual. This is particularly true in sub-Saharan Africa, which carries one of the highest burdens of this condition. Despite this, there are limited data on the most prevalent genes or mutations that cause HI among sub-Saharan Africans. Next-generation technologies, such as targeted genomic enrichment and massively parallel sequencing, offer new promise in this context. This study reports, for the first time to the best of our knowledge, on the prevalence of novel mutations identified through a platform of 116 HI genes (OtoSCOPE®), among 82 African probands with HI. Only variants OTOF NM_194248.2:c.766-2A>G and MYO7A NM_000260.3:c.1996C>T, p.Arg666Stop were found in 3 (3.7%) and 5 (6.1%) patients, respectively. In addition and uniquely, the analysis of protein-protein interactions (PPI), through interrogation of gene subnetworks, using a custom script and two databases (Enrichr and PANTHER), and an algorithm in the igraph package of R, identified the enrichment of sensory perception and mechanical stimulus biological processes, and the most significant molecular functions of these variants pertained to binding or structural activity. Furthermore, 10 genes (MYO7A, MYO6, KCTD3, NUMA1, MYH9, KCNQ1, UBC, DIAPH1, PSMC2, and RDX) were identified as significant hubs within the subnetworks. Results reveal that the novel variants identified among familial cases of HI in Cameroon are not common, and PPI analysis has highlighted the role of 10 genes, potentially important in understanding HI genomics among Africans.
Feliziani, Sofía; Moyano, Alejandro J.; Di Rienzo, Julio A.; Krogh Johansen, Helle; Molin, Søren; Smania, Andrea M.
2014-01-01
The advent of high-throughput sequencing techniques has made it possible to follow the genomic evolution of pathogenic bacteria by comparing longitudinally collected bacteria sampled from human hosts. Such studies in the context of chronic airway infections by Pseudomonas aeruginosa in cystic fibrosis (CF) patients have indicated high bacterial population diversity. Such diversity may be driven by hypermutability resulting from DNA mismatch repair system (MRS) deficiency, a common trait evolved by P. aeruginosa strains in CF infections. No studies to date have utilized whole-genome sequencing to investigate within-host population diversity or long-term evolution of mutators in CF airways. We sequenced the genomes of 13 and 14 isolates of P. aeruginosa mutator populations from an Argentinian and a Danish CF patient, respectively. Our collection of isolates spanned 6 and 20 years of patient infection history, respectively. We sequenced 11 isolates from a single sample from each patient to allow in-depth analysis of population diversity. Each patient was infected by clonal populations of bacteria that were dominated by mutators. The in vivo mutation rate of the populations was ∼100 SNPs/year, ∼40-fold higher than rates in normo-mutable populations. Comparison of the genomes of 11 isolates from the same sample showed extensive within-patient genomic diversification; the populations were composed of different sub-lineages that had coexisted for many years since the initial colonization of the patient. Analysis of the mutations identified genes that underwent convergent evolution across lineages and sub-lineages, suggesting that the genes were targeted by mutation to optimize pathogenic fitness. Parallel evolution was observed in reduction of overall catabolic capacity of the populations. These findings are useful for understanding the evolution of pathogen populations and identifying new targets for control of chronic infections. PMID:25330091
Lee, Je Hyuk; Daugharthy, Evan R.; Scheiman, Jonathan; Kalhor, Reza; Ferrante, Thomas C.; Terry, Richard; Turczyk, Brian M.; Yang, Joyce L.; Lee, Ho Suk; Aach, John; Zhang, Kun; Church, George M.
2014-01-01
RNA sequencing measures the quantitative change in gene expression over the whole transcriptome, but it lacks spatial context. On the other hand, in situ hybridization provides the location of gene expression, but only for a small number of genes. Here we detail a protocol for genome-wide profiling of gene expression in situ in fixed cells and tissues, in which RNA is converted into cross-linked cDNA amplicons and sequenced manually on a confocal microscope. Unlike traditional RNA-seq our method enriches for context-specific transcripts over house-keeping and/or structural RNA, and it preserves the tissue architecture for RNA localization studies. Our protocol is written for researchers experienced in cell microscopy with minimal computing skills. Library construction and sequencing can be completed within 14 d, with image analysis requiring an additional 2 d. PMID:25675209
Pediatric Issues in Return of Results and Incidental Findings: Weighing Autonomy and Best Interests.
Holm, Ingrid A
2017-03-01
Nowhere are the ethical issues in genomic research more complex than in pediatrics. Balancing the sometimes conflicting autonomy of the parent and the child, and the best interest of the family and the child, brings up many challenging issues. Addressing this balance, especially in the context of the child's developing maturity and comprehension, requires deep analysis and discussion. Issues discussed include the impact of genetic information on the family, parental versus the child's autonomy, the best interests of the child versus the family, potential limitations on the parents' right to know or not know information about their child, and the changing role of the developing child in return of research results. Finally, a dynamic model will be proposed that takes into consideration the child's evolving role in consenting and return of results and that can be adapted in different national contexts.
A Knowledge Base for Teaching Biology Situated in the Context of Genetic Testing
ERIC Educational Resources Information Center
van der Zande, Paul; Waarlo, Arend Jan; Brekelmans, Mieke; Akkerman, Sanne F.; Vermunt, Jan D.
2011-01-01
Recent developments in the field of genomics will impact the daily practice of biology teachers who teach genetics in secondary education. This study reports on the first results of a research project aimed at enhancing biology teacher knowledge for teaching genetics in the context of genetic testing. The increasing body of scientific knowledge…
Genomicus 2018: karyotype evolutionary trees and on-the-fly synteny computing
Nguyen, Nga Thi Thuy; Vincens, Pierre
2018-01-01
Abstract Since 2010, the Genomicus web server has been available online at http://genomicus.biologie.ens.fr/genomicus. This graphical browser provides access to comparative genomic analyses in four different phyla (Vertebrate, Plants, Fungi, and non-vertebrate Metazoans). Users can analyse genomic information from extant species, as well as ancestral gene content and gene order for vertebrates and flowering plants, in an integrated evolutionary context. New analyses and visualization tools have recently been implemented in Genomicus Vertebrate. Karyotype structures from several genomes can now be compared along an evolutionary pathway (Multi-KaryotypeView), and synteny blocks can be computed and visualized between any two genomes (PhylDiagView). PMID:29087490
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bowers, Robert M.; Kyrpides, Nikos C.; Stepanauskas, Ramunas
The number of genomes from uncultivated microbes will soon surpass the number of isolate genomes in public databases (Hugenholtz, Skarshewski, & Parks, 2016). Technological advancements in high-throughput sequencing and assembly, including single-cell genomics and the computational extraction of genomes from metagenomes (GFMs), are largely responsible. Here we propose community standards for reporting the Minimum Information about a Single-Cell Genome (MIxS-SCG) and Minimum Information about Genomes extracted From Metagenomes (MIxS-GFM) specific for Bacteria and Archaea. The standards have been developed in the context of the International Genomics Standards Consortium (GSC) community (Field et al., 2014) and can be viewed as a supplement to other GSC checklists including the Minimum Information about a Genome Sequence (MIGS), Minimum information about a Metagenomic Sequence(s) (MIMS) (Field et al., 2008) and Minimum Information about a Marker Gene Sequence (MIMARKS) (P. Yilmaz et al., 2011). Community-wide acceptance of MIxS-SCG and MIxS-GFM for Bacteria and Archaea will enable broad comparative analyses of genomes from the majority of taxa that remain uncultivated, improving our understanding of microbial function, ecology, and evolution.
Butler, J B; Vaillancourt, R E; Potts, B M; Lee, D J; King, G J; Baten, A; Shepherd, M; Freeman, J S
2017-05-22
Previous studies suggest genome structure is largely conserved between Eucalyptus species. However, it is unknown if this conservation extends to more divergent eucalypt taxa. We performed comparative genomics between the eucalypt genera Eucalyptus and Corymbia. Our results will facilitate transfer of genomic information between these important taxa and provide further insights into the rate of structural change in tree genomes. We constructed three high density linkage maps for two Corymbia species (Corymbia citriodora subsp. variegata and Corymbia torelliana) which were used to compare genome structure between both species and Eucalyptus grandis. Genome structure was highly conserved between the Corymbia species. However, the comparison of Corymbia and E. grandis suggests large (from 1-13 MB) intra-chromosomal rearrangements have occurred on seven of the 11 chromosomes. Most rearrangements were supported through comparisons of the three independent Corymbia maps to the E. grandis genome sequence, and to other independently constructed Eucalyptus linkage maps. These are the first large scale chromosomal rearrangements discovered between eucalypts. Nonetheless, in the general context of plants, the genomic structure of the two genera was remarkably conserved; adding to a growing body of evidence that conservation of genome structure is common amongst woody angiosperms.
Integrated genome browser: visual analytics platform for genomics.
Freese, Nowlan H; Norris, David C; Loraine, Ann E
2016-07-15
Genome browsers that support fast navigation through vast datasets and provide interactive visual analytics functions can help scientists achieve deeper insight into biological systems. Toward this end, we developed Integrated Genome Browser (IGB), a highly configurable, interactive and fast open source desktop genome browser. Here we describe multiple updates to IGB, including all-new capabilities to display and interact with data from high-throughput sequencing experiments. To demonstrate, we describe example visualizations and analyses of datasets from RNA-Seq, ChIP-Seq and bisulfite sequencing experiments. Understanding results from genome-scale experiments requires viewing the data in the context of reference genome annotations and other related datasets. To facilitate this, we enhanced IGB's ability to consume data from diverse sources, including Galaxy, Distributed Annotation and IGB-specific Quickload servers. To support future visualization needs as new genome-scale assays enter wide use, we transformed the IGB codebase into a modular, extensible platform for developers to create and deploy all-new visualizations of genomic data. IGB is open source and is freely available from http://bioviz.org/igb. Contact: aloraine@uncc.edu. © The Author 2016. Published by Oxford University Press.
Pao, Sheng-Ying; Lin, Win-Li; Hwang, Ming-Jing
2006-01-01
Background: Screening for differentially expressed genes on the genomic scale and comparative analysis of the expression profiles of orthologous genes between species to study gene function and regulation are becoming increasingly feasible. Expressed sequence tags (ESTs) are an excellent source of data for such studies using bioinformatic approaches because of the rich libraries and tremendous amount of data now available in the public domain. However, any large-scale EST-based bioinformatics analysis must deal with the heterogeneous, and often ambiguous, tissue and organ terms used to describe EST libraries. Results: To deal with the issue of tissue source, in this work, we carefully screened and organized more than 8 million human and mouse ESTs into 157 human and 108 mouse tissue/organ categories, to which we applied an established statistical test using different thresholds of the p value to identify genes differentially expressed in different tissues. Further analysis of the tissue distribution and level of expression of human and mouse orthologous genes showed that tissue-specific orthologs tended to have more similar expression patterns than those lacking significant tissue specificity. On the other hand, a number of orthologs were found to have significant disparity in their expression profiles, hinting at novel functions, divergent regulation, or new ortholog relationships. Conclusion: Comprehensive statistics on the tissue-specific expression of human and mouse genes were obtained in this very large-scale, EST-based analysis. These statistical results have been organized into a database, freely accessible at our website, for easy searching of human and mouse tissue-specific genes and for investigating gene expression profiles in the context of comparative genomics.
Comparative analysis showed that, although highly tissue-specific genes tend to exhibit similar expression profiles in human and mouse, there are significant exceptions, indicating that orthologous genes, while sharing basic genomic properties, could result in distinct phenotypes. PMID:16626500
Chemical biology on the genome.
Balasubramanian, Shankar
2014-08-15
In this article I discuss studies towards understanding the structure and function of DNA in the context of genomes from the perspective of a chemist. The first area I describe concerns the studies that led to the invention and subsequent development of a method for sequencing DNA on a genome scale at high speed and low cost, now known as Solexa/Illumina sequencing. The second theme will feature the four-stranded DNA structure known as a G-quadruplex with a focus on its fundamental properties, its presence in cellular genomic DNA and the prospects for targeting such a structure in cells with small molecules. The final topic for discussion is naturally occurring chemically modified DNA bases with an emphasis on chemistry for decoding (or sequencing) such modifications in genomic DNA. The genome is a fruitful topic to be further elucidated by the creation and application of chemical approaches. Copyright © 2014 Elsevier Ltd. All rights reserved.
Genome flux and stasis in a five millennium transect of European prehistory
Gamba, Cristina; Jones, Eppie R.; Teasdale, Matthew D.; McLaughlin, Russell L.; Gonzalez-Fortes, Gloria; Mattiangeli, Valeria; Domboróczki, László; Kővári, Ivett; Pap, Ildikó; Anders, Alexandra; Whittle, Alasdair; Dani, János; Raczky, Pál; Higham, Thomas F. G.; Hofreiter, Michael; Bradley, Daniel G; Pinhasi, Ron
2014-01-01
The Great Hungarian Plain was a crossroads of cultural transformations that have shaped European prehistory. Here we analyse a 5,000-year transect of human genomes, sampled from petrous bones giving consistently excellent endogenous DNA yields, from 13 Hungarian Neolithic, Copper, Bronze and Iron Age burials including two to high (~22×) and seven to ~1× coverage, to investigate the impact of these on Europe’s genetic landscape. These data suggest genomic shifts with the advent of the Neolithic, Bronze and Iron Ages, with interleaved periods of genome stability. The earliest Neolithic context genome shows a European hunter-gatherer genetic signature and a restricted ancestral population size, suggesting direct contact between cultures after the arrival of the first farmers into Europe. The latest, Iron Age, sample reveals an eastern genomic influence concordant with introduced Steppe burial rites. We observe transition towards lighter pigmentation and, surprisingly, no Neolithic presence of lactase persistence. PMID:25334030
Genome flux and stasis in a five millennium transect of European prehistory.
Gamba, Cristina; Jones, Eppie R; Teasdale, Matthew D; McLaughlin, Russell L; Gonzalez-Fortes, Gloria; Mattiangeli, Valeria; Domboróczki, László; Kővári, Ivett; Pap, Ildikó; Anders, Alexandra; Whittle, Alasdair; Dani, János; Raczky, Pál; Higham, Thomas F G; Hofreiter, Michael; Bradley, Daniel G; Pinhasi, Ron
2014-10-21
The Great Hungarian Plain was a crossroads of cultural transformations that have shaped European prehistory. Here we analyse a 5,000-year transect of human genomes, sampled from petrous bones giving consistently excellent endogenous DNA yields, from 13 Hungarian Neolithic, Copper, Bronze and Iron Age burials including two to high (~22×) and seven to ~1× coverage, to investigate the impact of these on Europe's genetic landscape. These data suggest genomic shifts with the advent of the Neolithic, Bronze and Iron Ages, with interleaved periods of genome stability. The earliest Neolithic context genome shows a European hunter-gatherer genetic signature and a restricted ancestral population size, suggesting direct contact between cultures after the arrival of the first farmers into Europe. The latest, Iron Age, sample reveals an eastern genomic influence concordant with introduced Steppe burial rites. We observe transition towards lighter pigmentation and, surprisingly, no Neolithic presence of lactase persistence.
CircosVCF: circos visualization of whole-genome sequence variations stored in VCF files.
Drori, E; Levy, D; Smirin-Yosef, P; Rahimi, O; Salmon-Divon, M
2017-05-01
Visualization of whole-genomic variations in a meaningful manner assists researchers in gaining new insights into the underlying data, especially when it comes in the context of whole genome comparisons. CircosVCF is a web-based visualization tool for genome-wide variant data described in VCF files, using circos plots. The user-friendly interface of CircosVCF supports an interactive design of the circles in the plot, and the integration of additional information such as experimental data or annotations. The provided visualization capabilities give a broad overview of the genomic relationships between genomes, and allow identification of specific meaningful SNP regions. CircosVCF was implemented in JavaScript and is available at http://www.ariel.ac.il/research/fbl/software. Contact: malisa@ariel.ac.il. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Retterer, Kyle; Scuffins, Julie; Schmidt, Daniel; Lewis, Rachel; Pineda-Alvarez, Daniel; Stafford, Amanda; Schmidt, Lindsay; Warren, Stephanie; Gibellini, Federica; Kondakova, Anastasia; Blair, Amanda; Bale, Sherri; Matyakhina, Ludmila; Meck, Jeanne; Aradhya, Swaroop; Haverfield, Eden
2015-08-01
Detection of copy-number variation (CNV) is important for investigating many genetic disorders. Testing a large clinical cohort by array comparative genomic hybridization provides a deep perspective on the spectrum of pathogenic CNV. In this context, we describe a bioinformatics approach to extract CNV information from whole-exome sequencing and demonstrate its utility in clinical testing. Exon-focused arrays and whole-genome chromosomal microarray analysis were used to test 14,228 and 14,000 individuals, respectively. Based on these results, we developed an algorithm to detect deletions/duplications in whole-exome sequencing data and a novel whole-exome array. In the exon array cohort, we observed a positive detection rate of 2.4% (25 duplications, 318 deletions), of which 39% involved one or two exons. Chromosomal microarray analysis identified 3,345 CNVs affecting single genes (18%). We demonstrate that our whole-exome sequencing algorithm resolves CNVs of three or more exons. These results demonstrate the clinical utility of single-exon resolution in CNV assays. Our whole-exome sequencing algorithm approaches this resolution but is complemented by a whole-exome array to unambiguously identify intragenic CNVs and single-exon changes. These data illustrate the next advancements in CNV analysis through whole-exome sequencing and whole-exome array. Genet Med 17(8), 623-629.
BioSurfDB: knowledge and algorithms to support biosurfactants and biodegradation studies
Oliveira, Jorge S.; Araújo, Wydemberg; Lopes Sales, Ana Isabela; de Brito Guerra, Alaine; da Silva Araújo, Sinara Carla; de Vasconcelos, Ana Tereza Ribeiro; Agnez-Lima, Lucymara F.; Freitas, Ana Teresa
2015-01-01
Crude oil extraction, transportation and use provoke the contamination of countless ecosystems. Therefore, bioremediation through surfactants mobilization or biodegradation is an important subject, both economically and environmentally. Bioremediation research had a great boost with the recent advances in Metagenomics, as it enabled the sequencing of uncultured microorganisms providing new insights on surfactant-producing and/or oil-degrading bacteria. Many research studies are making available genomic data from unknown organisms obtained from metagenomics analysis of oil-contaminated environmental samples. These new datasets are presently demanding the development of new tools and data repositories tailored for the biological analysis in a context of bioremediation data analysis. This work presents BioSurfDB, www.biosurfdb.org, a curated relational information system integrating data from: (i) metagenomes; (ii) organisms; (iii) biodegradation relevant genes; proteins and their metabolic pathways; (iv) bioremediation experiments results, with specific pollutants treatment efficiencies by surfactant producing organisms; and (v) a biosurfactant-curated list, grouped by producing organism, surfactant name, class and reference. The main goal of this repository is to gather information on the characterization of biological compounds and mechanisms involved in biosurfactant production and/or biodegradation and make it available in a curated way and associated with a number of computational tools to support studies of genomic and metagenomic data. Database URL: www.biosurfdb.org PMID:25833955
Focusing on function to mine cancer genome data | Center for Cancer Research
CCR scientists have devised a strategy to sift through the tens of thousands of mutations in cancer genome data to find mutations that actually drive the disease. They have used the method to discover that the JNK signaling pathway, which in different contexts can either spur cancerous growth or rein it in, acts as a tumor suppressor in gastric cancers.
Merlet, Benjamin; Paulhe, Nils; Vinson, Florence; Frainay, Clément; Chazalviel, Maxime; Poupin, Nathalie; Gloaguen, Yoann; Giacomoni, Franck; Jourdan, Fabien
2016-01-01
This article describes a generic programmatic method for mapping chemical compound libraries on organism-specific metabolic networks from various databases (KEGG, BioCyc) and flat file formats (SBML and Matlab files). We show how this pipeline was successfully applied to decipher the coverage of chemical libraries set up by two metabolomics facilities MetaboHub (French National infrastructure for metabolomics and fluxomics) and Glasgow Polyomics (GP) on the metabolic networks available in the MetExplore web server. The present generic protocol is designed to formalize and reduce the volume of information transfer between the library and the network database. Matching of metabolites between libraries and metabolic networks is based on InChIs or InChIKeys and therefore requires that these identifiers are specified in both libraries and networks. In addition to providing covering statistics, this pipeline also allows the visualization of mapping results in the context of metabolic networks. In order to achieve this goal, we tackled issues on programmatic interaction between two servers, improvement of metabolite annotation in metabolic networks and automatic loading of a mapping in genome scale metabolic network analysis tool MetExplore. It is important to note that this mapping can also be performed on a single or a selection of organisms of interest and is thus not limited to large facilities.
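The identifier-based matching described in this abstract can be illustrated with a small sketch. It is not the MetExplore pipeline: the function name, the dictionary layout and the placeholder InChIKeys below are assumptions made only for illustration. It shows how a compound library could be joined to a metabolic network on InChIKey and how a simple coverage figure could be derived.

```python
def map_library_to_network(library, network):
    """Match library compounds to network metabolites by InChIKey.

    library: dict of compound name -> InChIKey (None if missing)
    network: dict of metabolite id -> InChIKey (None if missing)
    Returns (mapping, coverage), where coverage is the fraction of
    network metabolites matched by at least one library compound.
    """
    # Index the network by InChIKey; several metabolites may share a key
    # (for example the same compound in different compartments).
    by_key = {}
    for met_id, key in network.items():
        if key:
            by_key.setdefault(key, []).append(met_id)

    # Compounds without an identifier cannot be matched and are skipped.
    mapping = {name: by_key.get(key, [])
               for name, key in library.items() if key}

    matched = {m for ids in mapping.values() for m in ids}
    coverage = len(matched) / len(network) if network else 0.0
    return mapping, coverage


# Placeholder keys, not real InChIKeys.
library = {"cmpd_1": "KEY-AAA", "cmpd_2": "KEY-BBB", "cmpd_3": None}
network = {"M_a_c": "KEY-AAA", "M_a_m": "KEY-AAA", "M_x": "KEY-ZZZ", "M_y": None}
mapping, coverage = map_library_to_network(library, network)
```

Entries lacking an identifier on either side are silently unmatched, which mirrors the abstract's point that both libraries and networks must carry InChIs or InChIKeys for the mapping to work.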
Nakamura, Yukio; de Paiva Alves, Eduardo; Veenstra, Gert Jan C; Hoppler, Stefan
2016-06-01
Key signalling pathways, such as canonical Wnt/β-catenin signalling, operate repeatedly to regulate tissue- and stage-specific transcriptional responses during development. Although recruitment of nuclear β-catenin to target genomic loci serves as the hallmark of canonical Wnt signalling, mechanisms controlling stage- or tissue-specific transcriptional responses remain elusive. Here, a direct comparison of genome-wide occupancy of β-catenin with a stage-matched Wnt-regulated transcriptome reveals that only a subset of β-catenin-bound genomic loci are transcriptionally regulated by Wnt signalling. We demonstrate that Wnt signalling regulates β-catenin binding to Wnt target genes not only when they are transcriptionally regulated, but also in contexts in which their transcription remains unaffected. The transcriptional response to Wnt signalling depends on additional mechanisms, such as BMP or FGF signalling for the particular genes we investigated, which do not influence β-catenin recruitment. Our findings suggest a more general paradigm for Wnt-regulated transcriptional mechanisms, which is relevant for tissue-specific functions of Wnt/β-catenin signalling in embryonic development but also for stem cell-mediated homeostasis and cancer. Chromatin association of β-catenin, even to functional Wnt-response elements, can no longer be considered a proxy for identifying transcriptionally Wnt-regulated genes. Context-dependent mechanisms are crucial for transcriptional activation of Wnt/β-catenin target genes subsequent to β-catenin recruitment. Our conclusions therefore also imply that Wnt-regulated β-catenin binding in one context can mark Wnt-regulated transcriptional target genes for different contexts. © 2016. Published by The Company of Biologists Ltd.
acdc – Automated Contamination Detection and Confidence estimation for single-cell genome data
Lux, Markus; Kruger, Jan; Rinke, Christian; ...
2016-12-20
A major obstacle in single-cell sequencing is sample contamination with foreign DNA. To guarantee clean genome assemblies and to prevent the introduction of contamination into public databases, considerable quality control efforts are put into post-sequencing analysis. Contamination screening generally relies on reference-based methods such as database alignment or marker gene search, which limits the set of detectable contaminants to organisms with closely related reference species. As genomic coverage in the tree of life is highly fragmented, there is an urgent need for a reference-free methodology for contaminant identification in sequence data. We present acdc, a tool specifically developed to aid the quality control process of genomic sequence data. By combining supervised and unsupervised methods, it reliably detects both known and de novo contaminants. First, 16S rRNA gene prediction and the inclusion of ultrafast exact alignment techniques allow sequence classification using existing knowledge from databases. Second, reference-free inspection is enabled by the use of state-of-the-art machine learning techniques that include fast, non-linear dimensionality reduction of oligonucleotide signatures and subsequent clustering algorithms that automatically estimate the number of clusters. The latter also enables the removal of any contaminant, yielding a clean sample. Furthermore, given the data complexity and the ill-posedness of clustering, acdc employs bootstrapping techniques to provide statistically profound confidence values. Tested on a large number of samples from diverse sequencing projects, our software is able to quickly and accurately identify contamination. Results are displayed in an interactive user interface. Acdc can be run from the web as well as a dedicated command line application, which allows easy integration into large sequencing project analysis workflows. Acdc can reliably detect contamination in single-cell genome data.
In addition to database-driven detection, it complements existing tools by its unsupervised techniques, which allow for the detection of de novo contaminants. Our contribution has the potential to drastically reduce the amount of resources put into these processes, particularly in the context of limited availability of reference species. As single-cell genome data continues to grow rapidly, acdc adds to the toolkit of crucial quality assurance tools.
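The "oligonucleotide signatures" mentioned in this abstract are, in general terms, normalized k-mer frequency vectors computed per contig. The sketch below is not acdc's code; the function name and the choice of k = 4 as a default are assumptions. It shows how such a vector could be computed before it is passed to any dimensionality reduction or clustering step.

```python
from collections import Counter
from itertools import product


def kmer_signature(seq, k=4):
    """Return the normalized frequency of every DNA k-mer in seq.

    The vector has 4**k entries in lexicographic order (AAAA, AAAC, ...).
    Windows containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


signature = kmer_signature("ACGTACGTNNACGT")  # 256-dimensional vector
```

Contigs from the same organism tend to have similar signatures, which is what makes reference-free grouping possible in principle; the bootstrapped confidence values described in the abstract are not covered by this sketch.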
Detection of Bacillus anthracis DNA in Complex Soil and Air Samples Using Next-Generation Sequencing
Be, Nicholas A.; Thissen, James B.; Gardner, Shea N.; McLoughlin, Kevin S.; Fofanov, Viacheslav Y.; Koshinsky, Heather; Ellingson, Sally R.; Brettin, Thomas S.; Jackson, Paul J.; Jaing, Crystal J.
2013-01-01
Bacillus anthracis is the potentially lethal etiologic agent of anthrax disease, and is a significant concern in the realm of biodefense. One of the cornerstones of an effective biodefense strategy is the ability to detect infectious agents with a high degree of sensitivity and specificity in the context of a complex sample background. The nature of the B. anthracis genome, however, renders specific detection difficult, due to close homology with B. cereus and B. thuringiensis. We therefore elected to determine the efficacy of next-generation sequencing analysis and microarrays for detection of B. anthracis in an environmental background. We applied next-generation sequencing to titrated genome copy numbers of B. anthracis in the presence of background nucleic acid extracted from aerosol and soil samples. We found next-generation sequencing to be capable of detecting as few as 10 genomic equivalents of B. anthracis DNA per nanogram of background nucleic acid. Detection was accomplished by mapping reads to either a defined subset of reference genomes or to the full GenBank database. Moreover, sequence data obtained from B. anthracis could be reliably distinguished from sequence data mapping to either B. cereus or B. thuringiensis. We also demonstrated the efficacy of a microbial census microarray in detecting B. anthracis in the same samples, representing a cost-effective and high-throughput approach, complementary to next-generation sequencing. Our results, in combination with the capacity of sequencing for providing insights into the genomic characteristics of complex and novel organisms, suggest that these platforms should be considered important components of a biosurveillance strategy. PMID:24039948
Binary Interval Search: a scalable algorithm for counting interval intersections.
Layer, Ryan M; Skadron, Kevin; Robins, Gabriel; Hall, Ira M; Quinlan, Aaron R
2013-01-01
The comparison of diverse genomic datasets is fundamental to understanding genome biology. Researchers must explore many large datasets of genome intervals (e.g. genes, sequence alignments) to place their experimental results in a broader context and to make new discoveries. Relationships between genomic datasets are typically measured by identifying intervals that intersect, that is, they overlap and thus share a common genome interval. Given the continued advances in DNA sequencing technologies, efficient methods for measuring statistically significant relationships between many sets of genomic features are crucial for future discovery. We introduce the Binary Interval Search (BITS) algorithm, a novel and scalable approach to interval set intersection. We demonstrate that BITS outperforms existing methods at counting interval intersections. Moreover, we show that BITS is intrinsically suited to parallel computing architectures, such as graphics processing units, by illustrating its utility for efficient Monte Carlo simulations measuring the significance of relationships between sets of genomic intervals. Availability: https://github.com/arq5x/bits.
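One way binary search can count interval intersections is sketched below. This is an illustration of the general idea rather than the authors' implementation, and the function name and the closed-interval convention are assumptions. An interval fails to intersect a query only if it ends before the query starts or starts after the query ends, so two binary searches over sorted start and end coordinates give the count without enumerating the overlaps.

```python
from bisect import bisect_left, bisect_right


def count_intersections(queries, database):
    """Count (query, database) interval pairs that overlap.

    Intervals are (start, end) tuples on one chromosome, treated as closed.
    """
    starts = sorted(s for s, _ in database)
    ends = sorted(e for _, e in database)
    n = len(database)
    total = 0
    for q_start, q_end in queries:
        # Database intervals that end strictly before the query starts.
        ends_before = bisect_left(ends, q_start)
        # Database intervals that start strictly after the query ends.
        starts_after = n - bisect_right(starts, q_end)
        # Everything else must overlap the query.
        total += n - ends_before - starts_after
    return total


database = [(1, 5), (4, 8), (10, 12)]
print(count_intersections([(5, 10)], database))  # 3
print(count_intersections([(6, 9)], database))   # 1
```

Because each query is handled independently with two searches, the loop is easy to parallelize, which fits the abstract's remark about graphics processing units and Monte Carlo simulations.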
Characterization of noncoding regulatory DNA in the human genome.
Elkon, Ran; Agami, Reuven
2017-08-08
Genetic variants associated with common diseases are usually located in noncoding parts of the human genome. Delineation of the full repertoire of functional noncoding elements, together with efficient methods for probing their biological roles, is therefore of crucial importance. Over the past decade, DNA accessibility and various epigenetic modifications have been associated with regulatory functions. Mapping these features across the genome has enabled researchers to begin to document the full complement of putative regulatory elements. High-throughput reporter assays to probe the functions of regulatory regions have also been developed but these methods separate putative regulatory elements from the chromosome so that any effects of chromatin context and long-range regulatory interactions are lost. Definitive assignment of function(s) to putative cis-regulatory elements requires perturbation of these elements. Genome-editing technologies are now transforming our ability to perturb regulatory elements across entire genomes. Interpretation of high-throughput genetic screens that incorporate genome editors might enable the construction of an unbiased map of functional noncoding elements in the human genome.
UCSC genome browser: deep support for molecular biomedical research.
Mangan, Mary E; Williams, Jennifer M; Lathe, Scott M; Karolchik, Donna; Lathe, Warren C
2008-01-01
The volume and complexity of genomic sequence data, and the additional experimental data required for annotation of the genomic context, pose a major challenge for display and access for biomedical researchers. Genome browsers organize this data and make it available in various ways to extract useful information to advance research projects. The UCSC Genome Browser is one of these resources. The official sequence data for a given species forms the framework to display many other types of data such as expression, variation, cross-species comparisons, and more. Visual representations of the data are available for exploration. Data can be queried with sequences. Complex database queries are also easily achieved with the Table Browser interface. Associated tools permit additional query types or access to additional data sources such as images of in situ localizations. Support for solving researchers' issues is provided through active discussion mailing lists and updated training materials. The UCSC Genome Browser provides a source of deep support for a wide range of biomedical molecular research (http://genome.ucsc.edu).
MassTRIX: mass translator into pathways.
Suhre, Karsten; Schmitt-Kopplin, Philippe
2008-07-01
Recent technical advances in mass spectrometry (MS) have brought the field of metabolomics to a point where large numbers of metabolites from numerous prokaryotic and eukaryotic organisms can now be easily and precisely detected. The challenge today lies in the correct annotation of these metabolites on the basis of their accurate measured masses. Assignment of bulk chemical formula is generally possible, but without consideration of the biological and genomic context, concrete metabolite annotations remain difficult and uncertain. MassTRIX responds to this challenge by providing a hypothesis-driven approach to high precision MS data annotation. It presents the identified chemical compounds in their genomic context as differentially colored objects on KEGG pathway maps. Information on gene transcription or differences in the gene complement (e.g. samples from different bacterial strains) can be easily added. The user can thus interpret the metabolic state of the organism in the context of its potential and, in the case of submitted transcriptomics data, real enzymatic capacities. The MassTRIX web server is freely accessible at http://masstrix.org.
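The first step the abstract alludes to, assigning candidate compounds from accurately measured masses, can be illustrated with a tolerance search in parts per million. The sketch below is not MassTRIX code: the function name, the default tolerance, the compound names and the masses are all illustrative assumptions, and adduct or ionization corrections are ignored.

```python
def annotate_by_mass(measured_mass, candidates, tolerance_ppm=3.0):
    """Return candidate compounds whose mass lies within tolerance_ppm
    of measured_mass, sorted by absolute mass error.

    candidates: dict of compound name -> monoisotopic mass
    """
    hits = []
    for name, mass in candidates.items():
        # Relative mass error expressed in parts per million.
        error_ppm = (measured_mass - mass) / mass * 1e6
        if abs(error_ppm) <= tolerance_ppm:
            hits.append((name, error_ppm))
    return sorted(hits, key=lambda hit: abs(hit[1]))


# Hypothetical candidate list; the values are for illustration only.
candidates = {"compound_A": 180.0634, "compound_B": 180.0700, "compound_C": 88.0160}
hits = annotate_by_mass(180.0636, candidates)
```

Several compounds often fall inside the same tolerance window, which is why the abstract stresses biological and genomic context, such as which pathways the organism actually encodes, as the means of choosing between them.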
Rapid Identification of Sequences for Orphan Enzymes to Power Accurate Protein Annotation
Ojha, Sunil; Watson, Douglas S.; Bomar, Martha G.; Galande, Amit K.; Shearer, Alexander G.
2013-01-01
The power of genome sequencing depends on the ability to understand what those genes and their protein products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the “back catalog” of enzymology – “orphan enzymes,” those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme “back catalog” is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology’s “back catalog” another powerful tool to drive accurate genome annotation. PMID:24386392
Rapid identification of sequences for orphan enzymes to power accurate protein annotation.
Ramkissoon, Kevin R; Miller, Jennifer K; Ojha, Sunil; Watson, Douglas S; Bomar, Martha G; Galande, Amit K; Shearer, Alexander G
2013-01-01
The power of genome sequencing depends on the ability to understand what those genes and their protein products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the "back catalog" of enzymology--"orphan enzymes," those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme "back catalog" is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology's "back catalog" another powerful tool to drive accurate genome annotation.
Muthamilarasan, Mehanathan; Khandelwal, Rohit; Yadav, Chandra Bhan; Bonthala, Venkata Suresh; Khan, Yusuf; Prasad, Manoj
2014-01-01
MYB proteins represent one of the largest transcription factor families in plants, playing important roles in diverse developmental and stress-responsive processes. Considering their significance, genome-wide analyses have been conducted in almost all land plants except foxtail millet. Foxtail millet (Setaria italica L.) is a model crop for investigating the systems biology of millets and bioenergy grasses. Further, the crop is also known for its potential abiotic stress tolerance. In this context, a comprehensive genome-wide survey was conducted and 209 MYB protein-encoding genes were identified in foxtail millet. All 209 S. italica MYB (SiMYB) genes were physically mapped onto the nine chromosomes of foxtail millet. A gene duplication study showed that segmental and tandem duplications have occurred in the genome, resulting in expansion of this gene family. The protein domain investigation classified SiMYB proteins into three classes according to the number of MYB repeats present. The phylogenetic analysis categorized SiMYBs into ten groups (I-X). SiMYB-based comparative mapping revealed maximum orthology between foxtail millet and sorghum, followed by maize, rice and Brachypodium. Heat map analysis showed tissue-specific expression patterns of predominant SiMYB genes. Expression profiling of candidate MYB genes against abiotic stresses and hormone treatments using qRT-PCR revealed specific and/or overlapping expression patterns of SiMYBs. Taken together, the present study provides a foundation for evolutionary and functional characterization of MYB TFs in foxtail millet to dissect their functions in response to environmental stimuli.
Updated regulation curation model at the Saccharomyces Genome Database
Engel, Stacia R; Skrzypek, Marek S; Hellerstedt, Sage T; Wong, Edith D; Nash, Robert S; Weng, Shuai; Binkley, Gail; Sheppard, Travis K; Karra, Kalpana; Cherry, J Michael
2018-01-01
The Saccharomyces Genome Database (SGD) provides comprehensive, integrated biological information for the budding yeast Saccharomyces cerevisiae, along with search and analysis tools to explore these data, enabling the discovery of functional relationships between sequence and gene products in fungi and higher organisms. We have recently expanded our data model for regulation curation to address regulation at the protein level in addition to transcription, and are presenting the expanded data on the ‘Regulation’ pages at SGD. These pages include a summary describing the context under which the regulator acts, manually curated and high-throughput annotations showing the regulatory relationships for that gene, and a graphical visualization of its regulatory network and connected networks. For genes whose products regulate other genes or proteins, the Regulation page includes Gene Ontology enrichment analysis of the biological processes in which those targets participate. For DNA-binding transcription factors, we also provide other information relevant to their regulatory function, such as DNA binding site motifs and protein domains. As with other data types at SGD, all regulatory relationships and accompanying data are available through YeastMine, SGD’s data warehouse based on InterMine. Database URL: http://www.yeastgenome.org PMID:29688362
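The Gene Ontology enrichment analysis mentioned in this abstract is commonly computed as a one-sided hypergeometric test. The following is a minimal sketch of that general technique, not SGD's implementation; the function name, parameter names and toy numbers are assumptions made for illustration.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """One-sided hypergeometric p-value, P(X >= hits).

    population: total number of genes considered (hypothetical background)
    annotated:  genes in the background carrying the GO term
    sample:     number of target genes being tested
    hits:       target genes that carry the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability of seeing `hits` or more annotated genes by chance.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy example: 3 of 10 genes carry the term, and all 3 sampled targets carry it.
p = enrichment_p_value(population=10, annotated=3, sample=3, hits=3)
```

In practice a p-value like this would be computed for every GO term and then corrected for multiple testing; that step is omitted here.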
Chang, Yue; Feng, LiFang; Miao, Wei
2011-07-01
Dichlorodiphenyltrichloroethane (DDT), tributyltin (TBT), and 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) are persistent in the environment and cause continuous toxic effects in humans and aquatic life. Tetrahymena thermophila has the potential for use as a model for research regarding toxicants. In this study, this organism was used to analyze a genome-wide microarray generated from cells exposed to DDT, TBT and TCDD. To accomplish this, genes differentially expressed when treated with each toxicant were identified, after which their functions were categorized using GO enrichment analysis. The results suggested that the responses of T. thermophila were similar to those of multicellular organisms. Additionally, the context likelihood of relatedness method (CLR) was applied to construct a TCDD-relevant network. The T-shaped network obtained could be functionally divided into two subnetworks. The general functions of both subnetworks were related to the epigenetic mechanism of TCDD. Based on analysis of the networks, a model of the TCDD effect on T. thermophila was inferred. Thus, Tetrahymena has the potential to be a good unicellular eukaryotic model for toxic mechanism research at the genome level.
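The context likelihood of relatedness (CLR) method named in this abstract scores each gene pair by how far its mutual information stands out from the background of both genes. The sketch below illustrates that scoring step only. It assumes a precomputed symmetric mutual-information matrix, and the function name and toy values are hypothetical rather than taken from the study.

```python
import math
from statistics import mean, pstdev


def clr_scores(mi):
    """Return CLR scores for a symmetric matrix of pairwise mutual information.

    mi[i][j] is assumed to hold the mutual information between genes i and j;
    how it was estimated is outside the scope of this sketch.
    """
    n = len(mi)
    # Background for each gene: its MI values against all other genes.
    mu, sd = [], []
    for i in range(n):
        row = [mi[i][j] for j in range(n) if j != i]
        mu.append(mean(row))
        sd.append(pstdev(row))

    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # z-score of the pair against each gene's background, floored at 0.
            zi = max(0.0, (mi[i][j] - mu[i]) / sd[i]) if sd[i] > 0 else 0.0
            zj = max(0.0, (mi[i][j] - mu[j]) / sd[j]) if sd[j] > 0 else 0.0
            # Combine the two z-scores as sqrt(zi^2 + zj^2).
            scores[i][j] = scores[j][i] = math.hypot(zi, zj)
    return scores


# Toy matrix for four genes: genes 0 and 1 share much more information
# with each other than with the rest.
toy_mi = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.2],
    [0.1, 0.1, 0.2, 0.0],
]
toy_scores = clr_scores(toy_mi)
```

A network is then typically built by keeping the pairs whose score exceeds a chosen threshold; the threshold is a free parameter and is not specified in the abstract.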
Biodiversity of genes encoding anti-microbial traits within plant associated microbes
Mousa, Walaa K.; Raizada, Manish N.
2015-01-01
The plant is an attractive, versatile home for diverse associated microbes. A subset of these microbes produces a diversity of anti-microbial natural products including polyketides, non-ribosomal peptides, terpenoids, heterocyclic nitrogenous compounds, volatile compounds, bacteriocins, and lytic enzymes. In recent years, detailed molecular analysis has led to a better understanding of the underlying genetic mechanisms. New genomic and bioinformatic tools have permitted comparisons of orthologous genes between species, leading to predictions of the associated evolutionary mechanisms responsible for diversification at the genetic and corresponding biochemical levels. The purpose of this review is to describe the biodiversity of biosynthetic genes of plant-associated bacteria and fungi that encode selected examples of antimicrobial natural products. For each compound, the target pathogen and biochemical mode of action are described, in order to draw attention to the complexity of these phenomena. We review recent information on the underlying molecular diversity and draw lessons through comparative genomic analysis of the orthologous coding sequences (CDS). We conclude by discussing emerging themes and gaps, the metabolic pathways in the context of the phylogeny and ecology of their microbial hosts, and potential evolutionary mechanisms that led to the diversification of biosynthetic gene clusters. PMID:25914708
Genomic Biomarkers for Breast Cancer Risk
Walsh, Michael F.; Nathanson, Katherine L.; Couch, Fergus J.
2016-01-01
Clinical risk assessment for cancer predisposition includes a three-generation pedigree and physical examination to identify inherited syndromes. Additionally, genetic and genomic biomarkers may identify individuals with a constitutional basis for their disease that may not be evident clinically. Genomic biomarker testing may detect molecular variations in single genes, panels of genes, or entire genomes. The strength of evidence for the association of a genomic biomarker with disease risk may be weak or strong. The factors contributing to clinical validity and utility of genomic biomarkers include functional laboratory analyses and genetic epidemiologic evidence. Genomic biomarkers may be further classified as low, moderate or highly penetrant based on the likelihood of disease. Genomic biomarkers for breast cancer comprise rare highly penetrant mutations of genes such as BRCA1 or BRCA2, moderately penetrant mutations of genes such as CHEK2, as well as more common genomic variants, including single nucleotide polymorphisms, associated with modest effect sizes. When applied in the context of appropriate counseling and interpretation, identification of genomic biomarkers of inherited risk for breast cancer may decrease morbidity and mortality, allow for definitive prevention through assisted reproduction, and serve as a guide to targeted therapy. PMID:26987529
Comparative analysis and visualization of multiple collinear genomes
2012-01-01
Background: Genome browsers are a common tool used by biologists to visualize genomic features including genes, polymorphisms, and many others. However, existing genome browsers and visualization tools are not well-suited to perform meaningful comparative analysis among a large number of genomes. With the increasing quantity and availability of genomic data, there is an increased burden to provide useful visualization and analysis tools for comparison of multiple collinear genomes such as the large panels of model organisms which are the basis for much of the current genetic research. Results: We have developed a novel web-based tool for visualizing and analyzing multiple collinear genomes. Our tool illustrates genome-sequence similarity through a mosaic of intervals representing local phylogeny, subspecific origin, and haplotype identity. Comparative analysis is facilitated through reordering and clustering of tracks, which can vary throughout the genome. In addition, we provide local phylogenetic trees as an alternate visualization to assess local variations. Conclusions: Unlike previous genome browsers and viewers, ours allows for simultaneous and comparative analysis. Our browser provides intuitive selection and interactive navigation about features of interest. Dynamic visualizations adjust to scale and data content making analysis at variable resolutions and of multiple data sets more informative. We demonstrate our genome browser for an extensive set of genomic data sets composed of almost 200 distinct mouse laboratory strains. PMID:22536897
Surviving an Identity Crisis: A Revised View of Chromatin Insulators in the Genomics Era
Matzat, Leah H.; Lei, Elissa P.
2013-01-01
The control of complex, developmentally regulated loci and partitioning of the genome into active and silent domains is in part accomplished through the activity of DNA-protein complexes termed chromatin insulators. Together, the multiple, well-studied classes of insulators in Drosophila melanogaster appear to be generally functionally conserved. In this review, we discuss recent genomic-scale experiments and attempt to reconcile these newer findings in the context of previously defined insulator characteristics based on classical genetic analyses and transgenic approaches. Finally, we discuss the emerging understanding of mechanisms of chromatin insulator regulation. PMID:24189492
2011-01-01
Background: High-throughput SNP genotyping has become an essential requirement for molecular breeding and population genomics studies in plant species. Large-scale SNP developments have been reported for several mainstream crops. A growing interest now exists to expand the speed and resolution of genetic analysis to outbred species with highly heterozygous genomes. When nucleotide diversity is high, a refined diagnosis of the target SNP sequence context is needed to convert queried SNPs into high-quality genotypes using the Golden Gate Genotyping Technology (GGGT). This issue becomes exacerbated when attempting to transfer SNPs across species, a scarcely explored topic in plants, and likely to become significant for population genomics and interspecific breeding applications in less domesticated and less funded plant genera. Results: We have successfully developed the first set of 768 SNPs assayed by the GGGT for the highly heterozygous genome of Eucalyptus from a mixed Sanger/454 database with 1,164,695 ESTs and the preliminary 4.5X draft genome sequence for E. grandis. A systematic assessment of in silico SNP filtering requirements showed that stringent constraints on the SNP surrounding sequences have a significant impact on SNP genotyping performance and polymorphism. SNP assay success was high for the 288 SNPs selected with more rigorous in silico constraints; 93% of them provided high quality genotype calls and 71% of them were polymorphic in a diverse panel of 96 individuals of five different species. SNP reliability was high across nine Eucalyptus species belonging to three sections within subgenus Symphomyrtus and still satisfactory across species of two additional subgenera, although polymorphism declined as phylogenetic distance increased. Conclusions: This study indicates that the GGGT performs well both within and across species of Eucalyptus notwithstanding its nucleotide diversity ≥2%. The development of a much larger array of informative SNPs across multiple Eucalyptus species is feasible, although strongly dependent on having a representative and sufficiently deep collection of sequences from many individuals of each target species. A higher density SNP platform will be instrumental to undertake genome-wide phylogenetic and population genomics studies and to implement molecular breeding by Genomic Selection in Eucalyptus. PMID:21492434
10KP: A phylodiverse genome sequencing plan.
Cheng, Shifeng; Melkonian, Michael; Smith, Stephen A; Brockington, Samuel; Archibald, John M; Delaux, Pierre-Marc; Li, Fay-Wei; Melkonian, Barbara; Mavrodiev, Evgeny V; Sun, Wenjing; Fu, Yuan; Yang, Huanming; Soltis, Douglas E; Graham, Sean W; Soltis, Pamela S; Liu, Xin; Xu, Xun; Wong, Gane Ka-Shu
2018-03-01
Understanding plant evolution and diversity in a phylogenomic context is an enormous challenge due, in part, to limited availability of genome-scale data across phylodiverse species. The 10KP (10,000 Plants) Genome Sequencing Project will sequence and characterize representative genomes from every major clade of embryophytes, green algae, and protists (excluding fungi) within the next 5 years. By implementing and continuously improving leading-edge sequencing technologies and bioinformatics tools, 10KP will catalogue the genome content of plant and protist diversity and make these data freely available as an enduring foundation for future scientific discoveries and applications. 10KP is structured as an international consortium, open to the global community, including botanical gardens, plant research institutes, universities, and private industry. Our immediate goal is to establish a policy framework for this endeavor, the principles of which are outlined here.